Python Set-up Tutorial and Workshop
Why Python?
- It's EASY.
- Python is a well-designed, high-level language with a large standard library and an extensive ecosystem of third-party libraries.
- It's READABLE.
- Readability of code is important. Python is designed to force you to write readable code.
- It can be fun to write obscure, clever code. It feels like solving a puzzle. You should almost always fight this impulse.
- Collaborators (especially your most frequent collaborator - your future self) will thank you.
- It's FREE.
- You can use Python on any machine, anywhere, without any licensing constraints.
Why IPython Notebook server?
- Good for learning
- The user doesn't have to install anything or set anything up.
- It provides convenient access to documentation.
- Great for sharing
- It provides a centralized place to share code with each other.
- Notebooks with integrated plots and text/LaTeX annotation are a great way to tell a story.
- Data and package distribution
- We can make large datasets available without you needing to download anything.
- We can manage the installation of any needed packages.
For most of class today, we will be using the class IPython server through a browser, but towards the end of class, it will be useful to have Python installed on your local machine.
If you are running Linux or OS X, you almost certainly already have it. Type 'python --version' at a command line to confirm.
If you are running Windows, you can download and install Python from http://www.python.org/ftp/python/2.7.6/python-2.7.6.msi
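Once Python is installed, the same version check can also be done from inside the interpreter. The lines below are a minimal sketch using only the standard sys module; the variable names are illustrative and not part of the workshop materials.

```python
import sys

# sys.version_info behaves like a tuple: (major, minor, micro, ...)
major, minor = sys.version_info[0], sys.version_info[1]
version_string = "%d.%d" % (major, minor)

# The exact number printed depends on your installation.
print("Running Python " + version_string)
```

If the number printed here differs from what 'python --version' reports, you probably have more than one Python installed.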
Go to http://rosalind.info and create an account. (A convenient way to do this is to click 'Log in' and use an OpenID like a Google account.)
If you have never used Python before or just want some review
Start with Rosalind's Python tutorial exercises:
- Installing Python - We need to do this problem because the website won't give us access to the rest until we do. Ignore the contents of the 'click to expand' box for this one.
- Variables and Some Arithmetic
- Strings and Lists
- Conditions and Loops
- Dictionaries - Notice that we skipped 'Working with Files' for now.
If you are already comfortable in Python
Jump straight into some basic bioinformatics:
- Counting DNA Nucleotides
- Transcribing DNA into RNA
- Complementing a Strand of DNA
- Counting Point Mutations - Use the built-in function zip() in your code.
- Finding a Motif in DNA - For an extra challenge, use the re module. How do you deal with overlapping motifs?
- Translating RNA into Protein
- Hints for 6
- Use the Biopython module, which is already installed on the class server.
- For guidance, execute "from Bio.Seq import Seq", then "Seq?" and "Seq.translate?".
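As a rough illustration of the zip() hint given for 'Counting Point Mutations', the sketch below walks two strings in step and counts the positions where they differ. The function name and the sample strings are made up for this example; treat it as one possible approach, not the expected solution.

```python
def count_mismatches(first, second):
    """Count the positions at which two equal-length strings differ."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    # zip() yields the pairs (first[0], second[0]), (first[1], second[1]), ...
    return sum(1 for a, b in zip(first, second) if a != b)


print(count_mismatches("GATTACA", "GACTATA"))  # prints 2
```

Note that zip() stops silently at the end of the shorter input, which is why the sketch checks the lengths first.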
- The remaining problems are best done using Python installed on your local machine instead of through the browser.
- Enumerating Gene Orders - Use a function from the itertools module in your code.
- Computing GC Content
- Rosalind: Finding a Protein Motif
- Hints for 8
- In your 'Computing GC Content' code, separate the FASTA parsing functionality into a function. Import this file into your current code as a module and use it to do your FASTA parsing. Some guidance: read up on modules and the if __name__ == '__main__' idiom.
- Use the urllib module to read data from the web.
- Use the re module to do the motif finding. (Note: some approaches to this will have to take extra care to make sure overlapping motifs can be found.)
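The hints for 8 can be sketched roughly as follows, assuming the code is saved in a file with a hypothetical name such as fasta_tools.py. The function names, the sample records, and the pattern are all invented for illustration. The lookahead trick shown here is only one of several ways to find overlapping matches, and fetching data with urllib is left out.

```python
import re


def parse_fasta(lines):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence lines are collected until the next header.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def find_overlapping(pattern, sequence):
    """Return 1-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead matches without consuming characters,
    # so the next search starts one position later instead of after the match.
    lookahead = re.compile("(?=" + pattern + ")")
    return [match.start() + 1 for match in lookahead.finditer(sequence)]


if __name__ == "__main__":
    # This part runs only when the file is executed directly,
    # not when it is imported as a module from other code.
    example = [">seq1", "ATAT", "ATGC", ">seq2", "GGCC"]
    for name, sequence in sorted(parse_fasta(example).items()):
        print(name, find_overlapping("ATAT", sequence))
```

With the file saved under that name, other code could use "import fasta_tools" and call fasta_tools.parse_fasta() without triggering the demo at the bottom.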
Note: these are non-interactive snapshots of notebooks that you can also find on the class server.