Python Set-up Tutorial and Workshop

From Computational Statistics Course Wiki

Motivation

Relevant XKCD

Why Python?

It's EASY.
Python is a well-designed high level language with a large standard library and an extensive ecosystem of 3rd party libraries.
It's READABLE.
Readable code is important, and Python's design (significant indentation, for example) pushes you toward writing it.
It can be fun to write obscure, clever code. It feels like solving a puzzle. You should almost always fight this impulse.
Collaborators (especially your most frequent collaborator - your future self) will thank you.
It's FREE.
You can use Python on any machine, anywhere, without any licensing constraints.

Why IPython Notebook server?

  • Good for learning
The user doesn't have to install anything or set anything up.
It provides convenient access to documentation.
  • Great for sharing
It provides a centralized place to share code with each other.
Notebooks with integrated plots and text/LaTeX annotation are a great way to tell a story.
  • Data and package distribution
We can make large datasets available without you needing to download anything.
We can manage the installation of any needed packages.

Activity

For most of class today, we will be using the class IPython server through a browser, but towards the end of class, it will be useful to have Python installed on your local machine.

If you are running Linux or OS X, you almost certainly already have Python installed. Type 'python --version' at a command line to confirm.

If you are running Windows, you can download and install Python from http://www.python.org/ftp/python/2.7.6/python-2.7.6.msi
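
Once Python is installed, the same version check can also be made from inside the interpreter. A minimal sketch using only the standard sys module (the variable names are illustrative, not part of the course material):

```python
import sys

# sys.version_info holds the interpreter version as a tuple-like object,
# e.g. (2, 7, 6, ...) for the Windows installer linked above.
major, minor = sys.version_info[0], sys.version_info[1]
version_string = '%d.%d' % (major, minor)
print('Running Python ' + version_string)
```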

Go to http://rosalind.info and create an account. (A convenient way to do this is to click 'Log in' and use an OpenID like a Google account.)

If you have never used Python before or just want some review

Start with Rosalind's Python tutorial exercises:

  1. Installing Python - We need to do this problem because the website won't give us access to the rest until we do. Ignore the contents of the 'click to expand' box for this one.
  2. Variables and Some Arithmetic
  3. Strings and Lists
  4. Conditions and Loops
  5. Dictionaries - Notice that we skipped 'Working with Files' for now.

If you are already comfortable in Python

Jump straight into some basic bioinformatics:

  1. Counting DNA Nucleotides
  2. Transcribing DNA into RNA
  3. Complementing a Strand of DNA
  4. Counting Point Mutations - Use the built-in function zip() in your code.
  5. Finding a Motif in DNA - For an extra challenge, use the re module. How do you deal with overlapping motifs?
  6. Translating RNA into Protein
    Hints for 6
    Use the Biopython module, which is already installed on the class server.
    For guidance, execute "from Bio.Seq import Seq", then "Seq?" and "Seq.translate?".
    The remaining problems are best done using Python installed on your local machine instead of through the browser.
  7. Enumerating Gene Orders - Use a function from the itertools module in your code.
  8. Computing GC Content
  9. Finding a Protein Motif
    Hints for 9
    In your 'Computing GC Content' code, move the FASTA parsing into its own function. Then import that file into your current code as a module and use it to do your FASTA parsing. Some guidance: modules and the __name__ == '__main__' idiom.
    Use the urllib module to read data from the web.
    Use the re module to do the motif finding. (Note: some approaches to this will have to take extra care to make sure overlapping motifs can be found.)
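
The hints for the last problem combine two ideas: a reusable module guarded by the __name__ == '__main__' idiom, and regular-expression matching that does not skip overlapping hits. The sketch below illustrates both on toy data. It is one possible approach, not the reference solution. The file name fasta_tools.py and the function names parse_fasta and find_overlapping are made up for this example.

```python
# fasta_tools.py -- hypothetical module name, chosen for this sketch
import re


def parse_fasta(lines):
    """Return a dict mapping each record ID to its sequence string."""
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header line: start a new record, dropping the leading '>'.
            current_id = line[1:]
            records[current_id] = []
        elif current_id is not None:
            # Sequence line: a record may be wrapped over several lines.
            records[current_id].append(line)
    return dict((name, ''.join(parts)) for name, parts in records.items())


def find_overlapping(pattern, text):
    """Return 0-based start positions of all matches, including overlaps."""
    # A zero-width lookahead matches without consuming characters, so the
    # search resumes one position later and still sees overlapping hits.
    lookahead = re.compile('(?=' + pattern + ')')
    return [m.start() for m in lookahead.finditer(text)]


if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported.
    demo = ['>seq1', 'ACGT', 'ACGT', '>seq2', 'TTTT']
    print(parse_fasta(demo))
    print(find_overlapping('TT', 'TTTT'))
```

Saved as fasta_tools.py, the functions can be reused from another script with "from fasta_tools import parse_fasta", and the demo at the bottom will not run on import. The positions returned are 0-based, so add 1 if a problem asks for 1-based positions.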

Jeff's Solutions

Note: these are non-interactive snapshots of notebooks that you can also find on the class server.