CS395T/CAM383M Computational Statistics  

03-08-2010, 08:53 AM
wpress
Required Term Project: Assignment and Calendar of Dates

A term project is required of every student taking the course for
credit. You will be posting your project on the web for the world to
see, so do something that you will be proud of! (If you wish to use
your forum screen name or other pseudonym for privacy reasons, that is
fine, but be sure I know who you are.) Collaboration is fine, but the
scope or depth of a project involving more than one student should be
correspondingly larger.


Monday, March 22, 2010 (in class): Project proposal (paper copy, not
web post) due from each student. One page or less.

Wednesday, March 24 (in class): Proposals returned by me with
approval and/or comments.

Monday, April 12, 2010: Mid-course project report due from each
student (or collaborative group), to be posted on the course web site.
(Create a thread that you will add to in posting your final project.)
Limited to 3 pages, this should be a detailed outline or summary of
what the final project will be, and should include a bibliography of
key references that are being used. It should also indicate what will
be in the final report (e.g., how many pages of written discussion,
how much original computer code, what data sets, etc.). This won't
be graded, but it is likely that its quality will correlate strongly
with the grade on the final project.

Wednesday, April 14, 2010: I'll post comments (hopefully helpful!)
on every project posting. You should also add (helpful) comments
to each other's mid-course postings. (Remember that participation
is a part of the grade!)

Wednesday, May 5, 2010, noon. Final written reports due as file(s)
posted to the course web forum before 5:00 p.m., absolute deadline. If
you have technical trouble posting any files, you can email them to me
before 5:00 p.m. (If you have your own web site, you can put your
project files there and post a message on the course web site with
a link to them.)

Report types and suggested topics:

Reports can be of any one of the following three types:

I. Prepare a lecture with written materials (slides) of comparable
scope to those used in class, but on a topic not covered in class. The
deliverable is 15 or more fairly dense PowerPoint-style slides, plus
any additional notes that you want to include. You should also be
prepared to discuss any of the slides briefly in your individual oral
interview. You
won't have to give the whole lecture! You are definitely expected to
do more than summarize a single textbook.

For examples of what students did last year, see

Here is an additional list of possibilities, with textbook references.

Genetic algorithms for optimization [Givens and Hoeting, Sec. 3.5]
Suffix Trees and Linear Time Suffix Trees [Gusfield, Chs. 5 and 6]
Wavelets and Wavelet Smoothing [Hastie et al., Sec. 5.9; NR3, Sec. 13.10]
Logistic Regression [Hastie et al., Sec. 4.4]
Boosting Methods [Hastie et al., Ch. 10]
Gaussian Process Regression and Kriging Interpolation [NR3, Secs. 3.7 and 15.9]
The Analysis of Variance (ANOVA) [Ewens and Grant, Sec. 9.5]
BLAST Sequence Comparison [Ewens and Grant, Ch. 10]
Neural Networks [Hastie et al., Ch. 11]
Model Selection (AIC, BIC, Cross-Validation, Vapnik-Chervonenkis, etc.) [Hastie et al., Ch.
MCMC (beyond what we do in class) [Givens and Hoeting, Ch. 8]
Nonparametric Density Estimation [Givens and Hoeting, Ch. 10]
Markov Random Fields and/or Inference on Graphical Models [Bishop, Ch. 8]
Variational Inference Methods [Bishop, Ch. 10]

Almost all of these topics
also have Wikipedia articles that can serve as starting points.


II. Take a data set of nontrivial size, and do "exploratory
statistics" on it. That is, try to discover new things in the data
that are both statistically significant and scientifically (e.g.,
biologically, but other fields OK) meaningful. You will need to show
some understanding of both the science behind the data set and
the statistical techniques that you try. You can get a high grade even
if you don't actually discover anything new, as long as you can
clearly explain what you were looking for, and why. You can also get a
high grade by intentionally re-discovering things that are already
known, if you approach them in an interesting way. You can find a
data set on your own (colleagues, friends, or the web), or else I can
show you how to get any of the following:

Complete genomes and comparative genomes: You can download from
genome.ucsc.edu (go to downloads).

Yeast gene expression: I can give you a version of the Rosetta Yeast
Database, consisting of expression profiles for all ~6000 yeast genes
in ~300 different experiments, mostly knock-out strains where a known
gene has been de-activated. Your goal would be to deduce things about
the functions or interactions of genes.

Protein sequence families: At http://pfam.sanger.ac.uk you can find
many protein families (PFAM), with ~50 orthologous sequences from
different organisms in each family. You could learn to do multiple
alignments, or measure amino acid substitution rates in various
circumstances, or think of something else to try.
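
As a purely illustrative starting point for the yeast gene expression
idea above, here is a minimal sketch of one way to test whether two
genes have correlated expression profiles: compute the Pearson
correlation and estimate its significance with a permutation test. The
data below are synthetic stand-ins, not the Rosetta data, and the
function names (pearson, permutation_pvalue) are my own for this
example. Other statistics or tests may well suit your data better.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed correlation.

    Shuffling y breaks any real association with x, so the shuffled
    correlations approximate the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-ins for expression profiles over 30 experiments
# (assumed data, for illustration only).
rng = random.Random(1)
gene_a = [rng.gauss(0, 1) for _ in range(30)]
gene_b = [a + rng.gauss(0, 0.5) for a in gene_a]  # built to track gene_a
gene_c = [rng.gauss(0, 1) for _ in range(30)]     # independent of gene_a

r_ab = pearson(gene_a, gene_b)
p_ab = permutation_pvalue(gene_a, gene_b)
p_ac = permutation_pvalue(gene_a, gene_c)
print(f"a vs b: r = {r_ab:.3f}, p = {p_ab:.4f}")
print(f"a vs c: p = {p_ac:.4f}")
```

Keep in mind that with ~6000 genes there are roughly 18 million gene
pairs, so raw p-values like these would need a multiple-testing
correction before anything could be called statistically significant.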


III. Anything else that you want to propose within the scope of the course. Be creative! Do something relevant and fun.