# Segment 13. The Yeast Genome

Bill Press

## Revision as of 14:30, 22 April 2016

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/QSgUX-Do8Tc&hd=1 |width=800 |height=625 |border=0 }}

The direct YouTube link is http://youtu.be/QSgUX-Do8Tc

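Links to the slides: [http://wpressutexas.net/coursefiles/13.TheYeastGenome.pdf PDF file] or [http://wpressutexas.net/coursefiles/13.TheYeastGenome.ppt PowerPoint file]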

Link to the file mentioned in the segment: [http://slate.ices.utexas.edu/coursefiles/SacCerChr4.txt.zip SacCerChr4.txt]

Link to all yeast chromosomes: UCSC

### Problems

#### To Calculate

1. With $p=0.3$ and various values of $n$, how big is the largest discrepancy between the Binomial probability distribution and the approximating Normal pdf? At what value of $n$ does this discrepancy become smaller than $10^{-15}$?

2. Show that if four random variables are (together) multinomially distributed, each separately is binomially distributed.
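
As a starting point for the first problem, here is a minimal sketch of how the largest discrepancy could be measured numerically. It assumes the comparison is made at the integer points $k = 0, \ldots, n$, with the Normal pdf given mean $np$ and variance $np(1-p)$. The function names (`binom_pmf`, `normal_pdf`, `max_discrepancy`) are illustrative choices, not anything from the segment.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials.

    Evaluated through log-gamma so that large n does not overflow.
    """
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def max_discrepancy(n, p=0.3):
    """Largest |Binomial - Normal| over the integer points k = 0..n.

    Assumption: the Normal approximation uses mean n*p and variance n*p*(1-p).
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return max(abs(binom_pmf(k, n, p) - normal_pdf(k, mu, sigma))
               for k in range(n + 1))

if __name__ == "__main__":
    # Print the trend with n; the values of n here are arbitrary.
    for n in (10, 100, 1000, 10000):
        print(n, max_discrepancy(n))
```

Floating-point rounding in the log-gamma evaluation grows with $n$, so a brute-force evaluation like this probably cannot resolve differences near $10^{-15}$ directly. Plotting the printed trend against $n$ and extrapolating is likely the more practical route.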

#### To Think About

1. The segment suggests that $A\ne T$ and $C\ne G$ come about because genes are randomly distributed on one strand or the other. Could you use the observed discrepancies to estimate, even roughly, the number of genes in the yeast genome? If so, how? If not, why not?

2. Suppose that a Bayesian thinks that the prior probability of the hypothesis "$P_A=P_T$" is 0.9, and that the set of all hypotheses "$P_A\ne P_T$" has a total prior of 0.1. How might he calculate the odds ratio $\text{Prob}(P_A=P_T)/\text{Prob}(P_A\ne P_T)$? Hint: Are there nuisance variables to be marginalized over?
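
One possible way to set up the odds-ratio calculation is sketched below. It rests on several loudly stated assumptions that are not in the segment:

- Only positions holding A or T are counted, which sidesteps the C and G probabilities as nuisance variables.
- Bases are treated as independent draws.
- Under "$P_A\ne P_T$", the fraction $r = P_A/(P_A+P_T)$ gets a uniform prior on $[0,1]$.

With that prior, marginalizing over $r$ gives $\int_0^1 \binom{N}{n_A} r^{n_A}(1-r)^{n_T}\,dr = 1/(N+1)$, where $N = n_A + n_T$. The function names are hypothetical, and any counts passed in are the user's own.

```python
import math

def log_bayes_factor(n_a, n_t):
    """log of P(data | P_A = P_T) / P(data | P_A != P_T).

    Assumptions (illustrative only): counts are restricted to A-or-T
    positions, draws are independent, and under the alternative the
    fraction r = P_A / (P_A + P_T) has a uniform prior on [0, 1].
    """
    n = n_a + n_t
    log_choose = math.lgamma(n + 1) - math.lgamma(n_a + 1) - math.lgamma(n_t + 1)
    # Equal-probability hypothesis: each A-or-T site is A with probability 1/2.
    log_like_equal = log_choose - n * math.log(2.0)
    # Alternative: integrating the binomial likelihood over uniform r gives 1/(n+1).
    log_like_unequal = -math.log(n + 1)
    return log_like_equal - log_like_unequal

def posterior_odds(n_a, n_t, prior_equal=0.9):
    """Posterior odds Prob(P_A = P_T) / Prob(P_A != P_T) given the counts."""
    prior_odds = prior_equal / (1.0 - prior_equal)
    return prior_odds * math.exp(log_bayes_factor(n_a, n_t))
```

A different prior on $r$ under the alternative (for example, one concentrated near 1/2) would change the result, possibly a great deal, so the uniform choice here should be read as a placeholder rather than a recommendation.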