Segment 37. A Few Bits of Information Theory

Watch this segment

The direct YouTube link is http://youtu.be/ktzYOLDN3u4

Links to the slides: PDF file or PowerPoint file

Class Activity

There is no general way to estimate the entropy of a (non-i.i.d.) process from the data it generates, because you may or may not be able to recognize the internal structure that lowers its entropy. So, in general, even an accurate "estimate" is only an upper bound on the true entropy.
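
To make the upper-bound point concrete, here is a minimal Python sketch (an illustration, not from the segment) of the simplest such estimate: the plug-in entropy of the single-symbol frequencies. Because it is blind to any structure beyond those frequencies, it can only overestimate the true entropy rate (up to sampling noise).

    from collections import Counter
    from math import log2

    def h0_per_symbol(s):
        """Zeroth-order (plug-in) entropy estimate, in bits per symbol.

        Treats the string as i.i.d. draws from its empirical symbol
        frequencies. Any correlation between symbols is invisible to it,
        so (up to sampling noise) it can only overestimate the true
        entropy rate.
        """
        counts = Counter(s)
        n = len(s)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # A perfectly predictable string (true entropy rate 0) still scores
    # the maximum 2 bits per symbol, because its structure is hidden
    # from single-symbol frequencies:
    print(h0_per_symbol("ACGT" * 1000))  # prints 2.0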

Let's see how well we can do at estimating the true entropy of five different strings in the alphabet A, C, G, T. (Bill knows the answer, because he knows how they were generated. But he's not telling!)

The more you study the data, the better you'll do! (If you know how to use Hidden Markov Models, which we didn't have room for in this course, you might do even better.) One simple starting point is sketched after the list of files below.

Media:entropystring1.txt

Media:entropystring2.txt

Media:entropystring3.txt

Media:entropystring4.txt

Media:entropystring5.txt
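
One simple way to "study the data" is to condition on longer and longer contexts. The sketch below (an illustration, not Bill's method; the filename entropystring1.txt is assumed to be one of the files above saved locally) estimates the order-k conditional entropy, H(next symbol | previous k symbols), as the difference of empirical block entropies.

    from collections import Counter
    from math import log2

    def block_entropy(s, k):
        """Empirical entropy (in bits) of the k-gram distribution of s."""
        if k == 0:
            return 0.0
        grams = Counter(s[i:i + k] for i in range(len(s) - k + 1))
        n = sum(grams.values())
        return -sum((c / n) * log2(c / n) for c in grams.values())

    def hk_per_symbol(s, k):
        """Order-k conditional entropy estimate, in bits per symbol:
        H(X | previous k symbols) = H((k+1)-grams) - H(k-grams)."""
        return block_entropy(s, k + 1) - block_entropy(s, k)

    # Assumed usage: one of the activity files, saved locally.
    with open("entropystring1.txt") as f:
        s = "".join(f.read().split())  # drop whitespace and newlines
    for k in range(6):
        print(k, round(hk_per_symbol(s, k), 4))

Each k gives another, usually tighter, upper bound. But for large k the (k+1)-gram counts become sparse and the estimate is biased low, so stop trusting it once most contexts occur only once or twice; looking for where the estimates level off before that point is one reasonable way to pick your answer for each string.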