Segment 23. Bootstrap Estimation of Uncertainty

From Computational Statistics Course Wiki

Latest revision as of 14:40, 22 April 2016

Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/1OC9ul-1PVg&hd=1 |width=800 |height=625 |border=0 }}

The direct YouTube link is http://youtu.be/1OC9ul-1PVg

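Links to the slides: [http://wpressutexas.net/coursefiles/23.Bootstrap.pdf PDF file] or [http://wpressutexas.net/coursefiles/23.Bootstrap.ppt PowerPoint file]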


To Compute

1. Generate 100 i.i.d. random draws from the beta distribution, for example using MATLAB's betarnd or Python's random.betavariate. Use these to estimate this statistic of the underlying distribution: "value of the 75th percentile point minus value of the 25th percentile point". Now use the statistical bootstrap to estimate the distribution of uncertainty of your estimate, for example as a histogram.

2. Suppose instead that you can draw any desired number of samples (each of 100 draws) from the distribution. How does the histogram of the desired statistic from these samples compare with the bootstrap histogram from problem 1?

3. What is the actual value of the desired statistic for this beta distribution, computed numerically (that is, not by random sampling)? (Hint: I did this in Mathematica in three lines.)
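The three problems above can be set up along the following lines. This is only a minimal sketch, not the course's reference solution. The shape parameters `ALPHA` and `BETA` are placeholder assumptions (substitute the ones specified for the exercise), the helper names are invented for illustration, and problem 3 is done here with Simpson's rule plus bisection from the Python standard library rather than in Mathematica.

```python
import math
import random
import statistics

# ASSUMPTION: placeholder shape parameters (both > 1); substitute the ones
# specified for the exercise.
ALPHA, BETA = 2.5, 5.0
N_DRAWS, N_REPEAT = 100, 2000


def iqr(values):
    """75th percentile minus 25th percentile of a sample."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def beta_pdf(x, a, b):
    """Beta density; endpoints return 0, which is valid only for a, b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))


def beta_cdf(x, a, b, n=2000):
    """Integrate the density from 0 to x with composite Simpson's rule."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n  # n must be even for Simpson's rule
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_quantile(p, a, b, tol=1e-10):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(0)  # fixed seed so the run is repeatable


def draw_sample():
    """One sample of N_DRAWS i.i.d. beta draws."""
    return [rng.betavariate(ALPHA, BETA) for _ in range(N_DRAWS)]


# Problem 1: one sample, resampled with replacement.
sample = draw_sample()
point_estimate = iqr(sample)
boot_estimates = [iqr(rng.choices(sample, k=N_DRAWS)) for _ in range(N_REPEAT)]

# Problem 2: many fresh samples drawn from the distribution itself.
fresh_estimates = [iqr(draw_sample()) for _ in range(N_REPEAT)]

# Problem 3: quantiles by numerical integration, with no random sampling.
numeric_iqr = beta_quantile(0.75, ALPHA, BETA) - beta_quantile(0.25, ALPHA, BETA)

print(f"point estimate from one sample: {point_estimate:.4f}")
for label, est in (("bootstrap", boot_estimates), ("fresh samples", fresh_estimates)):
    print(f"{label}: mean={statistics.mean(est):.4f} stdev={statistics.stdev(est):.4f}")
print(f"numerically computed value: {numeric_iqr:.4f}")
```

The lists `boot_estimates` and `fresh_estimates` can be passed to any histogram routine for the comparison asked for in problem 2. If SciPy is available, `scipy.stats.beta.ppf` gives the same quantiles directly.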

To Think About

1. Suppose your desired statistic (for a sample of N i.i.d. data values) was "minimum of the N values". What would the bootstrap estimate of the uncertainty look like in this case? Does this violate the bootstrap theorem? Why or why not?

2. If you knew the distribution, how would you compute the actual distribution for the statistic "minimum of N sampled values", not using random sampling in your computation?

3. For N data points, can you design a statistic so perverse (and different from one suggested in the segment) that the statistical bootstrap fails, even asymptotically as N becomes large?
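For the first question, it may help to look at what the bootstrap actually produces for the "minimum" statistic before reasoning about the theorem. The sketch below is an illustration under assumptions. The beta shape parameters are placeholders and the variable names are invented here. It relies on one standard fact: a resample of size N contains the original minimum with probability 1 - (1 - 1/N)^N.

```python
import random

# ASSUMPTION: placeholder distribution and parameters, chosen only so the
# demonstration has some continuous data to work with.
ALPHA, BETA = 2.5, 5.0
N = 100
N_BOOT = 2000

rng = random.Random(2)  # fixed seed so the run is repeatable
sample = [rng.betavariate(ALPHA, BETA) for _ in range(N)]
sample_min = min(sample)

# Bootstrap the statistic "minimum of the N values".
boot_minima = [min(rng.choices(sample, k=N)) for _ in range(N_BOOT)]

# Fraction of resamples whose minimum is exactly the original minimum.
frac_at_min = sum(m == sample_min for m in boot_minima) / N_BOOT

# Probability that a resample of size N includes the original minimum.
expected_frac = 1.0 - (1.0 - 1.0 / N) ** N

distinct_values = sorted(set(boot_minima))

print(f"sample minimum: {sample_min:.5f}")
print(f"fraction of resamples reproducing it: {frac_at_min:.3f}")
print(f"1 - (1 - 1/N)^N: {expected_frac:.3f}")
print(f"distinct bootstrap minima: {len(distinct_values)}")
```

Every bootstrap minimum is one of the original data values and none can fall below the sample minimum, so the bootstrap histogram is a handful of spikes rather than a smooth curve.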

Class Activity

Download the data set given below. It contains 100 draws from a 4-dimensional distribution, i.e., each draw returns a 4-dimensional vector. The statistic we are interested in is,


Carry out the following tasks:

  • Give a point estimate of the statistic.
  • Carry out bootstrapping and visualize the uncertainty in the statistic using a scatter plot.
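The two tasks above could be organized roughly as follows. This is a hedged sketch resting on several assumptions: the definition of the statistic is not shown in the text above, so `statistic` is a purely hypothetical two-component placeholder. The file is assumed to hold one draw per line as four whitespace- or comma-separated numbers. Synthetic Gaussian data stands in for the downloaded file so that the block runs on its own.

```python
import random


def parse_rows(text):
    """Parse one 4-vector per line (ASSUMED file layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if fields:
            rows.append(tuple(float(f) for f in fields))
    return rows


def mean(values):
    return sum(values) / len(values)


def statistic(rows):
    """HYPOTHETICAL placeholder: means of the first two components.

    Replace the body with the statistic defined for the class activity.
    """
    return (mean([r[0] for r in rows]), mean([r[1] for r in rows]))


def bootstrap_rows(rows, stat, n_boot, rng):
    """Resample whole rows with replacement and recompute the statistic."""
    n = len(rows)
    return [stat(rng.choices(rows, k=n)) for _ in range(n_boot)]


rng = random.Random(3)  # fixed seed so the run is repeatable

# Synthetic stand-in for the contents of the downloaded data file.
synthetic_text = "\n".join(
    " ".join(f"{rng.gauss(0.0, 1.0):.6f}" for _ in range(4)) for _ in range(100)
)

rows = parse_rows(synthetic_text)
point_estimate = statistic(rows)
boot_points = bootstrap_rows(rows, statistic, 1000, rng)

print("point estimate:", point_estimate)
print("first bootstrap replicates:", boot_points[:3])
```

Each pair in `boot_points` is one dot of the scatter plot. Resampling whole rows, rather than each column separately, keeps whatever dependence exists between the four components of a draw.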

Data Set

[http://wpressutexas.net/coursewiki/images/9/96/Dataset.txt Dataset_txtfile]