Feb12-Team2-Random Number Generator

From Computational Statistics Course Wiki
Jump to navigation Jump to search

This is the in class activity for Feb 12 (Segment 11), done by Team 2:

Deepesh Toshniwal, Todd_Swinson, Vsub

The activity description can be found here: Build_your_own_random_number_generator


Our approach was to come up with a function and draw samples from it by evaluating the function at a sequence of increasing values of x. The function we can up with is a sine wave with increasing oscillations. The idea is that as x increases by a fixed amount, the y value of the function will change considerably, due to the increasing frequency along the x-axis. The function we came up with is:

The seed value gives a starting position along the x-axis. This function is visualized below (with seed value of 0).

Seg11-team2-dist.png


We evaluated the function at increasing values of x and made a histogram of the results. As you can see, there are more values at the low and high end (< 0.1 and > 0.9). We believe the reason for this is because the function "spends more time" at those values (look at the peaks and dips of the wave and how they curve relatively slowly). On the other hand, the histogram shows that we get pretty much uniform values between 0.1 and 0.9, which validates that we are on the right track (x is hitting at some "random" y-value on the sine wave).

Seg11-team2-hist.png


So now we can draw values from our function, but they are clearly not uniformly distributed, as indicated by the histogram above. If we create a CDF of our function, we could use that to draw uniform values for any given x (similar to the transformation method for drawing random deviates explained in this segment's video).

We can estimate a CDF in a discrete way by taking the cumulative sum of our function at increasing values of x. We do this in the MATLAB code that follows.

%choose a seed using the system time
seed = datevec(now);
seed = seed(6);

%num samples to estimate a cdf for
h = 1e1;
n=1e7;
samples = zeros(n,1);

%get values from our function for increasing values of x
for x=1:n
    samples(x) = myRandFunction(x, seed);
end

%plot the histogram, and we notice there are too many samples
%at the low and high end
hist(samples)
[a,xcdf] = hist(samples,n/h);

%so we create a cdf by taking the cumulative sum of our samples,
%and normalizing
cdf = cumsum(a)/sum(a);

%can look at the cdf if we want to -- notice it is the CDF of a non-uniform distribution
%It's close to uniform, except at low and high values of x
%plot(xcdf,cdf);  

%save the cdf to a file that can be used my our myRand function
save('myRandCDF','xcdf','cdf');


Visualizing the CDF, we can see that is is almost uniform, except for low and high values of x, where it increases fastest. Now if we use our original sine function to generate a "random" y-value for some x, we can evaluate the CDF at that yvalue to draw a sample from .

Seg11-team2-cdf.png


We can proceed as before, by drawing samples from our sine function and then evaluating our [discrete representation of the] CDF to obtain the uniform random deviate. The MATLAB code is below. Notice that we interpolate the CDF's values because it is represented as a vector, not a continuous function.



%choose a seed using the current time
seed = datevec(now);
seed = seed(6);

%num samples to generate
n=1e4;
samples = zeros(n,1);

load myRandCDF;

for i=1:n
    %the new myRand function, which first samples from the sine function, then evaluates the CDF and interpolates
    samples(i) = myRandUniform(i, seed,xcdf,cdf);
end
hist(samples)

Our histogram looks pretty good now. Using the provided testran function, we obtained a score of 0.6910. Not bad.

Seg11-team2-newhist.png


The first 15 "random samples" obtained using our random number generator. The values "look" random.

EDU>> samples(1:15)

ans =

    0.8205
    0.3766
    0.1133
    0.6650
    0.7069
    0.0009
    0.8001
    0.3002
    0.7064
    0.1663
    0.9111
    0.4830
    0.0287
    0.2635
    0.3796



Our random functions are below. First the sine function, then the one that uses the CDF to produce uniform random samples.

function r = myRandFunction(x, seed)
    %the function is a sine wave of increasing oscillation
    r = 0.5 * (sin(exp(sqrt(x)) + seed) + 1);
end
function r = myRandUniform(x, seed,xcdf,cdf)
    %the function is a sine wave of increasing oscillation
    r = 0.5 * (sin(exp(sqrt(x)) + seed) + 1);
    
    %attempt to locate the closest x-value of the cdf,
    %and interpolate to obtain a value of the cdf between 0 and 1
    [val, ind] = min(abs(xcdf-r));
    if (xcdf(ind)-r>=0)
        if (ind ~=1)
            h = xcdf(ind) - xcdf(ind-1);
            r = val*cdf(ind-1)/h + (1-val/h)*cdf(ind);
        else
            r = cdf(1);
        end
    else
        if (ind~=length(xcdf))
            h = xcdf(ind+1) - xcdf(ind);
            r = val*cdf(ind+1)/h + (1-val/h)*cdf(ind);
        else
            r = cdf(end);
        end
    end
end