CS395T/CAM383M Computational Statistics Sarah's Term Project
04-12-2010

For the final project, I intend to prepare lecture slides on the topic of Neural Networks. Hastie’s coverage of the topic seems well organized as an introductory chapter and my slides may follow his progression; however, other sources will also be used. I have found a few neural network simulations online which would be a good source for visual examples. It’s a little difficult to grasp exactly how they work until you see one in action.
So far, I think my slides will look a little like this:
• Basic definition
• Based on the biological neuron: each unit represents a neuron and the connections represent synapses. A neuron fires when the total signal passed to it exceeds a certain level (modeled by the activation function)
• Neural Networks are just nonlinear statistical models used for regression or classification
• The "vanilla" neural net (the single hidden layer back-propagation network) closely resembles projection pursuit regression, as Hastie points out
• Each connection has an associated weight, and finding good values for these weights is what training a neural net is all about
• Uses
• Classification: to determine to which of a number of discrete classes a given input case belongs
• Given a set of X training examples that are labeled into N categories, we train a neural network to classify new data into one of the N categories.
• Regression: to predict the value of a (usually) continuous variable
• A lot like expectation maximization
• The data are fitted to a specified relationship, which is usually linear. The result is an equation in which each input $x_j$ is multiplied by a weight $w_j$; the sum of all such products plus a constant $\theta$ gives an estimate of the output: $\Large y = \sum_j w_j x_j + \theta$
• Training
• Weights
• Starting Weights
• Starting weights are generally random values near zero. This keeps the sigmoid in its roughly linear range, so the network collapses into an approximately linear model. The model thus starts out nearly linear and becomes nonlinear as the weights increase
• Starting with weights of exactly zero leads to zero derivatives and perfect symmetry. In other words, the algorithm never moves
• Starting with large weights leads to poor solutions
• Weight Training
• Back Propagation
• Alternative methods (ex: Particle swarm optimization)
• Learning Rate
• Hidden layers
• Better to have too many hidden units than too few
• With too few, the model might not have enough flexibility to capture nonlinearities.
• With too many, the extra weights can be shrunk toward zero if appropriate regularization (such as weight decay) is used.
• The choice of the number of hidden layers is guided by experimentation and background knowledge. In practice, guess and check.
• Each layer extracts features of the input for regression or classification
• Problems to run into
• Overfitting
• Training for too long can fit the weights so closely to the training set that the network only recognizes inputs that mimic the training examples and generalizes poorly to new data
• Multiple minima
• If the error surface has many local minima, the solution the network settles into depends on the starting weights
• It is a good idea to try several different sets of random starting weights and choose the solution with the lowest error
• Too few/too many hidden layers or hidden units
• Long training time (can be reduced if using parallel computing)
• Examples
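
The training bullets above (small random starting weights, back-propagation, several restarts) can be sketched in a few lines of plain Python. This is only a minimal sketch under assumptions that are not in the outline: the toy data (y = x^2 on five points), the three tanh hidden units, the learning rate, the number of steps, and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Toy regression data (an assumption for illustration): y = x^2 on five points.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YS = [x * x for x in XS]
N_HIDDEN = 3  # hypothetical choice, not taken from the outline


def init_weights(rng, scale):
    """Draw every weight uniformly from [-scale, scale]; scale=0 gives all zeros."""
    return {
        "a": [rng.uniform(-scale, scale) for _ in range(N_HIDDEN)],   # input -> hidden
        "a0": [rng.uniform(-scale, scale) for _ in range(N_HIDDEN)],  # hidden biases
        "b": [rng.uniform(-scale, scale) for _ in range(N_HIDDEN)],   # hidden -> output
        "b0": rng.uniform(-scale, scale),                             # output bias
    }


def forward(w, x):
    """One tanh hidden layer followed by a linear output unit."""
    h = [math.tanh(w["a"][k] * x + w["a0"][k]) for k in range(N_HIDDEN)]
    return h, w["b0"] + sum(w["b"][k] * h[k] for k in range(N_HIDDEN))


def loss(w):
    """Half the mean squared error over the training set."""
    return sum((forward(w, x)[1] - y) ** 2 for x, y in zip(XS, YS)) / (2 * len(XS))


def gradients(w):
    """Back-propagation: pass the output error back through the hidden layer."""
    g = {"a": [0.0] * N_HIDDEN, "a0": [0.0] * N_HIDDEN, "b": [0.0] * N_HIDDEN, "b0": 0.0}
    for x, y in zip(XS, YS):
        h, y_hat = forward(w, x)
        delta = (y_hat - y) / len(XS)  # error at the output unit
        g["b0"] += delta
        for k in range(N_HIDDEN):
            g["b"][k] += delta * h[k]
            s = delta * w["b"][k] * (1.0 - h[k] ** 2)  # error reaching hidden unit k
            g["a"][k] += s * x
            g["a0"][k] += s
    return g


def train(w, learning_rate=0.1, steps=2000):
    """Full-batch gradient descent, updating w in place; returns the final loss."""
    for _ in range(steps):
        g = gradients(w)
        for k in range(N_HIDDEN):
            w["a"][k] -= learning_rate * g["a"][k]
            w["a0"][k] -= learning_rate * g["a0"][k]
            w["b"][k] -= learning_rate * g["b"][k]
        w["b0"] -= learning_rate * g["b0"]
    return loss(w)


rng = random.Random(0)

# Exactly-zero start: every weight gradient is zero, so only the output bias moves.
zero_w = init_weights(rng, 0.0)
zero_g = gradients(zero_w)
train(zero_w)

# Several small random starts; keep the run with the lowest training error.
runs = []
for _ in range(5):
    w = init_weights(rng, 0.1)
    before = loss(w)
    runs.append((train(w), before, w))
best_loss, best_start_loss, best_w = min(runs, key=lambda r: r[0])
print("weights after training from an all-zero start:", zero_w["a"], zero_w["b"])
print("best training loss over 5 restarts:", best_loss)
```

With tanh hidden units the all-zero start is stuck in the sense the outline describes: the hidden units stay identical and their weights never change, and only the output bias drifts toward the mean of the targets. The restart loop is the "many different starting weights" idea from the multiple-minima bullet.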

References
• Bishop, C.M. (1995) Neural Networks for Pattern Recognition, Oxford University Press.
• Hastie, T., Tibshirani, R., and Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer.
• Haykin, S. (1999) Neural Networks: A Comprehensive Foundation, Prentice Hall.
• Smith, M. (1993) Neural Networks for Statistical Modeling, Van Nostrand Reinhold.

Last edited by simboden; 04-13-2010
04-13-2010
Looks good. I think that the simpler examples will be a particularly important part of this, since many neural net treatments are long on formalism and short on practical examples.
04-13-2010
Are you also going to discuss the downsides of neural networks? I've always wanted to learn about NNs, but every time I bring them up, someone gets a look on his or her face. Apparently it's difficult to deconstruct them, to figure out why they're making a given decision.
04-16-2010
I can talk about the downsides. That would be a great addition actually. Thanks
04-17-2010
Starting off with a perceptron example would also be beneficial. One can show how it fails to handle problems that are not linearly separable (the XOR case) and how the generalization to a multi-layer perceptron helps (universal approximation theorem).
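
A rough sketch of what that demo could look like in plain Python. The function names are made up for illustration, and the weights of the two-layer network are set by hand (an OR unit and an AND unit feeding one output unit) rather than learned, so this only shows that a hidden layer can represent XOR, not how back-propagation would find it.

```python
def step(z):
    """Hard-threshold activation: fire (1) when the total input exceeds zero."""
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=100, learning_rate=1.0):
    """Classic perceptron rule for a single unit with two inputs and a bias."""
    w1 = w2 = bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + bias)
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias


def accuracy(predict, samples):
    """Fraction of samples for which predict(x1, x2) matches the target."""
    return sum(predict(x1, x2) == t for (x1, x2), t in samples) / len(samples)


def xor_network(x1, x2):
    """Two hidden threshold units with hand-picked weights, then one output unit."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # OR but not AND, which is XOR


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_DATA = [(x, x[0] & x[1]) for x in INPUTS]
XOR_DATA = [(x, x[0] ^ x[1]) for x in INPUTS]

# AND is linearly separable, so the single perceptron learns it.
a1, a2, a_bias = train_perceptron(AND_DATA)
and_accuracy = accuracy(lambda x1, x2: step(a1 * x1 + a2 * x2 + a_bias), AND_DATA)

# XOR is not linearly separable, so no single perceptron gets all four cases right.
p1, p2, p_bias = train_perceptron(XOR_DATA)
xor_perceptron_accuracy = accuracy(lambda x1, x2: step(p1 * x1 + p2 * x2 + p_bias), XOR_DATA)

xor_network_accuracy = accuracy(xor_network, XOR_DATA)
print(and_accuracy, xor_perceptron_accuracy, xor_network_accuracy)
```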

One can also discuss different squashing functions at the neurons and their impacts.
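
For instance, a small comparison along these lines might work. It is a sketch only: the choice of sigmoid, tanh, and a hard threshold as the three functions, and the sample points, are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Logistic squashing function with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid; it peaks at 0.25 when z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_grad(z):
    """Derivative of tanh; outputs of tanh lie in (-1, 1) and are centered on zero."""
    return 1.0 - math.tanh(z) ** 2


def hard_threshold(z):
    """Step function: its derivative is zero wherever it is defined, so gradient-based training cannot use it."""
    return 1.0 if z > 0 else 0.0


for z in (-5.0, -0.1, 0.0, 0.1, 5.0):
    print(z, sigmoid(z), sigmoid_grad(z), math.tanh(z), tanh_grad(z), hard_threshold(z))
```

Near zero the sigmoid is close to the straight line 0.5 + z/4, which is why small weights make the whole network behave almost linearly. For large |z| both smooth functions saturate and their derivatives shrink toward zero, which slows down learning.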
04-19-2010
Some examples can be found here:
Neural Network Examples and Demonstrations
Pattern Recognition - an example

Perhaps it would be helpful to include some discussion on the applications of neural networks.

Also, this might be useful:
"What are artificial neural networks?"
Nat Biotechnol. 2008 Feb;26(2):195-197
http://www.nature.com/nbt/journal/v2...l/nbt1386.html
05-03-2010
That's a great idea, Aayush. That would definitely be helpful in understanding hidden layers.

Jonathan - That primer is awesome. Definitely the most straightforward explanation of Neural Networks I've read. Thanks!!
05-04-2010
Final Project

Attached is my final slide presentation on Neural Networks.

I'll try again later.

Still not working. I've uploaded it here

Last edited by simboden; 05-04-2010
05-07-2010
Nice distillation of the basic issues. I found myself wishing for more quantitative examples, equations, etc.

