# Prapti Neupane's Term Project Page

Here is the link to the Variational Bayes lecture pdf.

Variational Inference

I will turn in a PowerPoint lecture of about 12-15 slides. The lecture will develop the variational method for approximating the posterior distribution of the parameters theoretically. Sketched below is an outline of the content of the lecture.

1. Motivation

Present statistical mechanics as a motivation and as an area where variational inference could potentially be used

2. Variational Bayes

Explain why the process is called variational Bayes

3. The problem

X denotes the observed variables; Z denotes the latent variables and parameters. We are interested in the posterior probability distribution of a parameter vector w given data D. Traditional model-fitting approaches optimize the parameter vector to find the mode of the posterior distribution. Variational Bayes can instead be used to find good approximations to the full posterior probability distribution of the parameters, p(Z|X), as well as to the marginal probability of the data, p(X).

4. The Variational Bayes theorem

Let p(Z|X) be the posterior distribution of the multivariate parameters Z, and let q(Z) be an approximating distribution restricted to the set of separable (factorized) distributions. Then the minimum of the KL divergence is reached when each factor q_j(Z_j) is proportional to exp(E[ln p(X,Z)]), where the expectation is taken with respect to all factors q_i with i not equal to j.
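In symbols, the optimal factor can be written as follows (this is the standard result, as derived for example in Bishop, Ch. 10; E_{i≠j} denotes the expectation over all factors q_i with i ≠ j):

```latex
\ln q_j^{\ast}(Z_j) = \mathbb{E}_{i \neq j}\left[ \ln p(X, Z) \right] + \mathrm{const},
\qquad \text{equivalently} \qquad
q_j^{\ast}(Z_j) = \frac{\exp\left( \mathbb{E}_{i \neq j}\left[ \ln p(X, Z) \right] \right)}
                       {\int \exp\left( \mathbb{E}_{i \neq j}\left[ \ln p(X, Z) \right] \right) dZ_j}.
```

The additive constant is fixed by normalization, which is what the second form makes explicit.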

5. Decomposition of the log marginal probability

ln p(X) = L(q) + KL(q||p), where

- q is the distribution we will use to approximate p,
- KL(q||p) is the relative entropy (Kullback-Leibler divergence, or KL divergence) between q(Z) and the posterior p(Z|X),
- L(q) is a lower bound on ln p(X).
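The two terms have the following standard integral forms (again following Bishop, Ch. 10); adding them and using p(X,Z) = p(Z|X) p(X) recovers ln p(X):

```latex
\mathcal{L}(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ,
\qquad
\mathrm{KL}(q \,\|\, p) = -\int q(Z) \ln \frac{p(Z \mid X)}{q(Z)} \, dZ.
```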

6. Getting the optimal approximation

The KL divergence is non-negative, so L(q) is closest to ln p(X) when the KL divergence is minimized. The minimum (zero) is attained when q(Z) equals the posterior distribution. However, the true posterior is often intractable, so we restrict q(Z) to a family of tractable distributions and seek the member of that family for which the KL divergence is minimized.
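As a quick numerical illustration of the non-negativity (a side check, not part of the outline), the KL divergence between two univariate Gaussians has a well-known closed form; the parameter values below are arbitrary:

```python
import math

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# KL is exactly zero when the two distributions coincide ...
print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0
# ... and strictly positive otherwise.
print(kl_gauss(0.0, 1.0, 2.0, 1.5))
```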

7. Commonly used family of distributions: factorized distributions

Partition the elements of Z into disjoint groups Z_i, where i = 1, ..., M, and restrict q to the family of separable distributions, so that q is the product of q_i(Z_i) over all i. Then make a variational optimization of L(q) with respect to all of the distributions q_i(Z_i), optimizing with respect to each factor in turn. This yields the Variational Bayes theorem.

8. Application of Variational Bayes for the univariate Gaussian

Show that if we choose a conjugate prior for mu as well as for sigma, the best approximation to the posterior distribution of the mean (mu) is another Gaussian, and that for the precision (1/sigma^2) it is a Gamma distribution. Note that the approximate distributions take the form of the conjugate priors naturally, from the structure of the likelihood function, without our having restricted the family of approximate distributions to these functional forms. Each approximate distribution depends on moments evaluated with respect to the other distribution, so we take an iterative approach reminiscent of the EM algorithm.
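For reference, with the conjugate model x_n ~ N(mu, tau^{-1}), mu | tau ~ N(mu_0, (lambda_0 tau)^{-1}), tau ~ Gamma(a_0, b_0) (the setup of Bishop, Sec. 10.1.3, whose notation is followed here), the optimal factors come out as:

```latex
q^{\ast}(\mu) = \mathcal{N}\!\left(\mu \mid \mu_N, \lambda_N^{-1}\right),
\quad
\mu_N = \frac{\lambda_0 \mu_0 + N\bar{x}}{\lambda_0 + N},
\quad
\lambda_N = (\lambda_0 + N)\,\mathbb{E}[\tau],
```

```latex
q^{\ast}(\tau) = \mathrm{Gamma}\!\left(\tau \mid a_N, b_N\right),
\quad
a_N = a_0 + \frac{N+1}{2},
\quad
b_N = b_0 + \frac{1}{2}\,\mathbb{E}_{\mu}\!\left[ \sum_{n=1}^{N} (x_n - \mu)^2 + \lambda_0 (\mu - \mu_0)^2 \right].
```

The coupling is visible here: lambda_N needs E[tau] from q(tau), and b_N needs the first and second moments of mu from q(mu), which is what forces the iteration.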

9. Iterative VB (IVB) algorithm

Make an initial guess for the expected value of one parameter, say sigma. Use this to compute the distribution q(mu). Using this revised distribution, compute the first and second moments of mu. Use these values to recompute the distribution q(sigma), and iterate.
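The loop above can be sketched in code. This is a minimal illustration, assuming the conjugate model of Bishop Sec. 10.1.3 (likelihood N(mu, tau^{-1}), prior mu | tau ~ N(mu0, (lambda0 tau)^{-1}), tau ~ Gamma(a0, b0)); the function and variable names and the hyperparameter values are my own choices:

```python
import random

def iterative_vb(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, iters=50):
    """Iterative VB for a univariate Gaussian with unknown mean and precision.

    Returns the parameters of q(mu) = N(mu_n, 1/lam_n) and
    q(tau) = Gamma(a_n, b_n) after `iters` coordinate updates.
    """
    n = len(x)
    xbar = sum(x) / n
    e_tau = 1.0                   # initial guess for E[tau]
    a_n = a0 + (n + 1) / 2.0      # fixed by the model; never changes in the loop
    for _ in range(iters):
        # Update q(mu): Gaussian with precision lam_n and mean mu_n.
        lam_n = (lam0 + n) * e_tau
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        # Moments of mu under q(mu): E[mu] = mu_n, E[mu^2] = mu_n^2 + 1/lam_n.
        e_mu, e_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
        # Update q(tau): Gamma(a_n, b_n); expected squares use e_mu2.
        b_n = b0 + 0.5 * (sum(xi**2 - 2 * xi * e_mu + e_mu2 for xi in x)
                          + lam0 * (e_mu2 - 2 * e_mu * mu0 + mu0**2))
        e_tau = a_n / b_n         # mean of the updated Gamma distribution
    return mu_n, lam_n, a_n, b_n

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(500)]
mu_n, lam_n, a_n, b_n = iterative_vb(data)
print(mu_n)        # posterior mean estimate, close to the true mean 5
print(a_n / b_n)   # E[tau], close to the true precision 1/4
```

Note that mu_n itself does not depend on E[tau] in this conjugate model, so the iteration is really refining the spread parameters (lam_n and b_n), which is why only a few passes are needed.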

10. Looking ahead

We could apply the variational method to obtain a solution for the Bayesian mixture of Gaussians. There is a close similarity between the variational solution for the Bayesian mixture of Gaussians and the EM algorithm for maximum likelihood: in the limit as N → infinity, the variational treatment converges to the maximum-likelihood EM algorithm for a mixture of Gaussians.

11. Bibliography

- Bishop, Christopher M. *Pattern Recognition and Machine Learning*. Springer, 2006. Chapters 1 and 10.
- MacKay, David J.C. *Information Theory, Inference, and Learning Algorithms*. Cambridge University Press, 2003. Chapter IV.
- Smidl, V., and Quinn, A. *The Variational Bayes Method in Signal Processing*. Springer-Verlag Berlin Heidelberg, 2006. Chapter 3.

Comments:

My lecture is mostly going to be just deriving the optimal approximating distribution. I don't know how much algebra I should show on the slides.

- Just noting here that we talked about this in person, and that it seems fine. I hope that you can get the volcano example (in some form, simplified or not) to work. Wpress 15:39, 21 April 2012 (CDT)