#1
For my project, I implemented Expectation Maximization (EM) for Gaussian Mixture Models with Nvidia's CUDA SDK. This lets me run the EM algorithm in a massively parallel fashion, resulting in order-of-magnitude speedups over the conventional C++ version.
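For readers new to EM, here is a minimal CPU sketch of one EM iteration for a one-dimensional, K-component mixture. It is not code from the project: the names `Gmm1D` and `emStep` are hypothetical, and the real implementation handles multi-dimensional data. It does show where the parallelism comes from. In the E-step, each point's responsibilities depend only on that point and the current parameters, so a CUDA version can hand points (or point/cluster pairs) to separate threads. The M-step's sums over points become parallel reductions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D mixture parameters (the actual project handles D dimensions).
struct Gmm1D {
    std::vector<double> weight, mean, var;
};

// Runs one EM iteration in place and returns the log-likelihood of the data
// under the parameters as they were *before* this update.
double emStep(const std::vector<double>& x, Gmm1D& g) {
    const double kPi = 3.14159265358979323846;
    const std::size_t N = x.size(), K = g.mean.size();
    std::vector<double> resp(N * K);  // responsibilities, row-major N x K
    double logLik = 0.0;

    // E-step: iterations over n are independent, so a GPU version could
    // assign one thread per point (or per point/cluster pair).
    for (std::size_t n = 0; n < N; ++n) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            const double d = x[n] - g.mean[k];
            const double p = g.weight[k] * std::exp(-0.5 * d * d / g.var[k]) /
                             std::sqrt(2.0 * kPi * g.var[k]);
            resp[n * K + k] = p;
            total += p;
        }
        assert(total > 0.0 && "point has zero density under every component");
        for (std::size_t k = 0; k < K; ++k) resp[n * K + k] /= total;
        logLik += std::log(total);
    }

    // M-step: each parameter is a weighted sum over all points, which a GPU
    // version would compute with parallel reductions.
    for (std::size_t k = 0; k < K; ++k) {
        double nk = 0.0, sx = 0.0, sxx = 0.0;
        for (std::size_t n = 0; n < N; ++n) {
            const double r = resp[n * K + k];
            nk += r;
            sx += r * x[n];
            sxx += r * x[n] * x[n];
        }
        g.weight[k] = nk / static_cast<double>(N);
        g.mean[k] = sx / nk;
        g.var[k] = sxx / nk - g.mean[k] * g.mean[k];
    }
    return logLik;
}
```

Calling `emStep` repeatedly until the returned log-likelihood stops increasing gives the usual EM loop. EM guarantees that this value never decreases from one iteration to the next, which makes it a handy sanity check when comparing a GPU port against a CPU reference.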
Links: Project Home Page | Matlab File Exchange | Report | Source Code | Bonus: Parallel IQAgent

Readme contents:

ABOUT
================================================================================
This is a parallel implementation of the Expectation Maximization algorithm for Gaussian Mixture Models, designed to run on Nvidia graphics cards that support CUDA. On my machine, it provides up to a 60x speedup. See the report available at http://andrewharp.com/gmmcuda for more information.

The interesting code is all in gpugaumixmod.h and gpugaumixmod_kernel.h. The reference CPU implementation is in gaumixmod.h. It can be integrated into any C program on a CUDA-enabled system. Additionally, Matlab integration is provided in gmm.cu.

E-mail me with any questions or comments!

COMPILING
================================================================================
You'll probably have trouble compiling as-is, since the config files are set up for my Windows Vista 64-bit machine, but underneath it's just a standard CUDA kernel, so it should be portable. A precompiled Windows 64-bit version is included. See compile.m for the command I use to compile the CUDA/MEX files.

Go here to find the toolkit that contains the files you'll need for compiling on your platform: http://developer.nvidia.com/object/matlab_cuda.html

RUNNING
================================================================================
Once compiled, start by running gmm_example in Matlab to see it in action. See experiment1, experiment2, and experiment3 for ready-to-run experiments.

-Andrew Harp

Last edited by aharp; 05-07-2009 at 11:09 AM. Reason: added Matlab File Exchange link
#2
That is pretty neat. Do you need an Nvidia card to run it? Would it be amenable to porting to MPI?
#3
Yep, you have to have one of the Nvidia cards listed here.
I don't really know anything about MPI, but discussion boards seem to indicate you could do it if you distributed the data appropriately and handled kernel synchronization yourself. Basically, MPI wouldn't know anything about CUDA directly, but each node's CPU could have its own GPU to help it handle its share of the workload.

Last edited by aharp; 05-07-2009 at 10:45 PM.
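One way to read that suggestion, sketched here as an assumption rather than a tested MPI design: give each node a shard of the data, let it (and its GPU) run the E-step locally and accumulate per-cluster sums, then add those sums across nodes. Because the M-step needs only these totals, one `MPI_Allreduce` with `MPI_SUM` per iteration would be enough communication. The code simulates that reduction on one machine with a plain `combine` function. `Gmm1D`, `Stats`, `partialStats`, `combine`, and `applyUpdate` are hypothetical names, and the model is one-dimensional for brevity.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1-D mixture; a sketch only, not the project's data layout.
struct Gmm1D {
    std::vector<double> weight, mean, var;
};

// Per-cluster sufficient statistics accumulated over one shard of data.
struct Stats {
    double count = 0.0;               // number of points seen
    std::vector<double> nk, sx, sxx;  // sum r, sum r*x, sum r*x^2 per cluster
    explicit Stats(std::size_t K) : nk(K, 0.0), sx(K, 0.0), sxx(K, 0.0) {}
};

// Local work for one node: E-step on its shard plus partial M-step sums.
// This is the part each node's GPU would accelerate in a CUDA+MPI setup.
Stats partialStats(const std::vector<double>& shard, const Gmm1D& g) {
    const double kPi = 3.14159265358979323846;
    const std::size_t K = g.mean.size();
    Stats s(K);
    std::vector<double> p(K);
    for (double x : shard) {
        double total = 0.0;
        for (std::size_t k = 0; k < K; ++k) {
            const double d = x - g.mean[k];
            p[k] = g.weight[k] * std::exp(-0.5 * d * d / g.var[k]) /
                   std::sqrt(2.0 * kPi * g.var[k]);
            total += p[k];
        }
        assert(total > 0.0);
        for (std::size_t k = 0; k < K; ++k) {
            const double r = p[k] / total;  // responsibility of cluster k
            s.nk[k] += r;
            s.sx[k] += r * x;
            s.sxx[k] += r * x * x;
        }
        s.count += 1.0;
    }
    return s;
}

// Element-wise sum of two nodes' statistics. With MPI this would be one
// MPI_Allreduce(..., MPI_SUM, ...) over a flattened buffer of these values.
Stats combine(const Stats& a, const Stats& b) {
    assert(a.nk.size() == b.nk.size());
    Stats out(a.nk.size());
    out.count = a.count + b.count;
    for (std::size_t k = 0; k < a.nk.size(); ++k) {
        out.nk[k] = a.nk[k] + b.nk[k];
        out.sx[k] = a.sx[k] + b.sx[k];
        out.sxx[k] = a.sxx[k] + b.sxx[k];
    }
    return out;
}

// Every node applies the same update to the globally reduced statistics,
// so all copies of the parameters stay identical without extra messages.
void applyUpdate(const Stats& s, Gmm1D& g) {
    for (std::size_t k = 0; k < g.mean.size(); ++k) {
        g.weight[k] = s.nk[k] / s.count;
        g.mean[k] = s.sx[k] / s.nk[k];
        g.var[k] = s.sxx[k] / s.nk[k] - g.mean[k] * g.mean[k];
    }
}
```

Splitting the data into any number of shards and combining their `Stats` yields the same update as processing all points at once, up to floating-point rounding. The appeal of this decomposition is that per-iteration traffic is a few numbers per cluster per node, no matter how many points each node holds.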
#4
I decomposed the loops into more parallel threads and blocks, which raised the speedup over the reference version from ~60x to ~150x in the best case.
Worst-case performance also roughly doubled, and the speedup climbs toward the best case more steeply as you increase either the number of dimensions or the number of clusters. The new version is available here. I've attached the original source to this post only so that a copy of my turned-in version would still be available.
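The post doesn't say exactly how the loops were split, so the following only illustrates the general technique of flattening a nested loop onto a CUDA grid. One thread per data point gives N-way parallelism. Here, each thread instead owns one (point, cluster) pair, giving N·K-way parallelism, and recovers its indices from a flat thread id t as n = t / K and k = t % K. The grid is emulated with two CPU loops so the example runs anywhere. In a real kernel, t would be `blockIdx.x * blockDim.x + threadIdx.x`. `pairwiseSqDist` is a hypothetical name, and the squared Euclidean distance stands in for the full Gaussian density.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squared distance from every point to every cluster mean, computed by an
// emulated 1-D CUDA grid in which each thread owns one (point, cluster) pair.
// x is N x D row-major, mu is K x D row-major; the result is N x K row-major.
std::vector<double> pairwiseSqDist(const std::vector<double>& x,
                                   const std::vector<double>& mu,
                                   std::size_t N, std::size_t K, std::size_t D,
                                   std::size_t blockDim) {
    assert(x.size() == N * D && mu.size() == K * D && blockDim > 0);
    const std::size_t work = N * K;                                  // one item per pair
    const std::size_t numBlocks = (work + blockDim - 1) / blockDim;  // round up
    std::vector<double> out(work, 0.0);

    // On a GPU these two loops disappear: the hardware runs the body once
    // per thread, with t = blockIdx.x * blockDim.x + threadIdx.x.
    for (std::size_t b = 0; b < numBlocks; ++b) {
        for (std::size_t tid = 0; tid < blockDim; ++tid) {
            const std::size_t t = b * blockDim + tid;
            if (t >= work) continue;  // idle threads in the last, partial block
            const std::size_t n = t / K;  // data point index
            const std::size_t k = t % K;  // cluster index
            double s = 0.0;
            for (std::size_t d = 0; d < D; ++d) {
                const double diff = x[n * D + d] - mu[k * D + d];
                s += diff * diff;
            }
            out[t] = s;  // t == n * K + k, so the layout is row-major N x K
        }
    }
    return out;
}
```

One plausible contributor to the trend described above is that, with this kind of layout, the number of independent work items grows with the number of clusters, so larger problems keep more of the GPU busy. That is a hypothesis about the general technique, not a measurement of this project.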