Do you like the Mandelbrot set? Do you like animations? Want both together, right now? Here it comes. (This was supposed to be an exciting announcement, never mind...)
A small demonstration of CUDA with an OpenGL pixel buffer object, put together in a Qt4 application.
Source: VS 2010 project solution "qt4_mandelbrot"
16.04.12: This example is "deprecated". I have reworked and bugfixed it for the Linux tutorial, see here. I will release the VS2010 edition soon, but you can copy the new source from the Linux tutorial.
19.10.11: Bugfix: cudaGLSetGLDevice(0) must be called only in initGL()
(For requirements, see the previous tutorial on Qt and CUDA in VS2010.)
In my previous tutorial on Qt, CUDA and VS2010 I showed you how to integrate all the libraries. You were able to run an empty CUDA kernel in a Qt application, and it already had an OpenGL widget (still just an empty black screen). Now you want to create a 2D image whose pixels are processed right on the GPU. OpenGL is used only to present the result.
Before I started, I had the following questions:
- How do I connect .cu source with .cpp source?
(i.e. calling kernel functions from external classes)
- How do I compute and paint the result on the GPU side?
(without spending CPU cycles on the host, e.g. for memory copies)
1. Declare the kernel launcher in the .cpp source with C linkage:
extern "C" void launch_kernel(uchar4*, unsigned int, unsigned int, int);
(You cannot include your .cu files, since they are not plain C files. The CUDA-side implementation is linked in after compilation, so the launch_kernel() declaration finds its definition at link time.)
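To make the linking idea concrete, here is a minimal sketch in plain C++ that compiles without CUDA. It is not the project's code: the uchar4 struct is a stand-in for the CUDA built-in type, the parameter names (pixels, width, height, time) are my assumptions about what the four arguments mean, and the function body is a CPU placeholder for the definition that would really live in the .cu file. renderFrame() is a hypothetical helper showing the call from the .cpp side.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for CUDA's built-in uchar4 (four unsigned bytes). In the real
// project this type comes from the CUDA headers, not from your own code.
struct uchar4 { unsigned char x, y, z, w; };

// Declaration as the .cpp side sees it. C linkage means the linker looks
// for the unmangled symbol name "launch_kernel".
// Parameter names are assumptions; the original only lists the types.
extern "C" void launch_kernel(uchar4* pixels, unsigned int width,
                              unsigned int height, int time);

// CPU placeholder for the definition that would live in the .cu file and
// start the CUDA kernel there. It just writes a simple moving gradient.
extern "C" void launch_kernel(uchar4* pixels, unsigned int width,
                              unsigned int height, int time) {
    assert(pixels != nullptr);
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            uchar4& p = pixels[static_cast<std::size_t>(y) * width + x];
            p.x = static_cast<unsigned char>((x + time) & 0xFF);
            p.y = 0;
            p.z = 0;
            p.w = 255;
        }
    }
}

// Hypothetical caller on the .cpp side: it only needs the declaration above.
std::vector<uchar4> renderFrame(unsigned int width, unsigned int height,
                                int time) {
    std::vector<uchar4> image(static_cast<std::size_t>(width) * height);
    launch_kernel(image.data(), width, height, time);
    return image;
}
```

In the real project the buffer passed in would be the mapped pixel buffer on the GPU rather than a std::vector on the host; the vector is only there so the sketch can run on its own.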
2. I assume you know how to create and bind textures in OpenGL. You may have heard of pixel buffer objects too; they are well explained on this site. We allocate our image space on the GPU side by creating a pixel buffer object. CUDA uses this object for pixel manipulation (on the GPU side as well, of course). Our image is then bound as a texture to an OpenGL quad. Here is an encouraging quote from one of my references [1]:
"As we will see, CUDA and OpenGL interoperability is very fast! The reason (aside from the speed of CUDA) is that CUDA maps OpenGL buffer(s) into the CUDA memory space with a call to cudaGLMapBufferObject(). On a single GPU system, no data movement is required! Once provided with a pointer, CUDA programmers are then free to exploit their knowledge of CUDA to write fast and efficient kernels that operate on the mapped OpenGL buffers."
But there is one small restriction you should know about: access to the pixel buffer is exclusive. Only one of CUDA or OpenGL can access the pixel buffer at any given time.
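The exclusive access rule can be modelled with a small scope guard. This is only a sketch in plain C++ with made-up names (SharedPixelBuffer, CudaMapGuard, canDrawWithOpenGL); it does not call CUDA or OpenGL at all. In a real program the constructor is where you would map the buffer for CUDA and the destructor is where you would unmap it again, so that OpenGL never draws while CUDA still holds the buffer.

```cpp
#include <cassert>

// Hypothetical model of a pixel buffer shared between CUDA and OpenGL.
// These names are illustrative and not part of the project source.
enum class Owner { OpenGL, Cuda };

struct SharedPixelBuffer {
    Owner owner = Owner::OpenGL;  // OpenGL holds the buffer by default
};

// Scope guard: hands the buffer to CUDA on construction (where a real
// program would map it) and gives it back to OpenGL on destruction
// (where a real program would unmap it).
class CudaMapGuard {
public:
    explicit CudaMapGuard(SharedPixelBuffer& buffer) : buffer_(buffer) {
        assert(buffer_.owner == Owner::OpenGL && "buffer is already mapped");
        buffer_.owner = Owner::Cuda;
    }
    ~CudaMapGuard() { buffer_.owner = Owner::OpenGL; }

    CudaMapGuard(const CudaMapGuard&) = delete;
    CudaMapGuard& operator=(const CudaMapGuard&) = delete;

private:
    SharedPixelBuffer& buffer_;
};

// OpenGL may only use the buffer while CUDA does not hold it.
bool canDrawWithOpenGL(const SharedPixelBuffer& buffer) {
    return buffer.owner == Owner::OpenGL;
}
```

The point of the guard is ordering: run the kernel inside the guarded scope, leave the scope, and only then let OpenGL bind the buffer and draw the textured quad.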
I also recommend the presentation [3] about CUDA and OpenGL, especially the part starting on page 22 (Steps To Draw An Image From CUDA). It shows how to work with the pixel buffer in OpenGL and CUDA.
In our demonstration project we use Qt (QGLBuffer) to handle the pixel buffer, so we don't have to deal with OpenGL extensions (for example, using GLEW to resolve procedure addresses). The pixel buffer object is created in simplePBO.cpp::createPBO().
In simplePBO.cpp::initCuda() the first CUDA device is chosen ( cudaGLSetGLDevice(0) ). You will have to change this yourself if it doesn't fit your system. You can check your CUDA devices with this little exe I wrote: CUDA Device Checker (prints to the console). A more advanced GUI-based checker, CUDA-Z, is available here.
OK, I do not want to explain every method here. Just grab the code, read through the comments, and consult the references [1; 3].
The last thing I want to mention is the image size. Because of the thread dimensions (16 per block), the image size has to be a multiple of 16, so don't be confused by that. You could also set a fixed image size such as 512 or 528 (see simplePBO.cpp::initCuda()).
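If you want to accept arbitrary widget sizes, one option is to round the requested size up to the next multiple of 16. The following is a minimal sketch of that arithmetic in plain C++; the names kBlockDim, roundUpToBlockDim and blocksPerDimension are my own and do not appear in the project source.

```cpp
#include <cassert>

// Threads per block along one dimension, as stated in the text.
constexpr unsigned int kBlockDim = 16;

// Rounds size up to the next multiple of kBlockDim.
// A size that is already a multiple is returned unchanged.
constexpr unsigned int roundUpToBlockDim(unsigned int size) {
    return ((size + kBlockDim - 1) / kBlockDim) * kBlockDim;
}

// Number of blocks needed along one dimension for an already padded size.
unsigned int blocksPerDimension(unsigned int paddedSize) {
    assert(paddedSize % kBlockDim == 0 && "size must be a multiple of 16");
    return paddedSize / kBlockDim;
}
```

For example, a requested size of 500 becomes 512 (32 blocks per dimension), and 513 becomes 528 (33 blocks per dimension).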
 [1] CUDA, Supercomputing for the Masses, http://drdobbs.com/cpp/222600097
 [2] CUDA By Example, An Introduction To General-Purpose GPU Programming. The book's source code can be downloaded here
 [3] What Every CUDA Programmer Should Know About OpenGL, PDF version