It has been a while since I completed the Coursera class “Heterogeneous Parallel Programming”. It was mainly concerned with CUDA, Nvidia’s GPGPU framework. GPGPU is about running general-purpose computations on the graphics card. The class also briefly covered OpenCL, OpenACC, C++ AMP and MPI.

In the programming assignments, we juggled a lot with low-level details such as distributing the workload across thread blocks, something I had barely needed to care about when using OpenCL. After seeing CUDA and OpenCL, it came as a small surprise that C++ AMP really is a more convenient programming model, and not just a C++ compiler for the graphics card. Let’s hope that it gets ported to other platforms soon.

The most eye-opening revelation for me was that it is possible to parallelize prefix sum computation. When I was first presented with the problem, I thought it was a showcase for serial execution. But apparently it’s not. Making it parallel is a two-step process. First, split the input into a number of blocks and compute the sum at the boundary of each one using something like a tree structure (in parallel). Once you have that, it’s more obvious how to parallelize the rest: each block only needs the combined total of the blocks before it as an offset, so all blocks can be finished independently.
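
To make the two steps concrete, here is a minimal sketch in Python rather than CUDA, so it runs sequentially. It is not the code from the class; the names `tree_sum` and `blocked_prefix_sum` and the default block size of 4 are made up for illustration. The point is only that the per-block work in each step does not depend on the other blocks, which is what would let a GPU run it in parallel.

```python
from itertools import accumulate


def tree_sum(values):
    """Pairwise (tree-shaped) reduction; the additions within one level are independent."""
    level = list(values)
    while len(level) > 1:
        # Combine neighbours; an odd leftover element is carried up unchanged.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0] if level else 0


def blocked_prefix_sum(data, block_size=4):
    """Inclusive prefix sum computed block by block (hypothetical helper)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    # Step 1: one total per block; no block needs anything from another block.
    totals = [tree_sum(block) for block in blocks]
    # Offset of a block = sum of all blocks before it (exclusive scan of the totals).
    offsets = [0] + list(accumulate(totals))[:-1]
    # Step 2: each block scans locally and adds its offset, again independently.
    result = []
    for block, offset in zip(blocks, offsets):
        result.extend(offset + s for s in accumulate(block))
    return result


print(blocked_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
# prints [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```

The scan over the block totals is still serial here. With few blocks that is cheap, and with many blocks the same trick could presumably be applied to the totals again.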

In all the great online classes I attended over the last year, there was one topic missing. Finally I found an offering for a Computer Graphics class. After all, that’s the field I’ve been working in for the last five and a half years. The class is from Berkeley and is offered on edX. It’s the first class I’m taking there, and the style of the class is comparable to Coursera and Udacity.

The first part of the class was concerned with OpenGL, and we implemented an interactive scene viewer. Although I hadn’t worked directly with regular OpenGL before, only with WebGL, which is based on OpenGL ES, it was mostly repetition. Nonetheless, it was good training for working with homogeneous coordinates and matrices with different orderings. For grading, we had to produce 12 screenshots of the same scene with different transformations. Once it was implemented, I only had to change the order of some transformations to get all images correct.
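
Why the order of transformations matters is easy to see in a small example. The following is a sketch in plain Python, not the viewer from the class: the helpers `translate`, `rotate_z`, `matmul`, `apply` and `column_major` are hypothetical, and it assumes the column-vector convention, where in `T * R` the rotation `R` is applied to the point first.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists (row by row)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translate(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def rotate_z(degrees):
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def apply(m, point):
    """Lift a 3D point to homogeneous coordinates, transform it, divide by w."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(c / out[3] for c in out[:3])


def column_major(m):
    """Flatten column by column, the memory layout OpenGL uses for matrices."""
    return [m[r][c] for c in range(4) for r in range(4)]


T = translate(1, 0, 0)
R = rotate_z(90)
p = (1.0, 0.0, 0.0)

# Rotate first, then translate: (1,0,0) -> (0,1,0) -> approximately (1,1,0)
rotate_then_translate = apply(matmul(T, R), p)
# Translate first, then rotate: (1,0,0) -> (2,0,0) -> approximately (0,2,0)
translate_then_rotate = apply(matmul(R, T), p)
```

The same two matrices give two different results, which is the kind of mistake that produces a plausible but wrong screenshot. `column_major` hints at the second source of confusion: in the column-major layout the translation ends up in the last four entries of the flat array, not in every fourth one as in a row-by-row layout.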

The second part was concerned with ray tracing. Even though I was familiar with the basic concept, working with it was new to me. And in the class, we had to build a ray tracer from scratch. The theory sounded straightforward. But somehow I was not so lucky in implementing it. In every new part I made some silly mistake. My development was not exemplary test-driven, but I wrote unit tests for every key part that I wanted to verify. With that in place I could usually find and correct the problem in time. For grading, we had to produce seven images.

There is an increasing buzz about GPGPU computing and how much faster it is than the CPU (even a parallel one). If you haven’t heard about it, GPGPU is about using the computer’s graphics card(s) to do general-purpose computations. The key to the performance lies in the parallel architecture of these devices. From what I read, an average graphics card has 64 parallel units, but they are not as versatile as the cores of a CPU, of which a typical PC these days has 4. That means if the task is well suited, it can boost performance significantly, but if not, it’s nothing more than a lot of wasted work.

So I wanted to see for myself. To get started, I read the book “OpenCL Programming Guide”. It gave a good overview. But now it was time to give it a try.

