OpenCL First Steps

There is increasing noise about GPGPU computing and how much faster it is than the CPU (even a parallel one). In case you haven’t heard about it, GPGPU means using the computer’s graphics card(s) for general-purpose computations. The key to the performance lies in the parallel architecture of these devices. From what I read, an average graphics card has 64 parallel units, but they are not as versatile as CPU cores, of which a typical PC has 4 these days. That means that if the task is well suited, the GPU can boost performance significantly, but if not, it’s nothing more than a lot of wasted work.

So I wanted to see for myself. To get started I read the book “OpenCL Programming Guide”. It gave a good overview. But now it was time to give it a try.

The first step was to get the libraries and drivers installed. As always, the first place I looked was the Ubuntu repositories, but that was kind of a bummer. So I went duckduckgoing (that’s like googling, but with DuckDuckGo), and I found some descriptions of how to install the drivers here and here. They both suggest that for all three vendors (NVIDIA, ATI, Intel) you have to use the proprietary installer. I hate this. Debian and its derivatives have such a wonderful package system that having to use such an installer makes you feel like you are on Windows. So I looked further, and indeed:

$ apt-file search "CL/cl.h"
 nvidia-current-dev: /usr/include/nvidia-current/CL/cl.h
 nvidia-current-updates-dev: /usr/include/nvidia-current-updates/CL/cl.h
$ sudo apt-get install nvidia-current-dev

Unfortunately that is only for NVIDIA. ATI is in Debian experimental, so I hope it will appear in Ubuntu someday as well. Apparently there is an RPM from Intel that can be converted to a DEB with alien, but that has some caveats. For now, NVIDIA is enough to get started. When I realized that the OpenCL C++ wrapper is not in the repository, I thought about packaging it myself. But then I discovered that someone with much more experience is already on the task. So I just copied the few files into my project for the moment. Then, just before I started to write my own, I found a FindOpenCL.cmake file. I only had to modify it slightly so that it finds the NVIDIA files I had installed previously.

The first example I tried was “vector addition in C”. I then converted it to C++ and later modified it to better match the example in the book. In the process I also added error handling to the compile stage, only to find out that my graphics card doesn’t support double-precision floating-point numbers (which are the standard at my day job). Then I wanted to compare the performance of the GPU and the CPU. But in the first iteration, anything with buffers that were not too big took barely measurable time. So I had to run everything many times to get more accurate numbers. The code is at:  and here are some measurements from my computer:

VectorSize  CPU AMD      GeForce 9800 GT  CPU Intel i7 mobile  Intel 4000 mobile GPU
10          2.48609e-06  0.000558952      2.618e-07            3.7654e-05
40          3.13896e-06  0.000570314      6.29196e-07          4.32716e-05
160         7.86075e-06  0.00058319       1.76039e-06          3.11682e-05
640         2.53343e-05  0.000502504      6.41616e-06          3.1311e-05
2560        8.38907e-05  0.000536779      2.66609e-05          4.39072e-05
10240       0.000347115  0.000918079      8.27613e-05          7.17255e-05
40960       0.00180239   0.00181058       0.000391192          9.42563e-05
163840      0.00682403   0.00560538       0.000714286          0.000424242
655360      0.0283929    0.0221429        0.00623318           0.00228972
2621440     0.114286     0.07875          0.0238596            0.00796875

Sure, these numbers don’t say much, as adding two numbers is not much of a computation. But they already show that the overhead of copying the buffers back and forth can be amortized with big enough tasks. And the framework I built can be useful for comparing the timings of more elaborate computations.

The next step will be to pick an existing algorithm and see if I can make it faster on the GPU.

Update Dec 2013:  added measurements for my new Dell XPS13DE ultrabook


