{"id":380,"date":"2012-03-01T21:13:04","date_gmt":"2012-03-01T19:13:04","guid":{"rendered":"http:\/\/blog.ulrichard.ch\/?p=380"},"modified":"2012-03-01T21:13:04","modified_gmt":"2012-03-01T19:13:04","slug":"opencl-first-steps","status":"publish","type":"post","link":"https:\/\/ulrichard.ch\/blog\/?p=380","title":{"rendered":"OpenCL First Steps"},"content":{"rendered":"<p>There is increasing noise about <a href=\"http:\/\/gpgpu.org\/\">GPGPU<\/a> computing and how much faster it is than computing on the CPU, even a parallel one. If you haven&#8217;t heard about it, GPGPU is about using the computer&#8217;s graphics card(s) to do general-purpose computations. The key to the performance lies in the parallel architecture of these devices. From what I read, an average graphics card has 64 parallel units, but they are not as versatile as the cores of a CPU, of which a typical PC has 4 these days. That means that if the task is well suited, the GPU can boost performance significantly, but if not, it&#8217;s nothing more than a lot of wasted work.<\/p>\n<p>So I wanted to see for myself. To get started, I read the book &#8220;<a href=\"http:\/\/www.amazon.de\/OpenCL-Programming-OpenGL-Aaftab-Munshi\/dp\/0321749642\/ref=sr_1_1?ie=UTF8&amp;qid=1330624451&amp;sr=8-1\">OpenCL Programming Guide<\/a>&#8221;. It gave a good overview. But now it was time to give it a try.<\/p>\n<p><!--more-->The first step was to get the libraries and drivers installed. As always, the first place I looked was the Ubuntu repositories, but that was kind of a <a href=\"http:\/\/blog.ulrichard.ch\/?p=305\">bummer<\/a>. So I went duckduckgoing (that&#8217;s like googling, but with <a href=\"http:\/\/duckduckgo.com\/about.html\">DuckDuckGo<\/a>), and I found some descriptions of how to install the drivers <a href=\"http:\/\/www.thebigblob.com\/getting-started-with-opencl-and-gpu-computing\/\">here<\/a> and <a href=\"http:\/\/www.streamcomputing.eu\/blog\/2011-06-24\/install-opencl-on-debianubuntu-orderly\/\">here<\/a>. 
They both suggest that for all three vendors (NVIDIA, ATI, Intel), you&#8217;d have to use the proprietary installer. I hate this. Debian and its derivatives have such a wonderful package system that having to use such an installer makes you feel like you are on Windows. So I looked further, and indeed:<\/p>\n<pre class=\"brush: bash; gutter: false; first-line: 1\">$ apt-file search \"CL\/cl.h\"\n nvidia-current-dev: \/usr\/include\/nvidia-current\/CL\/cl.h\n nvidia-current-updates-dev: \/usr\/include\/nvidia-current-updates\/CL\/cl.h\n$ sudo apt-get install nvidia-current-dev<\/pre>\n<p>Unfortunately, that is only for NVIDIA. <a href=\"http:\/\/wiki.debian.org\/ATIStream\">ATI is in Debian experimental<\/a>, so I hope it will appear in Ubuntu someday as well. Apparently there is an RPM from Intel that can be converted to a DEB with alien, but that has some caveats. For now, NVIDIA is enough to get started. When I realized that the OpenCL C++ wrapper is not in the repository, I thought about packaging it myself. But then I discovered that a guy with much more experience <a href=\"http:\/\/www.streamcomputing.eu\/blog\/2011-06-24\/install-opencl-on-debianubuntu-orderly\/\">is on the task already<\/a>. So I just copied the few files into my project for the moment. Then I found a <a href=\"http:\/\/code.google.com\/p\/opencl-book-samples\/source\/browse\/trunk\/cmake\/FindOpenCL.cmake?r=14\">FindOpenCL.cmake<\/a> file just before I started to write my own. I only had to modify it slightly to find the NVIDIA files that I had installed previously.<\/p>\n<p>The first example I tried was &#8220;<a href=\"http:\/\/www.thebigblob.com\/getting-started-with-opencl-and-gpu-computing\/\">vector addition in C<\/a>&#8221;. Then I converted it to C++ and later modified it to better match the example in the book. 
In the process I also added error handling to the compilation stage, only to find out that my graphics card doesn&#8217;t support double-precision floating-point numbers (which are the standard at my day job). Then I wanted to compare the performance of GPU and CPU. But in the first iteration, the run time with buffers that were not too big was barely measurable. So I had to run everything many times to get more accurate numbers. The code is at: <a href=\"https:\/\/github.com\/ulrichard\/ai-class-NLP\/tree\/master\/OpenCL\/FirstTry\">https:\/\/github.com\/ulrichard\/ai-class-NLP\/tree\/master\/OpenCL\/FirstTry<\/a> and here are some measurements from my computers:<\/p>\n<table border=\"1\">\n<tbody>\n<tr>\n<td><strong>Vector size<\/strong><\/td>\n<td><strong>CPU AMD<\/strong><\/td>\n<td><strong>GeForce 9800 GT<\/strong><\/td>\n<td><strong>CPU Intel i7 mobile<\/strong><\/td>\n<td><strong>Intel 4000 mobile GPU<\/strong><\/td>\n<\/tr>\n<tr>\n<td>10<\/td>\n<td>2.48609e-06<\/td>\n<td>0.000558952<\/td>\n<td>2.618e-07<\/td>\n<td>3.7654e-05<\/td>\n<\/tr>\n<tr>\n<td>40<\/td>\n<td>3.13896e-06<\/td>\n<td>0.000570314<\/td>\n<td>6.29196e-07<\/td>\n<td>4.32716e-05<\/td>\n<\/tr>\n<tr>\n<td>160<\/td>\n<td>7.86075e-06<\/td>\n<td>0.00058319<\/td>\n<td>1.76039e-06<\/td>\n<td>3.11682e-05<\/td>\n<\/tr>\n<tr>\n<td>640<\/td>\n<td>2.53343e-05<\/td>\n<td>0.000502504<\/td>\n<td>6.41616e-06<\/td>\n<td>3.1311e-05<\/td>\n<\/tr>\n<tr>\n<td>2560<\/td>\n<td>8.38907e-05<\/td>\n<td>0.000536779<\/td>\n<td>2.66609e-05<\/td>\n<td>4.39072e-05<\/td>\n<\/tr>\n<tr>\n<td>10240<\/td>\n<td>0.000347115<\/td>\n<td>0.000918079<\/td>\n<td>8.27613e-05<\/td>\n<td>7.17255e-05<\/td>\n<\/tr>\n<tr>\n<td>40960<\/td>\n<td>0.00180239<\/td>\n<td>0.00181058<\/td>\n<td>0.000391192<\/td>\n<td>9.42563e-05<\/td>\n<\/tr>\n<tr>\n<td>163840<\/td>\n<td>0.00682403<\/td>\n<td>0.00560538<\/td>\n<td>0.000714286<\/td>\n<td>0.000424242<\/td>\n<\/tr>\n<tr>\n<td>655360<\/td>\n<td>0.0283929<\/td>\n<td>0.0221429<\/td>\n<td>0.00623318<\/td>\n<td>0.00228972<\/td>\n<\/tr>\n<tr>\n<td>2621440<\/td>\n<td>0.114286<\/td>\n<td>0.07875<\/td>\n<td>0.0238596<\/td>\n<td>0.00796875<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Sure, these numbers don&#8217;t say much, as adding two numbers is not much of a computation. But they already show that the overhead of copying the buffers back and forth can be amortized with big enough tasks. And the framework I built can become useful for comparing the timings of more elaborate computations.<\/p>\n<p>The next step would be to pick an existing algorithm and see if I can make it faster on the GPU.<\/p>\n<p>Update Dec 2013: added measurements for my new <a href=\"http:\/\/www.dell.com\/ch\/unternehmen\/p\/xps-13-linux\/pd?refid=xps-13-linux\">Dell XPS13DE ultrabook<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>There is increasing noise about GPGPU computing and how much faster it is than computing on the CPU, even a parallel one. If you haven&#8217;t heard about it, GPGPU is about using the computer&#8217;s graphics card(s) to do general-purpose computations. The key to the performance lies in the parallel architecture of these devices. 
From what I read, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,7,1,10],"tags":[43,44,68,98,135,160,230],"class_list":["post-380","post","type-post","status-publish","format-standard","hentry","category-projects","category-software","category-uncategorized","category-work","tag-c","tag-cad","tag-debian","tag-gpgpu","tag-linux","tag-opencl","tag-ubuntu"],"_links":{"self":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts\/380","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=380"}],"version-history":[{"count":0,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=\/wp\/v2\/posts\/380\/revisions"}],"wp:attachment":[{"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ulrichard.ch\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}