an ultrabook for developers

My old netbook still runs, but it shows signs of senility. I had been thinking about a replacement for a while, but as long as it still worked, I kept postponing the purchase. When I first read about Project Sputnik, I thought: this is great news, and I want one. The device that followed looked very nice, but was a little bit over my budget. Only when the value of BitCoin rose to new heights did I order a Dell XPS13 developer edition. The Dell representative told me that they don’t YET accept BitCoin for payment, but he was well aware of what it is. Apparently the device shipped from Asia. Since I didn’t know that, I waited eagerly and checked the status every day. The order was already marked as in delivery three days after I placed it, so I didn’t understand why UPS still hadn’t received the box more than two weeks later.

The device is really slick. I have had no issues so far, not even with the graphics driver. That is also why I wanted this device, which ships with Ubuntu and fully supports it. All the drivers are in the vanilla kernel. The graphics card drivers were always the culprit with my previous netbooks. Both had binary drivers and no 3D acceleration when they came out, and the situation degraded gradually. After the second OS upgrade I usually lost even 2D acceleration.

Now that I have an ultrabook with a GPU that is apparently fully supported, I wanted to see how well the GPU performs. So I grabbed my very first OpenCL program to give it a try. I was glad to see that the Intel OpenCL driver was already packaged in the Ubuntu repository, and that support for the 4400 GPU had recently been added. This situation is much better than when I started with OpenCL. But I soon realized that this GPU or its driver doesn’t support the kind of memory sharing that I used in the example, so I had to slightly rewrite the host program. No big deal. On the other hand, it supports double precision floats, which the GeForce in my workstation doesn’t. After that, I found out that this tiny ultrabook outperforms my five-year-old workstation by a big margin on both CPU and GPU, while using only a fraction of the power. Then I applied the same changes to my GPU accelerated ray tracer. The ultrabook rendered the homework image in 15 minutes, so this one was a bit slower than on the workstation.

In general, the experience with the XPS13DE is just great. Everything is so responsive, totally different from the Atom-based netbook. The only thing I would have ordered differently, if I had had a choice, is a bigger SSD. Then again, I was already lucky: if I had ordered a month earlier, it would have come with a 128GB SSD instead of the 256GB one.

The setup was about as follows:

  • OS install with smart card backed full disk encryption
  • setting up smart card authentication for ssh
  • checkout of my git home repo
  • software install with my setup script that adds ppa repositories and apt-get installs everything I need
  • checking out all source repositories (git and hg) that I usually work with and that are not already submodules of my home repo
  • integrating plasma-desktop into Unity so that I could still use the BitCoin plasmoids. The experience with this integration was not so good, so I reverted it. I will look into writing a screenlet for GNOME.
  • syncing the git repos for photos and music. They are why I would have wished for a bigger SSD.
  • syncing the BitCoin block chain

I’m grateful that the BitCoin price surge gave me the opportunity to “vote with my wallet”. Otherwise I might have ended up doing the same as last time: buying a cheaper model with a mediocre operating system that I don’t want. That would send the wrong signal and reinforce the vicious circle. At least Dell has realized that people want good hardware with good Linux support. Yes, people are willing to pay a premium for good hardware support for a free and open operating system.

GPGPU programming class

It’s already a while back that I completed the Coursera class “Heterogeneous Parallel Programming”. It was mainly concerned with CUDA, which is Nvidia’s GPGPU framework. GPGPU is about running general purpose computations on the graphics card. The class also quickly covered OpenCL, OpenACC, C++ AMP and MPI.

In the programming assignments, we juggled a lot of low level details such as distributing the work load to thread blocks, which I had hardly cared about when using OpenCL so far. After seeing CUDA and OpenCL, I was a little surprised that C++ AMP is indeed a more convenient programming model, and not just a C++ compiler for the graphics card. Let’s hope that it gets ported to other platforms soon.

The most eye opening revelation for me was that it is possible to parallelize prefix sum computation. When I was first presented with the problem, I thought it was a showcase for serial execution, because every output depends on all the inputs before it. But apparently it’s not. Making it parallel is a two step process. First split the input into a number of blocks, and compute the sum at the boundary of each block using something like a tree structure (in parallel). Once you have those boundary sums, it’s more obvious how to parallelize the rest: every block can compute its own running sums independently, starting from the boundary sum of the blocks before it.
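To make the two steps concrete, here is a minimal sketch in Python. It is not the CUDA code from the class: the function names and the block size are my own choices for illustration, and everything runs serially here. The comments mark the parts that could run in parallel on a GPU.

```python
def tree_sum(values):
    """Sum a list by pairwise (tree-shaped) reduction.

    All additions within one level are independent of each other,
    so on a GPU each level could be executed in parallel.
    """
    level = list(values)
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried to the next level
        level = nxt
    return level[0] if level else 0


def blocked_prefix_sum(data, block_size=4):
    """Inclusive prefix sum computed block by block."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    # Step 1: total of every block (blocks are independent of each other).
    totals = [tree_sum(block) for block in blocks]

    # Boundary sums: what every block has to start from.
    # This list is short, so it is simply scanned serially in this sketch.
    offsets = []
    running = 0
    for total in totals:
        offsets.append(running)
        running += total

    # Step 2: every block scans its own elements, starting at its offset
    # (again independent per block).
    result = []
    for block, offset in zip(blocks, offsets):
        acc = offset
        for value in block:
            acc += value
            result.append(acc)
    return result


if __name__ == "__main__":
    print(blocked_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A real GPU implementation would also scan inside each block with a tree pattern instead of the serial inner loop; the sketch only shows why the blocks no longer depend on each other once the boundary sums are known.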

accelerated ray tracer

In all the great online classes I attended over the last year, there was one topic missing. Finally I found an offering for a Computer Graphics class. After all, that’s the field I’ve been working in for the last five and a half years. The class is offered at edx.org and is from Berkeley. It’s the first class I’m taking from edX, and the style of the class is comparable to Coursera and Udacity.

The first part of the class was concerned with OpenGL, and we implemented an interactive scene viewer. Although I hadn’t worked directly with regular OpenGL before, only with WebGL, which is based on OpenGL ES, it was mostly repetition. Nonetheless it was good training for working with homogeneous coordinates and matrices with different orderings. For grading, we had to produce 12 screenshots of the same scene with different transformations. Once it was implemented, I only had to change the order of some transformations to get all images correct.
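Why the order of transformations matters is easy to show with a small sketch in plain Python. This is not the code from the assignment: the helper names are made up for illustration, and I assume the convention that points are column vectors multiplied from the right. It also shows one common meaning of matrix ordering, namely flattening the same matrix row by row versus column by column (OpenGL expects the latter).

```python
def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translate(tx, ty, tz):
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]


def scale(sx, sy, sz):
    return [[sx, 0, 0, 0],
            [0, sy, 0, 0],
            [0, 0, sz, 0],
            [0, 0, 0, 1]]


def apply(m, point):
    """Transform a 3D point using homogeneous coordinates (w = 1)."""
    v = [point[0], point[1], point[2], 1]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    w = out[3]
    return tuple(c / w for c in out[:3])


def to_column_major(m):
    """Flatten column by column, the memory layout OpenGL expects."""
    return [m[row][col] for col in range(4) for row in range(4)]


if __name__ == "__main__":
    p = (1, 0, 0)
    T = translate(2, 0, 0)
    S = scale(3, 3, 3)
    # With column vectors, the rightmost matrix is applied first.
    print(apply(matmul(T, S), p))   # scale first, then translate
    print(apply(matmul(S, T), p))   # translate first, then scale
    print(to_column_major(translate(2, 3, 4))[12:15])
```

Scaling first and then translating moves the point to x = 5, while translating first and then scaling moves it to x = 9. In the column-major layout, the translation ends up in elements 12 to 14 of the flat array.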

The second part was concerned with ray tracing. Even though I was familiar with the basic concept, working with it was new to me. And in the class, we had to build a ray tracer from scratch. The theory sounded straightforward. But somehow I was not so lucky in implementing it. In every new part I made some silly mistake. My development process was not exemplary test driven, but I wrote unit tests for every key part that I wanted to verify. With those in place I could usually find and correct the problem in time. For grading, we had to produce seven images.
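To give an idea of what such a key part and its unit test can look like, here is a small sketch in Python of a ray-sphere intersection. It is not the code from my ray tracer: the function name, the tuple based vectors and the epsilon value are assumptions made for this illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def intersect_sphere(origin, direction, center, radius, eps=1e-9):
    """Return the smallest ray parameter t > eps where
    origin + t * direction hits the sphere, or None if there is no hit.

    Substituting the ray into |p - center|^2 = radius^2 gives a
    quadratic a*t^2 + b*t + c = 0 in t.
    """
    oc = tuple(o - c for o, c in zip(origin, center))
    a = dot(direction, direction)
    b = 2 * dot(direction, oc)
    c = dot(oc, oc) - radius * radius
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # the ray misses the sphere
    root = math.sqrt(disc)
    # Since a > 0, the first candidate is the nearer one.
    for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)):
        if t > eps:
            return t
    return None  # the sphere lies behind the ray origin


if __name__ == "__main__":
    # Ray along -z towards a unit sphere five units away.
    print(intersect_sphere((0, 0, 0), (0, 0, -1), (0, 0, -5), 1))
    # Ray pointing away from the sphere.
    print(intersect_sphere((0, 0, 0), (0, 1, 0), (0, 0, -5), 1))
```

A few assertions on cases like these (a direct hit, a miss, and a ray starting inside the sphere) are the kind of check that catches a wrong sign or a swapped root early.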

OpenCL First Steps

There is more and more noise about GPGPU computing and how much faster it is than computing on the CPU (even in parallel). If you haven’t heard about all that, GPGPU is about using the computer’s graphics card(s) to do general purpose computations. The key to the performance lies in the parallel architecture of these devices. From what I read, an average graphics card has 64 parallel units, but they are not as versatile as the cores of a CPU, of which a typical PC these days has 4. That means that if the task is well suited, it can boost performance significantly, but if not, it’s nothing more than a lot of wasted work.

So I wanted to see for myself. To get started I read the book “OpenCL Programming Guide”. It gave a good overview. But now it was time to give it a try.
