accelerated ray tracer

Among all the great online classes I attended over the last year, one topic was missing. I finally found an offering for a Computer Graphics class, which is, after all, the field I've been working in for the last five and a half years. The class is offered at edx.org and comes from Berkeley. It's the first class I'm taking on edX, and the style of the class is comparable to Coursera and Udacity.

The first part of the class covered OpenGL, and we implemented an interactive scene viewer. Although I hadn't worked directly with regular OpenGL before, only with WebGL, which is based on OpenGL ES, it was mostly repetition. Nonetheless, it was good training for working with homogeneous coordinates and matrices in different orderings. For grading, we had to produce 12 screenshots of the same scene under different transformations. Once the viewer was implemented, I only had to change the order of some transformations to get all the images right.

The second part dealt with ray tracing. Even though I was familiar with the basic concept, working with it was new to me, and in the class we had to build a ray tracer from scratch. The theory sounded straightforward, but somehow I was not so lucky in implementing it: in every new part I made some silly mistake. I didn't develop it in exemplary test-driven fashion, but I did write unit tests for every key part I wanted to verify. With those in place I could usually find and correct the problem in time.

For grading, we had to produce seven images. The first four were of the same scene with different lighting. Most were not too spectacular, but the one with full specular showed really nice reflections; you don't usually get to see that in the shaded scenes I'm used to. One thing worth mentioning is that normals need a special transformation (the inverse transpose of the model matrix).

The next image had 1000 spheres. I had it almost right, but in the distance there were some artefacts stemming from numeric precision problems. To avoid self-shadowing, you move the intersection point a little bit in the direction of the surface normal before shooting rays to the light sources. I started out with 1e-5 as the offset, but quickly moved to 1e-4. For the spheres in the distance I had to use 1e-3 to get rid of those black pixels.

The last image posed a special challenge. The regular ray tracer that worked nicely for the other images was nowhere near completion after more than 40 hours, so I stopped it. It's understandable that a scene with 100'000 triangles is a challenge for an unaccelerated implementation. The first improvement I implemented was space partitioning: I divided the largest extent by 10 and sorted the triangles into equally sized cubes. That brought the time down to two and a half hours. Multithreading would have been easy, but I wanted to use the GPU. I thought long about how to handle the recursion and control-path divergence in an OpenCL kernel, but then decided to start much simpler: I just moved the box and triangle intersections to the GPU, figuring that this was the most time-consuming and easiest-to-parallelize part. And indeed, execution time dropped to ten minutes.

Update:

So I had a look with the profiler. It turns out it's cheaper to intersect the triangles that overlap the border of a partitioning box twice than to sort out the duplicates. Now I'm at two and a half minutes.
It was actually the first time I used a profiler on Linux. At work we use AQTime, which integrates nicely into Visual Studio. On Linux, you just pass the -pg flag to the compiler and the linker, then run your instrumented application (just make sure it exits cleanly). You can then turn the generated output into readable form with gprof myApp > profile.txt. It's as easy as that. The generated report looks like this:

  %   cumulative   self
 time   seconds   seconds    name
 40.51     23.04    23.04    Shell::getIntersectingBoxes(Ray const&) const
 35.99     43.51    20.47    Region::intersect(Ray const&) const
 20.36     55.09    11.58    Shell::intersect(Ray const&) const
  1.31     55.84     0.75    Region::~Region()
  0.66     56.21     0.38    hit_face(double, double, double, double, double, double)
  0.46     56.47     0.26    std::vector<float, std::allocator<float> >::_M_default_append(unsigned long)
  0.18     56.57     0.10    Shell::intersect(std::tuple<unsigned long, unsigned long, unsigned long, Material, glm::detail::tmat4x4<float> > const&, Ray const&) const
  0.07     56.61     0.04    Scene::produceImages(boost::filesystem3::path const&, unsigned long) const
  0.07     56.65     0.04    Shell::partition(unsigned long) const

