How to implement multithreading in a Java ray tracer

Karl A :

I'm writing a ray tracing program in Java and have implemented multithreading using Runnable interface. Each thread renders a portion of the 800 vertical lines. When using two threads, they will render 400 lines each. For 8 threads, 100 lines each, and so on.

My solution is currently working, but the time it takes to render increases when using more threads. The amount of time to render each line increases by approx 3x when you double the number of threads. But more lines render simultanously so the total render time increases by approx 50% which is still bad. It should be faster.

I know that the threads only render the correct lines, I have checked that the same pixels don't get rendered multiple times. I tried to make sure each thread writes to its own variables during the rendering loop, and sum/combine them after the entire loop is done. My CPU has 8 threads, and the CPU usage is high but not 100% when rendering even on 8 threads.

class Multithread implements Runnable {
  Camera camera;
  CountDownLatch latch;
  ...

  //Constructor for thread
  Multithread(Scene s, Camera c, int thread, int threadcount, CountDownLatch cdl){
      camera = c;
      latch = cdl;
      ...
  }

  public void run(){
      try{
          ...
          //This is the render function
          camera.render(...);

          //When all threads unlatch, main class will write PNG
          latch.countDown();
      }
      catch (Exception e){System.out.println ("Exception is caught");}
  }
}

public class Camera {
    //The final pixel values are stored in the 2D-array
    ColorDbl[][] finalImage;

    Camera(int w){
        Width = w;
        finalImage = new ColorDbl[w][w]
    }

    //Start rendering
    void render(Scene S, int start, int end){

        //Create temporary, partial image
        ColorDbl[][] tempImage = new ColorDbl[Width][Width];

        Ray r;
        ColorDbl temp;
        //Render lines of pixels in the interval start-end
        for(int j = start; j < end; ++j){
            for(int i = 0; i < Width; ++i){
                r = new Ray(...);
                temp = r.CastRay(...);
                tempImage[i][j] = temp;
            }
        }

        //Copy rendered lines to final image
        for(int j=start; j<end; ++j){
            for(int i=0; i<Width; ++i){
                finalImage[i][j] = tempImage[i][j];
            }
        }
    }

    public static void main(String[] args) throws IOException{
        //Create camera and scene
        Camera camera = new Camera(800);
        Scene scene = new Scene();

        //Create threads
        int threadcount = 4;
        CountDownLatch latch = new CountDownLatch(threadcount);
        for (int thread=0; thread<threadcount; thread++){
            new Thread(new Multithread(scene, camera, thread, threadcount, latch)).start();
        }

        //Wait for threads to finish
        try{
          latch.await();
        }catch(InterruptedException e){System.out.println ("Exception");}

        //Write PNG
        c.write(...);
    }
}

When using 2 threads instead of 1, I expect almost a doubling of render speed, but instead it takes 50% longer. I don't expect anyone to solve my issue, but I would really appreciate some guidance when it comes to implementing multithreading. Am I going about this the wrong way?

Philipp Claßen :

In the source code that you posted, I do not see an obvious bottleneck. When parallel code is running slower, the most common explanations are either overhead because of synchronization, or doing extra work.

When it comes to synchronization, high congestion can make parallel code run very slowly. It can mean threads (or processes) are fighting over limited resources (e.g., waiting for locks), but it can also be more subtle like accessing the same memory using atomic operations, which can become quite costly. In your example, I did not see anything like that. The only synchronization operation seem to be the countdown latches at the end, which should not be significant. Unequal workloads can also harm scalability, but it seems unlikely in your example.

Doing extra work could be an issue. Maybe you are copying more data in the parallel version than in the sequential one? That could explain some overhead. Another guess is that in the parallel version, the cache locality has been negatively impacted. Note that the effect of the cache is significant (as a rule of thumb, memory accesses can become a factor of 50-100 times slower when your workload no longer fits in the cache).

How to find your bottleneck? In general, that is called profiling. There are specialized tools, for example, VisualVM is a free tool for Java that can be used as a profiler. Another even simpler, but often very effective first approach is to run your program and take some random thread dumps. If you have an obvious bottleneck, it is likely that you will see it in the stack trace.

The technique is often called poor man's profiler, but I found it very effective (see this answer for more details). In addition, you can also apply it safely in production, so it is a neat trick when you have to optimize code that you cannot run on your local machine.

IDE's (like Eclipse or IntelliJ) have support to take Thread dumps but you can also trigger it directly from the command line if you know the process id:

 kill -3 JAVA_PID

The program (or the JVM that runs it) will then print the current stack trace of all current threads. If you repeat that a couple of times, you should get an idea where your program is spending most of its time.

You can also compare it with your sequential version. Maybe you notice some pattern that explains the overhead of the parallel version.

I hope that helped a bit to get started.

How to implement multithreading in a Java ray tracer

Guess you like