Design and Implementation
of an Interactive Ray Tracer



MARIA E. JUMP
mjump@cs.umd.edu
www.cs.umd.edu/~mjump/art/

The support of the University of Maryland under
the Senior Summer Scholars Program is gratefully acknowledged.





September 14, 1998





Graphics are used in virtually every aspect of modern computer applications, from user interfaces to data visualization to television commercials and motion pictures. As computers become more powerful, the demand for more realistic graphics continues to grow. The fundamental difficulty in using a computer to generate images with a high degree of realism lies in the complexity of the real world. Every surface, texture, shadow, and irregularity of the objects around you adds complexity to the image that you see. The ultimate goal of image synthesis is to create an image that is indistinguishable from a photograph [1].

One of the simplest and most realistic methods for image synthesis originated in the field of physics. Here, photons of light are traced as rays from their source through the scene. A photon that leaves a light source eventually strikes an object; some of its energy is absorbed and some is reflected. The reflected portion may continue to be reflected off other objects in the scene, or it may hit a transparent object and be refracted. The rays of light that finally enter our eye are what we see [4].

This process of following rays of light is called "ray tracing" and is considered to be one of the most popular and powerful techniques in the image synthesis repertoire. With a three-dimensional model of a scene and an infinite amount of time, ray tracing techniques can generate stunningly accurate images [2]. The main problem is that most of the light that leaves the light sources never reaches our eye. Consequently, the vast majority of the computation spent tracing light-ray pathways is wasted.

Ray Tracing

The technique of ray tracing is based on reversing the optics to avoid the wasted computation of tracing rays that we never see. This means that only those rays that actually reach our eye are traced. Imagine holding up a wire screen in front of your eye. Each grid square in the screen corresponds to a pixel to be rendered in the final image. We shoot rays out from our eye through the center of each grid square and trace them through the scene.

If the ray hits an object, then the intensity of the reflected light at this point is calculated by determining which light sources are visible. This is done by tracing rays from the point of intersection to each light source. If such a ray succeeds in reaching its light source before hitting another object, then that light source illuminates the object. If not, a shadow is cast on the object at that point.


Figure 1: Ray tracing [4].

Using the direction to the light source, the direction to the viewer, the surface normal at the point of intersection, and the intersected object's color, the pixel's color can be calculated. This simply needs to be repeated for all of the pixels in the grid to generate our image.
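The primary-ray loop and shadow test just described might look like the following minimal sketch. The single hard-coded sphere, the Vec3 helpers, and the function names are illustrative stand-ins, not aRT's actual code; indirect light is folded into a constant ambient term as discussed below.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };
static Vec3 add(Vec3 a, Vec3 b)     { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 sub(Vec3 a, Vec3 b)     { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 scale(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
static double dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 unit(Vec3 v)            { return scale(v, 1.0 / std::sqrt(dot(v, v))); }

struct Sphere { Vec3 center; double radius; Vec3 color; };

// Nearest positive intersection of the ray (o, d) with the sphere;
// a negative return value means the ray missed.  d is assumed to be unit length.
static double hitSphere(Vec3 o, Vec3 d, const Sphere& s) {
    Vec3 oc = sub(o, s.center);
    double b = 2.0 * dot(oc, d);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - 4.0 * c;
    if (disc < 0.0) return -1.0;
    double t = (-b - std::sqrt(disc)) / 2.0;
    return t > 1e-6 ? t : -1.0;
}

// Color seen along one primary ray: an ambient term for indirect light, plus a
// diffuse term when the shadow ray from the hit point can reach the light.
// (With a single convex sphere the point only shadows itself when the light is
// behind the surface, which the max() already accounts for.)
static Vec3 shadePixel(Vec3 eye, Vec3 pixelCenter,
                       const Sphere& s, Vec3 lightPos, double ambient) {
    Vec3 dir = unit(sub(pixelCenter, eye));
    double t = hitSphere(eye, dir, s);
    if (t < 0.0) return {0.0, 0.0, 0.0};        // ray missed everything: background

    Vec3 p = add(eye, scale(dir, t));           // point of intersection
    Vec3 n = unit(sub(p, s.center));            // surface normal at that point
    Vec3 toLight = unit(sub(lightPos, p));      // direction to the light source

    double diffuse = std::max(0.0, dot(n, toLight));
    return scale(s.color, ambient + diffuse);
}
```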

Two factors complicate this simple model. The first arises because of the simplifying assumption that the light that illuminates a point on an object arrives directly from the light sources. This totally ignores the possibility that light might be arriving indirectly from other objects in the scene. Since indirect illumination could arrive from anywhere in the scene, it is quite difficult to model. Typically indirect light is modeled as ambient light that illuminates every point in the scene.

The second complication arises from highly reflective or transparent objects in the scene. In the case of reflective objects, such as mirrors or shiny objects, a reflective component must be taken into consideration. Using an associated coefficient of reflection, the portion of the ray that is reflected is calculated and recursively traced through the scene. In the case of transparent objects, such as glass balls, a refracted component must be taken into consideration. An index of refraction is used to calculate a refracted ray that is also recursively traced through the scene. When each of the reflected or refracted ray-trace colors is returned, it is factored into the color of the pixel at that point. In this way, highly reflective and transparent objects can be modeled [3].
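The recursive step for reflective surfaces could be sketched as follows, using assumed placeholder interfaces (intersect, localShade, Hit) rather than aRT's own classes. A transparent surface would add a second recursive call whose direction comes from Snell's law and the index of refraction.

```cpp
// Placeholder vector type with just the operations the recursion needs.
struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(double s) const      { return {x * s, y * s, z * s}; }
};
static double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

struct Hit {
    bool   found;    // did the ray hit anything?
    Vec3   point;    // point of intersection
    Vec3   normal;   // unit surface normal at that point
    double kr;       // coefficient of reflection of the surface
};

// Placeholder scene queries, to be supplied by the actual scene representation.
Hit  intersect(const Vec3& origin, const Vec3& dir);
Vec3 localShade(const Hit& h, const Vec3& viewDir);   // direct light and shadows

// Recursive trace: local illumination plus a weighted, recursively traced
// reflection.  'depth' bounds the recursion.
Vec3 trace(const Vec3& origin, const Vec3& dir, int depth) {
    Hit h = intersect(origin, dir);
    if (!h.found) return {0.0, 0.0, 0.0};

    Vec3 color = localShade(h, dir);
    if (depth > 0 && h.kr > 0.0) {
        // Mirror the incoming direction about the surface normal.
        Vec3 r = dir - h.normal * (2.0 * dot(dir, h.normal));
        // Offset the origin slightly along the normal to avoid re-hitting the
        // same surface, then weight the recursive result by the reflection coefficient.
        color = color + trace(h.point + h.normal * 1e-4, r, depth - 1) * h.kr;
    }
    return color;
}
```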

In order to achieve photorealistic images, millions of rays representing individual rays of light need to be cast and traced through the scene. In addition, the reflected and refracted portions of each individual ray of light need to be traced until there is no energy left. This requires considerable computational time even on the fastest of today's machines, and even the smallest of images can take a significant amount of time to render. Certainly nobody wants to wait 15-60 seconds just to realize that the image was not what they really wanted and then have to do it again. This was the motivation for the development of an Interactive Ray Tracer.

Interactive Ray Tracer

aRT is a real-time interactive ray tracing program that accepts a mathematical description of a three-dimensional world and an initial camera position as input. It then provides the user with a window into the three-dimensional world. As rays of light are traced through the scene, data is delivered back to the user in the form of a low resolution image that is 1/16 of the resolution of the actual image (see Figure 2a). aRT then progressively renders the image until it reaches full resolution or until it is interrupted by the user.


(a) Image at 1/16th resolution.

(b) Image at 1/8th resolution.

(c) Image at 1/4th resolution.

(d) Image at half resolution.

(e) Image at full resolution.

Figure 2: Stages of rendering a 200x200 image.
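The progressive refinement shown in Figure 2 might be organized along the following lines: one ray is traced per block, and the block size is halved on each pass. The routine names, the interrupt check, and the edge handling are assumptions for illustration, not aRT's actual control loop.

```cpp
struct Color { float r, g, b; };

// Placeholders standing in for aRT's actual routines.
Color tracePixel(int x, int y);                      // trace one primary ray
void  drawBlock(int x, int y, int size, Color c);    // paint a size x size block (clipped to the image)
bool  userInterrupted();                             // key press or camera move?

// Render the image in passes: 16-pixel blocks give the 1/16th-resolution image
// of Figure 2(a), and 1-pixel blocks give the full-resolution image of Figure 2(e).
void renderProgressively(int width, int height) {
    for (int block = 16; block >= 1; block /= 2) {
        for (int y = 0; y < height; y += block) {
            for (int x = 0; x < width; x += block) {
                Color c = tracePixel(x + block / 2, y + block / 2);
                drawBlock(x, y, block, c);
            }
            if (userInterrupted())
                return;   // the caller adjusts the image for the camera move and restarts
        }
    }
}
```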

By providing immediate feedback, aRT allows the user to make decisions about the position of the camera without waiting for the entire image to render. Using the arrow keys or mouse, the camera can be panned in the desired direction. In addition, the camera can be moved in and out using 'I' to move in and 'O' to move out. Moving the camera interrupts the rendering process, adjusts the image, and then resumes rendering.

In order to facilitate interactive movement of the camera and subsequent image generation, a data structure is used to store state and color information of the image. By storing this information, image data can simply be read from the data structure rather than recalculated every time the camera is repositioned.

Quadtree Data Structure

aRT was originally developed using a traditional two-dimensional data structure, the quadtree [5]. Its naturally hierarchical structure lends itself to rendering the image at lower resolutions. This is accomplished by dividing the image evenly into quadrants, creating a new level in the tree, and tracing a ray through the center of each quadrant. As data is calculated, the image is immediately rendered at each level of the tree, with each level representing a lower resolution. Each quadrant is then divided recursively and a ray is traced through the center of each new quadrant. This continues until each quadrant contains the data for only one pixel of the image.


Figure 3: Example quadtree showing breakdown at different levels.
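One way to realize the recursive subdivision just described is sketched below: each node covers a square block of the image, a ray is traced through the block's center, and the node is split into four children until each block is a single pixel. The names (QuadNode, tracePixel) are illustrative only, and the block size is assumed to be a power of two.

```cpp
#include <memory>

struct Color { float r, g, b; };
Color tracePixel(int x, int y);    // placeholder: trace a ray through this pixel

// One node of the quadtree: the block of the image it covers and the color
// sampled by the ray traced through the block's center.
struct QuadNode {
    int x = 0, y = 0, size = 0;
    Color sample{};
    std::unique_ptr<QuadNode> child[4];
};

// Recursively subdivide a block, tracing a ray at every level, until each
// node covers a single pixel.  Every level casts its own rays, which is part
// of the overhead discussed next.
std::unique_ptr<QuadNode> buildQuadtree(int x, int y, int size) {
    auto node = std::make_unique<QuadNode>();
    node->x = x;  node->y = y;  node->size = size;
    node->sample = tracePixel(x + size / 2, y + size / 2);
    if (size > 1) {
        int h = size / 2;
        node->child[0] = buildQuadtree(x,     y,     h);
        node->child[1] = buildQuadtree(x + h, y,     h);
        node->child[2] = buildQuadtree(x,     y + h, h);
        node->child[3] = buildQuadtree(x + h, y + h, h);
    }
    return node;
}
```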

Several difficulties arose from using the quadtree:

  1. As the tree is built, rays that have already been cast at a higher level of the quadtree are necessarily recast at lower levels, increasing computation and slowing rendering.
  2. The quadtree does not lend itself easily to the restructuring needed to implement camera motion. The difficulty lies in the multi-level structure of the tree: camera motion must be applied at the lower levels, so large portions of the tree have to be restructured whenever the camera moves by even the smallest amount.
  3. Final data can be found at any level, so even the simplest operation requires a complete traversal of the tree.
  4. A complete traversal of the tree takes a significant amount of time, adding to the cost of both restructuring the tree and simply redrawing the image in its current state.
These problems led to the development of the imagetree data structure.

Imagetree Data Structure

An alternative to the initial quadtree structure was needed, one that would minimize or eliminate the time spent in the upper levels of the quadtree and that would lend itself to the horizontal and vertical translations required to implement camera motion. Starting with the traditional quadtree, we chopped off the upper levels and ended up with a data structure that is flat at the top but complex in the lower levels. We call this data structure an imagetree. Each node of the imagetree records its location relative to the middle of the window. Since the hierarchical nature of the quadtree was still desired, it was natural to keep it as a secondary data structure: each node maintains a pointer to a "mini-quadtree" which contains the actual image data.
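A sketch of this layout is given below: a flat two-dimensional grid of top-level nodes, each recording its offset from the window center and owning a mini-quadtree of samples. The type and field names are guesses for illustration, not aRT's actual declarations.

```cpp
#include <memory>
#include <vector>

struct Color { float r, g, b; };

// Mini-quadtree node, as in the previous sketch (fields abbreviated).
struct QuadNode {
    int x = 0, y = 0, size = 0;
    Color sample{};
    std::unique_ptr<QuadNode> child[4];
};

// One top-level node of the imagetree: its position relative to the middle of
// the window plus the mini-quadtree of image data for its block.
struct ImageTreeNode {
    int dx = 0, dy = 0;                 // block offset from the window center
    std::unique_ptr<QuadNode> tree;     // per-block mini-quadtree of samples
};

// The imagetree itself: a flat two-dimensional grid of top-level nodes stored
// row-major, replacing the upper levels of the original quadtree.
struct ImageTree {
    int cols = 0, rows = 0;
    std::vector<ImageTreeNode> nodes;

    ImageTree(int c, int r) : cols(c), rows(r), nodes(c * r) {}
    ImageTreeNode&       at(int col, int row)       { return nodes[row * cols + col]; }
    const ImageTreeNode& at(int col, int row) const { return nodes[row * cols + col]; }
};
```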

Rays are cast at each level of the mini-quadtree, but the results are stored only at the lowest level. This, together with the removal of the uppermost levels of the original quadtree, reduces the number of rays that need to be cast by at least 25%. Casting fewer rays reduces the computation required to generate an image and thus speeds up rendering.


Figure 4: Graphical representation of an imagetree

Two additional features of aRT help to decrease the rendering time. The first is the ability to toggle reflections and shading by pressing the 'R' key. This reduces the computation required to render the scene by eliminating all recursive traces through it: only the color of the originally intersected object is used to determine the pixel color, producing a flat image.

 
Figure 5: Standard 200x200 image and flat image without reflections or shading.

The second is the ability to toggle shadows by pressing the 'S' key. In this case, computation time is reduced because the rays back to the light sources are omitted from the trace, producing an image without shadows.

 
Figure 6: Image with shadows and without shadows.
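One way the two toggles could enter the shading code is sketched below: global flags that skip the recursive traces entirely ('R') or skip the shadow rays ('S'). The flag names and the placeholder routines (reflect, trace, lightVisible, litShade) are assumptions, not aRT's actual identifiers.

```cpp
struct Vec3 {
    double x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(double s) const      { return {x * s, y * s, z * s}; }
};
struct Hit { Vec3 point, normal, objectColor; double kr; };

// Toggles flipped by the 'R' and 'S' key handlers.
bool gShadingAndReflections = true;   // 'R'
bool gShadows = true;                 // 'S'

// Placeholders standing in for routines from the earlier sketches.
Vec3 reflect(const Vec3& dir, const Vec3& normal);
Vec3 trace(const Vec3& origin, const Vec3& dir, int depth);
bool lightVisible(const Vec3& point);
Vec3 litShade(const Hit& h, const Vec3& viewDir);   // ambient plus diffuse terms

Vec3 shadePoint(const Hit& h, const Vec3& dir, int depth) {
    // 'R' off: no shading and no recursive traces; the pixel gets only the
    // intersected object's own color, producing the flat image of Figure 5.
    if (!gShadingAndReflections)
        return h.objectColor;

    // 'S' off: the shadow ray back to the light source is never cast and every
    // point is treated as lit, producing the shadowless image of Figure 6.
    bool lit = !gShadows || lightVisible(h.point);
    Vec3 color = lit ? litShade(h, dir) : h.objectColor * 0.2;   // ambient only

    // Recursive reflection, weighted by the coefficient of reflection.
    if (h.kr > 0.0 && depth > 0)
        color = color + trace(h.point, reflect(dir, h.normal), depth - 1) * h.kr;
    return color;
}
```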

By using the imagetree rather than the quadtree, the number of rays that must be traced to generate our image is reduced. This decrease in rays results in a decrease in computation time and, thus, a decrease in the render time. In addition, the imagetree allows for the implementation of the camera motion in an efficient manner.

Camera Motion

aRT is capable of two types of camera motion. The first is panning or rotating the camera and the second is moving the camera in and out.

A rotation, or pan, of the camera is implemented as a translation of the image in the horizontal and vertical directions. Because this is a primary goal of aRT, the top level of the chopped quadtree is stored as a two-dimensional array of imagetree nodes. In this way, whenever the camera is rotated, a quick shift of the secondary quadtrees produces the corresponding shift in the image.


(a) Image before camera is
panned down and right.

(b) Immediately following the pan
of the camera, the current
picture is shifted.

(c) Image continues to render at
the highest level of the imagetree
at 1/16th resolution.

(d) Rendering continues
at 1/8th resolution.

(e) Half resolution.

(f) Full resolution after
the camera movement.

Figure 7: Stages of rendering after panning the camera.

In an effort to give the program a "memory", the two-dimensional array of imagetree nodes is 50% larger in each direction than the window. If the camera is moved in one direction and then back again, the data structure remembers the rays that have already been cast; they do not have to be retraced, only read back from the secondary data structure.
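One possible realization of this memory is sketched below: the window is treated as a movable view onto the oversized grid, so a pan only shifts the view origin and queues the newly exposed blocks for tracing, while blocks seen before are simply read back. The CachedGrid type and its fields are hypothetical; the document does not specify how aRT's shift is coded.

```cpp
#include <algorithm>

// A grid of cached per-block results (for example, the imagetree's
// mini-quadtrees), 50% larger than the window in each direction.
struct CachedGrid {
    int windowCols, windowRows;    // blocks visible in the window
    int gridCols, gridRows;        // roughly 1.5x the window in each direction
    int originCol, originRow;      // window's top-left block within the grid
};

// Pan the camera by (dCols, dRows) blocks: move the window's origin within the
// oversized grid, clamping at the edges.  Blocks the window has covered before
// keep their traced data, so panning back only rereads the cache; blocks the
// window exposes for the first time must be traced from scratch.
void pan(CachedGrid& g, int dCols, int dRows) {
    g.originCol = std::max(0, std::min(g.originCol + dCols, g.gridCols - g.windowCols));
    g.originRow = std::max(0, std::min(g.originRow + dRows, g.gridRows - g.windowRows));
    // Redrawing now reads grid block (originCol + c, originRow + r) for window
    // position (c, r); empty blocks are queued for tracing.
}
```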

Moving the camera in and out is more complex due to the nature of the movement. Whenever the camera is moved in or out, not only is the current image enlarged or reduced, but the visual relationships among objects in the image can also shift. Two factors determine the amount of this shift: the distance of an object from the camera and its distance from the middle of the image. Objects that are closer to the camera shift more than objects that are farther away, and the farther an object is from the middle of the image, the greater the change can be. As a result, the data already in the imagetree is suspect, and much of the traced data must be thrown out and retraced. Moving the camera in or out therefore begins with an initial guess based on the current image, followed by retracing and redrawing the outer areas of the image. How much data is discarded depends on the distance from the middle of the image.


(a) Before moving the camera out.

(b) Immediately following
the move, the current
picture is reduced.

(c) Image continues to render
in decreasing resolution
from the center.

(d) 1/8th resolution.

(e) Half resolution.

(f) Full Resolution.

Figure 8: Stages of rendering after moving the camera out.
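A rough sketch of the discard decision described above is given below: cached samples near the middle of the image, where the shift between objects is smallest, are kept as the initial guess, while samples farther out are retraced. The distance test and the keepFraction threshold are arbitrary illustrative choices, not aRT's actual rule.

```cpp
#include <cmath>

// After a move in or out, decide whether the cached sample at pixel (x, y) is
// kept as part of the initial guess or discarded and retraced.  Samples far
// from the middle of the image shift the most, so only a central disc survives.
bool keepCachedSample(int x, int y, int width, int height, double keepFraction) {
    double dx = x - 0.5 * width;
    double dy = y - 0.5 * height;
    double dist = std::sqrt(dx * dx + dy * dy);
    double halfDiagonal = 0.5 * std::sqrt(double(width) * width + double(height) * height);
    return dist < keepFraction * halfDiagonal;
}
```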

Concluding Remarks and Future Study

aRT was originally developed using standard methods and data structures. As development continued, the design and implementation of the imagetree allowed for faster rendering by reducing the computation necessary to trace rays through the scene. In addition, the imagetree enabled an efficient implementation of the camera's motion. It is unique in its ability to remember what has previously been computed.

A great deal of work still needs to be done in order to realize the full potential of the imagetree data structure. Several ideas immediately come to mind:

  1. Extend the move in/out so that a mathematical algorithm determines which rays can be saved and which definitely cannot.
  2. Implement an algorithm for rendering areas of greater interest first. This requires an algorithm that can distinguish areas of low interest from areas of high interest, for example rendering "blank walls" only at the lowest resolution rather than wasting computation rendering them at higher resolutions.
  3. Extend the imagetree grid to a spherical data structure that surrounds the camera and can pivot on any axis. This would allow continuous rotation of the camera while maintaining information about the world.

References

  1. Foley, James D., van Dam, Andries, Feiner, Steven K., & Hughes, John F. Computer Graphics: Principles and Practice (Second Edition in C). Addison-Wesley Publishing Company. 1996.
  2. Glassner, Andrew S. An Introduction to Ray Tracing. Academic Press Limited, San Diego, CA. 1989.
  3. Hearn, Donald, & Baker, M. Pauline. Computer Graphics. Prentice Hall, Englewood Cliffs, NJ. 1994.
  4. Mount, David. Lecture Notes from CMSC427/828M. Spring 1997.
  5. Samet, Hanan. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA. 1990.
  6. Silicon Graphics, Inc. OpenGL WWW Center (http://www.sgi.com/Technology/OpenGL/). 1998.
  7. Wilt, Nicholas P. Object-Oriented Ray Tracing in C++. John Wiley & Sons, Inc. 1994.