XIA Chunlei
(Department of Physics, University of Southern California, Los Angeles, CA 90089, USA)
Global illumination is widely used in applications such as producing realistic images of virtual objects in video games, movie production, and design processes in the architecture, car, and airplane industries. Global illumination, however, is impossible without ray tracing, one of the fundamental techniques of computer graphics. Ray tracing is a rendering technique that traces the path of light through the pixels of an image plane. Tracing rays is very costly, especially when millions of rays are involved, so an optimized and fast ray tracing system is essential for global illumination [1].
Motivated by an approximate global illumination system for computer-generated films such as Shrek [2], we implement ray tracing optimizations and distributed ray tracing techniques in a non-GPU-based application, and we also implement ray tracing with global illumination effects on a GPU. We started with an existing ray tracing application [3], which provides the basic functionality but is not optimized. We then extended it with several new techniques: bounce management, caching, and converting recursion to iteration. Performance improvements are measured across three versions of the application: the original ray tracer without the KD-tree structure, the original ray tracer with the KD-tree structure, and the optimized ray tracer with the KD-tree structure. We also physically distributed the optimized ray tracer over multi-core processors and over machines on a network, and implemented advanced (distributed) ray tracing effects. The performance of ray tracing on single-core and multi-core CPUs is compared. Finally, we implemented a GPU ray tracer with several global illumination effects using an open-source rendering tool, RenderMan [4].
The remainder of this paper is structured as follows: Section 2 introduces the non-GPU-based ray tracer and the optimizations applied to it; the distributed ray tracing implementations and the GPU ray tracer are described in Sections 3 and 4, respectively; and Section 5 summarizes the article.
The non-GPU ray tracer is based on a C++ ray tracing application [3] consisting of two different ray tracers: one with no acceleration structure and one using a KD-tree acceleration structure. Both ray tracers already came with reflection rays, transmission rays, and shadow feelers implemented. In this section, four techniques (bounce management, caching, iterative ray tracing, and multi-core/network distribution) are applied to optimize this ray tracing application. All four optimizations were implemented on top of the KD-tree-accelerated ray tracer.
Fig. 1 Built-in scene used to test the optimizations
In order to quantify performance improvements, we set up a performance-measurement infrastructure in all versions of the tracer. This infrastructure was incorporated into the existing application so that we could observe rendering time as a function of trace depth. Our optimizations were tested on a standard scene (see Figure 1) available in the KD-tree-accelerated ray tracer. For comparison purposes this built-in scene was also rendered with the ray tracer that has no acceleration structure. The remainder of this section discusses each of our four optimizations in detail.
Bounce management. Two types of rays are traced in the existing application: reflection rays and transmission rays. Inspired by the path-length optimization used in [2], we limit the maximum number of bounces of reflection rays. Image quality is virtually unaffected by this optimization, as can be seen in Figure 2, where the image on the right is almost identical to the image on the left. In this example, the maximum number of bounces has been limited to 3, which significantly reduces the rendering time.
Fig. 2 Image quality comparison between the ray tracer using only the KD-tree acceleration structure (left) and our optimizations (right)
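To make the bounce-management idea concrete, the following C++ sketch shows one way such a limit can be enforced in a recursive tracer. It is illustrative only: the scene query and local shading are stubbed out, and all names (trace, shadeLocal, maxReflectionBounces) are hypothetical rather than taken from the application in [3].

// bounce_limit_sketch.cpp -- illustrative only; names are hypothetical.
#include <cstdio>

struct Color { float r, g, b; };

static Color add(Color a, Color b)   { return {a.r + b.r, a.g + b.g, a.b + b.b}; }
static Color scale(Color c, float s) { return {c.r * s, c.g * s, c.b * s}; }

// Stand-ins for the real scene query and local shading; a real tracer
// would intersect the KD-tree and evaluate the lighting model here.
static bool  hitSomething() { return true; }
static Color shadeLocal()   { return {0.2f, 0.3f, 0.4f}; }

const int maxReflectionBounces = 3;  // limit used in the example above

static Color trace(int bounce)
{
    if (!hitSomething())
        return {0, 0, 0};            // background color

    Color c = shadeLocal();          // direct (local) illumination

    // Bounce management: stop spawning reflection rays once the limit
    // is reached instead of recursing to the full trace depth.
    if (bounce < maxReflectionBounces)
        c = add(c, scale(trace(bounce + 1), 0.5f));  // 0.5 = placeholder reflectivity

    return c;
}

int main()
{
    Color c = trace(0);
    std::printf("%.3f %.3f %.3f\n", c.r, c.g, c.b);
    return 0;
}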
Caching. This optimization caches some repetitive calculations of local (direct) illumination. In particular, the terms of the rendering equation that involve the ambient light and a material's ambient and emissive colors are cached. This cache is not built a priori by pre-processing; it is built and maintained dynamically at run time.
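A minimal sketch of such a run-time cache is given below, assuming a hypothetical Material type with a unique id used as the cache key; the snippet only illustrates the idea of computing the ambient-plus-emissive term once per material rather than once per shaded hit point.

// ambient_cache_sketch.cpp -- hypothetical structure, not the original code.
#include <unordered_map>
#include <cstdio>

struct Color {
    float r, g, b;
    Color operator*(const Color& o) const { return {r * o.r, g * o.g, b * o.b}; }
    Color operator+(const Color& o) const { return {r + o.r, g + o.g, b + o.b}; }
};

struct Material {
    int   id;        // unique material index, used as the cache key
    Color ambient;   // ambient reflectance
    Color emissive;  // emitted color
};

// Global ambient light of the scene (assumed constant during a frame).
static const Color ambientLight = {0.1f, 0.1f, 0.1f};

// Cache built lazily at run time: the ambient + emissive term depends only
// on the material, so it is computed once per material and then reused.
static Color ambientEmissiveTerm(const Material& m)
{
    static std::unordered_map<int, Color> cache;
    auto it = cache.find(m.id);
    if (it != cache.end())
        return it->second;                      // reuse earlier result

    Color term = m.ambient * ambientLight + m.emissive;
    cache.emplace(m.id, term);
    return term;
}

int main()
{
    Material red = {1, {0.8f, 0.2f, 0.2f}, {0, 0, 0}};
    Color c = ambientEmissiveTerm(red);         // computed
    c = ambientEmissiveTerm(red);               // served from the cache
    std::printf("%.3f %.3f %.3f\n", c.r, c.g, c.b);
    return 0;
}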
Converting recursion to iteration. The ray tracing algorithm is recursive in nature. Recursion, however, can be slow because of its heavy use of the call stack. Inspired by [5], we converted the recursive ray tracing algorithm into an iterative one to overcome this drawback. However, only an average improvement of about 1% in rendering time was observed (tested on the built-in scene at 301 × 246 pixels).
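The sketch below illustrates the transformation, again with the scene query stubbed out and hypothetical names: pending secondary rays are pushed onto an explicit stack and processed in a loop instead of through recursive calls.

// iterative_trace_sketch.cpp -- hypothetical rewrite, illustrating the idea only.
#include <stack>
#include <cstdio>

struct Color { float r, g, b; };

// Work item: one ray to process, with its bounce count and the weight
// its contribution carries in the final pixel color.
struct PendingRay {
    int   bounce;
    float weight;
};

static bool  hitSomething() { return true; }
static Color shadeLocal()   { return {0.2f, 0.3f, 0.4f}; }

const int maxBounces = 3;

// Iterative formulation: secondary rays become new work items on an
// explicit stack instead of recursive calls to trace().
static Color traceIterative()
{
    Color result = {0, 0, 0};
    std::stack<PendingRay> work;
    work.push({0, 1.0f});                 // primary ray

    while (!work.empty()) {
        PendingRay p = work.top();
        work.pop();

        if (!hitSomething())
            continue;                     // ray escaped the scene

        Color local = shadeLocal();
        result.r += p.weight * local.r;
        result.g += p.weight * local.g;
        result.b += p.weight * local.b;

        // Spawn the reflection ray as new work instead of recursing.
        if (p.bounce < maxBounces)
            work.push({p.bounce + 1, p.weight * 0.5f});
    }
    return result;
}

int main()
{
    Color c = traceIterative();
    std::printf("%.3f %.3f %.3f\n", c.r, c.g, c.b);
    return 0;
}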
The improvements from the three optimizations discussed thus far are shown graphically in Figure 3 and Figure 4. The comparison of ray tracing rendering times is depicted in Figure 3. The improvement in rendering time is noticeable when the trace depth is greater than 4. Note also that the improvement due to these optimizations is considerably more significant than the improvement due to KD-tree acceleration alone. Figure 4 shows the percentage improvement in rendering time due to the above optimizations only, excluding the effect of KD-tree acceleration. The optimizations described so far were run on an Intel Pentium M 1.73 GHz CPU with 1.25 GB of memory.
Fig. 3 Rendering time vs. trace depth for the three ray tracers
Fig. 4 Percentage improvement in rendering time of our optimizations compared with KD-tree acceleration alone
Multi-core/network distributed ray tracer. We physically distributed the KD-tree ray tracer to generate the scene in Figure 1. We divided the scene so that each core, or each machine on a network, receives a contiguous rectangular region of the viewport through which rays are sent for tracing. On our machines, as long as the distribution occurred across different processes, each ray tracing process ran on a different core. One interesting result came from tests we ran on two PCs with different multi-core processors: one PC had an AMD Turion X2 (2.0 GHz) and the other an AMD Phenom 8400 triple-core processor. On the Turion X2, we obtained a 20%-40% speedup when rendering the scene in Figure 1 at a screen size of 900×675 pixels. For the same scene and screen size, the Phenom 8400 obtained about a 43% speedup. However, when the system was distributed across a wired network, we suffered an average increase of 5% in rendering time.
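A possible form of the viewport split is sketched below; the partitionViewport helper is hypothetical, the strips are horizontal for simplicity, and the inter-process or network communication that hands each tile to a worker is omitted.

// tile_partition_sketch.cpp -- hypothetical helper for splitting the viewport.
#include <vector>
#include <cstdio>

// A contiguous rectangular region of the viewport assigned to one worker
// (a core in the multi-core case, or a machine in the network case).
struct Tile {
    int x0, y0;  // top-left pixel (inclusive)
    int x1, y1;  // bottom-right pixel (exclusive)
};

// Split a width x height viewport into `workers` horizontal strips of
// nearly equal height; each worker traces only the rays of its strip.
static std::vector<Tile> partitionViewport(int width, int height, int workers)
{
    std::vector<Tile> tiles;
    int base  = height / workers;
    int extra = height % workers;   // first `extra` strips get one more row
    int y = 0;
    for (int i = 0; i < workers; ++i) {
        int rows = base + (i < extra ? 1 : 0);
        tiles.push_back({0, y, width, y + rows});
        y += rows;
    }
    return tiles;
}

int main()
{
    // The 900 x 675 screen from the multi-core test, split across 3 cores.
    for (const Tile& t : partitionViewport(900, 675, 3))
        std::printf("strip: rows %d..%d\n", t.y0, t.y1 - 1);
    return 0;
}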
Distributed ray tracing comprises a set of advanced techniques that generate a variety of special effects by increasing the number of rays traced [3]. For this reason, distributed ray tracing is much slower than basic ray tracing. In this work, we have implemented antialiasing, depth of field, and soft shadows. Rendering times are given for an AMD Turion X2 (2.0 GHz) CPU.
Fig. 5 A scene showing the implementation of depth of field with antialiasing, using distributed ray tracing for both effects (original screen size: 600×500, rendering time: 35.616 s)
Depth of field. Depth of field is the effect whereby objects in focus produce sharp images while objects out of focus are blurred. It is commonly seen in photos or films where the scene spans a broad range of depths, but it is not supported by the basic ray tracing model. We implemented depth of field as an extension to the ray tracer application and applied the technique to a scene we built. The effect is realized by jittering eye positions: objects near the focal plane receive rays that are sampled close to each other, while objects off the focal plane receive rays that are sampled far apart. The results of this implementation are depicted in Figure 5. Notice that the closest and farthest spheres are blurred, whereas the middle sphere is not, because it lies on the focal plane.
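The following sketch shows one way a depth-of-field sample ray can be built by jittering the eye position inside a square aperture and aiming the ray at the point where the original pinhole ray crosses the focal plane. The names and the square lens model are assumptions for illustration, not the application's actual code.

// dof_sketch.cpp -- illustrative lens sampling, not the original implementation.
#include <cstdlib>
#include <cstdio>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(float s)       const { return {x * s, y * s, z * s}; }
};

struct Ray { Vec3 origin, dir; };

static float rand01() { return std::rand() / (float)RAND_MAX; }

// Build one depth-of-field sample ray: the eye position is jittered inside
// a square "lens" of the given aperture, and the ray is aimed at the point
// where the original pinhole ray crosses the focal plane.  Points on the
// focal plane stay sharp; points off it are seen from many slightly
// different eye positions and therefore blur out.
static Ray dofSampleRay(const Vec3& eye, const Vec3& pinholeDir,
                        float focalDistance, float aperture)
{
    Vec3 focalPoint = eye + pinholeDir * focalDistance;

    Vec3 jitteredEye = eye;
    jitteredEye.x += (rand01() - 0.5f) * aperture;
    jitteredEye.y += (rand01() - 0.5f) * aperture;

    Vec3 dir = focalPoint - jitteredEye;   // left unnormalized for brevity
    return {jitteredEye, dir};
}

int main()
{
    Ray r = dofSampleRay({0, 0, 0}, {0, 0, -1}, 5.0f, 0.2f);
    std::printf("origin (%.3f, %.3f, %.3f)\n", r.origin.x, r.origin.y, r.origin.z);
    return 0;
}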
Antialiasing. In the basic ray tracing model, the color of each pixel is determined by a single ray; thus, it is the color of the one point where the ray intersects an object or the background. In reality, a pixel should cover an area rather than a point. A natural way to achieve this is to supersample the pixel by sending multiple eye-to-pixel rays across the pixel and averaging the resulting colors. The position of each sample is chosen by jittering, as shown on the right of Figure 6: the sample points are random, but each is constrained to its own sub-pixel. The resulting effect is shown on the left of Figure 6.
Fig. 6 Left: noticeable reduction in jagged edges due to antialiasing; Right: a 3×3 grid of sub-pixel regions, each sampled randomly to select the positions of the eye-to-subpixel rays
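A minimal sketch of jittered supersampling is shown below, assuming a hypothetical traceAt function that traces one eye-to-subpixel ray; the 3×3 grid matches the sampling region shown on the right of Figure 6.

// jittered_supersampling_sketch.cpp -- hypothetical names and structure.
#include <cstdlib>
#include <cstdio>

struct Color { float r, g, b; };

static float rand01() { return std::rand() / (float)RAND_MAX; }

// Stand-in for tracing one eye-to-subpixel ray through image position (u, v);
// the real tracer would build a ray and intersect the scene here.
static Color traceAt(float u, float v)
{
    return {u - (int)u, v - (int)v, 0.5f};
}

// Jittered supersampling over an n x n grid: the pixel is divided into
// n*n sub-pixels, one sample is placed at a random position inside each
// sub-pixel, and the resulting colors are averaged.
static Color shadePixel(int px, int py, int n)
{
    Color sum = {0, 0, 0};
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float u = px + (i + rand01()) / n;   // jittered, but confined
            float v = py + (j + rand01()) / n;   // to its own sub-pixel
            Color c = traceAt(u, v);
            sum.r += c.r; sum.g += c.g; sum.b += c.b;
        }
    }
    float inv = 1.0f / (n * n);
    return {sum.r * inv, sum.g * inv, sum.b * inv};
}

int main()
{
    Color c = shadePixel(10, 20, 3);             // 3x3 grid as in Figure 6
    std::printf("%.3f %.3f %.3f\n", c.r, c.g, c.b);
    return 0;
}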
Soft shadows. In the basic ray tracing model, a light is approximated by a point light, which produces hard shadows. In reality, most lights are either round or rectangular. To implement natural soft shadows, we cast multiple shadow feelers instead of a single one. The multiple shadow feelers are uniformly distributed across the extent of the light as seen from the point being illuminated. Figure 7 depicts soft shadows for a scene we built, using an 8×8 rectangular sampling region, which was large enough to reduce banding.
Fig. 7 A scene showing the implementation of soft shadows (original screen size: 301×246, rendering time: ~35 s)
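The sketch below estimates the visible fraction of a rectangular area light using a uniform n×n grid of shadow feelers; the occlusion test is stubbed out, and the names are hypothetical rather than taken from the application.

// soft_shadow_sketch.cpp -- illustrative only; the occlusion test is stubbed out.
#include <cstdio>

struct Vec3 { float x, y, z; };

// Stand-in for casting a shadow feeler from the shaded point towards one
// sample position on the light; the real tracer intersects the scene and
// reports whether the feeler is blocked.
static bool feelerBlocked(const Vec3& point, const Vec3& lightSample)
{
    // Pretend the +x half of the light is occluded, to show the effect.
    return lightSample.x > 0.0f;
}

// Fraction of a rectangular area light visible from `point`, estimated with
// an n x n grid of shadow feelers distributed uniformly across the light's
// extent (an 8 x 8 grid was enough to avoid banding in Figure 7).
static float lightVisibility(const Vec3& point, const Vec3& lightCenter,
                             float lightWidth, float lightHeight, int n)
{
    int unblocked = 0;
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            Vec3 sample = lightCenter;
            sample.x += ((i + 0.5f) / n - 0.5f) * lightWidth;
            sample.y += ((j + 0.5f) / n - 0.5f) * lightHeight;
            if (!feelerBlocked(point, sample))
                ++unblocked;
        }
    }
    return (float)unblocked / (n * n);   // scales the light's contribution
}

int main()
{
    float v = lightVisibility({0, 0, 0}, {0, 5, 0}, 2.0f, 2.0f, 8);
    std::printf("visible fraction: %.2f\n", v);
    return 0;
}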
As an example, Figure 8 illustrates a scene that combines all the distributed ray tracing effects described in this section.
Fig. 8 A scene combining depth of field, antialiasing, and soft shadows (original screen size: 301×246, rendering time: 11.170 s)
Several shading languages could be chosen to implement a GPU version of our ray tracer, such as Microsoft's High Level Shading Language (HLSL), the OpenGL Shading Language, and PhotoRealistic RenderMan from Pixar. We chose RenderMan because of its better support and fewer limitations compared with the other two. All scenes were run on an AMD Turion X2 (2.0 GHz) with an ATI Radeon Xpress 1150 integrated GPU, and were initially rendered at a 901×738 screen size. To measure rendering time, we used the Unix time command. We successfully used this shading language to build scenes with both basic and advanced ray tracing effects. Surface shaders were built to create most of the effects, as illustrated below.
Reflection. Two types of reflections were implemented in RenderMan. Basic reflections with no distributed ray tracing effects were implemented first; aliasing occurred, so rather than simply applying a technique like the distributed ray tracing version of antialiasing, we implemented glossy reflections to obtain a more interesting effect. Basic reflections use the recursive ray tracing algorithm developed by Turner Whitted in 1980 [6]. A surface shader implementing this algorithm was applied to the two spheres and the parallelogram beneath them, as depicted in Figure 9; rendering time for this scene was 4.525 seconds. Noticeable aliasing occurred in this Whitted-style reflection. Rather than simply applying the usual antialiasing to give the reflections a nicer look, we tried a distributed ray tracing technique called glossy reflections: a 3×3 square region was sampled and the reflected rays were perturbed across this region, producing the output shown in Figure 10. Both non-uniform and uniform pseudorandom distributions were used to sample the 3×3 region, but no difference was noticeable to the naked eye. Figure 11 shows glossy reflections combined with one sphere using a turbulence surface shader [4] and another using a checkerboard surface shader.
Fig. 9 Scene generated by the Whitted-style ray tracer on the GPU (rendering time: 4.525 s)
Fig. 10 Scene generated using glossy reflections implemented atop the Whitted-style ray tracer (rendering time: 9.134 s)
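Although our glossy reflections were written as a RenderMan surface shader, the underlying computation can be sketched in C++ as follows: the mirror direction is computed as in Whitted's algorithm and then perturbed once per cell of a 3×3 grid, after which the traced colors would be averaged. The names and the perturbation model are illustrative assumptions, not the shader's actual code.

// glossy_reflection_sketch.cpp -- C++ sketch of the idea behind the shader.
#include <cstdlib>
#include <cstdio>

struct Vec3 {
    float x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(float s)       const { return {x * s, y * s, z * s}; }
};

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float rand01() { return std::rand() / (float)RAND_MAX; }

// Mirror reflection of incident direction `d` about normal `n` (Whitted).
static Vec3 reflect(const Vec3& d, const Vec3& n)
{
    return d - n * (2.0f * dot(d, n));
}

// Glossy reflection: instead of one mirror ray, emit one perturbed ray per
// cell of an n x n grid, offsetting the mirror direction by a small random
// amount within each cell; the traced colors are then averaged.
static void glossyDirections(const Vec3& mirror, int n, float spread, Vec3* out)
{
    int k = 0;
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < n; ++i) {
            Vec3 d = mirror;
            d.x += ((i + rand01()) / n - 0.5f) * spread;
            d.y += ((j + rand01()) / n - 0.5f) * spread;
            out[k++] = d;       // each direction would be traced and averaged
        }
}

int main()
{
    Vec3 mirror = reflect({0, 0, -1}, {0, 0, 1});
    Vec3 dirs[9];
    glossyDirections(mirror, 3, 0.2f, dirs);     // 3x3 region as in Figure 10
    std::printf("first perturbed dir: %.3f %.3f %.3f\n",
                dirs[0].x, dirs[0].y, dirs[0].z);
    return 0;
}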
Refraction. We built two different refraction shaders, just as we did for reflection. Basic refraction and glossy reflection were first implemented jointly in one surface shader; we then modified the basic refraction to create translucency. Basic refraction was built on top of the Whitted-style surface shader used for basic reflection, with transmission rays calculated to produce the effect. The results are shown in the left images of Figures 12 and 13; the rendering times were 4.450 s for Figure 12 and 8.971 s for Figure 13. Another surface shader was built to support glossy reflection and translucency simultaneously; its effects are visible in the right images of Figures 12 and 13. The glossy-reflective translucent shader was applied to the sphere in Figure 12 and the cube in Figure 13, with rendering times of 17.297 s and 61.380 s, respectively. A 3×3 square region was randomly sampled to produce the translucency effects in both images. The dramatically increased rendering time for the translucency in Figure 13 is likely due to all the refractions occurring inside the cube.
Fig. 11 Each sphere uses a glossy-reflection surface shader; the left sphere also uses turbulence texturing, while the right sphere uses a checkerboard texture
Fig. 12 Left: a glossy reflective sphere refracting the stripes from the patch behind it; Right: the same sphere with translucency effects
Fig. 13 The effects of translucency are more apparent in this scene of a sphere behind a refractive cube
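For reference, the transmission-ray direction that such a refraction shader computes follows Snell's law; a C++ sketch is given below. The refract helper is hypothetical and simply reports total internal reflection by returning false.

// refraction_sketch.cpp -- transmission-ray direction via Snell's law.
#include <cmath>
#include <cstdio>

struct Vec3 {
    float x, y, z;
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(float s)       const { return {x * s, y * s, z * s}; }
};

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Refract unit incident direction `d` at a surface with unit normal `n`,
// where `eta` = n1 / n2 is the ratio of refractive indices.  Returns false
// on total internal reflection (no transmission ray exists).
static bool refract(const Vec3& d, const Vec3& n, float eta, Vec3& t)
{
    float cosI  = -dot(d, n);
    float sinT2 = eta * eta * (1.0f - cosI * cosI);
    if (sinT2 > 1.0f)
        return false;                          // total internal reflection
    float cosT = std::sqrt(1.0f - sinT2);
    t = d * eta + n * (eta * cosI - cosT);
    return true;
}

int main()
{
    // Ray entering glass (index 1.5) from air at 45 degrees.
    float c = std::sqrt(0.5f);
    Vec3 d = {c, -c, 0};          // incident direction (unit length)
    Vec3 n = {0, 1, 0};           // surface normal
    Vec3 t;
    if (refract(d, n, 1.0f / 1.5f, t))
        std::printf("transmitted dir: %.3f %.3f %.3f\n", t.x, t.y, t.z);
    return 0;
}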
Depth of field. RenderMan provides built-in support for depth of field. By specifying the f-stop, focal length, and focal distance, a depth of field effect is created automatically for a scene. In Figure 14, the sphere in the back lies on the focal plane and is therefore in focus, while the sphere in front is noticeably blurred because it is out of focus.
Fig. 14 Depth of field on the GPU (rendering time: 10.308 s)
Motion blur. RenderMan also provides built-in support for motion blur. By specifying a shutter, a time period, and an object's translation over that period, motion paths are generated. Figure 15 depicts three spheres with motion blur; this scene took 21.338 s to render. The background patch and the three spheres are all textured with turbulence, and the glossy reflections of the spheres produce a sparkly-looking trail behind them. Figure 16 brings together all the effects that we implemented in the GPU ray tracer.
Fig. 15 Motion blur on the GPU
In summary, four techniques (bounce management, caching, iteration, and multi-core/network distribution) have been applied to improve the performance of the KD-tree-based ray tracer, and these optimizations have been shown to significantly reduce the rendering time when the trace depth is greater than 4. To extend the ray tracer with distributed ray tracing, several effects (depth of field, antialiasing, and soft shadows) have been implemented. Finally, a ray tracer was implemented on the GPU and used to create advanced effects such as reflection, refraction, depth of field, and motion blur.
Fig. 16 All the GPU-based effects combined in one scene
[1] Wald I, Kollig T, Benthin C, et al. Interactive global illumination using fast ray tracing [A]. Debevec P, Gibson S. Rendering Techniques 2002: 13th Eurographics Workshop on Rendering [C]. New York: ACM Press, 2002: 15-24.
[2] Tabellion E, Lamorlette A. An approximate global illumination system for computer generated films [J]. ACM Transactions on Graphics, 2004, 23(3): 469-476.
[3] Buss S R. 3-D Computer Graphics: A Mathematical Introduction with OpenGL [M]. London: Cambridge University Press, 2003.
[4] Cortes D, Raghavachary S. The RenderMan Shading Language Guide [M]. Boston: Course Technology PTR, 2007.
[5] Christen M. Ray tracing on GPU [D]. Switzerland: University of Applied Sciences Basel, 2005.
[6] Whitted T. An improved illumination model for shaded display [J]. Communications of the ACM, 1980, 23(6): 343-349.