Order-independent transparency

Order-independent transparency (OIT) is a class of techniques in rasterisation-based computer graphics for rendering transparency in a 3D scene that do not require geometry to be rendered in sorted order for alpha compositing.

Description

Commonly, 3D geometry with transparency is rendered by blending (using alpha compositing) all surfaces into a single buffer, which can be thought of as a canvas. Each surface occludes the existing color and adds some of its own color depending on its alpha value, which describes its opacity (one minus its light transmittance). The order in which surfaces are blended affects the total occlusion or visibility of each surface. For a correct result, surfaces must be blended from farthest to nearest or from nearest to farthest, depending on whether the alpha compositing operation is over or under. Ordering may be achieved by rendering the geometry in sorted order, for example by sorting triangles by depth, but this can take a significant amount of time, does not always produce a solution (in the case of intersecting or circularly overlapping geometry), and is complex to implement. Instead, order-independent transparency sorts geometry per pixel, after rasterisation. For exact results, this requires storing all fragments before sorting and compositing.
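The order dependence of the over operator can be seen with two half-transparent surfaces. The following is a minimal sketch; the function name `over` and the colour values are chosen only for illustration.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Blend a non-premultiplied source colour over an opaque destination."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_rgb, dst_rgb))

background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)    # (colour, alpha)
blue = ((0.0, 0.0, 1.0), 0.5)

# Red is blended first (farther away), then blue on top of it.
red_then_blue = over(*blue, over(*red, background))
# The same two surfaces blended in the opposite order.
blue_then_red = over(*red, over(*blue, background))

print(red_then_blue)   # (0.25, 0.0, 0.5)
print(blue_then_red)   # (0.5, 0.0, 0.25)
```

The two results differ, which is why unsorted geometry cannot simply be blended in submission order.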

History

The A-buffer, a computer graphics technique introduced in 1984, stores per-pixel lists of fragment data (including micro-polygon information) in REYES, a software rasteriser. It was originally designed for anti-aliasing but also supports transparency.

More recently, in 2001, depth peeling[1] provided a hardware-accelerated OIT technique. Because of limitations in graphics hardware, the scene's geometry had to be rendered many times. A number of techniques followed that improve on the performance of depth peeling, for example dual depth peeling (2008),[2] but they retain the many-pass rendering limitation.

In 2009, two significant features were introduced in GPU hardware, drivers and graphics APIs that allowed fragment data to be captured and stored in a single rendering pass of the scene, which was not previously possible. These are the ability to write to arbitrary GPU memory from shaders, and atomic operations. With these features, a new class of OIT techniques became possible that does not require many rendering passes of the scene's geometry.

  • The first was storing the fragment data in a 3D array,[3] where fragments are stored along the z dimension for each pixel x/y. In practice, the 3D array is either mostly unused or overflows, as a scene's depth complexity is typically uneven. To avoid overflow, the 3D array requires large amounts of memory, which in many cases is impractical.
  • Two approaches to reducing this memory overhead exist.
    1. Packing the 3D array with a prefix sum scan, or linearizing,[4] removes the unused-memory issue but requires an additional rendering pass of the geometry to compute the depth complexity. The "Sparsity-aware" S-Buffer, Dynamic Fragment Buffer,[5] "deque" D-Buffer[citation needed] and Linearized Layered Fragment Buffer[6] all pack fragment data with a prefix sum scan and are demonstrated with OIT.
    2. Storing fragments in per-pixel linked lists[7] provides tight packing of this data. In late 2011, driver improvements reduced the overhead of atomic operation contention, making the technique very competitive.[6]
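The two packing approaches can be sketched on the CPU as follows. This is an illustrative simulation only: the function names and the `(pixel, depth, payload)` fragment layout are assumptions. The comments note where a GPU implementation would rely on atomic operations.

```python
from itertools import accumulate

def pack_with_prefix_sum(fragments, num_pixels):
    """Pack fragments contiguously per pixel using an exclusive prefix sum."""
    # Pass 1: count fragments per pixel (the depth complexity).
    counts = [0] * num_pixels
    for pixel, _, _ in fragments:
        counts[pixel] += 1
    # The exclusive prefix sum gives each pixel's start offset.
    offsets = list(accumulate([0] + counts[:-1]))
    # Pass 2: write each fragment into its pixel's next free slot.
    cursor = list(offsets)
    data = [None] * len(fragments)
    for pixel, depth, payload in fragments:
        data[cursor[pixel]] = (depth, payload)
        cursor[pixel] += 1
    return offsets, counts, data

def build_linked_lists(fragments, num_pixels):
    """Store fragments in one shared node pool with per-pixel head pointers."""
    heads = [-1] * num_pixels   # -1 marks an empty list
    nodes = []                  # each node is (depth, payload, next_index)
    for pixel, depth, payload in fragments:
        index = len(nodes)      # on a GPU: atomic increment of a node counter
        # On a GPU the old head would come from an atomic exchange on heads[pixel].
        nodes.append((depth, payload, heads[pixel]))
        heads[pixel] = index
    return heads, nodes

def walk(heads, nodes, pixel):
    """Collect one pixel's fragments by following its linked list."""
    out, i = [], heads[pixel]
    while i != -1:
        depth, payload, i = nodes[i]
        out.append((depth, payload))
    return out

# Hypothetical unsorted fragments for a 3-pixel image.
frags = [(0, 0.7, "a"), (2, 0.2, "b"), (0, 0.3, "c"), (2, 0.9, "d"), (2, 0.5, "e")]
offsets, counts, data = pack_with_prefix_sum(frags, 3)
heads, nodes = build_linked_lists(frags, 3)
```

The prefix-sum variant needs the counting pass before any fragment can be written, whereas the linked-list variant writes fragments in a single pass at the cost of one extra pointer per fragment.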

Exact OIT

Exact, as opposed to approximate, OIT accurately computes the final color, for which all fragments must be sorted. For high depth complexity scenes, sorting becomes the bottleneck.
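A minimal sketch of the per-pixel resolve step is given below, assuming each fragment is stored as `(depth, rgb, alpha)` with larger depth meaning farther from the viewer. The name `resolve_pixel` is illustrative.

```python
def resolve_pixel(fragments, background):
    """Sort one pixel's fragments by depth and composite them back to front."""
    color = tuple(background)
    # Farthest fragment first, so each nearer fragment is blended over the result.
    for _, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, color))
    return color

# The same two fragments, captured in two different submission orders.
far_red = (0.8, (1.0, 0.0, 0.0), 0.5)
near_blue = (0.2, (0.0, 0.0, 1.0), 0.5)
black = (0.0, 0.0, 0.0)

result_a = resolve_pixel([far_red, near_blue], black)
result_b = resolve_pixel([near_blue, far_red], black)
```

Because the fragments are sorted before blending, both submission orders give the same colour. The sort cost grows with the number of fragments per pixel, which is where the bottleneck for deep scenes comes from.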

One issue with the sorting stage is occupancy limited by local memory use; occupancy here is a SIMT attribute relating to the throughput and operation-latency hiding of GPUs. Backwards memory allocation[8] (BMA) groups pixels by their depth complexity and sorts them in batches, which improves the occupancy, and hence the performance, of low depth complexity pixels in the context of a potentially high depth complexity scene. An overall OIT performance increase of up to 3× is reported.
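The grouping idea can be sketched as below. The power-of-two bin sizes and the `min_size` parameter are assumptions made for illustration and are not taken from the cited paper. Each resulting group would then be sorted in its own batch with a local array sized for that bin.

```python
def group_by_depth_complexity(counts, min_size=8):
    """Map a bin size to the pixels whose fragment count fits within that bin."""
    groups = {}
    for pixel, count in enumerate(counts):
        if count == 0:
            continue                # nothing to sort for empty pixels
        size = min_size
        while size < count:         # smallest power-of-two multiple that fits
            size *= 2
        groups.setdefault(size, []).append(pixel)
    return groups

# Hypothetical per-pixel fragment counts for a 7-pixel image.
counts = [0, 3, 8, 9, 40, 16, 1]
groups = group_by_depth_complexity(counts)
print(groups)   # {8: [1, 2, 6], 16: [3, 5], 64: [4]}
```

With this grouping, the pixels holding at most 8 fragments are not forced to reserve the 64-entry array that the deepest pixel needs.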

Sorting is typically performed in a local array. However, performance can be improved further by making use of the GPU's memory hierarchy and sorting in registers,[9] in a manner similar to an external merge sort, especially in conjunction with BMA.
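The external-merge-sort analogy can be illustrated with a blocked sort: small fixed-size blocks are sorted independently (standing in for data small enough to sit in registers), and the sorted runs are then merged. This is a CPU sketch under that assumption; `block_size` and the function names are hypothetical, and it does not model actual GPU register allocation.

```python
import heapq

def insertion_sort(block):
    """Sort a small block; simple sorts suit tiny, fixed-size inputs."""
    block = list(block)
    for i in range(1, len(block)):
        item, j = block[i], i - 1
        while j >= 0 and block[j] > item:
            block[j + 1] = block[j]
            j -= 1
        block[j + 1] = item
    return block

def blocked_sort(depths, block_size=4):
    """Sort fixed-size blocks independently, then merge the sorted runs."""
    runs = [insertion_sort(depths[i:i + block_size])
            for i in range(0, len(depths), block_size)]
    return list(heapq.merge(*runs))

depths = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6]
ordered = blocked_sort(depths)
```

Only one small block is held in fast storage at a time, and the merge step reads the sorted runs sequentially from the slower memory.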

Approximate OIT

Approximate OIT techniques relax the constraint of exact rendering to provide faster results. Higher performance can be gained from not having to store all fragments or only partially sorting the geometry. A number of techniques also compress, or reduce, the fragment data. These include:

  • Stochastic Transparency draws at a higher resolution with full opacity but discards some fragments. Downsampling then yields transparency.[10]
  • Adaptive Transparency[11] is a two-pass technique in which the first pass constructs a visibility function that is compressed on the fly (this compression avoids having to fully sort the fragments) and the second pass uses this data to composite unordered fragments. Intel's pixel synchronization[12] avoids the need to store all fragments, removing the unbounded memory requirement of many other OIT techniques.
  • Weighted Blended Order-Independent Transparency replaces the over operator with a commutative approximation. Feeding depth information into the weight produces visually acceptable occlusion.[13]
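The commutative approximation used by weighted blended OIT can be sketched as follows. The accumulation and final compositing follow the general form of the technique (a weighted colour sum, a weighted alpha sum, and a product of transmittances). The `weight` function here is a hypothetical placeholder that assumes depths in [0, 1], not one of the weights proposed in the cited paper.

```python
def weight(depth, alpha):
    """Hypothetical weight: nearer fragments (smaller depth) count for more."""
    return alpha * max(0.01, 1.0 - depth)

def weighted_blended(fragments, background):
    """Composite (depth, rgb, alpha) fragments without sorting them."""
    accum_rgb = [0.0, 0.0, 0.0]   # sum of premultiplied colour times weight
    accum_w = 0.0                 # sum of alpha times weight
    revealage = 1.0               # product of (1 - alpha): background visibility
    for depth, rgb, alpha in fragments:
        w = weight(depth, alpha)
        for c in range(3):
            accum_rgb[c] += rgb[c] * alpha * w
        accum_w += alpha * w
        revealage *= 1.0 - alpha
    if accum_w == 0.0:
        return tuple(background)
    return tuple((accum_rgb[c] / accum_w) * (1.0 - revealage)
                 + background[c] * revealage for c in range(3))

frags = [(0.8, (1.0, 0.0, 0.0), 0.5), (0.2, (0.0, 0.0, 1.0), 0.5)]
black = (0.0, 0.0, 0.0)
forward = weighted_blended(frags, black)
backward = weighted_blended(list(reversed(frags)), black)
```

Sums and products do not depend on the order of their terms (up to floating-point rounding), so `forward` and `backward` agree without any sorting. The result is only an approximation of the sorted over result.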

OIT in hardware

  • The Sega Dreamcast games console included hardware support for automatic OIT.[14]

References

  1. ^ Everitt, Cass (2001-05-15). "Interactive Order-Independent Transparency". Nvidia. Archived from the original on 2011-09-27. Retrieved 2008-10-12.
  2. ^ Bavoil, Louis (February 2008). "Order Independent Transparency with Dual Depth Peeling" (PDF). Nvidia. Retrieved 2013-03-12.
  3. ^ Fang Liu, Meng-Cheng Huang, Xue-Hui Liu, and En-Hua Wu. "Single pass depth peeling via CUDA rasterizer", In SIGGRAPH 2009: Talks (SIGGRAPH '09), 2009
  4. ^ Craig Peeper. "Prefix sum pass to linearize A-buffer storage", Patent application, Dec, 2008
  5. ^ Marilena Maule, João L.D. Comba, Rafael Torchelsen and Rui Bastos. "Memory-optimized order-independent transparency with Dynamic Fragment Buffer", In Computers & Graphics, 2014.
  6. ^ a b Pyarelal Knowles, Geoff Leach and Fabio Zambetta. "Chapter 20: Efficient Layered Fragment Buffer Techniques", OpenGL Insights, pages 279-292, Editors Cozzi and Riccio, CRC Press, 2012
  7. ^ Jason C. Yang, Justin Hensley, Holger Grün, and Nicolas Thibieroz. "Real-time concurrent linked list construction on the GPU", In Proceedings of the 21st Eurographics conference on Rendering (EGSR'10), 2010
  8. ^ Knowles; et al. (Oct 2013). "Backwards Memory Allocation and Improved OIT" (PDF). Eurographics Digital Library. Archived from the original (PDF) on 2014-03-02. Retrieved 2014-01-21.
  9. ^ Knowles; et al. (June 2014). "Fast Sorting for Exact OIT of Complex Scenes" (PDF). Springer Berlin Heidelberg. Archived from the original (PDF) on 2014-08-09. Retrieved 2014-08-05.
  10. ^ Enderton, Eric (n.d.). "Stochastic Transparency" (PDF). IEEE Transactions on Visualization and Computer Graphics. 17 (8). Nvidia: 1036–1047. doi:10.1109/TVCG.2010.123. PMID 20921587. Retrieved 2013-03-12.
  11. ^ Salvi; et al. (2013-07-18). "Adaptive Transparency". Retrieved 2014-01-21.
  12. ^ Davies, Leigh (2013-07-18). "Order-Independent Transparency Approximation with Pixel Synchronization". Intel. Retrieved 2014-01-21.
  13. ^ McGuire, Morgan; Bavoil, Louis (2013). "Weighted Blended Order-Independent Transparency". Journal of Computer Graphics Techniques. 2 (2): 122–141.
  14. ^ "Optimizing Dreamcast Microsoft Direct3D Performance". Microsoft. 1999-03-01.