Visibility Optimization for Games. Sampo Lappalainen, Lead Programmer, Umbra Software Ltd.
Introduction: Background in graphics programming: Hybrid Graphics, NVIDIA, Umbra Software. With Umbra since 2008. Graphics middleware for console and PC games. Emphasis on visibility.
Roadmap: Motivation, Theory, Practice, Other applications, Demo.
MOTIVATION: Why is visibility optimization important?
Game World
Our Villain
Our Hero
Screen Shot
Game Worlds: Game developers want to make impressive game worlds. Hardware sets limits on what can and can’t be done. Game developers need to push the hardware to its limits.
Visibility Optimization: The most effective way to gain performance in games. Two basic ways to do visibility optimization: (1) art and level design, (2) technology. Games use a mix of both.
Visibility Optimization by Level Design: Artists design game worlds so that performance is acceptable. This can be done in numerous ways, e.g. limiting view distance, limiting polygon or object count, or modeling portals and cells.
Visibility Optimization by Level Design: Time-consuming and usually boring work. Sets huge limits on what can and cannot be done. May lead to monotonous level design. Manual and non-recurring work.
Visibility Optimization by Technology: Gains: no time is wasted on rendering objects that don’t contribute to the output image (no state changes, no draw calls, etc.). AI, physics, game logic, etc. can be done at lower accuracy (or skipped altogether) for hidden objects.
THEORY: Walkthrough of the key concepts
Terminology: Culling – removing hidden objects from rendering; Target – an object that can be hidden by others; Occluder – an object that blocks visibility; Rendering artifact – an unintended glitch in the output image.
Metrics for comparison: GPU cost, CPU cost, overall frame time, memory usage, precomputation time, manual work, culling power.
Backface culling: Taken care of by the hardware. Culls entire triangles based on their winding. No need to render the insides of an object.
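The hardware does this test for you, but the idea fits in a few lines. A minimal sketch, assuming screen-space vertices with the y axis pointing up and counter-clockwise winding for front faces (both are conventions chosen here, not something the slides specify; `Vec2`, `signedArea2` and `isBackFacing` are illustrative names):

```cpp
#include <cassert>

// Hypothetical 2D screen-space vertex (after projection).
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c).
// Positive when the vertices run counter-clockwise with the y axis pointing up.
float signedArea2(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}

// Assumption: counter-clockwise triangles are front-facing.
// Degenerate (zero-area) triangles are culled as well.
bool isBackFacing(Vec2 a, Vec2 b, Vec2 c) {
    return signedArea2(a, b, c) <= 0.0f;
}
```

Swapping any two vertices flips the sign of the area, which is why consistent winding in the mesh data matters.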
Depth buffering: Taken care of by the hardware. A two-dimensional buffer storing a z-value for each screen pixel. Before running shaders for a pixel to be rendered, test the z-value. Allows drawing of unsorted geometry; however, front-to-back sorting still greatly improves performance.
Hierarchical depth buffering: Replace the depth buffer with a depth pyramid. Bottom of the pyramid: the full-resolution depth buffer. Higher levels: lower-resolution depth buffers in which a single pixel stores the maximum z-value of a group of pixels in the level below. Hierarchically rasterize the polygon starting from the highest level. If the polygon is farther than the recorded pixel, exit early. If the polygon is closer, hierarchically test the lower levels. If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid.
Spatial hierarchies: Enable culling large portions of the game world with a single quick test. Dynamic objects can be moved in the hierarchy at runtime. Examples: BSP tree, kd-tree.
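A rough sketch of the pyramid test described above. It assumes a square, power-of-two buffer, smaller z meaning closer, and a polygon reduced to its bounding rectangle plus its nearest depth. All type and function names are made up for illustration, and the final "propagate the value up" update step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One pyramid level: a square grid of depth values (smaller z = closer).
struct Level {
    int size;
    std::vector<float> z;
    float at(int x, int y) const { return z[static_cast<std::size_t>(y) * size + x]; }
};

// Full-resolution pixel rectangle (inclusive) covered by the polygon.
struct Rect { int x0, y0, x1, y1; };

// levels[0] is the full-resolution buffer; each higher level stores the
// maximum (farthest) depth of the 2x2 block below it.
std::vector<Level> buildPyramid(const Level& full) {
    assert(full.size > 0 && (full.size & (full.size - 1)) == 0);
    std::vector<Level> levels{full};
    while (levels.back().size > 1) {
        const Level src = levels.back();  // copy, because push_back may reallocate
        Level next{src.size / 2, {}};
        for (int y = 0; y < next.size; ++y)
            for (int x = 0; x < next.size; ++x)
                next.z.push_back(std::max({src.at(2 * x, 2 * y), src.at(2 * x + 1, 2 * y),
                                           src.at(2 * x, 2 * y + 1), src.at(2 * x + 1, 2 * y + 1)}));
        levels.push_back(next);
    }
    return levels;
}

// True if the polygon is hidden inside pyramid pixel (x, y) of the given level.
bool isHidden(const std::vector<Level>& levels, int level, int x, int y,
              const Rect& r, float zNear) {
    const int scale = 1 << level;  // full-resolution pixels per side
    if (x * scale > r.x1 || (x + 1) * scale - 1 < r.x0 ||
        y * scale > r.y1 || (y + 1) * scale - 1 < r.y0)
        return true;                                   // polygon does not touch this region
    if (zNear >= levels[level].at(x, y)) return true;  // farther than everything here: early exit
    if (level == 0) return false;                      // closer at full resolution: visible
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
            if (!isHidden(levels, level - 1, 2 * x + dx, 2 * y + dy, r, zNear)) return false;
    return true;
}

// Start the test from the single pixel at the top of the pyramid.
bool isPolygonHidden(const std::vector<Level>& levels, const Rect& r, float zNear) {
    return isHidden(levels, static_cast<int>(levels.size()) - 1, 0, 0, r, zNear);
}
```

A polygon behind a large near occluder is rejected after a handful of coarse comparisons instead of one comparison per pixel.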
View frustum culling: Culling objects that are outside the camera view cone. Test using object bounds. Tremendous speed-up when using a hierarchy.
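A minimal sketch of the bounds test, using bounding spheres against six planes. It assumes unit-length plane normals that point into the frustum; the types and the name `isOutsideFrustum` are illustrative only.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Plane n.p + d = 0. Assumption: n is unit length and points into the frustum.
struct Plane { Vec3 n; float d; };

struct Sphere { Vec3 center; float radius; };

float signedDistance(const Plane& p, const Vec3& v) {
    return p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d;
}

// The sphere is culled as soon as it lies completely behind any one plane.
// The test is conservative: a sphere near a frustum corner may be kept
// even though it is actually outside.
bool isOutsideFrustum(const std::array<Plane, 6>& frustum, const Sphere& s) {
    for (const Plane& p : frustum)
        if (signedDistance(p, s.center) < -s.radius) return true;
    return false;
}
```

With a hierarchy, the same test is applied to a node's bounds first, so one rejected node removes every object below it.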
Potentially Visible Set (PVS): A data structure that defines from-region visibility for a scene. Computed in a pre-process. The scene is divided into cells. Compute a bit matrix that lists all the visible objects for each cell. At runtime, visibility is a simple matrix lookup. Open question: how to find a good subdivision for a scene? Cannot handle dynamic occluders. Target volume: an extension to handle dynamic targets.
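The bit matrix and its runtime lookup could look roughly like this. It is a sketch with one row per cell and one bit per object; `PvsMatrix` and its methods are hypothetical names, and the expensive part (deciding which bits to set in the pre-process) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit matrix: row = cell, column = object, bit set = potentially visible.
class PvsMatrix {
public:
    PvsMatrix(std::size_t cellCount, std::size_t objectCount)
        : wordsPerRow_((objectCount + 63) / 64),
          bits_(cellCount * wordsPerRow_, 0) {}

    // Called by the pre-process for every (cell, object) pair found visible.
    void setVisible(std::size_t cell, std::size_t object) {
        bits_[cell * wordsPerRow_ + object / 64] |= std::uint64_t{1} << (object % 64);
    }

    // Runtime query: one index computation and one bit test.
    bool isVisible(std::size_t cell, std::size_t object) const {
        return ((bits_[cell * wordsPerRow_ + object / 64] >> (object % 64)) & 1u) != 0;
    }

private:
    std::size_t wordsPerRow_;
    std::vector<std::uint64_t> bits_;
};
```

Memory grows with cells times objects, which is one reason the choice of subdivision matters.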
Portals: Place portals in the scene that connect the cells to form a portal graph. At runtime, find the portals of the current cell that are in the frustum. Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal. Same limitations with dynamic objects as with PVS systems.
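The traversal can be sketched with a heavy simplification: instead of clipping a 3D frustum against portal polygons, each portal is described by the angular interval it covers as seen from a fixed camera. That representation, the `onPath` cycle guard, and all names below are assumptions made for the example, not how any particular engine does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Simplification: a portal covers the angular interval [lo, hi] (degrees)
// as seen from the camera, and leads to targetCell.
struct Portal { int targetCell; float lo, hi; };
struct Cell { std::vector<Portal> portals; };

void collect(const std::vector<Cell>& cells, int cell, float lo, float hi,
             std::vector<bool>& onPath, std::set<int>& visible) {
    visible.insert(cell);
    onPath[static_cast<std::size_t>(cell)] = true;
    for (const Portal& p : cells[static_cast<std::size_t>(cell)].portals) {
        if (onPath[static_cast<std::size_t>(p.targetCell)]) continue;  // do not walk back along the current path
        const float newLo = std::max(lo, p.lo);  // narrow the view to what is
        const float newHi = std::min(hi, p.hi);  // still visible through this portal
        if (newLo < newHi) collect(cells, p.targetCell, newLo, newHi, onPath, visible);
    }
    onPath[static_cast<std::size_t>(cell)] = false;
}

// Cells reachable from startCell through a chain of portals that all overlap
// the view interval [lo, hi].
std::set<int> visibleCells(const std::vector<Cell>& cells, int startCell, float lo, float hi) {
    std::vector<bool> onPath(cells.size(), false);
    std::set<int> visible;
    collect(cells, startCell, lo, hi, onPath, visible);
    return visible;
}
```

The view narrows at every portal, so deep chains of rooms are rejected quickly once the intervals stop overlapping.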
Rasterization-based: Render occluder geometry into a software coverage buffer. Test visibility using test geometry. Use temporal coherence to determine the initial set to be rendered. Handles both dynamic targets and occluders as long as they have occluder geometry.
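A toy version of the render-then-test loop. It assumes occluders and test geometry are already reduced to screen-space rectangles at a single depth (a real rasterizer would scan-convert triangles), that smaller z is closer, and that the buffer stores a depth per pixel. `CoverageBuffer` and `ScreenRect` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Screen-space rectangle (inclusive pixel bounds) at a single depth.
// Assumption: smaller z means closer to the camera.
struct ScreenRect { int x0, y0, x1, y1; float z; };

class CoverageBuffer {
public:
    CoverageBuffer(int width, int height)
        : width_(width), height_(height),
          depth_(static_cast<std::size_t>(width) * height,
                 std::numeric_limits<float>::infinity()) {}

    // Occluders write the nearest depth seen so far into every covered pixel.
    void renderOccluder(const ScreenRect& r) {
        for (int y = std::max(r.y0, 0); y <= std::min(r.y1, height_ - 1); ++y)
            for (int x = std::max(r.x0, 0); x <= std::min(r.x1, width_ - 1); ++x)
                depth_[index(x, y)] = std::min(depth_[index(x, y)], r.z);
    }

    // Test geometry is visible if any of its on-screen pixels is closer than
    // the occluder depth stored there.
    bool isVisible(const ScreenRect& r) const {
        for (int y = std::max(r.y0, 0); y <= std::min(r.y1, height_ - 1); ++y)
            for (int x = std::max(r.x0, 0); x <= std::min(r.x1, width_ - 1); ++x)
                if (r.z < depth_[index(x, y)]) return true;
        return false;
    }

private:
    std::size_t index(int x, int y) const { return static_cast<std::size_t>(y) * width_ + x; }
    int width_, height_;
    std::vector<float> depth_;
};
```

Because everything runs on the CPU, the result is available immediately, with no GPU round trip.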
Testing from coverage buffer
Occlusion Queries: Supported by GPUs since 2001. The GPU answers the question: “how many pixels would have been visible if this object had been rendered?” Instead of rasterizing your own depth buffer, use the GPU depth buffer. Normally the query is done using bounding volumes (effective but not necessary). No need for artist-generated occluder geometry. GPU-CPU synchronization needed.
Occlusion Queries: Determine the set of visible objects against the actual rendered geometry: all pixels can be used as occluding material!
Using Occlusion Queries: Occlusion queries are a really powerful tool for visibility optimization. Like all other GPU features, occlusion queries can be used ineffectively. Special tricks are needed to get the most out of occlusion queries.
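Issuing Occlusion Queries:
    // Draw only the object's bounds, without touching the color or depth buffers.
    disableColorWrite();
    disableDepthWrite();
    startQueryCounter();
    renderObjectBounds();
    stopQueryCounter();
    enableColorWrite();
    enableDepthWrite();
    // Reading the result right away forces the CPU to wait for the GPU.
    if (query->getResult() > 0)
        renderObject();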
CPU-GPU synchronization: With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (parallel processing). With occlusion queries the CPU needs the query results back to know whether the object was visible. The CPU has to wait for the query results to become available. No parallel processing (which is really bad).
Issuing Occlusion Queries: Fortunately GPU design has a solution for this problem. GPUs can store multiple occlusion query results, so occlusion queries can be batched. Some GPUs have a limit on how many query results can be stored.
Batching Occlusion Queries:
    disableColorWrite();
    disableDepthWrite();
    // Issue all queries first, so the GPU stays busy ...
    for (each query) {
        startQueryCounter();
        renderObjectBounds();
        stopQueryCounter();
    }
    enableDepthWrite();
    enableColorWrite();
    // ... then read the results back and draw whatever passed.
    for (each query) {
        if (query->getResult() > 0)
            renderObject();
    }
Latent Occlusion Queries: Some stalls may still be introduced between frames: the last query result needs to be read back before continuing. Avoid GPU stalls by using the query results from the previous frame. Read back the query results at the beginning of each frame. Sounds like a perfect solution?
Latent Occlusion Queries: There are downsides to this. Visible popping artifacts when objects become visible. If the camera is moving slowly and the FPS is good, no problem. When multiple objects become visible, the FPS typically drops (there’s a lot more to render), for example when a door is opened.
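The one-frame delay, and the popping it causes, can be shown without a GPU. A sketch that assumes the results arrive as a per-object count of passed samples and that everything is treated as visible on the first frame; `LatentVisibility` and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Keeps last frame's query results and uses them to decide what to draw now.
class LatentVisibility {
public:
    // Assumption: with no history yet, treat every object as visible.
    explicit LatentVisibility(std::size_t objectCount) : previous_(objectCount, true) {}

    // Decision for the current frame, based on the previous frame's queries.
    bool shouldDraw(std::size_t object) const { return previous_[object]; }

    // Call at the beginning of a frame with the results of the queries
    // issued during the previous frame.
    void readBack(const std::vector<unsigned>& samplesPassed) {
        assert(samplesPassed.size() == previous_.size());
        for (std::size_t i = 0; i < previous_.size(); ++i)
            previous_[i] = samplesPassed[i] > 0;
    }

private:
    std::vector<bool> previous_;
};
```

An object that becomes visible during frame N only reports samples in that frame's query, so it is first drawn in frame N+1. That missing frame is the popping artifact.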
Latent Occlusion Queries: Queries done on hierarchy nodes produce even larger artifacts. Growing the bounds helps, but is difficult to get to work with hierarchical queries. The stall from using occlusion query results on the same frame may be as short as 0.1 ms (on Xbox 360). Is this a price developers are ready to pay for artifact-free occlusion culling?
Parallelism: Most gaming platforms today come with more than one CPU. Use the same algorithm for multiple cameras (split screen, AI bots, light sources). Tile-based rasterization. Parallel data structure traversal.
PRACTICE: What kind of systems have really been used?
Binary Space Partitioning: As made famous by Doom and the Quake series. A tree data structure for representing the scene. Gordon and Chen’s 1991 paper, used in Doom ( http://www.rothschild.haifa.ac.il/~gordon/ftb-bsp.pdf ). Teller’s 1992 PhD thesis, used in Quake ( http://people.csail.mit.edu/seth/pubs/pubs.html ).
Binary space partitioning: Before Doom, BSPs were used to do sorting for the painter’s algorithm (back-to-front). The painter’s algorithm is too slow for large scenes. Solution: change the order to front-to-back and keep track of which pixels have been drawn. Quake introduced a pre-process step for computing a PVS based on the BSP model.
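The front-to-back ordering falls out of a simple recursive walk: at each node, visit the side the camera is on first. A 2D sketch with splitting lines and one polygon per node; the node layout (indices into a vector, -1 for no child) is an assumption for the example, and the bookkeeping of already-drawn pixels is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 2D BSP node. The splitting line is a*x + b*y + c = 0; "front" is the side
// where the expression is positive. Children are indices, -1 means none.
struct BspNode {
    float a, b, c;
    int polygonId;  // polygon lying on the splitting line
    int front;
    int back;
};

// Appends polygon ids to `order`, nearest to the camera first.
void frontToBack(const std::vector<BspNode>& nodes, int index,
                 float camX, float camY, std::vector<int>& order) {
    if (index < 0) return;
    const BspNode& n = nodes[static_cast<std::size_t>(index)];
    const bool camInFront = n.a * camX + n.b * camY + n.c >= 0.0f;
    frontToBack(nodes, camInFront ? n.front : n.back, camX, camY, order);  // near side
    order.push_back(n.polygonId);                                          // the splitter itself
    frontToBack(nodes, camInFront ? n.back : n.front, camX, camY, order);  // far side
}
```

Swapping the two recursive calls gives the back-to-front order used by the painter's algorithm.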
Umbra 1: Used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2. A data structure that supports dynamic and static visibility. Software rasterizer and occlusion queries supported.
Umbra 1: Database: spatial bounding volume hierarchy, user updates. Visibility traversal: input is the camera parameters, output is the visible object set. Hierarchical visibility testing: a single query can hide large parts of the scene.
Hierarchical Culling: In typical game scenes most of the scene is hidden from any given point of view. Problem: the size of the whole scene affects performance (input-sensitive system). Only the visible objects are supposed to affect performance (output-sensitive system).
Hierarchical Culling: Solution: build a spatial hierarchy for the objects in the scene. Cull hidden parts of the scene in constant time. Occlude groups of objects: if a hierarchy node is hidden, all nodes below it are also hidden.
Hierarchy Traversal: Traverse the hierarchy to determine visibility. Use temporal coherence. On the first frame, start from the root. Store the nodes where traversal ended and start traversing from them on the next frame. These nodes form a visibility barrier.
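A sketch of why one test can hide a whole group. The `isNodeHidden` callback stands in for whatever test is used on a node's bounds (an occlusion query, a coverage-buffer test); it and the node layout are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct HierarchyNode {
    std::vector<int> children;  // indices of child nodes
    std::vector<int> objects;   // ids of objects stored directly in this node
};

// Collects the objects of all nodes that are not hidden.
// Returns the number of visibility tests that were performed.
int collectVisible(const std::vector<HierarchyNode>& nodes, int index,
                   const std::function<bool(int)>& isNodeHidden,
                   std::vector<int>& visibleObjects) {
    int tests = 1;
    if (isNodeHidden(index)) return tests;  // the whole subtree is culled by this one test
    const HierarchyNode& n = nodes[static_cast<std::size_t>(index)];
    visibleObjects.insert(visibleObjects.end(), n.objects.begin(), n.objects.end());
    for (int child : n.children)
        tests += collectVisible(nodes, child, isNodeHidden, visibleObjects);
    return tests;
}
```

The returned test count depends on how much is visible rather than on the total scene size, which is the output-sensitive behaviour the slides ask for.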
Dynamic Objects: Object geometry may change (e.g. due to LODing) and objects may move. If object geometry changes, it may not fit into its old bounds. Move the object upwards in the hierarchy so that the bounds fit inside a node. Push the object back down once there is idle time.
Dynamic Objects: If the object moves, temporal bounding volumes (TBVs) can be used. Use history info to predict the object’s movement. The TBV doesn’t have to be updated every frame.
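One simple way to build a temporal bounding volume is to grow the object's bounding sphere by the distance it could travel during the volume's lifetime. This sketch assumes a maximum per-frame speed has already been estimated from the object's history; that predictor, the sphere shape and all names are illustrative, not the method used by any specific product.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// A bounding volume that stays valid for several frames.
struct TemporalBound {
    Sphere bounds;
    int expiresAtFrame;
};

// Assumption: the object moves at most maxSpeedPerFrame units per frame.
TemporalBound makeTbv(const Sphere& object, float maxSpeedPerFrame,
                      int currentFrame, int lifetimeFrames) {
    const float growth = maxSpeedPerFrame * static_cast<float>(lifetimeFrames);
    return TemporalBound{Sphere{object.center, object.radius + growth},
                         currentFrame + lifetimeFrames};
}

// The TBV must be rebuilt when it has expired or when the prediction was
// wrong and the object has left the volume.
bool needsUpdate(const TemporalBound& tbv, const Sphere& object, int frame) {
    if (frame >= tbv.expiresAtFrame) return true;
    const float dx = object.center.x - tbv.bounds.center.x;
    const float dy = object.center.y - tbv.bounds.center.y;
    const float dz = object.center.z - tbv.bounds.center.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return dist + object.radius > tbv.bounds.radius;
}
```

A longer lifetime means fewer hierarchy updates but a larger, less tightly fitting volume, so the lifetime is a tuning knob.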
Umbra 2: Multi-core version of the previous tech. Used in e.g. Mass Effect 2, the Dragon Age series, Alan Wake.
Multi-core culling: Two subtasks: rendering and visibility traversal. Rendering issues rendering calls and occlusion queries. Visibility processing takes care of hierarchy processing and high-level culling (e.g. view frustum culling).
Multi-core culling: The game thread needs to do its updates (camera and object updates) before the visibility thread can continue. The visibility thread updates the hierarchy. After the update the hierarchy can be traversed.
Multi-core culling: While the visibility thread is idle it can update the hierarchy: lazy hierarchy building, collapsing nodes, visibility barrier updates, moving dynamic objects down, etc.
Umbra 3: Used by Unity 3D, Secret Studio. A collection of visibility algorithms: Umbra 1-2 feature sets; automatic portal generation in a pre-process; CPU rasterization and ray-tracing based portal culling algorithms; PVS culling for low-end systems.
Umbra 3: Uses real geometry; no need for artists to create occluder geometry. Support for streaming, distance queries, and intersection queries.
Automatic portal generation: Works with both outdoor and indoor scenes. Conservative occlusion. The output is a graph where the nodes are cells and the edges are the portals. Optionally a PVS can be computed. Incremental updates.
Umbra 3 recursive portal culling: Recursive traversal of the portal graph from the camera viewpoint, using ray tracing. Very accurate culling results. Too slow for whole-scene culling; currently used for reference and for dynamic object culling.
 
Umbra 3 optimized portal culling: Rasterize the portals into a coverage buffer. Fast enough even for outdoor scenes. In some cases overestimates the visible set.
 
Umbra 3 PVS culling: Extremely fast. Needed for low-end systems such as smartphones. Can be used to determine visibility for e.g. hundreds of AI bots. The longer the time spent computing, the more accurate the result.
Killzone 3: See “Practical occlusion culling for PS3”: http://gdcvault.com/play/1014356/Practical-Occlusion-Culling-on Solution implemented specifically for PlayStation 3. Rasterizes a 720p tiled depth buffer on the SPUs. Performs occlusion tests against a downsampled depth buffer using object bounds. Occluder mesh selection done by artists.
Battlefield 3: See “Culling the Battlefield”: http://publications.dice.se/attachments/CullingTheBattlefield.pdf A cross-platform (Xbox 360, PS3, PC) solution. SIMD-optimized frustum culling. Software rasterizer for occlusion culling, rendering to a 256x116 depth buffer. Occluder geometry hand-made by artists.
OTHER APPLICATIONS: What else can I use it for?
Lighting & shadows: When applied from a light source’s point of view, a visibility algorithm can be used for finding shadow casters. See “Shadow Caster Occlusion Culling for Efficient Shadow mapping” ( http://www.cg.tuwien.ac.at/research/publications/2011/bittner-2011-scc/bittner-2011-scc-paper.pdf ).
Streaming: Large game worlds have so much content that it cannot all fit in the memory of a gaming platform. Loading between zones takes away immersion. A from-region visibility algorithm can be used to do visibility-based streaming over the network or from storage media.
AI: A visibility algorithm can be used to drive AI logic. Data structures used in visibility determination can be modified to be used for distance or intersection testing.
Sound occlusion: Distance and intersection tests can be used to simulate the behaviour of sound. Precomputing visibility and precomputing audio have a lot of overlap and make for an interesting field of study.
FIN Sampo Lappalainen [email_address] http://www.umbra3.com


Visibility Optimization for Games

  • 1. Visibility Optimization for Games Sampo Lappalainen Lead Programmer Umbra Software Ltd.
  • 2. Introduction Background in graphics programming Hybrid Graphics, NVIDIA, Umbra Software With Umbra since 2008 Graphics middleware for console and PC games Emphasis on visibility
  • 3. Roadmap Motivation Theory Practice Other applications Demo
  • 4. MOTIVATION Why is visibility optimization important?
  • 9. Game Worlds Game developers want to make impressive game worlds Hardware sets limits on what can and can’t be done. Game developers need to push the hardware to it’s limits.
  • 10. Visibility Optimization The most effective way to gain performance in games. Two basic ways to do visibility optimization: art and level design technology Games use a mix of both.
  • 11. Visibility Optimization by Level Design Artists design game worlds so that performance is acceptable. Can be done in numerous ways e.g.: limiting view distance limiting polygon or object count modeling portals and cells
  • 13. Visibility Optimization by Level Design Time consuming and usually boring work. Sets huge limits on what can and cannot be done. May lead to monotonic level design. Manual and non-recurring work.
  • 16. Visibility Optimization by Technology Gains: No time wasted on rendering objects that don’t contribute to the output image (no state changes, no draw calls etc). AI, physics, game logic etc. can be done at lower accuracy (or skipped all together) for hidden objects.
  • 17. THEORY Walkthrough of the key concepts
  • 18. Terminology Culling – removing hidden objects from rendering Target – object that can be hidden by others Occluder – an object that blocks visibility Rendering artifact – A non-intended glitch in the output image
  • 19. Metrics for comparison GPU cost CPU cost Overall frame time Memory usage Precomputation time Manual work Culling power
  • 20. Backface culling Taken care of by the HW Culling entire triangles based on their winding No need to render the insides of an object
  • 21. Depth buffering Taken care of by the HW A two dimensional buffer for storing z-values for each screen pixel Before processing shaders for a pixel to be rendered, test the z-value. Allows drawing of unsorted geometry, however sorting still greatly improves performance
  • 22. Hierarchical depth buffering Replace depth buffer with a depth pyramid Bottom of the pyramid: full-resolution depth buffer Higher levels: smaller resolution depth buffers where a single pixel represents the maximum z-value in a group of pixels in the below level Hierarchically rasterize the polygon starting from the highest level If polygon is further than the recorded pixel, early exit If polygon is closer, hierarchically test the lower levels If the bottom of the pyramid is reached and the polygon is still closer, propagate the value up the pyramid
  • 23. Spatial hierarchies Enabled culling large portions of the game world with a single quick test Dynamic objects can be moved in the hierarchy runtime BSP-tree, kd-tree
  • 25. View frustum culling Culling objects that are outside the camera view cone Test using object bounds Tremendous speed-up using an hierarchy
  • 28. Potentially Visible Set - PVS A data structure that defines from-region-visibility for a scene Computed in pre-process Scene is divided into Cells Compute a bit matrix that lists all the visible objects for each cell Runtime a simple matrix lookup How to find a good sub-division for a scene? Cannot handle dynamic occluders Target volume: extension to handle dynamic targets
  • 29. Portals Place portals in the scene that connect the cells to form a portal graph In runtime, find the portals of the current cell that are in the frustum Traverse through all found portals to the adjacent cells and find all portals that are visible to the camera through the original portal Same limitations with dynamic objects as with PVS systems
  • 30. Rasterization-based Render occluder geometry into a software coverage buffer Test visibility using test geometry Use temporal coherence to determine the initial set to be rendered Handles both dynamic targets and occluders as long as they have occluder geometry
  • 39. Occlusion Queries Supported by GPUs since 2001. GPU answers the question: “how many pixels would have been visible if this object would have been rendered”? Instead of rasterizing your own depth buffer, use the GPU depth buffer instead Normally the query is done using bounding volumes (effective but not necessary). No need for artist generated occluder geometry GPU-CPU synchronization needed
  • 40. Occlusion Queries Determine the set of visible objects against the actual rendered geometry: all pixels can be used as occluding material!
  • 41. Using Occlusion Queries Occlusion queries are a really powerful tool for visibility optimization. Like all other features of the GPU occlusion queries can be used ineffectively. Special tricks are needed to get the most out of occlusion queries.
  • 42. Issuing Occlusion Queries disableColorWrite(); disableDepthWrite(); startQueryCounter(); renderObjectBounds(); stopQueryCounter(); enableColorWrite(); enableDepthWrite(); if (query->getResult() > 0) renderObject();
  • 43. CPU-GPU synchronization With normal draw calls the CPU issues a command to the GPU and can continue processing as usual (Parallel processing). With occlusion queries the CPU needs to get query results back to be able to know if the object was visible or not. The CPU needs to wait for the query results to be available. No parallel processing (which is really bad).
  • 47. Issuing Occlusion Queries Fortunately GPU design has a solution for this problem. GPUs can store multiple occlusion query results. Occlusion queries can be batched. Some GPUs have a limit on how many query results can be stored.
  • 48. Batching Occlusion Queries disableColorWrite(); disableDepthWrite(); for (each query) { startQueryCounter(); renderObjectBounds(); stopQueryCounter(); } enableDepthWrite(); enableColorWrite(); for (each query) { if (query->getResult() > 0) renderObject(); }
  • 50. Latent Occlusion Queries Some stalls may be introduced between frames. The last query result needs to be read back before continuing. Avoid GPU stalls by using the query results from the previous frame. Read back the query results at the beginning of each frame. Sounds like a perfect solution?
  • 52. Latent Occlusion Queries There are downsides to this. Visible popping artifacts when objects come visible. If the camera is moving slowly and FPS is good, no problem. When multiple objects become visible FPS typically drops (there’s a lot more to render) For example when a door is opened.
  • 56. Latent Occlusion Queries Queries done to hierarchy nodes produce even larger artifacts Growing bounds helps, but is difficult to get to work with hierarchical queries The stall in using occlusion query results on the same frame may be as short as 0.1ms (on XBOX 360) In this a price developers are ready to pay for artifact free occlusion culling?
  • 57. Parallelism Most gaming platforms today come with more than one CPU Using the same algorithm for multiple cameras (splitscreen, AI bots, light sources) Tile-based rasterization Parallel data structure traverse
  • 58. PRACTICE What kind of systems have really been used?
  • 59. Binary Space Partitioning As made famous by Doom and the Quake series A tree data structure for representing the scene Gordon and Chen 1991 paper used in Doom ( http://www.rothschild.haifa.ac.il/~gordon/ftb-bsp.pdf ) Teller’s 1992 PhD thesis used in Quake ( http://people.csail.mit.edu/seth/pubs/pubs.html )
  • 60. Binary space partitioning Before Doom BSP’s were used to do sorting for the painter’s algorithm (back-to-front) Painter’s algorithm is too slow for large scenes Solution: change the order to front-to-back and keep track on which pixels have been drawn Quake introduced a pre-process step for computing a PVS based on the BSP model
  • 61. Umbra   1 Used in Star Wars Galaxies, EverQuest 2, Age of Conan, Kingdom Heroes 2, Tian Xia 2 A data structure that supports dynamic and static visibility Software rasterizer and occlusion queries supported
  • 62. Umbra   1 Database Spatial bounding volume hierarchy User updates Visibility traverse Input: camera parameters Output: visible object set Hierarchical visibility testing: a single query can hide large parts of the scene
  • 63. Hierarchical Culling In typical game scenes most of the scene is hidden at any given point of view Problem: the size of the whole scene effects performance ( input sensitive system ). Only the visible objects are supposed to effect performance ( output sensitive system ).
  • 65. Hierarchical Culling Solution: build a spatial hierarchy for the objects in the scene Culling hidden parts of the scene in constant time Occlude groups of objects: if a hierarchy node is hidden all nodes below it are also hidden
  • 66. Hierarchy Traversal Traverse the hierarchy to determine visibility Use temporal coherency On first frame, start from the root Store nodes where traversal ended and start traversing them on the next frame Nodes form a visibility barrier
  • 70. Dynamic Objects Object geometry may change (e.g. due to LODing). Objects may move If object geometry changes it may not fit into its old bounds Move the object upwards in the hierarchy so that the bounds can fit inside a node Push the object back down once there is idle time
  • 71. Dynamic Objects If the object moves temporal bounding volumes can be used. Use history info to predict the object movement. The TBV doesn’t have to be updated every frame.
  • 74. Umbra 2 Multi-core version of the previous tech. Used in e.g. Mass Effect 2, the Dragon Age series, Alan Wake.
  • 75. Multi-core culling Two subtasks: rendering and visibility traversal. Rendering issues rendering calls and occlusion queries. Visibility processing takes care of hierarchy processing and high-level culling (e.g. view frustum culling).
  • 76. Multi-core culling The game thread needs to do its updates (camera and object updates) before our visibility thread can continue. The visibility thread updates the hierarchy. After the update, the hierarchy can be traversed.
  • 78. Multi-core culling While the visibility thread is idle it can update the hierarchy: lazy hierarchy building, collapsing nodes, visibility barrier updates, moving dynamic objects down, etc.
  • 79. Umbra 3 Used by Unity 3D, Secret Studio. A collection of visibility algorithms: Umbra 1-2 feature sets; automatic portal generation in pre-process; CPU rasterization and ray-tracing based portal culling algorithms; PVS culling for low-end systems.
  • 80. Umbra 3 Uses real geometry, so there is no need for artists to create occluder geometry. Support for streaming, distance queries and intersection queries.
  • 81. Automatic portal generation Works with both outdoor and indoor scenes. Conservative occlusion. The output is a graph where the nodes are cells and the edges are portals. Optionally a PVS can be computed. Incremental updates.
  • 82. Umbra 3 recursive portal culling Recursive traversal of the portal graph from the camera viewpoint, using ray tracing. Very accurate culling results. Too slow for whole-scene culling; currently used for reference and for dynamic object culling.
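Recursive traversal of a cell-and-portal graph can be sketched as follows. Note the swap: this is the classic variant that clips a screen-space window against each portal's rectangle, not the ray-tracing based test the slide describes. The graph layout, the rectangle format `(x0, y0, x1, y1)` and all names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Rect = Tuple[float, float, float, float]      # (x0, y0, x1, y1) in screen space
PortalGraph = Dict[str, List[Tuple[str, Rect]]]  # cell -> [(target cell, portal rect)]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of two rectangles, or None when they do not overlap."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def visible_cells(portals: PortalGraph, cell: str, window: Rect,
                  path: Tuple[str, ...] = ()) -> Set[str]:
    """Cells reachable from `cell` through portals that overlap the window."""
    visible = {cell}
    for target, rect in portals.get(cell, []):
        if target in path:
            continue  # do not walk back through cells already on this path
        clipped = intersect(window, rect)
        if clipped is not None:
            # The view narrows to the part of the window seen through the portal.
            visible |= visible_cells(portals, target, clipped, path + (cell,))
    return visible
```

Because the window shrinks at every portal, cells behind several small openings drop out quickly, which matches the accuracy the slide attributes to the recursive approach.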
  • 84. Umbra 3 optimized portal culling Rasterize the portals into a coverage buffer. Fast enough even for outdoor scenes. In some cases overestimates the visible set.
  • 86. Umbra 3 PVS culling Extremely fast. Needed for low-end systems such as smartphones. Can be used to determine visibility for e.g. hundreds of AI bots. The more time is spent computing the PVS, the more accurate the result.
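A PVS query is fast because it is a table lookup at run time. The sketch below assumes a uniform 2D grid of cells and a precomputed dictionary from each cell to the set of cells it can see. Both are illustrative assumptions: the slides say that Umbra 3 derives its cells from automatic portal generation, and `cell_of` and `can_see` are hypothetical names.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
Pvs = Dict[Cell, Set[Cell]]  # precomputed: cell -> cells potentially visible from it


def cell_of(position: Tuple[float, float], cell_size: float) -> Cell:
    """Grid cell containing a 2D position (uniform grid is an assumption)."""
    return (int(position[0] // cell_size), int(position[1] // cell_size))


def can_see(pvs: Pvs, cell_size: float,
            observer: Tuple[float, float], target: Tuple[float, float]) -> bool:
    """True when the target's cell is in the observer cell's PVS."""
    visible_from_observer = pvs.get(cell_of(observer, cell_size), set())
    return cell_of(target, cell_size) in visible_from_observer
```

Each query costs two cell lookups and one set membership test, so running it for hundreds of AI bots per frame stays cheap.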
  • 87. Killzone 3 See “Practical occlusion culling for PS3”: http://gdcvault.com/play/1014356/Practical-Occlusion-Culling-on Solution implemented specifically for PlayStation 3. Rasterizes a 720p tiled depth buffer on the SPUs. Performs occlusion tests against a downsampled depth buffer using object bounds. Occluder mesh selection is done by artists.
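Testing object bounds against a downsampled depth buffer can be sketched as below. This is not the Killzone 3 implementation: it assumes that smaller depth values are nearer, that the buffer dimensions are divisible by the downsampling factor, and that each downsampled texel keeps the farthest depth of its block so that the test stays conservative. All names are hypothetical.

```python
from typing import List, Tuple

DepthBuffer = List[List[float]]  # depth[y][x], smaller value = nearer (assumption)


def downsample_max(depth: DepthBuffer, factor: int) -> DepthBuffer:
    """Shrink the buffer, keeping the farthest occluder depth of each block.
    Assumes width and height are divisible by `factor`."""
    height, width = len(depth), len(depth[0])
    return [[max(depth[y][x]
                 for y in range(by, by + factor)
                 for x in range(bx, bx + factor))
             for bx in range(0, width, factor)]
            for by in range(0, height, factor)]


def is_occluded(small: DepthBuffer, rect: Tuple[int, int, int, int],
                nearest_depth: float) -> bool:
    """True when every covered texel holds an occluder nearer than the object.
    `rect` is (x0, y0, x1, y1), inclusive, in downsampled texel coordinates."""
    x0, y0, x1, y1 = rect
    return all(small[y][x] < nearest_depth
               for y in range(y0, y1 + 1)
               for x in range(x0, x1 + 1))
```

Keeping the farthest depth per block means the small buffer can only under-report occlusion, so an object may be drawn unnecessarily but is never hidden by mistake.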
  • 88. Battlefield 3 See “Culling the Battlefield”: http://publications.dice.se/attachments/CullingTheBattlefield.pdf A cross-platform (Xbox 360, PS3, PC) solution. SIMD-optimized frustum culling. Software rasterizer for occlusion culling, rendering to a 256x116 depth buffer. Occluder geometry is hand-made by artists.
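For reference, the basic test behind frustum culling is sketched here in scalar form, using bounding spheres. The SIMD optimization named on the slide is not shown, and the plane representation (inward-facing unit normal plus offset) and the function name are assumptions.

```python
from typing import Iterable, Tuple

Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d), unit normal pointing inwards


def sphere_in_frustum(planes: Iterable[Plane],
                      center: Tuple[float, float, float],
                      radius: float) -> bool:
    """False only when the sphere lies completely outside some plane."""
    for nx, ny, nz, d in planes:
        signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if signed_distance < -radius:
            return False  # fully behind this plane: cull
    return True           # inside or intersecting: keep (conservative)
```

The test never rejects a sphere that touches the frustum, although it can keep a few spheres that sit just outside a corner. That is the usual trade-off for a test that needs only one dot product per plane.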
  • 89. OTHER APPLICATIONS What else can I use it for?
  • 90. Lighting & shadows When applied from a light source’s point of view, a visibility algorithm can be used for finding shadow casters. “Shadow Caster Occlusion Culling for Efficient Shadow Mapping” ( http://www.cg.tuwien.ac.at/research/publications/2011/bittner-2011-scc/bittner-2011-scc-paper.pdf )
  • 91. Streaming Large game worlds have so much content that it cannot fit in the memory of a gaming platform. Loading between zones breaks immersion. A from-region visibility algorithm can be used to do visibility-based streaming over the network or from storage media.
  • 92. AI A visibility algorithm can be used to drive AI logic. Data structures used in visibility determination can be modified to be used for distance or intersection testing.
  • 93. Sound occlusion Distance and intersection tests can be used to simulate the behaviour of sound. Precomputing visibility and precomputing audio overlap considerably and make for an interesting field of study.
  • 94. FIN Sampo Lappalainen [email_address] http://www.umbra3.com

Editor's Notes

  • #8: Build a graphics engine that draws stuff on the screen -> easy. An artist makes models -> the models are handed to the graphics engine and drawn. Engineers optimize the graphics engine and artists optimize the graphics until performance is acceptable. Then we end up in this situation... 08/31/11 12:04
  • #9: How did we get from the original situation to this? The tech limited us so much that this was the best we could achieve.
  • #10: Game developers build tech so that games can be made to look good and artists can make fancier content. The problem is not drawing great graphics; the problem is drawing great graphics fast enough.
  • #23: TODO picture TODO reference
  • #24: TODO rename slide TODO pictures from Teppo’s presentation
  • #26: TODO code?
  • #27: There is still a lot to cull inside the view frustum.
  • #29: TODO sources
  • #30: TODO sources
  • #45: Object 3 has just become visible.
  • #58: TODO note about SIMD? TODO MORE BEEF!
  • #60: TODO rethink
  • #61: TODO rethink
  • #63: TODO describe how it works
  • #71: An example follows.
  • #84: TODO Video
  • #85: TODO picture of how it actually works
  • #87: TODO picture: portal vs PVS culling
  • #88: TODO link to paper