Embarrassingly
Parallel Computation
for Visibility
Jasin Bushnaief
Umbra Software
Who are we?
• The only occlusion culling middleware
  company in the world
• Founded in 2006
• Based in Helsinki
• 12 people
• Customers: Bungie (Halo), Guerrilla (Killzone),
  Remedy (Alan Wake), Bioware (Mass Effect),
  CD Projekt (Witcher), ArenaNet (Guild Wars)
  and many more
We’re going to talk about
• The past
  – Brief introduction to occlusion culling
  – Traditional methods of visibility computation
• The present
  – Umbra’s visibility computation algorithm
  – How it can be distributed
• The future
  – Challenges of modern games and engines
The Past:

SO, WHAT’S OCCLUSION CULLING
ANYWAY?
Graphics in games
• Game development process:
  – Artists create content
  – Engine runtime renders it
• Rendering
  – Content consists of objects
  – Which consist of triangles
  – Which get rendered by the GPU
• Our business: rendering optimization
Occlusion culling explained
• ”Culling is the process of removing breeding
  animals from a group based on specific criteria.”
  (Wikipedia)
• Hidden surface removal: ”Which surfaces do not
  contribute to the final rendered image on the
  screen?”
• Some popular HSR methods:
  – Frustum culling
  – Backface culling
  – Occlusion culling
Occlusion culling explained
• Occlusion culling: ”Which surfaces are blocked
  (occluded) by other surfaces?”
• Depth buffering is one way to do OC
  – Very accurate (i.e. pixel level)
  – Ubiquitous on hardware, easy problem to solve
  – Occurs very late in the pipeline
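
To make the "very late in the pipeline" point concrete, here is a minimal sketch of a per-pixel depth test. The function name and the fragment tuples are hypothetical and only stand in for what GPU hardware does; note that the hidden fragment is still fully processed before it is rejected.

```python
import math

def rasterize_with_depth_test(fragments, width, height):
    """Keep, per pixel, only the fragment closest to the camera."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, label in fragments:
        # Every fragment reaches this point, even if it ends up hidden.
        if z < depth[y][x]:
            depth[y][x] = z
            color[y][x] = label
    return color

# Hypothetical fragments: (pixel x, pixel y, depth, object label).
fragments = [
    (0, 0, 5.0, "wall"),
    (0, 0, 2.0, "crate"),  # in front of the wall at pixel (0, 0)
    (1, 0, 5.0, "wall"),
    (1, 0, 9.0, "tree"),   # behind the wall at pixel (1, 0)
]
image = rasterize_with_depth_test(fragments, width=2, height=1)
```

The tree never shows up in `image`, but its fragment was still generated and tested, which is the work that higher-level culling tries to avoid.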
Occlusion culling explained
• Higher-level methods complement depth buffering
  nicely
• These cull entire objects, groups of objects or
  entire sections of the scene
  – Not easy!
• The earlier, the better
Occlusion culling




Only the objects visible to
the camera are rendered
”Traditional” way to do OC
• Preprocess:
  – Divide scene into cells
  – Compute visibility between cells
     • Results in a visibility matrix (PVS)
• Runtime:
  – Locate the camera
  – Do a lookup into the PVS matrix
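
The preprocess step can be sketched as follows, assuming a 2D scene with axis-aligned cells and line-segment walls, and a fixed grid of sample points per cell. All names and the three-cell scene are illustrative assumptions, not Umbra's or any engine's code. Sampling is approximate: a sight line that passes between the samples is missed.

```python
from itertools import product

def _orient(a, b, c):
    """Signed area of triangle abc; the sign says which side of ab holds c."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def segments_cross(p, q, a, b):
    """True if segment pq properly crosses segment ab."""
    return (_orient(p, q, a) * _orient(p, q, b) < 0
            and _orient(a, b, p) * _orient(a, b, q) < 0)

def sample_points(cell, n=2):
    """n x n interior sample points of an axis-aligned cell (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = cell
    steps = [(i + 0.5) / n for i in range(n)]
    return [(x0 + (x1 - x0) * u, y0 + (y1 - y0) * v)
            for u, v in product(steps, steps)]

def compute_pvs(cells, walls):
    """Mark cell b visible from cell a if any sample-to-sample ray is unblocked."""
    names = list(cells)
    samples = {name: sample_points(cells[name]) for name in names}
    pvs = {a: {b: 0 for b in names} for a in names}
    for a, b in product(names, names):
        for p, q in product(samples[a], samples[b]):
            if not any(segments_cross(p, q, w0, w1) for w0, w1 in walls):
                pvs[a][b] = 1
                break
    return pvs

# Hypothetical scene: three cells in a row, a solid wall between M and R.
cells = {"L": (0, 0, 1, 1), "M": (1, 0, 2, 1), "R": (2, 0, 3, 1)}
walls = [((2, -1), (2, 2))]
pvs = compute_pvs(cells, walls)
```

Every pair of cells is tested against all the walls, which is why the computation is global by nature.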
Simple example
Split scene into cells

 A            B


 C            D
Compute visibility (sampling)

Cells:   A  B
         C  D

Visibility matrix:
      A  B  C  D
   A  1  1  1  0
   B
   C
   D
Compute visibility

Cells:   A  B
         C  D

Visibility matrix:
      A  B  C  D
   A  1  1  1  0
   B  1  1  0  1
   C
   D
Compute visibility

Cells:   A  B
         C  D

Visibility matrix:
      A  B  C  D
   A  1  1  1  0
   B  1  1  0  1
   C  1  0  1  1
   D
Compute visibility

Cells:   A  B
         C  D

Visibility matrix:
      A  B  C  D
   A  1  1  1  0
   B  1  1  0  1
   C  1  0  1  1
   D  0  1  1  1
Runtime PVS culling

Cells:   A  B
         C  D

Visibility matrix:
      A  B  C  D
   A  1  1  1  0
   B  1  1  0  1
   C  1  0  1  1
   D  0  1  1  1
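
The runtime side of PVS culling is just a row lookup. A minimal sketch using the matrix from the slides; the cell coordinates in `locate_cell` are an assumption made for illustration (each cell is a 1 x 1 square, A and B in the top row, y growing upward).

```python
CELLS = ["A", "B", "C", "D"]

# Visibility matrix from the slides: PVS[i][j] == 1 if cell j is visible from cell i.
PVS = [
    [1, 1, 1, 0],  # from A
    [1, 1, 0, 1],  # from B
    [1, 0, 1, 1],  # from C
    [0, 1, 1, 1],  # from D
]

def locate_cell(x, y):
    """Map a camera position to a cell name (assumed 2 x 2 layout of unit cells)."""
    col = 0 if x < 1.0 else 1
    row = 0 if y >= 1.0 else 1  # top row is A, B
    return CELLS[row * 2 + col]

def visible_cells(camera_cell):
    """Return the names of all cells flagged visible from the camera's cell."""
    row = PVS[CELLS.index(camera_cell)]
    return [name for name, bit in zip(CELLS, row) if bit]

camera_cell = locate_cell(0.5, 1.5)
to_render = visible_cells(camera_cell)
```

The lookup is trivially cheap; all the cost sits in the preprocess that fills the matrix.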
Problem?
• Solving visibility between cells is very difficult
   – E.g. Solving analytically is actually O(n^4)
• Global operation by nature
• Doesn’t play well with dynamic scenes
   – Worst case: a change in one cell requires
     recomputation of the entire matrix
The Present

UMBRA DOES IT BETTER
Welcome to the 2010s
• Modern game worlds are huge
• So it’d be cool if you didn’t need the entire
  scene in memory, ever
• It’d be even cooler if the heavy lifting could be
  distributed. Or sent to the Cloud™
• Buildings collapse. Things change.
The Umbra approach
• Don’t actually compute visibility for the entire
  scene
• Instead, process geometry to create a
  data structure to solve visibility in the runtime
• Portal culling in the runtime
Data generation
• Data = portal graph
• Generate local graphs individually for reasonably
  sized geometry chunks (tiles), in parallel
• Combine the results into a global portal graph
  that can be quickly traversed
• Solve visibility quickly in the runtime using this
  graph
Will this work?
• Portal generation
  – Is very hard, but possible to do automatically
  – Only local geometry needed
  → Pretty much an embarrassingly parallel problem
• Runtime
  – Not as simple as a PVS lookup, but still quite fast
Simple example revisited
Split geometry into tiles
Dispatch tiles to worker nodes


 Tile 0   Tile 1   Tile 2   Tile 3
Generate portals


Tile 0     Tile 1   Tile 2   Tile 3
Combine portal graph
Runtime query: traverse portals
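
A runtime portal query can be sketched as a graph traversal that narrows the view through each portal it passes. This is a simplified 2D illustration, not Umbra's algorithm: it assumes a point camera, and that every portal is a vertical opening at some x greater than the camera's x, so the graph has no cycles. The cell names and the portal layout are made up.

```python
def visible_cells(portals, start_cell, camera):
    """Depth-first portal traversal from the camera's cell.

    portals: dict cell -> list of (neighbor, x, y_lo, y_hi), each a vertical
    opening at the given x (assumed to lie in front of the camera).
    The view is tracked as an interval of ray slopes as seen from the camera.
    """
    cx, cy = camera
    visible = {start_cell}
    stack = [(start_cell, float("-inf"), float("inf"))]
    while stack:
        cell, lo, hi = stack.pop()
        for neighbor, x, y_lo, y_hi in portals.get(cell, []):
            # Slopes of the rays from the camera to the portal's two edges.
            s_lo = (y_lo - cy) / (x - cx)
            s_hi = (y_hi - cy) / (x - cx)
            # Clip the current view against the portal opening.
            new_lo, new_hi = max(lo, s_lo), min(hi, s_hi)
            if new_lo < new_hi:  # something is still visible through it
                visible.add(neighbor)
                stack.append((neighbor, new_lo, new_hi))
    return visible

# Hypothetical graph: a hall leading into room1, which opens to room2 and a closet.
portals = {
    "hall": [("room1", 1.0, 0.4, 0.6)],
    "room1": [("room2", 2.0, 0.0, 1.0), ("closet", 2.0, 2.0, 3.0)],
}
seen = visible_cells(portals, "hall", camera=(0.0, 0.5))
```

The closet is connected in the graph but culled, because the narrow hall portal leaves no ray that reaches the closet's opening.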
What did we do here?
 • Essentially a map-reduce
        – Split scene into distributable tiles
        – Generate local portal graph for each tile
        – Combine results, link global portal graph
[Diagram: Scene → Map → Tile 0 ... Tile n → Portals 0 ... Portals n → Reduce → Global portal graph → Runtime: Query → Visible objects]
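
The three steps can be sketched as a small map-reduce. Everything here is an illustrative assumption: tiles are tiny character grids in a single row ('#' wall, '.' open), "portal generation" only records which border rows are open, consecutive tile indices are taken to be neighbours, and a thread pool stands in for the worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_local_portals(tile):
    """Map step: uses only this tile's own geometry."""
    index, rows = tile
    return {
        "tile": index,
        "left": {r for r, row in enumerate(rows) if row[0] == "."},
        "right": {r for r, row in enumerate(rows) if row[-1] == "."},
    }

def link_global_graph(local_results):
    """Reduce step: connect neighbouring tiles whose border openings line up."""
    ordered = sorted(local_results, key=lambda res: res["tile"])
    graph = {res["tile"]: [] for res in ordered}
    for a, b in zip(ordered, ordered[1:]):
        shared = a["right"] & b["left"]
        if shared:
            graph[a["tile"]].append((b["tile"], sorted(shared)))
            graph[b["tile"]].append((a["tile"], sorted(shared)))
    return graph

def build_portal_graph(tiles, workers=4):
    """Dispatch tiles to workers, then link the local results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_results = list(pool.map(generate_local_portals, tiles))
    return link_global_graph(local_results)

tiles = [
    (0, ["..#", "...", "..#"]),
    (1, ["#.#", "..#", "#.#"]),
    (2, ["...", "#..", "..."]),
]
graph = build_portal_graph(tiles)
```

No call to `generate_local_portals` needs data from another tile, which is what makes the map step embarrassingly parallel; only the linking step looks at more than one tile.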
The Future

THE NEXT GENERATION
Turns out...
• Even the initial ”map” is too much for large
  game worlds
• A global graph of a vast world is too expensive
  in the runtime
• You need to support multiple versions of some
  chunks for dynamic content
  – Quite a combinatorial problem
→ Next-gen games require an even better
solution!
So we did something like this
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Runtime: Combine → Graph A, Graph B → Query → Visible objects]
Got rid of ”map”
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Runtime: Combine → Graph A, Graph B → Query → Visible objects]
Split up ”reduce”, moved to runtime
[Diagram: Tile 0 ... Tile n → Portals 0 ... Portals n → Runtime: Combine → Graph A, Graph B → Query → Visible objects]
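
One way to picture moving the combine step to runtime: keep the per-tile portal data separately, one entry per tile version, and link only the tiles that are needed, using whichever version is currently active. If k tiles each had v versions, pre-linking every combination would mean v^k global graphs, while per-tile storage needs only k·v entries. The sketch below is an assumption about how this could look, not the actual implementation; the store layout, version names and the one-row tile arrangement are hypothetical.

```python
def combine_at_runtime(store, active_versions, tile_ids):
    """Link only the requested tiles, using each tile's active version.

    store: dict (tile_id, version) -> {"left": set, "right": set} border openings
    active_versions: dict tile_id -> version currently present in the world
    tile_ids: tile ids to combine (for example, those near the camera)
    """
    chosen = [(t, store[(t, active_versions[t])]) for t in sorted(tile_ids)]
    graph = {t: [] for t, _ in chosen}
    for (a, da), (b, db) in zip(chosen, chosen[1:]):
        # Only directly adjacent tiles with matching openings get a portal.
        if b == a + 1 and da["right"] & db["left"]:
            graph[a].append(b)
            graph[b].append(a)
    return graph

store = {
    (0, "intact"): {"left": set(), "right": {1}},
    (1, "intact"): {"left": {1}, "right": set()},
    (1, "collapsed"): {"left": {1}, "right": {0, 1}},  # a wall has fallen
    (2, "intact"): {"left": {0}, "right": set()},
}
versions = {0: "intact", 1: "intact", 2: "intact"}
graph_before = combine_at_runtime(store, versions, [0, 1, 2])

versions[1] = "collapsed"  # the building in tile 1 collapses
graph_after = combine_at_runtime(store, versions, [0, 1, 2])
```

Swapping the version of one tile changes the combined graph without regenerating portals for any other tile.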
Questions?




jasin@umbrasoftware.com
