The Intersection of Game Engines & GPUs: Current & Future
Johan Andersson, Rendering Architect
2.5
Agenda
  • Goal: share and discuss current & future graphics use cases in our games and their implications for graphics hardware
  • Areas: engine overview, shaders, parallelization, texturing, raytracing, GPU compute
  • Conclusions
  • Q & A
Frostbite
  • DICE proprietary engine
  • Xbox 360, PS3, Windows (Direct3D 10)
  • Focus: large outdoor environments, singleplayer & multiplayer, destruction!
  • New: content workflows
BFBC screenshot
BFBC screenshot
 
Graph-based surface shaders
  • Artist-friendly: easy to create, tweak & manage
  • Flexible: programmers & artists can extend & expose features
  • Data-centric: encapsulates resources, transformable
  • Rich high-level shading framework, used by all content & systems
 
Shader permutations
  • Generate shader permutations for each used combination of features/data (HLSL vertex & pixel shaders)
  • Many features = permutation explosion (shader graphs, lighting, geometry)
  • Balance perf. vs permutations vs features
      - Dynamic branching
      - Live with many permutations
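The gap between "every possible combination" and "each used combination" can be sketched as below. This is a minimal illustration, not the engine's code: the feature flags and both function names are hypothetical, and a permutation is assumed to be identified by a bitmask of enabled features.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical feature flags; the slides do not list the real feature set.
constexpr std::uint32_t kSkinning  = 1u << 0;
constexpr std::uint32_t kNormalMap = 1u << 1;
constexpr std::uint32_t kShadows   = 1u << 2;
constexpr std::uint32_t kFog       = 1u << 3;

// Upper bound if every on/off combination of the features were compiled.
std::uint64_t maxPermutations(unsigned featureCount) {
    assert(featureCount < 64);
    return std::uint64_t{1} << featureCount;
}

// Only the combinations that content actually references need a compiled shader.
std::set<std::uint32_t> usedPermutations(const std::vector<std::uint32_t>& materialKeys) {
    return std::set<std::uint32_t>(materialKeys.begin(), materialKeys.end());
}
```

With four boolean features the upper bound is already 16 shaders per stage, while three materials that share two distinct keys need only two.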
Shader subroutines
  • Next step: static subroutine linking
      - Inline all subroutines at the call site, similar to a switch statement
      - Reduces # permutations
      - Implementation moved to driver or GPU
      - Doesn't work with instancing
  • Future step: dynamic subroutines
      - Control function pointers inside shader
      - Problem solved, but coherency important
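As a CPU-side analogy only (shader languages differ), the two steps can be contrasted as a switch over a model index fixed per draw call versus a function pointer chosen at run time. The lighting functions and all names here are hypothetical.

```cpp
#include <cassert>

// Two interchangeable "subroutines" (hypothetical lighting models).
float lambert(float nDotL)     { return nDotL > 0.0f ? nDotL : 0.0f; }
float halfLambert(float nDotL) { float h = nDotL * 0.5f + 0.5f; return h * h; }

// Static linking: the model is fixed for the whole draw call, so a compiler
// or driver can inline the selected body, much like resolving a switch.
float shadeStatic(int lightModel, float nDotL) {
    switch (lightModel) {
        case 0:  return lambert(nDotL);
        case 1:  return halfLambert(nDotL);
        default: return 0.0f;
    }
}

// Dynamic subroutines: the callee is a function pointer that may differ per
// element, which is why coherency between neighbouring elements matters.
using LightFn = float (*)(float);
float shadeDynamic(LightFn fn, float nDotL) {
    assert(fn != nullptr);
    return fn(nDotL);
}
```

Both paths produce the same result for the same model; the difference is when the choice is bound.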
Rendering & Parallelization
Jobs
  • Must utilize multi-core
      - 6 HW threads on Xbox 360
      - 6 SPUs on PS3
      - 2-8 cores on PC
  • Job definition
      - Fully independent stateless function (PS3 SPU requirement)
      - Graph dependencies
      - Task-parallel and data-parallel
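A job graph of this kind can be sketched as follows. This is an assumption-laden illustration: `Job` and `runJobGraph` are hypothetical names, and the jobs run on one thread in dependency order, whereas a real scheduler would hand every ready job to a free core or SPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

struct Job {
    std::function<void()> run;           // should only touch data handed to it
    std::vector<std::size_t> dependsOn;  // indices of jobs that must finish first
};

// Runs the jobs in dependency order (Kahn's algorithm) and returns that order.
std::vector<std::size_t> runJobGraph(const std::vector<Job>& jobs) {
    std::vector<std::size_t> pending(jobs.size(), 0);
    std::vector<std::vector<std::size_t>> dependents(jobs.size());
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        pending[i] = jobs[i].dependsOn.size();
        for (std::size_t d : jobs[i].dependsOn) {
            assert(d < jobs.size());
            dependents[d].push_back(i);
        }
    }
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t j = ready.front();
        ready.pop();
        jobs[j].run();
        order.push_back(j);
        for (std::size_t n : dependents[j])
            if (--pending[n] == 0) ready.push(n);  // last dependency finished
    }
    return order;
}
```

Jobs that are ready at the same time have no ordering constraint between them, which is where the task-parallelism comes from.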
Rendering jobs
  • Refactor rendering systems to jobs
  • Most will move to GPU eventually
      - One-way data flow
      - Compute shaders & stream output
  • Jobs
      - Decal projection
      - Particle simulation
      - Terrain geometry processing
      - Undergrowth generation [2]
      - Frustum culling
      - Occlusion culling
      - Command buffer generation
      - PS3: triangle culling
Parallel command buffer recording
  • Dispatch draw calls and state to multiple command buffers in parallel
      - Scales linearly with # cores
      - 1500-4000 draw calls per frame
  • Super-important for all platforms, used on:
      - Xbox 360
      - PS3 (SPU-based)
  • No support in DX10!
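The idea can be sketched with standard threads. Everything here is a stand-in: `DrawCall`, `CommandBuffer` and `recordParallel` are hypothetical, a command buffer is modelled as a plain vector, and real console command buffers hold packed GPU commands rather than structs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct DrawCall { int meshId; int materialId; };
using CommandBuffer = std::vector<DrawCall>;  // stand-in for a platform command buffer

// Splits the frame's draw calls into contiguous ranges and records each range
// into its own buffer on its own thread.
std::vector<CommandBuffer> recordParallel(const std::vector<DrawCall>& draws,
                                          std::size_t workerCount) {
    assert(workerCount > 0);
    std::vector<CommandBuffer> buffers(workerCount);
    std::vector<std::thread> workers;
    const std::size_t chunk = (draws.size() + workerCount - 1) / workerCount;
    for (std::size_t w = 0; w < workerCount; ++w) {
        workers.emplace_back([&draws, &buffers, w, chunk] {
            const std::size_t begin = std::min(w * chunk, draws.size());
            const std::size_t end = std::min(begin + chunk, draws.size());
            // Each worker writes only to its own buffer, so no locking is needed.
            buffers[w].assign(draws.begin() + static_cast<std::ptrdiff_t>(begin),
                              draws.begin() + static_cast<std::ptrdiff_t>(end));
        });
    }
    for (std::thread& t : workers) t.join();
    return buffers;  // submitting in index order preserves the original draw order
}
```

Because the ranges are contiguous, submitting buffer 0, then 1, and so on reproduces the single-threaded draw order.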
DX10 parallel command buffer recording
  • Single most important DX10 issue, for us and many others (in the future)
  • Until future API support:
      - Reduce draw calls with instancing (trade GPU performance for CPU performance)
      - Reduce state & constant updates (slow dynamic constant path)
      - Manual software command buffers: difficult to update dynamic resources efficiently in parallel due to API
PS3 geometry processing (1/2)
  • Slow GPU triangle & vertex setup
  • Unique situation with "free" processors that are not fully utilized
  • Solution: SPU triangle culling
      - Trade SPU time for GPU performance
      - Cull back faces, micro-triangles, frustum
      - Sony PS3 EDGE library
      - 5 jobs process frame geometry in parallel
      - Output is a new index buffer for each draw call
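A simplified CPU sketch of this kind of culling is shown below. It is not the EDGE library's algorithm: the names are hypothetical, vertices are assumed to be already projected to pixel coordinates with counter-clockwise front faces, micro-triangles are approximated by an area threshold rather than a sample-coverage test, and frustum culling is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };  // screen-space position in pixels

// Returns a new index buffer containing only the triangles worth sending to the GPU.
std::vector<std::uint32_t> cullTriangles(const std::vector<Vec2>& positions,
                                         const std::vector<std::uint32_t>& indices,
                                         float minAreaPixels) {
    assert(indices.size() % 3 == 0);
    std::vector<std::uint32_t> visible;
    for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
        const Vec2& a = positions[indices[i]];
        const Vec2& b = positions[indices[i + 1]];
        const Vec2& c = positions[indices[i + 2]];
        // Twice the signed area; positive means counter-clockwise winding.
        const float area2 = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        if (area2 <= 0.0f) continue;                 // back-facing or degenerate
        if (0.5f * area2 < minAreaPixels) continue;  // too small to matter
        visible.insert(visible.end(), {indices[i], indices[i + 1], indices[i + 2]});
    }
    return visible;
}
```

The vertex buffer is untouched; only the index buffer shrinks, which matches the "new index buffer for each draw call" output described above.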
PS3 geometry processing (2/2)
  • Great flexibility and programmability!
  • Custom processing
      - Partition bounding box culling
      - Triangle part culling
      - Clip plane triangle trivial accept & reject
      - Triangle cull volumes (inverse clip planes)
  • Future: no vertex & geometry shaders
      - DIY compute shaders with fixed-function tessellation and triangle setup units
      - Output buffer streaming still important
Occlusion culling
  • Buildings occlude objects
      - Tons of objects
  • Difficult to implement
      - Building destruction
      - Dynamic occludees
      - Heavy GPU occlusion queries
  • Invisible objects still have to
      - Update logic & animations
      - Generate command buffer
      - Be processed on CPU & GPU
Software occlusion culling
  • Solution: rasterize a coarse z-buffer on SPU/CPU
      - Low-poly occluder meshes
      - 100 m view distance
      - Max 10000 vertices/frame
      - Manually conservative
      - 256x114 float z-buffer
      - Created for PS3, now on all platforms
  • Cull all objects against the z-buffer
      - Before they are passed to all other systems = big savings
      - Screen-space bounding-box test
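The screen-space bounding-box test can be sketched as follows. Only the 256x114 resolution comes from the slide; the class, its methods, and the convention that smaller depth means closer are assumptions, and occluder rasterization is reduced to filling rectangles instead of rasterizing low-poly meshes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr int kZWidth = 256;   // resolution quoted on the slide
constexpr int kZHeight = 114;

struct CoarseZBuffer {
    std::vector<float> depth;  // smaller value = closer to the camera (assumed)

    CoarseZBuffer()
        : depth(static_cast<std::size_t>(kZWidth) * kZHeight,
                std::numeric_limits<float>::max()) {}

    // Stand-in for occluder rasterization: writes the nearest depth per texel.
    void fillRect(int x0, int y0, int x1, int y1, float z) {
        assert(x0 <= x1 && y0 <= y1);
        for (int y = std::max(y0, 0); y <= std::min(y1, kZHeight - 1); ++y)
            for (int x = std::max(x0, 0); x <= std::min(x1, kZWidth - 1); ++x) {
                float& d = depth[static_cast<std::size_t>(y) * kZWidth + x];
                d = std::min(d, z);
            }
    }

    // An object is kept if, on any texel of its screen-space bounding box,
    // its nearest depth is not behind the stored occluder depth.
    bool isBoxVisible(int x0, int y0, int x1, int y1, float nearestZ) const {
        assert(x0 <= x1 && y0 <= y1);
        for (int y = std::max(y0, 0); y <= std::min(y1, kZHeight - 1); ++y)
            for (int x = std::max(x0, 0); x <= std::min(x1, kZWidth - 1); ++x)
                if (nearestZ <= depth[static_cast<std::size_t>(y) * kZWidth + x])
                    return true;
        return false;
    }
};
```

Testing the object's nearest depth against every covered texel keeps the test conservative: an object is only culled when the occluders hide its whole bounding box.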
GPU occlusion culling
  • Want GPU rasterization & testing, but:
      - Occlusion queries introduce overhead & latency (can be manageable, not ideal)
      - Conditional rendering only helps the GPU, not CPU, frame memory or draw calls
  • Future 1: low-latency extra GPU execution context
      - Rasterization and testing done on GPU
      - Lockstep with CPU
  • Future 2: move entire cull & rendering to GPU
      - Scene graph, cull, systems, dispatch. End goal.
Texturing
Texture formats
  • Using
      - DXT1/5 color maps, sRGB
      - BC5 (3Dc) normal maps
      - BC4 (DXT5A) for grayscale masks
  • sRGB support for BC4/5 would be nice
  • DXT1 replacement needed
      - Low quality: 565, color bleeding
      - RG/RGB masks compress badly
  • HDR envmaps & lightmaps
  [Figures: RGB DXT1 mask; DXT color bleed]
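DXT1 stores its two block endpoint colors as RGB565, and the sketch below illustrates only that precision limit, not the full block compression or the inter-channel bleed. The function names are hypothetical, and packing truncates the low bits for simplicity where real encoders usually round.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb8 { std::uint8_t r, g, b; };

// Pack 8-bit channels into 5:6:5 bits by dropping the low bits.
std::uint16_t packRgb565(Rgb8 c) {
    return static_cast<std::uint16_t>(((c.r >> 3) << 11) | ((c.g >> 2) << 5) | (c.b >> 3));
}

// Expand an n-bit value to 8 bits by replicating its high bits into the low bits.
std::uint8_t expandBits(unsigned value, unsigned bits) {
    assert(bits >= 4 && bits <= 8);
    return static_cast<std::uint8_t>((value << (8 - bits)) | (value >> (2 * bits - 8)));
}

Rgb8 unpackRgb565(std::uint16_t p) {
    return {expandBits((p >> 11) & 0x1Fu, 5),
            expandBits((p >> 5) & 0x3Fu, 6),
            expandBits(p & 0x1Fu, 5)};
}
```

Round-tripping a neutral gray of (100, 100, 100) through these functions yields (99, 101, 99): green keeps one more bit than red and blue, so grays pick up a slight tint.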
 
Future texture sampling
  • Texture sampling derivatives
      - 1st order texel derivatives, 2nd order as well?
      - Implement in sampler unit
      - Bad performance or quality with shader sampling
      - Artifacts with ddx/ddy technique
      - Replace normal maps with easily compressed bump maps
  • Bicubic upsampling
      - Terrain masks
      - Terrain heightmap
      - Derived normals [2]
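Deriving a normal from first-order height derivatives can be sketched on the CPU with central differences. This is a generic illustration rather than the shader from [2]: the function name, the y-up convention and the clamped border handling are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Normal of a y-up heightfield at texel (x, z), from central differences.
Vec3 normalFromHeightfield(const std::vector<float>& heights, int width, int depth,
                           int x, int z, float texelSize) {
    assert(width > 0 && depth > 0 && texelSize > 0.0f);
    assert(heights.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(depth));
    auto heightAt = [&](int px, int pz) {
        px = std::min(std::max(px, 0), width - 1);  // clamp at the borders
        pz = std::min(std::max(pz, 0), depth - 1);
        return heights[static_cast<std::size_t>(pz) * static_cast<std::size_t>(width)
                       + static_cast<std::size_t>(px)];
    };
    const float dhdx = (heightAt(x + 1, z) - heightAt(x - 1, z)) / (2.0f * texelSize);
    const float dhdz = (heightAt(x, z + 1) - heightAt(x, z - 1)) / (2.0f * texelSize);
    // The unnormalized normal of the surface y = h(x, z) is (-dh/dx, 1, -dh/dz).
    const float len = std::sqrt(dhdx * dhdx + 1.0f + dhdz * dhdz);
    return {-dhdx / len, 1.0f / len, -dhdz / len};
}
```

Each normal costs four extra height fetches when done in a shader, which is the sort of per-sample work the slide suggests moving into the sampler unit.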
 
Current sparse textures
  • Save memory for terrain
      - Static quadtree mask texture
      - Dynamic sparse destruction mask
  • Implementation
      - Indirection texture lookup in atlas
      - Arrays too small, want 8192 slices
      - Correct bilinear filtering by borders
      - Siggraph'07 course for details [2]
  [Figures: source mask; atlas texture]
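An indirection lookup into a tile atlas can be sketched as below. The layout is an assumption for illustration (one indirection entry per virtual tile, -1 for tiles that are not resident), the names are hypothetical, and the border texels needed for correct bilinear filtering are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SparseTexture {
    int tileSize;                  // texels per tile side
    int virtualTilesPerRow;        // width of the virtual texture, in tiles
    int atlasTilesPerRow;          // width of the atlas, in tiles
    std::vector<int> indirection;  // per virtual tile: atlas tile index, or -1
};

struct AtlasLookup { bool resident; int x; int y; };

// Maps a texel of the large virtual texture to a texel in the atlas.
AtlasLookup lookupTexel(const SparseTexture& tex, int vx, int vy) {
    assert(vx >= 0 && vy >= 0 && tex.tileSize > 0);
    const int tileX = vx / tex.tileSize;
    const int tileY = vy / tex.tileSize;
    assert(tileX < tex.virtualTilesPerRow);
    const std::size_t entry = static_cast<std::size_t>(tileY * tex.virtualTilesPerRow + tileX);
    assert(entry < tex.indirection.size());
    const int atlasTile = tex.indirection[entry];
    if (atlasTile < 0) return {false, 0, 0};  // tile has no storage
    const int ax = (atlasTile % tex.atlasTilesPerRow) * tex.tileSize + vx % tex.tileSize;
    const int ay = (atlasTile / tex.atlasTilesPerRow) * tex.tileSize + vy % tex.tileSize;
    return {true, ax, ay};
}
```

Memory is only spent on atlas tiles plus one small indirection entry per virtual tile, which is where the terrain savings come from.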
HW sparse textures
  • Virtual texture with HW texture filtering & mipmapping
  • Fallback on non-resident tile access
      - Lower mipmap, default value or shader bool
  • At least 32k x 32k, fp issues with larger?
  • Application-controlled tile commit/free
      - ~128 x 128 tiles
      - Feedback mechanism for referenced tiles
      - Easy view-dependent allocation
  • Future: latency-free allocation & generation
      - Alt 1: CPU thread callback & block
      - Alt 2: keep everything on GPU. "Command" shader?
Cached Procedural Unique Texturing
  • Unique dynamic sparse texture on all objects
      - Defined by texture shader graph
      - Combine procedurals, compositing, streaming and uv-space geometry
  • Dynamically commit & render visible tiles
      - Highly complex compositing, thanks to high frame-to-frame coherency
      - Upsample and refine
  • New dynamic effects made possible
      - Affect every surface
Raytracing
Raytracing
  • Much recent debate & interest in RTRT
  • What we are interested in:
      - Performance!! Rasterization for primary rays
      - Deterministic
      - Easy integration into engines: just another method for certain effects & objects, not a replacement for the whole pipeline
      - Efficient dynamic geometry: procedural & manual animation (foliage, characters), destruction (foliage, buildings, objects)
Mirror’s Edge
Raytraced reflections wanted
  • Glass & metal
  • Mostly planar surfaces
  • Reflection locality
  • Correct reflections for important objects
      - Main character
  • Simplified world geometry & shading for rest
      - Common for games
      - Brickmaps? [3]
Mirror’s Edge Soft reflections
GPGPU
GPGPU uses
  • Effect physics: particle vs world soft collision
  • AI pathfinding
  • AI visibility: view rasterization, obstruction from smoke & foliage
  • Procedural animation: trees, undergrowth, hair
  • Post-processing
CUDA DOF post-process filter
  • Thesis work at DICE [4]
      - Test CUDA and performance
      - Poisson disc blur
      - Multi-passed diffusion
      - Separable diffusion
  • Good: easy to learn (C), map complex algorithms, thread & memory control
  • Bad: performance vs shaders, beta interop, vendor-specific
  [Figures: circle of confusion map; output]
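The slide does not say how the circle of confusion map is computed. A common choice, assumed here purely for illustration, is the thin-lens formula evaluated per pixel from scene depth; the function name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Thin-lens circle of confusion diameter, in the same unit as the inputs.
//   aperture       : lens aperture diameter
//   focalLength    : lens focal length
//   focusDistance  : distance to the plane in perfect focus
//   objectDistance : distance to the surface seen by this pixel
float circleOfConfusion(float aperture, float focalLength,
                        float focusDistance, float objectDistance) {
    assert(focusDistance > focalLength && objectDistance > 0.0f);
    return aperture * focalLength * std::fabs(objectDistance - focusDistance)
         / (objectDistance * (focusDistance - focalLength));
}
```

Evaluating this for every pixel's depth gives a per-pixel blur size that a Poisson disc or diffusion filter can then consume.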
GPU compute programming model
  • Wanted:
      - Easy & efficient Direct3D 10 interop
      - Low-latency compute tasks
      - Vendor-independent base interface. OpenCL?
      - Efficient CPU multi-core backend: server, older GPUs, debugging. MCUDA [5]
      - Eventually platform-independent: future consoles
Conclusions
  • Shader subroutines
  • More software-controlled pipeline
  • More texture sampler functionality
  • Limited-case raytracing
  • GPU compute for games
Questions? Contact: johan.andersson@dice.se
References
[1] Tatarchuk, Natasha & Andersson, Johan. "Rendering Architecture and Real-time Procedural Shading & Texturing Techniques". GDC 2007.
[2] Andersson, Johan. "Terrain Rendering in Frostbite using Procedural Shader Splatting". Siggraph 2007.
[3] Christensen, Per H. & Batali, Dana. "An Irradiance Atlas for Global Illumination in Complex Production Scenes". Eurographics Symposium on Rendering 2004.
[4] Lonroth, Per & Unger, Mattias. "Advanced Real-time Post-Processing using GPGPU techniques". Master thesis, 2008.
[5] Stratton, John, Stone, Sam & Hwu, Wen-mei. "MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores". Technical report IMPACT-08-01, University of Illinois at Urbana-Champaign, March 2008.
Bonus slides
Real-time REYES
  • Very interesting
      - Displacement mapping & procedurals
      - Stochastic sampling
  • Potentially more efficient & general, compared to maxed-out rasterization & tessellation on everything (= pixel-sized triangles)
  • But: no experience
      - More research & experimentation needed
Terrain detail
  • Deriving normals from the heightfield works well in the distance
  • Future: HW tessellation & procedural displacement shaders for up-close ground detail
Texture arrays
  • Use cases: everything!
  • Rich parameterized shaders
      - Vary slice index per instance, triangle or texel
      - Instancing without compromising on variation or perf.
  • Cascaded shadow maps
      - HW PCF only in DX 10.1
      - Stable Cascaded Bounding Box Shadow Maps
  • Sparse textures
      - More slices plz, for tile pools: 64x64x8192
Other raytracing uses
  • Global illumination & ambient occlusion
      - Incremental photon mapping?
  • Async collision raycasts
      - AI pathfinding, gameplay, sound obstruction
      - Separate collision world from visual world
      - CPU job-based now


The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware 2008)
