Develop2012 deferred sanchez_stachowiak
Solving Some Common Problems in a Modern Deferred Rendering Engine
Jose Luis Sanchez Bonet
Tomasz Stachowiak
/* @h3r2tic */
Deferred rendering – pros and cons
• Pros ( some )
– Very scalable
– No shader permutation explosion
– G-Buffer useful in other techniques
• SSAO, SRAA, decals, …
• Cons ( some )
– Difficult to use multiple shading models
– Does not handle translucent geometry
• Some variants do, but may be impractical
Reflectance models
• The BRDF defines the look of a surface
– Bidirectional Reflectance Distribution Function
$$L_o = L_e + \int_\Omega L_i \cdot f_r \cdot \cos\theta \cdot \delta\omega$$
• Typically games use just one ( Blinn-Phong )
– Simple, but inaccurate
• Very important in physically based rendering
– Want more: Oren-Nayar, Kajiya-Kay, Penner, Cook-Torrance, …
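To make the single-BRDF case concrete, here is a minimal sketch of a Blinn-Phong evaluation for one light. The function names, the unnormalized specular term and the parameter values are illustrative assumptions; the talk does not give its exact formulation.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    # Guard against a zero vector (e.g. light exactly opposite the viewer).
    return tuple(c / length for c in v) if length > 0.0 else tuple(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def blinn_phong(n, l, v, diffuse, specular, shininess):
    """Classic Blinn-Phong response for one light (assumed, unnormalized form)."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    n_dot_l = max(dot(n, l), 0.0)
    if n_dot_l == 0.0:
        return 0.0  # light is behind the surface
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    n_dot_h = max(dot(n, h), 0.0)
    return diffuse * n_dot_l + specular * (n_dot_h ** shininess)
```

Swapping in another reflectance model would mean replacing the body of `blinn_phong`, which is exactly the step that becomes awkward once lighting is decoupled from materials.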
BRDFs vs. rendering
• Forward rendering
– Material shader directly evaluates the BRDF
• Trivial
• Deferred rendering
– Light shaders decoupled from materials
– No obvious solution
[Diagram: Material → G-Buffer → Light; BRDF → ???]
BRDFs vs. deferred – branching?
• Read shading model ID in the lighting shader, branch
• Might be the way to go on next-gen
• Expensive on current consoles
– Tax for branches never taken
• Don’t want to pay it for every light
Three different BRDFs, only one used ( branch always yields the first one )

Platform   1 BRDF    2 BRDFs                 3 BRDFs
360        1.85 ms   2.1 ms ( + 0.25 ms )    2.35 ms ( + 0.5 ms )
PS3        1.9 ms    2.48 ms ( + 0.58 ms )   2.8 ms ( + 0.9 ms )
BRDFs vs. deferred – LUTs?
• Pre-calculate BRDF look-up tables
• Might be shippable enough
– See: S.T.A.L.K.E.R.
• Limited control over parameters
– Roughness
– Anisotropy, etc.
• BRDFs highly dimensional
– Isotropic with roughness control → 3D LUT
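The look-up table idea can be sketched as follows. This is only an illustration of why an isotropic BRDF with roughness control already needs three table axes; the choice of N·L, N·H and roughness as axes follows the speaker notes, while `toy_brdf`, the grid sizes and the nearest-neighbour lookup are assumptions made for the example.

```python
def build_brdf_lut(size, roughness_steps, brdf):
    """Tabulate brdf(n_dot_l, n_dot_h, roughness) on a regular grid in [0, 1]."""
    def coord(i, n):
        return i / (n - 1)
    return [[[brdf(coord(i, size), coord(j, size), coord(k, roughness_steps))
              for k in range(roughness_steps)]
             for j in range(size)]
            for i in range(size)]

def sample_lut(lut, n_dot_l, n_dot_h, roughness):
    """Nearest-neighbour lookup; a GPU would filter a 3D texture instead."""
    def index(x, n):
        return min(n - 1, max(0, round(x * (n - 1))))
    size, steps = len(lut), len(lut[0][0])
    return lut[index(n_dot_l, size)][index(n_dot_h, size)][index(roughness, steps)]

def toy_brdf(n_dot_l, n_dot_h, roughness):
    # Hypothetical stand-in: the specular exponent falls as roughness rises.
    exponent = 2.0 + (1.0 - roughness) * 62.0
    return n_dot_l * (n_dot_h ** exponent)
```

Every extra parameter (anisotropy, for instance) would add another axis, which is where the storage cost grows quickly.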
BRDFs vs. deferred – our approach
• One default BRDF
– Others a relatively rare case
• Shading model ID in stencil
• Multi-pass light rendering
• Mask out parts of the scene in each pass
Multi-pass – tax avoidance
• For each light
– Find all affected BRDFs
– Render the light volume once for each model
• Analogous to multi-pass forward rendering!
• Store bounding volumes of objects with non-standard BRDFs
– Intersect with light volumes
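A minimal sketch of the "find all affected BRDFs" step, assuming spherical light volumes and axis-aligned bounding boxes. The `Aabb` type, the `DEFAULT_BRDF` constant and the decision to always include the default model are assumptions for illustration, not details from the talk.

```python
from dataclasses import dataclass

DEFAULT_BRDF = 0  # assumed ID of the default shading model

@dataclass
class Aabb:
    lo: tuple
    hi: tuple
    brdf_id: int  # shading model used by the object inside this box

def sphere_hits_aabb(center, radius, box):
    # Squared distance from the sphere centre to the closest point of the box.
    d2 = 0.0
    for c, lo, hi in zip(center, box.lo, box.hi):
        nearest = min(max(c, lo), hi)
        d2 += (c - nearest) ** 2
    return d2 <= radius * radius

def brdfs_for_light(center, radius, special_bounds):
    """Conservative, sorted list of shading models a point light may touch."""
    ids = {DEFAULT_BRDF}
    for box in special_bounds:
        if sphere_hits_aabb(center, radius, box):
            ids.add(box.brdf_id)
    return sorted(ids)
```

A light whose list has a single entry can take the ordinary single-BRDF path; only the others pay for the extra passes.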
Making it practical
• Needs to work with depth culling of lights
• Hierarchical stencil on 360 and PS3
Depth culling of lights
• Assuming viewer is outside the light volume
• Start with stencil = 0
• Render back faces of light volume
– Increment stencil; no color output
• Render front faces
– Only where stencil = 0; write color
• Render back faces
– Clear stencil; no color output
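The passes can be simulated on the CPU for a single row of pixels. This is a sketch under stated assumptions: smaller depth means nearer, the depth test is a plain less-than, and the per-pixel front and back depths of the light volume are given directly instead of being rasterized.

```python
def cull_and_shade(scene_depth, front_depth, back_depth):
    """Simulate the three passes for one row of pixels.

    front_depth / back_depth hold the light volume's nearest and farthest
    depth per pixel, or None where the volume does not cover the pixel.
    """
    n = len(scene_depth)
    stencil = [0] * n
    lit = [False] * n
    # Pass 1: back faces. The depth test passes when the back face is in front
    # of the scene, i.e. the whole volume floats in front of the surface.
    for i in range(n):
        if back_depth[i] is not None and back_depth[i] < scene_depth[i]:
            stencil[i] += 1
    # Pass 2: front faces. Shade where the depth test passes and stencil == 0.
    for i in range(n):
        if (front_depth[i] is not None and front_depth[i] < scene_depth[i]
                and stencil[i] == 0):
            lit[i] = True
    # Pass 3: back faces again, clearing the stencil for the next light.
    for i in range(n):
        if back_depth[i] is not None:
            stencil[i] = 0
    return lit, stencil
```

Only pixels where the surface lies between the front and back faces end up shaded, and the stencil is left clear for the next light.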
Culling with BRDFs
• Pack the culling bit and BRDF together
• Use masks to read/affect required parts
• Assuming 8 supported BRDFs:
Bit:  7 6 5 4   3 2 1     0
Use:  Unused    BRDF ID   Culling bit
culling_mask = 0x01
brdf_mask = 0x0E
brdf_shift = 1
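With the masks from the slide, packing and unpacking the stencil byte might look like the sketch below. The constants mirror the slide; the helper function names are hypothetical.

```python
CULLING_MASK = 0x01
BRDF_MASK = 0x0E
BRDF_SHIFT = 1

def pack_stencil(brdf_id, culled=False):
    """Build a stencil byte from a shading model ID (0..7) and the culling bit."""
    if not 0 <= brdf_id < 8:
        raise ValueError("brdf_id must fit in three bits")
    return ((brdf_id << BRDF_SHIFT) & BRDF_MASK) | (CULLING_MASK if culled else 0)

def brdf_id_of(stencil):
    # Other bits are masked away, so unrelated data does not affect the ID.
    return (stencil & BRDF_MASK) >> BRDF_SHIFT

def is_culled(stencil):
    return bool(stencil & CULLING_MASK)
```

Because the ID is always read through `BRDF_MASK`, the three ID bits could sit at any offset in the byte as long as the mask and shift are changed together.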
Light volume rendering passes
Handling miscellaneous data in stencil
• Stencil value may contain extra data
– Used in earlier / later rendering passes
– Need to ignore it somehow
– Stencil read mask?
• Doesn’t work with the 360’s hi-stencil
Bit:  7 6 5 4   3 2 1     0
Use:  Garbage   BRDF ID   Culling bit
Stencil operation
Read → Read mask → Comparison ( <, <=, >, ==, … ) → Operator ( ++, --, = 0, … ) → Write ( masked ) → Result
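The stencil pipeline on this slide can be modelled in a few lines. This is a simplified sketch: it assumes the usual convention of comparing the masked reference against the masked stored value, applies the operator only when the test passes, saturates on increment and decrement, and ignores the separate fail and depth-fail operators that real APIs expose.

```python
import operator

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           "==": operator.eq, "!=": operator.ne}
OPS = {"keep": lambda v: v, "zero": lambda v: 0,
       "incr": lambda v: min(v + 1, 0xFF), "decr": lambda v: max(v - 1, 0)}

def stencil_step(stored, ref, read_mask, compare, op, write_mask):
    """Return (passed, new_stored) for one fragment on an 8-bit stencil."""
    passed = COMPARE[compare](ref & read_mask, stored & read_mask)
    if not passed:
        return False, stored
    result = OPS[op](stored)
    # Only bits selected by the write mask are replaced.
    new_stored = (stored & ~write_mask & 0xFF) | (result & write_mask)
    return True, new_stored
```

For example, testing only the BRDF ID bits of `0xFA` against `0x0A` passes, and a zeroing operator restricted to the top four bits leaves `0x0A` behind.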
Hierarchical stencil operation
Read → Read mask → Comparison ( <, <=, >, ==, … ) → Operator ( ++, --, = 0, … ) → Write ( masked ) → Result
• PS3: hi-stencil comparison ( <, <=, >, ==, … ) → Hi-stencil
• 360: hi-stencil comparison ( ==, != ) → Hi-stencil
Spanner in the works
Breaks if stencil contains garbage we can’t mask out
Handling stencil garbage
• Can’t do it in a non-destructive manner
– Take off and nuke the entire site from orbit
– It’s the only way to be sure
• Extra cleaning pass?
– Don’t want to pay for it!
• Do it as we go!
• Save your stencil if you need it
– Sorry for calling it garbage :`(
– We were already restoring it later on the 360
– Don’t need to destroy it on the PS3, use a read mask!
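The difference between the two consoles, and the effect of clearing garbage bits, can be illustrated with a toy model. Everything here is an assumption made for the example: the hi-stencil tests are reduced to a single comparison, and treating bits 7..4 as the "garbage" follows the layout slide, not any console documentation.

```python
BRDF_MASK = 0x0E      # bits 3..1, as in the layout slide
GARBAGE_MASK = 0xF0   # bits 7..4, data owned by other passes

def hi_stencil_360_matches(written_value, ref):
    # Model: equality against the full written value, no read mask.
    return written_value == ref

def hi_stencil_ps3_matches(stored_value, ref, mask):
    # Model: masked comparison, so unrelated bits can be ignored.
    return (stored_value & mask) == (ref & mask)

def clear_garbage(stored_value):
    # Zero the unrelated bits but leave the BRDF ID and culling bit alone.
    return stored_value & ~GARBAGE_MASK & 0xFF
```

With a stored value of `0xA6` and a reference of `0x06`, the masked comparison matches straight away, while the full-value equality only matches once the top bits have been cleared.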
Performance

Platform            1 BRDF    2 BRDFs                 3 BRDFs
360 ( branching )   1.85 ms   2.1 ms ( + 0.25 ms )    2.35 ms ( + 0.5 ms )
360 ( stencil )     1.85 ms   1.99 ms ( + 0.14 ms )   2.13 ms ( + 0.28 ms )
PS3 ( branching )   1.9 ms    2.48 ms ( + 0.58 ms )   2.8 ms ( + 0.9 ms )
PS3 ( stencil )     1.9 ms    2.13 ms ( + 0.23 ms )   2.31 ms ( + 0.41 ms )

For each BRDF

Platform   Initial setup   Mask     Render        Cleanup
360        0.03 ms         0.1 ms   >= 0.036 ms   0.022 ms
PS3        0.03 ms         0.1 ms   >= 0.06 ms    0.14 ms
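The savings implied by the first table can be worked out directly. The figures below are copied from the slide; the helper names are only for this example.

```python
totals_ms = {
    ("360", "branching"): [1.85, 2.10, 2.35],
    ("360", "stencil"):   [1.85, 1.99, 2.13],
    ("PS3", "branching"): [1.90, 2.48, 2.80],
    ("PS3", "stencil"):   [1.90, 2.13, 2.31],
}

def overhead(platform, method, brdf_count):
    """Extra cost in ms over the single-BRDF case."""
    row = totals_ms[(platform, method)]
    return round(row[brdf_count - 1] - row[0], 2)

def saving(platform, brdf_count):
    """How much cheaper the stencil approach is than branching, in ms."""
    return round(overhead(platform, "branching", brdf_count)
                 - overhead(platform, "stencil", brdf_count), 2)
```

With three BRDFs the stencil approach saves 0.22 ms on the 360 and 0.49 ms on the PS3 for this one full-screen light.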
Multi-pass light rendering – final notes
• No change in single-BRDF rendering
– Use your madly optimized routines
• No need for a ‘default’ shading model
– It’s just our use case
– As long as you efficiently find influenced BRDFs
• Flush your hi-stencil
• Tiny lights? Try branching instead.
– Performance figures only from huge lights!
– With tiny lights, hi-stencil juggling becomes inefficient
Lighting alpha objects in deferred rendering engines
• Classic solutions:
– Forward rendering.
– CPU based, one light probe per each object.
• Our solution:
– GPU based.
– More than one light probe.
– Calculate a lightmap for each object each frame.
– Used for objects and particle systems.
– Fits perfectly into a deferred rendering pipeline.
Our solution for alpha objects
• Object space map:
– Every pixel stores the local space position on the object’s surface
Image attribution: Zephyris at en.wikipedia.
Our solution for alpha objects
• For each object:
– Use baked positions as light probes
• Transform object space map into world space
– Render lights, reusing deferred shading code
– Accumulate into lightmap
– Render object in alpha pass using lightmap
Image attribution: Zephyris at en.wikipedia.
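The per-object steps can be sketched on the CPU as follows. This is a rough illustration only: the talk runs this on the GPU and reuses its deferred shading code, whereas the falloff used here is a hypothetical stand-in, surface normals are ignored, and the light tuple layout is an assumption.

```python
def transform_point(m, p):
    """Apply a row-major 4x4 matrix to a 3D point (w assumed to be 1)."""
    x, y, z = p
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))

def light_probe_map(object_space_map, object_to_world, lights):
    """Return one accumulated intensity per texel of the position map.

    lights is a list of (position, intensity, radius) tuples.
    """
    lightmap = []
    for local_pos in object_space_map:
        world = transform_point(object_to_world, local_pos)
        total = 0.0
        for light_pos, intensity, radius in lights:
            d2 = sum((a - b) ** 2 for a, b in zip(world, light_pos))
            # Hypothetical falloff: linear in squared distance, zero at radius.
            total += intensity * max(0.0, 1.0 - d2 / (radius * radius))
        lightmap.append(total)
    return lightmap
```

The alpha pass would then sample this lightmap with the same texture coordinates that address the object space map.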
Our solution for particle systems
• Camera oriented quad fitted around and centered in the particle system.
Our solution for particle systems
• For each particle system:
– Allocate a texture quad and fill it with interpolated positions as light probes
– Render lights and accumulate into lightmap
– Render particles in alpha pass, converting from clip space to lightmap coordinates.
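One possible form of the clip space to lightmap conversion is sketched below. It assumes the quad's corners are known in normalized device coordinates and that the lightmap lives in a rectangular region of a larger texture; the parameter names and the clamping are assumptions, since the talk does not spell out the mapping.

```python
def clip_to_lightmap_uv(clip_pos, quad_ndc_min, quad_ndc_max, region):
    """Map a clip-space position to texel coordinates inside an atlas region.

    region is (x, y, width, height) in texels; quad_ndc_min/max are the
    particle system quad's corners in normalized device coordinates.
    """
    x, y, _, w = clip_pos
    ndc = (x / w, y / w)  # perspective divide
    uv = tuple((n - lo) / (hi - lo)
               for n, lo, hi in zip(ndc, quad_ndc_min, quad_ndc_max))
    uv = tuple(min(1.0, max(0.0, c)) for c in uv)  # stay inside the region
    rx, ry, rw, rh = region
    return (rx + uv[0] * rw, ry + uv[1] * rh)
```

A particle at the centre of the quad lands in the centre of its region, and particles outside the quad are clamped to the region's edge.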
Implementation details
• For performance reasons we pack all position maps to a single texture.
• Every entity that needs alpha lighting will allocate and use a region inside the texture.
[Figure: World space position | Light maps]
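Region allocation inside the shared texture could be done with a simple shelf allocator, sketched below. The talk does not say which allocation scheme is used, so the `ShelfAtlas` class, the shelf strategy and the per-frame `reset` are all assumptions.

```python
class ShelfAtlas:
    """Allocate rectangular regions left to right, opening a new row when full."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.cursor_x = 0
        self.shelf_y = 0
        self.shelf_height = 0

    def allocate(self, w, h):
        """Return (x, y, w, h) in texels, or None if the region does not fit."""
        x, y, shelf_h = self.cursor_x, self.shelf_y, self.shelf_height
        if x + w > self.width:          # row is full, try a new shelf
            x, y, shelf_h = 0, y + shelf_h, 0
        if w > self.width or y + h > self.height:
            return None                 # state is left untouched on failure
        self.cursor_x, self.shelf_y = x + w, y
        self.shelf_height = max(shelf_h, h)
        return (x, y, w, h)

    def reset(self):
        """Drop every region, e.g. at the start of a frame."""
        self.cursor_x = self.shelf_y = self.shelf_height = 0
```

Each entity would keep the returned rectangle and use it both when writing its position map and when sampling its lightmap.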
Integration with deferred rendering
• Deferred rendering: Fill G-Buffer (Solid pass) → Render Lights → Render Alpha
Our solution
• Fill G-Buffer (Solid pass) → Fill world space light probes position map → Render lights → Render lights using world space light probes map as input and calculate alpha lightmap → Render alpha using alpha lightmap
Improvements
• Calculate a second texture with light direction information.
• Other parameterizations for particle systems:
– Dust (one pixel per mote).
– Ribbons (a line of pixels).
• 3D volume slices for particle systems.
– Allocate a region for every slice
– Adds depth to the lighting solution.
3D volume slices
[Figure: Slice 0 map … Slice n map]
For each slice we allocate one region
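One way a particle could pick its lighting from the slices is to blend the two nearest ones by depth. This is a sketch of that idea only: the talk says each slice gets its own region and that slices add depth to the lighting, but the even slice spacing and the linear blend here are assumptions.

```python
def slice_weights(depth, near, far, slice_count):
    """Return ((slice_a, weight_a), (slice_b, weight_b)) for a particle depth.

    Slice 0 sits at `near` and the last slice at `far`; weights sum to 1.
    """
    if slice_count < 2:
        return ((0, 1.0), (0, 0.0))
    t = (depth - near) / (far - near)
    t = min(1.0, max(0.0, t)) * (slice_count - 1)
    a = min(int(t), slice_count - 2)  # keep a + 1 a valid slice index
    frac = t - a
    return ((a, 1.0 - frac), (a + 1, frac))
```

The particle shader would sample both regions at the same lightmap coordinates and mix the results with these weights.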
Demo
Demo
http://www.creative-assembly.com/jobs/
WE ARE HIRING!
Questions?
Jose Luis Sanchez Bonet
jose.sanchez@creative-assembly.com
Tomasz Stachowiak
tomasz.stachowiak@creative-assembly.com
twitter: h3r2tic

Editor's Notes

  • #3: Good news everyone! I'm Tom, this is Jose, and we're going to talk about deferred rendering. The focus is on current generation consoles, but the presented techniques can be used on just about any platform, so we hope anyone can benefit from them.
  • #4: Deferred rendering has been very popular lately due to its scalability, and because it plays nicely with other techniques, which can reuse the G-Buffer. At the same time, it doesn’t come without downsides. We are going to cover two of them in this presentation, and propose the custom solutions we've developed for our upcoming console title. The two problems are: handling many shading models, and rendering translucent geometry. I'm going to cover the former in the first half of the presentation, and then Jose will talk about translucency.
  • #5: In graphics rendering, we use simple mathematical formulas to approximate the look of some classes of surfaces. The most commonly used model, or Bidirectional Reflectance Distribution Function, is Blinn-Phong, which works reasonably well as an approximation of some dielectrics. It is used due to its simplicity, but for the same reason, it cannot reproduce the look of many surfaces accurately. You might want to render your plastics with Blinn, skin with Eric Penner's pre-integrated model, hair with Kajiya-Kay or Marschner, brushed metal with anisotropic Ward, and so on. The visual properties of these surfaces are vastly different, and cannot be covered with just a single, simple mathematical model.
  • #6: So how do we render with multiple shading models? If you use forward rendering, this is trivial. Because the BRDF is combined with the material in the same shader, it just works. However, in deferred rendering, we need to evaluate the reflectance model in the light shader, and these don't bear any connection to material shaders that the BRDFs are associated with.
  • #7: One approach would be to branch in the light shader. That is, the solid pass emits an identifier of the BRDF into the G-Buffer. The light shader reads it and branches upon its value. This solution might be viable on next-gen hardware, but it doesn't fare well on current consoles. In a small test case we did with a single full-screen light, branching brought the rendering cost from 1.85 to 2.1 milliseconds for just a single extra shading model. This is the tax you pay for not even taking the branch. That is, our test case is synthetic, and only the first BRDF is ever used. And it gets much worse on the PS3, which doesn't even have control flow instructions.
  • #8: One could also tabulate the BRDF data, and sample it using a combination of an ID, as well as some geometric parameters, such as N dot L and N dot H. One such approach has been used successfully in the game S.T.A.L.K.E.R., so it might be enough for your title as well. The trouble is, BRDFs are highly dimensional functions, so tabulation might be difficult; for example, the data for an isotropic BRDF parameterized by surface roughness is already at least a 3-dimensional function. /* See Michael Ashikhmin’s "Distribution-Based BRDFs". */
  • #9: We decided to use a single reflectance model for most of our scene geometry, and then special-case rendering in rare instances, such as skin and hair. The core of the idea is pretty simple: when rendering the solid pass, we store the ID of the shading model in the stencil buffer. Then in the lighting pass, we draw light geometry once for each BRDF, using the ID as a mask.
  • #10: Implemented like this, the idea would be inefficient. We would be multiplying the number of draw calls and shader switches by the number of supported BRDFs. However, when rendering a light, we can detect which BRDFs it can potentially use, and skip any extra processing. If you think of it, this is a very similar idea to multi-pass forward rendering. Here's a scene with two objects, both of which use different shading models. We have two lights influencing them. The light on the left is interesting, in that it will only affect just one object, hence only one BRDF. Therefore it doesn't need to run the multi-BRDF code path at all. To accomplish this optimization, we store the bounding boxes of all objects which use non-standard shading models. During light rendering, we intersect light volumes with these bounds, and conservatively find a list of all BRDFs which a light may potentially touch. /* We could in theory detect which BRDFs a light may affect and only use dynamic branching there, but then we either always pay a high cost, or we would need to create lots of shader permutations, for example “shading model A and B, A with B and C, A with C, B with C, et cetera.” For this reason we are just going to use multi-pass rendering. */
  • #11: Now, there are two more bits to the algorithm, needed to make it practical. Firstly, it needs to work with the commonly used stencil and depth-based light culling trick. Secondly, it must play well with the hierarchical stencil buffer. Let's start with a quick reminder of depth culling for lights. Consider a surface rendered into the G-buffer, and three lights. The left one is completely in front of the surface, so cannot influence it. The right one is behind the surface, so cannot influence it either. Only the middle one contributes to lighting, because its volume intersects the surface in the G-buffer.
  • #12: So how do we accomplish that using stencil testing? Let's consider the case when the viewer is outside of the light's volume. The stencil is initially clear.
  • #13: We start by writing the value of one into the stencil by using the back faces of the light volume. This will result in the stencil being set where the light is completely in front of the surface. Therefore we only want to render where the stencil is zero …
  • #14: … and we do so using the front faces with stencil testing enabled. Note that this is a vanilla version of the algorithm, and you may be using an optimized one.
  • #15: Extending this idea to selectively rendering multiple shading models, we need to pack both the culling bit and the shading model identifier in the stencil buffer. Because stencil testing supports read and write masks, we can act upon and affect portions of the stencil value. Here’s a sample layout assuming a maximum of eight supported BRDFs. Note that the BRDF bits can be placed at any offset in the byte.
  • #16: OK, let's get down to the actual rendering passes. First of all, we will be using the hierarchical stencil buffer, so that the GPU may reject entire rasterization tiles. This is where the bulk of our time savings actually comes from, as the regular stencil test happens after you’ve already paid the pixel shading cost. We start the same as with just depth-based culling. We draw back faces of the light volume with the stencil set to Increment. Once again, this marks areas we don’t want to render to. At this point, we have determined the list of BRDFs the light can potentially influence. For each of them, we create a hi-stencil mask first, then we render the volume again with the actual shader. Creating the mask is fairly cheap, so even though we render twice, we typically save time by hi-stencil culling the expensive shader. Finally, the last step restores the affected stencil area, so that the next light can render.
  • #17: We have been assuming that the stencil values are clear of any unrelated data. Yet in practice, they will carry multiple meanings, and rendering engines will have their own 'magic' stencil encodings. /* One example would be using a single bit of stencil to mask out dynamic objects from being affected by deferred decals. */ Unfortunately, such extra bits turn out to be garbage from the point of view of the proposed algorithm, and we cannot simply ignore them with read masks, at least not on the Xbox 360.
  • #18: Let's take a look at the stencil operation to figure out why. The GPU first reads the original value and applies a user-specified mask to it. This value is then compared with a reference constant using one of several predicates, such as Greater, Less, Equal, et cetera. Upon the result of this comparison as well as the depth test, an operator may be applied to the stencil value, such as incrementing or zeroing it. Finally, the resulting value is written back into the stencil buffer.
  • #19: How does the hi-stencil integrate with this pipeline? On the PS3, we get to specify a mask and a comparison function for the hi-stencil test, very much like in the regular one. This means that we can ignore any bits we don’t like. The 360 however, takes its hi-stencil value from the completely opposite end of the pipe, from the final value written back to the stencil buffer. Furthermore, we may only specify a trivial equality or inequality predicate against a reference value.
  • #20: Unfortunately, this throws a spanner in our hi-stencil mask creation. Since the 360 can only create its mask from the full value, any garbage bits will cause the corresponding tiles to be culled.
  • #21: Well, if we can’t ignore the extra bits, I say we nuke them from orbit. The easiest way would be to have a separate pass which cleans the stencil buffer, removing any garbage bits. On the other hand, we don't want to add any more fixed cost steps into our rendering, especially at the end of the current hardware generation, when everyone is battling for the last microseconds. Fortunately, we can clear out the garbage bits as we go. When creating the hi-stencil mask, we will set the regular stencil operator to do so, while skipping over the ID of the shading model. Now, I've been calling these "garbage bits", but you may have good reasons for extra information in your stencil buffer. Chances are that on the 360 you restore them at a later point anyway, due to limited EDRAM resources. On the PS3 we don't need to clobber the bits at all, due to its more flexible hi-stencil buffer creation process.
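A toy model may help show why the extra bits matter on the 360 and why clearing them during mask creation fixes the problem. The bit layout below is purely hypothetical (the talk does not give the actual encoding), the helper names are made up for illustration, and the light-volume marking is ignored.

```python
ID_MASK = 0x07     # hypothetical: low three bits hold the shading model ID
EXTRA_MASK = 0xF8  # hypothetical: remaining bits carry engine-specific flags


def hi_test_masked(stored, ref, mask=ID_MASK):
    """PS3-style test: only the bits selected by the mask are compared."""
    return (stored & mask) == (ref & mask)


def hi_test_full_value(stored, ref):
    """360-style test: the full written value is compared for equality."""
    return stored == ref


def clear_extra_bits(stored):
    """Stencil write used while building the mask: keep the ID, drop the rest."""
    return stored & ID_MASK


if __name__ == "__main__":
    stored = 0x80 | 0x02  # shading model 2 plus one unrelated flag bit
    print(hi_test_masked(stored, 0x02))                        # True
    print(hi_test_full_value(stored, 0x02))                    # False
    print(hi_test_full_value(clear_extra_bits(stored), 0x02))  # True
```

In this model the masked test accepts the pixel despite the flag bit, the full-value test rejects it, and the full-value test accepts it again once the flag bits have been cleared as part of the mask-creation write.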
  • #22: How’s performance then? Let’s recall the figures from one of the first slides. With the dynamic branching approach, we had to pay a pretty hefty tax, especially on the PS3. How does the proposed algorithm stack up against that? We still pay a slight tax, but only for the lights which render with multiple shading models, and only for the models we actually use. This is especially important if we support many shading models, but each light affects very few on average. Then we end up paying a considerably smaller cost for the extra shading models.
  • #23: That's pretty much the whole algorithm. I'd just like to emphasize a few extra points. First of all, nothing is changed for single-BRDF rendering! If you conservatively figure out that a light only influences geometry with a single reflectance model, you can reuse your old light rendering code! Secondly, you don't really need to have a 'default' shading model for the whole level. As long as you can quickly classify which BRDFs a light can potentially influence, then you're golden. Next, remember to flush your hi-stencil when changing the reference value or the comparison function, otherwise you’ll get false culling. Finally, we’ve only given performance figures for lights taking up a significant portion of the screen. When a light is small and rendered with multiple BRDFs, the cost will be dominated by hi-stencil juggling. It might be worthwhile to use dynamic branching in the light shader below a certain size threshold. Okay, that’s all for me, now Jose is going to tell you about lighting translucent geometry!
  • #24: Classic solutions. Forward rendering: the best-quality solution, as it calculates lighting for every pixel. Problems: it is too expensive, especially when many alpha layers are used; it causes a shader permutation explosion if you want to support many light types and combinations; and it is completely different from deferred rendering, so we would need to support two pipelines. We could use Forward+, but we are targeting the Xbox 360 and PS3. Light probes calculated on the CPU: one light probe (intensity, SH, etc.) for each object. Problems: with only one light probe per object, the whole object gets the same light configuration, which causes a lot of issues with big objects; it is also not easy to support shadow-map-casting lights. Our solution: GPU based, with more than one light probe per object. Its quality falls between the two classic solutions. It is just a lightmap for every object, updated every frame. Lighting is calculated in object space. It can be used for objects and particle systems, and it fits perfectly into a deferred engine pipeline.
  • #25: For each alpha object we create a distribution of light probes on its surface. Artists define a UV channel with an unwrapped version of the object (as for lightmaps). During export we create a texture that we call the object space map; its size depends on the surface area of the object. Every pixel in the object space map represents a local-space position on the surface of the object.
  • #26: We convert every probe from the object space map to world space using the world matrix of the object. Render lights: we render a pass with a shader very similar to the one used in deferred rendering. The input is a texture with world-space light probe positions (calculated from the object space map), and the output is a lightmap holding the light that the probes receive. It can reuse a lot of functions from the deferred rendering code, such as shadow mapping. Render the object in the alpha pass using the lightmap: we use the UV channel of the object space map to access the lightmap.
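The first step, turning the object space map into world-space probe positions, amounts to one matrix transform per texel. The sketch below assumes a row-major 4x4 world matrix applied to column vectors, with the translation in the last column; that convention, the nested-list texture layout and the function names are all assumptions made for illustration.

```python
def to_world(world_matrix, local_position):
    """Transform one local-space probe position by a 4x4 world matrix."""
    x, y, z = local_position
    p = (x, y, z, 1.0)  # homogeneous point
    return tuple(
        sum(world_matrix[row][col] * p[col] for col in range(4))
        for row in range(3)
    )


def build_world_position_map(world_matrix, object_space_map):
    """Convert every texel of the object space map to a world-space position."""
    return [[to_world(world_matrix, texel) for texel in row]
            for row in object_space_map]


if __name__ == "__main__":
    translate = [
        [1.0, 0.0, 0.0, 10.0],
        [0.0, 1.0, 0.0, 20.0],
        [0.0, 0.0, 1.0, 30.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    object_space_map = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    print(build_world_position_map(translate, object_space_map))
```

The resulting map plays the role that the position information in the G-buffer plays for opaque geometry: it is what the light pass reads to know where each probe is.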
  • #27: For each particle system we need a set of light probes distributed around it. As the particles are camera-oriented, we are going to use a camera-oriented quad fitted around and centered on the particle system. It is not a perfect representation, but it is really fast and simple, and it works in practice. If the particle system intersects the camera frustum we can just fit our quad, so we can improve the quality when the particle system fills the screen.
  • #28: To recover the lighting information we just use a 2D matrix that converts from clip-space coordinates (our quad is oriented in screen space) to lightmap texture space.
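One way to build such a matrix is as a 2D affine transform that maps the rectangle covered by the quad onto the region of the lightmap allocated to the particle system. This is a sketch under assumptions: coordinates are taken after the perspective divide, no vertical flip is applied (the texture-space convention depends on the API), and the function names and the 2x3 layout are hypothetical.

```python
def clip_to_lightmap_matrix(quad_min, quad_max, region_min, region_max):
    """Build a 2x3 affine matrix mapping the quad rectangle to a lightmap region."""
    sx = (region_max[0] - region_min[0]) / (quad_max[0] - quad_min[0])
    sy = (region_max[1] - region_min[1]) / (quad_max[1] - quad_min[1])
    ox = region_min[0] - quad_min[0] * sx
    oy = region_min[1] - quad_min[1] * sy
    return ((sx, 0.0, ox),
            (0.0, sy, oy))


def apply_affine(matrix, xy):
    """Apply the 2x3 matrix to a 2D point."""
    x, y = xy
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


if __name__ == "__main__":
    m = clip_to_lightmap_matrix((-0.5, -0.5), (0.5, 0.5), (0.25, 0.0), (0.5, 0.25))
    print(apply_affine(m, (-0.5, -0.5)))  # lower corner of the lightmap region
    print(apply_affine(m, (0.5, 0.5)))    # upper corner of the lightmap region
```

Each particle pixel can then look up its lighting by pushing its own clip-space position through the same matrix.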
  • #29: The two solutions have a lot in common. For performance reasons we pack all the world-space position maps into one single texture, so we can calculate the lighting of all the objects at the same time. Two GPU textures: Input: world-space position texture, similar to the G-buffer in deferred rendering. Output: accumulated light. Every object that needs lighting calculated allocates a region inside the textures and fills it with the positions of its light probes. The size of the region can depend on the screen-space size of the object to improve performance and scalability. To improve performance further, we check every light against every object on the CPU, so we only apply the light shader to the regions that are inside the light.
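The CPU-side check of every light against every object could look roughly like the following. The talk does not say which bounding volumes are used, so the bounding spheres here are an assumption, as are the dictionary fields and the function names.

```python
def spheres_overlap(center_a, radius_a, center_b, radius_b):
    """True if two bounding spheres touch or intersect."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    reach = radius_a + radius_b
    return dist_sq <= reach * reach


def regions_for_light(light, objects):
    """Return the atlas regions the light shader must be applied to.

    Each object owns one region of the packed position/light textures.
    """
    return [obj["region"] for obj in objects
            if spheres_overlap(light["center"], light["radius"],
                               obj["center"], obj["radius"])]


if __name__ == "__main__":
    light = {"center": (0.0, 0.0, 0.0), "radius": 5.0}
    objects = [
        {"center": (3.0, 0.0, 0.0), "radius": 1.0, "region": (0, 0, 32, 32)},
        {"center": (20.0, 0.0, 0.0), "radius": 1.0, "region": (32, 0, 16, 16)},
    ]
    print(regions_for_light(light, objects))
```

Only the first object's region is returned here, so the light shader would be drawn over that region of the packed textures and skip the rest.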
  • #30: Deferred rendering engine: fill G-buffer, render lights, render alpha.
  • #31: Added two extra steps in our deferred engine.
  • #32: Having light direction information will allow bump mapping, occlusion and scattering effects.
  • #33: For performance reasons, we can disable 3D volume slices when the particle system is far from the camera.
  • #34: Thanks to Howard Rayner, our technical artist and vfx magician for preparing these demos!