Advancements in Tiled-Based
Compute Rendering
Gareth Thomas
Developer Technology Engineer, AMD
Agenda
●Current Tech
●Culling Improvements
●Clustered Rendering
●Summary
Proven Tech – Out in the Wild
●Tiled Deferred [Andersson09]
●Frostbite
●UE4
●Ryse
●Forward+ [Harada et al 12]
●DiRT & GRID Series
●The Order: 1886
●Ryse
Tiled Rendering 101
[Figure: three lights (1, 2, 3) over three screen tiles; the per-tile light lists are [1], [1,2,3] and [2,3]]
Tiled Rendering 101
● Divide screen
into tiles
● Fit asymmetric
frustum around
each tile
Tile0 Tile1 Tile2 Tile3
Tiled Rendering 101
● Use z buffer from
depth pre-pass
as input
● Find min and max
depth per tile
● Use this frustum for
intersection testing
Tiled Rendering 101
Light buffer: Light0, Light1, Light2, Light3, Light4, … Light10 (each stores Position and Radius)
Light index list: Index1 = 1
Tiled Rendering 101
Light buffer: Light0, Light1, Light2, Light3, Light4, … Light10 (each stores Position and Radius)
Light index list: Index0 = number of lights (2), Index1 = 1, Index2 = 4, Index3 = Empty, Index4 = Empty, …
[Figure: lights 1 and 4 highlighted]
Targets for Improvement
●Z Prepass (on Forward+)
●Depth bounds
●Light Culling
●Color Pass
Depth Bounds
● Determine min and max
bounds of the depth buffer
on a per tile basis
● Atomic Min Max [Andersson09]
// read one depth sample per thread
// reinterpret as uint
// atomic min & max
// reinterpret back to float
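The four comment steps above can be sketched on the CPU as follows. This is a minimal Python illustration, not the shader from the talk: the function names are hypothetical, and the two local integers stand in for groupshared values that a compute shader would update with atomic min and max. It relies on the fact that non-negative IEEE 754 floats keep their ordering when their bits are read as unsigned integers.

```python
import struct

def float_as_uint(x: float) -> int:
    """Reinterpret the bits of a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def uint_as_float(u: int) -> float:
    """Reinterpret a 32-bit unsigned integer as a float."""
    return struct.unpack("<f", struct.pack("<I", u))[0]

def tile_depth_bounds_atomic(depths):
    """Emulate per-tile atomic min/max on non-negative depth values."""
    shared_min = 0xFFFFFFFF   # stand-in for a groupshared uint
    shared_max = 0            # stand-in for a groupshared uint
    for d in depths:          # one depth sample per "thread"
        bits = float_as_uint(d)            # reinterpret as uint
        shared_min = min(shared_min, bits) # atomic min in a real shader
        shared_max = max(shared_max, bits) # atomic max in a real shader
    # reinterpret back to float
    return uint_as_float(shared_min), uint_as_float(shared_max)

bounds = tile_depth_bounds_atomic([0.25, 0.5, 0.125, 0.75])
```

The integer comparison is only valid for non-negative inputs, which holds for a depth buffer in the 0 to 1 range.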
Parallel Reduction
●Atomics are useful but not efficient
●Compute-friendly algorithm
●Great material already available:
●“Optimizing Parallel Reduction in CUDA” [Harris07]
●“Compute Shader Optimizations for AMD GPUs: Parallel Reduction” [Engel14]
5 9 8 6 9 5 5 3 9 7 1 8 2 8 4 6
depth[tid] = min(depth[tid],depth[tid+8])
5 7 1 6 2 5 4 3
depth[tid] = min(depth[tid],depth[tid+4])
2 5 1 3
depth[tid] = min(depth[tid],depth[tid+2])
1 3
depth[tid] = min(depth[tid],depth[tid+1])
1
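The stride-halving steps above can be sketched as a small CPU emulation. This is an illustrative Python version under stated assumptions: the function name `reduction_steps` is hypothetical, the list stands in for groupshared memory, the element count is assumed to be a power of two, and the inner loop plays the role of threads that a real shader would run in lockstep with a group barrier between steps.

```python
def reduction_steps(values):
    """Min-reduce a power-of-two sized list, recording each step's result."""
    depth = list(values)        # stand-in for groupshared memory
    stride = len(depth) // 2    # assumes len(values) is a power of two
    steps = []
    while stride >= 1:
        # Each "thread" tid < stride reads one element from the upper half
        # and writes to the lower half, so reads and writes never overlap.
        for tid in range(stride):
            depth[tid] = min(depth[tid], depth[tid + stride])
        steps.append(depth[:stride])  # the still-active part of the array
        stride //= 2
    return steps

samples = [5, 9, 8, 6, 9, 5, 5, 3, 9, 7, 1, 8, 2, 8, 4, 6]
steps = reduction_steps(samples)
tile_min = steps[-1][0]
```

Sixteen samples take four steps (strides 8, 4, 2, 1), and the final single-element list holds the tile minimum. The same loop with `max` gives the other bound.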
Implementation details
●First pass reads 4 depth samples
●Needs to be separate pass
●Write bounds to UAV
●Maybe useful for other things too
Parallel Reduction - Performance
GPU               Atomic Min/Max   Parallel Reduction
AMD R9 290X       1.8ms            1.60ms
NVIDIA GTX 980    1.8ms            1.54ms
● Combined cost of depth bounds and light culling of 2048 lights at 3840x2160
● Parallel reduction pass takes ~0.35ms
● Faster than Atomic Min/Max on the GPUs tested
Light Culling:
The Intersection Test
Sphere-Frustum Test
AABB around Frustum
Frustum planes
AABB around long frustum
AABB around short frustum
Arvo Intersection Test [Arvo90]
Single Point Light
Frustum/Sphere Test
Arvo AABB/Sphere Test
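For readers unfamiliar with the box-sphere test from [Arvo90], a minimal Python sketch is given here. The function name and the tuple-based vectors are illustrative choices, not code from the talk. The idea is to accumulate, per axis, the squared distance from the sphere centre to the box and compare the total against the squared radius.

```python
def sphere_intersects_aabb(center, radius, box_min, box_max):
    """Return True if a sphere overlaps an axis-aligned box."""
    dist_sq = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:            # centre is below the box on this axis
            dist_sq += (lo - c) ** 2
        elif c > hi:          # centre is above the box on this axis
            dist_sq += (c - hi) ** 2
        # otherwise the centre lies inside the slab: no contribution
    return dist_sq <= radius * radius

box_min, box_max = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
touching = sphere_intersects_aabb((2.0, 0.5, 0.5), 1.0, box_min, box_max)
corner_miss = sphere_intersects_aabb((2.0, 2.0, 2.0), 1.0, box_min, box_max)
```

Unlike testing a sphere against six planes one at a time, this test does not report a hit for a sphere that sits diagonally off a box corner, as the second call illustrates.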
Culling Spot Lights
●Don’t put bounding
sphere around spot light
origin
●Tightly bound spot light
inside sphere at P with
radius r
[Figure labels: spot position, P, r, θ, d]
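One common way to compute such a sphere is sketched below in Python. It is an assumption that this matches the slide's construction: here θ is taken to be the cone half-angle (below 90 degrees), d the light range measured from the spot position, and `direction` a unit vector along the cone axis. The function name is hypothetical.

```python
import math

def spot_bounding_sphere(apex, direction, d, theta):
    """Return (P, r): a sphere enclosing a spot light's cone.

    apex: spot position; direction: unit axis vector;
    d: range from the apex; theta: half-angle in radians (< pi/2).
    """
    if theta > math.pi / 4:
        # Wide cone: centre on the rim plane, radius reaches the rim.
        offset = d * math.cos(theta)
        r = d * math.sin(theta)
    else:
        # Narrow cone: sphere passes through the apex and the rim.
        r = d / (2.0 * math.cos(theta))
        offset = r
    p = tuple(a + offset * n for a, n in zip(apex, direction))
    return p, r

narrow_p, narrow_r = spot_bounding_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 4.0, 0.0)
wide_p, wide_r = spot_bounding_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.0, math.pi / 3)
```

For a narrow cone the radius approaches d/2, half of what a sphere of radius d centred on the spot position would need, which is why the tighter fit culls better.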
Depth Discontinuities
False Positives
Scene Geometry
2.5D Culling [Harada12]
[Figure: Scene Geometry, Geometry Mask (set bits shown as 1s), Light Mask]
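The mask comparison can be sketched as follows. This is a minimal Python illustration rather than the published implementation: it assumes the tile's depth range is split into 32 equal cells (one bit each), that depths are linear view-space values, and that the light has already passed the tile's min/max depth test. All names are hypothetical.

```python
NUM_CELLS = 32  # assumption: one 32-bit mask per tile

def cell_index(z, min_z, max_z):
    """Map a depth to one of NUM_CELLS equal cells between min_z and max_z."""
    extent = max(max_z - min_z, 1e-6)       # guard against a flat tile
    t = (z - min_z) / extent
    return min(NUM_CELLS - 1, max(0, int(t * NUM_CELLS)))

def geometry_mask(depths, min_z, max_z):
    """Set one bit for every cell that contains at least one depth sample."""
    mask = 0
    for z in depths:
        mask |= 1 << cell_index(z, min_z, max_z)
    return mask

def light_mask(light_z, radius, min_z, max_z):
    """Set the bits for every cell the light's depth extent covers."""
    first = cell_index(light_z - radius, min_z, max_z)
    last = cell_index(light_z + radius, min_z, max_z)
    return ((1 << (last - first + 1)) - 1) << first

depths = [1.0, 1.2, 9.0, 10.0]              # a tile with a depth discontinuity
geom = geometry_mask(depths, 1.0, 10.0)
light_in_gap = light_mask(5.0, 1.0, 1.0, 10.0) & geom
light_on_wall = light_mask(9.5, 0.5, 1.0, 10.0) & geom
```

A light is kept only if the bitwise AND of the two masks is non-zero, so a light floating in the empty gap between foreground and background is rejected.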
HalfZ
[Figure: Scene Geometry with MinZ, HalfZ and MaxZ marked]
16 bit light index buffer, size: maxLightsPerTile x 2 + 4
● HalfZ low bits
● HalfZ high bits
● numLights near side
● numLights far side
● light indices…
(example values shown in the figure: lo, hi, 3, 4)
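The buffer layout above can be sketched in Python as follows. The ordering of the near and far index lists after the header, and the representation of each light as a depth extent, are assumptions made for illustration; the function names are hypothetical.

```python
import struct

def build_halfz_light_list(half_z, lights, max_lights_per_tile):
    """Pack one tile's HalfZ light list into 16-bit words.

    lights: iterable of (index, z_near, z_far) depth extents that already
    passed the tile's frustum test.
    """
    bits = struct.unpack("<I", struct.pack("<f", half_z))[0]
    near = [i for i, z0, z1 in lights if z0 <= half_z][:max_lights_per_tile]
    far = [i for i, z0, z1 in lights if z1 >= half_z][:max_lights_per_tile]
    header = [bits & 0xFFFF,   # HalfZ low bits
              bits >> 16,      # HalfZ high bits
              len(near),       # numLights near side
              len(far)]        # numLights far side
    return header + near + far  # assumed order: near indices, then far

def unpack_half_z(buffer):
    """Rebuild the float HalfZ from the first two 16-bit words."""
    bits = buffer[0] | (buffer[1] << 16)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

tile_lights = [(7, 0.1, 0.3), (9, 0.4, 0.6), (12, 0.7, 0.9)]
buffer = build_halfz_light_list(0.5, tile_lights, max_lights_per_tile=8)
```

Light 9 straddles HalfZ and so appears in both lists, which is why the buffer needs room for maxLightsPerTile x 2 indices plus the four header words.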
Modified HalfZ
[Figure: depth range with MinZ, MaxZ, HalfZ, MinZ2 and MaxZ2 marked]
●Calculate Min & Max Z as normal
●Calculate HalfZ
●Calculate a second set of bounds: MaxZ2 is the max depth on the near side of HalfZ,
and MinZ2 is the min depth on the far side
●Test against near bounds and far bounds
●Write to either one list
●Or write to two lists cf. HalfZ
●Doubles the work in the depth bounds pass
●Worst case converges on HalfZ
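A minimal Python sketch of the bounds calculation follows. It reflects one reading of the bullets above, in which the near range runs from MinZ to MaxZ2 and the far range from MinZ2 to MaxZ; the function names and the tuple representation are hypothetical.

```python
def modified_halfz_bounds(depths):
    """Return ((MinZ, MaxZ2), (MinZ2, MaxZ)) for one tile's depth samples."""
    min_z, max_z = min(depths), max(depths)
    half_z = 0.5 * (min_z + max_z)
    # MaxZ2: farthest sample on the near side of HalfZ (MinZ always qualifies).
    max_z2 = max(z for z in depths if z <= half_z)
    # MinZ2: nearest sample on the far side of HalfZ (MaxZ always qualifies).
    min_z2 = min(z for z in depths if z >= half_z)
    return (min_z, max_z2), (min_z2, max_z)

def light_overlaps(z0, z1, bounds):
    """Depth-only test of a light extent [z0, z1] against both ranges."""
    return any(z0 <= hi and z1 >= lo for lo, hi in bounds)

split_tile = modified_halfz_bounds([1.0, 1.5, 9.0, 10.0])
ramp_tile = modified_halfz_bounds([1.0, 2.0, 3.0, 4.0, 5.0])
```

In the tile with a depth discontinuity the two ranges hug the foreground and background, so a light in the gap is rejected. In the evenly filled tile both ranges meet at HalfZ, which is the worst case mentioned above.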
Sponza Atrium + 1 million sub pixel triangles
MinMax depth bounds, Frustum culling
MinMax depth bounds, AABB culling
MinMax depth bounds, Hybrid culling (AABB + Frustum sides)
Modified HalfZ depth bounds, AABB culling
Unreal Engine 4, Infiltrator Demo
Modified HalfZ in one light list
MinMax Depth Bounds
What happens if we cull 32x32 tiles?
Still using 16x16 thread groups
Culling Conclusion
●Modified HalfZ with AABBs generally works best
●Even though generating MinZ2 and MaxZ2 adds a little cost
●Even though each light is culled against two AABBs instead of one
●32x32 tiles save a good chunk of time in the culling stage
●…at the cost of color pass efficiency when pushing larger numbers of lights
Clustered Rendering [Olsson et al12]
●Production proven in Forza Horizon 2
●Additional benefits on top of 2D
culling:
●No mandatory Z prepass
●Just works™ for transparencies and
volumetric effects
●Can a further reduction in lights per
pixel improve performance?
Clustered Rendering 101
● Divide screen
into tiles
● Fit asymmetric
frustum around
each tile
Tile0 Tile1 Tile2 Tile3
● Divide down Z
axis into n
slices or
clusters
Clustered Rendering
●Divide up Z axis
exponentially
●Start at some sensible
near slice
●Cap at some sensible
value
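One common way to realise an exponential split is sketched below in Python. The exact distribution used in the talk is not stated, so the formula here is an assumption: slice boundaries grow by a constant ratio between a chosen near distance and a capped far distance. The function and parameter names are hypothetical.

```python
import math

def slice_boundary(k, near, far, num_slices):
    """View-space depth where slice k begins (k = num_slices gives far)."""
    return near * (far / near) ** (k / num_slices)

def slice_index(z, near, far, num_slices):
    """Slice containing view-space depth z, clamped to the valid range."""
    if z <= near:
        return 0                      # everything in front of the first slice
    if z >= far:
        return num_slices - 1         # everything beyond the cap
    k = int(math.log(z / near) / math.log(far / near) * num_slices)
    return min(k, num_slices - 1)

near, far, n = 1.0, 256.0, 8          # boundaries at roughly 1, 2, 4, ... 256
third_boundary = slice_boundary(3, near, far, n)
indices = [slice_index(z, near, far, n) for z in (0.5, 3.0, 100.0, 1000.0)]
```

Starting the first slice at a sensible distance rather than at the camera near plane avoids spending many thin slices on the first few centimetres, and the cap keeps the last slices from becoming very long.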
Provision for far lights
● Fade them out
● Drop back to glares
● Prebake
Light Culling
●View space AABBs worked best on
2D grid
●But they are a poor fit when running, say, 16 slices
●View space frustum planes are
better
●Calculate per tile planes
●Then test each slice near and far
●Optionally, then test AABBs
VRAM Usage
●16x16 pixel 2D grid requires numTilesX x numTilesY x
maxLights entries
●1080p: 120x68x512xuint16 = 8MB
●4k: 240x135x512xuint16 = 32MB
●List for each light type (points & spots): 64MB
●So 32 slices: 1GB for point lights only
●Either use coarser grid
●Or use a compacted list
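The figures above follow from simple arithmetic, reproduced here as a short Python check. The helper name is hypothetical, and the tile counts are rounded up so that partial tiles at the screen edge are included.

```python
import math

def light_list_bytes(width, height, tile_size, max_lights,
                     slices=1, bytes_per_index=2):
    """Size of a fixed-capacity light index list for the whole screen."""
    tiles_x = math.ceil(width / tile_size)
    tiles_y = math.ceil(height / tile_size)
    return tiles_x * tiles_y * slices * max_lights * bytes_per_index

MIB = 1024 * 1024
size_1080p = light_list_bytes(1920, 1080, 16, 512)             # 120 x 68 tiles
size_4k = light_list_bytes(3840, 2160, 16, 512)                # 240 x 135 tiles
size_4k_32_slices = light_list_bytes(3840, 2160, 16, 512, slices=32)

for label, size in (("1080p", size_1080p), ("4k", size_4k),
                    ("4k, 32 slices", size_4k_32_slices)):
    print(f"{label}: {size / MIB:.1f} MiB")
```

The results come out at just under 8 MiB, just under 32 MiB and just under 1 GiB per light type, matching the rounded numbers on the slide.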
Compacted List
●Option 1:
●Do all culling on CPU [Olsson et al12] [Persson13][Dufresne14]
●But some of the lights may be spawned by the GPU
●My CPU is a precious resource!
● Option 2:
●Cull on GPU
●Keep track of how many lights per slice in TGSM
●Write table of offsets in light list header
●Only need maxLights x “safety factor” per tile
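Option 2 can be sketched on the CPU as follows. This Python version only illustrates the data layout under stated assumptions: per-slice counts are turned into an offset table by an exclusive prefix sum, the header carries one extra entry marking the end of the last slice, and the function names are hypothetical. On the GPU the counts would live in TGSM while the tile is being culled.

```python
def compact_cluster_lists(per_slice_lights):
    """Build (header, body) for one tile from its per-slice light lists."""
    counts = [len(s) for s in per_slice_lights]   # per-slice counters
    header, total = [], 0
    for c in counts:                              # exclusive prefix sum
        header.append(total)
        total += c
    header.append(total)                          # end marker for last slice
    body = [i for s in per_slice_lights for i in s]
    return header, body

def lights_for_slice(header, body, k):
    """Light indices stored for slice k of this tile."""
    return body[header[k]:header[k + 1]]

tile = [[3, 5], [], [5, 8, 9], [1]]               # four slices of one tile
header, body = compact_cluster_lists(tile)
slice_two = lights_for_slice(header, body, 2)
```

Light 5 overlaps two slices and is stored twice, so the body can exceed the number of distinct lights in the tile. That duplication is what the safety factor has to cover.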
Coarse Grid
●Example:
●4k resolution
●64x64 pixel tiles with 64 slices
●maxLights = 512
●60 x 34 tiles x 64 slices x 512 x
uint16 = 128MB
Z Prepass
●Very scene dependent
●Often considered too expensive
●DirectX 12 can help with draw submission cost
●Should already have a super optimized depth only path for
shadows!
● Position only streams
● Index buffer to batch materials together
●A partial prepass can really help lighten the geometry load
Conclusions
●Parallel Reduction - faster than atomic min/max
●AABB-Sphere test in conjunction with Modified HalfZ is a
good choice
●Clustered shading
●Potentially a big saving on the tile culling
●Less overhead for low light numbers
●Offers other benefits over 2D tiling
●Aggressive culling is very worthwhile
●The best optimisation for your expensive color scene
References
●[Andersson09] Johan Andersson, “Parallel Graphics in Frostbite – Current & Future”, Beyond
Programmable Shading, SIGGRAPH 2009
●[Harada et al12] Takahiro Harada, Jay McKee, Jason C Yang, “Forward+: Bringing Deferred
Lighting to the Next Level”, Eurographics 2012
●[Harris07] Mark Harris, “Optimizing Parallel Reduction in CUDA”, NVIDIA 2007
●[Engel14] Wolfgang Engel, “Compute Shader Optimizations for AMD GPUs: Parallel Reduction”,
Confetti 2014
●[Harada12] Takahiro Harada, “A 2.5D Culling for Forward+”, Technical Briefs, SIGGRAPH Asia
2012
●[Arvo90] Jim Arvo, “A simple method for box-sphere intersection testing”, Graphics Gems 1990
●[Dufresne14] Marc Fauconneau Dufresne, “Forward Clustered Shading”, Intel 2014
●[Persson13] Emil Persson, “Practical Clustered Shading”, Avalanche 2013
●[Olsson et al12] Ola Olsson, Markus Billeter, Ulf Assarsson, “Clustered Deferred and Forward
Shading”, HPG 2012
●[Schulz14] Nicolas Schulz, “Moving to the Next Generation – The Rendering Technology of
Ryse”, GDC 2014
Thanks
●Jason Stewart, AMD
●Epic Rendering Team
●Emil Persson, Avalanche Studios
Questions?
gareth.thomas@amd.com

Editor's Notes

  • #15: Max Z might also be useful for transparent light list or tiled-based particle rendering
  • #19: Simple test, prone to false positives
  • #21: Alternative approach – use AABBs. Still prone to false positives.
  • #22: AABB case is much better when depth bounds are small, but bad with large depth discontinuities. AABB suggested by Brian Karis from Epic
  • #23: Martin Mittring from Epic initially implemented this method
  • #30: Doesn’t require any changes to colour pixel shaders. Just trims light lists. Not perfect – consider purple light.
  • #31: HalfZ requires a code change at in the colour pixel shaders to determine which light list to read
  • #32: UE4 uses just one list, so like 2.5D, requires no extra work in colour pass. Probably a good idea if the number of lights per tile is low.
  • #36: Better results for small depth ranges. Long frusta generate large AABBs
  • #40: UE4 Infiltrator Demo
  • #49: Add diagrams
  • #53: Show diagram of layout. Mention that some lights might overlap slices, hence the safety factor. Mention that TGSM needs to be kept under control or waves in flight will be reduced.
  • #61: Plug Jason’s GPU Pro article