Developing and optimizing a procedural game: The Elder Scrolls Blades- Unite 2019
Developing and Optimizing
a Procedural Game:
The Elder Scrolls: Blades
Simon-Pierre Thibault
Sergei Savchenko
Bethesda Game Studios
The Elder Scrolls: Blades
• Advanced visuals on mobile
platforms
• Procedural dungeons
• To achieve this:
• Built custom lighting solutions
• Significantly optimized for
performance and memory
Level Building
Challenges
Level Example
• Made of standalone building blocks:
Rooms
• Assembled by our dungeon generator at
runtime
• Rooms have different shapes and sizes
• Rooms are universal for each theme
• Rooms are streamed during gameplay to support larger levels
Room
Prefab that contains all the data required to
use that piece in a level
• Art Content
• Level design data
• Gameplay data
• Lighting
Level Lighting
High visual quality is an important pillar
for this project
• Based on the built-in render pipeline
• Uses a modified version of the Unity
Standard shader
• Mix of real-time and baked lighting
Lightmaps in a
Procedural Context
Lightmap Seams
• Lighting at connections differs from room to room
• Our global illumination is not really global…
• Rooms are attached at runtime, so we can’t predict at bake time which rooms will connect
• Solutions?
Lightmap Blending
Concept: Sample the lightmap on the other side of a connection to
blend with it
• Assign the secondary lightmap to the material
• For each vertex
• Assign a blend factor based on the distance to the connection (vertex color
alpha)
• Assign an extra UV set (the 3rd UV set) to sample the lightmap texture on the other side
• Edit the shader to sample the secondary lightmap
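A minimal sketch of the blend math described above, written in Java for illustration only; in the game the factor lives in vertex color alpha and the mix happens in the modified Standard shader. The linear fade and the names LightmapBlend, blendFactor and blend are assumptions; the speaker notes only say the blend starts at 50-50 at the connection and fades out with distance.

```java
/**
 * Hypothetical sketch of the lightmap blend math. In the game the factor is stored in
 * vertex color alpha and the mix happens per pixel in the shader; here both steps are
 * plain Java for illustration.
 */
final class LightmapBlend {
    private LightmapBlend() {}

    /**
     * Weight of the secondary (other room's) lightmap for a vertex: 0.5 at the connection,
     * fading linearly (an assumption) to 0 at blendDistance and beyond.
     */
    static float blendFactor(float distanceToConnection, float blendDistance) {
        float t = distanceToConnection / blendDistance;
        t = Math.max(0f, Math.min(1f, t));
        return 0.5f * (1f - t);
    }

    /** Linear interpolation of two RGB lightmap samples by the blend factor. */
    static float[] blend(float[] primaryRgb, float[] secondaryRgb, float factor) {
        float[] out = new float[3];
        for (int i = 0; i < 3; i++) {
            out[i] = primaryRgb[i] + (secondaryRgb[i] - primaryRgb[i]) * factor;
        }
        return out;
    }
}
```

Because both rooms use a factor of 0.5 at the seam, each side resolves to the average of the two lightmaps exactly at the connection, so the cut disappears.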
Prototype Results
Connection Extension Mesh
• Generate lightmap information beyond
the connection
• Never rendered in-game
• Used at runtime to find a proper color
for blending
• Avoids stretching the color at the connection
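The speaker notes describe finding, for each vertex near a connection, the closest point on the extension mesh to get its secondary lightmap UV. A much-simplified sketch, assuming each extension patch is a flat rectangle with orthogonal edges whose lightmap UVs cover an axis-aligned UV rectangle (ExtensionPatch and secondaryUv are hypothetical names):

```java
/**
 * Hypothetical, simplified extension patch: a planar rectangle (origin + two orthogonal
 * edge vectors) whose lightmap UVs are assumed to span an axis-aligned UV rectangle.
 */
final class ExtensionPatch {
    private final double[] origin, edgeU, edgeV; // world space
    private final double[] uvMin, uvMax;         // lightmap UV rectangle of the patch

    ExtensionPatch(double[] origin, double[] edgeU, double[] edgeV, double[] uvMin, double[] uvMax) {
        this.origin = origin;
        this.edgeU = edgeU;
        this.edgeV = edgeV;
        this.uvMin = uvMin;
        this.uvMax = uvMax;
    }

    private static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    private static double clamp01(double x) {
        return Math.max(0.0, Math.min(1.0, x));
    }

    /**
     * Secondary lightmap UV for a vertex: project onto the rectangle and clamp to its bounds
     * (this gives the closest point when edgeU and edgeV are orthogonal), then map the
     * normalized (s, t) coordinates into the patch's UV rectangle.
     */
    double[] secondaryUv(double[] vertex) {
        double[] d = { vertex[0] - origin[0], vertex[1] - origin[1], vertex[2] - origin[2] };
        double s = clamp01(dot(d, edgeU) / dot(edgeU, edgeU));
        double t = clamp01(dot(d, edgeV) / dot(edgeV, edgeV));
        return new double[] {
            uvMin[0] + s * (uvMax[0] - uvMin[0]),
            uvMin[1] + t * (uvMax[1] - uvMin[1])
        };
    }
}
```

A real extension mesh would need a closest-point search over its triangles and interpolation of their lightmap UVs; the rectangle case only keeps the idea visible.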
Connection Extension Results
In Game Final Results
Limitations and Drawbacks
• Extra texture fetch per pixel rendered
• Large runtime cost to calculate secondary lightmap UVs for
each vertex near the connection
• Incompatible with Unity’s static batching
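The notes add that the secondary UV computation runs asynchronously on a background thread, which works because new rooms spawn out of the player’s view. A sketch of that hand-off using CompletableFuture, where AsyncUvBaker and the uvForVertex function are hypothetical:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Function;

/**
 * Hypothetical sketch: compute secondary lightmap UVs for a renderer's vertices on a
 * background executor. Only plain arrays cross threads.
 */
final class AsyncUvBaker {
    private AsyncUvBaker() {}

    static CompletableFuture<double[][]> computeAsync(double[][] vertices,
                                                      Function<double[], double[]> uvForVertex,
                                                      Executor background) {
        return CompletableFuture.supplyAsync(() -> {
            double[][] uvs = new double[vertices.length][];
            for (int i = 0; i < vertices.length; i++) {
                // The expensive closest-point lookup happens here, off the main thread.
                uvs[i] = uvForVertex.apply(vertices[i]);
            }
            return uvs;
        }, background);
    }
}
```

The resulting UV arrays would then be applied to the meshes back on the main thread, since Unity’s Mesh API is meant to be called from the main thread.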
Light Probes in a
Procedural Context
Light Probes
• Contain directional lighting information
for a point in space
• A built-in feature of Unity
• Allows dynamic objects to sample
baked lighting
• Doesn’t work with our procedural
pipeline
• Light probe data is saved only in scenes
• Light probes cannot be moved
Light Probes Pipeline
• Light probes generation (Editor) → Unity Renderer → Shader
• Custom runtime solution: replaces the Unity Renderer stage
Custom Light Probes Runtime System
• Write probe data to an asset we attach to each room
• Load light probes data for each room
• Attach probes for each room into a global probes network
• Sample the network for each dynamic renderer based on world
position
• Send the interpolated light probe information to the shader
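A sketch of the per-room probe asset, assuming a simple little-endian binary layout: probe count, then each probe’s room-local position and 27 SH floats (3 color channels x 9 coefficients). RoomProbeData, its layout and the translation-only placement are assumptions, not the game’s actual format:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical per-room probe asset: room-local positions plus 27 SH floats per probe. */
final class RoomProbeData {
    static final int SH_FLOATS = 27;
    final float[][] positions; // [probe][3], room-local
    final float[][] sh;        // [probe][27]

    RoomProbeData(float[][] positions, float[][] sh) {
        if (positions.length != sh.length) throw new IllegalArgumentException("length mismatch");
        this.positions = positions;
        this.sh = sh;
    }

    byte[] serialize() {
        int n = positions.length;
        ByteBuffer buf = ByteBuffer.allocate(4 + n * (3 + SH_FLOATS) * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(n);
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 3; k++) buf.putFloat(positions[i][k]);
            for (int k = 0; k < SH_FLOATS; k++) buf.putFloat(sh[i][k]);
        }
        return buf.array();
    }

    static RoomProbeData deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int n = buf.getInt();
        float[][] positions = new float[n][3];
        float[][] sh = new float[n][SH_FLOATS];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 3; k++) positions[i][k] = buf.getFloat();
            for (int k = 0; k < SH_FLOATS; k++) sh[i][k] = buf.getFloat();
        }
        return new RoomProbeData(positions, sh);
    }

    /** World position of probe i once the room is placed at roomOrigin (translation only, assumed). */
    float[] worldPosition(int i, float[] roomOrigin) {
        return new float[] {
            positions[i][0] + roomOrigin[0],
            positions[i][1] + roomOrigin[1],
            positions[i][2] + roomOrigin[2]
        };
    }
}
```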
Light Probes data
• The editor generates a
UnityEngine.Rendering.SphericalHarmonicsL2 instance for
each light probe.
• The shader expects data in a different format.
• This is explained in the Unity documentation:
• “The Unity shader code for reconstruction is found in UnityCG.cginc
and is using the method from Appendix A10 Shader/CPU code for
Irradiance Environment Maps from Peter-Pikes paper.”
• https://docs.unity3d.com/Manual/LightProbes-TechnicalInformation.html
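The Appendix A10 mapping can be sketched as follows. It turns 9 coefficients per color channel into the seven float4 values read by ShadeSH9 in UnityCG.cginc (unity_SHAr, unity_SHAg, unity_SHAb, unity_SHBr, unity_SHBg, unity_SHBb, unity_SHC). The constants and signs are transcribed from Sloan’s appendix; the coefficient order (L00, L1-1, L10, L11, L2-2, L2-1, L20, L21, L22) and feeding baked SphericalHarmonicsL2 values in directly are assumptions to check against ShadeSH9. ShToShaderConstants is a hypothetical name.

```java
/** Hypothetical sketch of Sloan's Appendix A10 constant setup, ported to Java. */
final class ShToShaderConstants {
    private ShToShaderConstants() {}

    /**
     * @param sh sh[channel][coef]: channels R, G, B; 9 coefficients each
     * @return 7 float4 rows: SHAr, SHAg, SHAb, SHBr, SHBg, SHBb, SHC
     */
    static float[][] convert(float[][] sh) {
        double sqrtPi = Math.sqrt(Math.PI);
        float c0 = (float) (1.0 / (2.0 * sqrtPi));
        float c1 = (float) (Math.sqrt(3.0) / (3.0 * sqrtPi));
        float c2 = (float) (Math.sqrt(15.0) / (8.0 * sqrtPi));
        float c3 = (float) (Math.sqrt(5.0) / (16.0 * sqrtPi));
        float c4 = 0.5f * c2;

        float[][] out = new float[7][];
        for (int c = 0; c < 3; c++) {
            float[] s = sh[c];
            // Constant + linear terms: the shader dots these with float4(normal, 1).
            out[c] = new float[] { -c1 * s[3], -c1 * s[1], c1 * s[2], c0 * s[0] - c3 * s[6] };
            // Quadratic terms: dotted with normal.xyzz * normal.yzzx.
            out[c + 3] = new float[] { c2 * s[4], -c2 * s[5], 3.0f * c3 * s[6], -c2 * s[7] };
        }
        // Final quadratic term: scaled by (x*x - y*y) in the shader.
        out[6] = new float[] { c4 * sh[0][8], c4 * sh[1][8], c4 * sh[2][8], 1.0f };
        return out;
    }
}
```

The constants c0 to c4 fold the SH basis normalization together with the cosine-lobe convolution, which is why the shader can evaluate diffuse lighting with a few dot products.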
Writing the shader
properties
Use MaterialPropertyBlocks to fill in shader
properties without going through a
material.
https://docs.unity3d.com/ScriptReference/MaterialPropertyBlock.html
Use the [PerRendererData] attribute on your
shader properties
https://docs.unity3d.com/Manual/SL-Properties.html
Set each renderer’s lightProbeUsage to LightProbeUsage.CustomProvided (added in Unity 2018.1)
Considerations
• Sampling the network for a large amount of renderers can be
expensive
• Only sample once for static objects
• Only update dynamic objects if they have moved
• Because we place probes on a grid, we can sample the network in constant time
• Unity handles ambient color through the same shader
properties
• Ambient color needs to be added to the probe's values
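A sketch of the grid-based network and the caching rules above. ProbeGrid, CachedProbeSample, the hash-map storage and the renormalization over missing corners are assumptions; the slides only state that probes sit on a grid and that lookups are constant time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical global probe network on a uniform grid; each probe holds 27 SH floats. */
final class ProbeGrid {
    static final int SH_FLOATS = 27;
    private final float cellSize;
    private final Map<Long, float[]> probes = new HashMap<>();

    ProbeGrid(float cellSize) { this.cellSize = cellSize; }

    // Packs three grid coordinates (each within +/- 2^20) into one long key.
    private static long key(int x, int y, int z) {
        return ((long) (x & 0x1FFFFF) << 42) | ((long) (y & 0x1FFFFF) << 21) | (z & 0x1FFFFF);
    }

    /** Adds a probe (e.g. from a room that was just attached) at integer grid coordinates. */
    void addProbe(int gx, int gy, int gz, float[] sh) { probes.put(key(gx, gy, gz), sh); }

    /**
     * Trilinear interpolation of the 8 probes around the position: constant time regardless
     * of probe count. Missing corners are skipped and the remaining weights renormalized.
     */
    float[] sample(float wx, float wy, float wz) {
        float fx = wx / cellSize, fy = wy / cellSize, fz = wz / cellSize;
        int x0 = (int) Math.floor(fx), y0 = (int) Math.floor(fy), z0 = (int) Math.floor(fz);
        float tx = fx - x0, ty = fy - y0, tz = fz - z0;
        float[] out = new float[SH_FLOATS];
        float weightSum = 0f;
        for (int corner = 0; corner < 8; corner++) {
            int dx = corner & 1, dy = (corner >> 1) & 1, dz = (corner >> 2) & 1;
            float[] p = probes.get(key(x0 + dx, y0 + dy, z0 + dz));
            if (p == null) continue;
            float w = (dx == 1 ? tx : 1f - tx) * (dy == 1 ? ty : 1f - ty) * (dz == 1 ? tz : 1f - tz);
            for (int i = 0; i < SH_FLOATS; i++) out[i] += w * p[i];
            weightSum += w;
        }
        if (weightSum > 0f) {
            for (int i = 0; i < SH_FLOATS; i++) out[i] /= weightSum;
        }
        return out;
    }
}

/** Per-renderer cache: static objects sample once; dynamic objects resample only after moving. */
final class CachedProbeSample {
    private float lastX, lastY, lastZ;
    private float[] sh;

    float[] get(ProbeGrid grid, float x, float y, float z) {
        if (sh == null || x != lastX || y != lastY || z != lastZ) {
            sh = grid.sample(x, y, z);
            lastX = x;
            lastY = y;
            lastZ = z;
        }
        return sh;
    }
}
```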
Performance…
Blades’ Frame:
CPU-side rendering: main performance drivers:
• # of draw calls
• # of rendering passes
• Efficiency of batching
• Efficiency of the Graphics API
Reducing the number of draw calls
• Dynamic visibility culling
• Portal based strategy for
dungeons
• Distance based strategy for
forests
• Dynamic occlusion culling for
the town
• Buildings as occluders
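A simplified sketch of two of the culling strategies named above, under stated assumptions: portal culling reduced to a graph traversal where a caller-supplied predicate decides whether a portal is visible (a full implementation would also narrow the view frustum through each portal), and distance culling as a squared-distance test. PortalCulling and its parameters are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntPredicate;

/** Hypothetical, simplified visibility helpers for dungeons (portals) and forests (distance). */
final class PortalCulling {
    private PortalCulling() {}

    /** Portal i connects rooms portalRoomA[i] and portalRoomB[i]. */
    static Set<Integer> visibleRooms(int cameraRoom, int[] portalRoomA, int[] portalRoomB,
                                     IntPredicate portalVisible) {
        Set<Integer> visible = new HashSet<>();
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        visible.add(cameraRoom);
        queue.add(cameraRoom);
        while (!queue.isEmpty()) {
            int room = queue.poll();
            for (int p = 0; p < portalRoomA.length; p++) {
                int other;
                if (portalRoomA[p] == room) other = portalRoomB[p];
                else if (portalRoomB[p] == room) other = portalRoomA[p];
                else continue;
                // Walk through the portal only if it is visible and the room is new.
                if (portalVisible.test(p) && visible.add(other)) queue.add(other);
            }
        }
        return visible;
    }

    /** Distance-based culling: compare squared distances to avoid a square root. */
    static boolean withinDrawDistance(float[] camera, float[] object, float maxDistance) {
        float dx = object[0] - camera[0];
        float dy = object[1] - camera[1];
        float dz = object[2] - camera[2];
        return dx * dx + dy * dy + dz * dz <= maxDistance * maxDistance;
    }
}
```

Scanning every portal per room keeps the sketch short; an adjacency list per room would scale better.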
Occlusion Culling
Faster Graphics APIs? (OpenGL ES vs. Vulkan)
OpenGL vs Vulkan:
| Chipset | CPU | GPU | 32/ogl Loading | 32/ogl Sim Time | 32/ogl Warm-up Issue | 64/vkn Loading | 64/vkn Sim Time | 64/vkn Warm-up Issue |
|---|---|---|---|---|---|---|---|---|
| Exynos 8895 Octa | Octa-core (4x2.3 GHz Mongoose M2 & 4x1.7 GHz Cortex-A53) | Mali-G71 MP20 | 25.9 | 0.041 | no | 28.1 | 0.048 | no |
| Qualcomm MSM8998 Snapdragon 835 | Octa-core (4x2.35 GHz Kryo & 4x1.9 GHz Kryo) | Adreno 540 | 29.3 | 0.0352 | yes | 21.3 | 0.0357 | no |
| Qualcomm SDM845 Snapdragon 845 | Octa-core (4x2.8 GHz Kryo 385 Gold & 4x1.7 GHz Kryo 385 Silver) | Adreno 630 | 21.8 | 0.0347 | yes | 17.6 | 0.0349 | no |
| Qualcomm MSM8998 Snapdragon 835 | Octa-core (4x2.35 GHz Kryo & 4x1.9 GHz Kryo) | Adreno 540 | 29.5 | 0.0354 | yes | 20.4 | 0.0348 | no |
| HiSilicon Kirin 970 | Octa-core (4x2.4 GHz Cortex-A73 & 4x1.8 GHz Cortex-A53) | Mali-G72 MP12 | 20.8 | 0.092 | no | 23.6 | 0.14 | no |
Dynamic Graphics API Selection
• In CustomPlayerActivity.onCreate: if the device is, e.g., some Mali-based device, force OpenGL ES:
this.getIntent().putExtra("unity", "-force-gles");
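A sketch of a selection policy that could sit in front of that putExtra call. In the OpenGL vs Vulkan measurements, both Mali devices had longer sim times on 64/vkn, while the Adreno devices showed warm-up issues only on 32/ogl; the match list below is an assumption drawn from that, and how the GPU name is obtained is left out. GraphicsApiPolicy is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical policy: force OpenGL ES on GPUs where Vulkan measured slower,
 * otherwise keep Unity's default graphics API choice.
 */
final class GraphicsApiPolicy {
    private GraphicsApiPolicy() {}

    // Assumed match list, based on the Mali devices in the measurements; tune per project.
    private static final List<String> FORCE_GLES_GPUS = List.of("mali-g71", "mali-g72");

    static boolean shouldForceGles(String gpuName) {
        if (gpuName == null) return false;
        String name = gpuName.toLowerCase(Locale.ROOT);
        for (String pattern : FORCE_GLES_GPUS) {
            if (name.contains(pattern)) return true;
        }
        return false;
    }

    /** Value for the "unity" intent extra, or null to keep the default command line. */
    static String unityCommandLine(String gpuName) {
        return shouldForceGles(gpuName) ? "-force-gles" : null;
    }
}
```

Inside the activity it might be used as `String args = GraphicsApiPolicy.unityCommandLine(gpuName); if (args != null) getIntent().putExtra("unity", args);`, presumably before calling super.onCreate() so that Unity sees the extra when it starts.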
To Summarize:
• Vulkan is definitely viable and fast
• Your game’s performance needs to be tested on Adrenos and
Malis
• Use dynamic graphics API selection to try to get gains on
both sides
Threads and Cores:
(diagram: Core 0 to Core 7, with the threads to place on them: Workers, UnityMain and UnityGFXDeviceW; which core runs which thread?)
Thread Affinity
(diagram: Workers, UnityMain and UnityGFXDeviceW, with their core assignment in question)
Default Affinity Model for 8 cores:
(diagram: Core 0 to Core 7 with the Workers, UnityMain and UnityGFXDeviceW threads)
Customized Affinity Model for 8 cores:
(diagram: Core 0 to Core 7; UnityMain and UnityGFXDeviceW on their own cores, Workers split into two groups)
With Affinities Adjusted:
(results chart)
To Summarize:
• On Android devices, Unity prefers big cores, which is not necessarily beneficial in every scenario
• Affinities can be adjusted on initialization to assign UnityMain and Rendering
threads to their own cores and to redistribute workers
• One should not expect a major performance gain from this (but the cost of this
change is also low)
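A sketch of how the customized affinity model could be computed. It only produces a plan; pinning threads would need native code such as Linux’s sched_setaffinity, which is not shown. AffinityPlanner, the worker names, and the example’s assumption that cores 4 to 7 are the big cores are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical planner: UnityMain and the render thread (UnityGFXDeviceW) each get a
 * dedicated big core; worker threads may run on all remaining big and little cores.
 */
final class AffinityPlanner {
    private AffinityPlanner() {}

    static Map<String, List<Integer>> plan(List<Integer> bigCores, List<Integer> littleCores, int workerCount) {
        if (bigCores.size() < 2) {
            throw new IllegalArgumentException("need at least two big cores");
        }
        List<Integer> shared = new ArrayList<>(bigCores.subList(2, bigCores.size()));
        shared.addAll(littleCores);
        if (workerCount > 0 && shared.isEmpty()) {
            throw new IllegalArgumentException("no cores left for workers");
        }
        Map<String, List<Integer>> plan = new LinkedHashMap<>();
        plan.put("UnityMain", List.of(bigCores.get(0)));
        plan.put("UnityGFXDeviceW", List.of(bigCores.get(1)));
        List<Integer> workerCores = List.copyOf(shared);
        for (int i = 0; i < workerCount; i++) {
            plan.put("Worker" + i, workerCores); // the OS balances workers within this set
        }
        return plan;
    }
}
```

With plan(List.of(4, 5, 6, 7), List.of(0, 1, 2, 3), 4), UnityMain gets core 4, UnityGFXDeviceW gets core 5, and each worker gets the set {6, 7, 0, 1, 2, 3}.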
Memory…
• Xcode reports 1.18 GB used…
• Unity’s simple memory profiler says 451.4 MB used…
• Unity’s detailed view says ~569 MB used…
• Instruments says 786.86 MB resident, ~650 MB dirty…
Memory Types
(diagram labels: Device Virtual Memory; External; Device Resident Memory; Compressed; Internal; Clean, Unloaded; Native memory; Managed Heap; The Game; Reusable)
iOS Memory Tracking
• Use the task_info call to get an accurate summary of current memory use
iOS Memory Tracking
(chart: Device Memory, Managed Heap, Memory Compression)
To Summarize
• Different tools account for different subsets of memory types when reporting (the Xcode memory gauge is the most relevant)
• iOS devices have a complex memory management system with many memory types
• Current memory use can be fetched from the OS
Boehm GC Heap Growth:
(chart: heap expansions, unused space)
Heap Growth Control:
• Various globals of the Boehm-Demers GC implementation can be declared extern and accessed
• Specifically GC_free_space_divisor from alloc.c
• The divisor controls the managed heap’s growth increment (a larger value means smaller increments)
• Default value: 3
• Custom value, e.g.: 16
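To see why a larger divisor helps, here is a deliberately simplified growth model (an assumption, not Boehm GC’s exact expansion logic in alloc.c): each expansion adds heapSize / divisor until the heap can hold the required size. HeapGrowthModel is a hypothetical name; sizes are in MB.

```java
/**
 * Simplified model (assumed): every expansion grows the heap by heapSize / divisor
 * until it can hold the required size. Sizes are in MB.
 */
final class HeapGrowthModel {
    private HeapGrowthModel() {}

    static long finalHeapSize(long initialHeap, long required, int divisor) {
        if (initialHeap <= 0 || divisor <= 0) {
            throw new IllegalArgumentException("heap and divisor must be positive");
        }
        long heap = initialHeap;
        while (heap < required) {
            heap += Math.max(1, heap / divisor); // a larger divisor gives a smaller step
        }
        return heap;
    }

    /** Space reserved by the heap beyond what is required. */
    static long unusedSpace(long initialHeap, long required, int divisor) {
        return finalHeapSize(initialHeap, required, divisor) - required;
    }
}
```

Under this model, a 90 MB heap that needs to hold 100 MB jumps to 120 MB with the default divisor of 3 (20 MB unused), but with a divisor of 16 it grows in 5 MB steps and stops at exactly 100 MB.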
Low Level Memory API
• Memory Profiler API
permits capturing
snapshots for both
native and managed
memory
To Summarize
• Boehm GC can be tweaked at runtime so that the heap expands in smaller increments
• C# objects can carry significant memory overhead
• Low level memory API enables building custom tools for
memory tracking and leak detection
Questions?
Simon-Pierre Thibault
Sergei Savchenko
Bethesda Game Studios


Editor's Notes

  • #3: Intros
  • #4: How difficult is it to build a procedural game with advanced visuals for mobile platforms? We will try to offer some insights. Our experience comes from working on The Elder Scrolls: Blades, which reimagined TES for modern mobile platforms. Specifically, we will talk about precomputed lighting in a procedural context and some aspects of performance and optimization that we dealt with.
  • #6: This is an example of what a level looks like from the editor’s point of view. It is made of standalone blocks called rooms. The red lines represent connections between rooms. Rooms have different shapes and sizes, but they all fit together: any room can be attached to any other room of the same theme. They all fit on a grid that acts as a framework for the level generator. To support longer levels without using too much memory, rooms are streamed in as the player progresses through the level.
  • #7: Example of a single room. Rooms are prefabs that contain all the data we need to use that piece in a level.
  • #8: We use Unity’s built-in render pipeline; the Scriptable Render Pipeline was still experimental when we started the project. Real-time lighting looks great with the PBR shader, especially with high-definition normal maps, and allows dynamic shadows in key areas like this window. Baked global illumination (both lightmaps and light probes) adds depth to scenes with indirect lighting, and we can add a lot of baked lights without affecting performance.
  • #10: On this screenshot you can see a harsh cut produced by the lightmaps along the floor and wall where two different rooms connect. Since lightmaps are generated separately for each room, we can end up with dramatically different colors and brightness around room connections. Because we bake global illumination per room, our global illumination isn’t really global. We also can’t predict which rooms will get connected together, because they’re assembled randomly at runtime. Bake-time solutions… We concluded that we needed a runtime solution.
  • #11: That runtime solution is lightmap blending. The concept is rather simple: what if we could blend the lightmaps on each side of a connection to make the seam disappear? We could start with a 50-50 blend at the connection and then fade it out based on distance. You need a few things to achieve this. First, we add a secondary lightmap for blending to the materials on each side; each side needs the other side’s lightmap. Since a blend like this is applied per pixel, we assign a blend factor to each vertex. We also need a UV set to sample the secondary lightmap texture. Once we have that, we simply edit the shader to sample both lightmaps and blend them based on the blend factor. All of this setup can be done at runtime when attaching two rooms together.
  • #12: With this technique we end up using only the lightmap color at the seam and stretching it, so any local detail can get blown out of proportion, which doesn’t look good. The problem is that we don’t have enough information to blend that shadow correctly: we don’t know what the shadow in the blue room would have looked like if it had continued naturally. What if we could create that information?
  • #13: This is exactly what we did. We created a mesh that is precisely the shape of the connection but stretched based on the blending distance. It gets lightmapped by Unity at bake time, but we never actually render it in game; we only use its lightmap data. This gives us more accurate data for blending, so we no longer need to stretch anything.
  • #15: The harsh cut is gone and there are no noticeable artifacts. The light on the floor from that torch does fade fairly quickly at the connection. It’s not 100% accurate, but it’s an acceptable compromise.
  • #16: Because of the blending we have an additional texture fetch per pixel, but we only do that for static renderers near a connection, so the cost is not very significant. The worst performance problem comes from computing the UV set for the secondary lightmap. It involves finding the closest point on the extension mesh for each vertex of the blended renderers. These calculations can be heavy, so we run all of them asynchronously on a background thread. When streaming a level, new rooms are spawned outside the player’s view, so we don’t need to apply lightmap blending immediately. And because we modify meshes at runtime by adding an extra UV set, this doesn’t work with Unity’s static batching. You’ll have to either disable static batching on the meshes that are blended or disable the feature globally in your project settings.
  • #17: So this takes care of our biggest lightmap problem, but I mentioned earlier that we also use light probes as part of our lighting pipeline.
  • #18: Light probes are objects that contain directional lighting information for a specific point in space. They’re a built-in feature in Unity. If you set up a LightProbeGroup component in your scene, Unity will generate light probe information when you bake the lighting for that scene. At runtime they allow the engine to sample baked lighting information for dynamic objects. They’re complementary to lightmaps: lightmaps apply baked lighting to static objects, and light probes apply baked lighting to dynamic objects. The problem is that Unity’s light probe implementation doesn’t support our procedural pipeline. The first reason is that light probe data is stored only in scenes, but we use prefabs for our rooms. The second and main reason is that probe positions are set at bake time and cannot be moved. A room’s position in world space is not fixed in our case; we have to be able to spawn that room wherever our level generator tells us to.
  • #19: At a high level, this is how Unity’s light probe pipeline is separated. The generation part runs in the editor when baking lighting. The renderer samples light probes at runtime for each dynamic object: it finds the probes that surround the object’s position and interpolates their values. Then the shader uses that information to render the object. The only part that’s not compatible with our procedural pipeline is the renderer. So what if we just replaced it with our own custom implementation?
  • #20: So how would we do that? We write the light probe data generated by Unity to a binary file that we attach to each room prefab. When a new room is attached at runtime, we deserialize its probe data and add the probes to a global probe network for the level. Then, every frame, we sample the probe network to find probe information for each dynamic object based on its position, and we send that data over to the shader.
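The per-room probe file can be sketched as follows. The binary layout is a hypothetical format chosen for illustration, not the one Blades uses. It stores a little-endian probe count, then, for each probe, its room-local position followed by the 27 floats of an L2 spherical harmonics probe (3 color channels times 9 coefficients). `ProbeNetwork.attach_room` also assumes rooms are only translated; a rotated room would additionally need its SH coefficients rotated.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Tuple

SH_FLOATS = 27                    # 3 color channels x 9 L2 coefficients
_HEADER = struct.Struct("<I")     # probe count (hypothetical layout)
_PROBE = struct.Struct("<3f27f")  # room-local position, then SH coefficients


@dataclass
class Probe:
    position: Tuple[float, ...]   # (x, y, z)
    sh: Tuple[float, ...]         # channel-major: r0..r8, g0..g8, b0..b8


def serialize_probes(probes: List[Probe]) -> bytes:
    out = bytearray(_HEADER.pack(len(probes)))
    for probe in probes:
        if len(probe.sh) != SH_FLOATS:
            raise ValueError("expected 27 SH coefficients per probe")
        out += _PROBE.pack(*probe.position, *probe.sh)
    return bytes(out)


def deserialize_probes(data: bytes) -> List[Probe]:
    (count,) = _HEADER.unpack_from(data, 0)
    probes, offset = [], _HEADER.size
    for _ in range(count):
        values = _PROBE.unpack_from(data, offset)
        offset += _PROBE.size
        probes.append(Probe(values[:3], values[3:]))
    return probes


@dataclass
class ProbeNetwork:
    """Global, level-wide list of probes in world space."""
    probes: List[Probe] = field(default_factory=list)

    def attach_room(self, data: bytes, room_offset: Tuple[float, float, float]) -> None:
        # Translate room-local probe positions into world space (no rotation support).
        for probe in deserialize_probes(data):
            world = tuple(probe.position[i] + room_offset[i] for i in range(3))
            self.probes.append(Probe(world, probe.sh))
```

Storing positions in room-local space is what lets the same file serve any placement the level generator picks; the world-space conversion happens only when the room is attached.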
  • #21: How do we manage that data? Once lighting has been baked, the LightProbes class contains a SphericalHarmonicsL2 object for each probe in your scene. Under the hood this is just an array of floats, so it's easy to serialize. The main problem we faced here is that the shader expects a different format, and we didn't know how to translate a SphericalHarmonicsL2 object into that format. It turns out there's a very important line about this in the Unity documentation. It says: “The Unity shader code for reconstruction is found in UnityCG.cginc and is using the method from Appendix A10 Shader/CPU code for Irradiance Environment Maps from Peter-Pike's paper.” That links to a paper called Stupid Spherical Harmonics (SH) Tricks by Peter-Pike Sloan. Appendix A10 of that paper contains the precise formulas you need to feed a SphericalHarmonicsL2 object to Unity's Standard shader.
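The conversion can be sketched as follows. It mirrors the layout widely used in Unity code for turning a `SphericalHarmonicsL2` into the built-in pipeline's `unity_SHAr/g/b`, `unity_SHBr/g/b` and `unity_SHC` shader constants. It assumes the coefficients are in Unity's order (constant, y, z, x, xy, yz, 3z^2-1, xz, x^2-y^2) with normalization constants already folded in; verify this against `ShadeSH9` in UnityCG.cginc for your Unity version. The `ambient` parameter is a hypothetical hook that adds a flat ambient color to the constant term. `shade_sh9` re-evaluates the constants the way the shader does, which makes the mapping easy to check.

```python
from typing import Dict, Sequence, Tuple

Vec4 = Tuple[float, float, float, float]


def sh_to_shader_constants(sh: Sequence[Sequence[float]],
                           ambient: Tuple[float, float, float] = (0.0, 0.0, 0.0)) -> Dict[str, Vec4]:
    """sh[channel][i]: 3 channels (r, g, b) x 9 coefficients in Unity's order."""
    consts: Dict[str, Vec4] = {}
    for c, name in enumerate("rgb"):
        s = sh[c]
        # Linear terms swizzled to (x, y, z) plus the constant term; the -s[6]
        # compensates for evaluating 3z^2 - 1 as 3z^2 in the quadratic part.
        consts["unity_SHA" + name] = (s[3], s[1], s[2], s[0] - s[6] + ambient[c])
        # Quadratic terms paired with (xy, yz, z^2, zx).
        consts["unity_SHB" + name] = (s[4], s[5], 3.0 * s[6], s[7])
    # Last quadratic term (x^2 - y^2) for all three channels.
    consts["unity_SHC"] = (sh[0][8], sh[1][8], sh[2][8], 1.0)
    return consts


def shade_sh9(consts: Dict[str, Vec4], n: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Evaluates the constants for a unit normal n, following the shape of ShadeSH9."""
    x, y, z = n
    linear = (x, y, z, 1.0)
    quadratic = (x * y, y * z, z * z, z * x)
    x2_minus_y2 = x * x - y * y
    result = []
    for c, name in enumerate("rgb"):
        a, b = consts["unity_SHA" + name], consts["unity_SHB" + name]
        value = sum(a[i] * linear[i] for i in range(4))
        value += sum(b[i] * quadratic[i] for i in range(4))
        value += consts["unity_SHC"][c] * x2_minus_y2
        result.append(value)
    return tuple(result)
```

For example, a probe whose only non-zero coefficient is the 3z^2-1 term evaluates to 2 along +z and -1 along +x, as that basis function should.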
  • #22: But how exactly do you feed that data to the shader? The standard way of filling in shader properties is through materials, but probe values can be different for every single object, even if objects share the same material. We don't want to create a unique material for every object, because that would prevent batching. Unity has a feature for cases like this called MaterialPropertyBlock, which lets you fill in shader properties per object without going through a material. When using it, you also need to set the [PerRendererData] attribute on your shader property so that the engine knows the property will be filled in through a MaterialPropertyBlock. For probe data specifically, you should set the lightProbeUsage value of all your renderers to “Custom Provided”. This tells Unity that you're responsible for filling in the spherical harmonics, so Unity won't override your values.
  • #23: Overall, replacing that system worked well for us, but here are a few considerations. Sampling and interpolating probes for each dynamic object every frame can be pretty expensive, so you need a way to know which renderers will or won't move. Objects that don't move should sample the probes once when spawned and never again. Objects that do move should not sample the probes if they have not moved since the last sampling. In our case, we fill rooms with probes on a 3D grid, which gives us a regular layout for our probes. This is very helpful because it lets us find the closest probes to any point in space in constant time. Without a regular layout, finding the surrounding probes for an arbitrary position becomes a much more complex problem. Unity's default system solves it with tetrahedral tessellation, but thanks to the grid we didn't need to reimplement that on our side. Another thing to keep in mind is that the shader properties used for spherical harmonics also contain the scene's ambient color, so you'll need to handle that yourself by adding the ambient color to the probe values before sending them to the shader.
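A minimal sketch of the two ideas above. The first is constant-time lookup of the eight surrounding probes on a regular grid, blended with trilinear weights. The second is a cache so that a renderer only resamples when its position changes. The grid parameters and the flattened `values` layout are assumptions for illustration; any per-probe vector works, such as the 27 SH floats. A renderer that never moves samples only once, since its cached position always matches.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ProbeGrid:
    origin: Vec3                   # world position of probe (0, 0, 0)
    spacing: float                 # distance between neighboring probes
    dims: Tuple[int, int, int]     # probe count along x, y, z (each >= 2)
    values: List[Sequence[float]]  # per-probe data, index = x + nx * (y + ny * z)

    def _index(self, x: int, y: int, z: int) -> int:
        nx, ny, _ = self.dims
        return x + nx * (y + ny * z)

    def sample(self, p: Vec3) -> Tuple[float, ...]:
        # O(1): the containing cell comes straight from the position.
        cell, frac = [], []
        for axis in range(3):
            t = (p[axis] - self.origin[axis]) / self.spacing
            i = min(max(math.floor(t), 0), self.dims[axis] - 2)  # clamp to a valid cell
            cell.append(i)
            frac.append(min(max(t - i, 0.0), 1.0))
        result = [0.0] * len(self.values[0])
        for corner in range(8):  # trilinear blend of the 8 surrounding probes
            d = (corner & 1, (corner >> 1) & 1, (corner >> 2) & 1)
            weight = 1.0
            for axis in range(3):
                weight *= frac[axis] if d[axis] else 1.0 - frac[axis]
            probe = self.values[self._index(cell[0] + d[0], cell[1] + d[1], cell[2] + d[2])]
            for k, value in enumerate(probe):
                result[k] += weight * value
        return tuple(result)


class ProbeSampler:
    """Resamples a renderer only when its position changed since the last sample."""

    def __init__(self, grid: ProbeGrid) -> None:
        self.grid = grid
        self.sample_count = 0
        self._cache: Dict[int, Tuple[Vec3, Tuple[float, ...]]] = {}

    def get(self, renderer_id: int, position: Vec3) -> Tuple[float, ...]:
        cached = self._cache.get(renderer_id)
        if cached is not None and cached[0] == position:
            return cached[1]
        self.sample_count += 1
        values = self.grid.sample(position)
        self._cache[renderer_id] = (position, values)
        return values
```

Positions outside the grid are clamped to the nearest edge cell, which is one simple policy; a level-wide network spanning several rooms would need a lookup per room or a shared grid.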
  • #24: Not running at a high enough frame rate. Android might be more affected.
  • #25: CPU-side rendering is costly and can often be a bottleneck. The GPU side can be as well, but tile-based GPUs deal well with overdraw, and shader performance is generally easier to deal with. The number of draw calls is a big performance driver for the CPU and often for the GPU too. Things to control: the number of passes (shadows, depth pass, pixel lights, etc.), the efficiency of batching (static and dynamic), and the efficiency of the graphics API used…
  • #26: Culling is key to reducing the number of draw calls. A procedural game cannot use precomputed visibility culling, so we rely on dynamic visibility culling, with one strategy for interiors and another for exteriors. We also use dynamic occlusion culling for the town environment: buildings are natural occluders for other buildings, vegetation, decorations, NPCs, etc.
  • #27: The graphics API is a huge performance driver. Which is faster, OpenGL or Vulkan? OpenGL is a traditional API that maintains internal state, while Vulkan is much more modern, with many facilities for referencing state and rendering data.
  • #28: Performance depends on the graphics drivers, for both loading times and FPS. We discovered a shader cache in some OpenGL drivers. Vulkan is generally faster… but it can be slower than OpenGL on some devices.
  • #29: Dynamic graphics API selection, to try to get the best of both worlds. Unity cannot change the graphics API at runtime, but there is a provision for automatic selection and fallback, and the player can also be initialized with a different API. This requires a custom Unity build. To implement it, you need a custom player Activity (Java) that checks the device or SoC name and passes a flag to Unity.
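The selection logic itself can be small. A minimal sketch, assuming the custom Activity already has a SoC or device name string (for example from `android.os.Build` fields) and a deny list built from your own measurements. The list entries, the returned strings, and the way the result is passed to Unity are all hypothetical; in the real setup this check lives in the Java Activity, and Python is used here only for illustration.

```python
from typing import Iterable

# Hypothetical deny list: SoCs where your own tests found Vulkan slower or buggy.
VULKAN_DENYLIST = ("ExampleSoC-100", "examplesoc 200")


def normalize(name: str) -> str:
    """Case- and punctuation-insensitive key, so 'ExampleSoC-100' matches 'examplesoc100'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def choose_graphics_api(soc_name: str, vulkan_supported: bool,
                        denylist: Iterable[str] = VULKAN_DENYLIST) -> str:
    """Returns 'vulkan' or 'opengles'; the caller turns this into a startup flag."""
    if not vulkan_supported:
        return "opengles"
    denied = {normalize(entry) for entry in denylist}
    return "opengles" if normalize(soc_name) in denied else "vulkan"
```

A deny list rather than an allow list keeps Vulkan, which the notes describe as generally faster, as the default on devices nobody has measured yet.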
  • #30: Vulkan is viable: not many graphics artifacts, and faster overall. Beware the validation layer. Helps shader compilation caching. Test your specific case, for both frame performance and loading times. Use dynamic API selection when beneficial: it works in debug out of the box, may need a custom Unity build, and may make it into the main line. Thanks to Unity for their help working with us to implement it.
  • #31: Continuing with Android: how Unity distributes threads across cores. Most modern Android devices have 8 cores: big cores that are clocked higher, and small cores that are slower but consume less power. A Unity game can easily have 30-40 threads running. The most important ones are UnityMain, UnityGFXDeviceW (multithreaded rendering is presumed to be used), and the worker threads. Workers will be used more and more with DOTS and the Jobs system, and some systems already use them, e.g. skinning, physics, and particles. Use Systrace to find out how threads are distributed.
  • #32: Core jumping. Contention between the workers and the main/rendering threads. Little cores not used…
  • #33: Putting everything on big cores might actually be OK: they are faster, and the workers are not uber busy. We experimented with alternative affinities.
  • #34: Reserve cores for UnityMain and UnityGFXDeviceW. Allocate the workers (and Choreographer, audio, etc.) to the other cores, big and small.
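The allocation on this slide can be expressed as CPU bitmasks. A minimal sketch, assuming the big cores can be identified by their higher maximum frequency and that the frequencies are passed in as a dict; on Linux/Android these values can typically be read from `/sys/devices/system/cpu/cpuN/cpufreq/cpuinfo_max_freq`. The function names and the single "other" bucket for workers, Choreographer, audio and so on are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def split_big_little(max_freq_khz: Dict[int, int]) -> Tuple[List[int], List[int]]:
    """Cores above the lowest max frequency are treated as big cores."""
    lowest = min(max_freq_khz.values())
    big = sorted(core for core, freq in max_freq_khz.items() if freq > lowest)
    little = sorted(core for core, freq in max_freq_khz.items() if freq == lowest)
    return big, little


def plan_affinities(big: Sequence[int], little: Sequence[int],
                    reserved: Sequence[str] = ("UnityMain", "UnityGFXDeviceW")) -> Dict[str, int]:
    """One dedicated big core per reserved thread; everything else shares the rest."""
    if len(big) < len(reserved):
        raise ValueError("not enough big cores to reserve one per thread")
    masks = {name: 1 << core for name, core in zip(reserved, big)}
    other = 0
    for core in list(big[len(reserved):]) + list(little):
        other |= 1 << core
    masks["other"] = other  # workers, Choreographer, audio, ...
    return masks
```

On an example 4+4 device where cores 4-7 are the big ones, UnityMain gets core 4, UnityGFXDeviceW gets core 5, and everything else shares cores 0-3, 6 and 7.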
  • #35: This is what the result looks like. We generally found this to be measurably faster (5-10%). There are only 4 workers, so we asked Unity for a feature to allocate more workers.
  • #36: NavMesh generation benefits from workers on little cores. You need to write a custom plugin to discover thread IDs and adjust affinities, and plugins are very easy with IL2CPP. Use the sched_setaffinity syscall to set the core mask. Not a major performance improvement, yet measurable. We asked Unity for more workers. Systrace is very useful. iOS has much better single-core performance (fewer cores, though). Thanks to Google for their help.
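The plugin's two steps, finding thread IDs by name and applying a core mask, can be sketched on Linux/Android as follows. This is Python rather than the native IL2CPP plugin the notes describe, and the helper names are illustrative. It relies on `/proc/<pid>/task/<tid>/comm` holding each thread's name, which the kernel truncates to 15 characters. It also relies on `os.sched_setaffinity`, which on Linux accepts a thread ID and affects only that thread.

```python
import os
from typing import Dict, List, Set


def threads_by_name(pid: str = "self") -> Dict[str, List[int]]:
    """Map thread name -> thread IDs for a process, read from /proc (Linux only)."""
    task_dir = f"/proc/{pid}/task"
    names: Dict[str, List[int]] = {}
    for entry in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{entry}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # the thread exited between listdir() and open()
        names.setdefault(name, []).append(int(entry))
    return names


def mask_to_cpus(mask: int) -> Set[int]:
    """Bitmask (bit N = core N) -> set of core numbers."""
    return {core for core in range(mask.bit_length()) if (mask >> core) & 1}


def pin_threads(name: str, mask: int, pid: str = "self") -> int:
    """Pin every thread called `name` to the cores in `mask`; returns how many were pinned."""
    tids = threads_by_name(pid).get(name, [])
    for tid in tids:
        os.sched_setaffinity(tid, mask_to_cpus(mask))  # a TID targets just that thread
    return len(tids)
```

Threads created after the pass, such as extra workers, keep their default affinity, so the pass would need to run again once the engine has started them.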
  • #37: Running out of memory. iOS is more affected: iOS devices generally have less memory, many with only 2 GB.
  • #38: Tools give different readings. The Xcode widget is the most important reading, since it reflects what the OS uses when jettisoning apps. Other tools give smaller estimates. The memory system of modern mobile devices is complex.
  • #39: Native memory holds asset data (textures, meshes, etc.) and is reference counted. Managed memory holds Unity Objects and C# objects and is garbage collected. That doesn't account for everything in a Unity app, though. The virtual memory space is large and exceeds physical space, and physical wired memory is only a small part of it. Memory consists of wired and dirty memory, reusable memory (the GPU uses this for storage during rendering), external memory, and encrypted code. Some read-only assets can be unloaded, and some dirty memory can be compressed.
  • #40: Use Mach functionality to get a detailed view.
  • #41: A practical view of memory types evolving over time. The overall use of internal memory is important, and so are the state of the managed heap and compressed memory utilization.
  • #42: The Xcode widget reflects what the OS uses for decision making. There are many memory types and several levels of allocators. Tools compute memory differently, and often conservatively. Use Mach functionality to get details.
  • #43: The managed heap grows in sizable steps. Sometimes we are short of memory while having megabytes of unused space in managed heap slack.
  • #44: Unity uses the Boehm-Demers-Weiser GC. It's easy enough to write a plugin that externs some GC statics. One to note is GC_free_space_divisor, which controls how the heap expands: you can increase the divisor to make the heap expand less.
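The effect of the divisor can be estimated from the heuristic documented in the Boehm GC header (gc.h). The collector tries to allocate at least N / GC_free_space_divisor bytes between collections, where N is twice the traced bytes plus the untraced bytes plus a rough estimate of the root set size; the default divisor is 3. A minimal sketch of that arithmetic follows. The numbers are made up, and the exact heap-expansion behavior of the Boehm version inside Unity may differ.

```python
MB = 1024 * 1024


def min_alloc_between_collections(traced: int, untraced: int, roots: int, divisor: int = 3) -> int:
    """Boehm GC heuristic: allocate at least N / divisor bytes between collections,
    with N = 2 * traced + untraced + roots (all in bytes)."""
    n = 2 * traced + untraced + roots
    return n // divisor


# 30 MB of traced (pointer-containing) live data, nothing else:
for divisor in (3, 6):
    print(divisor, min_alloc_between_collections(30 * MB, 0, 0, divisor) // MB, "MB")
# prints "3 20 MB", then "6 10 MB"
```

Doubling the divisor halves the minimum allocation between collections, so the heap has less reason to grow, at the cost of collecting more often.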
  • #45: Unity is great at exposing the GameObject hierarchy, and tools can be used to automate finding GameObject leaks. It is harder for native and managed memory. There is a Unity Memory Profiler, which is improving: it lets you compare captures and drill down. Custom tools can be built using the low-level memory API, which provides a snapshot of all memory blocks in native and managed memory.
  • #46: We had to build a custom tool to help find C#-level leaks. It can take a snapshot and compare it with a previous snapshot based on type names. We also built a feature to see which types refer to which types. Often, it is not drilling down into individual objects but a high-level view that helps figure out leaks. Here is an example. Perhaps a leak?
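The type-level comparison described above can be sketched as follows. The snapshot format here is a hypothetical simplification: a list of `(object_id, type_name, size_bytes)` tuples plus `(from_id, to_id)` reference pairs. Unity's low-level memory API exposes its own structures, which would first need to be flattened into something like this.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Obj = Tuple[int, str, int]  # (object_id, type_name, size_bytes), hypothetical format


def summarize(objects: Iterable[Obj]) -> Dict[str, Tuple[int, int]]:
    """type_name -> (instance count, total bytes)."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for _, type_name, size in objects:
        counts[type_name] += 1
        sizes[type_name] += size
    return {t: (counts[t], sizes[t]) for t in counts}


def diff_snapshots(before: Iterable[Obj], after: Iterable[Obj]) -> List[Tuple[str, int, int]]:
    """Types whose instance count grew: (type, extra instances, extra bytes), biggest first."""
    old, new = summarize(before), summarize(after)
    grown = []
    for type_name, (count, size) in new.items():
        old_count, old_size = old.get(type_name, (0, 0))
        if count > old_count:
            grown.append((type_name, count - old_count, size - old_size))
    return sorted(grown, key=lambda row: row[2], reverse=True)


def type_references(objects: Iterable[Obj], references: Iterable[Tuple[int, int]]) -> Counter:
    """Collapse object-level references into (referrer type, referenced type) counts."""
    type_of = {oid: type_name for oid, type_name, _ in objects}
    edges: Counter = Counter()
    for src, dst in references:
        if src in type_of and dst in type_of:
            edges[(type_of[src], type_of[dst])] += 1
    return edges
```

A type whose count keeps growing across snapshots taken at equivalent points in the game (for example, each time the player returns to the same place) is a leak candidate, and the type-to-type edges hint at what is holding on to it.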
  • #47: We have access to Boehm statics. Exercise caution, as with anything low level. Thanks to the team.