Halcyon + Vulkan
Munich Khronos Meetup
Graham Wihlidal
SEED – Electronic Arts
“PICA PICA”
 Exploratory mini-game & world
 Goals
 Hybrid rendering with DXR [Andersson 2018]
 Clean and consistent visuals
 Self-learning AI agents [Harmer 2018]
 Procedural worlds [Opara 2018]
 No precomputation
 Uses SEED’s Halcyon R&D framework
HALCYON
Halcyon Goals
 Rapid prototyping framework
 Different purpose than Frostbite
 Fast experimentation vs. AAA games
 Windows, Linux, macOS
Halcyon Goals
 Minimize or eliminate busy-work
 Artist “meta-data” meshes
 Occlusion
 GI / Lighting
 Collision
 Level-of-detail
 Live reloading of all assets
 Insanely fast iteration times
Halcyon Goals
 Only target modern APIs
 Direct3D 12
 Vulkan 1.1
 Metal 2
 Multi-GPU
 Explicit heterogeneous mGPU
 No AFR nonsense
 No linked adapters
Halcyon Goals
 Local or remote streaming
 Minimal boilerplate code
 Variety of rendering techniques and approaches
 Rasterization
 Path and ray tracing
 Hybrid
Hybrid Rendering
Direct Shadows (ray trace or raster)
Direct Lighting (compute)
Reflections (ray trace or compute)
Global Illumination (ray trace)
Post Processing (compute)
Transparency & Translucency (ray trace)
Ambient Occlusion (ray trace or compute)
Deferred Shading (raster)
[Image: scene rendered with the hybrid pipeline]
[Image: the same scene rendered with rasterization only]
Halcyon Goals
 “PICA PICA” and Halcyon built from scratch
 Implemented lots of bespoke technology
 Minimal effort to add a new API or platform
 Efficient and flexible rendering was a major focus
Rendering Components
 Render Backend
 Render Device
 Render Handles
 Render Commands
 Render Graph
 Render Proxy
Rendering Components
[Diagram: the Application drives Render Graphs, which record Render Commands referencing resources through Render Handles; each Render Backend (Direct3D 12, Vulkan, Proxy, …) owns one or more Render Devices]
 Live-reloadable DLLs
 Enumerates adapters and capabilities
 Swap chain support
 Extensions (e.g. ray tracing, subgroups, …)
 Determine adapter(s) to use
Render Backend
 Provides debugging and profiling
 RenderDoc integration, validation layers, …
 Create and destroy render devices
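As a rough illustration of the split described above, a backend interface along these lines could expose adapter enumeration, device creation, and debug hooks. The names below (IRenderBackend, RenderAdapterInfo, and so on) are illustrative assumptions, not Halcyon's actual API.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

struct RenderAdapterInfo {
    std::string name;
    bool supportsRayTracing  = false;  // capability/extension queries (ray tracing, subgroups, ...)
    bool supportsSubgroupOps = false;
};

class IRenderDevice;  // created and destroyed through the backend

// One implementation per API (Direct3D 12, Vulkan 1.1, Metal 2, Proxy, Mock),
// loaded as a live-reloadable DLL.
class IRenderBackend {
public:
    virtual ~IRenderBackend() = default;

    // Enumerate adapters and capabilities so the application can pick which adapter(s) to use.
    virtual std::vector<RenderAdapterInfo> enumerateAdapters() = 0;

    // Create and destroy render devices (one per chosen adapter).
    virtual std::unique_ptr<IRenderDevice> createDevice(uint32_t adapterIndex) = 0;

    // Debugging and profiling hooks (RenderDoc integration, validation layers, ...).
    virtual void enableValidation(bool enable) = 0;
};
```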
Render Backend
 Direct3D 12
 Vulkan 1.1
 Metal 2
 Proxy
 Mock
Render Backend
 Direct3D 12
 Shader Model 6.X
 DirectX Ray Tracing
 Bindless Resources
 Explicit Multi-GPU
 DirectML (soon..)
 …
Render Backend
 Vulkan 1.1
 Sub-groups
 Descriptor indexing
 External memory
 Multi-draw indirect
 Ray tracing (soon..)
 …
Render Backend
 Metal 2
 Early development
 Primarily desktop
 Argument buffers
 Machine learning
 …
Render Backend
 Proxy
 Not discussed in this presentation
Render Backend
 Mock
 Performs resource tracking and validation
 Command stream is parsed and evaluated
 No submission to an API
 Useful for unit tests and debugging
Render Device
Render Device
 Abstraction of a logical GPU adapter
 e.g. VkDevice, ID3D12Device, …
 Provides interface to GPU queues
 Command list submission
Render Device
 Ownership of GPU resources
 Create & Destroy
 Lifetime tracking of resources
 Mapping render handles → device resources
Render Handles
Render Handles
 Resources associated by handle
 Lightweight (64 bits)
 Constant-time lookup
 Type safety (e.g. buffer vs. texture)
 Can be serialized or transmitted
 Generational for safety
 e.g. double-delete, usage after delete
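A minimal sketch of a 64-bit generational handle with the properties listed above; the exact bit layout and type names are assumptions for illustration, not Halcyon's actual encoding.

```cpp
#include <cstdint>

enum class RenderResourceType : uint8_t { Buffer, Texture, ShaderViews /* ... */ };

struct RenderHandle {
    // 32-bit slot index -> constant-time lookup on each device,
    // 24-bit generation -> catches double-delete and use-after-delete,
    //  8-bit type       -> buffer vs. texture type safety.
    uint64_t bits = 0;

    static RenderHandle make(uint32_t index, uint32_t generation, RenderResourceType type) {
        RenderHandle handle;
        handle.bits = uint64_t(index)
                    | (uint64_t(generation & 0xFFFFFFu) << 32)
                    | (uint64_t(uint8_t(type)) << 56);
        return handle;
    }

    uint32_t           index()      const { return uint32_t(bits & 0xFFFFFFFFu); }
    uint32_t           generation() const { return uint32_t(bits >> 32) & 0xFFFFFFu; }
    RenderResourceType type()       const { return RenderResourceType(uint8_t(bits >> 56)); }
    bool               isValid()    const { return bits != 0; }
};
```

Because the handle itself carries no API pointers, it can be serialized, transmitted, and resolved independently on each render device, which is what enables the one-to-many mapping shown on the next slides.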
Render Handles
[Diagram: a single Render Handle resolving to a separate ID3D12Resource on each of DX12 Adapters 0–3]
 Handles allow one-to-many cardinality [handle->devices]
 Each device can have a unique representation of the handle
Render Handles
 Can query if a device has a handle loaded
 Safely add and remove devices
 Handle owned by application, representation can reload on device
[Diagram: the same Render Handle to per-adapter resource mapping as above]
Render Handles
 Shared resources are supported
 Primary device owner, secondaries alias primary
[Diagram: a shared resource; the primary adapter owns the ID3D12Resource and the other adapters alias it through the same Render Handle]
Render Handles
 Can also mix and match backends in the same process!
 Made debugging VK implementation much easier
 DX12 on left half of screen, VK on right half of screen
[Diagram: render handles resolving to ID3D12Resources on DX12 Adapters 0 and 1, a VkImage on VK Adapter 0, and a proxy representation on Proxy Adapter 0]
Render Commands
Render Commands
 Draw
 DrawIndirect
 Dispatch
 DispatchIndirect
 UpdateBuffer
 UpdateTexture
 CopyBuffer
 CopyTexture
 Barriers
 Transitions
 BeginTiming
 EndTiming
 ResolveTimings
 BeginEvent
 EndEvent
 BeginRenderPass
 EndRenderPass
 RayTrace
 UpdateTopLevel
 UpdateBottomLevel
 UpdateShaderTable
Render Commands
 Queue type specified
 Spec validation
 Allowed to run?
 e.g. draws on compute
 Automatic scheduling
 Where can it run?
 Async compute
Render Commands
[Code screenshot: an example compute dispatch command specifying the Compute queue type]
Render Command List
 Encodes high level commands
 Tracks queue types encountered
 Queue mask indicating scheduling rules
 Commands are stateless - parallel recording
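A hedged sketch of how the stateless high-level commands and the per-list queue mask described above could fit together; the command structs and class names are illustrative assumptions, not Halcyon's actual command stream.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

enum QueueMask : uint32_t {
    QueueGraphics = 1u << 0,
    QueueCompute  = 1u << 1,
    QueueCopy     = 1u << 2,
};

struct CmdDraw       { uint32_t vertexCount, instanceCount; /* pipeline, shader arguments, ... */ };
struct CmdDispatch   { uint32_t groupsX, groupsY, groupsZ; };
struct CmdCopyBuffer { uint64_t srcHandle, dstHandle; uint64_t sizeInBytes; };

using RenderCommand = std::variant<CmdDraw, CmdDispatch, CmdCopyBuffer>;

class RenderCommandList {
public:
    // Commands carry all of their state, so lists can be recorded in parallel.
    void draw(const CmdDraw& cmd)             { record(cmd, QueueGraphics); }
    void dispatch(const CmdDispatch& cmd)     { record(cmd, QueueCompute); }
    void copyBuffer(const CmdCopyBuffer& cmd) { record(cmd, QueueCopy); }

    // Queue types encountered while recording; the scheduler uses this mask
    // to decide where the list is allowed to run (e.g. async compute).
    uint32_t queueMask() const { return queueMask_; }
    const std::vector<RenderCommand>& commands() const { return commands_; }

private:
    template <typename Command>
    void record(const Command& cmd, uint32_t queueBit) {
        commands_.push_back(cmd);
        queueMask_ |= queueBit;
    }

    std::vector<RenderCommand> commands_;
    uint32_t                   queueMask_ = 0;
};
```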
Render Compilation
 Render command lists are “compiled”
 Translation to low level API
 Can compile once, submit multiple times
 Serial operation (memcpy speed)
 Perfect redundant state filtering
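A sketch of the compile step under the same assumptions as the previous snippet: a single serial walk over the high-level commands, translating each into the low-level API, which makes perfect redundant state filtering straightforward. The pipeline-handle field and helper are hypothetical.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

struct CmdDraw     { uint64_t pipeline; uint32_t vertexCount, instanceCount; };
struct CmdDispatch { uint64_t pipeline; uint32_t groupsX, groupsY, groupsZ; };
using RenderCommand = std::variant<CmdDraw, CmdDispatch>;

struct CompiledCommandList { /* API-specific command buffer, e.g. a VkCommandBuffer */ };

class RenderBackendCompiler {
public:
    // Compile once; the resulting low-level list can be submitted multiple times.
    CompiledCommandList compile(const std::vector<RenderCommand>& commands) {
        CompiledCommandList compiled;
        uint64_t boundPipeline = 0;  // serial walk -> perfect redundant state filtering
        for (const RenderCommand& command : commands) {
            if (const CmdDraw* draw = std::get_if<CmdDraw>(&command)) {
                bindPipelineIfChanged(draw->pipeline, boundPipeline);
                // emit the API draw call into 'compiled' here
            } else if (const CmdDispatch* dispatch = std::get_if<CmdDispatch>(&command)) {
                bindPipelineIfChanged(dispatch->pipeline, boundPipeline);
                // emit the API dispatch call into 'compiled' here
            }
        }
        return compiled;
    }

private:
    void bindPipelineIfChanged(uint64_t pipeline, uint64_t& bound) {
        if (pipeline != bound) {
            bound = pipeline;
            // emit the API pipeline bind here; unchanged pipelines are filtered out
        }
    }
};
```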
Render Graph
Render Graph
 Inspired by FrameGraph [O’Donnell 2017]
 Automatically handle transient resources
 Import explicitly managed resources
 Automatic resource transitions
 Render target batching
 DiscardResource
 Memory aliasing barriers
 …
Render Graph
 Basic memory management
 Not targeting current consoles
 Fine grained memory reuse sub-optimal with current PC drivers
 Lose ~5% on aliasing barriers and discards
 Automatic queue scheduling
 Ongoing research
 Need heuristics on task duration and bottlenecks
 e.g. Memory vs ALU
 Not enough to specify dependencies
Render Graph
 Frame Graph → Render Graph: No concept of a “frame”
 Fully automatic transitions and split barriers
 Single implementation, regardless of backend
 Translation from high level render command stream
 API differences hidden from render graph
 Support for mGPU
 Mostly implicit and automatic
 Can specify a scheduling policy
 Not discussed in this presentation
Render Graph
 Composition of multiple graphs at varying frequencies
 Same GPU: async compute
 mGPU: graphs per GPU
 Out-of-core: server cluster, remote streaming
Render Graph
 Composition of multiple graphs at varying frequencies
 e.g. translucency, refraction, global illumination
Render Graph
 Two phases
 Graph construction
 Specify inputs and outputs
 Serial operation (by design)
 Graph evaluation
 Highly parallelized
 Record high level render commands
 Automatic barriers and transitions
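The original deck shows a real pass at this point; as a stand-in, here is a hedged sketch of what a two-phase pass could look like, loosely following the FrameGraph pattern [O'Donnell 2017]. All names (RenderGraph, GraphBuilder, addPass, the GTAO pass data) are illustrative, not Halcyon's actual API.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using GraphResource = uint32_t;

// Construction phase: declares inputs/outputs and transient resources (serial by design).
struct GraphBuilder {
    GraphResource read(const std::string& name)          { return resolve(name); }
    GraphResource createTexture(const std::string& name) { return resolve(name); }
    GraphResource resolve(const std::string& name) {
        auto it = ids.find(name);
        if (it == ids.end()) it = ids.emplace(name, GraphResource(ids.size())).first;
        return it->second;
    }
    std::unordered_map<std::string, GraphResource> ids;
};

struct RenderCommandList { /* high-level command recording, as sketched earlier */ };

class RenderGraph {
public:
    using SetupFn   = std::function<void(GraphBuilder&)>;
    using ExecuteFn = std::function<void(RenderCommandList&)>;

    void addPass(std::string name, SetupFn setup, ExecuteFn execute) {
        setup(builder_);  // construction: only records dependencies
        passes_.push_back({std::move(name), std::move(execute)});
    }

    void evaluate() {
        // Evaluation could run passes in parallel; barriers and transitions would be
        // derived automatically from the dependencies declared during construction.
        for (auto& pass : passes_) {
            RenderCommandList commands;
            pass.execute(commands);
        }
    }

private:
    struct Pass { std::string name; ExecuteFn execute; };
    GraphBuilder      builder_;
    std::vector<Pass> passes_;
};

// Usage sketch: a GTAO pass reading a depth pyramid and writing an AO result.
inline void addGtaoPass(RenderGraph& graph) {
    auto data = std::make_shared<std::pair<GraphResource, GraphResource>>();
    graph.addPass(
        "Gtao",
        [data](GraphBuilder& builder) {
            data->first  = builder.read("DepthPyramid");
            data->second = builder.createTexture("GtaoResult");
        },
        [data](RenderCommandList& /*commands*/) {
            // record the GTAO compute dispatch here using data->first / data->second
        });
}
```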
[Code screenshot: an example render graph pass, with the construction phase on top and the evaluation phase below]
Render Graph
 Automatic profiling data
 GPU and CPU counters per-pass
 Works with mGPU
 Each GPU is profiled
Render Graph
 Live debugging overlay
 Evaluated passes in-order of execution
 Input and output dependencies
 Resource version information
Render Graph
Some of our render graph passes:
 Bloom
 BottomLevelUpdate
 BrdfLut
 CocDerive
 DepthPyramid
 DiffuseSh
 Dof
 Final
 GBuffer
 Gtao
 IblReflection
 ImGui
 InstanceTransforms
 Lighting
 MotionBlur
 Present
 RayTracing
 RayTracingAccum
 ReflectionFilter
 ReflectionSample
 ReflectionTrace
 Rtao
 Screenshot
 Segmentation
 ShaderTableUpdate
 ShadowFilter
 ShadowMask
 ShadowCascades
 ShadowTrace
 Skinning
 Ssr
 SurfelGapFill
 SurfelLighting
 SurfelPositions
 SurfelSpawn
 Svgf
 TemporalAa
 TemporalReproject
 TopLevelUpdate
 TranslucencyTrace
 Velocity
 Visibility
Shaders
Shaders
 Complex materials
 Multiple microfacet layers
 [Stachowiak 2018]
 Energy conserving
 Automatic Fresnel between layers
 All lighting & rendering modes
 Raster, path-traced reference, hybrid
 Iterate with different looks
 Bake down permutations for production
Objects with Multi-Layered Materials
Shaders
 Exclusively HLSL
 Shader Model 6.X
 Majority are compute shaders
 Performance is critical
 Group shared memory
 Wave-ops / Sub-groups
Shaders
 No reflection
 Avoid costly lookups
 Only explicit bindings
 … except for validation
 Extensive use of HLSL spaces
 Updates at varying frequency
 Bindless
Shaders
HLSL → DXC → DXIL (Direct3D 12) or SPIR-V (Vulkan 1.1)
SPIR-V → SPIRV-CROSS → MSL (Metal 2) or ISPC (AVX2, …)
Shader Arguments
 Commands refer to resources with “Shader Arguments”
 Each argument represents an HLSL space
 MaxShaderParameters = 4 [Configurable]
 # of spaces, not # of resources
Shader Arguments
 Each argument contains:
 “ShaderViews” handle
 Constant buffer handle and offset
 “ShaderViews”
 Collection of SRV and UAV handles
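A small sketch of the per-space shader argument described above; the struct names and the choice of a raw 64-bit handle type are assumptions, not Halcyon's actual definitions.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using RenderHandle = uint64_t;  // generational render handle, as sketched earlier

constexpr uint32_t kMaxShaderParameters = 4;  // number of spaces, not number of resources

// A "ShaderViews" object is a persistent, write-once collection of SRV and UAV handles.
struct ShaderViewsDesc {
    std::vector<RenderHandle> srvs;
    std::vector<RenderHandle> uavs;
};

// One shader argument per HLSL space.
struct ShaderArgument {
    RenderHandle shaderViews          = 0;  // SRVs + UAVs for this space
    RenderHandle constantBuffer       = 0;  // one of a few large dynamic buffers
    uint32_t     constantBufferOffset = 0;  // changes frequently per draw/dispatch
};

using ShaderArguments = std::array<ShaderArgument, kMaxShaderParameters>;
```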
Shader Arguments
 Constant buffers are all dynamic
 Avoid temporary descriptors
 Just a few large buffers, offsets change frequently
 VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
 DX12 Root Descriptors (pass in GPU VA)
 All descriptor sets are only written once
 Persisted / cached
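To make the dynamic-offset idea concrete, a sketch of the Vulkan bind path is shown below: the cached descriptor sets were written once, and only the dynamic offsets vary per draw or dispatch. The wrapper function and its parameters are assumptions; the Vulkan call itself is standard.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

void bindComputeArguments(VkCommandBuffer        commandBuffer,
                          VkPipelineLayout       pipelineLayout,
                          const VkDescriptorSet* cachedSets,        // ShaderViews + CBV sets, written once
                          uint32_t               setCount,
                          const uint32_t*        dynamicOffsets,    // one offset per dynamic uniform buffer
                          uint32_t               dynamicOffsetCount)
{
    // VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC lets the same descriptor point into a
    // large buffer; the per-dispatch offset is supplied here instead of rewriting the
    // descriptor. On DX12 the equivalent path is a root descriptor taking a GPU VA
    // (e.g. SetComputeRootConstantBufferView).
    vkCmdBindDescriptorSets(commandBuffer,
                            VK_PIPELINE_BIND_POINT_COMPUTE,
                            pipelineLayout,
                            /*firstSet*/ 0,
                            setCount, cachedSets,
                            dynamicOffsetCount, dynamicOffsets);
}
```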
Vulkan Implementation
 Architecture simplified development effort
 Vulkan specific:
 Backend and device implementation
 Memory allocators (e.g. AMD VMA)
 Barrier and transition logic
 Resource binding model
Vulkan Implementation
[Screenshots: incremental bring-up of the Vulkan backend: command translation and barriers first, then ImGui, swap chain, and asset loading, and finally the full render graph, with plenty of fun bugs along the way]
Vulkan!
[Screenshots: eventually, everything worked]
Vulkan Implementation
 Shader compilation (HLSL → SPIR-V)
 Patch SPIR-V to match DX12
 Using spirv-reflect from Hai and Cort
 spvReflectCreateShaderModule
 spvReflectEnumerateDescriptorSets
 spvReflectChangeDescriptorBindingNumbers
 spvReflectGetCodeSize / spvReflectGetCode
 spvReflectDestroyShaderModule
SPIR-V Patching
 SPV_REFLECT_RESOURCE_FLAG_SRV
 Offset += 1000
 SPV_REFLECT_RESOURCE_FLAG_SAMPLER
 Offset += 2000
 SPV_REFLECT_RESOURCE_FLAG_UAV
 Offset += 3000
SPIR-V Patching
 SPV_REFLECT_RESOURCE_FLAG_CBV
 Offset Unchanged: 0
 Descriptor Set += MAX_SHADER_ARGUMENTS
 CBVs move to their own descriptor sets
 ShaderViews become persistent and immutable
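Putting the two patching slides together, a sketch of the spirv-reflect pass could look like the following. The binding offsets match the slides; the surrounding function, error handling, and the kMaxShaderArguments constant (standing in for MAX_SHADER_ARGUMENTS) are assumptions rather than Halcyon's actual code.

```cpp
#include <spirv_reflect.h>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxShaderArguments = 4;  // MAX_SHADER_ARGUMENTS from the slides

std::vector<uint32_t> patchSpirvBindings(const void* spirv, size_t spirvSize)
{
    SpvReflectShaderModule module = {};
    spvReflectCreateShaderModule(spirvSize, spirv, &module);

    uint32_t bindingCount = 0;
    spvReflectEnumerateDescriptorBindings(&module, &bindingCount, nullptr);
    std::vector<SpvReflectDescriptorBinding*> bindings(bindingCount);
    spvReflectEnumerateDescriptorBindings(&module, &bindingCount, bindings.data());

    for (const SpvReflectDescriptorBinding* binding : bindings) {
        uint32_t newBinding = binding->binding;
        uint32_t newSet     = binding->set;

        switch (binding->resource_type) {
        case SPV_REFLECT_RESOURCE_FLAG_SRV:     newBinding += 1000; break;  // t registers
        case SPV_REFLECT_RESOURCE_FLAG_SAMPLER: newBinding += 2000; break;  // s registers
        case SPV_REFLECT_RESOURCE_FLAG_UAV:     newBinding += 3000; break;  // u registers
        case SPV_REFLECT_RESOURCE_FLAG_CBV:
            // CBV binding stays at 0; hoist it into its own descriptor set so the
            // ShaderViews sets can stay persistent and immutable.
            newSet += kMaxShaderArguments;
            break;
        default:
            break;
        }

        spvReflectChangeDescriptorBindingNumbers(&module, binding, newBinding, newSet);
    }

    // Copy out the patched SPIR-V and release the reflection module.
    const uint32_t* code = spvReflectGetCode(&module);
    const size_t    size = spvReflectGetCodeSize(&module);  // in bytes
    std::vector<uint32_t> patched(code, code + size / sizeof(uint32_t));
    spvReflectDestroyShaderModule(&module);
    return patched;
}
```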
SPIR-V Patching
If 2 of 4 HLSL spaces are in use:
 Set 0: SRVs (>=1000), Samplers (>=2000), UAVs (>=3000)
 Set 1: SRVs (>=1000), Samplers (>=2000), UAVs (>=3000)
 Set 2: Unbound
 Set 3: Unbound
 Set 4: Dynamic Constant Buffer (Offset: 0)
 Set 5: Dynamic Constant Buffer (Offset: 0)
Vulkan Implementation
 Translate commands
 Read command list
 Write Vulkan API
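As a minimal example of that translation, a compute dispatch could be lowered to Vulkan roughly as follows. The high-level command struct and helper are assumptions carried over from the earlier sketches; the Vulkan calls themselves are the standard API.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

struct CmdDispatch { uint32_t groupsX, groupsY, groupsZ; };  // high-level command

void translateDispatch(VkCommandBuffer    commandBuffer,
                       VkPipeline         computePipeline,  // resolved from the command's pipeline handle
                       const CmdDispatch& cmd)
{
    // Barriers and transitions for the dispatch's resources would already have been
    // recorded at this point, derived automatically from the render graph.
    vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
    // Descriptor sets and dynamic constant buffer offsets bound as in the earlier sketch.
    vkCmdDispatch(commandBuffer, cmd.groupsX, cmd.groupsY, cmd.groupsZ);
}
```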
Ongoing Work!
VK nearing DX12
Tools
Tools
 RenderDoc
 NV Nsight
 AMD RGP
Tools
 C++ Export!
 Standalone
Dear ImGui + ImGuizmo
 Live tweaking
 Very useful!
References
 [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”. Available online.
 [Andersson 2018] Johan Andersson, Colin Barré-Brisebois. “DirectX: Evolving Microsoft's Graphics Platform”. Available online.
 [Harmer 2018] Jack Harmer, Linus Gisslén, Henrik Holst, Joakim Bergdahl, Tom Olsson, Kristoffer Sjöö and Magnus Nordin. “Imitation Learning with Concurrent Actions in 3D Games”. Available online.
 [Opara 2018] Anastasia Opara. “Creativity of Rules and Patterns”. Available online.
 [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”. Available online.
Thanks
 Matthäus Chajdas
 Rys Sommefeldt
 Timothy Lottes
 Tobias Hector
 Neil Henning
 John Kessenich
 Hai Nguyen
 Nuno Subtil
 Adam Sawicki
 Alon Or-bach
 Baldur Karlsson
 Cort Stratton
 Mathias Schott
 Rolando Caloca
 Sebastian Aaltonen
 Hans-Kristian Arntzen
 Yuriy O’Donnell
 Arseny Kapoulkine
 Tex Riddell
 Marcelo Lopez Ruiz
 Lei Zhang
 Greg Roth
 Noah Fredriks
 Qun Lin
 Ehsan Nasiri
 Steven Perron
 Alan Baker
 Diego Novillo
Thanks
 SEED
 Johan Andersson
 Colin Barré-Brisebois
 Jasper Bekkers
 Joakim Bergdahl
 Ken Brown
 Dean Calver
 Dirk de la Hunt
 Jenna Frisk
 Paul Greveson
 Henrik Halen
 Effeli Holst
 Andrew Lauritzen
 Magnus Nordin
 Niklas Nummelin
 Anastasia Opara
 Kristoffer Sjöö
 Ida Winterhaven
 Tomasz Stachowiak
 Microsoft
 Chas Boyd
 Ivan Nevraev
 Amar Patel
 Matt Sandy
 NVIDIA
 Tomas Akenine-Möller
 Nir Benty
 Jiho Choi
 Peter Harrison
 Alex Hyder
 Jon Jansen
 Aaron Lefohn
 Ignacio Llamas
 Henry Moreton
 Martin Stich
SEED // SEARCH FOR EXTRAORDINARY EXPERIENCES DIVISION
STOCKHOLM – LOS ANGELES – MONTRÉAL – REMOTE
SEED.EA.COM
WE'RE HIRING!
Questions?
Editor's Notes
  • #2: My name is Graham Wihlidal, and I’m a senior rendering engineer at SEED in Stockholm, Sweden. Previously, I was on the Frostbite rendering team, and at BioWare for a long time.
  • #3: We have built PICA PICA from the ground up in our custom R&D framework called Halcyon. It is a flexible experimentation framework that is very capable of rendering fast and shiny pixels.
  • #4: So let's talk a bit about Halcyon itself.
  • #5: Halcyon is a rapid prototyping framework, serving a different purpose than our flagship AAA engine, Frostbite. And Halcyon is currently supported on Windows, Linux, and macOS.
  • #6: A major goal of Halcyon is to minimize or eliminate busy-work; something I call artist “meta-data” meshes. Show me one artist that actually enjoys making these meshes over something more creative, and I guarantee you they are brainwashed, and need an intervention and our caring support. Another critical goal is the live reloading of all assets. We don’t want to take a coffee break while we shut down Halcyon, launch a data build, come back, and resume whatever we were doing.
  • #7: One luxury we had by starting from scratch, was choosing our feature set and min spec. We decided to only target modern APIs, so we are not restricted by legacy. Another interesting goal is to provide easy access to multiple GPUs, without sacrificing API cleanliness or maintainability. To accomplish this, we decided on explicit heterogeneous mGPU, not linked adapters. We also are avoiding any AFR nonsense for a number of reasons, including problems with temporal techniques.
  • #8: We chose to support rendering locally, and also performing some computation or rendering remotely, and transmitting the results back to the application. In order to deliver on our promise of fast experimentation, we needed to ensure a minimal amount of boilerplate code. This includes code to load a shader, set up pipelines and render state, query scene representations, etc. This was critical in developing and supporting a vast number of rendering techniques and approaches that we have in Halcyon.
  • #9: We have a unique rendering pipeline which is inspired by classic techniques in real-time rendering, with ray tracing sprinkled on top. We have a deferred renderer with compute-based lighting, and a fairly standard post-processing stack. And then there are a few pluggable components. We can render shadows via ray tracing or cascaded shadow maps. Reflections can be traced or screen-space marched. Same story for ambient occlusion. Only our global illumination and translucency actually require ray tracing.
  • #10: Here is an example of a scene using our hybrid rendering pipeline
  • #11: And here is the same scene uses our traditional pure rasterization rendering pipeline. As you can see, most of our visual fidelity holds up between the two, especially with respect to our materials.
  • #12: PICA PICA and Halcyon were both built from scratch, and our goals required us to implement a lot of bespoke technology. In the end, our flexible architecture means it is minimal effort to add a new API or platform, and also render our scenes very efficiently.
  • #15: The first component I will talk about is render backend
  • #16: Render backends are implemented as live-reloadable DLLs. Each backend represents a particular API, and provides enumeration of adapters and capabilities. In addition, the backends will determine the ideal adapters to use, depending on the desired purpose.
  • #17: Each render backend also supports a debugging and profiling interface – this provides functionality like RenderDoc integration, CPU and GPU validation layers, etc.. Most importantly, render backends support the creation and destruction of render devices, which I’ll cover shortly.
  • #18: We have a variety of render backends implemented
  • #19: For DirectX 12, we support the latest shader model, DirectX ray tracing, full bindless resources, explicit heterogeneous mGPU, and we plan to add support for DirectML.
  • #20: Vulkan is a similar story to DirectX 12, except we haven’t implemented multi-GPU or ray tracing support at this time, but it is planned.
  • #21: Metal 2 is still in early development, but very promising. We are primarily interested in using it on macOS instead of iOS.
  • #22: We have another pretty crazy backend which I unfortunately won’t get into today, but a near future presentation will go in depth on this one.
  • #23: Finally, we have our mock render backend which is for unit testing, debugging, and validation. This backend does all the same work the other backends do, except translation from high level to low level just runs a validation test suite instead of submitting to a graphics API.
  • #24: The next component to discuss is render device
  • #25: Render device is an abstraction of a logical GPU adapter, such as VkDevice or ID3D12Device. It provides an interface to various GPU queues, and provides API specific command list scheduling and submission functionality
  • #26: Render device has full ownership and lifetime tracking of its GPU resources, and provides create and destroy functionality for all the high level render resource types. The high level application refers to these resources using handles, so render device also internally provides an efficient mapping of these handles to internal device resources.
  • #27: Speaking of render handles…
  • #28: The handles are lightweight (just 64 bits), and the associated resources are fetched with a constant-time lookup. The handles are type safe, like trying to pass a buffer in place of a texture. The handles can be serialized or transmitted, as long as some contract is in place about what the handle represents. And the handles are generational for safety, which protects against double-delete, or using a handle after deletion
  • #29: Handles allow a one-to-many cardinality where a single handle can have a unique representation on each device
  • #30: There is an API to query if a particular render device has a handle loaded or not. This makes it safe to add and remove devices while the application is running. The handle is owned by the application, so the representation can be loaded or reloaded for any number of devices.
  • #31: Shared resources are also supported, allowing for copying data between various render devices.
  • #32: A crazy feature of this architecture is that we can mix and match different backends in the same process! This made debugging Vulkan much easier, as I could have Dx12 rendering on the left half of the screen, while Vulkan was rendering on the right half of the screen.
  • #33: A key rendering component is our high level command stream.
  • #34: We developed an API agnostic high level command stream that allows the application to efficiently express how a scene should be updated and rendered, while letting each backend control how this is done.
  • #35: Each render command specifies a queue type, which is primarily for spec validation (such as putting graphics work on a compute queue), and also to aid in automatic scheduling, like with async compute.
  • #36: As an example, here is a compute dispatch command, which specifies Compute as the queue type. The underlying backend and scheduler can interpret this accordingly
  • #37: The high level commands are encoded into a render command list. As each command is encoded, a queue mask is updated which helps indicate the scheduling rules and restrictions for that command list. The commands are stateless, which allows for fully parallel recording.
  • #38: The render command lists are compiled by each render backend, which means the high level commands are translated to the low level API. Command lists can be compiled once, and submitted multiple times, in appropriate cases. The low level translation is a serial operation by design, as this is basically running at memcpy speed. Doing this part serial means we get perfect redundant state filtering.
  • #39: Another significant rendering component is our render graph
  • #40: Render graph is inspired by Frostbite’s Frame Graph. Just like Frame Graph, we automatically handle transient resources. Complex rendering uses a number of temporary buffers and textures that stay alive just for the purpose and duration of an algorithm. These are called transient resources, and the graph automatically manages these lifetimes efficiently. Automatic resource transitions are also performed.
  • #41: Compared to Frame Graph, we opted for a simpler memory management model. We are not targeting current consoles with Halcyon, so we don’t need fine-grained memory reuse. Current PC drivers and APIs are not as efficient in this area (for a variety of reasons), and you lose ~5% on aliasing barriers and discards. In render graph, we support automatic queue scheduling (graphics, copy, compute, etc.). This is an area of ongoing research, as it’s not enough to just specify input and output dependencies. You also need heuristics on task duration and bottlenecks to further improve the scheduling.
  • #42: We call our implementation Render Graph, because we don’t have the concept of a “frame”. Our transitions and split barriers are fully automatic, and there is a single implementation, regardless of backend. This is thanks to our high level render command stream, which hides API differences from render graph. We also support multi-gpu, which is mostly implicit and automatic, with the exception of a scheduling policy you configure on your pass – this represents what devices a pass runs on, and what transfers are performed going in or out of the pass.
  • #43: Render graph supports composition of multiple graphs at varying frequencies. These graphs can run on the same GPU, such as async compute. They can run with multi-GPU, with a graph on each GPU. And they can even run out of core, such as in a server cluster, or on another machine streamed remotely.
  • #44: There are a number of techniques that can easily run at a different frequency from the main graph, such as object space translucency and reflection, or our surfel based global illumination.
  • #45: Render graph runs in two phases, construction and evaluation. Construction specifies the input and output dependencies, and this is a serial operation by design. If you implement a hierarchical gaussian blur as a single pass that gets chained X times, you want to read the size of the input in each step, and generate the proper sized output. Maybe you could split it up into some threads, but tracking dependencies in order to do it in a parallel fashion might be more costly than actually running it serially. Evaluation is highly parallelized, and this is where high level render commands are recorded, and automatic barriers and transitions are inserted.
  • #46: Here is an example render graph pass, with the construction phase up top, and the evaluation phase at the bottom.
  • #47: Render graph can collect automatic profiling data, presenting you with GPU and CPU counters per-pass Additionally, this works with mGPU, where each GPU is shown in the list
  • #48: There is also a live debugging overlay, using ImGui. This overlay shows the evaluated render graph passes in-order of execution, the input and output dependencies, and the resource version information.
  • #50: This is a list of some of our render graph passes in PICA PICA
  • #51: Shaders are also an important component
  • #52: We implemented a system which supports multiple microfacet layers arranged into stacks. The stacks could also be probabilistically blended to support prefiltering of material mixtures. This allowed us to rapidly experiment with the look of our demo, while at the same time enforcing energy conservation, and avoiding common gamedev hacks like “metalness”. An in-engine material editor meant that we could quickly jump in and change the look of everything. The material system works with all our render modes, be that pure rasterization, path tracing, or hybrid. For performance, we can bake down certain permutations for production.
  • #53: We exclusively use HLSL 6 as a source language, and the majority of our shaders are compute. Performance is critical, so we rely heavily on group shared memory and wave operations.
  • #54: We don’t rely on any reflection information, outside of validation. We avoid costly lookups by requiring explicit bindings. We also make extensive use of HLSL spaces, where the spaces can be updated at varying frequency.
  • #55: Here is a flow graph of our shader compilation. We always start with HLSL compiled with the DXC shader compiler. The DXIL is used by Direct3D 12, and the SPIR-V is used by Vulkan 1.1. We also have support for taking the SPIR-V, running it through SPIRV-CROSS, and generating MSL for Metal, or ISPC to run on the CPU.
  • #56: We have a concept in Halcyon called shader arguments, where each argument represents an HLSL space. We limit the maximum number of arguments to 4 for efficiency, but this can be configured. It is important to note that this limit represents the maximum number of spaces, not the maximum number of resources.
  • #57: Each shader argument contains a “ShaderViews” handle, which refers to a collection of SRV and UAV handles. Additionally, each shader argument also contains a constant buffer handle and offset into the buffer.
  • #58: Our constant buffers are all dynamic, and we avoid having temporary descriptors. We have just a few large buffers, and offsets into these buffers change frequently. For Vulkan, we use the uniform buffer dynamic descriptor type, and on Dx12 we use root descriptors, just passing in a GPU virtual address. All our descriptor sets are only written once, then persisted or cached.
  • #59: Here is an example HLSL snippet showing one usage of spaces. Each space contains a collection of SRVs and a constant buffer. The shader argument configuration on the CPU is shown below.
  • #60: Our architecture simplified the development effort, for Vulkan we just needed a new backend and device implementation, API specific memory allocators, barrier and transition logic, and a resource binding model that aligned with our shader compilation.
  • #61: The first stage was to get command translation working, and performing the correct barriers and transitions. This was done using the Vulkan validation layers, and lots of glorious printf debugging
  • #62: The second stage was getting basic ImGui, resource creation, and swap chain flip working. This meant I could easily toggle any display mode or render settings while debugging.
  • #63: Naturally, fun bugs occurred
  • #64: The third stage was to bring up more of the Halcyon asset loading operations, and get a basic entry point running
  • #65: Plenty of fun bugs with this, as well
  • #66: Plenty of fun bugs with this, as well
  • #67: The final stage was to bring up absolutely everything else, including render graph! To simplify things, I worked on getting just the normals and albedo display modes working. I relied on the fact that render graph will cull any passes not contributing to the final result, so I could easily remove problematic passes from running while I get the basics working.
  • #68: With that working, I started to bring up the more complicated passes. As expected, there were plenty of fun issues to sort through.
  • #69: Eventually, everything worked!
  • #70: Eventually, everything worked!
  • #71: Eventually, everything worked!
  • #72: An important part of our Vulkan backend, was consuming HLSL as a source language, and fixing up the SPIR-V to behave the same as our Dx12 resource binding model. We decided to use the spirv-reflect library from Hai and Cort; it does a great job at providing SPIR-V reflection using DX12 terminology, but we use it exclusively to patch up our descriptor binding numbers.
  • #73: SRVs, Samplers, and UAVs are simple. These types are uniquely namespaced in DX12, so t0 and s0 wouldn’t collide. This is not the case in SPIR-V, so we apply a simple offset to each type to emulate this behavior.
  • #74: Constant or uniform buffers are a bit more interesting. We want to move CBVs to their own descriptor sets, in order to make our ShaderViews representing the other resource types persistent and immutable. To do so, we don’t adjust the offset, as we’ll have a single CBV per descriptor set. However, we do shift the descriptor set number by the max number of shader arguments. This means if descriptor set 0 contained a constant buffer, that constant buffer would move to descriptor set 5 (if max shader arguments is 4).
  • #75: If a dispatch or draw is using 2 HLSL spaces, the patched SPIR-V will require the following descriptor set layout. Notice the shifted offsets for the SRVs, Samplers, and UAVs, and how the dynamic constant buffers have been hoisted out to their own descriptor sets.
  • #76: Another important aspect of a new render backend is the translation from our high level command stream to low level API calls.
  • #77: Here is an example of translating a high level compute dispatch command to Vulkan.
  • #78: Here is an example of translating a high level begin timing command to Vulkan timestamps
  • #79: And here is an example of translating a high level resolve timings command to Vulkan. Notice that the complexity behind fence tracking or resetting the query pool is completely hidden from the calling code. Each backend implementation can choose how to handle this efficiently.
  • #80: This comparison shows that our Vulkan implementation is nearing the performance of our DirectX 12 version, and is completely usable. There are a number of reasons for the delta, but none of them represent any amount of significant work to resolve.
  • #81: I will briefly mention some useful tools used for the Vulkan implementation
  • #82: For debugging and profiling, the usual suspects were quite helpful, and used extensively.
  • #83: For debugging and profiling, the usual suspects were quite helpful, and used extensively.
  • #84: The API statistics view in Nvidia Nsight is a great way to look at the count and cost of each API call.
  • #85: Another awesome feature with Nsight is the ability to export a capture as a standalone C++ application. Nsight will write out binary blobs of your resources in the capture, and write out source code that issues your API calls. You can build this app and debug problems in an isolated solution.
  • #86: Another awesome feature with Nsight is the ability to export a capture as a standalone C++ application. Nsight will write out binary blobs of your resources in the capture, and write out source code that issues your API calls. You can build this app and debug problems in an isolated solution.
  • #87: Here is our bug being reproduced in the standalone C++ export
  • #88: With our goal of having everything be live-reloadable and live-tweakable, we used DearImGui and ImGuizmo extensively for a number of useful overlays
  • #91: A special thanks to all these people, that were helpful in our Vulkan journey
  • #92: And before I finish, I would like to thank all the people who contributed to the PICA PICA project. It was a very awesome and dedicated effort by our team, and we could not have done it without our external partners either.
  • #93: On one last note, I would like to point out that we’re hiring for multiple positions at SEED. If you’re interested, please give me a shout!
  • #94: Thank you for listening! And if we still have some time, I would love to answer your questions