Halcyon + Vulkan
Munich Khronos Meetup
Graham Wihlidal
SEED – Electronic Arts
“PICA PICA”
 Exploratory mini-game & world
 Goals
 Hybrid rendering with DXR [Andersson 2018]
 Clean and consistent visuals
 Self-learning AI agents [Harmer 2018]
 Procedural worlds [Opara 2018]
 No precomputation
 Uses SEED’s Halcyon R&D framework
HALCYON
Halcyon Goals
 Rapid prototyping framework
 Different purpose than Frostbite
 Fast experimentation vs. AAA games
 Windows, Linux, macOS
Halcyon Goals
 Minimize or eliminate busy-work
 Artist “meta-data” meshes
 Occlusion
 GI / Lighting
 Collision
 Level-of-detail
 Live reloading of all assets
 Insanely fast iteration times
Halcyon Goals
 Only target modern APIs
 Direct3D 12
 Vulkan 1.1
 Metal 2
 Multi-GPU
 Explicit heterogeneous mGPU
 No AFR nonsense
 No linked adapters
Halcyon Goals
 Local or remote streaming
 Minimal boilerplate code
 Variety of rendering techniques and approaches
 Rasterization
 Path and ray tracing
 Hybrid
Hybrid Rendering
Direct Shadows (ray trace or raster)
Direct Lighting (compute)
Reflections (ray trace or compute)
Global Illumination (ray trace)
Post Processing (compute)
Transparency & Translucency (ray trace)
Ambient Occlusion (ray trace or compute)
Deferred Shading (raster)
[Image: scene rendered with the hybrid pipeline]
[Image: the same scene rendered with rasterization only]
Halcyon Goals
 “PICA PICA” and Halcyon built from scratch
 Implemented lots of bespoke technology
 Minimal effort to add a new API or platform
 Efficient and flexible rendering was a major focus
Rendering Components
 Render Backend
 Render Device
 Render Handles
 Render Commands
 Render Graph
 Render Proxy
Rendering Components
[Diagram: the Application drives Render Graphs, which record Render Commands referencing resources through Render Handles; each Render Backend (Direct3D 12, Vulkan, Proxy, …) owns one or more Render Devices]
 Live-reloadable DLLs
 Enumerates adapters and capabilities
 Swap chain support
 Extensions (e.g. ray tracing, subgroups, …)
 Determine adapter(s) to use
Render Backend
 Provides debugging and profiling
 RenderDoc integration, validation layers, …
 Create and destroy render devices
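As a rough illustration of the split described above, a backend interface along these lines could expose adapter enumeration, device creation, and debug hooks. The names below (IRenderBackend, RenderAdapterInfo, and so on) are illustrative assumptions, not Halcyon's actual API.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

struct RenderAdapterInfo {
    std::string name;
    bool supportsRayTracing  = false;  // capability/extension queries (ray tracing, subgroups, ...)
    bool supportsSubgroupOps = false;
};

class IRenderDevice;  // created and destroyed through the backend

// One implementation per API (Direct3D 12, Vulkan 1.1, Metal 2, Proxy, Mock),
// loaded as a live-reloadable DLL.
class IRenderBackend {
public:
    virtual ~IRenderBackend() = default;

    // Enumerate adapters and capabilities so the application can pick which adapter(s) to use.
    virtual std::vector<RenderAdapterInfo> enumerateAdapters() = 0;

    // Create and destroy render devices (one per chosen adapter).
    virtual std::unique_ptr<IRenderDevice> createDevice(uint32_t adapterIndex) = 0;

    // Debugging and profiling hooks (RenderDoc integration, validation layers, ...).
    virtual void enableValidation(bool enable) = 0;
};
```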
Render Backend
 Direct3D 12
 Vulkan 1.1
 Metal 2
 Proxy
 Mock
Render Backend
 Direct3D 12
 Shader Model 6.X
 DirectX Ray Tracing
 Bindless Resources
 Explicit Multi-GPU
 DirectML (soon..)
 …
Render Backend
 Vulkan 1.1
 Sub-groups
 Descriptor indexing
 External memory
 Multi-draw indirect
 Ray tracing (soon..)
 …
Render Backend
 Metal 2
 Early development
 Primarily desktop
 Argument buffers
 Machine learning
 …
Render Backend
 Proxy
 Not discussed in this presentation
Render Backend
 Mock
 Performs resource tracking and validation
 Command stream is parsed and evaluated
 No submission to an API
 Useful for unit tests and debugging
Render Device
Render Device
 Abstraction of a logical GPU adapter
 e.g. VkDevice, ID3D12Device, …
 Provides interface to GPU queues
 Command list submission
Render Device
 Ownership of GPU resources
 Create & Destroy
 Lifetime tracking of resources
 Mapping render handles → device resources
Render Handles
Render Handles
 Resources associated by handle
 Lightweight (64 bits)
 Constant-time lookup
 Type safety (e.g. buffer vs. texture)
 Can be serialized or transmitted
 Generational for safety
 e.g. double-delete, usage after delete
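A minimal sketch of a 64-bit generational handle with the properties listed above; the exact bit layout and type names are assumptions for illustration, not Halcyon's actual encoding.

```cpp
#include <cstdint>

enum class RenderResourceType : uint8_t { Buffer, Texture, ShaderViews /* ... */ };

struct RenderHandle {
    // 32-bit slot index -> constant-time lookup on each device,
    // 24-bit generation -> catches double-delete and use-after-delete,
    //  8-bit type       -> buffer vs. texture type safety.
    uint64_t bits = 0;

    static RenderHandle make(uint32_t index, uint32_t generation, RenderResourceType type) {
        RenderHandle handle;
        handle.bits = uint64_t(index)
                    | (uint64_t(generation & 0xFFFFFFu) << 32)
                    | (uint64_t(uint8_t(type)) << 56);
        return handle;
    }

    uint32_t           index()      const { return uint32_t(bits & 0xFFFFFFFFu); }
    uint32_t           generation() const { return uint32_t(bits >> 32) & 0xFFFFFFu; }
    RenderResourceType type()       const { return RenderResourceType(uint8_t(bits >> 56)); }
    bool               isValid()    const { return bits != 0; }
};
```

Because the handle itself carries no API pointers, it can be serialized, transmitted, and resolved independently on each render device, which is what enables the one-to-many mapping shown on the next slides.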
Render Handles
[Diagram: a single Render Handle resolving to a separate ID3D12Resource on each of DX12 Adapters 0–3]
 Handles allow one-to-many cardinality [handle->devices]
 Each device can have a unique representation of the handle
Render Handles
 Can query if a device has a handle loaded
 Safely add and remove devices
 Handle owned by application, representation can reload on device
[Diagram: the same Render Handle to per-adapter resource mapping as above]
Render Handles
 Shared resources are supported
 Primary device owner, secondaries alias primary
[Diagram: a shared resource; the primary adapter owns the ID3D12Resource and the other adapters alias it through the same Render Handle]
Render Handles
 Can also mix and match backends in the same process!
 Made debugging VK implementation much easier
 DX12 on left half of screen, VK on right half of screen
[Diagram: render handles resolving to ID3D12Resources on DX12 Adapters 0 and 1, a VkImage on VK Adapter 0, and a proxy representation on Proxy Adapter 0]
Render Commands
Render Commands
 Draw
 DrawIndirect
 Dispatch
 DispatchIndirect
 UpdateBuffer
 UpdateTexture
 CopyBuffer
 CopyTexture
 Barriers
 Transitions
 BeginTiming
 EndTiming
 ResolveTimings
 BeginEvent
 EndEvent
 BeginRenderPass
 EndRenderPass
 RayTrace
 UpdateTopLevel
 UpdateBottomLevel
 UpdateShaderTable
Render Commands
 Queue type specified
 Spec validation
 Allowed to run?
 e.g. draws on compute
 Automatic scheduling
 Where can it run?
 Async compute
Render Commands
[Code screenshot: an example compute dispatch command specifying the Compute queue type]
Render Command List
 Encodes high level commands
 Tracks queue types encountered
 Queue mask indicating scheduling rules
 Commands are stateless - parallel recording
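A hedged sketch of how the stateless high-level commands and the per-list queue mask described above could fit together; the command structs and class names are illustrative assumptions, not Halcyon's actual command stream.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

enum QueueMask : uint32_t {
    QueueGraphics = 1u << 0,
    QueueCompute  = 1u << 1,
    QueueCopy     = 1u << 2,
};

struct CmdDraw       { uint32_t vertexCount, instanceCount; /* pipeline, shader arguments, ... */ };
struct CmdDispatch   { uint32_t groupsX, groupsY, groupsZ; };
struct CmdCopyBuffer { uint64_t srcHandle, dstHandle; uint64_t sizeInBytes; };

using RenderCommand = std::variant<CmdDraw, CmdDispatch, CmdCopyBuffer>;

class RenderCommandList {
public:
    // Commands carry all of their state, so lists can be recorded in parallel.
    void draw(const CmdDraw& cmd)             { record(cmd, QueueGraphics); }
    void dispatch(const CmdDispatch& cmd)     { record(cmd, QueueCompute); }
    void copyBuffer(const CmdCopyBuffer& cmd) { record(cmd, QueueCopy); }

    // Queue types encountered while recording; the scheduler uses this mask
    // to decide where the list is allowed to run (e.g. async compute).
    uint32_t queueMask() const { return queueMask_; }
    const std::vector<RenderCommand>& commands() const { return commands_; }

private:
    template <typename Command>
    void record(const Command& cmd, uint32_t queueBit) {
        commands_.push_back(cmd);
        queueMask_ |= queueBit;
    }

    std::vector<RenderCommand> commands_;
    uint32_t                   queueMask_ = 0;
};
```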
Render Compilation
 Render command lists are “compiled”
 Translation to low level API
 Can compile once, submit multiple times
 Serial operation (memcpy speed)
 Perfect redundant state filtering
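A sketch of the compile step under the same assumptions as the previous snippet: a single serial walk over the high-level commands, translating each into the low-level API, which makes perfect redundant state filtering straightforward. The pipeline-handle field and helper are hypothetical.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

struct CmdDraw     { uint64_t pipeline; uint32_t vertexCount, instanceCount; };
struct CmdDispatch { uint64_t pipeline; uint32_t groupsX, groupsY, groupsZ; };
using RenderCommand = std::variant<CmdDraw, CmdDispatch>;

struct CompiledCommandList { /* API-specific command buffer, e.g. a VkCommandBuffer */ };

class RenderBackendCompiler {
public:
    // Compile once; the resulting low-level list can be submitted multiple times.
    CompiledCommandList compile(const std::vector<RenderCommand>& commands) {
        CompiledCommandList compiled;
        uint64_t boundPipeline = 0;  // serial walk -> perfect redundant state filtering
        for (const RenderCommand& command : commands) {
            if (const CmdDraw* draw = std::get_if<CmdDraw>(&command)) {
                bindPipelineIfChanged(draw->pipeline, boundPipeline);
                // emit the API draw call into 'compiled' here
            } else if (const CmdDispatch* dispatch = std::get_if<CmdDispatch>(&command)) {
                bindPipelineIfChanged(dispatch->pipeline, boundPipeline);
                // emit the API dispatch call into 'compiled' here
            }
        }
        return compiled;
    }

private:
    void bindPipelineIfChanged(uint64_t pipeline, uint64_t& bound) {
        if (pipeline != bound) {
            bound = pipeline;
            // emit the API pipeline bind here; unchanged pipelines are filtered out
        }
    }
};
```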
Render Graph
Render Graph
 Inspired by FrameGraph [O’Donnell 2017]
 Automatically handle transient resources
 Import explicitly managed resources
 Automatic resource transitions
 Render target batching
 DiscardResource
 Memory aliasing barriers
 …
Render Graph
 Basic memory management
 Not targeting current consoles
 Fine grained memory reuse sub-optimal with current PC drivers
 Lose ~5% on aliasing barriers and discards
 Automatic queue scheduling
 Ongoing research
 Need heuristics on task duration and bottlenecks
 e.g. Memory vs ALU
 Not enough to specify dependencies
Render Graph
 Frame Graph → Render Graph: No concept of a “frame”
 Fully automatic transitions and split barriers
 Single implementation, regardless of backend
 Translation from high level render command stream
 API differences hidden from render graph
 Support for mGPU
 Mostly implicit and automatic
 Can specify a scheduling policy
 Not discussed in this presentation
Render Graph
 Composition of multiple graphs at varying frequencies
 Same GPU: async compute
 mGPU: graphs per GPU
 Out-of-core: server cluster, remote streaming
Render Graph
 Composition of multiple graphs at varying frequencies
 e.g. translucency, refraction, global illumination
Render Graph
 Two phases
 Graph construction
 Specify inputs and outputs
 Serial operation (by design)
 Graph evaluation
 Highly parallelized
 Record high level render commands
 Automatic barriers and transitions
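The original deck shows a real pass at this point; as a stand-in, here is a hedged sketch of what a two-phase pass could look like, loosely following the FrameGraph pattern [O'Donnell 2017]. All names (RenderGraph, GraphBuilder, addPass, the GTAO pass data) are illustrative, not Halcyon's actual API.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using GraphResource = uint32_t;

// Construction phase: declares inputs/outputs and transient resources (serial by design).
struct GraphBuilder {
    GraphResource read(const std::string& name)          { return resolve(name); }
    GraphResource createTexture(const std::string& name) { return resolve(name); }
    GraphResource resolve(const std::string& name) {
        auto it = ids.find(name);
        if (it == ids.end()) it = ids.emplace(name, GraphResource(ids.size())).first;
        return it->second;
    }
    std::unordered_map<std::string, GraphResource> ids;
};

struct RenderCommandList { /* high-level command recording, as sketched earlier */ };

class RenderGraph {
public:
    using SetupFn   = std::function<void(GraphBuilder&)>;
    using ExecuteFn = std::function<void(RenderCommandList&)>;

    void addPass(std::string name, SetupFn setup, ExecuteFn execute) {
        setup(builder_);  // construction: only records dependencies
        passes_.push_back({std::move(name), std::move(execute)});
    }

    void evaluate() {
        // Evaluation could run passes in parallel; barriers and transitions would be
        // derived automatically from the dependencies declared during construction.
        for (auto& pass : passes_) {
            RenderCommandList commands;
            pass.execute(commands);
        }
    }

private:
    struct Pass { std::string name; ExecuteFn execute; };
    GraphBuilder      builder_;
    std::vector<Pass> passes_;
};

// Usage sketch: a GTAO pass reading a depth pyramid and writing an AO result.
inline void addGtaoPass(RenderGraph& graph) {
    auto data = std::make_shared<std::pair<GraphResource, GraphResource>>();
    graph.addPass(
        "Gtao",
        [data](GraphBuilder& builder) {
            data->first  = builder.read("DepthPyramid");
            data->second = builder.createTexture("GtaoResult");
        },
        [data](RenderCommandList& /*commands*/) {
            // record the GTAO compute dispatch here using data->first / data->second
        });
}
```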
[Code screenshot: an example render graph pass, with the construction phase on top and the evaluation phase below]
Render Graph
 Automatic profiling data
 GPU and CPU counters per-pass
 Works with mGPU
 Each GPU is profiled
Render Graph
 Live debugging overlay
 Evaluated passes in-order of execution
 Input and output dependencies
 Resource version information
Render Graph
Some of our render graph passes:
 Bloom
 BottomLevelUpdate
 BrdfLut
 CocDerive
 DepthPyramid
 DiffuseSh
 Dof
 Final
 GBuffer
 Gtao
 IblReflection
 ImGui
 InstanceTransforms
 Lighting
 MotionBlur
 Present
 RayTracing
 RayTracingAccum
 ReflectionFilter
 ReflectionSample
 ReflectionTrace
 Rtao
 Screenshot
 Segmentation
 ShaderTableUpdate
 ShadowFilter
 ShadowMask
 ShadowCascades
 ShadowTrace
 Skinning
 Ssr
 SurfelGapFill
 SurfelLighting
 SurfelPositions
 SurfelSpawn
 Svgf
 TemporalAa
 TemporalReproject
 TopLevelUpdate
 TranslucencyTrace
 Velocity
 Visibility
Shaders
Shaders
 Complex materials
 Multiple microfacet layers
 [Stachowiak 2018]
 Energy conserving
 Automatic Fresnel between layers
 All lighting & rendering modes
 Raster, path-traced reference, hybrid
 Iterate with different looks
 Bake down permutations for production
Objects with Multi-Layered Materials
Shaders
 Exclusively HLSL
 Shader Model 6.X
 Majority are compute shaders
 Performance is critical
 Group shared memory
 Wave-ops / Sub-groups
Shaders
 No reflection
 Avoid costly lookups
 Only explicit bindings
 … except for validation
 Extensive use of HLSL spaces
 Updates at varying frequency
 Bindless
Shaders
HLSL → DXC → DXIL (Direct3D 12) or SPIR-V (Vulkan 1.1)
SPIR-V → SPIRV-CROSS → MSL (Metal 2) or ISPC (AVX2, …)
Shader Arguments
 Commands refer to resources with “Shader Arguments”
 Each argument represents an HLSL space
 MaxShaderParameters = 4 [Configurable]
 # of spaces, not # of resources
Shader Arguments
 Each argument contains:
 “ShaderViews” handle
 Constant buffer handle and offset
 “ShaderViews”
 Collection of SRV and UAV handles
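A small sketch of the per-space shader argument described above; the struct names and the choice of a raw 64-bit handle type are assumptions, not Halcyon's actual definitions.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using RenderHandle = uint64_t;  // generational render handle, as sketched earlier

constexpr uint32_t kMaxShaderParameters = 4;  // number of spaces, not number of resources

// A "ShaderViews" object is a persistent, write-once collection of SRV and UAV handles.
struct ShaderViewsDesc {
    std::vector<RenderHandle> srvs;
    std::vector<RenderHandle> uavs;
};

// One shader argument per HLSL space.
struct ShaderArgument {
    RenderHandle shaderViews          = 0;  // SRVs + UAVs for this space
    RenderHandle constantBuffer       = 0;  // one of a few large dynamic buffers
    uint32_t     constantBufferOffset = 0;  // changes frequently per draw/dispatch
};

using ShaderArguments = std::array<ShaderArgument, kMaxShaderParameters>;
```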
Shader Arguments
 Constant buffers are all dynamic
 Avoid temporary descriptors
 Just a few large buffers, offsets change frequently
 VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
 DX12 Root Descriptors (pass in GPU VA)
 All descriptor sets are only written once
 Persisted / cached
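To make the dynamic-offset idea concrete, a sketch of the Vulkan bind path is shown below: the cached descriptor sets were written once, and only the dynamic offsets vary per draw or dispatch. The wrapper function and its parameters are assumptions; the Vulkan call itself is standard.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

void bindComputeArguments(VkCommandBuffer        commandBuffer,
                          VkPipelineLayout       pipelineLayout,
                          const VkDescriptorSet* cachedSets,        // ShaderViews + CBV sets, written once
                          uint32_t               setCount,
                          const uint32_t*        dynamicOffsets,    // one offset per dynamic uniform buffer
                          uint32_t               dynamicOffsetCount)
{
    // VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC lets the same descriptor point into a
    // large buffer; the per-dispatch offset is supplied here instead of rewriting the
    // descriptor. On DX12 the equivalent path is a root descriptor taking a GPU VA
    // (e.g. SetComputeRootConstantBufferView).
    vkCmdBindDescriptorSets(commandBuffer,
                            VK_PIPELINE_BIND_POINT_COMPUTE,
                            pipelineLayout,
                            /*firstSet*/ 0,
                            setCount, cachedSets,
                            dynamicOffsetCount, dynamicOffsets);
}
```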
Vulkan Implementation
 Architecture simplified development effort
 Vulkan specific:
 Backend and device implementation
 Memory allocators (e.g. AMD VMA)
 Barrier and transition logic
 Resource binding model
Vulkan Implementation
[Screenshots: incremental bring-up of the Vulkan backend: command translation and barriers first, then ImGui, swap chain, and asset loading, and finally the full render graph, with plenty of fun bugs along the way]
Vulkan!
[Screenshots: eventually, everything worked]
Vulkan Implementation
 Shader compilation (HLSL → SPIR-V)
 Patch SPIR-V to match DX12
 Using spirv-reflect from Hai and Cort
 spvReflectCreateShaderModule
 spvReflectEnumerateDescriptorSets
 spvReflectChangeDescriptorBindingNumbers
 spvReflectGetCodeSize / spvReflectGetCode
 spvReflectDestroyShaderModule
SPIR-V Patching
 SPV_REFLECT_RESOURCE_FLAG_SRV
 Offset += 1000
 SPV_REFLECT_RESOURCE_FLAG_SAMPLER
 Offset += 2000
 SPV_REFLECT_RESOURCE_FLAG_UAV
 Offset += 3000
SPIR-V Patching
 SPV_REFLECT_RESOURCE_FLAG_CBV
 Offset Unchanged: 0
 Descriptor Set += MAX_SHADER_ARGUMENTS
 CBVs move to their own descriptor sets
 ShaderViews become persistent and immutable
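Putting the two patching slides together, a sketch of the spirv-reflect pass could look like the following. The binding offsets match the slides; the surrounding function, error handling, and the kMaxShaderArguments constant (standing in for MAX_SHADER_ARGUMENTS) are assumptions rather than Halcyon's actual code.

```cpp
#include <spirv_reflect.h>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kMaxShaderArguments = 4;  // MAX_SHADER_ARGUMENTS from the slides

std::vector<uint32_t> patchSpirvBindings(const void* spirv, size_t spirvSize)
{
    SpvReflectShaderModule module = {};
    spvReflectCreateShaderModule(spirvSize, spirv, &module);

    uint32_t bindingCount = 0;
    spvReflectEnumerateDescriptorBindings(&module, &bindingCount, nullptr);
    std::vector<SpvReflectDescriptorBinding*> bindings(bindingCount);
    spvReflectEnumerateDescriptorBindings(&module, &bindingCount, bindings.data());

    for (const SpvReflectDescriptorBinding* binding : bindings) {
        uint32_t newBinding = binding->binding;
        uint32_t newSet     = binding->set;

        switch (binding->resource_type) {
        case SPV_REFLECT_RESOURCE_FLAG_SRV:     newBinding += 1000; break;  // t registers
        case SPV_REFLECT_RESOURCE_FLAG_SAMPLER: newBinding += 2000; break;  // s registers
        case SPV_REFLECT_RESOURCE_FLAG_UAV:     newBinding += 3000; break;  // u registers
        case SPV_REFLECT_RESOURCE_FLAG_CBV:
            // CBV binding stays at 0; hoist it into its own descriptor set so the
            // ShaderViews sets can stay persistent and immutable.
            newSet += kMaxShaderArguments;
            break;
        default:
            break;
        }

        spvReflectChangeDescriptorBindingNumbers(&module, binding, newBinding, newSet);
    }

    // Copy out the patched SPIR-V and release the reflection module.
    const uint32_t* code = spvReflectGetCode(&module);
    const size_t    size = spvReflectGetCodeSize(&module);  // in bytes
    std::vector<uint32_t> patched(code, code + size / sizeof(uint32_t));
    spvReflectDestroyShaderModule(&module);
    return patched;
}
```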
SPIR-V Patching
If 2 of 4 HLSL spaces are in use:
 Set 0: SRVs (>=1000), Samplers (>=2000), UAVs (>=3000)
 Set 1: SRVs (>=1000), Samplers (>=2000), UAVs (>=3000)
 Set 2: Unbound
 Set 3: Unbound
 Set 4: Dynamic Constant Buffer (Offset: 0)
 Set 5: Dynamic Constant Buffer (Offset: 0)
Vulkan Implementation
 Translate commands
 Read command list
 Write Vulkan API
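As a minimal example of that translation, a compute dispatch could be lowered to Vulkan roughly as follows. The high-level command struct and helper are assumptions carried over from the earlier sketches; the Vulkan calls themselves are the standard API.

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>

struct CmdDispatch { uint32_t groupsX, groupsY, groupsZ; };  // high-level command

void translateDispatch(VkCommandBuffer    commandBuffer,
                       VkPipeline         computePipeline,  // resolved from the command's pipeline handle
                       const CmdDispatch& cmd)
{
    // Barriers and transitions for the dispatch's resources would already have been
    // recorded at this point, derived automatically from the render graph.
    vkCmdBindPipeline(commandBuffer, VK_PIPELINE_BIND_POINT_COMPUTE, computePipeline);
    // Descriptor sets and dynamic constant buffer offsets bound as in the earlier sketch.
    vkCmdDispatch(commandBuffer, cmd.groupsX, cmd.groupsY, cmd.groupsZ);
}
```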
Ongoing Work!
VK nearing DX12
Tools
Tools
 RenderDoc
 NV Nsight
 AMD RGP
Tools
 C++ Export!
 Standalone
Dear ImGui + ImGuizmo
 Live tweaking
 Very useful!
References
 [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”. Available online.
 [Andersson 2018] Johan Andersson, Colin Barré-Brisebois. “DirectX: Evolving Microsoft's Graphics Platform”. Available online.
 [Harmer 2018] Jack Harmer, Linus Gisslén, Henrik Holst, Joakim Bergdahl, Tom Olsson, Kristoffer Sjöö and Magnus Nordin. “Imitation Learning with Concurrent Actions in 3D Games”. Available online.
 [Opara 2018] Anastasia Opara. “Creativity of Rules and Patterns”. Available online.
 [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”. Available online.
Thanks
 Matthäus Chajdas
 Rys Sommefeldt
 Timothy Lottes
 Tobias Hector
 Neil Henning
 John Kessenich
 Hai Nguyen
 Nuno Subtil
 Adam Sawicki
 Alon Or-bach
 Baldur Karlsson
 Cort Stratton
 Mathias Schott
 Rolando Caloca
 Sebastian Aaltonen
 Hans-Kristian Arntzen
 Yuriy O’Donnell
 Arseny Kapoulkine
 Tex Riddell
 Marcelo Lopez Ruiz
 Lei Zhang
 Greg Roth
 Noah Fredriks
 Qun Lin
 Ehsan Nasiri
 Steven Perron
 Alan Baker
 Diego Novillo
Thanks
 SEED
 Johan Andersson
 Colin Barré-Brisebois
 Jasper Bekkers
 Joakim Bergdahl
 Ken Brown
 Dean Calver
 Dirk de la Hunt
 Jenna Frisk
 Paul Greveson
 Henrik Halen
 Effeli Holst
 Andrew Lauritzen
 Magnus Nordin
 Niklas Nummelin
 Anastasia Opara
 Kristoffer Sjöö
 Ida Winterhaven
 Tomasz Stachowiak
 Microsoft
 Chas Boyd
 Ivan Nevraev
 Amar Patel
 Matt Sandy
 NVIDIA
 Tomas Akenine-Möller
 Nir Benty
 Jiho Choi
 Peter Harrison
 Alex Hyder
 Jon Jansen
 Aaron Lefohn
 Ignacio Llamas
 Henry Moreton
 Martin Stich
SEED // SEARCH FOR EXTRAORDINARY EXPERIENCES DIVISION
STOCKHOLM – LOS ANGELES – MONTRÉAL – REMOTE
SEED.EA.COM
WE'RE HIRING!
Questions?
Editor's Notes
  • #2: My name is Graham Wihlidal, and I’m a senior rendering engineer at SEED in Stockholm, Sweden. Previously, I was on the Frostbite rendering team, and at BioWare for a long time.
  • #3: We have built PICA PICA from the ground up in our custom R&D framework called Halcyon. It is a flexible experimentation framework that is very capable of rendering fast and shiny pixels.
  • #4: So let's talk a bit about Halcyon itself.
  • #5: Halcyon is a rapid prototyping framework, serving a different purpose than our flagship AAA engine, Frostbite. And Halcyon is currently supported on Windows, Linux, and macOS.
  • #6: A major goal of Halcyon is to minimize or eliminate busy-work; something I call artist “meta-data” meshes. Show me one artist that actually enjoys making these meshes over something more creative, and I guarantee you they are brainwashed, and need an intervention and our caring support. Another critical goal is the live reloading of all assets. We don’t want to take a coffee break while we shut down Halcyon, launch a data build, come back, and resume whatever we were doing.
  • #7: One luxury we had by starting from scratch, was choosing our feature set and min spec. We decided to only target modern APIs, so we are not restricted by legacy. Another interesting goal is to provide easy access to multiple GPUs, without sacrificing API cleanliness or maintainability. To accomplish this, we decided on explicit heterogeneous mGPU, not linked adapters. We also are avoiding any AFR nonsense for a number of reasons, including problems with temporal techniques.
  • #8: We chose to support rendering locally, and also performing some computation or rendering remotely, and transmitting the results back to the application. In order to deliver on our promise of fast experimentation, we needed to ensure a minimal amount of boilerplate code. This includes code to load a shader, set up pipelines and render state, query scene representations, etc. This was critical in developing and supporting a vast number of rendering techniques and approaches that we have in Halcyon.
  • #9: We have a unique rendering pipeline which is inspired by classic techniques in real-time rendering, with ray tracing sprinkled on top. We have a deferred renderer with compute-based lighting, and a fairly standard post-processing stack. And then there are a few pluggable components. We can render shadows via ray tracing or cascaded shadow maps. Reflections can be traced or screen-space marched. Same story for ambient occlusion. Only our global illumination and translucency actually require ray tracing.
  • #10: Here is an example of a scene using our hybrid rendering pipeline
  • #11: And here is the same scene uses our traditional pure rasterization rendering pipeline. As you can see, most of our visual fidelity holds up between the two, especially with respect to our materials.
  • #12: PICA PICA and Halcyon were both built from scratch, and our goals required us to implement a lot of bespoke technology. In the end, our flexible architecture means it is minimal effort to add a new API or platform, and also render our scenes very efficiently.
  • #15: The first component I will talk about is render backend
  • #16: Render backends are implemented as live-reloadable DLLs. Each backend represents a particular API, and provides enumeration of adapters and capabilities. In addition, the backends will determine the ideal adapters to use, depending on the desired purpose.
  • #17: Each render backend also supports a debugging and profiling interface – this provides functionality like RenderDoc integration, CPU and GPU validation layers, etc.. Most importantly, render backends support the creation and destruction of render devices, which I’ll cover shortly.
  • #18: We have a variety of render backends implemented
  • #19: For DirectX 12, we support the latest shader model, DirectX ray tracing, full bindless resources, explicit heterogeneous mGPU, and we plan to add support for DirectML.
  • #20: Vulkan is a similar story to DirectX 12, except we haven’t implemented multi-GPU or ray tracing support at this time, but it is planned.
  • #21: Metal 2 is still in early development, but very promising. We are primarily interested in using it on macOS instead of iOS.
  • #22: We have another pretty crazy backend which I unfortunately won’t get into today, but a near future presentation will go in depth on this one.
  • #23: Finally, we have our mock render backend which is for unit testing, debugging, and validation. This backend does all the same work the other backends do, except translation from high level to low level just runs a validation test suite instead of submitting to a graphics API.
  • #24: The next component to discuss is render device
  • #25: Render device is an abstraction of a logical GPU adapter, such as VkDevice or ID3D12Device. It provides an interface to various GPU queues, and provides API specific command list scheduling and submission functionality
  • #26: Render device has full ownership and lifetime tracking of its GPU resources, and provides create and destroy functionality for all the high level render resource types. The high level application refers to these resources using handles, so render device also internally provides an efficient mapping of these handles to internal device resources.
  • #27: Speaking of render handles…
  • #28: The handles are lightweight (just 64 bits), and the associated resources are fetched with a constant-time lookup. The handles are type safe, like trying to pass a buffer in place of a texture. The handles can be serialized or transmitted, as long as some contract is in place about what the handle represents. And the handles are generational for safety, which protects against double-delete, or using a handle after deletion
  • #29: Handles allow a one-to-many cardinality where a single handle can have a unique representation on each device
  • #30: There is an API to query if a particular render device has a handle loaded or not. This makes it safe to add and remove devices while the application is running. The handle is owned by the application, so the representation can be loaded or reloaded for any number of devices.
  • #31: Shared resources are also supported, allowing for copying data between various render devices.
  • #32: A crazy feature of this architecture is that we can mix and match different backends in the same process! This made debugging Vulkan much easier, as I could have Dx12 rendering on the left half of the screen, while Vulkan was rendering on the right half of the screen.
  • #33: A key rendering component is our high level command stream.
  • #34: We developed an API agnostic high level command stream that allows the application to efficiently express how a scene should be updated and rendered, while letting each backend control how this is done.
  • #35: Each render command specifies a queue type, which is primarily for spec validation (such as putting graphics work on a compute queue), and also to aid in automatic scheduling, like with async compute.
  • #36: As an example, here is a compute dispatch command, which specifies Compute as the queue type. The underlying backend and scheduler can interpret this accordingly
  • #37: The high level commands are encoded into a render command list. As each command is encoded, a queue mask is updated which helps indicate the scheduling rules and restrictions for that command list. The commands are stateless, which allows for fully parallel recording.
  • #38: The render command lists are compiled by each render backend, which means the high level commands are translated to the low level API. Command lists can be compiled once, and submitted multiple times, in appropriate cases. The low level translation is a serial operation by design, as this is basically running at memcpy speed. Doing this part serial means we get perfect redundant state filtering.
  • #39: Another significant rendering component is our render graph
  • #40: Render graph is inspired by Frostbite’s Frame Graph. Just like Frame Graph, we automatically handle transient resources. Complex rendering uses a number of temporary buffers and textures that stay alive just for the purpose and duration of an algorithm. These are called transient resources, and the graph automatically manages these lifetimes efficiently. Automatic resource transitions are also performed.
  • #41: Compared to Frame Graph, we opted for a simpler memory management model. We are not targeting current consoles with Halcyon, so we don’t need fine-grained memory reuse. Current PC drivers and APIs are not as efficient in this area (for a variety of reasons), and you lose ~5% on aliasing barriers and discards. In render graph, we support automatic queue scheduling (graphics, copy, compute, etc.). This is an area of ongoing research, as it’s not enough to just specify input and output dependencies. You also need heuristics on task duration and bottlenecks to further improve the scheduling.
  • #42: We call our implementation Render Graph, because we don’t have the concept of a “frame”. Our transitions and split barriers are fully automatic, and there is a single implementation, regardless of backend. This is thanks to our high level render command stream, which hides API differences from render graph. We also support multi-gpu, which is mostly implicit and automatic, with the exception of a scheduling policy you configure on your pass – this represents what devices a pass runs on, and what transfers are performed going in or out of the pass.
  • #43: Render graph supports composition of multiple graphs at varying frequencies. These graphs can run on the same GPU, such as async compute. They can run with multi-GPU, with a graph on each GPU. And they can even run out of core, such as in a server cluster, or on another machine streamed remotely.
  • #44: There are a number of techniques that can easily run at a different frequency from the main graph, such as object space translucency and reflection, or our surfel based global illumination.
  • #45: Render graph runs in two phases, construction and evaluation. Construction specifies the input and output dependencies, and this is a serial operation by design. If you implement a hierarchical gaussian blur as a single pass that gets chained X times, you want to read the size of the input in each step, and generate the proper sized output. Maybe you could split it up into some threads, but tracking dependencies in order to do it in a parallel fashion might be more costly than actually running it serially. Evaluation is highly parallelized, and this is where high level render commands are recorded, and automatic barriers and transitions are inserted.
  • #46: Here is an example render graph pass, with the construction phase up top, and the evaluation phase at the bottom.
  • #47: Render graph can collect automatic profiling data, presenting you with GPU and CPU counters per-pass Additionally, this works with mGPU, where each GPU is shown in the list
  • #48: There is also a live debugging overlay, using ImGui. This overlay shows the evaluated render graph passes in-order of execution, the input and output dependencies, and the resource version information.
  • #50: This is a list of some of our render graph passes in PICA PICA
  • #51: Shaders are also an important component
  • #52: We implemented a system which supports multiple microfacet layers arranged into stacks. The stacks could also be probabilistically blended to support prefiltering of material mixtures. This allowed us to rapidly experiment with the look of our demo, while at the same time enforcing energy conservation, and avoiding common gamedev hacks like “metalness”. An in-engine material editor meant that we could quickly jump in and change the look of everything. The material system works with all our render modes, be that pure rasterization, path tracing, or hybrid. For performance, we can bake down certain permutations for production.
  • #53: We exclusively use HLSL 6 as a source language, and the majority of our shaders are compute. Performance is critical, so we rely heavily on group shared memory and wave operations.
  • #54: We don’t rely on any reflection information, outside of validation. We avoid costly lookups by requiring explicit bindings. We also make extensive use of HLSL spaces, where the spaces can be updated at varying frequency.
  • #55: Here is a flow graph of our shader compilation. We always start with HLSL compiled with the DXC shader compiler. The DXIL is used by Direct3D 12, and the SPIR-V is used by Vulkan 1.1. We also have support for taking the SPIR-V, running it through SPIRV-CROSS, and generating MSL for Metal, or ISPC to run on the CPU.
  • #56: We have a concept in Halcyon called shader arguments, where each argument represents an HLSL space. We limit the maximum number of arguments to 4 for efficiency, but this can be configured. It is important to note that this limit represents the maximum number of spaces, not the maximum number of resources.
  • #57: Each shader argument contains a “ShaderViews” handle, which refers to a collection of SRV and UAV handles. Additionally, each shader argument also contains a constant buffer handle and offset into the buffer.
  • #58: Our constant buffers are all dynamic, and we avoid having temporary descriptors. We have just a few large buffers, and offsets into these buffers change frequently. For Vulkan, we use the uniform buffer dynamic descriptor type, and on Dx12 we use root descriptors, just passing in a GPU virtual address. All our descriptor sets are only written once, then persisted or cached.
  • #59: Here is an example HLSL snippet showing one usage of spaces. Each space contains a collection of SRVs and a constant buffer. The shader argument configuration on the CPU is shown below.
  • #60: Our architecture simplified the development effort, for Vulkan we just needed a new backend and device implementation, API specific memory allocators, barrier and transition logic, and a resource binding model that aligned with our shader compilation.
  • #61: The first stage was to get command translation working, and performing the correct barriers and transitions. This was done using the Vulkan validation layers, and lots of glorious printf debugging
  • #62: The second stage was getting basic ImGui, resource creation, and swap chain flip working. This meant I could easily toggle any display mode or render settings while debugging.
  • #63: Naturally, fun bugs occurred
  • #64: The third stage was to bring up more of the Halcyon asset loading operations, and get a basic entry point running
  • #65: Plenty of fun bugs with this, as well
  • #66: Plenty of fun bugs with this, as well
  • #67: The final stage was to bring up absolutely everything else, including render graph! To simplify things, I worked on getting just the normals and albedo display modes working. I relied on the fact that render graph will cull any passes not contributing to the final result, so I could easily remove problematic passes from running while I get the basics working.
  • #68: With that working, I started to bring up the more complicated passes. As expected, there were plenty of fun issues to sort through.
  • #69: Eventually, everything worked!
  • #70: Eventually, everything worked!
  • #71: Eventually, everything worked!
  • #72: An important part of our Vulkan backend, was consuming HLSL as a source language, and fixing up the SPIR-V to behave the same as our Dx12 resource binding model. We decided to use the spirv-reflect library from Hai and Cort; it does a great job at providing SPIR-V reflection using DX12 terminology, but we use it exclusively to patch up our descriptor binding numbers.
  • #73: SRVs, Samplers, and UAVs are simple. These types are uniquely namespaced in DX12, so t0 and s0 wouldn’t collide. This is not the case in SPIR-V, so we apply a simple offset to each type to emulate this behavior.
  • #74: Constant or uniform buffers are a bit more interesting. We want to move CBVs to their own descriptor sets, in order to make our ShaderViews representing the other resource types persistent and immutable. To do so, we don’t adjust the offset, as we’ll have a single CBV per descriptor set. However, we do shift the descriptor set number by the max number of shader arguments. This means if descriptor set 0 contained a constant buffer, that constant buffer would move to descriptor set 5 (if max shader arguments is 4).
  • #75: If a dispatch or draw is using 2 HLSL spaces, the patched SPIR-V will require the following descriptor set layout. Notice the shifted offsets for the SRVs, Samplers, and UAVs, and how the dynamic constant buffers have been hoisted out to their own descriptor sets.
  • #76: Another important aspect of a new render backend is the translation from our high level command stream to low level API calls.
  • #77: Here is an example of translating a high level compute dispatch command to Vulkan.
  • #78: Here is an example of translating a high level begin timing command to Vulkan timestamps
  • #79: And here is an example of translating a high level resolve timings command to Vulkan. Notice that the complexity behind fence tracking or resetting the query pool is completely hidden from the calling code. Each backend implementation can choose how to handle this efficiently.
  • #80: This comparison shows that our Vulkan implementation is nearing the performance of our DirectX 12 version, and is completely usable. There are a number of reasons for the delta, but none of them represent any amount of significant work to resolve.
  • #81: I will briefly mention some useful tools used for the Vulkan implementation
  • #82: For debugging and profiling, the usual suspects were quite helpful, and used extensively.
  • #83: For debugging and profiling, the usual suspects were quite helpful, and used extensively.
  • #84: The API statistics view in Nvidia Nsight is a great way to look at the count and cost of each API call.
  • #85: Another awesome feature with Nsight is the ability to export a capture as a standalone C++ application. Nsight will write out binary blobs of your resources in the capture, and write out source code that issues your API calls. You can build this app and debug problems in an isolated solution.
  • #86: Another awesome feature with Nsight is the ability to export a capture as a standalone C++ application. Nsight will write out binary blobs of your resources in the capture, and write out source code that issues your API calls. You can build this app and debug problems in an isolated solution.
  • #87: Here is our bug being reproduced in the standalone C++ export
  • #88: With our goal of having everything be live-reloadable and live-tweakable, we used DearImGui and ImGuizmo extensively for a number of useful overlays
  • #91: A special thanks to all these people, that were helpful in our Vulkan journey
  • #92: And before I finish, I would like to thank all the people who contributed to the PICA PICA project. It was a very awesome and dedicated effort by our team, and we could not have done it without our external partners either.
  • #93: On one last note, I would like to point out that we’re hiring for multiple positions at SEED. If you’re interested, please give me a shout!
  • #94: Thank you for listening! And if we still have some time, I would love to answer your questions