Halcyon Architecture
“Director’s Cut”
Graham Wihlidal
SEED – Electronic Arts
PICA PICA Trailer
https://www.youtube.com/watch?v=LXo0WdlELJk
“PICA PICA”
▪ Exploratory mini-game & world
▪ Goals
▪ Hybrid rendering with DXR [Andersson 2018]
▪ Clean and consistent visuals
▪ Self-learning AI agents [Harmer 2018]
▪ Procedural worlds [Opara 2018]
▪ No precomputation
▪ Uses SEED’s Halcyon R&D framework
S E E D // Halcyon Architecture “Director’s Cut”
HALCYON
Halcyon Goals
▪ Rapid prototyping framework
▪ Different purpose than Frostbite
▪ Fast experimentation vs. AAA games
▪ Windows, Linux, macOS
Halcyon Goals
▪ Minimize or eliminate busy-work
▪ Artist “meta-data” meshes
▪ Occlusion
▪ GI / Lighting
▪ Collision
▪ Level-of-detail
▪ Live reloading of all assets
▪ Insanely fast iteration times
Halcyon Goals
▪ Only target modern APIs
▪ Direct3D 12
▪ Vulkan 1.1
▪ Metal 2
▪ Multi-GPU
▪ Explicit heterogeneous mGPU
▪ No AFR nonsense
▪ No linked adapters
Halcyon Goals
▪ Local or remote streaming
▪ Minimal boilerplate code
▪ Variety of rendering techniques and approaches
▪ Rasterization
▪ Path and ray tracing
▪ Hybrid
Hybrid Rendering
Direct Shadows
(ray trace or raster)
Direct Lighting
(compute)
Reflections
(ray trace or compute)
Global Illumination
(ray trace)
Post Processing
(compute)
Transparency & Translucency
(ray trace)
Ambient Occlusion
(ray trace or compute)
Deferred Shading
(raster)
Hybrid Rendering
Rasterization Only
Halcyon Goals
▪ “PICA PICA” and Halcyon built from scratch
▪ Implemented lots of bespoke technology
▪ Minimal effort to add a new API or platform
▪ Efficient and flexible rendering was a major focus
Rendering Components
▪ Render Backend
▪ Render Device
▪ Render Handles
▪ Render Commands
▪ Render Graph
▪ Render Proxy
Halcyon Rendering
[Diagram: layering of Application, Render Graph, Render Proxy, Render Backend, Render Device, Render Handles, and Render Commands]
Render Backend
Render Backend
▪ Live-reloadable DLLs
▪ Enumerates adapters and capabilities
▪ Swap chain support
▪ Extensions (e.g. ray tracing, sub-groups, …)
▪ Determine adapter(s) to use
Render Backend
▪ Provides debugging and profiling
▪ RenderDoc integration, validation layers, …
▪ Create and destroy render devices
Render Backend
▪ Direct3D 12
▪ Vulkan 1.1
▪ Metal 2
▪ Proxy
▪ Mock
Render Backend
▪ Direct3D 12
▪ Shader Model 6.X
▪ DirectX Ray Tracing
▪ Bindless Resources
▪ Explicit Multi-GPU
▪ DirectML (soon..)
▪ …
Render Backend
▪ Vulkan 1.1
▪ Sub-groups
▪ Descriptor indexing
▪ External memory
▪ Multi-draw indirect
▪ Ray tracing (soon..)
▪ …
Render Backend
▪ Metal 2
▪ Early development
▪ Primarily desktop
▪ Argument buffers
▪ Machine learning
▪ …
Render Backend
▪ Proxy
▪ Discussed later in the presentation
Render Backend
▪ Mock
▪ Performs resource tracking and validation
▪ Command stream is parsed and evaluated
▪ No submission to an API
▪ Useful for unit tests and debugging
Render Device
Render Device
▪ Abstraction of a logical GPU adapter
▪ e.g. VkDevice, ID3D12Device, …
▪ Provides interface to GPU queues
▪ Command list submission
Render Device
▪ Ownership of GPU resources
▪ Create & Destroy
▪ Lifetime tracking of resources
▪ Mapping render handles → device resources
Render Handles
Render Handles
▪ Resources associated by handle
▪ Lightweight (64 bits)
▪ Constant-time lookup
▪ Type safety (e.g. buffer vs texture)
▪ Can be serialized or transmitted
▪ Generational for safety
▪ e.g. double-delete, usage after delete
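The handle properties listed above can be sketched as follows. This is a minimal illustration, not Halcyon's actual layout: the 32/24/8 bit split, the `HandleAllocator` class, and the `RenderResourceType` enum are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical resource categories; the slides only name buffer vs. texture.
enum class RenderResourceType : std::uint8_t { Buffer = 1, Texture = 2 };

// 64-bit handle: 32-bit slot index, 24-bit generation, 8-bit type tag.
// The exact bit split is an assumption made for this sketch.
struct RenderHandle {
    std::uint64_t bits;

    static RenderHandle make(RenderResourceType type, std::uint32_t index, std::uint32_t generation) {
        return RenderHandle{(std::uint64_t(type) << 56) |
                            (std::uint64_t(generation & 0xFFFFFFu) << 32) | index};
    }
    RenderResourceType type() const { return RenderResourceType(bits >> 56); }
    std::uint32_t generation() const { return std::uint32_t(bits >> 32) & 0xFFFFFFu; }
    std::uint32_t index() const { return std::uint32_t(bits & 0xFFFFFFFFu); }
};
static_assert(sizeof(RenderHandle) == 8, "handle must stay 64 bits");

class HandleAllocator {
public:
    RenderHandle allocate(RenderResourceType type) {
        std::uint32_t index;
        if (!free_.empty()) {            // reuse a released slot
            index = free_.back();
            free_.pop_back();
        } else {                         // grow the slot table
            index = std::uint32_t(generations_.size());
            generations_.push_back(0);
            live_.push_back(false);
        }
        live_[index] = true;
        return RenderHandle::make(type, index, generations_[index]);
    }

    // Constant-time check: the slot must be live and the generations must match.
    bool isValid(RenderHandle h) const {
        return h.index() < generations_.size() && live_[h.index()] &&
               generations_[h.index()] == h.generation();
    }

    // Returns false for a double-delete or a stale handle instead of corrupting state.
    bool release(RenderHandle h) {
        if (!isValid(h)) return false;
        live_[h.index()] = false;
        generations_[h.index()] = (generations_[h.index()] + 1) & 0xFFFFFFu;
        free_.push_back(h.index());
        return true;
    }

private:
    std::vector<std::uint32_t> generations_;
    std::vector<bool> live_;
    std::vector<std::uint32_t> free_;
};
```

Because the handle is a plain integer, it can be serialized or sent over the wire unchanged; the generation counter is what turns use-after-delete into a detectable error rather than a silent alias.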
Render Handles
[Diagram: one Render Handle mapped to an ID3D12Resource on each of DX12 adapters 0–3]
▪ Handles allow one-to-many cardinality [handle->devices]
▪ Each device can have a unique representation of the handle
Render Handles
▪ Can query if a device has a handle loaded
▪ Safely add and remove devices
▪ Handle owned by application, representation can reload on device
Render Handles
▪ Shared resources are supported
▪ Primary device owner, secondaries alias primary
Render Handles
▪ Can also mix and match backends in the same process!
▪ Made debugging VK implementation much easier
▪ DX12 on left half of screen, VK on right half of screen
[Diagram: one Render Handle backed by an ID3D12Resource on DX12 adapters 0 and 1, a VkImage on VK adapter 0, and a Render Handle on Proxy adapter 0]
Render Commands
Render Commands
▪ Draw
▪ DrawIndirect
▪ Dispatch
▪ DispatchIndirect
▪ UpdateBuffer
▪ UpdateTexture
▪ CopyBuffer
▪ CopyTexture
▪ Barriers
▪ Transitions
▪ BeginTiming
▪ EndTiming
▪ ResolveTimings
▪ BeginEvent
▪ EndEvent
▪ BeginRenderPass
▪ EndRenderPass
▪ RayTrace
▪ UpdateTopLevel
▪ UpdateBottomLevel
▪ UpdateShaderTable
Render Commands
▪ Queue type specified
▪ Spec validation
▪ Allowed to run?
▪ e.g. draws on compute
▪ Automatic scheduling
▪ Where can it run?
▪ Async compute
Render Command List
▪ Encodes high level commands
▪ Tracks queue types encountered
▪ Queue mask indicating scheduling rules
▪ Commands are stateless - parallel recording
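One way to picture the queue mask is as the intersection of the queues each recorded command is allowed to run on. The sketch below assumes three queue types and three command kinds; the names `QueueGraphics`, `allowedQueues`, and `canRunOn` are hypothetical and not taken from Halcyon.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Queue types as bit flags (names are illustrative).
enum QueueType : std::uint8_t {
    QueueGraphics = 1 << 0,
    QueueCompute  = 1 << 1,
    QueueTransfer = 1 << 2,
};

enum class CommandType { Draw, Dispatch, CopyBuffer };

// Which queues may execute each command: a draw needs graphics, a dispatch runs
// on graphics or compute, and a copy runs anywhere.
inline std::uint8_t allowedQueues(CommandType type) {
    switch (type) {
        case CommandType::Draw:       return QueueGraphics;
        case CommandType::Dispatch:   return QueueGraphics | QueueCompute;
        case CommandType::CopyBuffer: return QueueGraphics | QueueCompute | QueueTransfer;
    }
    return 0;
}

class RenderCommandList {
public:
    void record(CommandType type) {
        commands_.push_back(type);
        // Keep only the queues that are legal for every command seen so far.
        queueMask_ &= allowedQueues(type);
    }

    // True if the whole list can be submitted to the given queue.
    bool canRunOn(QueueType queue) const { return (queueMask_ & queue) != 0; }
    std::uint8_t queueMask() const { return queueMask_; }

private:
    std::vector<CommandType> commands_;
    std::uint8_t queueMask_ = QueueGraphics | QueueCompute | QueueTransfer;
};
```

A list holding only dispatches and copies keeps the compute bit set, so a scheduler is free to move it to async compute; recording a single draw clears that bit.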
Render Compilation
▪ Render command lists are “compiled”
▪ Translation to low level API
▪ Can compile once, submit multiple times
▪ Serial operation (memcpy speed)
▪ Perfect redundant state filtering
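Since commands are stateless, the compile step sees the full stream and can decide exactly which state changes are needed. A minimal sketch of that idea, assuming a single piece of state (the pipeline) and hypothetical `DrawCommand` and `ApiCall` types:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// High-level, stateless command: every draw names the pipeline it needs.
struct DrawCommand {
    std::uint32_t pipelineId;
    std::uint32_t vertexCount;
};

// Low-level output stream: explicit state changes followed by draws.
struct ApiCall {
    enum Kind { SetPipeline, Draw } kind;
    std::uint32_t value;
};

// Serial compile step: emit SetPipeline only when the pipeline actually changes.
inline std::vector<ApiCall> compile(const std::vector<DrawCommand>& commands) {
    std::vector<ApiCall> out;
    bool hasPipeline = false;
    std::uint32_t boundPipeline = 0;
    for (const DrawCommand& cmd : commands) {
        if (!hasPipeline || cmd.pipelineId != boundPipeline) {
            out.push_back({ApiCall::SetPipeline, cmd.pipelineId});
            boundPipeline = cmd.pipelineId;
            hasPipeline = true;
        }
        out.push_back({ApiCall::Draw, cmd.vertexCount});
    }
    return out;
}
```

The compiled `ApiCall` vector can be kept and replayed, which is the "compile once, submit multiple times" property in miniature.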
Render Graph
Render Graph
▪ Inspired by FrameGraph [O’Donnell 2017]
▪ Automatically handle transient resources
▪ Import explicitly managed resources
▪ Automatic resource transitions
▪ Render target batching
▪ DiscardResource
▪ Memory aliasing barriers
▪ …
Render Graph
▪ Basic memory management
▪ Not targeting current consoles
▪ Fine grained memory reuse sub-optimal with current PC drivers
▪ Lose ~5% on aliasing barriers and discards
▪ Automatic queue scheduling
▪ Ongoing research
▪ Need heuristics on task duration and bottlenecks
▪ e.g. Memory vs ALU
▪ Not enough to specify dependencies
Render Graph
▪ Frame Graph → Render Graph: No concept of a “frame”
▪ Fully automatic transitions and split barriers
▪ Single implementation, regardless of backend
▪ Translation from high level render command stream
▪ API differences hidden from render graph
▪ Support for mGPU
▪ Mostly implicit and automatic
▪ Can specify a scheduling policy
Render Graph
▪ Composition of multiple graphs at varying frequencies
▪ Same GPU: async compute
▪ mGPU: graphs per GPU
▪ Out-of-core: server cluster, remote streaming
Render Graph
▪ Composition of multiple graphs at varying frequencies
▪ e.g. translucency, refraction, global illumination
Render Graph
▪ Two phases
▪ Graph construction
▪ Specify inputs and outputs
▪ Serial operation (by design)
▪ Graph evaluation
▪ Highly parallelized
▪ Record high level render commands
▪ Automatic barriers and transitions
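The two phases can be sketched with a toy graph. Construction only declares what each pass reads and writes; evaluation walks the passes and derives a transition whenever a resource is needed in a different state. The types below (`PassDesc`, `Transition`, `ResourceState`) are hypothetical, and evaluation is shown serially for brevity even though the slides describe it as highly parallelized.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class ResourceState { Undefined, RenderTarget, ShaderRead };

struct PassDesc {
    std::string name;
    std::vector<std::string> reads;   // resources sampled by the pass
    std::vector<std::string> writes;  // resources rendered to by the pass
};

struct Transition {
    std::string pass, resource;
    ResourceState before, after;
};

class RenderGraph {
public:
    // Construction phase: serial, only declares inputs and outputs.
    void addPass(PassDesc pass) { passes_.push_back(std::move(pass)); }

    // Evaluation phase: emit a transition whenever a pass needs a resource
    // in a state other than the one it is currently in.
    std::vector<Transition> evaluate() const {
        std::map<std::string, ResourceState> state;
        std::vector<Transition> out;
        auto require = [&](const std::string& pass, const std::string& res, ResourceState wanted) {
            ResourceState& current = state[res];  // new entries start as Undefined
            if (current != wanted) {
                out.push_back({pass, res, current, wanted});
                current = wanted;
            }
        };
        for (const PassDesc& p : passes_) {
            for (const auto& r : p.reads)  require(p.name, r, ResourceState::ShaderRead);
            for (const auto& w : p.writes) require(p.name, w, ResourceState::RenderTarget);
        }
        return out;
    }

private:
    std::vector<PassDesc> passes_;
};
```

Pass code never mentions barriers; they fall out of the declared inputs and outputs.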
 Construction phase
 Evaluation phase
Render Graph
▪ Explicit heterogeneous mGPU
▪ Parallel fork-join approach
▪ Resources copied through system memory using copy queue
▪ ~1 ms for every 15 MB transferred
▪ Minimize PCI-E transfers
▪ Immutable data replicated
▪ Tightly pack data
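The rule of thumb above makes it easy to budget a transfer before scheduling it. A small sketch, assuming "MB" means 2^20 bytes and using hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Rule of thumb from the slide: roughly 1 ms per 15 MB moved between GPUs.
// Treating "MB" as 2^20 bytes is an assumption.
constexpr double kBytesPerMs = 15.0 * 1024.0 * 1024.0;

constexpr std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                                     std::uint32_t bytesPerPixel) {
    return std::uint64_t(width) * height * bytesPerPixel;
}

constexpr double estimateTransferMs(std::uint64_t bytes) {
    return double(bytes) / kBytesPerMs;
}
```

For example, a full 1920×1080 target at 8 bytes per pixel is 16,588,800 bytes, a little over 1 ms by this estimate, while a quarter-width slice of the same target costs about a quarter of that. This is why tight packing and partial copies matter.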
[Diagram: GPU1–GPU4 mapped to Partition 1 (primary) and Partitions 2–4 (secondary)]
Render Graph
▪ Workloads are divided into partitions
▪ Based on GPU device count
▪ Single primary device
▪ Other devices are secondaries
▪ Variety of scheduling and transfer patterns are necessary
▪ Simple rules engine
Render Graph
▪ Run ray generation on primary GPU
▪ Copy results in sub-regions to other GPUs
▪ Run tracing phases on separate GPUs
▪ Copy tracing results back to primary GPU
▪ Run filtering on primary GPU
Render Graph
▪ Only width is divided
▪ Simplifies textures vs. buffers
▪ Passes are unaware of GPU count
▪ Lots of fun coordinate snapping bugs
▪ e.g. 3 GPUs partitioned to 0.33333…
▪ 16 GPUs! (because, why not?)
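One common way to avoid the snapping problem is to derive every partition boundary from integer arithmetic, so neighbouring partitions share the exact same edge. The sketch below is an assumption about how this could be done, not a description of Halcyon's code; `Partition` and `partitionWidth` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Partition {
    std::uint32_t x;      // first column owned by this GPU
    std::uint32_t width;  // number of columns owned by this GPU
};

// Split [0, width) into gpuCount column ranges using integer math only, so the
// ranges tile the image exactly with no gaps or overlaps.
inline std::vector<Partition> partitionWidth(std::uint32_t width, std::uint32_t gpuCount) {
    assert(gpuCount > 0);
    std::vector<Partition> parts;
    for (std::uint32_t i = 0; i < gpuCount; ++i) {
        const std::uint32_t begin = std::uint32_t(std::uint64_t(width) * i / gpuCount);
        const std::uint32_t end   = std::uint32_t(std::uint64_t(width) * (i + 1) / gpuCount);
        parts.push_back({begin, end - begin});
    }
    return parts;
}
```

With a width of 1000 and 3 GPUs this yields columns 0–332, 333–665, and 666–999: the last partition absorbs the remainder instead of a fractional 0.33333… boundary being rounded differently on each device.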
Render Graph
▪ RenderGraphSchedule
▪ NoDevices → Pass is disabled
▪ AllDevices → Pass runs on all devices
▪ PrimaryDevice → Pass only runs on primary device
▪ SecondaryDevices → Pass runs on secondaries if count > 1, otherwise primary
▪ OnlySecondaryDevices → Pass only runs on secondary devices, disabled unless mGPU
Requested Per Pass →
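The schedule values above map directly onto a small resolver. In this sketch the enum values come from the slide, while `resolveDevices` and the convention that device 0 is the primary are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RenderGraphSchedule {
    NoDevices, AllDevices, PrimaryDevice, SecondaryDevices, OnlySecondaryDevices
};

// Returns the device indices a pass runs on. Device 0 is assumed to be the primary.
inline std::vector<std::uint32_t> resolveDevices(RenderGraphSchedule schedule,
                                                 std::uint32_t deviceCount) {
    assert(deviceCount > 0);
    std::vector<std::uint32_t> devices;
    auto addSecondaries = [&] {
        for (std::uint32_t i = 1; i < deviceCount; ++i) devices.push_back(i);
    };
    switch (schedule) {
        case RenderGraphSchedule::NoDevices:
            break;  // pass is disabled
        case RenderGraphSchedule::AllDevices:
            devices.push_back(0);
            addSecondaries();
            break;
        case RenderGraphSchedule::PrimaryDevice:
            devices.push_back(0);
            break;
        case RenderGraphSchedule::SecondaryDevices:
            if (deviceCount > 1) addSecondaries();
            else devices.push_back(0);  // single GPU: fall back to the primary
            break;
        case RenderGraphSchedule::OnlySecondaryDevices:
            addSecondaries();           // empty (disabled) unless mGPU
            break;
    }
    return devices;
}
```

Passes request a schedule and never look at the device count themselves, which is what keeps them unaware of how many GPUs are present.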
Render Graph
▪ RenderTransferPartition
▪ PartitionAll → Select all partitions from device
▪ PartitionIsolated → Select isolated region from device
▪ RenderTransferFilter
▪ AllDevices → Transfer completes on all devices
▪ PrimaryDevice → Transfer completes on the primary device
▪ SecondaryDevices → Transfer completes on all secondary devices
Requested Per Pass →
Render Graph
▪ PartitionAll → PartitionAll
▪ Copies full resource on one GPU to full resource on all specified GPUs
▪ PartitionAll → PartitionIsolated
▪ Copies full resource on one GPU to isolated regions on all specified GPUs (partial copies)
▪ PartitionIsolated → PartitionAll
▪ (Invalid configuration)
▪ PartitionIsolated → PartitionIsolated
▪ Copies isolated region on one GPU to isolated regions on all specified GPUs (partial copies)
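The four source and destination combinations above can be captured in one classification function. `RenderTransferPartition` is the enum named on the slide; `TransferKind` and `classifyTransfer` are hypothetical names introduced for this sketch.

```cpp
#include <cassert>

enum class RenderTransferPartition { PartitionAll, PartitionIsolated };

enum class TransferKind {
    Invalid,          // PartitionIsolated -> PartitionAll
    FullToFull,       // PartitionAll -> PartitionAll
    FullToRegions,    // PartitionAll -> PartitionIsolated (partial copies)
    RegionToRegions,  // PartitionIsolated -> PartitionIsolated (partial copies)
};

inline TransferKind classifyTransfer(RenderTransferPartition source,
                                     RenderTransferPartition dest) {
    const bool srcAll = source == RenderTransferPartition::PartitionAll;
    const bool dstAll = dest == RenderTransferPartition::PartitionAll;
    if (srcAll && dstAll) return TransferKind::FullToFull;
    if (srcAll) return TransferKind::FullToRegions;
    if (dstAll) return TransferKind::Invalid;  // an isolated region cannot fill a full resource
    return TransferKind::RegionToRegions;
}
```

Rejecting the invalid pairing up front, for instance in a mock backend or at graph construction, is cheaper than diagnosing a partially filled texture on screen.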
Devices this pass will run on
Schedule transfers in or out
Scaling work dimensions for each GPU
Some bugs were obvious
Render Graph
▪ Some bugs were subtle
▪ Weird cel shading? ☺
▪ Incorrect transfers?
▪ Transfers in (input data)
▪ Transfers out (result data)
▪ Incorrect scheduling?
▪ Pass not running
▪ Pass running when it shouldn’t
▪ Partition window
Render Graph
Some of our render graph passes:
▪ Bloom
▪ BottomLevelUpdate
▪ BrdfLut
▪ CocDerive
▪ DepthPyramid
▪ DiffuseSh
▪ Dof
▪ Final
▪ GBuffer
▪ Gtao
▪ IblReflection
▪ ImGui
▪ InstanceTransforms
▪ Lighting
▪ MotionBlur
▪ Present
▪ RayTracing
▪ RayTracingAccum
▪ ReflectionFilter
▪ ReflectionSample
▪ ReflectionTrace
▪ Rtao
▪ Screenshot
▪ Segmentation
▪ ShaderTableUpdate
▪ ShadowFilter
▪ ShadowMask
▪ ShadowCascades
▪ ShadowTrace
▪ Skinning
▪ Ssr
▪ SurfelGapFill
▪ SurfelLighting
▪ SurfelPositions
▪ SurfelSpawn
▪ Svgf
▪ TemporalAa
▪ TemporalReproject
▪ TopLevelUpdate
▪ TranslucencyTrace
▪ Velocity
▪ Visibility
Render Graph
▪ Implicit data flow via explicit scopes
▪ “Long-distance” extensible parameter passing
▪ Scope given to each render pass
▪ Can create nested scope for sub-graph
▪ Results stored into scope
▪ Hygiene via nesting and shadowing
{
    gbuffer <- render_opaque()
    gbuffer <- render_decals(gbuffer)
    {
        gbuffer <- render_opaque()
        render_lighting(gbuffer)
    } -> envmap
    apply_envmap(gbuffer, envmap)
}
struct RenderGraphAreaLight
{
    RenderGraphResource triangleLightList;
    uint32 triangleCount;
};
Render Graph
▪ Lookup by type
▪ scope.get<T>() -> &T
▪ Parameters in “plain old data” structs
▪ RenderGraphResource, RenderHandle
▪ float, int, mat4, etc.
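Type-keyed lookup with nesting and shadowing can be sketched in a few lines. This is an illustration under assumptions: the `Scope` class, its `set` method, and the `GBuffer` and `EnvMap` structs are hypothetical, and `get<T>()` returns a pointer here (rather than the reference shown on the slide) so that a missing entry can be reported.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

class Scope {
public:
    explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

    // Store a result in this scope, shadowing any value of the same type
    // held by an enclosing scope.
    template <typename T>
    void set(T value) {
        values_[std::type_index(typeid(T))] = std::make_shared<T>(std::move(value));
    }

    // Look up by type, walking outwards through the enclosing scopes.
    template <typename T>
    const T* get() const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->values_.find(std::type_index(typeid(T)));
            if (it != s->values_.end()) return static_cast<const T*>(it->second.get());
        }
        return nullptr;
    }

private:
    const Scope* parent_;
    std::unordered_map<std::type_index, std::shared_ptr<void>> values_;
};

// Plain-old-data parameter structs (hypothetical examples).
struct GBuffer { std::uint32_t resourceId; };
struct EnvMap  { std::uint32_t resourceId; };
```

A nested scope that writes its own `GBuffer` does not disturb the outer one, which is the hygiene property: a sub-graph such as an environment-map render can reuse the same passes without leaking its intermediate results.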
Render Graph DSL
▪ Experimental
▪ Macro Magic
Render Graph
▪ Automatic profiling data
▪ GPU and CPU counters per-pass
▪ Works with mGPU
▪ Each GPU is profiled
Render Graph
▪ Live debugging overlay
▪ Evaluated passes in-order of execution
▪ Input and output dependencies
▪ Resource version information
Virtual Multi-GPU
Virtual Multi-GPU
▪ Most developers have single GPU
▪ 2-GPU machines are uncommon
▪ Machines with 3+ GPUs are rare
▪ Practical for show floor and cranking up to 11
▪ Impractical for regular development ☺
Virtual Multi-GPU
▪ Build device indirection table
▪ Virtual device index → adapter index
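The indirection table is just an array indexed by virtual device. The sketch below assigns adapters round-robin, which is an assumption; the slides do not say how virtual devices are distributed across adapters, and `buildDeviceTable` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Maps each virtual device index to a physical adapter index.
// Round-robin assignment is an assumption of this sketch.
inline std::vector<std::uint32_t> buildDeviceTable(std::uint32_t virtualDeviceCount,
                                                   std::uint32_t adapterCount) {
    assert(adapterCount > 0);
    std::vector<std::uint32_t> table(virtualDeviceCount);
    for (std::uint32_t i = 0; i < virtualDeviceCount; ++i) {
        table[i] = i % adapterCount;
    }
    return table;
}
```

With one adapter, every virtual device resolves to adapter 0, so a four-way mGPU configuration can be exercised on a single-GPU workstation.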
[Diagram: virtual devices 0–5 mapped onto DX12 adapters 0–2]
Virtual Multi-GPU
▪ Create multiple instances of a device
▪ Virtual GPUs execute sequentially (WDDM)
Virtual Multi-GPU
▪ Increases overall wall time (don’t use for profiling)
▪ Amazing for development and testing!
Virtual Multi-GPU
▪ PICA PICA developers all had 1 GPU
▪ Limited testing with 2 GPUs
▪ Show floor at GDC 2018 was 4 GPUs
▪ Virtual-only testing…
▪ Crossed fingers
▪ Worked flawlessly!
Virtual Multi-GPU
▪ Develop and debug multi-GPU with only a single GPU
▪ Virtual mGPU reliably reproduces most bugs!
▪ Entire features developed without physical mGPU
▪ e.g. Surfel GI (the night before GDC.. ☺)
Render Proxy
▪ Remote render backend
▪ Any API / Any OS
Render Proxy
▪ Render API calls are routed remotely
▪ Uses gRPC (high performance RPC framework)
▪ Use an API on an incompatible OS
▪ e.g. Direct3D 12 on macOS or Linux
Render Proxy
▪ Scale large workloads with a GPU cluster
▪ Same API as render graph mGPU
▪ Only rendering is routed, scene state is local
▪ Work from the couch!
▪ e.g. DirectX ray tracing on a MacBook ☺
Render Proxy
Render device → gRPC
Protobuf 3
Schema
▪ The possibilities are endless!
Render Proxy
Machine Learning
Machine Learning
▪ Deep reinforcement learning
▪ Rendering 36 semantic views
▪ Training with TensorFlow
▪ On-premise GPU cluster
▪ Inference with TensorFlow
▪ CPU AVX2
▪ In-process
Machine Learning
▪ Adding inferencing with DirectML
▪ Hardware accelerated inferencing operators
▪ Resource management
▪ Schedule ML work explicitly
▪ Interleave ML work with other GPU workloads
▪ Fall back for other APIs
Machine Learning
▪ Treat trained ML models like any other 3D asset
▪ Render Graph abstractions
▪ Reference the same render resources
▪ Similar to chaining compute passes
▪ Record “meta” render commands
▪ Backends can fuse or transform, if desired
Machine Learning
▪ Provide operators as render commands
▪ Activation
▪ Convolution
▪ Elementwise
▪ FC
▪ GRU
▪ LSTM
▪ MatMul
▪ Normalization
▪ Pooling
▪ Random
▪ RNN
▪ etc.
Asset Pipelines
Asset Pipelines
▪ Geometry
▪ Animations
▪ Shaders
▪ Sounds
▪ Music
▪ Textures
▪ Scenes
▪ etc.
Asset Pipelines
▪ Everything is content addressable
▪ Hash of data is the identity
▪ Sha256
▪ Merkle trees!
▪ Dependency evaluation
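Content addressing with a Merkle-style tree means an asset's identity covers its own bytes plus the identities of everything it depends on. The sketch below shows only the structure: it uses 64-bit FNV-1a as a stand-in hash to stay self-contained, whereas the pipeline described above uses SHA-256, and `AssetNode` and `identity` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Digest = std::uint64_t;

// Stand-in hash (64-bit FNV-1a). A real pipeline would use SHA-256 here.
inline Digest hashBytes(const std::string& data) {
    Digest h = 0xcbf29ce484222325ull;
    for (unsigned char c : data) {
        h ^= c;
        h *= 0x100000001b3ull;
    }
    return h;
}

struct AssetNode {
    std::string payload;                  // e.g. shader source or mesh bytes
    std::vector<AssetNode> dependencies;  // e.g. textures referenced by a material
};

// Identity of a node = hash of its payload folded together with the identities
// of its dependencies, so a change anywhere below changes every ancestor.
inline Digest identity(const AssetNode& node) {
    Digest h = hashBytes(node.payload);
    for (const AssetNode& dep : node.dependencies) {
        const Digest child = identity(dep);
        for (int byte = 0; byte < 8; ++byte) {
            h ^= (child >> (8 * byte)) & 0xffu;
            h *= 0x100000001b3ull;
        }
    }
    return h;
}
```

Dependency evaluation then reduces to comparing identities: if a material's identity is unchanged, nothing beneath it changed and the cached build result can be reused.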
https://en.wikipedia.org/wiki/Merkle_tree
Asset Pipelines
▪ Containerized, running on Kubernetes
▪ Google Cloud Platform
▪ On-Premise Cluster
▪ AMD 1950X TR
▪ NV Titan V
▪ Communication using gRPC and Protobuf
▪ Google Cloud Storage
Asset Pipelines
▪ Analytics with Prometheus and Grafana
▪ Publish custom metrics
▪ Scraped into rich UI
▪ Collecting data is important!
Shaders
Shaders
▪ Complex materials
▪ Multiple microfacet layers
▪ [Stachowiak 2018]
▪ Energy conserving
▪ Automatic Fresnel between layers
▪ All lighting & rendering modes
▪ Raster, path-traced reference, hybrid
▪ Iterate with different looks
▪ Bake down permutations for production
Objects with Multi-Layered Materials
Shaders
▪ Exclusively HLSL
▪ Shader Model 6.X
▪ Majority are compute shaders
▪ Performance is critical
▪ Group shared memory
▪ Wave-ops / Sub-groups
Shaders
▪ No reflection
▪ Avoid costly lookups
▪ Only explicit bindings
▪ … except for validation
▪ Extensive use of HLSL spaces
▪ Updates at varying frequency
▪ Bindless
Shaders
[Diagram: HLSL is compiled by DXC to DXIL (Direct3D 12) and SPIR-V (Vulkan 1.1); SPIRV-Cross translates SPIR-V to MSL (Metal 2) and ISPC (AVX2, …)]
Shader Arguments
▪ Commands refer to resources with “Shader Arguments”
▪ Each argument represents an HLSL space
▪ MaxShaderParameters → 4 [Configurable]
▪ # of spaces, not # of resources
Shader Arguments
▪ Each argument contains:
▪ “ShaderViews” handle
▪ Constant buffer handle and offset
▪ “ShaderViews”
▪ Collection of SRV and UAV handles
Shader Arguments
▪ Constant buffers are all dynamic
▪ Avoid temporary descriptors
▪ Just a few large buffers, offsets change frequently
▪ VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC
▪ DX12 Root Descriptors (pass in GPU VA)
▪ All descriptor sets are only written once
▪ Persisted / cached
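With a few large constant buffers, each draw or dispatch only needs a fresh offset. A minimal sketch of such an allocator follows; `ConstantRing`, the field names in `ShaderArgument`, and the fixed 256-byte alignment are assumptions (Direct3D 12 requires 256-byte alignment for constant buffer data, while Vulkan reports its own `minUniformBufferOffsetAlignment` limit).

```cpp
#include <cassert>
#include <cstdint>

// Alignment used for every allocation in this sketch.
constexpr std::uint32_t kConstantAlignment = 256;

// One argument per HLSL space (field names are illustrative).
struct ShaderArgument {
    std::uint64_t shaderViews;           // handle to a persistent SRV/UAV collection
    std::uint64_t constantBuffer;        // handle to one of a few large buffers
    std::uint32_t constantBufferOffset;  // changes per draw or dispatch
};

class ConstantRing {
public:
    explicit ConstantRing(std::uint32_t capacity) : capacity_(capacity) {}

    // Reserves `size` bytes and returns the aligned offset, wrapping to 0 when
    // the end is reached. Synchronisation with in-flight GPU work is omitted.
    std::uint32_t allocate(std::uint32_t size) {
        const std::uint32_t aligned =
            (size + kConstantAlignment - 1) & ~(kConstantAlignment - 1);
        assert(aligned <= capacity_);
        if (head_ + aligned > capacity_) head_ = 0;
        const std::uint32_t offset = head_;
        head_ += aligned;
        return offset;
    }

private:
    std::uint32_t capacity_;
    std::uint32_t head_ = 0;
};
```

Only `constantBufferOffset` varies between commands, so the descriptor that points at the buffer never has to be rewritten.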
Vulkan Implementation
▪ Architecture simplified development effort
▪ Vulkan specific:
▪ Backend and device implementation
▪ Memory allocators (e.g. AMD VMA)
▪ Barrier and transition logic
▪ Resource binding model
Vulkan! ☺
Vulkan Implementation
▪ Shader compilation (HLSL → SPIR-V)
▪ Patch SPIR-V to match DX12
▪ Using spirv-reflect from Hai and Cort
▪ spvReflectCreateShaderModule
▪ spvReflectEnumerateDescriptorSets
▪ spvReflectChangeDescriptorBindingNumbers
▪ spvReflectGetCodeSize / spvReflectGetCode
▪ spvReflectDestroyShaderModule
SPIR-V Patching
▪ SPV_REFLECT_RESOURCE_FLAG_SRV
▪ Offset += 1000
▪ SPV_REFLECT_RESOURCE_FLAG_SAMPLER
▪ Offset += 2000
▪ SPV_REFLECT_RESOURCE_FLAG_UAV
▪ Offset += 3000
SPIR-V Patching
▪ SPV_REFLECT_RESOURCE_FLAG_CBV
▪ Offset Unchanged: 0
▪ Descriptor Set += MAX_SHADER_ARGUMENTS
▪ CBVs move to their own descriptor sets
▪ ShaderViews become persistent and immutable
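The remapping rules on the last two slides amount to a pure function from an HLSL (space, register) pair to a Vulkan (set, binding) pair. The sketch below models only that arithmetic with its own hypothetical types (`ResourceKind`, `Binding`, `patchBinding`); it does not call spirv-reflect, and the value 4 for the maximum argument count follows the configurable default mentioned earlier.

```cpp
#include <cassert>
#include <cstdint>

enum class ResourceKind { CBV, SRV, Sampler, UAV };

struct Binding {
    std::uint32_t set;
    std::uint32_t binding;
};

// Number of HLSL spaces available to a shader (configurable; 4 assumed here).
constexpr std::uint32_t kMaxShaderArguments = 4;

// Maps an HLSL (space, register) pair to a Vulkan (set, binding) pair.
inline Binding patchBinding(ResourceKind kind, std::uint32_t hlslSpace,
                            std::uint32_t hlslRegister) {
    switch (kind) {
        case ResourceKind::SRV:     return {hlslSpace, hlslRegister + 1000};
        case ResourceKind::Sampler: return {hlslSpace, hlslRegister + 2000};
        case ResourceKind::UAV:     return {hlslSpace, hlslRegister + 3000};
        case ResourceKind::CBV:
            // Binding offset unchanged; the CBV moves to its own descriptor set.
            return {hlslSpace + kMaxShaderArguments, hlslRegister};
    }
    return {hlslSpace, hlslRegister};  // not reached for valid enum values
}
```

Keeping the view bindings and the constant buffer in different sets is what lets the view sets be written once and cached while the dynamic buffer offset changes per command.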
SPIR-V Patching
▪ If 2 of 4 HLSL spaces in use:
[Diagram: descriptor sets 0–5; two sets hold SRVs (>=1000), samplers (>=2000), and UAVs (>=3000), two hold a dynamic constant buffer (offset: 0), and two are unbound]
Vulkan Implementation
▪ Translate commands
▪ Read command list
▪ Write Vulkan API
Ongoing Work!
VK nearing DX12
Tools
Tools
▪ RenderDoc
▪ NV Nsight
▪ AMD RGP
Tools
▪ C++ Export!
▪ Standalone
Dear ImGui + ImGuizmo
▪ Live tweaking
▪ Very useful!
References
▪ [Wihlidal 2018] Graham Wihlidal, Colin Barré-Brisebois. “Modern Graphics Abstractions & Real-Time Ray Tracing”.
▪ [Wihlidal 2018] Graham Wihlidal. “Halcyon + Vulkan”.
▪ [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”.
▪ [Andersson 2018] Johan Andersson, Colin Barré-Brisebois. “DirectX: Evolving Microsoft's Graphics Platform”.
▪ [Harmer 2018] Jack Harmer, Linus Gisslén, Henrik Holst, Joakim Bergdahl, Tom Olsson, Kristoffer Sjöö and Magnus Nordin. “Imitation Learning with Concurrent Actions in 3D Games”.
▪ [Opara 2018] Anastasia Opara. “Creativity of Rules and Patterns”.
▪ [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”.
Thanks
▪ Matthäus Chajdas
▪ Rys Sommefeldt
▪ Timothy Lottes
▪ Tobias Hector
▪ Neil Henning
▪ John Kessenich
▪ Hai Nguyen
▪ Nuno Subtil
▪ Adam Sawicki
▪ Alon Or-bach
▪ Baldur Karlsson
▪ Cort Stratton
▪ Mathias Schott
▪ Rolando Caloca
▪ Sebastian Aaltonen
▪ Hans-Kristian Arntzen
▪ Yuriy O’Donnell
▪ Arseny Kapoulkine
▪ Tex Riddell
▪ Marcelo Lopez Ruiz
▪ Lei Zhang
▪ Greg Roth
▪ Noah Fredriks
▪ Qun Lin
▪ Ehsan Nasiri
▪ Steven Perron
▪ Alan Baker
▪ Diego Novillo
▪ Tomasz Stachowiak
Thanks
▪ SEED
▪ Johan Andersson
▪ Colin Barré-Brisebois
▪ Jasper Bekkers
▪ Joakim Bergdahl
▪ Ken Brown
▪ Dean Calver
▪ Dirk de la Hunt
▪ Jenna Frisk
▪ Paul Greveson
▪ Henrik Halen
▪ Effeli Holst
▪ Andrew Lauritzen
▪ Magnus Nordin
▪ Niklas Nummelin
▪ Anastasia Opara
▪ Kristoffer Sjöö
▪ Ida Winterhaven
▪ Tomasz Stachowiak
▪ Microsoft
▪ Chas Boyd
▪ Ivan Nevraev
▪ Amar Patel
▪ Matt Sandy
▪ NVIDIA
▪ Tomas Akenine-Möller
▪ Nir Benty
▪ Jiho Choi
▪ Peter Harrison
▪ Alex Hyder
▪ Jon Jansen
▪ Aaron Lefohn
▪ Ignacio Llamas
▪ Henry Moreton
▪ Martin Stich
SEED // Search for Extraordinary Experiences Division
Stockholm – Los Angeles – Montréal – Remote
SEED.EA.COM
We’re hiring!
Questions?
Graham Wihlidal
graham@ea.com
@gwihlidal

  • 4. “PICA PICA” ▪ Exploratory mini-game & world ▪ Goals ▪ Hybrid rendering with DXR [Andersson 2018] ▪ Clean and consistent visuals ▪ Self-learning AI agents [Harmer 2018] ▪ Procedural worlds [Opara 2018] ▪ No precomputation ▪ Uses SEED’s Halcyon R&D framework S E E D // Halcyon Architecture “Director’s Cut”
  • 6. Halcyon Goals ▪ Rapid prototyping framework ▪ Different purpose than Frostbite ▪ Fast experimentation vs. AAA games ▪ Windows, Linux, macOS
  • 7. Halcyon Goals ▪ Minimize or eliminate busy-work ▪ Artist “meta-data” meshes ▪ Occlusion ▪ GI / Lighting ▪ Collision ▪ Level-of-detail ▪ Live reloading of all assets ▪ Insanely fast iteration times
  • 8. Halcyon Goals ▪ Only target modern APIs ▪ Direct3D 12 ▪ Vulkan 1.1 ▪ Metal 2 ▪ Multi-GPU ▪ Explicit heterogeneous mGPU ▪ No AFR nonsense ▪ No linked adapters
  • 9. Halcyon Goals ▪ Local or remote streaming ▪ Minimal boilerplate code ▪ Variety of rendering techniques and approaches ▪ Rasterization ▪ Path and ray tracing ▪ Hybrid
  • 10. Hybrid Rendering ▪ Direct Shadows (ray trace or raster) ▪ Direct Lighting (compute) ▪ Reflections (ray trace or compute) ▪ Global Illumination (ray trace) ▪ Post Processing (compute) ▪ Transparency & Translucency (ray trace) ▪ Ambient Occlusion (ray trace or compute) ▪ Deferred Shading (raster)
  • 13. Halcyon Goals ▪ “PICA PICA” and Halcyon built from scratch ▪ Implemented lots of bespoke technology ▪ Minimal effort to add a new API or platform ▪ Efficient and flexible rendering was a major focus
  • 14. Rendering Components ▪ Render Backend ▪ Render Device ▪ Render Handles ▪ Render Commands ▪ Render Graph ▪ Render Proxy
  • 15. Halcyon Rendering [Diagram: Application, Render Graph, Render Proxy, Render Commands, Render Handles, Render Backend, Render Device]
  • 17. Render Backend ▪ Live-reloadable DLLs ▪ Enumerates adapters and capabilities ▪ Swap chain support ▪ Extensions (e.g. ray tracing, sub-groups, …) ▪ Determine adapter(s) to use
  • 18. Render Backend ▪ Provides debugging and profiling ▪ RenderDoc integration, validation layers, … ▪ Create and destroy render devices
  • 19. Render Backend ▪ Direct3D 12 ▪ Vulkan 1.1 ▪ Metal 2 ▪ Proxy ▪ Mock
  • 20. Render Backend ▪ Direct3D 12 ▪ Shader Model 6.X ▪ DirectX Ray Tracing ▪ Bindless Resources ▪ Explicit Multi-GPU ▪ DirectML (soon..) ▪ …
  • 21. Render Backend ▪ Vulkan 1.1 ▪ Sub-groups ▪ Descriptor indexing ▪ External memory ▪ Multi-draw indirect ▪ Ray tracing (soon..) ▪ …
  • 22. Render Backend ▪ Metal 2 ▪ Early development ▪ Primarily desktop ▪ Argument buffers ▪ Machine learning ▪ …
  • 23. Render Backend ▪ Proxy ▪ Discussed later in the presentation
  • 24. Render Backend ▪ Mock ▪ Performs resource tracking and validation ▪ Command stream is parsed and evaluated ▪ No submission to an API ▪ Useful for unit tests and debugging
  • 26. Render Device ▪ Abstraction of a logical GPU adapter ▪ e.g. VkDevice, ID3D12Device, … ▪ Provides interface to GPU queues ▪ Command list submission
  • 27. Render Device ▪ Ownership of GPU resources ▪ Create & Destroy ▪ Lifetime tracking of resources ▪ Mapping render handles → device resources
  • 29. Render Handles ▪ Resources associated by handle ▪ Lightweight (64 bits) ▪ Constant-time lookup ▪ Type safety (e.g. buffer vs texture) ▪ Can be serialized or transmitted ▪ Generational for safety ▪ e.g. double-delete, usage after delete
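A minimal C++ sketch of a generational 64-bit handle, to make the properties on this slide concrete. The bit layout (8 bits of type, 24 bits of generation, 32 bits of slot index) and the names `HandleAllocator`, `allocate`, `release` and `is_valid` are assumptions for illustration; the slides do not describe Halcyon's actual layout or API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical resource kinds; the slide only mentions buffer vs. texture.
enum class ResourceType : uint8_t { Buffer = 1, Texture = 2 };

// Assumed layout: [ type:8 | generation:24 | slot index:32 ].
struct RenderHandle {
    uint64_t bits = 0;
    uint32_t index() const { return static_cast<uint32_t>(bits & 0xFFFFFFFFu); }
    uint32_t generation() const { return static_cast<uint32_t>((bits >> 32) & 0xFFFFFFu); }
    ResourceType type() const { return static_cast<ResourceType>(bits >> 56); }
};

class HandleAllocator {
public:
    RenderHandle allocate(ResourceType type) {
        uint32_t index;
        if (!free_.empty()) {
            index = free_.back();  // reuse a released slot
            free_.pop_back();
        } else {
            index = static_cast<uint32_t>(slots_.size());
            slots_.push_back({0, false});
        }
        slots_[index].live = true;
        return RenderHandle{(uint64_t(type) << 56) |
                            (uint64_t(slots_[index].generation) << 32) | index};
    }

    // Constant-time check: one array lookup plus a generation compare.
    bool is_valid(RenderHandle h) const {
        return h.index() < slots_.size() && slots_[h.index()].live &&
               slots_[h.index()].generation == h.generation();
    }

    // Returns false on double-delete or use of a stale handle.
    bool release(RenderHandle h) {
        if (!is_valid(h)) return false;
        Slot& slot = slots_[h.index()];
        slot.live = false;
        slot.generation = (slot.generation + 1) & 0xFFFFFFu;  // invalidates old handles
        free_.push_back(h.index());
        return true;
    }

private:
    struct Slot { uint32_t generation; bool live; };
    std::vector<Slot> slots_;
    std::vector<uint32_t> free_;
};
```

Because the handle is a plain integer, it can be serialized or sent over a network unchanged, and a stale copy is rejected once its slot's generation has moved on.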
  • 30. Render Handles ▪ Handles allow one-to-many cardinality [handle->devices] ▪ Each device can have a unique representation of the handle [Diagram: one Render Handle backed by an ID3D12Resource on each of DX12: Adapter 0 to DX12: Adapter 3]
  • 31. Render Handles ▪ Can query if a device has a handle loaded ▪ Safely add and remove devices ▪ Handle owned by application, representation can reload on device [Diagram: one Render Handle backed by an ID3D12Resource on each of DX12: Adapter 0 to DX12: Adapter 3]
  • 32. Render Handles ▪ Shared resources are supported ▪ Primary device owner, secondaries alias primary [Diagram: one Render Handle backed by an ID3D12Resource on each of DX12: Adapter 0 to DX12: Adapter 3]
  • 33. Render Handles ▪ Can also mix and match backends in the same process! ▪ Made debugging VK implementation much easier ▪ DX12 on left half of screen, VK on right half of screen [Diagram: one Render Handle backed by an ID3D12Resource on DX12: Adapter 0 and DX12: Adapter 1, a VkImage on VK: Adapter 0, and a Render Handle on Proxy: Adapter 0]
  • 35. Render Commands ▪ Draw ▪ DrawIndirect ▪ Dispatch ▪ DispatchIndirect ▪ UpdateBuffer ▪ UpdateTexture ▪ CopyBuffer ▪ CopyTexture ▪ Barriers ▪ Transitions ▪ BeginTiming ▪ EndTiming ▪ ResolveTimings ▪ BeginEvent ▪ EndEvent ▪ BeginRenderPass ▪ EndRenderPass ▪ RayTrace ▪ UpdateTopLevel ▪ UpdateBottomLevel ▪ UpdateShaderTable
  • 36. Render Commands ▪ Queue type specified ▪ Spec validation ▪ Allowed to run? ▪ e.g. draws on compute ▪ Automatic scheduling ▪ Where can it run? ▪ Async compute
  • 37. Render Commands
  • 38. Render Command List ▪ Encodes high level commands ▪ Tracks queue types encountered ▪ Queue mask indicating scheduling rules ▪ Commands are stateless - parallel recording
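One way to read "queue mask indicating scheduling rules" is as a bit mask of the queues that can legally execute every command recorded so far. The C++ sketch below works under that assumption; the enum values, the `allowed_queues` table and the intersection rule are illustrative, not taken from Halcyon.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Queue capabilities as bit flags (names are illustrative).
enum QueueType : uint8_t { Graphics = 1, Compute = 2, Transfer = 4 };

enum class CommandType { Draw, Dispatch, CopyBuffer };

// Queues that may execute each command type: draws need a graphics queue,
// dispatches also run on compute queues, copies run anywhere.
constexpr uint8_t allowed_queues(CommandType c) {
    return c == CommandType::Draw     ? Graphics
         : c == CommandType::Dispatch ? (Graphics | Compute)
                                      : (Graphics | Compute | Transfer);
}

class RenderCommandList {
public:
    void record(CommandType c) {
        commands_.push_back(c);
        queue_mask_ &= allowed_queues(c);  // keep only queues valid for all commands
    }

    // A list may be scheduled on a queue only if every command is legal there.
    bool can_run_on(QueueType q) const { return (queue_mask_ & q) != 0; }
    uint8_t queue_mask() const { return queue_mask_; }

private:
    std::vector<CommandType> commands_;
    uint8_t queue_mask_ = Graphics | Compute | Transfer;
};
```

A list containing only dispatches and copies keeps the compute bit set, so a scheduler could move it to an async compute queue. A single draw narrows the mask to graphics only.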
  • 39. Render Compilation ▪ Render command lists are “compiled” ▪ Translation to low level API ▪ Can compile once, submit multiple times ▪ Serial operation (memcpy speed) ▪ Perfect redundant state filtering
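Redundant state filtering becomes simple when commands are stateless and compilation is serial: the compiler sees the whole stream in order and can skip any state change that repeats the current state. The sketch below illustrates the idea in C++. `DrawCommand`, `compile` and the string "API calls" are hypothetical stand-ins for a real backend.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stateless high-level command: each draw names everything it needs.
struct DrawCommand {
    uint32_t pipeline;      // hypothetical pipeline-state id
    uint32_t vertex_count;
};

// Translates commands into a low-level call stream (strings stand in for API
// calls), emitting a pipeline bind only when the pipeline actually changes.
inline std::vector<std::string> compile(const std::vector<DrawCommand>& commands) {
    std::vector<std::string> api_calls;
    bool has_bound = false;
    uint32_t bound_pipeline = 0;
    for (const DrawCommand& cmd : commands) {
        if (!has_bound || cmd.pipeline != bound_pipeline) {
            api_calls.push_back("BindPipeline " + std::to_string(cmd.pipeline));
            bound_pipeline = cmd.pipeline;
            has_bound = true;
        }
        api_calls.push_back("Draw " + std::to_string(cmd.vertex_count));
    }
    return api_calls;
}
```

The returned stream can be kept and submitted repeatedly, which mirrors the "compile once, submit multiple times" point on the slide.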
  • 41. Render Graph ▪ Inspired by FrameGraph [O’Donnell 2017] ▪ Automatically handle transient resources ▪ Import explicitly managed resources ▪ Automatic resource transitions ▪ Render target batching ▪ DiscardResource ▪ Memory aliasing barriers ▪ …
  • 42. Render Graph ▪ Basic memory management ▪ Not targeting current consoles ▪ Fine grained memory reuse sub-optimal with current PC drivers ▪ Lose ~5% on aliasing barriers and discards ▪ Automatic queue scheduling ▪ Ongoing research ▪ Need heuristics on task duration and bottlenecks ▪ e.g. Memory vs ALU ▪ Not enough to specify dependencies
  • 43. Render Graph ▪ Frame Graph → Render Graph: No concept of a “frame” ▪ Fully automatic transitions and split barriers ▪ Single implementation, regardless of backend ▪ Translation from high level render command stream ▪ API differences hidden from render graph ▪ Support for mGPU ▪ Mostly implicit and automatic ▪ Can specify a scheduling policy
  • 44. Render Graph ▪ Composition of multiple graphs at varying frequencies ▪ Same GPU: async compute ▪ mGPU: graphs per GPU ▪ Out-of-core: server cluster, remote streaming
  • 45. Render Graph ▪ Composition of multiple graphs at varying frequencies ▪ e.g. translucency, refraction, global illumination
  • 46. Render Graph ▪ Two phases ▪ Graph construction ▪ Specify inputs and outputs ▪ Serial operation (by design) ▪ Graph evaluation ▪ Highly parallelized ▪ Record high level render commands ▪ Automatic barriers and transitions
  • 47. ▪ Construction phase ▪ Evaluation phase
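The two phases can be sketched in C++ as follows: construction only declares what each pass reads and writes, and evaluation derives the transitions from those declarations. This is a simplified, single-threaded illustration with hypothetical names (`Pass`, `RenderGraph`, `ResourceState`). The slides describe evaluation as highly parallelized, which this sketch does not attempt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum class ResourceState { Undefined, RenderTarget, ShaderRead };

struct Pass {
    std::string name;
    std::vector<std::string> reads;   // resources sampled by the pass
    std::vector<std::string> writes;  // resources rendered to by the pass
};

class RenderGraph {
public:
    // Construction phase: serial, only declares inputs and outputs.
    void add_pass(Pass pass) { passes_.push_back(std::move(pass)); }

    // Evaluation phase: walks passes in order and emits a transition
    // whenever a resource is needed in a state it is not currently in.
    std::vector<std::string> evaluate() const {
        std::vector<std::string> log;
        std::map<std::string, ResourceState> state;
        auto require = [&](const std::string& res, ResourceState wanted, const char* label) {
            ResourceState& current = state[res];  // new entries start as Undefined
            if (current != wanted) {
                log.push_back("transition " + res + " -> " + label);
                current = wanted;
            }
        };
        for (const Pass& p : passes_) {
            for (const auto& r : p.reads)  require(r, ResourceState::ShaderRead, "ShaderRead");
            for (const auto& w : p.writes) require(w, ResourceState::RenderTarget, "RenderTarget");
            log.push_back("run " + p.name);
        }
        return log;
    }

private:
    std::vector<Pass> passes_;
};
```

Pass authors never write a barrier here. Declaring that a lighting pass reads `gbuffer` is enough for the graph to insert the transition before that pass runs.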
  • 48. Render Graph ▪ Explicit heterogeneous mGPU ▪ Parallel fork-join approach ▪ Resources copied through system memory using copy queue ▪ ~1 ms for every 15 MB transferred ▪ Minimize PCI-E transfers ▪ Immutable data replicated ▪ Tightly pack data [Diagram: GPU1 runs Partition 1 (Primary); GPU2 to GPU4 run Partitions 2 to 4 (Secondary)]
  • 49. Render Graph ▪ Workloads are divided into partitions ▪ Based on GPU device count ▪ Single primary device ▪ Other devices are secondaries ▪ Variety of scheduling and transfer patterns are necessary ▪ Simple rules engine [Diagram: GPU1 runs Partition 1 (Primary); GPU2 to GPU4 run Partitions 2 to 4 (Secondary)]
  • 50. Render Graph ▪ Run ray generation on primary GPU ▪ Copy results in sub-regions to other GPUs ▪ Run tracing phases on separate GPUs ▪ Copy tracing results back to primary GPU ▪ Run filtering on primary GPU
  • 51. Render Graph ▪ Only width is divided ▪ Simplifies textures vs. buffers ▪ Passes are unaware of GPU count
  • 52. ▪ Lots of fun coordinate snapping bugs ▪ e.g. 3 GPUs partitioned to 0.33333…
  • 53. ▪ Lots of fun coordinate snapping bugs ▪ 16 GPUs! (because, why not?)
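A common way to avoid snapping bugs of the 0.33333… kind is to compute every partition boundary in integers from one shared formula, so the right edge of one window always equals the left edge of the next. The C++ sketch below shows that approach. It is not taken from the slides, and `PartitionWindow` and `partition_window` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct PartitionWindow {
    uint32_t x;      // first column owned by the device
    uint32_t width;  // number of columns owned by the device
};

// Splits [0, total_width) into device_count contiguous column ranges.
// Both edges come from the same integer formula, so neighbouring windows
// never overlap or leave a gap, even when the width does not divide evenly.
inline PartitionWindow partition_window(uint32_t total_width, uint32_t device_count,
                                        uint32_t device_index) {
    assert(device_count > 0 && device_index < device_count);
    const uint64_t begin = uint64_t(total_width) * device_index / device_count;
    const uint64_t end = uint64_t(total_width) * (device_index + 1) / device_count;
    return PartitionWindow{static_cast<uint32_t>(begin), static_cast<uint32_t>(end - begin)};
}
```

For a 1000-pixel-wide target on 3 devices this yields windows of 333, 333 and 334 columns starting at 0, 333 and 666. Only the width is split, matching the slide's "only width is divided" rule.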
  • 54. Render Graph ▪ RenderGraphSchedule ▪ NoDevices → Pass is disabled ▪ AllDevices → Pass runs on all devices ▪ PrimaryDevice → Pass only runs on primary device ▪ SecondaryDevices → Pass runs on secondaries if count > 1, otherwise primary ▪ OnlySecondaryDevices → Pass only runs on secondary devices, disabled unless mGPU ▪ Requested per pass
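The schedule rules above map directly onto a small resolver. In the C++ sketch below the enumerator names come from the slide, while `resolve_devices` and the convention that device 0 is the primary are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RenderGraphSchedule {
    NoDevices, AllDevices, PrimaryDevice, SecondaryDevices, OnlySecondaryDevices
};

// Returns the device indices a pass runs on. Device 0 is assumed to be primary.
inline std::vector<uint32_t> resolve_devices(RenderGraphSchedule schedule,
                                             uint32_t device_count) {
    assert(device_count > 0);
    std::vector<uint32_t> devices;
    switch (schedule) {
    case RenderGraphSchedule::NoDevices:
        break;  // pass is disabled
    case RenderGraphSchedule::AllDevices:
        for (uint32_t i = 0; i < device_count; ++i) devices.push_back(i);
        break;
    case RenderGraphSchedule::PrimaryDevice:
        devices.push_back(0);
        break;
    case RenderGraphSchedule::SecondaryDevices:
        if (device_count == 1) {
            devices.push_back(0);  // single GPU: fall back to the primary
        } else {
            for (uint32_t i = 1; i < device_count; ++i) devices.push_back(i);
        }
        break;
    case RenderGraphSchedule::OnlySecondaryDevices:
        // Empty on a single GPU, so the pass is disabled unless mGPU.
        for (uint32_t i = 1; i < device_count; ++i) devices.push_back(i);
        break;
    }
    return devices;
}
```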
  • 55. Render Graph ▪ RenderTransferPartition ▪ PartitionAll → Select all partitions from device ▪ PartitionIsolated → Select isolated region from device ▪ RenderTransferFilter ▪ AllDevices → Transfer completes on all devices ▪ PrimaryDevice → Transfer completes on the primary device ▪ SecondaryDevices → Transfer completes on all secondary devices ▪ Requested per pass
  • 56. Render Graph ▪ PartitionAll → PartitionAll ▪ Copies full resource on one GPU to full resource on all specified GPUs ▪ PartitionAll → PartitionIsolated ▪ Copies full resource on one GPU to isolated regions on all specified GPUs (partial copies) ▪ PartitionIsolated → PartitionAll ▪ (Invalid configuration) ▪ PartitionIsolated → PartitionIsolated ▪ Copies isolated region on one GPU to isolated regions on all specified GPUs (partial copies)
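The four source/destination combinations listed on slide 56 can be captured as a small classification function. The C++ sketch below uses the enumerator names from slide 55. `CopyKind` and `classify_transfer` are hypothetical names added for illustration.

```cpp
#include <cassert>

enum class RenderTransferPartition { PartitionAll, PartitionIsolated };

// Hypothetical result type describing what kind of copy a transfer needs.
enum class CopyKind { Full, Partial, Invalid };

inline CopyKind classify_transfer(RenderTransferPartition source,
                                  RenderTransferPartition destination) {
    using P = RenderTransferPartition;
    if (source == P::PartitionIsolated && destination == P::PartitionAll)
        return CopyKind::Invalid;  // the one combination the slide marks as invalid
    if (source == P::PartitionAll && destination == P::PartitionAll)
        return CopyKind::Full;     // full resource to full resource
    return CopyKind::Partial;      // any isolated destination means partial copies
}
```

Rejecting the invalid pairing while the graph is being built, rather than at submission, is one way to surface the "incorrect transfers" class of bug early.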
  • 57. ▪ Devices this pass will run on ▪ Schedule transfers in or out ▪ Scaling work dimensions for each GPU
  • 58. Some bugs were obvious
  • 59. Some bugs were obvious
  • 60. Render Graph ▪ Some bugs were subtle ▪ Weird cel shading? ☺ ▪ Incorrect transfers? ▪ Transfers in (input data) ▪ Transfers out (result data) ▪ Incorrect scheduling? ▪ Pass not running ▪ Pass running when it shouldn’t ▪ Partition window
  • 61. Render Graph ▪ Some of our render graph passes: ▪ Bloom ▪ BottomLevelUpdate ▪ BrdfLut ▪ CocDerive ▪ DepthPyramid ▪ DiffuseSh ▪ Dof ▪ Final ▪ GBuffer ▪ Gtao ▪ IblReflection ▪ ImGui ▪ InstanceTransforms ▪ Lighting ▪ MotionBlur ▪ Present ▪ RayTracing ▪ RayTracingAccum ▪ ReflectionFilter ▪ ReflectionSample ▪ ReflectionTrace ▪ Rtao ▪ Screenshot ▪ Segmentation ▪ ShaderTableUpdate ▪ ShadowFilter ▪ ShadowMask ▪ ShadowCascades ▪ ShadowTrace ▪ Skinning ▪ Ssr ▪ SurfelGapFill ▪ SurfelLighting ▪ SurfelPositions ▪ SurfelSpawn ▪ Svgf ▪ TemporalAa ▪ TemporalReproject ▪ TopLevelUpdate ▪ TranslucencyTrace ▪ Velocity ▪ Visibility
  • 62. Render Graph ▪ Implicit data flow via explicit scopes ▪ “Long-distance” extensible parameter passing ▪ Scope given to each render pass ▪ Can create nested scope for sub-graph ▪ Results stored into scope ▪ Hygiene via nesting and shadowing
        {
          gbuffer <- render_opaque()
          gbuffer <- render_decals(gbuffer)
          {
            gbuffer <- render_opaque()
            render_lighting(gbuffer)
          } -> envmap
          apply_envmap(gbuffer, envmap)
        }
  • 63. Render Graph ▪ Lookup by type ▪ scope.get<T>() -> &T ▪ Parameters in “plain old data” structs ▪ RenderGraphResource, RenderHandle ▪ float, int, mat4, etc.
        struct RenderGraphAreaLight {
          RenderGraphResource triangleLightList;
          uint32 triangleCount;
        };
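Type-keyed lookup with nesting and shadowing can be sketched in C++17 with `std::type_index` and `std::any`. `RenderGraphAreaLight` follows the slide, with `uint32_t` standing in for `uint32`. `RenderGraphScope`, `set`, and the pointer-returning `get` are assumptions. The slide shows `get<T>()` returning a reference, and this sketch returns a pointer only so that a missing entry can be reported.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>

struct RenderGraphResource { uint32_t id; };  // placeholder for the real type

struct RenderGraphAreaLight {
    RenderGraphResource triangleLightList;
    uint32_t triangleCount;
};

class RenderGraphScope {
public:
    explicit RenderGraphScope(const RenderGraphScope* parent = nullptr) : parent_(parent) {}

    // Stores a parameter struct, shadowing any value of the same type
    // that an outer scope may hold.
    template <typename T>
    void set(T value) { values_[std::type_index(typeid(T))] = std::move(value); }

    // Looks up by type, walking outward through parent scopes.
    // Returns nullptr when no enclosing scope holds a T.
    template <typename T>
    const T* get() const {
        for (const RenderGraphScope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->values_.find(std::type_index(typeid(T)));
            if (it != s->values_.end()) return std::any_cast<T>(&it->second);
        }
        return nullptr;
    }

private:
    const RenderGraphScope* parent_;
    std::unordered_map<std::type_index, std::any> values_;
};
```

A pass deep in a sub-graph can ask its scope for `RenderGraphAreaLight` without any intermediate pass forwarding it, which is the "long-distance" parameter passing the slides describe. A nested scope that sets its own value shadows the outer one without modifying it.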
  • 64. Render Graph DSL ▪ Experimental ▪ Macro Magic
  • 65. Render Graph ▪ Automatic profiling data ▪ GPU and CPU counters per-pass ▪ Works with mGPU ▪ Each GPU is profiled
  • 66. Render Graph ▪ Live debugging overlay ▪ Evaluated passes in-order of execution ▪ Input and output dependencies ▪ Resource version information
  • 69. Virtual Multi-GPU ▪ Most developers have single GPU ▪ Uncommon for 2 GPU machines ▪ Rare for 3+ GPU ▪ Practical for show floor and cranking up to 11 ▪ Impractical for regular development ☺
  • 70. Virtual Multi-GPU ▪ Build device indirection table ▪ Virtual device index → adapter index [Diagram: virtual Device 0 to Device 5 mapped onto DX12: Adapter 0 to DX12: Adapter 2]
  • 71. Virtual Multi-GPU ▪ Create multiple instances of a device ▪ Virtual GPUs execute sequentially (WDDM) [Diagram: virtual Device 0 to Device 5 mapped onto DX12: Adapter 0 to DX12: Adapter 2]
  • 72. Virtual Multi-GPU ▪ Increases overall wall time (don’t use for profiling) ▪ Amazing for development and testing! [Diagram: virtual Device 0 to Device 5 mapped onto DX12: Adapter 0 to DX12: Adapter 2]
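The device indirection table can be as small as one array indexed by virtual device. The C++ sketch below fills it round-robin, which is an assumption: the slides show that a table exists but not how virtual devices are assigned to adapters. `build_device_table` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Maps each virtual device index to a physical adapter index.
// Round-robin assignment is an assumption made for this sketch.
inline std::vector<uint32_t> build_device_table(uint32_t virtual_device_count,
                                                uint32_t adapter_count) {
    assert(adapter_count > 0);
    std::vector<uint32_t> table(virtual_device_count);
    for (uint32_t device = 0; device < virtual_device_count; ++device)
        table[device] = device % adapter_count;  // several devices may share an adapter
    return table;
}
```

With a single adapter, every virtual device resolves to adapter 0. That is the configuration that allows multi-GPU code paths to be exercised on a one-GPU workstation.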
  • 73. Virtual Multi-GPU ▪ PICA PICA developers all had 1 GPU ▪ Limited testing with 2 GPUs ▪ Show floor at GDC 2018 was 4 GPUs ▪ Virtual-only testing… ▪ Crossed fingers ▪ Worked flawlessly!
  • 74. Virtual Multi-GPU ▪ Develop and debug multi-GPU with only a single GPU ▪ Virtual mGPU reliably reproduces most bugs! ▪ Entire features developed without physical mGPU ▪ e.g. Surfel GI (the night before GDC.. ☺)
  • 76. Render Proxy ▪ Remote render backend ▪ Any API / Any OS
  • 77. Render Proxy ▪ Render API calls are routed remotely ▪ Uses gRPC (high performance RPC framework) ▪ Use an API on an incompatible OS ▪ e.g. Direct3D 12 on macOS or Linux
  • 78. Render Proxy ▪ Scale large workloads with a GPU cluster ▪ Same API as render graph mGPU ▪ Only rendering is routed, scene state is local ▪ Work from the couch! ▪ e.g. DirectX ray tracing on a MacBook ☺
  • 81. Render Proxy ▪ The possibilities are endless!
  • 83. Machine Learning ▪ Deep reinforcement learning ▪ Rendering 36 semantic views ▪ Training with TensorFlow ▪ On-premise GPU cluster ▪ Inference with TensorFlow ▪ CPU AVX2 ▪ In-process
  • 84. Machine Learning ▪ Adding inferencing with DirectML ▪ Hardware accelerated inferencing operators ▪ Resource management ▪ Schedule ML work explicitly ▪ Interleave ML work with other GPU workloads ▪ Fall back for other APIs
  • 85. Machine Learning ▪ Treat trained ML models like any other 3D asset ▪ Render Graph abstractions ▪ Reference the same render resources ▪ Similar to chaining compute passes ▪ Record “meta” render commands ▪ Backends can fuse or transform, if desired
  • 86. Machine Learning ▪ Provide operators as render commands ▪ Activation ▪ Convolution ▪ Elementwise ▪ FC ▪ GRU ▪ LSTM ▪ MatMul ▪ Normalization ▪ Pooling ▪ Random ▪ RNN ▪ etc.
  • 88. Asset Pipelines ▪ Geometry ▪ Animations ▪ Shaders ▪ Sounds ▪ Music ▪ Textures ▪ Scenes ▪ etc.
  • 89. Asset Pipelines ▪ Everything is content addressable ▪ Hash of data is the identity ▪ SHA-256 ▪ Merkle trees! ▪ Dependency evaluation
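In a content-addressed pipeline an asset's identity is the hash of its bytes, and folding dependency identities into that hash gives the Merkle property: changing a leaf changes every asset above it. The C++ sketch below shows the structure only. It uses 64-bit FNV-1a purely as a dependency-free stand-in for SHA-256, and `Asset`, `hash_bytes` and `identity` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Digest = uint64_t;

// 64-bit FNV-1a. Stand-in only: the slides name SHA-256, and a real
// pipeline would use a cryptographic hash here.
inline Digest hash_bytes(const std::string& data, Digest seed = 14695981039346656037ull) {
    Digest h = seed;
    for (unsigned char byte : data) {
        h ^= byte;
        h *= 1099511628211ull;
    }
    return h;
}

struct Asset {
    std::string payload;               // e.g. mesh or texture bytes
    std::vector<Digest> dependencies;  // identities of referenced assets
};

// Identity = hash of own data chained with the identities of all dependencies,
// so a change anywhere below an asset changes the asset's identity too.
inline Digest identity(const Asset& asset) {
    Digest h = hash_bytes(asset.payload);
    for (Digest dep : asset.dependencies)
        h = hash_bytes(std::to_string(dep), h);
    return h;
}
```

Dependency evaluation then reduces to comparing identities: if a material's identity is unchanged, nothing it depends on has changed and its cached build output can be reused.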
  • 91. Asset Pipelines ▪ Containerized, running on Kubernetes ▪ Google Cloud Platform ▪ On-Premise Cluster ▪ AMD 1950X TR ▪ NV Titan V ▪ Communication using gRPC and Protobuf ▪ Google Cloud Storage
  • 92. Asset Pipelines ▪ Analytics with Prometheus and Grafana ▪ Publish custom metrics ▪ Scraped into rich UI ▪ Collecting data is important!
  • 95. Shaders ▪ Complex materials ▪ Multiple microfacet layers ▪ [Stachowiak 2018] ▪ Energy conserving ▪ Automatic Fresnel between layers ▪ All lighting & rendering modes ▪ Raster, path-traced reference, hybrid ▪ Iterate with different looks ▪ Bake down permutations for production [Figure: Objects with Multi-Layered Materials]
  • 96. Shaders ▪ Exclusively HLSL ▪ Shader Model 6.X ▪ Majority are compute shaders ▪ Performance is critical ▪ Group shared memory ▪ Wave-ops / Sub-groups
  • 97. Shaders ▪ No reflection ▪ Avoid costly lookups ▪ Only explicit bindings ▪ … except for validation ▪ Extensive use of HLSL spaces ▪ Updates at varying frequency ▪ Bindless
  • 98. Shaders [Diagram: HLSL → DXC → DXIL → Direct3D 12; HLSL → DXC → SPIR-V → Vulkan 1.1; SPIR-V → SPIRV-Cross → MSL → Metal 2; SPIR-V → SPIRV-Cross → ISPC → AVX2, …]
  • 99. Shader Arguments ▪ Commands refer to resources with “Shader Arguments” ▪ Each argument represents an HLSL space ▪ MaxShaderParameters → 4 [Configurable] ▪ # of spaces, not # of resources
  • 100. Shader Arguments ▪ Each argument contains: ▪ “ShaderViews” handle ▪ Constant buffer handle and offset ▪ “ShaderViews” ▪ Collection of SRV and UAV handles
  • 101. Shader Arguments ▪ Constant buffers are all dynamic ▪ Avoid temporary descriptors ▪ Just a few large buffers, offsets change frequently ▪ VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC ▪ DX12 Root Descriptors (pass in GPU VA) ▪ All descriptor sets are only written once ▪ Persisted / cached
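The three Shader Arguments slides suggest a small value type per HLSL space plus a sub-allocator over a few large constant buffers. The C++ sketch below is one possible shape under stated assumptions: the field names, `ConstantBufferAllocator` and the simple wrap-around policy are hypothetical. The 256-byte alignment matches Direct3D 12's constant buffer placement requirement, whereas Vulkan exposes its own device-dependent limit.

```cpp
#include <cassert>
#include <cstdint>

using RenderHandle = uint64_t;  // stand-in for the 64-bit render handle

constexpr uint32_t kConstantAlignment = 256;  // D3D12 constant buffer placement alignment

// One argument per HLSL space.
struct ShaderArgument {
    RenderHandle shader_views = 0;        // persistent collection of SRV and UAV handles
    RenderHandle constant_buffer = 0;     // one of a few large dynamic buffers
    uint32_t constant_buffer_offset = 0;  // changes frequently; the views do not
};

// Hands out aligned offsets inside one large constant buffer.
class ConstantBufferAllocator {
public:
    ConstantBufferAllocator(RenderHandle buffer, uint32_t capacity)
        : buffer_(buffer), capacity_(capacity) {}

    ShaderArgument allocate(RenderHandle shader_views, uint32_t size) {
        const uint32_t aligned =
            (size + kConstantAlignment - 1) / kConstantAlignment * kConstantAlignment;
        assert(aligned <= capacity_);
        // Wrap to the start when full. A real allocator would first make sure
        // the GPU has finished reading the region being reused.
        if (head_ + aligned > capacity_) head_ = 0;
        ShaderArgument argument{shader_views, buffer_, head_};
        head_ += aligned;
        return argument;
    }

private:
    RenderHandle buffer_;
    uint32_t capacity_;
    uint32_t head_ = 0;
};
```

Only the offset varies between draws, so the descriptor data for the views and the buffer can be written once and reused, which is the point of the "only written once" bullet.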
  • 103. Vulkan Implementation ▪ Architecture simplified development effort ▪ Vulkan specific: ▪ Backend and device implementation ▪ Memory allocators (e.g. AMD VMA) ▪ Barrier and transition logic ▪ Resource binding model
  • 104. Vulkan Implementation
  • 105. Vulkan Implementation
  • 106. Vulkan Implementation
  • 107. Vulkan Implementation
  • 108. Vulkan Implementation
  • 109. Vulkan Implementation
  • 110. Vulkan Implementation
  • 111. Vulkan Implementation
  • 115. Vulkan Implementation ▪ Shader compilation (HLSL → SPIR-V) ▪ Patch SPIR-V to match DX12 ▪ Using spirv-reflect from Hai and Cort ▪ spvReflectCreateShaderModule ▪ spvReflectEnumerateDescriptorSets ▪ spvReflectChangeDescriptorBindingNumbers ▪ spvReflectGetCodeSize / spvReflectGetCode ▪ spvReflectDestroyShaderModule
  • 116. SPIR-V Patching ▪ SPV_REFLECT_RESOURCE_FLAG_SRV ▪ Offset += 1000 ▪ SPV_REFLECT_RESOURCE_FLAG_SAMPLER ▪ Offset += 2000 ▪ SPV_REFLECT_RESOURCE_FLAG_UAV ▪ Offset += 3000
  • 117. SPIR-V Patching ▪ SPV_REFLECT_RESOURCE_FLAG_CBV ▪ Offset Unchanged: 0 ▪ Descriptor Set += MAX_SHADER_ARGUMENTS ▪ CBVs move to their own descriptor sets ▪ ShaderViews become persistent and immutable
  • 118. SPIR-V Patching ▪ If 2 of 4 HLSL spaces in use: [Diagram: descriptor sets Set 0 to Set 5; each used space holds SRVs (>=1000), Samplers (>=2000) and UAVs (>=3000); unused spaces are Unbound; each used space also gets its own Dynamic Constant Buffer set (Offset: 0)]
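The patching rules on the three SPIR-V Patching slides reduce to a pure mapping from an HLSL (space, register) pair to a Vulkan (set, binding) pair. The C++ sketch below shows only that mapping, with hypothetical names (`ResourceKind`, `Binding`, `patch_binding`). Applying it to a real module would go through the spirv-reflect functions named on slide 115, which this sketch does not call. `kMaxShaderArguments` is assumed to be 4, matching the MaxShaderParameters value on slide 99.

```cpp
#include <cassert>
#include <cstdint>

enum class ResourceKind { CBV, SRV, Sampler, UAV };

constexpr uint32_t kMaxShaderArguments = 4;  // assumed equal to MaxShaderParameters

struct Binding {
    uint32_t set;
    uint32_t binding;
};

// Maps an HLSL (space, register) pair to a Vulkan (set, binding) pair.
inline Binding patch_binding(ResourceKind kind, uint32_t hlsl_space, uint32_t hlsl_register) {
    switch (kind) {
    case ResourceKind::SRV:     return {hlsl_space, hlsl_register + 1000};
    case ResourceKind::Sampler: return {hlsl_space, hlsl_register + 2000};
    case ResourceKind::UAV:     return {hlsl_space, hlsl_register + 3000};
    case ResourceKind::CBV:
        // Constant buffers keep their binding but move to their own set, so the
        // view sets can stay persistent while dynamic offsets change.
        return {hlsl_space + kMaxShaderArguments, hlsl_register};
    }
    return {hlsl_space, hlsl_register};  // unreachable for valid enum values
}
```

With two of four spaces in use, views land in sets 0 and 1 and their constant buffers in sets 4 and 5, which is consistent with the six sets shown on slide 118.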
  • 119. Vulkan Implementation ▪ Translate commands ▪ Read command list ▪ Write Vulkan API
  • 124. Tools
  • 125. Tools ▪ RenderDoc ▪ NV Nsight ▪ AMD RGP
  • 127. Tools
  • 128. Tools ▪ C++ Export! ▪ Standalone
  • 131. Dear ImGui + ImGuizmo ▪ Live tweaking ▪ Very useful!
  • 133. References ▪ [Wihlidal 2018] Graham Wihlidal, Colin Barré-Brisebois. “Modern Graphics Abstractions & Real-Time Ray Tracing”. Available online. ▪ [Wihlidal 2018] Graham Wihlidal. “Halcyon + Vulkan”. Available online. ▪ [Stachowiak 2018] Tomasz Stachowiak. “Towards Effortless Photorealism Through Real-Time Raytracing”. Available online. ▪ [Andersson 2018] Johan Andersson, Colin Barré-Brisebois. “DirectX: Evolving Microsoft's Graphics Platform”. Available online. ▪ [Harmer 2018] Jack Harmer, Linus Gisslén, Henrik Holst, Joakim Bergdahl, Tom Olsson, Kristoffer Sjöö and Magnus Nordin. “Imitation Learning with Concurrent Actions in 3D Games”. Available online. ▪ [Opara 2018] Anastasia Opara. “Creativity of Rules and Patterns”. Available online. ▪ [O’Donnell 2017] Yuriy O’Donnell. “Frame Graph: Extensible Rendering Architecture in Frostbite”. Available online.
  • 134. Thanks ▪ Matthäus Chajdas ▪ Rys Sommefeldt ▪ Timothy Lottes ▪ Tobias Hector ▪ Neil Henning ▪ John Kessenich ▪ Hai Nguyen ▪ Nuno Subtil ▪ Adam Sawicki ▪ Alon Or-bach ▪ Baldur Karlsson ▪ Cort Stratton ▪ Mathias Schott ▪ Rolando Caloca ▪ Sebastian Aaltonen ▪ Hans-Kristian Arntzen ▪ Yuriy O’Donnell ▪ Arseny Kapoulkine ▪ Tex Riddell ▪ Marcelo Lopez Ruiz ▪ Lei Zhang ▪ Greg Roth ▪ Noah Fredriks ▪ Qun Lin ▪ Ehsan Nasiri ▪ Steven Perron ▪ Alan Baker ▪ Diego Novillo ▪ Tomasz Stachowiak
  • 135. Thanks ▪ SEED ▪ Johan Andersson ▪ Colin Barré-Brisebois ▪ Jasper Bekkers ▪ Joakim Bergdahl ▪ Ken Brown ▪ Dean Calver ▪ Dirk de la Hunt ▪ Jenna Frisk ▪ Paul Greveson ▪ Henrik Halen ▪ Effeli Holst ▪ Andrew Lauritzen ▪ Magnus Nordin ▪ Niklas Nummelin ▪ Anastasia Opara ▪ Kristoffer Sjöö ▪ Ida Winterhaven ▪ Tomasz Stachowiak ▪ Microsoft ▪ Chas Boyd ▪ Ivan Nevraev ▪ Amar Patel ▪ Matt Sandy ▪ NVIDIA ▪ Tomas Akenine-Möller ▪ Nir Benty ▪ Jiho Choi ▪ Peter Harrison ▪ Alex Hyder ▪ Jon Jansen ▪ Aaron Lefohn ▪ Ignacio Llamas ▪ Henry Moreton ▪ Martin Stich
  • 136. SEED // SEARCH FOR EXTRAORDINARY EXPERIENCES DIVISION ▪ STOCKHOLM – LOS ANGELES – MONTRÉAL – REMOTE ▪ SEED.EA.COM ▪ WE’RE HIRING!