MANTLE FOR DEVELOPERS
JOHAN ANDERSSON – TECHNICAL DIRECTOR
FROSTBITE
ELECTRONIC ARTS
Mantle?
 Simplify advanced development
 Improve performance
 Enable developers to innovate
 Challenge the status quo
Mantle for Developers
Developer impact areas
 Control
 CPU performance
 GPU performance
 Programmability
 Platforms
Control

New model
Traditional Model: Black Box
 Middle-ground abstraction – compromise between performance & “usability”
 Hidden resource memory & state
 Resource CPU access tied to device context
 Driver analyzes & synchronizes implicitly

Explicit Model: Mantle
 Thin low-level abstraction to expose how hardware works
 App explicit memory management
 Resources are globally accessible
 App explicit resource state transitions
Control

App responsibility
 Tell when render target will be used as a texture
‒ And many more resource state transitions

 Don’t destroy resources that GPU is using
‒ Keep track with fences or frames

 Manual dynamic resource renaming
‒ No DISCARD for driver resource renaming

 Resource memory tiling
 Powerful validation layer will help!
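
To make "keep track with fences or frames" concrete, here is a minimal C++ sketch of deferred destruction. It is not Mantle code: `DeferredDeleter`, `destroyLater` and `collect` are hypothetical names, and the completed-frame number is assumed to come from whatever fence mechanism the renderer uses. It also assumes resources are queued in non-decreasing frame order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical helper, not part of any real API: defers resource destruction
// until the GPU has finished the frame that last used the resource.
class DeferredDeleter {
public:
    // Queue a destroy callback tagged with the frame that last used the resource.
    // Assumption: callers pass non-decreasing frame numbers.
    void destroyLater(std::uint64_t lastUsedFrame, std::function<void()> destroy) {
        pending_.push_back({lastUsedFrame, std::move(destroy)});
    }

    // Call once per frame with the newest frame the GPU has fully completed
    // (in a real renderer this value would be read from a fence).
    // Returns how many resources were destroyed.
    int collect(std::uint64_t completedFrame) {
        int destroyed = 0;
        // Entries are in frame order, so stop at the first one still in use.
        while (!pending_.empty() && pending_.front().frame <= completedFrame) {
            pending_.front().destroy();
            pending_.pop_front();
            ++destroyed;
        }
        return destroyed;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    struct Entry {
        std::uint64_t frame;
        std::function<void()> destroy;
    };
    std::deque<Entry> pending_;
};
```

The app calls `destroyLater` instead of destroying a resource directly, then calls `collect` once per frame; nothing is freed while the GPU may still be reading it.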
Control

Explicit control enables
 App high-level decisions & optimizations
‒ Has full scene information
‒ Easier to optimize performance & memory

 Flexible & efficient memory management
‒ Linear frame allocators
‒ Memory pools
‒ Pinned memory

 Reduced development time
‒ For advanced game engines & apps
‒ Easier to get to target performance & robustness
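
A linear frame allocator is one of the simplest strategies listed above. The C++ sketch below is only an illustration under assumptions: `LinearFrameAllocator` is a hypothetical class that hands out offsets into one pre-allocated block, and how those offsets map to GPU memory objects is left to the app.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical linear (bump) allocator over one pre-allocated memory block.
// It only hands out offsets; the caller pairs them with its own memory object.
class LinearFrameAllocator {
public:
    static constexpr std::size_t kFailed = static_cast<std::size_t>(-1);

    explicit LinearFrameAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns the offset of the allocation, or kFailed if the block is full.
    // `alignment` must be a power of two.
    std::size_t allocate(std::size_t size, std::size_t alignment) {
        std::size_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) return kFailed;
        head_ = aligned + size;
        return aligned;
    }

    // Called once the GPU no longer reads this frame's data: frees everything at once.
    void reset() { head_ = 0; }

    std::size_t used() const { return head_; }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;
};
```

Allocation is a few integer operations and freeing is a single `reset()`, which is why this pattern suits per-frame data such as constants and dynamic vertex data.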
Control

Explicit control enables
 Transient resources
‒ Alias render targets within frame
‒ Major memory savings
‒ No need to pre-allocate everything

 Light-weight driver
‒ Easier to develop & maintain
‒ Reduced CPU draw call overhead
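
The memory saving from aliasing render targets within a frame can be estimated with a small greedy pass. This C++ sketch is an assumption-laden illustration, not the Frostbite implementation: `TransientTarget` and `aliasedFootprint` are hypothetical, and a target's lifetime is modeled simply as a range of render pass indices.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a render target that only lives for part of a frame.
struct TransientTarget {
    std::size_t bytes;
    int firstPass;  // first render pass that uses the target
    int lastPass;   // last render pass that uses the target
};

// Greedy aliasing: targets whose pass ranges do not overlap may share one memory slot.
// Returns the total bytes needed for all slots.
std::size_t aliasedFootprint(std::vector<TransientTarget> targets) {
    struct Slot { std::size_t bytes; int busyUntil; };
    std::sort(targets.begin(), targets.end(),
              [](const TransientTarget& a, const TransientTarget& b) {
                  return a.firstPass < b.firstPass;
              });
    std::vector<Slot> slots;
    for (const TransientTarget& t : targets) {
        bool placed = false;
        for (Slot& s : slots) {
            if (s.busyUntil < t.firstPass && s.bytes >= t.bytes) {
                s.busyUntil = t.lastPass;  // reuse memory of a target that is done
                placed = true;
                break;
            }
        }
        if (!placed) slots.push_back({t.bytes, t.lastPass});
    }
    std::size_t total = 0;
    for (const Slot& s : slots) total += s.bytes;
    return total;
}
```

With three targets of 100, 100 and 50 bytes where the two large ones never overlap, the footprint drops from 250 to 150 bytes.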
CPU performance
CPU perf

Core concepts
 Descriptor sets
 Monolithic pipelines
 Command buffers
CPU perf

Descriptor sets
 Table with resource references to bind to graphics or compute pipeline

 Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works

 Example 1: Single simple dynamic descriptor set
‒ Bind everything you need for a single draw call
‒ Close to DX/GL model but share between stages

 App managed - lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic

[Diagram legend: Image, Memory, Sampler, Link]
[Diagram: Dynamic descriptor set holding VertexBuffer (VS), Texture0 (VS+PS), Constants (VS), Texture1 (PS), Texture2 (PS), Sampler0 (VS+PS)]
CPU perf

Descriptor sets
 Table with resource references to bind to graphics or compute pipeline

 Replaces traditional resource stage binding
‒ Major performance & flexibility advantage
‒ Closer to how the hardware works

 Example 2: Reuse static set with nesting
‒ Reduce update time & memory usage

 App managed - lots of strategies possible!
‒ Tiny vs huge sets
‒ Single vs multiple
‒ Static vs semi-static vs dynamic

[Diagram legend: Image, Memory, Sampler, Link]
[Diagram: Static descriptor set and Dynamic descriptor set joined by a Link; entries shown: Constants (VS), Link, VertexBuffer (VS), Texture0 (VS+PS), Texture1 (PS), Texture2 (PS), Texture3 (PS), Texture4 (PS), Sampler0 (VS+PS), Sampler1 (PS)]
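
The nesting idea in Example 2 can be modeled in a few lines of C++. This is a sketch under assumptions, not the Mantle descriptor set API: `Slot`, `DescriptorSet`, `resourceSlot`, `linkSlot` and `flatten` are hypothetical names, and resources are represented by plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a descriptor set: each slot either names a resource
// or links to another (nested) descriptor set.
struct DescriptorSet;

struct Slot {
    std::string resource;       // used when link is null
    const DescriptorSet* link;  // nested set, not owned
};

struct DescriptorSet {
    std::vector<Slot> slots;
};

Slot resourceSlot(const std::string& name) { return Slot{name, nullptr}; }
Slot linkSlot(const DescriptorSet& set) { return Slot{std::string(), &set}; }

// Walks a set and its nested sets, returning the resources in slot order.
std::vector<std::string> flatten(const DescriptorSet& set) {
    std::vector<std::string> out;
    for (const Slot& s : set.slots) {
        if (s.link) {
            std::vector<std::string> nested = flatten(*s.link);
            out.insert(out.end(), nested.begin(), nested.end());
        } else {
            out.push_back(s.resource);
        }
    }
    return out;
}
```

A static set of textures and samplers is built once; each draw then only fills a tiny dynamic set with its constants plus one link slot, which is where the reduced update time comes from.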
CPU perf

Monolithic pipelines
 Shader stages & select graphics state combined into single object
‒ No runtime compilation or patching needed!
‒ Significantly less runtime overhead to use

 Supports parallel building & caching
‒ Fast loading times

 Usage & management up to the app
‒ Static vs dynamic creation
‒ Number of pipelines
‒ State usage

[Diagram: Pipeline state object covering the stages IA, VS, HS, Tessellator, DS, GS, RS, PS, DB, CB]
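
Because a monolithic pipeline is immutable once built, the app can cache it by its description. The following C++ sketch shows one possible caching scheme; `PipelineDesc` and `PipelineCache` are hypothetical, the description is reduced to two shader names and one state flag, and an integer id stands in for the real pipeline object.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical, simplified pipeline description: shader names plus a bit of fixed state.
struct PipelineDesc {
    std::string vertexShader;
    std::string pixelShader;
    bool depthTest;

    bool operator<(const PipelineDesc& o) const {
        return std::tie(vertexShader, pixelShader, depthTest) <
               std::tie(o.vertexShader, o.pixelShader, o.depthTest);
    }
};

// Caches one pipeline id per unique description, so the expensive build
// happens once (for example at load time) rather than at draw time.
class PipelineCache {
public:
    int getOrCreate(const PipelineDesc& desc) {
        auto it = cache_.find(desc);
        if (it != cache_.end()) return it->second;
        int id = nextId_++;  // stand-in for the real compile/link step
        ++buildCount_;
        cache_.emplace(desc, id);
        return id;
    }
    int buildCount() const { return buildCount_; }

private:
    std::map<PipelineDesc, int> cache_;
    int nextId_ = 0;
    int buildCount_ = 0;
};
```

Whether to fill such a cache up front (static creation) or on first use (dynamic creation) is exactly the kind of decision the slide leaves to the app.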
CPU perf

Command buffers
 Issue pipelined graphics & compute commands into a command buffer
‒ Bind graphics state, descriptor sets, pipeline
‒ Draw calls
‒ Render targets
‒ Clears
‒ Memory transfers
‒ NOT: resource mapping

 Fully independent objects
‒ Create multiple every frame
‒ Or pre-build up front and reuse
CPU perf

DX/GL parallelism
[Diagram: timeline across CPU 0, CPU 1 and CPU 2 showing Game, Render and Driver work]

 Automatically extracts parallelism out of most apps ☺
 Doesn’t scale beyond 2-3 cores ☹
 Additional latency ☹
 Driver thread often bottleneck – can collide with app threads ☹
CPU perf

Parallel dispatch with Mantle
[Diagram: timeline with Game work on CPU 0 and Render work on CPU 1 through CPU 4]

 App can go fully wide with its rendering – minimal latency ☺
 Close to linear scaling with CPU cores ☺
 No driver threads – no overhead – no contention ☺

 Frostbite’s approach on all consoles – and on PC with Mantle! ☺
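
The parallel dispatch model can be sketched with plain C++ threads. This is a toy illustration, not Frostbite or Mantle code: `CommandBuffer` and `recordInParallel` are hypothetical, and commands are recorded as strings. The point is that each worker owns its command buffer, so recording needs no locks.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical command buffer: just an ordered list of recorded command names.
struct CommandBuffer {
    std::vector<std::string> commands;
};

// Records one command buffer per worker thread. Each thread writes only to its
// own buffer, so no locking is needed; buffers are returned in submission order.
std::vector<CommandBuffer> recordInParallel(int workerCount, int drawsPerWorker) {
    std::vector<CommandBuffer> buffers(workerCount);
    std::vector<std::thread> workers;
    for (int w = 0; w < workerCount; ++w) {
        workers.emplace_back([&buffers, w, drawsPerWorker] {
            for (int d = 0; d < drawsPerWorker; ++d) {
                buffers[w].commands.push_back(
                    "draw " + std::to_string(w * drawsPerWorker + d));
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return buffers;
}
```

After the join, the app would submit the buffers to a queue in index order, which keeps the frame's draw order deterministic even though recording ran concurrently.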
GPU performance
GPU perf

GPU optimizations
 Thanks to improved CPU performance – CPU will rarely be a bottleneck for the GPU
‒ CPU could help GPU more: less brute force rendering, improve culling

 Resource states
‒ Gives driver a lot more knowledge & flexibility
‒ Apps can avoid expensive/redundant transitions, such as surface decompression

 Shader pipeline object – driver optimizations
‒ Can optimize with pipeline state knowledge
‒ Can optimize across all shader stages

 Expose existing GPU functionality
‒ Quad & Rect-lists
‒ HW-specific MSAA & depth data access
‒ Programmable sample patterns
‒ And more..
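
Avoiding redundant transitions is mostly bookkeeping on the app side. The C++ sketch below assumes a hypothetical `StateTracker` and a made-up `ResourceState` enum (the real set of states is defined by the API, not by this example); it records the last known state per resource and only reports a transition when the state actually changes.

```cpp
#include <cassert>
#include <map>

// Hypothetical resource states, loosely modeled on the render-target-to-texture case.
enum class ResourceState { Undefined, RenderTarget, ShaderRead, CopySource };

// Tracks the last known state per resource id and reports whether a transition
// command actually needs to be recorded.
class StateTracker {
public:
    // Returns true when a transition is required, false when it is redundant.
    bool transition(int resourceId, ResourceState next) {
        auto it = states_.find(resourceId);
        if (it != states_.end() && it->second == next) return false;
        states_[resourceId] = next;
        ++issued_;
        return true;
    }
    int issuedTransitions() const { return issued_; }

private:
    std::map<int, ResourceState> states_;
    int issued_ = 0;
};
```

A renderer would call `transition` before each use of a resource and record a state-change command only when it returns true.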
GPU perf

Queues
 Modern GPUs are heterogeneous machines with multiple engines
‒ Graphics pipeline
‒ Compute pipeline(s)
‒ DMA transfer
‒ Video encode/decode
‒ More…

 Mantle exposes queues for the engines + synchronization primitives

[Diagram: Graphics, Compute, DMA and further queues feeding the GPU]
GPU perf

Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute

[Diagram: DMA and Graphics queue timelines with the blocks Copy, Render, Other render, Use copy]
GPU perf

Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute

 Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

[Diagram: Compute and Graphics queue timelines with the blocks GBuffer, Non-shadowed lighting, Shadowmap 0, Shadowmap 1, Final lighting]
GPU perf

Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute

 Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

 Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer

[Diagram: Compute 0, Compute 1 and Graphics queue timelines with the blocks Compute Geometry, Compute Rasterizer, Ordinary Rendering]
GPU perf

Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute

 Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

 Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer

 Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline

[Diagram: Compute queue running Process steps ahead of Draw0, Draw1 and Draw2 on the Graphics queue]
GPU perf

Queue use cases
 Async DMA transfers
‒ Copy resources in parallel with graphics or compute

 Async compute together with graphics
‒ ALU heavy compute work at the same time as memory/ROP bound work to utilize idle units

 Multiple compute kernels collaborating
‒ Can be faster than über-kernel
‒ Example: Compute geometry backend & compute rasterizer

 Compute as frontend for graphics pipeline
‒ Compute runs asynchronously ahead and prepares & optimizes geometry for graphics pipeline

 Game engines will build large GPU job graphs
‒ Move away from single sequential submission
‒ Just as we already have done on CPU
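
A GPU job graph needs, at minimum, a submission order that respects dependencies. The C++ sketch below uses Kahn's topological sort for that; it is an illustration only. `GpuJob` and `submissionOrder` are hypothetical, the queue names are plain labels, and real cross-queue synchronization (semaphores or similar primitives) is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical GPU job: a name, the queue it should run on, and the jobs it waits for.
struct GpuJob {
    std::string name;
    std::string queue;           // e.g. "graphics", "compute", "dma"
    std::vector<int> dependsOn;  // indices of jobs that must finish first
};

// Kahn's algorithm: returns job indices in an order that respects dependencies.
// Returns an empty vector if the graph contains a cycle.
std::vector<int> submissionOrder(const std::vector<GpuJob>& jobs) {
    std::vector<int> waiting(jobs.size(), 0);
    std::vector<std::vector<int>> dependents(jobs.size());
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        for (int dep : jobs[i].dependsOn) {
            dependents[dep].push_back(static_cast<int>(i));
            ++waiting[i];
        }
    }
    std::queue<int> ready;
    for (std::size_t i = 0; i < jobs.size(); ++i)
        if (waiting[i] == 0) ready.push(static_cast<int>(i));

    std::vector<int> order;
    while (!ready.empty()) {
        int job = ready.front();
        ready.pop();
        order.push_back(job);
        for (int next : dependents[job])
            if (--waiting[next] == 0) ready.push(next);
    }
    if (order.size() != jobs.size()) order.clear();
    return order;
}
```

In an engine, each job in the returned order would be submitted to its own queue, with a wait inserted wherever a dependency crosses from one queue to another.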
Programmability
Programmability

Explicit Multi-GPU
 Explicit control of GPU queues and synchronization, finally!
‒ Implement your own Alternate-Frame-Rendering
‒ Or something more exotic..

 Use case: Workstation rendering with 4-8 GPUs
‒ Super high-quality rendering & simulation
‒ Load balance graphics & compute job graphs across GPUs
‒ 20-40 TFlops in a single machine!

 Use case: Low-latency rendering
‒ Important for VR and competitive games
‒ Latency optimized GPU job graph scheduling
‒ VR: Simultaneously drive 2 GPUs (1 per eye)
Programmability

New mechanisms
 Command buffer predication & flow control
‒ GPU affecting/skipping submitted commands
‒ Go beyond DrawIndirect / DispatchIndirect
‒ Advanced variable workloads
‒ Advanced culling optimizations

 Write occlusion query results into GPU buffer
‒ No CPU roundtrip needed
‒ Can drive predicated rendering
‒ Or use results directly in shaders (lens flares)
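
Predication driven by query results can be simulated on the CPU to show the control flow. In this C++ sketch, `Command` and `execute` are hypothetical, and a plain vector stands in for the GPU buffer that occlusion query results would be written into; on real hardware the skip decision is made by the GPU without a CPU roundtrip.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical command: a draw that optionally depends on a value in a GPU buffer.
struct Command {
    int drawId;
    int predicateSlot;  // index into the result buffer, or -1 for "always run"
};

// Simulates command-buffer predication: a draw is skipped when its predicate
// slot (for example an occlusion query sample count) is zero.
std::vector<int> execute(const std::vector<Command>& commands,
                         const std::vector<std::uint32_t>& queryResults) {
    std::vector<int> executed;
    for (const Command& c : commands) {
        bool run = c.predicateSlot < 0 || queryResults[c.predicateSlot] != 0;
        if (run) executed.push_back(c.drawId);
    }
    return executed;
}
```

The command list is fixed at record time; only the buffer contents decide which draws run, which is what allows culling results to take effect in the same frame.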
Programmability

Bindless resources
 Mantle supports bindless resources
‒ Shaders can select resources to use instead of static binding from CPU
‒ Extension of the descriptor set support

 Examples
‒ Performance optimizations – less data to update
‒ Logic & data structures that live fully on the GPU
‒ Scene culling & rendering
‒ Material representations
‒ Deferred shading
‒ Raytracing

 Key component that will open up a lot of opportunities!
Platforms
Platforms

Today
 Mantle gives us strong benefits on Windows today
‒ Console-like performance & programmability on both Windows 7 and Windows 8
‒ For us, well worth the dev time!

 DX & GL are the industry standards
‒ Needed for platforms that do not support Mantle
‒ Needed by devs who do not want/need more control
‒ Have to have fallback paths for GL/DX, but not limit oneself to it

 Mantle and PlayStation 4 will drive our future Frostbite designs & optimizations
‒ PS4 graphics API has great programmability & performance as well
‒ Share concepts, methods & optimization strategies
Platforms

Linux & Mac
 Want to see Mantle on Linux and Mac!
‒ Would enable support for our full engine & rendering
‒ Significantly easier to do efficient renderer with Mantle than with OpenGL

 Use cases:
‒ Workstations
‒ R&D
‒ Not limited by WDDM
‒ Games
‒ Mantle + SteamOS = powerful combination!
Platforms

Mobile
 Mobile architectures are getting closer in capabilities to desktop GPUs
 Want graphics API that allows apps to fully utilize the hardware
‒ Power efficient
‒ High performance
‒ Programmable

 Major opportunity with Mantle – leapfrog GL4, DX11
‒ For mobile SoC vendors
‒ For Google and Apple
Platforms

Multi-vendor?
 Mantle is designed to be a thin hardware abstraction
‒ Not tied to AMD’s GCN architecture
‒ Forward compatible
‒ Extensions for architecture- and platform-specific functionality

 Mantle would be a much more efficient graphics API for other vendors as well
‒ Most Mantle functionality can be supported on today’s modern GPUs

 Want to see future version of Mantle supported on all platforms and on all modern GPUs!
‒ Become an active industry standard with IHVs and ISVs collaborating
‒ Enable us developers to innovate with great performance & programmability everywhere
Mantle for Developers
Frostbite

Battlefield 4
 Mantle support is in development
‒ Core renderer (closer to PS4 than DX11)
‒ Implement all rendering techniques used in BF4 (many!)
‒ CPU optimizations (parallel dispatch, descriptor sets)
‒ GPU optimizations (minimize transitions, MSAA)
‒ R&D for advanced GPU optimizations
‒ Memory management
‒ Multi-GPU support
‒ ~2 months of work

 Update targeting late December
Frostbite

Plants vs Zombies: Garden Warfare
 Very different rendering compared to BF4 ☺
 Frostbite Mantle renderer will work out of the box
 Focus on APU performance
Frostbite

Future
 All Frostbite games designed with Mantle
‒ 15 games in development across all of EA

 Advanced Mantle rendering & use cases
‒ Lots of exciting R&D opportunities!

 Want multi-vendor & multi-platform support!
Email: repi@dice.se
Web: http://frostbite.com
Twitter: @repi

THE END

Editor's Notes

  • #3: So what is Mantle? Mantle is a low-level graphics API, and its goals are to improve performance, make it easier to develop these really advanced applications, and give developers a lot of freedom to build innovative graphics solutions. It is also a bit of a challenge to the established order of things, which I think is fun and healthy for the industry.
  • #4: We’ve been working with Mantle for some time now, adding support in our engine Frostbite and in Battlefield 4. I wanted to share some of our learnings and what Mantle can mean for developers in general.
  • #12: Can be of any size. Single small per-draw-call dynamic descriptor set. Static + dynamic. Multiple ones nested by update frequency.
  • #13: Can be of any size. Single small per-draw-call dynamic descriptor set. Static + dynamic. Multiple ones nested by update frequency.
  • #15: Not needed for: resource mapping. No implicit pipeline flushing. Much easier to track down stalls in the app itself.
  • #28: Also foveated rendering
  • #30: ? Need to go beyond HLSL for pointer support in shaders
  • #34: What is next after OpenGL ES3?
  • #38: Kaveri