World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Physics and Graphics

31.01.2018
Philipp Gerasimov, Mike Voss, Intel
Bronislav Sviglo, Wargaming.net
GDC 2018

World of Tanks
Introduction
• Philipp Gerasimov, Intel
Senior Game / Graphics Application Engineer, DRD / SSG, Munich.
• Mike Voss, Intel
Principal Engineer, Threading Runtimes Team, DPD / SSG, Austin, Texas.
• Bronislav Sviglo, Wargaming.net
Rendering Team Lead, World of Tanks Team, Minsk.

Agenda
• Making your game ready for modern CPUs
• World of Tanks 1.0
• Going Beyond 1.0
• Threading Building Blocks (TBB)
• Destructions
• Tank Treads
• Concurrent Rendering
• Summary and Q&A

Making your game ready for modern CPUs

Parallel Computing
Two vectors of parallelism
• Multi-core
• All modern platforms are multi-core: mobile phones, consoles, desktop and mobile PCs.
• Adding more cores is the most efficient way to make CPUs faster, so it is critical for game developers to use them.
• Vector instructions
• Another powerful way to improve code performance.
• SSE/AVX/AVX2 are supported across CPU vendors on PCs and consoles.
• Most middleware supports them.
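As a minimal sketch of the two vectors working together, assuming a simple SAXPY-style loop (y = a*x + y) rather than any actual engine code: the outer level splits the array across hardware threads with std::thread, and each thread runs 4-wide SSE intrinsics from <xmmintrin.h> when the compiler defines __SSE__ (for example GCC or Clang targeting x86), falling back to scalar code otherwise. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: _mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, ...
#endif

// Inner vector of parallelism (SIMD): y[i] = a * x[i] + y[i] on one chunk.
// Uses 4-wide SSE when the compiler targets SSE, otherwise plain scalar code.
void saxpy_chunk(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);
    for (; i + 4 <= n; i += 4) {
        const __m128 vx = _mm_loadu_ps(x + i);
        const __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    for (; i < n; ++i) y[i] = a * x[i] + y[i];  // scalar tail (or full fallback)
}

// Outer vector of parallelism (multi-core): one contiguous chunk per hardware thread.
void saxpy_parallel(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = std::min(x.size(), y.size());
    const std::size_t workers = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < n; begin += chunk)
        pool.emplace_back(saxpy_chunk, a, x.data() + begin, y.data() + begin,
                          std::min(chunk, n - begin));
    for (std::thread& t : pool) t.join();
}
```

A production engine would hand the chunks to a job system instead of creating threads on every call; spawning threads here only keeps the example self-contained.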

Parallel Computing
Wide HW user base
2018
CPU                          i7-8700K                        i7-8650U
Segment                      Desktop SKU, 95W                Mobile SKU, 15W
# Cores                      6                               4
# Threads                    12                              8
Base Frequency               3.70 GHz                        1.90 GHz
Max Frequency                4.70 GHz                        4.20 GHz
Instruction Set Extensions   Intel® SSE4.1, SSE4.2, AVX2     Intel® SSE4.1, SSE4.2, AVX2
2013
CPU                          i7-4790K                        i7-4650U
Segment                      Desktop SKU, 84W                Mobile SKU, 15W
# Cores                      4                               2
# Threads                    8                               4
Base Frequency               3.60 GHz                        1.70 GHz
Max Frequency                4.00 GHz                        3.30 GHz
Instruction Set Extensions   Intel® SSE4.1, SSE4.2, AVX2     Intel® SSE4.1, SSE4.2, AVX2
• Enhance the enthusiasts' experience
• Maintain strong backward compatibility

Parallel Computing
Enriching User Experience
• Optimizing for modern CPUs does not just mean fps improvements!
• Performance-focused features:
• Critical for low-TDP platforms and consoles (low-frequency cores)
• Not being limited by main/client thread performance
• E-sports-class performance on high-end PCs (140+ fps)
• Enhanced gaming experience features:
• Better visual effects (better particles, smarter occlusion culling)
• Better physics (destruction, collisions, water/wind simulation)
• Better AI (smarter UI, more characters)
• Better sound (high-quality 3D sound)

World of Tanks
Graphics overview
• Supports a huge range of hardware (2004–2018+)
• Lots of CPUs with 2 physical cores (~60%)
• Has two pipelines: simple and advanced
• Still has two graphics APIs: D3D9 on WinXP and D3D11
• Had several graphics evolutions in the past (0.8.x, 0.9.x)

World of Tanks 1.0+
Keep enriching gamers' experience
• Quick overview of destructions
• Improved simulation of tank treads
• Concurrent rendering

World of Tanks 1.0+
How to make the engine concurrent?
• Dedicated threads for each major engine subsystem (audio, render, physics, simulation, etc.)
• Task-based (pool of threads, work subdivided into tasks)
• A mix of both approaches
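The task-based option can be sketched with a minimal pool, assuming a single shared queue protected by a mutex. TaskPool and its methods are hypothetical names; real job systems such as TBB add work stealing and nested parallelism on top of this basic idea.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal task pool: a fixed set of worker threads pulls tasks
// (small units of work) from one shared queue.
class TaskPool {
public:
    explicit TaskPool(unsigned workers) {
        for (unsigned i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();  // workers drain the queue first
    }
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        cv_.notify_one();
    }
    // Blocks until every submitted task has finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
            std::lock_guard<std::mutex> lock(m_);
            if (--pending_ == 0) idle_.notify_all();
        }
    }
    std::vector<std::thread> threads_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_, idle_;
    std::size_t pending_ = 0;
    bool stop_ = false;
};
```

Compared with one dedicated thread per subsystem, the same workers serve whichever subsystem currently has work, which is what lets task-based designs scale with core count.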

World of Tanks 1.0+
How to select a good job system?
• Easy to use
• Supports two types of parallelism: functional/task and data
• Feature-rich and robust
• Good support
• Threading Building Blocks

Threading Building Blocks (TBB)

A widely used C++ template library for parallel programming (since 2006)
• What
• Parallel algorithms and data structures
• Threads and synchronization primitives
• Scalable memory allocation and task scheduling
• General benefits
• A library-only solution that does not depend on special compiler support
• Both a commercial product and an open-source project
• Supports C++, Windows*, Linux*, OS X*, Android* and other OSes
• Commercial support for Intel® Atom™, Core™ and Xeon® processors and for Intel® Xeon Phi™ coprocessors

Why it is useful for World of Tanks
• Expresses parallelism at a high level to get efficient performance on different platforms
• TBB-based parallelism scales as more cores become available
• Gets great performance on the newest multi-core platforms
• But works well on older machines too
• Express the parallelism and let TBB map it to the platform
• Supports multilevel and nested parallelism well
• Functional and data parallelism are implemented using the same TBB tasks
• Underlying tasks are executed by the same set of TBB worker threads
• Leads to composable parallelism: TBB schedules all of it
• A single thread pool avoids oversubscription problems

opentbb.org
Figure: TBB components (Parallel Execution Interfaces; Interfaces Independent of Execution Model)

Example: Feature Detection Algorithm
Can express pipelining, task parallelism and data parallelism
Figure: feature-detection flow graph (nodes: buffer, get_next_image, preprocess, detect_with_A, detect_with_B, make_decision), annotated with tbb::parallel_for( … ); and tbb::flow::graph
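The same graph shape can be approximated with the C++ standard library, as a sketch only: std::async stands in for TBB flow graph nodes, the node bodies are hypothetical toy functions named after the nodes in the figure, and the buffer node is omitted. It shows task parallelism (the two detectors run concurrently) and several frames in flight at once, which has an effect similar to pipelining.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-ins for the nodes of the feature-detection graph.
using Image = std::vector<int>;

Image get_next_image(int frame) { return Image(8, frame); }  // source node
Image preprocess(Image img) { for (int& p : img) p += 1; return img; }
int detect_with_A(const Image& img) { int s = 0; for (int p : img) s += p; return s; }
int detect_with_B(const Image& img) { int m = 0; for (int p : img) m = p > m ? p : m; return m; }
bool make_decision(int a, int b) { return a > b; }  // join node

// One pass through the graph: preprocess -> {A, B} in parallel -> decision.
bool process_frame(int frame) {
    const Image img = preprocess(get_next_image(frame));
    // Both detectors depend only on `img`, so they can run concurrently.
    auto a = std::async(std::launch::async, detect_with_A, std::cref(img));
    auto b = std::async(std::launch::async, detect_with_B, std::cref(img));
    return make_decision(a.get(), b.get());
}

// Frames are independent, so several can be in flight at the same time.
std::vector<bool> process_frames(int count) {
    std::vector<std::future<bool>> inflight;
    for (int f = 0; f < count; ++f)
        inflight.push_back(std::async(std::launch::async, process_frame, f));
    std::vector<bool> results;
    for (auto& r : inflight) results.push_back(r.get());
    return results;
}
```

Unlike TBB, each std::async call here may create its own thread, which is exactly the oversubscription that a single shared thread pool avoids.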

To learn more about TBB:
• See Intel's The Parallel Universe Magazine: https://software.intel.com/en-us/intel-parallel-universe-magazine
• http://threadingbuildingblocks.org
• http://software.intel.com/intel-tbb

Destructions
Overview
• Powered by the Havok destruction module
• Collision and physics in one system
• Out-of-the-box multithreading

Destructions
Destruction & collision scenes
Figure: collision scene and destruction scene

Destructions
Architecture
Figure: thread diagram (labels: Simulation; Main thread; Collision queries (x2); Internal Havok threads, ~number of processor cores; IO thread)

Video: Havok destruction

Destructions
Summary & future work
• Summary
• High-quality simulation of the destruction process
• Multithreaded execution significantly increases performance
• Future work
• Use the TBB job system to execute Havok tasks

Tank Treads
Previous implementations
• Skinned mesh
• Visually static
• The tread moves by scrolling its texture
• Spline tread
• The general shape of the tread is represented by a spline
• Each segment is rendered as a separate model
• Segment positions are determined by the spline
• The spline shape is animated by moving its control points
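To illustrate how segment positions can be derived from a spline, here is a sketch assuming a closed, uniform Catmull-Rom spline in the tread's side plane (the slides do not name the spline type). It samples the curve densely, measures arc length, and places one segment every total_length / segments units. Vec2, mix, catmull_rom and place_segments are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

static Vec2 mix(Vec2 a, Vec2 b, float t) { return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t}; }

// Uniform Catmull-Rom point on span cp[i] -> cp[i+1] of a closed loop (indices wrap).
static Vec2 catmull_rom(const std::vector<Vec2>& cp, std::size_t i, float t) {
    const std::size_t n = cp.size();
    const Vec2& p0 = cp[(i + n - 1) % n];
    const Vec2& p1 = cp[i];
    const Vec2& p2 = cp[(i + 1) % n];
    const Vec2& p3 = cp[(i + 2) % n];
    const float t2 = t * t, t3 = t2 * t;
    auto blend = [&](float a, float b, float c, float d) {
        return 0.5f * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t2 +
                       (-a + 3 * b - 3 * c + d) * t3);
    };
    return {blend(p0.x, p1.x, p2.x, p3.x), blend(p0.y, p1.y, p2.y, p3.y)};
}

// Places `segments` tread links at equal arc-length spacing along the closed spline.
std::vector<Vec2> place_segments(const std::vector<Vec2>& cp, std::size_t segments,
                                 std::size_t samples_per_span = 32) {
    if (cp.size() < 2 || segments == 0) return {};
    // 1. Densely sample the spline into a closed polyline and accumulate its length.
    std::vector<Vec2> pts;
    for (std::size_t i = 0; i < cp.size(); ++i)
        for (std::size_t s = 0; s < samples_per_span; ++s)
            pts.push_back(catmull_rom(cp, i, float(s) / samples_per_span));
    pts.push_back(pts.front());
    std::vector<float> len(pts.size(), 0.0f);
    for (std::size_t k = 1; k < pts.size(); ++k)
        len[k] = len[k - 1] + std::hypot(pts[k].x - pts[k - 1].x, pts[k].y - pts[k - 1].y);
    // 2. Walk the polyline and drop a segment every total/segments units.
    std::vector<Vec2> out;
    std::size_t k = 1;
    for (std::size_t j = 0; j < segments; ++j) {
        const float target = len.back() * j / segments;
        while (k < len.size() - 1 && len[k] < target) ++k;
        const float span = len[k] - len[k - 1];
        const float t = span > 0 ? (target - len[k - 1]) / span : 0.0f;
        out.push_back(mix(pts[k - 1], pts[k], t));
    }
    return out;
}
```

Animating the tread then comes down to moving the control points and re-running the placement, which is also why the result can only be as detailed as the spline allows.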

Tank Treads
Things to improve
• Spline treads are visually superior to skinned meshes, but still have flaws:
• No proper collisions with the environment
• Curves that are too smooth
• A very complex tuning process

Tank Treads
Designing the new treads
• Spring chain simulation
• Procedural animation
• Collisions with the environment

Tank Treads
General solution & collisions
• A spring chain controls the tread shape
• The tread is divided into 4 parts: front, top, back and bottom
• Ray cast the area underneath the tank and form a height field
• Collide each spring joint with it
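A minimal sketch of a spring chain colliding with a height field, assuming position-based Verlet integration in the tread's 2D side plane and an open chain; the talk does not specify the integrator, the closed-loop topology, or how the wheels constrain the chain. The height field is modelled as a function of horizontal position, standing in for the data built from the ray casts. Joint, HeightField and step_chain are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Joint { float x, y, px, py; };  // current and previous position (Verlet)

// Hypothetical height field built from ray casts: ground height at horizontal x.
using HeightField = std::function<float(float)>;

// One step of the chain: integrate gravity, relax the springs between
// neighbouring joints, then push every joint out of the ground.
void step_chain(std::vector<Joint>& chain, float rest_len, const HeightField& ground,
                float dt, float stiffness = 1.0f, int iterations = 8) {
    const float gravity = -9.81f;
    for (Joint& j : chain) {  // Verlet integration: velocity is implicit in (x - px)
        const float vx = j.x - j.px, vy = j.y - j.py;
        j.px = j.x;
        j.py = j.y;
        j.x += vx;
        j.y += vy + gravity * dt * dt;
    }
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i + 1 < chain.size(); ++i) {  // spring constraints
            Joint& a = chain[i];
            Joint& b = chain[i + 1];
            const float dx = b.x - a.x, dy = b.y - a.y;
            const float d = std::sqrt(dx * dx + dy * dy);
            if (d < 1e-6f) continue;
            const float k = 0.5f * stiffness * (d - rest_len) / d;  // split correction
            a.x += dx * k; a.y += dy * k;
            b.x -= dx * k; b.y -= dy * k;
        }
        for (Joint& j : chain) {  // collide each joint with the height field
            const float h = ground(j.x);
            if (j.y < h) j.y = h;
        }
    }
}
```

Running the collision pass inside the relaxation loop lets the springs and the ground agree on a shape within one frame, at the cost of one height lookup per joint per iteration.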

Tank Treads
Performance
Figure: profiler capture (RAD Game Tools Telemetry*)

Tank Treads
Summary & future work
• Summary
• High-quality simulation
• Multi-core support
• Future work
• Tread simulation for exotic tank types
• Single-threaded optimizations

Concurrent Rendering
Agenda
• Engine evolution
• Abstract Rendering Interface (ARI)
• Architecture
• Usage of task and data parallelism

Engine Evolution
• Initial release based on BigWorld engine (2010)
Figure: single-threaded frame on the Direct3D 9 API (Tick; Render environment; Render tanks; Render effects; Present (wait))

Engine Evolution
• World of Tanks launched as a Direct3D 9 game
• Powered by BigWorld Engine
• No layer between engine renderers and the Direct3D API
• Highly tied to the Direct3D Effect Framework
• Never meant to run efficiently on multi-core
• No job system
• Frame tick and render ran synchronously on the main thread
• GPU workload depended on the subsystems' render order

Engine Evolution
• Patch 0.9.15 and Core 3.0 Engine (Aug 2016)
• Abstract Rendering Interface (ARI)
• Unified way to handle Direct3D 9 and Direct3D 11 hardware
• Performance gains of up to 15-30%
• Renderers still run on the main thread
• Separate thread for per-frame ARI-to-Direct3D interactions
• WGFX intermediate compiler
• Independence from the Direct3D Effect Framework
• Faster effect support
Figure: main thread (Tick; Render environment; Render tanks; Render effects) produces an ARI command list; render thread (Resolve queries (wait)) drives the Direct3D 9/11 API

Abstract Rendering Interface (ARI)
• Command list: explicitly describes what the rendering thread should do
• Device interface: our “driver” between the ARI frontend and the graphics API
• Resource: any resource, such as a buffer, texture, query, graphics pipeline state, etc.

ARI: Command List
• Commands
• Simple structures storing the function arguments for the operation
• Each command stores all of the state it relies on
• Concurrency support
• Command lists are written in parallel
• Submitted commands have no interdependencies
• Examples
• Draw (instanced, indexed, etc.)
• Clear (RTV, DSV, UAV)
• Query (begin/end)
Figure: CommandList stores commands as plain data structs; Command: Draw, Dispatch, CopyResource, …
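Commands as plain data can be sketched with std::variant (C++17), assuming three illustrative command types with made-up fields; the real ARI structures are not shown in the talk. Each subsystem records into its own CommandList without locks, and a single-threaded backend later walks the lists in submission order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <thread>  // lists may be recorded on different threads (see usage)
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical ARI-style commands: plain structs carrying every argument they need.
struct Draw         { std::uint32_t vertexCount, instanceCount, pipelineId; };
struct Dispatch     { std::uint32_t x, y, z; };
struct CopyResource { std::uint32_t dst, src; };
using Command = std::variant<Draw, Dispatch, CopyResource>;

// One list per subsystem, so recording needs no synchronization.
class CommandList {
public:
    template <class Cmd> void record(const Cmd& cmd) { commands_.emplace_back(cmd); }
    const std::vector<Command>& commands() const { return commands_; }
private:
    std::vector<Command> commands_;
};

// Single-threaded backend: replays lists in submission order. Here commands are
// translated into text instead of real graphics API calls.
std::vector<std::string> execute(const std::vector<const CommandList*>& lists) {
    std::vector<std::string> log;
    for (const CommandList* list : lists)
        for (const Command& c : list->commands())
            std::visit([&log](const auto& cmd) {
                using T = std::decay_t<decltype(cmd)>;
                if constexpr (std::is_same_v<T, Draw>)
                    log.push_back("Draw " + std::to_string(cmd.vertexCount));
                else if constexpr (std::is_same_v<T, Dispatch>)
                    log.push_back("Dispatch " + std::to_string(cmd.x));
                else
                    log.push_back("Copy " + std::to_string(cmd.src) + "->" + std::to_string(cmd.dst));
            }, c);
    return log;
}
```

For example, a tanks list and a water list can be filled on two threads and then passed to execute({&tanks, &water}); because every command carries its own state, the replay order of the lists is the only ordering that matters.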

ARI: Device Interface
• Free-threaded: e.g. resource creation/removal, adapter state
• Single-threaded: e.g. command list submission and compilation, readbacks, fencing
• Creation thread only: e.g. special operations such as device reset
Figure: Device::Interface creates resources and handles Present, adapter info, etc.; backends: D3D9, D3D11, D3D12
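The per-method threading rules could be expressed on an abstract interface as sketched below. Device, FakeD3D11Device and create_device are hypothetical; the fake backend hands out resource IDs with an atomic counter (free-threaded) and uses a debug assert to catch submit() calls from a second thread (single-threaded). A real backend would wrap D3D9, D3D11 or D3D12 objects instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <thread>

// Hypothetical ARI-style device interface; each backend implements the same API.
class Device {
public:
    virtual ~Device() = default;
    // Free-threaded: may be called from any task at any time.
    virtual std::uint32_t createBuffer(std::uint32_t bytes) = 0;
    // Single-threaded: only the submission thread may call this.
    virtual void submit() = 0;
    virtual std::string name() const = 0;
};

class FakeD3D11Device final : public Device {
public:
    std::uint32_t createBuffer(std::uint32_t /*bytes*/) override {
        return nextId_.fetch_add(1, std::memory_order_relaxed);  // lock-free unique ID
    }
    void submit() override {
        // Debug check of the single-threaded rule: the first caller owns submission.
        std::thread::id expected{};
        const std::thread::id self = std::this_thread::get_id();
        submitThread_.compare_exchange_strong(expected, self);
        assert(submitThread_.load() == self && "submit() called from a second thread");
    }
    std::string name() const override { return "D3D11 (fake)"; }

private:
    std::atomic<std::uint32_t> nextId_{1};
    std::atomic<std::thread::id> submitThread_{};
};

// Backend selection happens once; the rest of the engine only sees Device.
std::unique_ptr<Device> create_device() { return std::make_unique<FakeD3D11Device>(); }
```

Encoding the threading contract per method keeps the frontend free to create resources from any task while all API submission stays on one thread.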

Approach overview
• Frame render graph based on TBB flow graph
• Separate command lists for each rendering subsystem
• TBB primitives for parallelism inside subsystems
• Separate thread for per-frame graphics API calls
Figure: per-core timelines for Core 0 to Core 3 (two views)

Frame render graph
• High-level frame render graph
• Nodes
• Big chunks of work in one of the renderer subsystems
• Intermediate render context flushes
• Edges
• Dependencies between renderer subsystems
• Dependencies inside subsystems
• Indication that a context is ready for flush
Nodes
• Occlusion resolve
• Dynamic models
• Static models
• Transparent models
• Tanks
• Vegetation
• Terrain
• Shadows
• Lighting
• Atmosphere
• Water
• Post-processing
• GUI
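A frame render graph of this kind can be sketched as nodes with dependency edges. The executor below is a simplification, assuming std::async instead of TBB flow graph: it launches every node whose dependencies are finished and then waits for that whole wave, whereas a flow graph starts each node as soon as its own predecessors complete. Node and run_graph are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical frame-graph node: a chunk of renderer work plus the indices of
// the nodes it depends on (the graph's edges).
struct Node {
    std::string name;
    std::vector<std::size_t> deps;
    std::function<void()> work;
};

// Runs the graph in waves: all nodes whose dependencies are finished are
// launched concurrently, then the executor waits for the whole wave.
// Returns the waves (node indices) so the schedule can be inspected.
std::vector<std::vector<std::size_t>> run_graph(const std::vector<Node>& nodes) {
    std::vector<bool> done(nodes.size(), false);
    std::vector<std::vector<std::size_t>> waves;
    std::size_t finished = 0;
    while (finished < nodes.size()) {
        std::vector<std::size_t> ready;
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (done[i]) continue;
            bool ok = true;
            for (std::size_t d : nodes[i].deps) ok = ok && done[d];
            if (ok) ready.push_back(i);
        }
        if (ready.empty()) throw std::runtime_error("cycle in frame graph");
        std::vector<std::future<void>> running;
        for (std::size_t i : ready)
            running.push_back(std::async(std::launch::async, nodes[i].work));
        for (auto& f : running) f.get();  // also rethrows exceptions from nodes
        for (std::size_t i : ready) done[i] = true;
        finished += ready.size();
        waves.push_back(ready);
    }
    return waves;
}
```

For example, with nodes Visibility resolve, Static models, Tanks, Lighting and Post-processing, and made-up edges in which the models and tanks both depend on visibility and lighting depends on both, the executor runs Static models and Tanks in the same wave. This is the functional parallelism that the graph exposes.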

Frame render graph
• High-level frame render graph
Figure: frame render graph (nodes: VT prepare, Shadows, Vegetation, Atmosphere, Particles, Visibility resolve, VFX prepare, Decal culling, Particle shadows, Environment models, Tanks, Intermediate submit and sync, Water culling, Terrain, Lighting, Intermediate submit, Main submit, Post-processing, Transparent models, VT tiles, UI, Water draw, Issue occlusion queries)

Concurrent rendering
Frame render graph
• Submitting GPU workload
• Main per-frame context flush
• Every path ends here
• Uploads all gathered contexts to the GPU submission thread
• Intermediate flush
• Synchronization point for tasks that are order-dependent on the GPU side
• Prevents GPU submission thread starvation
Figure: frame timeline (labels: Tick; Render environment; Render water; Render tanks; Render effects; Render thread submits; Render thread waits; Waits for sync; GPU draws; GPU waits)
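The hand-off between the frame graph and the GPU submission thread can be sketched as an ordered queue, assuming batches of commands represented as strings instead of real API calls. SubmissionThread and its methods are hypothetical; flush() models both intermediate and main flushes, and batches are replayed strictly in the order they were flushed.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the GPU submission (render) thread.
class SubmissionThread {
public:
    SubmissionThread() : worker_([this] { run(); }) {}
    ~SubmissionThread() { if (worker_.joinable()) finish(); }

    // Intermediate or main flush: hand a batch over as soon as it is ready,
    // so the submission thread keeps working instead of starving.
    void flush(std::vector<std::string> batch) { push(std::move(batch), false); }

    // Stops the thread after all queued batches and returns what was submitted.
    std::vector<std::string> finish() {
        push({}, true);
        worker_.join();  // join() makes submitted_ safe to read here
        return submitted_;
    }

private:
    using Item = std::pair<std::vector<std::string>, bool>;  // (batch, stop?)

    void push(std::vector<std::string> batch, bool stop) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(Item(std::move(batch), stop));
        }
        cv_.notify_one();
    }

    void run() {
        for (;;) {
            Item item;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                item = std::move(queue_.front());
                queue_.pop();
            }
            // Stand-in for the real graphics API calls, issued in flush order.
            for (std::string& cmd : item.first) submitted_.push_back(std::move(cmd));
            if (item.second) return;
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Item> queue_;
    std::vector<std::string> submitted_;
    std::thread worker_;  // declared last: starts only after the members above exist
};
```

Flushing early (for example, after shadows and terrain) lets the submission thread start issuing API calls while the graph is still recording tanks, water and post-processing.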

Frame render graph
• Parallel execution off: render frame ~17 ms
• Parallel execution on: speedup ~2x
• Resolve visibility
• Static models
• Tanks
• Lighting
• Water
• Vegetation
• Post-processing
• Shadows
• ...

Frame render graph
• Functional parallelism
• Pros
• Easy to implement
• Easy to read and maintain
• Easy to reason about
• Cons
• Too high level
• Some paths can't be shortened
• Critical execution path
• Data parallelism to the rescue!

Frame render graph
• Parallel execution off
• Data parallelism on: speedup ~3x
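Data parallelism inside a subsystem can be sketched with culling, assuming a toy visibility test against a single plane and bounding spheres (Sphere, visible and cull_parallel are hypothetical). Each worker scans one contiguous chunk into its own output list, so no locks are needed, and the lists are concatenated in chunk order so the result matches the serial loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Sphere { float x, y, z, r; };  // hypothetical bounding sphere of a model

// Hypothetical visibility test: keep objects that reach past the plane z = near_z.
static bool visible(const Sphere& s, float near_z) { return s.z + s.r >= near_z; }

// Data-parallel culling: per-worker output lists, concatenated in chunk order,
// give exactly the same indices (and order) as the serial loop.
std::vector<std::size_t> cull_parallel(const std::vector<Sphere>& objs, float near_z,
                                       unsigned workers) {
    workers = std::max(1u, workers);
    const std::size_t chunk = (objs.size() + workers - 1) / workers;
    std::vector<std::vector<std::size_t>> partial(workers);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(objs.size(), w * chunk);
        const std::size_t end = std::min(objs.size(), begin + chunk);
        pool.emplace_back([&, w, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                if (visible(objs[i], near_z)) partial[w].push_back(i);
        });
    }
    for (std::thread& t : pool) t.join();
    std::vector<std::size_t> out;
    for (const auto& p : partial) out.insert(out.end(), p.begin(), p.end());
    return out;
}
```

Keeping the output order deterministic matters when the visible list feeds command recording, because the GPU workload then does not depend on thread timing.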

Summary & future work
• Summary
• Significant speedup in command gathering
• Code is still simple and easy to modify
• Room for more demanding graphics
• Future work
• Further decomposition
• Parallel algorithms
• Better submission pattern
• Consume spare performance

World of Tanks 1.0+
Summary
• Multi-core CPUs are the future of gaming
• World of Tanks is now finding ways to use them effectively

Acknowledgments
Big thank you to:
• The Render, R&D art, Tools and Engine teams
• … and the entire WoT dev team

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH
PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS,
INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER
INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life-saving, life-sustaining, critical control or safety systems, or in nuclear facility applications.
Intel products may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are
available on request.
Intel may make changes to dates, specifications, product descriptions, and plans referenced in this document at any time, without notice.
This document may contain information on products in the design phase of development. The information herein is subject to change without notice. Do not finalize a design with
this information.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have
no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
Intel Corporation or its subsidiaries in the United States and other countries may have patents or pending patent applications, trademarks, copyrights, or other intellectual property
rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or
otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights.
Wireless connectivity and some features may require you to purchase additional software, services or external hardware.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those
tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the
performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel
Performance Benchmark Limitations
Intel, the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Other names and brands may be claimed as the property of others.
Copyright © 2014 Intel Corporation. All rights reserved.

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY
THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY
OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY
RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with
other products.
Copyright© 2014, Intel Corporation. All rights reserved. Intel, the Intel logo, Atom, Xeon, Xeon Phi, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other
countries.

Optimization Notice
Intel® compilers, associated libraries and associated development tools may include or utilize options that optimize for instruction sets that are available in both Intel® and non-Intel
microprocessors (for example SIMD instruction sets), but do not optimize equally for non-Intel microprocessors. In addition, certain compiler options for Intel compilers, including
some that are not specific to Intel micro-architecture, are reserved for Intel microprocessors. For a detailed description of Intel compiler options, including the instruction sets and
specific microprocessors they implicate, please refer to the “Intel® Compiler User and Reference Guides” under “Compiler Options." Many library routines that are part of Intel®
compiler products are more highly optimized for Intel microprocessors than for other microprocessors. While the compilers and libraries in Intel® compiler products offer
optimizations for both Intel and Intel-compatible microprocessors, depending on the options you select, your code and other factors, you likely will get extra performance on Intel
microprocessors.
Intel® compilers, associated libraries and associated development tools may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental
Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.
While Intel believes our compilers and libraries are excellent choices to assist in obtaining the best performance on Intel® and non-Intel microprocessors, Intel recommends that you
evaluate other compilers and libraries to determine which best meet your requirements. We hope to win your business by striving to offer the best performance of any compiler or
library; please let us know if you find we do not.

Tank Treads
Challenges
• Our physical simulation has to consider the following aspects:
• Each tread consists of 60 to 250 segments
• Hundreds of different tanks and dozens of suspensions to support
• Strict requirements for the tread's stability
• The server doesn't consider the tread when it simulates the tank's movement
• The tread's setup process should be easy for artists

Tank Treads
Sagging
• The tread can also sag if there is empty space beneath it
• On an average PC, we always snap the bottom part of the tread to the wheels
• On a high-end PC:
• Calculate the bottom part snapped to the wheels
• Simulate the bottom part without snapping
• Blend each simulated joint's final position between the snapped and unsnapped positions, based on how far the joint is from an obstacle
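The blend for one joint might look like this sketch, assuming a linear falloff and assuming that joints close to an obstacle follow the wheels (snapped) while joints farther away are free to sag (unsnapped); the slides state neither the falloff nor the direction. blend_joint and the blend_dist tuning parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Hypothetical per-joint blend for high-end PCs. w is the weight of the
// simulated (unsnapped) position: 0 at the obstacle, rising linearly to 1 at
// blend_dist and beyond. Requires C++17 for std::clamp.
Vec2 blend_joint(Vec2 snapped, Vec2 unsnapped, float dist_to_obstacle, float blend_dist) {
    const float w = std::clamp(dist_to_obstacle / blend_dist, 0.0f, 1.0f);
    return {snapped.x + (unsnapped.x - snapped.x) * w,
            snapped.y + (unsnapped.y - snapped.y) * w};
}
```

A smooth weight avoids visible popping when a joint crosses the blend distance, which a hard switch between the two positions would produce.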

Data parallelism
• Parallel execution guidelines
• Measure
• Start with coarse-grained parallel algorithms
• Measure!
• Make global decisions ahead of heavy computations
• Reconsider your data structures
• Measure!!
• Fine-tune algorithm grain sizes
• Bring in vectorization as a last resort
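The grain-size guideline can be explored with a helper like the one below, a sketch assuming a hand-rolled parallel_for over std::thread rather than TBB's tbb::parallel_for (which splits ranges through its own partitioners). Blocks of grain items are dealt round-robin to the workers, and time_ms measures one configuration; both names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical grain-sized parallel_for: [0, n) is cut into blocks of `grain`
// items, dealt round-robin to `workers` threads. Larger grains mean less
// scheduling overhead, smaller grains mean better load balance.
void parallel_for(std::size_t n, std::size_t grain, unsigned workers,
                  const std::function<void(std::size_t, std::size_t)>& body) {
    grain = std::max<std::size_t>(1, grain);
    workers = std::max(1u, workers);
    const std::size_t blocks = (n + grain - 1) / grain;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            for (std::size_t b = w; b < blocks; b += workers)
                body(b * grain, std::min(n, (b + 1) * grain));  // half-open block
        });
    for (std::thread& t : pool) t.join();
}

// "Measure!": wall-clock time of one configuration, in milliseconds.
double time_ms(const std::function<void()>& f) {
    const auto t0 = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration<double, std::milli>(std::chrono::steady_clock::now() - t0)
        .count();
}
```

Timing the same body with several grain sizes on the target hardware is the practical way to pick one; the best value differs between a 2-core laptop and a 6-core desktop.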

Data parallelism
• Data layout guidelines
• Make processed objects as self-sufficient as is reasonable
• Group objects with similar data together and process them in blocks
• Prefer structures of arrays
• Know your data's critical paths
• Don't read from and write to the same buffer
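The structure-of-arrays guideline in a small sketch, assuming a particle-like object with hot fields (position, velocity) and cold fields (size, life); ParticleAoS, ParticlesSoA, to_soa and integrate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array of structures: a position update also pulls size/life into the cache.
struct ParticleAoS { float x, y, z, vx, vy, vz, size, life; };

// Structure of arrays for the hot fields: each field is contiguous, so a loop
// over positions and velocities streams exactly the data it needs, and the
// simple loop bodies are easy for compilers to vectorize.
struct ParticlesSoA {
    std::vector<float> x, y, z, vx, vy, vz;
    std::size_t size() const { return x.size(); }
};

ParticlesSoA to_soa(const std::vector<ParticleAoS>& in) {
    ParticlesSoA out;
    for (const ParticleAoS& p : in) {
        out.x.push_back(p.x);   out.y.push_back(p.y);   out.z.push_back(p.z);
        out.vx.push_back(p.vx); out.vy.push_back(p.vy); out.vz.push_back(p.vz);
    }
    return out;
}

// Processes one field array at a time.
void integrate(ParticlesSoA& p, float dt) {
    const std::size_t n = p.size();
    for (std::size_t i = 0; i < n; ++i) p.x[i] += p.vx[i] * dt;
    for (std::size_t i = 0; i < n; ++i) p.y[i] += p.vy[i] * dt;
    for (std::size_t i = 0; i < n; ++i) p.z[i] += p.vz[i] * dt;
}
```

Cold fields such as size and life can live in their own arrays and be touched only by the passes that need them.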
