The Next Mainstream Programming Language: A Game Developer’s Perspective Tim Sweeney Epic Games
Outline Game Development Typical Process What’s in a game? Game Simulation Numeric Computation Shading Where are today’s languages failing? Concurrency Reliability
Game Development
Game Development: Gears of War Resources ~10 programmers ~20 artists ~24 month development cycle ~$10M budget Software Dependencies 1 middleware game engine ~20 middleware libraries OS graphics APIs, sound, input, etc
Software Dependencies … Gears of War Gameplay Code (~250,000 lines of C++ and script code) is built on Unreal Engine 3, the middleware game engine (~250,000 lines of C++ code), which depends on: DirectX (graphics), OpenAL (audio), Ogg Vorbis (music codec), Speex (speech codec), wxWidgets (window library), ZLib (data compression)
Game Development: Platforms The typical Unreal Engine 3 game will ship on: Xbox 360 PlayStation 3 Windows Some will also ship on: Linux MacOS
What’s in a game? The obvious: Rendering Pixel shading Physics simulation, collision detection Game world simulation Artificial intelligence, path finding But it’s not just fun and games: Data persistence with versioning, streaming Distributed Computing (multiplayer game simulation) Visual content authoring tools Scripting and compiler technology User interfaces
Three Kinds of Code Gameplay Simulation Numeric Computation Shading
Gameplay Simulation
Gameplay Simulation Models the state of the game world as interacting objects evolve over time High-level, object-oriented code Written in C++ or scripting language Imperative programming style Usually garbage-collected
Gameplay Simulation – The Numbers 30-60 updates (frames) per second ~1000 distinct gameplay classes Contain imperative state Contain member functions Highly dynamic ~10,000 active gameplay objects Each time a gameplay object is updated, it typically touches 5-10 other objects
Numeric Computation Algorithms: Scene graph traversal Physics simulation Collision Detection Path Finding Sound Propagation Low-level, high-performance code Written in C++ with SIMD intrinsics Essentially functional Transforms a small input data set to a small output data set, making use of large constant data structures.
Shading
Shading Generates pixel and vertex attributes Written in HLSL/CG shading language Runs on the GPU Inherently data-parallel Control flow is statically known “Embarrassingly Parallel” Current GPUs are 16-wide to 48-wide!
Shading in HLSL
Shading – The Numbers Game runs at 30 FPS @ 1280x720p ~5,000 visible objects ~10M pixels rendered per frame Per-pixel lighting and shadowing requires multiple rendering passes per object and per-light Typical pixel shader is ~100 instructions long Shader FPU’s are 4-wide SIMD ~500 GFLOPS compute power
Three Kinds of Code

                 Game Simulation    Numeric Computation    Shading
Languages        C++, Scripting     C++                    CG, HLSL
CPU Budget       10%                90%                    n/a
Lines of Code    250,000            250,000                10,000
FPU Usage        0.5 GFLOPS         5 GFLOPS               500 GFLOPS
What are the hard problems? Performance When updating 10,000 objects at 60 FPS, everything is performance-sensitive Modularity Very important with ~10-20 middleware libraries per game Reliability Error-prone language / type system leads to wasted effort finding trivial bugs Significantly impacts productivity Concurrency Hardware supports 6-8 threads C++ is ill-equipped for concurrency
Performance
Performance When updating 10,000 objects at 60 FPS, everything is performance-sensitive But: Productivity is just as important Will gladly sacrifice 10% of our performance for 10% higher productivity We never use assembly language There is not a simple set of “hotspots” to optimize! That’s all!
Modularity
Unreal’s game framework

    package UnrealEngine;              // Gameplay module

    class Actor                        // Base class of gameplay objects
    {
        int Health;                    // Members
        void TakeDamage(int Amount)
        {
            Health = Health - Amount;
            if (Health<0)
                Die();
        }
    }

    class Player extends Actor
    {
        string PlayerName;
        socket NetworkConnection;
    }
Game class hierarchy

Generic Game Framework:
    Actor
        Player
        Enemy
        InventoryItem
            Weapon

Game-Specific Framework Extension:
    Actor
        Player
        Enemy
            Dragon
            Troll
        InventoryItem
            Weapon
                Sword
                Crossbow
Software Frameworks The Problem:   Users of a framework   need to extend the functionality   of the framework’s base classes! The workarounds: Modify the source   …and modify it again with each new version Add references to payload classes, and dynamically cast them at runtime to the appropriate types.
Software Frameworks The Problem:   Users of a framework   want to extend the functionality   of the framework’s base classes! The workarounds: Modify the source   …and modify it again with each new version Add references to payload classes, and dynamically cast them at runtime to the appropriate types. These are all error-prone: Can the compiler help us here?
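The second workaround (payload classes plus runtime casts) can be sketched in C++ roughly as follows. The names Payload, Extra, GearsActorData and GetGearsData are hypothetical and are not taken from Unreal Engine; the point is only to show where the compiler stops helping.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a framework-provided extension point.
struct Payload {
    virtual ~Payload() = default;
};

// Framework base class: it carries an untyped slot for game-specific data.
struct Actor {
    int Health = 100;
    std::unique_ptr<Payload> Extra;  // may be empty, may hold any Payload subtype
};

// Game-specific data that the framework knows nothing about.
struct GearsActorData : Payload {
    std::string SquadName;
};

// Returns the game-specific data, or nullptr if the slot is empty or holds a
// different payload type. Both failure cases are only discovered at runtime.
inline GearsActorData* GetGearsData(Actor& a) {
    return dynamic_cast<GearsActorData*>(a.Extra.get());
}
```

Every caller of GetGearsData has to remember the null check, which is the error-prone part the slide complains about.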
What we would like to write…
The basic goal: To extend an entire software framework’s class hierarchy in parallel, in an open-world system.

Base Framework:

    package Engine;
    class Actor { int Health; … }
    class Player extends Actor { … }
    class Inventory extends Actor { … }

Extended Framework:

    package GearsOfWar extends Engine;
    class Actor extends Engine.Actor
    {
        // Here we can add new members
        // to the base class.
        …
    }
    class Player extends Engine.Player
    {
        // Thus virtually inherits from
        // GearsOfWar.Actor
        …
    }
    class Gun extends GearsOfWar.Inventory { … }
Reliability Or: If the compiler doesn’t beep, my program should work
Dynamic Failure in Mainstream Languages
Example (C#): Given a vertex array and an index array, we read and transform the indexed vertices into a new array.

    Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m)
    {
        Vertex[] Result = new Vertex[Indices.Length];
        for(int i=0; i<Indices.Length; i++)
            Result[i] = Transform(m,Vertices[Indices[i]]);
        return Result;
    }

What can possibly go wrong?
Dynamic Failure in Mainstream Languages

    Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m)
    {
        Vertex[] Result = new Vertex[Indices.Length];
        for(int i=0; i<Indices.Length; i++)
            Result[i] = Transform(m,Vertices[Indices[i]]);
        return Result;
    }

Annotations on the code:
    Vertices: May be NULL
    Indices: May be NULL. May contain indices outside of the range of the Vertex array
    m: May be NULL
    Array access might be out of bounds
    Will the compiler realize this can’t fail?
    Could dereference a null pointer

Our code is littered with runtime failure cases, yet the compiler remains silent!
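For comparison, here is a minimal C++17 sketch of the same routine with the failure cases made explicit. It is an illustration under assumptions, not code from Unreal: TransformVertex and TransformIndexed are hypothetical names and the 3x3 Matrix layout is invented for the example. References rule out the NULL cases, an unsigned index type rules out negative indices, and the remaining out-of-range case still needs a runtime check.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

struct Vertex { float x, y, z; };
struct Matrix { float m[3][3]; };

// Hypothetical per-vertex transform: a plain 3x3 matrix-vector product.
inline Vertex TransformVertex(const Matrix& m, const Vertex& v) {
    return Vertex{
        m.m[0][0] * v.x + m.m[0][1] * v.y + m.m[0][2] * v.z,
        m.m[1][0] * v.x + m.m[1][1] * v.y + m.m[1][2] * v.z,
        m.m[2][0] * v.x + m.m[2][1] * v.y + m.m[2][2] * v.z};
}

// Returns std::nullopt if any index is outside the vertex array.
inline std::optional<std::vector<Vertex>> TransformIndexed(
    const std::vector<Vertex>& vertices,       // a reference cannot be null
    const std::vector<std::size_t>& indices,   // unsigned: no negative indices
    const Matrix& m) {
    std::vector<Vertex> result;
    result.reserve(indices.size());
    for (std::size_t i : indices) {
        if (i >= vertices.size()) return std::nullopt;  // a runtime check, not a proof
        result.push_back(TransformVertex(m, vertices[i]));
    }
    return result;
}
```

The check inside the loop is exactly the work a dependently typed index buffer would move to compile time.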
Dynamic Failure in Mainstream Languages Solved problems: Random memory overwrites Memory leaks Solvable: Accessing arrays out-of-bounds Dereferencing null pointers Integer overflow Accessing uninitialized variables 50% of the bugs in Unreal can be traced to these problems!
What we would like to write…

    Transform{n:nat}(Vertices:[n]Vertex, Indices:[]nat<n, m:Matrix):[]Vertex=
        for each(i in Indices)
            Transform(m,Vertices[i])

    {n:nat}      Universally quantify over all natural numbers
    [n]Vertex    An array of exactly known size
    []nat<n      An index buffer containing natural numbers less than n
    for each     Haskell-style array comprehension

The only possible failure mode: divergence, if the call to Transform diverges.
How might this work?

Dependent types:
    int      The Integers
    nat      The Natural Numbers
    nat<n    The Natural Numbers less than n, where n may be a variable!

Dependent functions (explicit type/value dependency between function parameters):
    Sum(n:nat,xs:[n]int)=..
    a=Sum(3,[7,8,9])

Universal quantification:
    Sum{n:nat}(xs:[n]int)=..
    a=Sum([7,8,9])
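C++ templates give a limited approximation of these ideas, sketched below. The important restriction, and the reason this is only an analogy, is that N must be a compile-time constant here, whereas the slide's n may be a runtime variable. BoundedIndex, Make and At are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstddef>
#include <optional>

// Plays the role of nat<n: a value of this type is always less than N.
// The only way to obtain one is Make, which checks the range once.
template <std::size_t N>
class BoundedIndex {
public:
    static std::optional<BoundedIndex> Make(std::size_t i) {
        if (i < N) return BoundedIndex(i);
        return std::nullopt;
    }
    std::size_t value() const { return value_; }

private:
    explicit BoundedIndex(std::size_t i) : value_(i) {}
    std::size_t value_;
};

// N is deduced from the argument, loosely like Sum{n:nat}(xs:[n]int).
template <std::size_t N>
int Sum(const std::array<int, N>& xs) {
    int total = 0;
    for (int x : xs) total += x;
    return total;
}

// No bounds check is needed: the index type carries the guarantee.
template <std::size_t N>
int At(const std::array<int, N>& xs, BoundedIndex<N> i) {
    return xs[i.value()];
}
```

Passing a BoundedIndex<4> to At with a three-element array is rejected at compile time, because N cannot be deduced consistently from both arguments.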
How might this work?

Separating the “pointer to t” concept from the “optional value of t” concept:
    xp:^int      A pointer to an integer
    xo:?int      An optional integer
    xpo:?^int    An optional pointer to an integer!

Comprehensions (a la Haskell), for safely traversing and generating collections:
    Successors(xs:[]int):[]int=
        foreach(x in xs)
            x+1
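A rough C++17 rendering of the same distinctions, offered as a sketch rather than an exact equivalent: a reference stands in for the never-null pointer ^int, std::optional<int> for ?int, and a raw pointer (where nullptr means "absent") for ?^int. Increment and ValueOr are hypothetical helpers; Successors follows the slide, with std::transform standing in for the comprehension.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// xp:^int  -> int&: always refers to an int, so there is no null case.
inline void Increment(int& xp) { ++xp; }

// xo:?int  -> std::optional<int>: absence is a separate, explicit state.
inline int ValueOr(const std::optional<int>& xo, int fallback) {
    return xo.has_value() ? *xo : fallback;
}

// xpo:?^int -> int*: nullptr plays the role of "no pointer".
inline int DerefOr(const int* xpo, int fallback) {
    return xpo != nullptr ? *xpo : fallback;
}

// Successors(xs) = foreach(x in xs) x+1, with no index arithmetic to get wrong.
inline std::vector<int> Successors(const std::vector<int>& xs) {
    std::vector<int> out(xs.size());
    std::transform(xs.begin(), xs.end(), out.begin(),
                   [](int x) { return x + 1; });
    return out;
}
```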
How might this work?
A guarded casting mechanism for cases where we need a safe “escape”:

    GetElement(as:[]string, i:int):string=
        if(n:nat<as.length=i)
            as[n]
        else
            "Index Out of Bounds"

Here, we cast i to the type of natural numbers bounded by the length of as, and bind the result to n.
We can only access n within this context.
If the cast fails, we execute the else-branch.
All potential failure must be explicitly handled, but we lose no expressiveness.
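The guarded cast can be imitated in C++17 with std::optional and a condition declaration. This is a sketch under assumptions: CastToIndex is a hypothetical helper, and unlike the proposed language the compiler does not stop anyone from indexing with the raw i instead.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical guarded cast: yields an index only if 0 <= i < length.
inline std::optional<std::size_t> CastToIndex(long long i, std::size_t length) {
    if (i >= 0 && static_cast<unsigned long long>(i) < length)
        return static_cast<std::size_t>(i);
    return std::nullopt;
}

inline std::string GetElement(const std::vector<std::string>& as, long long i) {
    // n is scoped to this if statement and holds a value only in the first branch.
    if (auto n = CastToIndex(i, as.size())) {
        return as[*n];
    } else {
        return "Index Out of Bounds";
    }
}
```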
Analysis of the Unreal code
Usage of integer variables in Unreal: 90% of integer variables in Unreal exist to index into arrays. 80% could be dependently typed explicitly, guaranteeing safe array access without casting. 10% would require casts upon array access. The other 10% are used for: computing summary statistics, encoding bit flags, various forms of low-level hackery.
“For” loops in Unreal: 40% are functional comprehensions, 50% are functional folds.
Accessing uninitialized variables
Can we make this work?

    class MyClass {
        const int a=c+1;
        const int b=7;
        const int c=b+1;
    }
    MyClass myvalue = new MyClass; // What is myvalue.a?

This is a frequent bug. Data structures are often rearranged, changing the initialization order.
Lessons from Haskell: Lazy evaluation enables correct out-of-order evaluation. Accessing circularly entailed values causes thunk reentry (divergence), rather than just returning the wrong value.
Lesson from Id90: Lenient evaluation is sufficient to guarantee this.
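In C++ the members of such a class are initialized in declaration order, so a would be computed from a not-yet-initialized c. The demand-driven behaviour the slide asks for can be emulated by hand with memoized thunks, as in the sketch below. LazyInt is a hypothetical helper; it reports a circular definition by throwing rather than by diverging, and it is not thread-safe.

```cpp
#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>

// A memoized, demand-driven integer: computed on first use, then cached.
class LazyInt {
public:
    explicit LazyInt(std::function<int()> compute) : compute_(std::move(compute)) {}

    int get() {
        if (value_) return *value_;
        if (evaluating_) throw std::logic_error("circular definition");
        evaluating_ = true;   // re-entering get() now means a dependency cycle
        value_ = compute_();
        evaluating_ = false;
        return *value_;
    }

private:
    std::function<int()> compute_;
    std::optional<int> value_;
    bool evaluating_ = false;
};

// Mirrors the slide's MyClass: a is declared before the values it depends on.
struct MyClass {
    MyClass() = default;
    MyClass(const MyClass&) = delete;  // the thunks capture this, so forbid copies

    LazyInt a{[this] { return c.get() + 1; }};
    LazyInt b{[] { return 7; }};
    LazyInt c{[this] { return b.get() + 1; }};
};
```

With this emulation the declaration order no longer matters: asking for a forces c, which forces b.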
Dynamic Failure: Conclusion Reasonable type-system extensions could statically eliminate all: Out-of-bounds array access Null pointer dereference Integer overflow Accessing of uninitialized variables See Haskell for excellent implementation of: Comprehensions  Option types via Maybe Non-NULL references via IORef, STRef Out-of-order initialization
Integer overflow The Natural Numbers  Factoid: C# exposes more than 10 integer-like data types, none of which are those defined by (Pythagoras, 500BC). In the future, can we get integers right? data Nat = Zero | Succ Nat
Can we get integers right? Neat Trick: In a machine word (size 2^n), encode an integer ±2^(n-1) or a pointer to a variable-precision integer. Thus “small” integers carry no storage cost. Additional access cost is ~5 CPU instructions. But: A natural number bounded so as to index into an active array is guaranteed to fit within the machine word size (the array is the proof of this!) and thus requires no special encoding. Since ~80% of integers can be dependently typed to access into an array, the amortized cost is ~1 CPU instruction per integer operation. This could be a viable tradeoff.
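One common way to realize this trick is a tagged machine word, sketched below. The details are assumptions made for illustration: the low bit is used as the tag, which costs one bit of small-integer range, BigInt is a hypothetical placeholder for the variable-precision representation, and arithmetic is omitted. The decoding relies on arithmetic right shift of negative values, which mainstream compilers provide and C++20 guarantees.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical heap representation of a large integer (arithmetic omitted).
struct BigInt { std::vector<std::uint32_t> limbs; };

// One machine word. Low bit 1: a small integer stored in the upper bits.
// Low bit 0: a pointer to a BigInt (such pointers are at least 2-byte aligned).
class Integer {
public:
    static bool FitsSmall(std::intptr_t v) {
        return v >= INTPTR_MIN / 2 && v <= INTPTR_MAX / 2;
    }
    // Precondition: FitsSmall(v).
    static Integer FromSmall(std::intptr_t v) {
        // Shift in unsigned arithmetic to avoid overflow on negative values.
        return Integer((static_cast<std::uintptr_t>(v) << 1) | 1u);
    }
    static Integer FromBig(const BigInt* p) {
        return Integer(reinterpret_cast<std::uintptr_t>(p));
    }
    bool IsSmall() const { return (word_ & 1u) != 0; }
    // Precondition: IsSmall(). The shift drops the tag and restores the sign.
    std::intptr_t Small() const {
        return static_cast<std::intptr_t>(word_) >> 1;
    }
    // Precondition: !IsSmall().
    const BigInt* Big() const { return reinterpret_cast<const BigInt*>(word_); }

private:
    explicit Integer(std::uintptr_t w) : word_(w) {}
    std::uintptr_t word_;
};
```

The tag test in IsSmall is the per-operation overhead the slide estimates; integers already proven to be array indices would skip the encoding entirely.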
Concurrency
The C++/Java/C# Model: “Shared State Concurrency” The Idea: Any thread can modify any state at any time. All synchronization is explicit, manual. No compile-time verification of correctness properties: Deadlock-free Race-free
The C++/Java/C# Model: “Shared State Concurrency” This is hard! How we cope in Unreal Engine 3: 1 main thread responsible for doing all work we can’t hope to safely multithread; 1 heavyweight rendering thread; a pool of 4-6 helper threads, dynamically allocated to simple tasks. “Program Very Carefully!” Huge productivity burden. Scales poorly as thread counts grow. There must be a better way!
Three Kinds of Code: Revisited Gameplay Simulation Gratuitous use of mutable state 10,000’s of objects must be updated Typical object update touches 5-10 other objects Numeric Computation Computations are purely functional But they use state locally during computations Shading Already implicitly data parallel
Concurrency in Shading Look at the solution of CG/HLSL: New programming language aimed at “Embarrassingly Parallel” shader programming Its constructs map naturally to a data-parallel implementation Static control flow (conditionals supported via masking)
Concurrency in Shading Conclusion: The problem of data-parallel concurrency is effectively solved(!) “Proof”: Xbox 360 games are running with 48-wide data shader programs utilizing half a Teraflop of compute power...
Concurrency in Numeric Computation These are essentially pure functional algorithms, but they operate locally on mutable state Haskell ST, STRef solution enables encapsulating local heaps and mutability within referentially-transparent code These are the building blocks for implicitly parallel programs Estimate ~80% of CPU effort in Unreal can be parallelized this way In the future, we will write these algorithms using referentially-transparent constructs.
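The shape of such an algorithm, pure from the outside but mutating a private buffer on the inside, can be illustrated in C++ with a prime sieve, a classic example for Haskell's ST. This is only a sketch of the pattern: PrimesUpTo is a hypothetical name, and C++ relies on convention here, since nothing in the language verifies that the mutable state stays local.

```cpp
#include <cstddef>
#include <vector>

// Pure from the caller's point of view: the result depends only on limit.
// Internally it mutates a scratch buffer that never escapes the function.
inline std::vector<int> PrimesUpTo(int limit) {
    if (limit < 2) return {};
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    std::vector<int> primes;
    for (int n = 2; n <= limit; ++n) {
        if (composite[static_cast<std::size_t>(n)]) continue;
        primes.push_back(n);
        // Mark multiples of n; 64-bit arithmetic avoids overflow in n * n.
        for (long long m = static_cast<long long>(n) * n; m <= limit; m += n)
            composite[static_cast<std::size_t>(m)] = true;
    }
    return primes;
}
```

Because no caller can observe the scratch buffer, two calls can run in parallel without synchronization.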
Numeric Computation Example: Collision Detection
A typical collision detection algorithm takes a line segment and determines when and where a point moving along that line will collide with a (constant) geometric dataset.

C++:
    struct vec3 { float x,y,z; };
    struct hit { bool DidCollide; float Time; vec3 Location; };
    hit collide(vec3 start,vec3 end);

Haskell:
    data Vec3 = Vec3 Float Float Float
    data Hit = Hit Float Vec3
    collide :: (Vec3,Vec3) -> Maybe Hit
Numeric Computation Example: Collision Detection
Since collide is effects-free, it may be executed in parallel with any other effects-free computations.
Basic idea: The programmer supplies effect annotations to the compiler. The compiler verifies the annotations. Many viable implementations (Haskell’s monadic effects, effect typing, etc.)

    collide(start:Vec3,end:Vec3):?Hit        A pure function (the default)
    print(s:string)[#imperative]:void        Effectful functions require explicit annotations

In a concurrent world, imperative is the wrong default!
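C++ has no effect system, but constexpr offers a loose analogue of a compiler-verified purity annotation: a call evaluated in a constant expression cannot perform I/O or modify global state, and the compiler rejects it if it tries. The sketch below applies this to a toy collide. The geometry (a single plane at z = 0 standing in for the constant dataset) is an assumption made for illustration, and the result uses std::optional in place of the slide's ?Hit.

```cpp
#include <optional>

struct Vec3 { float x, y, z; };
struct Hit  { float Time; Vec3 Location; };

// Hypothetical dataset: the plane z = 0. Returns when and where a point
// moving from start to end crosses it, or nullopt if it never does.
constexpr std::optional<Hit> collide(Vec3 start, Vec3 end) {
    const float dz = end.z - start.z;
    if (dz == 0.0f) return std::nullopt;            // moving parallel to the plane
    const float t = -start.z / dz;                  // fraction of the segment travelled
    if (t < 0.0f || t > 1.0f) return std::nullopt;  // plane not reached on this segment
    return Hit{t, Vec3{start.x + t * (end.x - start.x),
                       start.y + t * (end.y - start.y),
                       0.0f}};
}

// These calls run inside the compiler, which would be impossible if collide
// printed, allocated global state, or wrote to anything outside itself.
static_assert(collide(Vec3{0.0f, 0.0f, 1.0f}, Vec3{0.0f, 0.0f, -1.0f}).has_value());
static_assert(!collide(Vec3{0.0f, 0.0f, 1.0f}, Vec3{0.0f, 0.0f, 2.0f}).has_value());
```

Unlike the proposed language, C++ makes the effectful form the default and the checked form opt-in, which is the opposite of what the slide argues for.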
Concurrency in Gameplay Simulation This is the hardest problem… 10,000’s of objects Each one contains mutable state Each one updated 30 times per second Each update touches 5-10 other objects Manual synchronization (shared state concurrency) is hopelessly intractable here. Solutions? Rewrite as referentially-transparent functions? Message-passing concurrency? Continue using the sequential, single-threaded approach?
Concurrency in Gameplay Simulation: Software Transactional Memory See “Composable memory transactions”; Harris, Marlow, Peyton Jones, Herlihy The idea: Update all objects concurrently in arbitrary order, with each update wrapped in an atomic {...} block With 10,000’s of updates, and 5-10 objects touched per update, collisions will be low ~2-4X STM performance overhead is acceptable: if it enables our state-intensive code to scale to many threads, it’s still a win Claim: Transactions are the only plausible solution to concurrent mutable state
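Standard C++ has no atomic {...} block, and the sketch below is not an STM. It shows only the optimistic pattern that STM generalizes (read a snapshot, compute without locks, commit if nothing changed, otherwise retry), restricted to a single word of state via compare-and-swap. AtomicUpdate and TakeDamage are hypothetical names.

```cpp
#include <atomic>

// Applies update to cell as one atomic step: read a snapshot, compute the new
// value without holding a lock, and commit only if the cell is unchanged.
template <typename F>
int AtomicUpdate(std::atomic<int>& cell, F update) {
    int seen = cell.load();
    int wanted = update(seen);
    while (!cell.compare_exchange_weak(seen, wanted)) {
        // Commit failed: seen now holds the current value, so recompute and retry.
        wanted = update(seen);
    }
    return wanted;
}

// Hypothetical gameplay use: subtract damage, clamping health at zero.
inline int TakeDamage(std::atomic<int>& health, int amount) {
    return AtomicUpdate(health, [amount](int h) {
        return h > amount ? h - amount : 0;
    });
}
```

A real STM extends this retry loop to transactions that read and write many objects at once, which is what a gameplay update touching 5-10 objects would need.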
Three Kinds of Code: Revisited

                 Game Simulation                  Numeric Computation            Shading
Languages        C++, Scripting                   C++                            CG, HLSL
CPU Budget       10%                              90%                            n/a
Lines of Code    250,000                          250,000                        10,000
FPU Usage        0.5 GFLOPS                       5 GFLOPS                       500 GFLOPS
Parallelism      Software Transactional Memory    Implicit Thread Parallelism    Implicit Data Parallelism
Parallelism and purity
    Game World State: Software Transactional Memory
    Purely functional core: Physics, collision detection, scene traversal, path finding, ..
    Data Parallel Subset: Graphics shader programs
Musings On the Next Mainstream Programming Language
Musings There is a wonderful correspondence between: Features that aid reliability Features that enable concurrency. Example: Outlawing runtime exceptions through dependent types Out of bounds array access Null pointer dereference Integer overflow Exceptions impose sequencing constraints on concurrent execution. Dependent types and concurrency must evolve simultaneously
Language Implications Evaluation Strategy Lenient evaluation is the right default. Support lazy evaluation through explicit suspend/evaluate constructs.  Eager evaluation is an optimization the compiler may perform when it is safe to do so.
Language Implications Effects Model Purely Functional is the right default Imperative constructs are vital features that must be exposed through explicit effects-typing constructs Exceptions are an effect Why not go one step further and define partiality as an effect, thus creating a foundational language subset suitable for proofs?
Performance – Language Implications Memory model Garbage collection should be the only option Exception Model The Java/C# “exceptions everywhere” model should be wholly abandoned All dereference and array accesses must be statically verifiable, rather than causing sequenced exceptions No language construct except “throw” should generate an exception
Syntax
Requirement: Must not scare away mainstream programmers. Lots of options.

C family (least scary, but it’s a messy legacy):
    int f{nat n}(int[] as,natrange<n> i) { return as[i]; }

Haskell family (quite scary :-)
    f :: forall n::nat. ([int],nat<n) -> int
    f (xs,i) = xs !! i

Pascal/ML family (seems promising):
    f{n:nat}(as:[]int,i:nat<n)=as[i]
Conclusion
A Brief History of Game Technology 1972 Pong (hardware) 1980 Zork (high level interpreted language) 1993 DOOM (C) 1998 Unreal (C++, Java-style scripting) 2005-6 Xbox 360, PlayStation 3 with 6-8 hardware threads 2009 Next console generation. Unification of the CPU, GPU. Massive multi-core, data parallelism, etc.
The Coming Crisis in Computing By 2009, game developers will face… CPU’s with: 20+ cores 80+ hardware threads >1 TFLOP of computing power GPU’s with general computing capabilities. Game developers will be at the forefront.  If we are to program these devices productively, you are our only hope!
Questions?
Backup Slides
The Genius of Haskell Algebraic Datatypes Unions done right Compare to: C unions, Java union-like class hierarchies Maybe t C/Java option types are coupled to pointer/reference types IO, ST With STRef, you can write a pure function that uses heaps and mutable state locally, verifiably guaranteeing that those effects remain local.
The Genius of Haskell
Comprehensions

Sorting in Haskell:

    sort []     = []
    sort (x:xs) = sort [y | y<-xs, y<x ] ++
                  [x] ++
                  sort [y | y<-xs, y>=x]

Sorting in C:

    int partition(int y[], int f, int l);

    void quicksort(int x[], int first, int last) {
        int pivIndex = 0;
        if(first < last) {
            pivIndex = partition(x,first, last);
            quicksort(x,first,(pivIndex-1));
            quicksort(x,(pivIndex+1),last);
        }
    }

    int partition(int y[], int f, int l) {
        int up,down,temp;
        int cc;
        int piv = y[f];
        up = f;
        down = l;
        do {
            while (y[up] <= piv && up < l) {
                up++;
            }
            while (y[down] > piv) {
                down--;
            }
            if (up < down) {
                temp = y[up];
                y[up] = y[down];
                y[down] = temp;
            }
        } while (down > up);
        temp = piv;
        y[f] = y[down];
        y[down] = piv;
        return down;
    }
Why Haskell is Not My Favorite Programming Language The syntax is … scary Lazy evaluation is a costly default But eager evaluation is too limiting Lenient evaluation would be an interesting default Lists are the syntactically preferred sequence type In the absence of lazy evaluation, arrays seem preferable
Why Haskell is Not My Favorite Programming Language
Type inference doesn’t scale:
    To large hierarchies of open-world modules
    To type system extensions
    To system-wide error propagation

With inferred types:
    f(x,y) = x+y
    a=f(3,"4")

    ERROR - Cannot infer instance
    *** Instance   : Num [Char]
    *** Expression : f (3,"4")

With declared types:
    f(int x,int y) = x+y
    a=f(3,"4")

    Parameter mismatch: parameter 2 of call to f:
      Expected: int
      Got: "4"

… … ???


The Next Mainstream Programming Language: A Game Developer’s Perspective

  • 1. The Next Mainstream Programming Language: A Game Developer’s Perspective Tim Sweeney Epic Games
  • 2. Outline Game Development Typical Process What’s in a game? Game Simulation Numeric Computation Shading Where are today’s languages failing? Concurrency Reliability
  • 4. Game Development: Gears of War Resources ~10 programmers ~20 artists ~24 month development cycle ~$10M budget Software Dependencies 1 middleware game engine ~20 middleware libraries OS graphics APIs, sound, input, etc
  • 5. Software Dependencies … Gears of War Gameplay Code ~250,000 lines C++, script code Unreal Engine 3 Middleware Game Engine ~250,000 lines C++ code DirectX Graphics OpenAL Audio Ogg Vorbis Music Codec Speex Speech Codec wx Widgets Window Library ZLib Data Compr- ession
  • 6. Game Development: Platforms The typical Unreal Engine 3 game will ship on: Xbox 360 PlayStation 3 Windows Some will also ship on: Linux MacOS
  • 7. What’s in a game? The obvious: Rendering Pixel shading Physics simulation, collision detection Game world simulation Artificial intelligence, path finding But it’s not just fun and games: Data persistence with versioning, streaming Distributed Computing (multiplayer game simulation) Visual content authoring tools Scripting and compiler technology User interfaces
  • 8. Three Kinds of Code Gameplay Simulation Numeric Computation Shading
  • 10. Gameplay Simulation Models the state of the game world as interacting objects evolve over time High-level, object-oriented code Written in C++ or scripting language Imperative programming style Usually garbage-collected
  • 11. Gameplay Simulation – The Numbers 30-60 updates (frames) per second ~1000 distinct gameplay classes Contain imperative state Contain member functions Highly dynamic ~10,000 active gameplay objects Each time a gameplay object is updated, it typically touches 5-10 other objects
  • 12. Numeric Computation Algorithms: Scene graph traversal Physics simulation Collision Detection Path Finding Sound Propagation Low-level, high-performance code Written in C++ with SIMD intrinsics Essentially functional Transforms a small input data set to a small output data set, making use of large constant data structures.
  • 14. Shading Generates pixel and vertex attributes Written in HLSL/CG shading language Runs on the GPU Inherently data-parallel Control flow is statically known “ Embarassingly Parallel” Current GPU’s are 16-wide to 48-wide!
  • 16. Shading – The Numbers Game runs at 30 FPS @ 1280x720p ~5,000 visible objects ~10M pixels rendered per frame Per-pixel lighting and shadowing requires multiple rendering passes per object and per-light Typical pixel shader is ~100 instructions long Shader FPU’s are 4-wide SIMD ~500 GFLOPS compute power
  • 17. Three Kinds of Code FPU Usage Lines of Code CPU Budget Languages 500 GFLOPS 5 GFLOPS 0.5 GFLOPS 10,000 250,000 250,000 n/a 90% 10% CG, HLSL C++ C++, Scripting Shading Numeric Computation Game Simulation
  • 18. What are the hard problems? Performance When updating 10,000 objects at 60 FPS, everything is performance-sensitive Modularity Very important with ~10-20 middleware libraries per game Reliability Error-prone language / type system leads to wasted effort finding trivial bugs Significantly impacts productivity Concurrency Hardware supports 6-8 threads C++ is ill-equipped for concurrency
  • 20. Performance When updating 10,000 objects at 60 FPS, everything is performance-sensitive But: Productivity is just as important Will gladly sacrifice 10% of our performance for 10% higher productivity We never use assembly language There is not a simple set of “hotspots” to optimize! That’s all!
  • 22. Unreal’s game framework package UnrealEngine; class Actor { int Health; void TakeDamage(int Amount) { Health = Health – Amount; if (Health<0) Die(); } } class Player extends Actor { string PlayerName; socket NetworkConnection; } Gameplay module Base class of gameplay objects Members
  • 23. Game class hierarchy Actor Player Enemy InventoryItem Weapon Actor Player Enemy Dragon Troll InventoryItem Weapon Sword Crossbow Generic Game Framework Game-Specific Framework Extension
  • 24. Software Frameworks The Problem: Users of a framework need to extend the functionality of the framework’s base classes! The workarounds: Modify the source …and modify it again with each new version Add references to payload classes, and dynamically cast them at runtime to the appropriate types.
  • 25. Software Frameworks The Problem: Users of a framework want to extend the functionality of the framework’s base classes! The workarounds: Modify the source …and modify it again with each new version Add references to payload classes, and dynamically cast them at runtime to the appropriate types. These are all error-prone: Can the compiler help us here?
  • 26. What we would like to write… The basic goal: To extend an entire software framework’s class hierarchy in parallel, in an open-world system . package Engine; class Actor { int Health; … } class Player extends Actor { … } class Inventory extends Actor { … } Base Framework Package GearsOfWar extends Engine; class Actor extends Engine.Actor { // Here we can add new members // to the base class. … } class Player extends Engine.Player { // Thus virtually inherits from // GearsOfWar.Actor … } class Gun extends GearsOfWar.Inventory { … } Extended Framework
  • 27. Reliability Or: If the compiler doesn’t beep, my program should work
  • 28. Dynamic Failure in Mainstream Languages Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m) { Vertex[] Result = new Vertex[Indices.length]; for(int i=0; i<Indices.length; i++) Result[i] = Transform(m,Vertices[Indices[i]]); return Result; }; Example (C#): Given a vertex array and an index array, we read and transform the indexed vertices into a new array. What can possibly go wrong?
  • 29. Dynamic Failure in Mainstream Languages Vertex[] Transform (Vertex[] Vertices, int[] Indices, Matrix m) { Vertex[] Result = new Vertex[Indices.length]; for(int i=0; i<Indices.length; i++) Result[i] = Transform(m,Vertices[Indices[i]]); return Result; }; May be NULL May be NULL May contain indices outside of the range of the Vertex array May be NULL Array access might be out of bounds Will the compiler realize this can’t fail? Could dereference a null pointer Our code is littered with runtime failure cases, Yet the compiler remains silent!
  • 30. Dynamic Failure in Mainstream Languages Solved problems: Random memory overwrites Memory leaks Solveable: Accessing arrays out-of-bounds Dereferencing null pointers Integer overflow Accessing uninitialized variables 50% of the bugs in Unreal can be traced to these problems!
  • 31. What we would like to write… Transform{n:nat}(Vertices:[n]Vertex, Indices:[]nat<n, m:Matrix):[]Vertex= for each(i in Indices) Transform(m,Vertices[i]) Universally quantify over all natural numbers An array of exactly known size An index buffer containing natural numbers less than n Haskell-style array comprehension The only possible failure mode: divergence, if the call to Transform diverges.
  • 32. How might this work? Dependent types: int (The Integers), nat (The Natural Numbers), nat<n (The Natural Numbers less than n, where n may be a variable!) Dependent functions: Sum(n:nat,xs:[n]int)=.. a=Sum(3,[7,8,9]) (Explicit type/value dependency between function parameters) Universal quantification: Sum{n:nat}(xs:[n]int)=.. a=Sum([7,8,9])
  • 33. How might this work? Separating the “pointer to t” concept from the “optional value of t” concept: xp:^int (A pointer to an integer), xo:?int (An optional integer), xpo:?^int (An optional pointer to an integer!) Comprehensions (a la Haskell), for safely traversing and generating collections: Successors(xs:[]int):[]int= foreach(x in xs) x+1
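C++17 offers a rough analogue of the three declarations on this slide. The mapping below is an assumption made for illustration (the function names are invented), and unlike the language the slide imagines, C++ does not force the caller to test an optional or a pointer before using it.

```cpp
#include <optional>

// xp:^int   -> a reference: always refers to an int, no null state.
int read(const int& xp) { return xp; }

// xo:?int   -> an optional value: may be absent, but no pointer is involved.
int read_or(const std::optional<int>& xo, int fallback) {
    return xo.has_value() ? *xo : fallback;
}

// xpo:?^int -> a nullable pointer: the only one of the three that can be null.
int read_ptr_or(const int* xpo, int fallback) {
    return xpo != nullptr ? *xpo : fallback;
}
```

Keeping the three signatures distinct makes it visible at each call site which arguments can be missing, which is the separation the slide asks for.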
  • 34. How might this work? A guarded casting mechanism for cases where we need a safe “escape”: GetElement(as:[]string, i:int):string= if(n:nat<as.length=i) as[n] else “Index Out of Bounds” Here, we cast i to the type of natural numbers bounded by the length of as, and bind the result to n. We can only access n within this context. If the cast fails, we execute the else-branch. All potential failure must be explicitly handled, but we lose no expressiveness.
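The guarded cast can be imitated in C++17 with `std::optional` and an `if` statement that binds the result. This is a minimal sketch, not the mechanism the slide proposes: `to_index` and `get_element` are hypothetical names, and the check runs at run time.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical "guarded cast": converts i to an index valid for xs,
// or returns std::nullopt if i is negative or past the end.
template <typename T>
std::optional<std::size_t> to_index(const std::vector<T>& xs, int i) {
    if (i < 0 || static_cast<std::size_t>(i) >= xs.size()) return std::nullopt;
    return static_cast<std::size_t>(i);
}

// Mirrors GetElement from the slide: both outcomes are handled explicitly.
std::string get_element(const std::vector<std::string>& as, int i) {
    if (auto n = to_index(as, i)) {
        return as[*n];  // n is bound only inside this branch
    }
    return "Index Out of Bounds";
}
```

The binding `n` exists only inside the branch where the cast succeeded, which matches the scoping rule described on the slide.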
  • 35. Analysis of the Unreal code Usage of integer variables in Unreal: 90% of integer variables in Unreal exist to index into arrays. 80% could be dependently-typed explicitly, guaranteeing safe array access without casting. 10% would require casts upon array access. The other 10% are used for: Computing summary statistics Encoding bit flags Various forms of low-level hackery “For” loops in Unreal: 40% are functional comprehensions 50% are functional folds
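To make the two loop categories concrete, here is a small C++ sketch of a comprehension (one output element per input element) and a fold (all elements reduced to one value) using standard algorithms. The functions `doubled` and `total` are illustrative only and are not taken from Unreal.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Comprehension: each output element depends only on the matching input.
std::vector<int> doubled(const std::vector<int>& xs) {
    std::vector<int> out(xs.size());
    std::transform(xs.begin(), xs.end(), out.begin(),
                   [](int x) { return 2 * x; });
    return out;
}

// Fold: all elements are combined into a single result.
int total(const std::vector<int>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0);
}
```

Neither function has an index variable, so neither has an out-of-bounds case to get wrong.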
  • 36. Accessing uninitialized variables Can we make this work? class MyClass { const int a=c+1; const int b=7; const int c=b+1; } MyClass myvalue = new MyClass; // What is myvalue.a? This is a frequent bug. Data structures are often rearranged, changing the initialization order. Lessons from Haskell: Lazy evaluation enables correct out-of-order evaluation. Accessing circularly entailed values causes thunk reentry (divergence), rather than just returning the wrong value. Lesson from Id90: Lenient evaluation is sufficient to guarantee this.
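In C++ the members of such a class are initialized in declaration order, so `a` would read `c` before `c` has a value. The sketch below emulates lazy evaluation by hand with a memoizing thunk. `LazyInt` is a hypothetical helper written for this illustration, and a circular dependency would recurse without end instead of being detected.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical memoizing thunk: computes its value on first use.
class LazyInt {
public:
    explicit LazyInt(std::function<int()> compute)
        : compute_(std::move(compute)) {}

    int get() {
        if (!value_) value_ = compute_();  // evaluate once, then cache
        return *value_;
    }

private:
    std::function<int()> compute_;
    std::optional<int> value_;
};

struct MyClass {
    MyClass() = default;
    // The thunks capture `this`, so copying would leave dangling captures.
    MyClass(const MyClass&) = delete;
    MyClass& operator=(const MyClass&) = delete;

    LazyInt a{[this] { return c.get() + 1; }};  // declared before c, still fine
    LazyInt b{[] { return 7; }};
    LazyInt c{[this] { return b.get() + 1; }};
};
```

Because nothing is evaluated until `get()` is called, the declaration order of `a`, `b`, and `c` no longer affects the result.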
  • 37. Dynamic Failure: Conclusion Reasonable type-system extensions could statically eliminate all: Out-of-bounds array access Null pointer dereference Integer overflow Accessing of uninitialized variables See Haskell for excellent implementation of: Comprehensions Option types via Maybe Non-NULL references via IORef, STRef Out-of-order initialization
  • 38. Integer overflow The Natural Numbers: data Nat = Zero | Succ Nat Factoid: C# exposes more than 10 integer-like data types, none of which are those defined by (Pythagoras, 500BC). In the future, can we get integers right?
  • 39. Can we get integers right? Neat Trick: In a machine word (size 2^n), encode an integer ±2^(n-1) or a pointer to a variable-precision integer. Thus “small” integers carry no storage cost. Additional access cost is ~5 CPU instructions. But: A natural number bounded so as to index into an active array is guaranteed to fit within the machine word size (the array is the proof of this!) and thus requires no special encoding. Since ~80% of integers can be dependently typed to access into an array, the amortized cost is ~1 CPU instruction per integer operation. This could be a viable tradeoff.
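One common way to realize this trick is to reserve the low bit of the word as a tag. The sketch below assumes that layout (the slide does not specify one): a set low bit marks a small integer stored in the remaining bits, and a clear low bit would mark an aligned pointer to a variable-precision integer, which is left unimplemented here. All names are hypothetical.

```cpp
#include <cstdint>
#include <optional>

using Word = std::intptr_t;

// With one tag bit, small integers get one bit less than the full word.
constexpr Word kSmallMax = INTPTR_MAX / 2;
constexpr Word kSmallMin = INTPTR_MIN / 2;

// Low bit set: the word holds a small integer directly.
bool is_small(Word w) { return (w & 1) != 0; }

// Returns std::nullopt when v needs the (unimplemented) big-integer path.
std::optional<Word> encode_small(Word v) {
    if (v < kSmallMin || v > kSmallMax) return std::nullopt;
    return v * 2 + 1;
}

// Inverse of encode_small; w - 1 is even, so the division is exact.
Word decode_small(Word w) { return (w - 1) / 2; }

// Fast-path addition of two encoded small integers. The decoded operands
// each fit in one bit less than a word, so their sum cannot overflow Word;
// the result may still be too large to re-encode as a small integer.
std::optional<Word> add_small(Word a, Word b) {
    return encode_small(decode_small(a) + decode_small(b));
}
```

The tag test, decode, and re-encode steps are the kind of per-operation overhead the slide estimates at a handful of instructions.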
  • 41. The C++/Java/C# Model: “Shared State Concurrency” The Idea: Any thread can modify any state at any time. All synchronization is explicit, manual. No compile-time verification of correctness properties: Deadlock-free Race-free
  • 42. The C++/Java/C# Model: “Shared State Concurrency” This is hard! How we cope in Unreal Engine 3: 1 main thread responsible for doing all work we can’t hope to safely multithread 1 heavyweight rendering thread A pool of 4-6 helper threads Dynamically allocate them to simple tasks. “Program Very Carefully!” Huge productivity burden Scales poorly to higher thread counts There must be a better way!
  • 43. Three Kinds of Code: Revisited Gameplay Simulation Gratuitous use of mutable state 10,000’s of objects must be updated Typical object update touches 5-10 other objects Numeric Computation Computations are purely functional But they use state locally during computations Shading Already implicitly data parallel
  • 44. Concurrency in Shading Look at the solution of CG/HLSL: New programming language aimed at “Embarrassingly Parallel” shader programming Its constructs map naturally to a data-parallel implementation Static control flow (conditionals supported via masking)
  • 45. Concurrency in Shading Conclusion: The problem of data-parallel concurrency is effectively solved(!) “Proof”: Xbox 360 games are running with 48-wide data shader programs utilizing half a Teraflop of compute power...
  • 46. Concurrency in Numeric Computation These are essentially pure functional algorithms, but they operate locally on mutable state Haskell ST, STRef solution enables encapsulating local heaps and mutability within referentially-transparent code These are the building blocks for implicitly parallel programs Estimate ~80% of CPU effort in Unreal can be parallelized this way In the future, we will write these algorithms using referentially-transparent constructs.
  • 47. Numeric Computation Example: Collision Detection A typical collision detection algorithm takes a line segment and determines when and where a point moving along that line will collide with a (constant) geometric dataset. In C++: struct vec3 { float x,y,z; }; struct hit { bool DidCollide; float Time; vec3 Location; }; hit collide(vec3 start,vec3 end); In Haskell-like notation: Vec3 = data Vec3 float float float Hit = data Hit float Vec3 collide :: (Vec3,Vec3)->Maybe Hit
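The `Maybe Hit` signature can be written in C++17 with `std::optional`, which removes the `DidCollide` flag and with it the possibility of reading `Time` or `Location` from a miss. In the sketch below the "geometric dataset" is replaced by a single ground plane at z = 0, purely as an assumption to keep the example short; a real collision query would traverse level geometry.

```cpp
#include <optional>

struct Vec3 { float x, y, z; };
struct Hit { float time; Vec3 location; };

// Hypothetical stand-in dataset: the plane z = 0.
// Returns the hit along the segment start->end, or std::nullopt on a miss.
std::optional<Hit> collide(const Vec3& start, const Vec3& end) {
    const float dz = end.z - start.z;
    if (dz == 0.0f) return std::nullopt;  // segment is parallel to the plane

    const float t = -start.z / dz;        // parameter at which z reaches 0
    if (t < 0.0f || t > 1.0f) return std::nullopt;  // crossing is off-segment

    return Hit{t, Vec3{start.x + t * (end.x - start.x),
                       start.y + t * (end.y - start.y),
                       0.0f}};
}
```

The function reads only its arguments and writes nothing outside its return value, which is the property that lets calls like it run in parallel safely.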
  • 48. Numeric Computation Example: Collision Detection Since collide is effects-free, it may be executed in parallel with any other effects-free computations. Basic idea: The programmer supplies effect annotations to the compiler. The compiler verifies the annotations. Many viable implementations (Haskell’s Monadic effects, effect typing, etc) A pure function (the default): collide(start:Vec3,end:Vec3):?Hit Effectful functions require explicit annotations: print(s:string)[#imperative]:void In a concurrent world, imperative is the wrong default!
  • 49. Concurrency in Gameplay Simulation This is the hardest problem… 10,000’s of objects Each one contains mutable state Each one updated 30 times per second Each update touches 5-10 other objects Manual synchronization (shared state concurrency) is hopelessly intractable here. Solutions? Rewrite as referentially-transparent functions? Message-passing concurrency? Continue using the sequential, single-threaded approach?
  • 50. Concurrency in Gameplay Simulation: Software Transactional Memory See “Composable memory transactions”; Harris, Marlow, Peyton-Jones, Herlihy The idea: Update all objects concurrently in arbitrary order, with each update wrapped in an atomic {...} block With 10,000’s of updates, and 5-10 objects touched per update, the collision rate will be low ~2-4X STM performance overhead is acceptable: if it enables our state-intensive code to scale to many threads, it’s still a win Claim: Transactions are the only plausible solution to concurrent mutable state
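C++ has no built-in `atomic {...}` block, so the sketch below is not software transactional memory. It shows only the optimistic read, compute, commit-or-retry pattern that STM generalizes, restricted here to a single `std::atomic<int>`. The helper `atomically` and the `apply_damage` example are hypothetical; a real STM would track a read/write set across many objects and compose transactions.

```cpp
#include <atomic>

// Hypothetical single-word "transaction": run `update` on a snapshot,
// then commit only if no other thread changed the cell in the meantime.
template <typename F>
int atomically(std::atomic<int>& cell, F update) {
    int seen = cell.load();
    for (;;) {
        const int desired = update(seen);  // speculative work on the snapshot
        if (cell.compare_exchange_weak(seen, desired)) {
            return desired;                // commit succeeded
        }
        // Commit failed: `seen` now holds the current value, so retry.
    }
}

// Illustrative gameplay update: health never drops below zero.
int apply_damage(std::atomic<int>& health, int damage) {
    return atomically(health, [damage](int h) {
        return h > damage ? h - damage : 0;
    });
}
```

When conflicts are rare, as the slide argues they would be across 10,000's of objects, the retry branch is seldom taken and most updates commit on the first attempt.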
  • 51. Three Kinds of Code: Revisited Game Simulation: Languages C++, Scripting; CPU Budget 10%; Lines of Code 250,000; FPU Usage 0.5 GFLOPS; Parallelism: Software Transactional Memory. Numeric Computation: Languages C++; CPU Budget 90%; Lines of Code 250,000; FPU Usage 5 GFLOPS; Parallelism: Implicit Thread Parallelism. Shading: Languages CG, HLSL; CPU Budget n/a; Lines of Code 10,000; FPU Usage 500 GFLOPS; Parallelism: Implicit Data Parallelism.
  • 52. Parallelism and purity Game World State: Software Transactional Memory. Purely functional core: Physics, collision detection, scene traversal, path finding, .. Data Parallel Subset: Graphics shader programs.
  • 53. Musings On the Next Mainstream Programming Language
  • 54. Musings There is a wonderful correspondence between: Features that aid reliability Features that enable concurrency. Example: Outlawing runtime exceptions through dependent types Out of bounds array access Null pointer dereference Integer overflow Exceptions impose sequencing constraints on concurrent execution. Dependent types and concurrency must evolve simultaneously
  • 55. Language Implications Evaluation Strategy Lenient evaluation is the right default. Support lazy evaluation through explicit suspend/evaluate constructs. Eager evaluation is an optimization the compiler may perform when it is safe to do so.
  • 56. Language Implications Effects Model Purely Functional is the right default Imperative constructs are vital features that must be exposed through explicit effects-typing constructs Exceptions are an effect Why not go one step further and define partiality as an effect, thus creating a foundational language subset suitable for proofs?
  • 57. Performance – Language Implications Memory model Garbage collection should be the only option Exception Model The Java/C# “exceptions everywhere” model should be wholly abandoned All dereference and array accesses must be statically verifiable, rather than causing sequenced exceptions No language construct except “throw” should generate an exception
  • 58. Syntax Requirement: Must not scare away mainstream programmers. Lots of options. C Family: int f{nat n}(int[] as,natrange<n> i) { return as[i]; } Least scary, but it’s a messy legacy. Haskell family: f :: forall n::nat. ([int],nat<n) -> int; f (xs,i) = xs !! i Quite scary :-) Pascal/ML family: f{n:nat}(as:[]int,i:nat<n)=as[i] Seems promising.
  • 60. A Brief History of Game Technology 1972 Pong (hardware) 1980 Zork (high level interpreted language) 1993 DOOM (C) 1998 Unreal (C++, Java-style scripting) 2005-6 Xbox 360, PlayStation 3 with 6-8 hardware threads 2009 Next console generation. Unification of the CPU, GPU. Massive multi-core, data parallelism, etc.
  • 61. The Coming Crisis in Computing By 2009, game developers will face… CPU’s with: 20+ cores 80+ hardware threads >1 TFLOP of computing power GPU’s with general computing capabilities. Game developers will be at the forefront. If we are to program these devices productively, you are our only hope!
  • 64. The Genius of Haskell Algebraic Datatypes: Unions done right. Compare to: C unions, Java union-like class hierarchies. Maybe t: C/Java option types are coupled to pointer/reference types. IO, ST: With STRef, you can write a pure function that uses heaps and mutable state locally, verifiably guaranteeing that those effects remain local.
  • 65. The Genius of Haskell Comprehensions Sorting in Haskell: sort [] = []; sort (x:xs) = sort [y | y<-xs, y<x ] ++ [x ] ++ sort [y | y<-xs, y>=x] Sorting in C: int partition(int y[], int f, int l); void quicksort(int x[], int first, int last) { int pivIndex = 0; if(first < last) { pivIndex = partition(x,first, last); quicksort(x,first,(pivIndex-1)); quicksort(x,(pivIndex+1),last); } } int partition(int y[], int f, int l) { int up,down,temp; int cc; int piv = y[f]; up = f; down = l; do { while (y[up] <= piv && up < l) { up++; } while (y[down] > piv ) { down--; } if (up < down ) { temp = y[up]; y[up] = y[down]; y[down] = temp; } } while (down > up); temp = piv; y[f] = y[down]; y[down] = piv; return down; }
  • 66. Why Haskell is Not My Favorite Programming Language The syntax is … scary Lazy evaluation is a costly default But eager evaluation is too limiting Lenient evaluation would be an interesting default Lists are the syntactically preferred sequence type In the absence of lazy evaluation, arrays seem preferable
  • 67. Why Haskell is Not My Favorite Programming Language Type inference doesn’t scale: To large hierarchies of open-world modules To type system extensions To system-wide error propagation With inferred parameter types: f(x,y) = x+y a=f(3,"4") reports: ERROR - Cannot infer instance *** Instance : Num [Char] *** Expression : f (3,"4") With declared parameter types: f(int x,int y) = x+y a=f(3,"4") reports: Parameter mismatch parameter 2 of call to f: Expected: int Got: "4"