Networking Architecture of Warframe
Outline
• Replication system overview
• Case study: Lunaro ball throw
• Replication: going wide
• Congestion control
• Dedicated servers
References/thanks
• David Aldridge, I Shot You First! (Gameplay Networking in
Halo: Reach) [GDC2011]
• Timothy Ford, Overwatch Gameplay Architecture and
Netcode [GDC2017]
• Philip Orwig, Replay Technology in Overwatch [GDC2017]
Warframe
• A cooperative third-person online action game
• 3 platforms (PC, PS4 (launch title), XB1)
• ~32 million accounts
• Own technology stack (everything from low-level socket code to
matchmaking, all 3 platforms)
• Mostly P2P, but we support “volunteer PVP servers”
Layers
Replication models in games
• Deterministic Lockstep (input only)
• Snapshot Interpolation (complete world state for all clients)
• State Replication (individual, prioritized chunks for every client)
Replication (host to client)
• Properties per object,
not reliable, unordered,
objects sorted by priority
(different for every client)
• Events – optionally reliable,
optionally ordered
Properties
• (Network) property: a network-relevant data field of a replicated
object
• Two ends of the spectrum: a single group of properties per object
(better perf) vs. N individual properties (better for bandwidth)
• Our version: somewhere in between, dirty bit + value. Implemented
as a template: TNet<T>. If a group of properties changes together
– encapsulate it and associate it with a single bit
• Dynamic arrays (replicated): convenient, but can of worms (bit
record structure no longer static, so merge/mask operations get
more complicated)
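The dirty bit + value pairing above can be sketched as a small wrapper template. This is an illustrative reconstruction, not the shipping TNet<T> (only the name and the dirty-bit idea come from the talk):

```cpp
// Minimal sketch of a dirty-bit-tracking property wrapper in the spirit of
// TNet<T>. Assignment marks the property dirty; the replication pass reads
// the flag, serializes the value, and clears it.
template <typename T>
class TNet {
public:
    explicit TNet(const T& initial = T{}) : m_value(initial), m_dirty(false) {}

    // Only flip the dirty bit when the value actually changes, so redundant
    // writes do not force a replication.
    TNet& operator=(const T& v) {
        if (!(m_value == v)) {
            m_value = v;
            m_dirty = true;
        }
        return *this;
    }

    const T& Get() const { return m_value; }
    bool IsDirty() const { return m_dirty; }

    // Called by the replication layer after the value has been serialized.
    void ClearDirty() { m_dirty = false; }

private:
    T m_value;
    bool m_dirty;
};
```

A real implementation would also expose the bit to a per-object dirty mask; here the flag is kept inline for clarity.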
Property priorities
• Property priority = replication frequency
• More important properties (position, health) replicated more frequently
• Perfect conditions, no throttling: exactly as frequently as defined (data-driven)
• Part of the congestion control system – if throttled, all properties are replicated
less frequently
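The speaker notes hint that, when throttled, frequencies are scaled by s = bw_available / bw_needed. A sketch of that rule (the exact shipping formula is not shown in the talk):

```cpp
// When the bandwidth needed to replicate everything at the data-driven
// frequencies exceeds what is available, scale every property's replication
// frequency down uniformly by s = bw_available / bw_needed.
double ThrottledFrequencyHz(double baseHz, double bwAvailable, double bwNeeded) {
    if (bwNeeded <= bwAvailable)
        return baseHz;                        // perfect conditions: as defined
    return baseHz * (bwAvailable / bwNeeded); // throttled: scaled down
}
```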
Object prioritization
• Motivation: in case we can’t send all the objects this frame, sort them by
perceived importance
• Per replicated object and per client (so 2 clients can end up with a very
different set of priorities)
• Broad per-type priorities + custom code logic for special types (like avatars)
• Supports inter-object dependencies (A needs to be replicated before B; mostly
for creation messages, e.g. an avatar and its weapons)
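One simple way to honor the dependency constraint is to propagate priority to the dependency before sorting, so a parent never sorts after the object that needs it. This is an assumed approach for illustration, not the game's actual scheme:

```cpp
#include <algorithm>
#include <vector>

// Per-client replication entry; names and fields are assumptions.
struct ReplObj {
    int id;
    float priority;      // per-client, recomputed each frame
    int dependsOn = -1;  // id that must replicate first (e.g. avatar before weapon)
};

void SortForReplication(std::vector<ReplObj>& objs) {
    // Lift each dependency's priority just above its dependent's, so sorting
    // by priority alone yields a valid order (single pass handles one-level
    // dependencies like avatar -> weapons).
    for (const ReplObj& o : objs) {
        if (o.dependsOn < 0) continue;
        for (ReplObj& p : objs)
            if (p.id == o.dependsOn && p.priority <= o.priority)
                p.priority = o.priority + 0.001f;
    }
    std::sort(objs.begin(), objs.end(),
              [](const ReplObj& a, const ReplObj& b) { return a.priority > b.priority; });
}
```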
Replication flow
High/low frequency lists
• Problem: thousands of objects to update, only a fraction
actually important. Created a system to split them into two lists
(high and low frequency) automatically
• Objects start on high frequency list by default
(+associated with a timer)
• High frequency objects tested every frame. If not dirty for X
seconds – moved to the low frequency list. If dirty – timer
bumped slightly.
• Low frequency list traversed over the course of multiple
frames, round-robin style. If dirty – moved to a high
frequency list and grace period extended
Ball throw – predict throw animation
Ball throw – full prediction
Ball throw – hybrid solution
• Request ball throw in advance,
at our future hand position
• No visible lag as long as ping
less than time to release event
(~160ms in Lunaro)
• Perfect for instigator, a little off
for others
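Requesting the throw "at our future hand position" implies predicting a short distance ahead. Under an assumed linear-motion model over the windup window (the game's actual predictor is not shown in the talk), this is just an extrapolation:

```cpp
struct Vec3 { float x, y, z; };

// Purely illustrative: extrapolate the hand position over the windup window
// (~160 ms in Lunaro). At a 100-200 ms horizon, linear extrapolation is a
// plausible stand-in for whatever the shipping code does.
Vec3 PredictHandPosition(const Vec3& pos, const Vec3& vel, float windupSeconds) {
    return { pos.x + vel.x * windupSeconds,
             pos.y + vel.y * windupSeconds,
             pos.z + vel.z * windupSeconds };
}
```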
Takeaways
• No silver bullet (perfect prediction would be the closest), every solution comes
with a different set of problems
• Human players much more concerned with themselves - “favor the shooter”
• NPCs don’t complain (“favor the human” in PVE)
• Choose wisely depending on situation (responsiveness vs ‘correctness’)
• Not so hard to predict the future if your horizon is 100-200ms
Replication jobs - “traditional” approach
• Work item: N objects (any client)
• The most natural approach, tempting to try it first
• Good load balancing
• Lock hell, prone to races (any job can read from or write to any
client’s internal structures)
Replication jobs – our approach
• Work item: all objects for single client
• 100% lock and wait-free, only touching
own structures
• Pre-allocated buffers to avoid contention
on memory manager
• A little bit worse load-balancing, but
we try to fill the bubbles with other jobs
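A minimal sketch of the per-client job split, assuming plain std::thread jobs and a pre-sized buffer per client (the engine's actual job system and serialization are not shown in the talk):

```cpp
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// Each job touches only its own client's context and a pre-allocated buffer,
// so no locks are needed and there is no contention on the memory manager.
struct ClientContext {
    std::vector<uint8_t> packetBuffer;  // pre-allocated up front
    size_t bytesWritten = 0;
};

void ReplicateClient(ClientContext& ctx, const std::vector<int>& dirtyObjects) {
    ctx.bytesWritten = 0;
    for (int obj : dirtyObjects) {
        // Serialize the object id as a stand-in for real property data.
        if (ctx.bytesWritten + sizeof(int) > ctx.packetBuffer.size()) break;
        std::memcpy(ctx.packetBuffer.data() + ctx.bytesWritten, &obj, sizeof(int));
        ctx.bytesWritten += sizeof(int);
    }
}

void ReplicateAllClients(std::vector<ClientContext>& clients,
                         const std::vector<int>& dirtyObjects) {
    std::vector<std::thread> jobs;
    for (ClientContext& c : clients)
        jobs.emplace_back(ReplicateClient, std::ref(c), std::cref(dirtyObjects));
    for (std::thread& t : jobs) t.join();
}
```

The shared `dirtyObjects` list is read-only inside the jobs; only per-client state is written, which is what makes the scheme lock- and wait-free.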
Big picture
Congestion control
• UDP has no congestion control (unlike TCP)
• Existing approaches – first idea: “let’s do what TCP is doing!”
• Bad: not just one approach, but dozens; good: well documented, source code exists
(e.g. the Linux kernel)
• Not directly applicable – takes too much time to converge, tries to maximize
bandwidth in the long run (steady transfer), has to be very generic (as opposed
to fine-tuned for just 1 game), transport layer only
Congestion control – our version
• Quickly realized a “TCP approach” was not going to cut it (limited to the transport
layer, very generic), but we still got/validated some ideas from it (e.g. BIC)
• Very small search space (~10-80 KB/s); majority of the logic lives in the replication
layer (as it has more information). Very application-specific, controlled by ~30-40
parameters. Uses both RTT & packet loss as connection quality metrics
• Start reasonably high, decrease if can’t handle, only try to increase if definitely
necessary (probing). Distinguish between upstream/downstream limitations
• Track both current and allowed maximum, rebalance periodically
• Two-tier throttling: a) sending properties less frequently, b) limiting # of
updated objects (sorted by priority)
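The limit tracking above can be sketched as a small controller. The 10-80 KB/s search space comes from the slides; the binary-search-style growth and the rapid drop to a 16 KB step come from the speaker notes. Names and exact rules are assumptions:

```cpp
#include <algorithm>

class BandwidthController {
public:
    static constexpr int kMinBps = 10 * 1024;  // search-space floor
    static constexpr int kMaxBps = 80 * 1024;  // ceiling
    static constexpr int kStep   = 16 * 1024;

    int Limit() const { return m_limit; }

    // Connection quality degraded (RTT spike / packet loss): halve and round
    // down to the nearest 16 KB step, remembering the limit that failed.
    void OnCongestion() {
        m_probeCeiling = m_limit;
        int dropped = (m_limit / 2 / kStep) * kStep;
        m_limit = std::max(kMinBps, dropped);
    }

    // Connection healthy and more bandwidth definitely needed: probe upward
    // halfway toward the last known-bad limit (binary-search style) rather
    // than ramping blindly.
    void OnHealthyProbe() {
        m_limit = std::min(kMaxBps, (m_limit + m_probeCeiling) / 2);
    }

private:
    int m_limit        = kMaxBps;  // "start reasonably high"
    int m_probeCeiling = kMaxBps;
};
```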
Dedicated servers
• Decent starting point – game/engine code split into server and client layers.
P2P host/single player: running both layers, P2P client: only client
• Dedicated server – running only server layer, no need for custom
binaries/removing code (game process with extra arguments)
• Problem: this version was not used/maintained (P2P only for the last few years), easy to
introduce non-obvious bugs (changing net properties from client code works
OK in P2P (both layers run), but breaks on a DS (no client layer, so the code doesn’t run))
• Added “DS validation” mode: TNet triggers an error if modified from the client
code. Catches majority of mistakes, works even in single-player
Extra slides
Compression
• Network compression - a very special case (tiny packets, performance very
important)
• We’ve tried LZF and different Huffman variants (N trees, ‘best’ tree chosen
based on data characteristics)
• Couldn’t justify spending too much time here versus buying Oodle
• Oodle worked out of the box, gave us very good results (1.4:1 or better); the
only time-consuming part is training, but it can be automated to some extent
Multithreading – pushing it further
• Handling packet delivery information from clients (acks/nacks)
• Not very expensive (< 1ms), but was trivial to offload, so why not
• Job per-client again, less urgent, can span frame boundaries
• For complex types (avatars) visiting all the individual properties can get
expensive
• Solution: split into groups (components), skip entire groups if empty
• Component/controller split not always ideal for dirty masks, so we split based
on how frequently properties change rather than by gameplay structure

Editor's Notes

  • #6: High-level overview – the layers that typically build up a game networking system
  • #8: A little bit of history – a short taxonomy of replication models in games. Lockstep: RTSes, wait for all clients, JIP problems, determinism. Mention For Honor (P2P). “Quake model”: good for perf, bad for BW as games get more complex. More details in Philip Orwig’s talk
  • #9: Events, not state, typically reliable & ordered
  • #11: If throttled – frequencies scaled down by s=bw_av/bw_needed
  • #12: If throttled – priority preserved to avoid starvation, if you’ve not been replicated for a long time you’ll eventually bubble up to the top.
  • #13: Master object: current frame dirty mask. Bits corresponding to suppressed properties are left enabled. Combined with the mask for each client = final mask for this frame
  • #14: Converges fairly quickly, typically around 150 objects left on the high frequency list. Unexpected benefit: diagnostics (“why is this object high freq?”) [Aim around 14-15m]
  • #15: Lunaro elevator pitch – futuristic handball/Speedball. Rules (brief), then the throw specifically. Problem: how to hide throw latency. Non-starter: host-authoritative anim+throw (send request, start the whole thing on response)
  • #16: Sequence diagram – a nice way to visualize message flow, makes it immediately obvious where the lag is. Vertical line = time
  • #17: 200ms lag +/- 20ms jitter Subtle artifacts, more obvious if watching frame-by-frame (e.g. AvsPmod)
  • #18: Inconsistencies, warp, etc. Reconciliation – what’s the worst-case scenario?
  • #19: Gameplay consequences for input (two diff buttons)
  • #21: Questions: where/how do we hide the lag, how do we cheat.
  • #22: Originally single-threaded, but fewer objects/4 players (PS4, 2013 – launch title, had to be quick). With 8 players and thousands of objects, a single thread was no longer acceptable
  • #23: Only one shot, could not afford going up a blind alley. Seems ‘embarrassingly parallel’ at first glance. Cache coherence
  • #25: Parent job W1 (setup): dirty masks for master objects (work: X objects), calculate priorities, sort, spawn child jobs. Child jobs: replication. Tree structure (root + children)
  • #26: https://www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf (Congestion Control in Linux TCP). Some algorithms are very dated (pre-WiFi), don’t deal with ‘long fat’ networks too well, don’t take RTT into account. 1st – Tahoe/Reno (packet loss). Vegas (RTT), Veno (Vegas+Reno, WiFi: random or congestion packet loss?), BIC, BBR (Google)
  • #27: 80k = way more than needed, but can help with a very good connection + initial spike. Transport layer – connection quality queries + per-connection limits. Priority preserved between frames to avoid starvation. Binary search when growing, rapid drop (to the nearest 16k)
  • #28: No manpower for #ifdefs (DS team = 1 programmer). DS validation excludes local player/avatar (do not exist on DS, would invalidate prediction)