GPU submit-time frequency boosting

IMPROVING GPU FREQUENCY SCALING FOR GPU WORKLOADS

TYPICAL DVFS-BASED GPU BOOST MECHANISM
• GPU frequency boosting is wired through the devfreq governor
• The governor monitors GPU busyness and tries to keep the current load under a
given target load by adjusting GPU frequency, using tunables such as settling
time, bias, damp and rampdown_delay
• Basically, boost_freq = bias * freq * (load - target) / target
• Ideal for sustained loads and for burstiness within a high-load window
• Overly aggressive tuning (e.g. too low a target_load or too high a
rampdown_delay) makes the governor more reactive
• However, it also leads to constant GPU overpowering
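The boost formula on this slide can be sketched in a few lines. This is an illustrative model only, not the governor's code: it assumes boost_freq is a delta added to the current frequency (the formula yields zero when load equals target), that load and target share one scale (per-mille here, as an assumption), and that the result is clamped to a hypothetical [f_min, f_max] range. The settling time, damp and rampdown_delay tunables are left out.

```python
def boost_delta(freq_mhz, load, target_load, bias):
    """Slide formula: boost_freq = bias * freq * (load - target) / target."""
    return bias * freq_mhz * (load - target_load) / target_load


def next_freq(freq_mhz, load, target_load, bias, f_min=100.0, f_max=800.0):
    """Apply the boost as a delta and clamp it.

    f_min and f_max are hypothetical limits chosen for illustration;
    a real device would snap to its own frequency table instead.
    """
    wanted = freq_mhz + boost_delta(freq_mhz, load, target_load, bias)
    return max(f_min, min(f_max, wanted))


if __name__ == "__main__":
    # Load above target pushes frequency up, load below target pulls it down.
    print(next_freq(400.0, load=900, target_load=600, bias=0.5))
    print(next_freq(400.0, load=300, target_load=600, bias=0.5))
```

In this model a lower target_load or a larger bias makes the same load produce a larger upward step, which is the reactiveness versus overpowering trade-off the slide describes.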

PROBLEM
• Low-latency VR use cases typically present repetitive and bursty GPU
workloads
• What is needed is guaranteed GPU horsepower exactly when the workload
gets scheduled
• Load degenerates quickly (but is very likely to repeat), so frequency
needs to fall quickly (and ramp back up)
• Typical use cases exhibiting this kind of burstiness are camera
post-processing, edge detection, ATW...
• The current governor's slow response in ramping up frequency clearly
shows up as low overall perf/watt

JUST IN (SUBMIT) TIME FREQ SCALING
• Density of work submission (per unit time) forms the basis of GPU load
• There is a delay (on the order of ms) between a submit and the governor seeing the load
• This translates into latency in the effective GPU frequency boost
• A short boost pulse in the submit code path takes care of the ramp-up latency
• This inherently makes frequency follow the workload
• It increases the chances of the governor now seeing lower load and pulling
frequency down
• Effective GPU frequency comes down to fmax@vmin for the profiled use cases
(giving better perf/watt)
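The submit-time pulse can be pictured with a small model. It is a sketch under stated assumptions, not the driver implementation: the class name `SubmitBoost`, its methods, the pulse length and the frequencies used are all hypothetical. The boost acts as a temporary frequency floor that starts at submit and expires on its own, after which the governor's choice applies again.

```python
class SubmitBoost:
    """Hypothetical model of a short frequency pulse armed in the submit path."""

    def __init__(self, boost_freq_mhz, pulse_ms):
        self.boost_freq_mhz = boost_freq_mhz
        self.pulse_ms = pulse_ms
        self._boost_until_ms = float("-inf")  # no pulse active initially

    def on_submit(self, now_ms):
        # Called when work is submitted: (re)arm the pulse from this instant,
        # without waiting for the governor to observe the new load.
        self._boost_until_ms = now_ms + self.pulse_ms

    def effective_freq(self, now_ms, governor_freq_mhz):
        # While the pulse is active the boost frequency is a floor;
        # a higher governor request still wins.
        if now_ms < self._boost_until_ms:
            return max(governor_freq_mhz, self.boost_freq_mhz)
        return governor_freq_mhz


if __name__ == "__main__":
    boost = SubmitBoost(boost_freq_mhz=790, pulse_ms=2.0)  # illustrative values
    boost.on_submit(now_ms=10.0)
    for t in (10.5, 11.9, 12.0):
        print(t, boost.effective_freq(t, governor_freq_mhz=109))
```

Because the pulse ends without governor involvement, the work tends to finish sooner and the governor then samples a lower load, which is the mechanism the bullets above rely on for pulling frequency down between bursts.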

PERF/POWER DATA ACROSS USE CASES
Use case                                  | GPU-intensive section   | Section time (ms) | Avg GPU busyness | Avg GPU frequency (MHz) | Avg GPU power (mW) | Avg total power, VDD_IN (mW) | Perf/watt increase (%)
Pupil Detection (with JIT scaling)        | Edge Detection          | 11.004            | 34               | 497                     | 471                | 5488                         | 99.623182
Pupil Detection (with default scaling)    | Edge Detection          | 21.158            | 182              | 293                     | 421                | 5286                         |
Passthrough camera (with JIT scaling)     | Camera to Display (e2e) | 40.599            | 219              | 596                     | 856                | 7591                         | 4.5763017
Passthrough camera (with default scaling) | Camera to Display (e2e) | 45.466            | 590              | 283                     | 837                | 8129                         |
Passthrough camera (with max gpu)         | Camera to Display (e2e) | 40.025            | 153              | 1331                    | 1377               | 8677                         |
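The perf/watt column can be checked against the averages in this table. The sketch below is an assumption-laden check, not the author's script, and all function names are hypothetical. `table_column_pct` reproduces the reported 99.62 and 4.58 by multiplying the section-time ratio (default over JIT) by the total-power ratio taken as JIT over default. `work_per_joule_gain_pct` shows an alternative reading, in which performance is the inverse of section time and perf/watt is that value divided by total power; it gives roughly 85% and 20% for the same rows.

```python
# Averages copied from the table: (section time in ms, total VDD_IN power in mW).
RUNS = {
    "pupil": {"jit": (11.004, 5488.0), "default": (21.158, 5286.0)},
    "passthrough": {"jit": (40.599, 7591.0), "default": (45.466, 8129.0)},
}


def table_column_pct(jit, default):
    """Time ratio (default/JIT) times power ratio (JIT/default), as a % gain.

    This convention matches the figures reported in the table.
    """
    (t_jit, p_jit), (t_def, p_def) = jit, default
    return ((t_def / t_jit) * (p_jit / p_def) - 1.0) * 100.0


def work_per_joule_gain_pct(jit, default):
    """Assumed alternative: perf = 1 / time, perf/watt = 1 / (time * power)."""
    (t_jit, p_jit), (t_def, p_def) = jit, default
    return ((t_def * p_def) / (t_jit * p_jit) - 1.0) * 100.0


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name,
              round(table_column_pct(run["jit"], run["default"]), 3),
              round(work_per_joule_gain_pct(run["jit"], run["default"]), 3))
```

The two conventions differ only in the direction of the power ratio, so they diverge most where JIT and default scaling draw noticeably different total power.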

PUPIL DETECTION WITH CURRENT FREQ SCALING
Metric                          | Avg    | Max    | Min
GPU-intensive code latency (ms) | 21.158 | 843.41 | 7.289
GPU busyness                    | 182    | 401    | 57
GPU frequency (MHz)             | 293    | 595    | 109
GPU power (mW)                  | 421    | 534    | 152

PUPIL DETECTION WITH JIT FREQ SCALING
Metric                          | Avg    | Max    | Min
GPU-intensive code latency (ms) | 11.004 | 957.52 | 5.890
GPU busyness                    | 34     | 504    | 10
GPU frequency (MHz)             | 497    | 790    | 109
GPU power (mW)                  | 471    | 610    | 152

PASSTHROUGH WITH DEFAULT FREQ SCALING
Metric                          | Avg    | Max    | Min
GPU-intensive code latency (ms) | 45.466 | 82.425 | 33.461
GPU busyness                    | 590    | 946    | 173
GPU frequency (MHz)             | 283    | 693    | 109
GPU power (mW)                  | 837    | 838    | 761

PASSTHROUGH WITH JIT FREQ SCALING
Metric                          | Avg    | Max    | Min
GPU-intensive code latency (ms) | 40.599 | 63.626 | 31.690
GPU busyness                    | 219    | 390    | 67
GPU frequency (MHz)             | 596    | 790    | 303
GPU power (mW)                  | 856    | 914    | 762
