The other day
iex(1)> defmodule RepeatN do
...(1)> def repeat_n(_function, 0) do
...(1)> # noop
...(1)> end
...(1)> def repeat_n(function, 1) do
...(1)> function.()
...(1)> end
...(1)> def repeat_n(function, count) do
...(1)> function.()
...(1)> repeat_n(function, count - 1)
...(1)> end
...(1)> end
{:module, RepeatN, ...}
iex(2)> :timer.tc fn -> RepeatN.repeat_n(fn -> 0 end, 100) end
{210, 0}
iex(3)> list = Enum.to_list(1..100)
[...]
iex(4)> :timer.tc fn -> Enum.each(list, fn(_) -> 0 end) end
{165, :ok}
iex(5)> :timer.tc fn -> Enum.each(list, fn(_) -> 0 end) end
{170, :ok}
iex(6)> :timer.tc fn -> Enum.each(list, fn(_) -> 0 end) end
{184, :ok}
Success!
The End?
How fast is it really? Benchmarking in practice
How many atrocities have we just committed?
Atrocities
● Way too few samples (see the sketch after this list)
● Realistic data/multiple inputs?
● No warmup
● Non-production environment
● Does creating the list matter?
● Is repeating really the bottleneck?
● Repeatability?
● Setup information
● Running on battery
● Lots of applications running
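One of these deserves a closer look right away: a single :timer.tc call is just one sample of a noisy measurement. A minimal sketch (my own illustration, not from the slides) that collects many samples and shows how far they spread:

measure = fn(function, n) ->
  # :timer.tc/1 returns {run_time_in_microseconds, result}
  for _ <- 1..n do
    {time, _result} = :timer.tc(function)
    time
  end
end

samples = measure.(fn -> Enum.each(Enum.to_list(1..100), fn(_) -> 0 end) end, 1_000)
IO.inspect {Enum.min(samples), Enum.max(samples)}, label: "min/max in ÎŒs"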
n = 10_000
range = 1..n
list = Enum.to_list range
fun = fn -> 0 end
Benchee.run %{
"Enum.each" =>
fn -> Enum.each(list, fn(_) -> fun.() end) end,
"List comprehension" =>
fn -> for _ <- list, do: fun.() end,
"Recursion" =>
fn -> RepeatN.repeat_n(fun, n) end
}
Name ips average deviation median
Recursion 6.83 K 146.41 ÎŒs ±15.76% 139.00 ÎŒs
Enum.each 4.39 K 227.86 ÎŒs ±8.05% 224.00 ÎŒs
List comprehension 3.13 K 319.22 ÎŒs ±16.20% 323.00 ÎŒs
Comparison:
Recursion 6.83 K
Enum.each 4.39 K - 1.56x slower
List comprehension 3.13 K - 2.18x slower
How fast is it really?
Benchmarking in Practice
Tobias Pfeiffer
@PragTob
pragtob.info
How fast is it really? Benchmarking in practice
Concept vs Tool Usage
Ruby?
Profiling vs. Benchmarking
Flame Graph
[flame graph: eflame:apply1/3 → Elixir.Life:run_loop/3 → Elixir.Life.Board:map/2 → Elixir.Enum:-map/2-lc$^0/1-0-/2]
http://learningelixir.joekain.com/profiling-elixir-2/
What to benchmark?
● Runtime?
● Memory?
● Throughput?
● Custom?
What to measure?
The famous post
What to measure?
● Runtime!
● Memory?
● Throughput?
● Custom?
But, why?
What's fastest?
How long will this take?
Enum.sort/1 performance
Name ips average deviation median
10k 595.62 1.68 ms ±8.77% 1.61 ms
100k 43.29 23.10 ms ±13.21% 21.50 ms
1M 3.26 306.53 ms ±9.82% 291.05 ms
5M 0.53 1899.00 ms ±7.94% 1834.97 ms
Comparison:
10k 595.62
100k 43.29 - 13.76x slower
1M 3.26 - 182.58x slower
5M 0.53 - 1131.09x slower
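These numbers come from running Enum.sort/1 on lists of different sizes. A minimal sketch (not necessarily the exact script used for the talk) of how such a comparison can be set up:

list_10k = 1..10_000 |> Enum.to_list |> Enum.shuffle
list_100k = 1..100_000 |> Enum.to_list |> Enum.shuffle
list_1m = 1..1_000_000 |> Enum.to_list |> Enum.shuffle
list_5m = 1..5_000_000 |> Enum.to_list |> Enum.shuffle

Benchee.run %{
  "10k" => fn -> Enum.sort(list_10k) end,
  "100k" => fn -> Enum.sort(list_100k) end,
  "1M" => fn -> Enum.sort(list_1m) end,
  "5M" => fn -> Enum.sort(list_5m) end
}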
Enum.sort performance
Did we make it faster?
“Isn’t that the root of all evil?”
“Programming Bumper Sticker”
More likely, not reading the sources is the source of all evil
“We should forget about small efficiencies, say
about 97% of the time: premature optimization
is the root of all evil.”
Donald Knuth, 1974
(Computing Surveys, Vol 6, No 4, December 1974)
“Yet we should not pass up our opportunities in
that critical 3%.
A good programmer (
) will be wise to look
carefully at the critical code but only after that
code has been identified.”
Donald Knuth, 1974
(Computing Surveys, Vol 6, No 4, December 1974)
The very next sentence
80 / 20
What is critical?
Application Monitoring
“In established engineering disciplines a 12 %
improvement, easily obtained, is never
considered marginal; and I believe the same
viewpoint should prevail in software engineering.”
Donald Knuth, 1974
(Computing Surveys, Vol 6, No 4, December 1974)
Prior Paragraph
“It is often a mistake to make a priori
judgments about what parts of a program are
really critical, since the universal experience of
programmers who have been using measurement
tools has been that their intuitive guesses fail.”
Donald Knuth, 1974
( Computing Surveys, Vol 6, No 4, December 1974 )
What's the fastest way to
sort a list of numbers
largest to smallest?
list = 1..10_000 |> Enum.to_list |> Enum.shuffle
Benchee.run %{
"sort(fun)" =>
fn -> Enum.sort(list, &(&1 > &2)) end,
"sort |> reverse" =>
fn -> list |> Enum.sort |> Enum.reverse end,
"sort_by(-value)" =>
fn -> Enum.sort_by(list, fn(val) -> -val end) end
}
Name ips average deviation median
sort |> reverse 596.54 1.68 ms ±6.83% 1.65 ms
sort(fun) 238.88 4.19 ms ±5.53% 4.14 ms
sort_by(-value) 146.86 6.81 ms ±8.68% 6.59 ms
Comparison:
sort |> reverse 596.54
sort(fun) 238.88 - 2.50x slower
sort_by(-value) 146.86 - 4.06x slower
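All three variants return the same descending list for a list of distinct numbers, so only their run time differs. A quick sanity check (my addition, not from the slides):

list = 1..10_000 |> Enum.to_list |> Enum.shuffle
descending = list |> Enum.sort |> Enum.reverse
^descending = Enum.sort(list, &(&1 > &2))
^descending = Enum.sort_by(list, fn(val) -> -val end)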
Different types of benchmarks
Feature
Integration
Unit
Testing Pyramid
Application
Macro
Micro
Benchmarking Pyramid
How fast is it really? Benchmarking in practice
Micro Macro Application
● Setup Complexity
● Execution Time
● Confidence of Real Impact
● Components involved
● Chance of Interference
Golden Middle
Good Benchmarking
What are you benchmarking for?
Overly specific benchmarks &
exaggerated results
● Elixir 1.3.4
● Erlang 19.1
● i5-7200U – 2 x 2.5GHz (Up to 3.10GHz)
● 8GB RAM
● Linux Mint 18 - 64 bit (Ubuntu 16.04 base)
● Linux Kernel 4.4.0-51
System Specification
Interference-free Environment
[info] GET /
[debug] Processing by Rumbl.PageController.index/2
Parameters: %{}
Pipelines: [:browser]
[info] Sent 200 in 46ms
[info] GET /sessions/new
[debug] Processing by Rumbl.SessionController.new/2
Parameters: %{}
Pipelines: [:browser]
[info] Sent 200 in 5ms
[info] GET /users/new
[debug] Processing by Rumbl.UserController.new/2
Parameters: %{}
Pipelines: [:browser]
[info] Sent 200 in 7ms
[info] POST /users
[debug] Processing by Rumbl.UserController.create/2
Parameters: %{"_csrf_token" =>
"NUEUdRMNAiBfIHEeNwZkfA05PgAOJgAAf0ACXJqCjl7YojW+trdjdg==", "_utf8" => " ", "user" =>✓
%{"name" => "asdasd", "password" => "[FILTERED]", "username" => "Homer"}}
Pipelines: [:browser]
[debug] QUERY OK db=0.1ms
begin []
[debug] QUERY OK db=0.9ms
INSERT INTO "users" ("name","password_hash","username","inserted_at","updated_at") VALUES
($1,$2,$3,$4,$5) RETURNING "id" ["asdasd",
"$2b$12$.qY/kpo0Dec7vMK1ClJoC.Lw77c3oGllX7uieZILMlFh2hFpJ3F.C", "Homer", {{2016, 12, 2},
{14, 10, 28, 0}}, {{2016, 12, 2}, {14, 10, 28, 0}}]
Logging & Friends
Garbage Collection
How fast is it really? Benchmarking in practice
Zoom in
Correct & Meaningful Setup
Warmup
Inputs matter!
Malformed inputs
Where are your inputs
n = 10_000
fun = fn -> 0 end
Benchee.run %{
"Enum.each" => fn ->
Enum.each(Enum.to_list(1..n), fn(_) -> fun.() end)
end,
"List comprehension" => fn ->
for _ <- Enum.to_list(1..n), do: fun.()
end,
"Recursion" => fn -> RepeatN.repeat_n(fun, n) end
}
Executed every time
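The fix is to do the setup once, outside the measured functions, as the first version of this benchmark did, so only the repetition itself is measured:

n = 10_000
list = Enum.to_list(1..n) # built once, before any measurement starts
fun = fn -> 0 end
Benchee.run %{
  "Enum.each" =>
    fn -> Enum.each(list, fn(_) -> fun.() end) end,
  "List comprehension" =>
    fn -> for _ <- list, do: fun.() end,
  "Recursion" =>
    fn -> RepeatN.repeat_n(fun, n) end
}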
defmodule MyMap do
def map_tco(list, function) do
Enum.reverse do_map_tco([], list, function)
end
defp do_map_tco(acc, [], _function) do
acc
end
defp do_map_tco(acc, [head | tail], func) do
do_map_tco([func.(head) | acc], tail, func)
end
def map_body([], _func), do: []
def map_body([head | tail], func) do
[func.(head) | map_body(tail, func)]
end
end
TCO
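Both functions map a function over a list: map_tco builds the result in an accumulator and reverses it at the end, while map_body builds it directly on the call stack. A quick usage example (my addition):

iex> MyMap.map_tco([1, 2, 3], fn(i) -> i * 2 end)
[2, 4, 6]
iex> MyMap.map_body([1, 2, 3], fn(i) -> i * 2 end)
[2, 4, 6]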
alias Benchee.Formatters.{Console, HTML}
map_fun = fn(i) -> i + 1 end
inputs = %{
"Small (10 Thousand)" => Enum.to_list(1..10_000),
"Middle (100 Thousand)" => Enum.to_list(1..100_000),
"Big (1 Million)" => Enum.to_list(1..1_000_000),
"Bigger (5 Million)" => Enum.to_list(1..5_000_000)
}
Benchee.run %{
"tail-recursive" =>
fn(list) -> MyMap.map_tco(list, map_fun) end,
"stdlib map" =>
fn(list) -> Enum.map(list, map_fun) end,
"body-recursive" =>
fn(list) -> MyMap.map_body(list, map_fun) end
}, time: 20, warmup: 10, inputs: inputs,
formatters: [&Console.output/1, &HTML.output/1],
html: [file: "bench/output/tco_small_sample.html"]
TCO
##### With input Small (10 Thousand) #####
Comparison:
body-recursive 5.12 K
stdlib map 5.07 K - 1.01x slower
tail-recursive 4.38 K - 1.17x slower
##### With input Middle (100 Thousand) #####
Comparison:
body-recursive 491.16
stdlib map 488.45 - 1.01x slower
tail-recursive 399.08 - 1.23x slower
##### With input Big (1 Million) #####
Comparison:
tail-recursive 35.36
body-recursive 25.69 - 1.38x slower
stdlib map 24.85 - 1.42x slower
##### With input Bigger (5 Million) #####
Comparison:
tail-recursive 6.93
body-recursive 4.92 - 1.41x slower
stdlib map 4.87 - 1.42x slower
TCO
Excursion into Statistics
average = total_time / iterations
Average
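In code this is just the sum of all measured run times divided by how many there were (made-up sample values, purely illustrative):

samples = [100, 120, 140, 200] # run times in ÎŒs
average = Enum.sum(samples) / length(samples)
# => 140.0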
Why not just take the average?
defp standard_deviation(samples, average, iterations) do
total_variance = Enum.reduce samples, 0, fn(sample, total) ->
total + :math.pow((sample - average), 2)
end
variance = total_variance / iterations
:math.sqrt variance
end
Standard Deviation
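A small self-contained version of the same computation (hypothetical module name, made-up sample values) shows what the number expresses: how far the individual run times scatter around the average.

defmodule Stats do
  # Population standard deviation: the square root of the average
  # squared distance of each sample from the mean.
  def standard_deviation(samples) do
    n = length(samples)
    average = Enum.sum(samples) / n
    total_variance = Enum.reduce samples, 0, fn(sample, total) ->
      total + :math.pow(sample - average, 2)
    end
    :math.sqrt(total_variance / n)
  end
end

Stats.standard_deviation([100, 120, 140, 200]) # ≈ 37.4 (wide spread)
Stats.standard_deviation([138, 139, 140, 143]) # ≈ 1.9 (narrow spread)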
Spread of Values
Raw Run Times
Histogram
Outliers
Low Standard Deviation
Standard Deviation
defp compute_median(run_times, iterations) do
sorted = Enum.sort(run_times)
middle = div(iterations, 2)
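# Note: Integer.is_odd/1 is a macro, so the surrounding module needs `require Integer`;
# to_float/1 is a small private helper (not shown here) that turns the integer run time into a float.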
if Integer.is_odd(iterations) do
sorted |> Enum.at(middle) |> to_float
else
(Enum.at(sorted, middle) +
Enum.at(sorted, middle - 1)) / 2
end
end
Median
Average
Median
Minimum & Maximum
Boxplot
Surprise findings
alias Benchee.Formatters.{Console, HTML}
map_fun = fn(i) -> [i, i * i] end
inputs = %{
"Small" => Enum.to_list(1..200),
"Medium" => Enum.to_list(1..1000),
"Bigger" => Enum.to_list(1..10_000)
}
Benchee.run(%{
"flat_map" =>
fn(list) -> Enum.flat_map(list, map_fun) end,
"map.flatten" => fn(list) ->
list
|> Enum.map(map_fun)
|> List.flatten
end
}, inputs: inputs,
formatters: [&Console.output/1, &HTML.output/1],
html: [file: "bench/output/flat_map.html"])
flat_map
##### With input Medium #####
Name ips average deviation median
map.flatten 15.51 K 64.48 ÎŒs ±17.66% 63.00 ÎŒs
flat_map 8.95 K 111.76 ÎŒs ±7.18% 112.00 ÎŒs
Comparison:
map.flatten 15.51 K
flat_map 8.95 K - 1.73x slower
flat_map
How fast is it really? Benchmarking in practice
base_map = (0..50)
|> Enum.zip(300..350)
|> Enum.into(%{})
# deep maps with 6 top level conflicts
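# (some_deep_map and some_deep_map_2 stand for nested maps defined
# elsewhere in the original benchmark script and not shown on the slide)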
orig = Map.merge base_map, some_deep_map
new = Map.merge base_map, some_deep_map_2
simple = fn(_key, _base, override) -> override end
Benchee.run %{
"Map.merge/2" => fn -> Map.merge orig, new end,
"Map.merge/3" =>
fn -> Map.merge orig, new, simple end,
}, formatters: [&Benchee.Formatters.Console.output/1,
&Benchee.Formatters.HTML.output/1],
html: %{file: "bench/output/merge_3.html"}
merge/2 vs merge/3
merge/2 vs merge/3
Is the merge/3 variant about 

– as fast as merge/2? (±20%)
– 2x slower than merge/2
– 5x slower than merge/2
– 10x slower than merge/2
– 20x slower than merge/2
Name ips average deviation median
Map.merge/2 1.64 M 0.61 ÎŒs ±11.12% 0.61 ÎŒs
Map.merge/3 0.0921 M 10.86 ÎŒs ±72.22% 10.00 ÎŒs
Comparison:
Map.merge/2 1.64 M
Map.merge/3 0.0921 M - 17.85x slower
merge/2 vs merge/3
defmodule MyMap do
def map_tco(list, function) do
Enum.reverse do_map_tco([], list, function)
end
defp do_map_tco(acc, [], _function) do
acc
end
defp do_map_tco(acc, [head | tail], function) do
do_map_tco([function.(head) | acc], tail, function)
end
def map_tco_arg_order(list, function) do
Enum.reverse do_map_tco_arg_order(list, function, [])
end
defp do_map_tco_arg_order([], _function, acc) do
acc
end
defp do_map_tco_arg_order([head | tail], func, acc) do
do_map_tco_arg_order(tail, func, [func.(head) | acc])
end
end
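Both variants return the same result; only the order of arguments in the generated code differs. A quick check (my addition):

fun = fn(i) -> i + 1 end
true = MyMap.map_tco([1, 2, 3], fun) == MyMap.map_tco_arg_order([1, 2, 3], fun)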
Does argument order make a difference?
##### With input Middle (100 Thousand) #####
Name ips average deviation median
stdlib map 490.02 2.04 ms ±7.76% 2.07 ms
body-recursive 467.51 2.14 ms ±7.34% 2.17 ms
tail-rec arg-order 439.04 2.28 ms ±17.96% 2.25 ms
tail-recursive 402.56 2.48 ms ±16.00% 2.46 ms
Comparison:
stdlib map 490.02
body-recursive 467.51 - 1.05x slower
tail-rec arg-order 439.04 - 1.12x slower
tail-recursive 402.56 - 1.22x slower
##### With input Big (1 Million) #####
Name ips average deviation median
tail-rec arg-order 39.76 25.15 ms ±10.14% 24.33 ms
tail-recursive 36.58 27.34 ms ±9.38% 26.41 ms
stdlib map 25.70 38.91 ms ±3.05% 38.58 ms
body-recursive 25.04 39.94 ms ±3.04% 39.64 ms
Comparison:
tail-rec arg-order 39.76
tail-recursive 36.58 - 1.09x slower
stdlib map 25.70 - 1.55x slower
body-recursive 25.04 - 1.59x slower
How fast is it really? Benchmarking in practice
But
 it cannot be!
“The order of arguments will likely matter when
we generate the branching code. The order of
arguments will specially matter if performing
binary matching.”
José Valim, 2016
(Comment Section of my blog!)
A wild José appears!
config
|> Benchee.init
|> Benchee.System.system
|> Benchee.benchmark("job", fn -> magic end)
|> Benchee.measure
|> Benchee.statistics
|> Benchee.Formatters.Console.output
|> Benchee.Formatters.HTML.output
A transformation of inputs
How fast is it really? Benchmarking in practice
Always do your own benchmarks!
alias Benchee.Formatters.{Console, HTML}
map_fun = fn(i) -> [i, i * i] end
inputs = %{
"Small" => Enum.to_list(1..200),
"Medium" => Enum.to_list(1..1000),
"Bigger" => Enum.to_list(1..10_000)
}
Benchee.run(%{
"flat_map" =>
fn(list) -> Enum.flat_map(list, map_fun) end,
"map.flatten" => fn(list) ->
list
|> Enum.map(map_fun)
|> List.flatten
end
}, inputs: inputs,
formatters: [&Console.output/1, &HTML.output/1],
html: [file: "bench/output/flat_map.html"])
Remember?
● Elixir 1.4.0-rc.0
● Erlang 19.1
● i5-7200U – 2 x 2.5GHz (Up to 3.10GHz)
● 8GB RAM
● Linux Mint 18 - 64 bit (Ubuntu 16.04 base)
● Linux Kernel 4.4.0-51
Mhm Upgrades
Erlang/OTP 19 [erts-8.1] [source] [64-bit] [smp:4:4] [async-
threads:10] [hipe] [kernel-poll:false]
Elixir 1.4.0-rc.0
Benchmark suite executing with the following configuration:
warmup: 2.0s
time: 5.0s
parallel: 1
inputs: Bigger, Medium, Small
Estimated total run time: 42.0s
Benchmarking with input Bigger:
Benchmarking flat_map...
Benchmarking map.flatten...
Benchmarking with input Medium:
Benchmarking flat_map...
Benchmarking map.flatten...
Benchmarking with input Small:
Benchmarking flat_map...
Benchmarking map.flatten...
flat_map
##### With input Bigger #####
Name ips average deviation median
flat_map 1.76 K 569.47 ÎŒs ±26.95% 512.00 ÎŒs
map.flatten 1.02 K 982.57 ÎŒs ±25.06% 901.00 ÎŒs
Comparison:
flat_map 1.76 K
map.flatten 1.02 K - 1.73x slower
##### With input Medium #####
Name ips average deviation median
flat_map 21.39 K 46.76 ÎŒs ±19.24% 48.00 ÎŒs
map.flatten 14.99 K 66.71 ÎŒs ±18.13% 65.00 ÎŒs
Comparison:
flat_map 21.39 K
map.flatten 14.99 K - 1.43x slower
##### With input Small #####
Name ips average deviation median
flat_map 118.66 K 8.43 ÎŒs ±180.99% 8.00 ÎŒs
map.flatten 79.25 K 12.62 ÎŒs ±97.97% 12.00 ÎŒs
Comparison:
flat_map 118.66 K
map.flatten 79.25 K - 1.50x slower
The tables have turned!
> 2x faster
How fast is it really? Benchmarking in practice
How did that happen?
How fast is it really? Benchmarking in practice
18 minutes later...
How fast is it really? Benchmarking in practice
Enjoy Benchmarking! ❀
Tobias Pfeiffer
@PragTob
pragtob.info
github.com/PragTob/benchee
