boost-histogram and hist
Henry Schreiner
April 15, 2019
Histograms in Python
1/27Henry Schreiner boost-histogram and hist April 15, 2019
Current state of histograms in Python
Core library: numpy
• Historically slow
• No histogram object
• Plotting is separate
Other libraries
• Narrow focus: speed, plotting, or language
• Many are abandoned
• Poor design, backends, distribution
HistBook
Histogrammar
pygram11
rootplotlib
PyROOT
YODA
physt
fast-histogram
qhist
Vaex
hdrhistogram
multihist
matplotlib-hep
pyhistogram
histogram
SimpleHist
paida
theodoregoetz
numpy
What is needed?
Design
• A histogram should be an object
• Manipulation and plotting should be easy
Performance
• Fast single-threaded filling
• Multithreaded filling (since it’s 2019)
Flexibility
• Axes options: sparse, growing, labels
• Storage: integers, weights, errors…
Distribution
• Easy to use anywhere, pip or conda
• Should have wheels, be easy to build, etc.
Future of histograms in Python
• Core histogramming libraries: boost-histogram, ROOT
• Universal adaptor: Aghast
• Front ends (plotting, etc.): hist, mpl-hep, physt, others
Boost::Histogram (C++14)
Intro to Boost::Histogram
• Multidimensional templated header-only histogram library: /boostorg/histogram
• Designed by Hans Dembinski, inspired by ROOT, GSL, and histbook
Histogram
• Axes
• Storages
• Accumulators
Axes types
• Regular, Circular
• Variable
• Integer
• Category
[Figure: a histogram combines axes (a regular axis, a regular axis with log transform; optional underflow and overflow bins), a storage (static or dynamic), and an accumulator (int, double, unlimited, ...).]
Boost 1.70 now released with Boost::Histogram!
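As a rough illustration of the axis concept above, here is a minimal pure-Python sketch of how a regular axis might map a value to a bin index, with underflow and overflow bins and an optional transform such as a logarithm. The class name RegularAxis, its arguments, and the convention of returning -1 for underflow and n for overflow are assumptions made for this sketch; it is not Boost::Histogram's implementation.

```python
import math


class RegularAxis:
    """Hypothetical regular axis: n equal-width bins on [start, stop)."""

    def __init__(self, n, start, stop, transform=None):
        self.n = n
        # An optional transform (e.g. math.log10) makes the bins
        # equal-width in the transformed space instead.
        self.transform = transform or (lambda x: x)
        self.lo = self.transform(start)
        self.hi = self.transform(stop)

    def index(self, value):
        """Return -1 for underflow, n for overflow, else a bin in 0..n-1."""
        z = (self.transform(value) - self.lo) / (self.hi - self.lo)
        if z < 0:
            return -1
        if z >= 1:
            return self.n
        return int(z * self.n)


linear = RegularAxis(10, 0.0, 1.0)
log_axis = RegularAxis(3, 1.0, 1000.0, transform=math.log10)

print(linear.index(0.25), linear.index(-0.5), linear.index(1.0))
print([log_axis.index(v) for v in (5.0, 50.0, 500.0)])
```

With the log transform, each of the three bins spans one decade, so 5, 50, and 500 land in consecutive bins.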
boost-histogram (Python)
Intro to the Python bindings
• Boost::Histogram developed with Python in mind
• Original bindings based on Boost::Python
▶ Hard to build and distribute
▶ Somewhat limited
• New bindings: /scikit-hep/boost-histogram
▶ 0-dependency build (C++14 only)
▶ State-of-the-art PyBind11
Design, Flexibility, Speed, Distribution
Design
• Supports Python 2.7 and 3.4+
• 260+ unit tests run on Azure on Linux, macOS, and Windows
• Up to 16 axes supported (may go up or down)
• 1D, 2D, and ND histograms all have the same interface
Tries to stay close to the original Boost::Histogram where possible.
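C++
#include <boost/histogram.hpp>
namespace bh = boost::histogram;
// 2D histogram: 2 bins in x and 4 bins in y, both from 0 to 1
auto hist = bh::make_histogram(
    bh::axis::regular<>{2, 0, 1, "x"},
    bh::axis::regular<>{4, 0, 1, "y"});
hist(.2, .3);  // fill one entry at x = 0.2, y = 0.3
Python
import boost.histogram as bh
# The same 2D histogram through the Python bindings
hist = bh.make_histogram(
    bh.axis.regular(2, 0, 1, metadata="x"),
    bh.axis.regular(4, 0, 1, metadata="y"))
hist(.2, .3)  # fill one entry at x = 0.2, y = 0.3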
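To illustrate how 1D, 2D, and ND histograms can share one interface, here is a minimal pure-Python sketch that keeps all bins in one flat list and fills by calling the object, loosely mirroring hist(.2, .3) above. The Histogram class, its counts list, and the at method are hypothetical names for this sketch, and out-of-range values are simply dropped; this is not how the bindings are implemented.

```python
class Histogram:
    """Hypothetical N-dimensional histogram over regular axes.

    Each axis is a (bins, start, stop) tuple.
    """

    def __init__(self, *axes):
        self.axes = axes
        size = 1
        for bins, _, _ in axes:
            size *= bins
        self.counts = [0] * size  # flat, row-major storage

    def __call__(self, *values):
        """Fill one entry; the call looks the same for any number of axes."""
        if len(values) != len(self.axes):
            raise ValueError("expected one value per axis")
        flat = 0
        for (bins, start, stop), value in zip(self.axes, values):
            if not start <= value < stop:
                return  # out of range: ignored in this sketch
            i = int((value - start) / (stop - start) * bins)
            flat = flat * bins + min(i, bins - 1)
        self.counts[flat] += 1

    def at(self, *indices):
        """Read the count of the bin with the given per-axis indices."""
        flat = 0
        for (bins, _, _), i in zip(self.axes, indices):
            flat = flat * bins + i
        return self.counts[flat]


hist1d = Histogram((4, 0.0, 1.0))
hist2d = Histogram((2, 0.0, 1.0), (4, 0.0, 1.0))
hist1d(0.3)
hist2d(0.2, 0.3)
print(hist1d.at(1), hist2d.at(0, 1))
```

The only thing that changes with dimensionality is the number of arguments; the storage and the fill loop stay the same.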
Design: Manipulations
Combine two histograms
    hist1 + hist2
Scale a histogram
    hist * 2.0
Project a 3D histogram to 2D
    hist.project(0, 1)  # select the axes to keep
Sum a histogram's contents
    hist.sum()
Access an axis
    axis0 = hist.axis(0)
    axis0.edges()  # The edges array
    axis0.bin(1)   # The bin accessors
Fill 2D histogram with values or arrays
    hist(x, y)
Fill copies in 4 threads, then merge
    hist.fill_threaded(4, x, y)
Fill in 4 threads (atomic storage only)
    hist.fill_atomic(4, x, y)
Convert to Numpy, zero-copy
    hist.view()
    # Or
    np.asarray(hist)
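The "fill copies in threads, then merge" idea can be sketched in pure Python as follows. This is only an illustration of the pattern: the function names fill and fill_threaded and their arguments are assumptions for this sketch, and plain Python threads will not reproduce the speedups measured in this talk.

```python
from concurrent.futures import ThreadPoolExecutor


def fill(counts, values, bins, start, stop):
    """Serial fill of a 1D regular histogram; out-of-range values are skipped."""
    for v in values:
        if start <= v < stop:
            i = int((v - start) / (stop - start) * bins)
            counts[min(i, bins - 1)] += 1
    return counts


def fill_threaded(threads, values, bins, start, stop):
    """Fill one private copy per thread, then merge the copies by addition."""
    chunks = [values[i::threads] for i in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(
            lambda chunk: fill([0] * bins, chunk, bins, start, stop), chunks))
    # Merging is bin-wise addition, the same operation as hist1 + hist2.
    return [sum(column) for column in zip(*partials)]


values = [(i * 37 % 100) / 100 for i in range(1000)]
serial = fill([0] * 10, values, 10, 0.0, 1.0)
merged = fill_threaded(4, values, 10, 0.0, 1.0)
print(merged == serial, sum(merged))
```

Because every thread owns its copy, no locking or atomic storage is needed; the cost is one extra set of bins per thread plus the final merge.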
Flexibility: Axis
• bh.axis.regular
▶ bh.axis.regular_uoflow
▶ bh.axis.regular_noflow
▶ bh.axis.regular_growth
• bh.axis.circular
• bh.axis.regular_log
• bh.axis.regular_sqrt
• bh.axis.regular_pow
• bh.axis.integer
• bh.axis.integer_noflow
• bh.axis.integer_growth
• bh.axis.variable
• bh.axis.category_int
• bh.axis.category_int_growth
bh.axis.regular(10,0,1)            # ticks shown at 0, 0.5, 1
bh.axis.circular(8,0,2*np.pi)      # ticks shown at 0 (= 2𝜋), 𝜋/2, 𝜋, 3𝜋/2
bh.axis.variable([0,.3,.5,1])      # edges at 0, 0.3, 0.5, 1
bh.axis.integer(0,5)               # bins 0, 1, 2, 3, 4
bh.axis.category_int([2,5,8,3,7])  # categories 2, 5, 8, 3, 7
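A minimal pure-Python sketch of how the other axis types might turn a value into a bin index is shown here. The function names and the index conventions (-1 for underflow, one past the last bin for overflow) are assumptions for this sketch, not the library's API.

```python
import math
from bisect import bisect_right


def circular_index(value, bins, start, stop):
    """Circular axis: values wrap around, so nothing falls outside."""
    period = stop - start
    z = ((value - start) % period) / period
    return min(int(z * bins), bins - 1)


def variable_index(value, edges):
    """Variable axis: bins have arbitrary widths set by sorted edges."""
    if value < edges[0]:
        return -1              # underflow
    if value >= edges[-1]:
        return len(edges) - 1  # overflow
    return bisect_right(edges, value) - 1


def integer_index(value, start, stop):
    """Integer axis: one bin per integer in [start, stop)."""
    if value < start:
        return -1
    return min(value - start, stop - start)


def category_index(value, categories):
    """Category axis: bins are labeled by integers, in the order given."""
    try:
        return categories.index(value)
    except ValueError:
        return len(categories)  # unknown category: overflow in this sketch


print(circular_index(2 * math.pi + 1.0, 8, 0.0, 2 * math.pi))
print(variable_index(0.4, [0, 0.3, 0.5, 1]))
print(integer_index(3, 0, 5))
print(category_index(8, [2, 5, 8, 3, 7]))
```

The circular axis is the only one here without an out-of-range case: a value one full turn later lands in the same bin.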
Flexibility: Storage types
• bh.storage.int
• bh.storage.double
• bh.storage.unlimited (WIP)
• bh.storage.atomic_int
• bh.storage.weight (WIP)
• bh.storage.profile (WIP, needs sampled fill)
• bh.storage.weighted_profile (WIP, needs sampled fill)
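To make the difference between these storages concrete, here is a minimal pure-Python sketch of what a single bin might hold for a weight storage and for a profile storage. The class names WeightedSum and Mean and their fields are assumptions for this sketch. It also shows why profiles need a sampled fill: each fill must carry a sample value in addition to the coordinates that select the bin.

```python
class WeightedSum:
    """Hypothetical weighted bin: sum of weights and sum of squared weights."""

    def __init__(self):
        self.value = 0.0
        self.variance = 0.0

    def fill(self, weight=1.0):
        self.value += weight
        self.variance += weight * weight


class Mean:
    """Hypothetical profile bin: running mean and variance of the samples."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._sum_sq_dev = 0.0

    def fill(self, sample):
        # Welford's update keeps the running mean numerically stable.
        self.count += 1
        delta = sample - self.mean
        self.mean += delta / self.count
        self._sum_sq_dev += delta * (sample - self.mean)

    @property
    def variance(self):
        if self.count < 2:
            return 0.0
        return self._sum_sq_dev / (self.count - 1)


w = WeightedSum()
for weight in (0.5, 2.0, 1.5):
    w.fill(weight)

m = Mean()
for sample in (1.0, 2.0, 3.0, 4.0):
    m.fill(sample)

print(w.value, w.variance, m.mean, m.variance)
```

An int or double storage needs only one number per bin; these accumulators trade extra memory per bin for error estimates.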
Performance
The following measurements are with:
1D
• 100 regular bins
• 10,000,000 entries
2D
• 100x100 regular bins
• 1,000,000 entries
See my histogram performance post for measurements of other libraries.
Performance: macOS, dual core, 1D
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        149.4 ms   1x
Any      int      -        236 ms     0.63x
Regular  int      -        86.23 ms   1.7x
Regular  aint     1        132 ms     1.1x
Regular  aint     2        168.2 ms   0.89x
Regular  aint     4        143.6 ms   1x
Regular  int      1        84.75 ms   1.8x
Regular  int      2        51.6 ms    2.9x
Regular  int      4        42.39 ms   3.5x
Performance: CentOS7, 24 core, 1D (anaconda)
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        121 ms     1x
Any      int      -        261.5 ms   0.46x
Regular  int      -        142.2 ms   0.85x
Regular  aint     1        319.1 ms   0.38x
Regular  aint     48       272.9 ms   0.44x
Regular  int      1        243.4 ms   0.5x
Regular  int      6        94.76 ms   1.3x
Regular  int      12       71.38 ms   1.7x
Regular  int      24       52.26 ms   2.3x
Regular  int      48       43.01 ms   2.8x
Performance: KNL, 64 core, 1D (anaconda)
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        716.9 ms   1x
Any      int      -        1418 ms    0.51x
Regular  int      -        824 ms     0.87x
Regular  aint     1        871.7 ms   0.82x
Regular  aint     4        437.1 ms   1.6x
Regular  aint     64       198.8 ms   3.6x
Regular  aint     128      186.8 ms   3.8x
Regular  aint     256      195.2 ms   3.7x
Regular  int      1        796.9 ms   0.9x
Regular  int      2        430.6 ms   1.7x
Regular  int      4        247.6 ms   2.9x
Regular  int      64       88.77 ms   8.1x
Regular  int      128      98.08 ms   7.3x
Regular  int      256      112.2 ms   6.4x
Performance: macOS, dual core, 2D
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        121.1 ms   1x
Any      int      -        37.12 ms   3.3x
Regular  int      -        18.5 ms    6.5x
Regular  aint     1        20.21 ms   6x
Regular  aint     2        14.17 ms   8.5x
Regular  aint     4        10.23 ms   12x
Regular  int      1        17.86 ms   6.8x
Regular  int      2        9.41 ms    13x
Regular  int      4        6.854 ms   18x
Performance: CentOS7, 24 core, 2D (anaconda)
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        87.27 ms   1x
Any      int      -        41.42 ms   2.1x
Regular  int      -        21.67 ms   4x
Regular  aint     1        38.61 ms   2.3x
Regular  aint     6        19.89 ms   4.4x
Regular  aint     24       9.556 ms   9.1x
Regular  aint     48       8.518 ms   10x
Regular  int      1        36.5 ms    2.4x
Regular  int      6        8.976 ms   9.7x
Regular  int      12       5.318 ms   16x
Regular  int      24       4.388 ms   20x
Regular  int      48       5.839 ms   15x
Performance: KNL, 64 core, 2D (anaconda)
Type     Storage  Threads  Fill Time  Speedup
Numpy    uint64   -        439.5 ms   1x
Any      int      -        250.6 ms   1.8x
Regular  int      -        135.6 ms   3.2x
Regular  aint     1        142.2 ms   3.1x
Regular  aint     4        52.71 ms   8.3x
Regular  aint     32       12.05 ms   36x
Regular  aint     64       16.5 ms    27x
Regular  aint     256      43.93 ms   10x
Regular  int      1        141.1 ms   3.1x
Regular  int      2        70.78 ms   6.2x
Regular  int      4        36.11 ms   12x
Regular  int      64       18.93 ms   23x
Regular  int      128      36.09 ms   12x
Regular  int      256      55.64 ms   7.9x
Performance: Summary
System         1D max speedup  2D max speedup
macOS 1 core   1.7x            6.5x
macOS 2 core   3.5x            18x
Linux 1 core   0.85x           4x
Linux 24 core  2.8x            20x
KNL 1 core     0.87x           3.2x
KNL 64 core    8.1x            36x
• Note that Numpy 1D is well optimized (last few versions)
• Anaconda versions may provide a few more optimizations to Numpy
• Mixing axes types in boost-histogram can reduce performance by 2-3x
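The Speedup column appears to be the Numpy fill time divided by the boost-histogram fill time, rounded to two significant figures. A small sketch that recomputes a few summary entries from the fill times in the tables above, under that assumption (the speedup helper and the row labels are made up for this example):

```python
def speedup(numpy_ms, fill_ms):
    """Speedup over the Numpy baseline, to two significant figures."""
    return f"{numpy_ms / fill_ms:.2g}x"


# (label, Numpy time in ms, boost-histogram time in ms), copied from the tables
rows = [
    ("macOS 1D, Regular int", 149.4, 86.23),
    ("macOS 1D, Regular int, fastest", 149.4, 42.39),
    ("KNL 1D, Regular int, fastest", 716.9, 88.77),
    ("KNL 2D, Regular aint, fastest", 439.5, 12.05),
]
for label, numpy_ms, fill_ms in rows:
    print(f"{label}: {speedup(numpy_ms, fill_ms)}")
```

This prints 1.7x, 3.5x, 8.1x, and 36x, matching the corresponding entries in the summary.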
Distribution
• We must provide excellent distribution.
▶ If anyone writes pip install boost-histogram and it fails, we have failed.
• Docker ManyLinux1 GCC 8.3: /scikit-hep/manylinuxgcc
Wheels
• manylinux1 32, 64 bit (ready)
• manylinux2010 64 bit (planned)
• macOS 10.9+ (wip)
• Windows 32, 64 bit, Python 3.6+ (wip)
▶ Is Python 2.7 Windows needed?
Source
• SDist (ready)
• Build directly from GitHub (done)
Conda
• conda package (planned, easy)
python -m pip install git+https://github.com/scikit-hep/boost-histogram.git@develop
Plans
• Add shortcuts for axis types, fill out axis types
• Allow view access into unlimited storage histograms
• Add from_numpy and numpy style shortcut(s)
• Filling
▶ Samples
▶ Weights
▶ Non-numerical fill (if possible)
• Add profile, weighted_profile histograms
• Add reduce operations
• Release to PyPI
• Add some docs and read the docs support
First alpha release planned this week
Bikeshedding (API discussion)
Let’s discuss API! (On GitHub issues or gitter)
• Download: pip install boost-histogram (WIP)
• Use: import boost.histogram as bh
• Create: hist = bh.histogram(bh.axis.regular(12,0,1))
• Fill: hist(values)
• Access values, convert to numpy, etc.
Documentation
• The documentation will also need useful examples, feel free to contribute!
hist
A slide about hist
hist is the ‘wrapper’ piece that does plotting and interacts with the rest of the ecosystem.
Plans
• Easy plotting adaptors (mpl-hep)
• Serialization formats (ROOT, HDF5)
• Auto-multithreading
• Statistical functions (Like TEfficiency)
• Multihistograms (HistBook)
• Interaction with fitters (ZFit, GooFit, etc)
• Bayesian Blocks algorithm from SciKit-HEP
• Command line histograms for stream of numbers
Call for contributions
• What do you need?
• What do you want?
• What would you like?
Join in the development! This should combine the best features of other packages.
Questions?
Backup
• Supported by IRIS-HEP, NSF OAC-1836650