Yung-Yu Chen (@yungyuc)
On the necessity and inapplicability of Python
Help us develop numerical software
Who I am
• I am a mechanical engineer by training, focusing on
applications of continuum mechanics. A computational
scientist / engineer rather than a computer scientist.

• In my day job, I write high-performance code for
semiconductor applications of computational geometry
and lithography.

• In my spare time, I teach a course, ‘numerical
software development’, in the dept. of computer science
at NCTU.
You can contact me through twitter: https://twitter.com/yungyuc
or linkedin: https://www.linkedin.com/in/yungyuc/.
PyHUG
• Python Hsinchu User Group (established in late
2011)

• The first group of staff of PyCon Taiwan (2012)

• Weekly meetups at a pub for 3 years, not
stopped by COVID-19

• 7+ active user groups in Taiwan 

• I have been to PyConJP in 2012, 2013 (APAC),
2015, 2019

• Last year I led a visiting group to PyConJP (thank
you Terada san for sharing the know-how!)

• I hope we can do more
PyCon Taiwan
5-6 Sep, 2020, Tainan, Taiwan

• It is planned to be an on-site conference
(unless something incredibly bad
happens again)

• Speakers may choose to speak online

• We still need to wear a face mask

• We appreciate the citizens and
government of Taiwan, who work hard to
counter COVID-19

• https://g0v.hackmd.io/@kiang/mask-info

• We hope to see you again in Taiwan!
https://tw.pycon.org/2020/
Numerical software
• Numerical software: Computer programs to solve scientific or
mathematical problems.

• Other names: Mathematical software, scientific software, technical
software.

• Python is a popular language for application experts to describe the
problems and solutions, because it is easy to use.

• Most of the computing systems (the numerical software) are designed in
a hybrid architecture.

• The computing kernel uses C++.

• Python is chosen for the user-level API.
Example: OPC
Photolithography in semiconductor fabrication
[Figure: a light source shines through a photomask onto photoresist (PR) on a silicon substrate.]
• The wavelength is only hundreds of nm.
• The image I want to project on the PR is smaller than the wavelength.
Optical proximity correction (OPC)
• Work out the shape I need on the mask.
• Write code to make it happen.
Example: PDEs
Numerical simulations of conservation laws:

$$\frac{\partial u}{\partial t} + \sum_{k=1}^{3} \frac{\partial F^{(k)}(u)}{\partial x_k} = 0$$

Use case: stress waves in anisotropic solids
Use case: compressible flows
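As a minimal sketch of what simulating a conservation law can look like, the block below solves a much simpler case than the slide's equation: one space dimension, a scalar u, and the flux F(u) = a u with a > 0. The first-order upwind scheme, the periodic boundary, the grid size, and the Gaussian initial pulse are all assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def upwind_step(u, a, dx, dt):
    """Advance u_t + (a u)_x = 0 by one step with a first-order upwind scheme.

    Assumes a > 0 and a periodic domain (illustrative choices).
    """
    flux = a * u  # F(u) = a u evaluated in every cell
    # u_i <- u_i - dt/dx * (F_i - F_{i-1}); np.roll supplies the periodic wrap.
    return u - dt / dx * (flux - np.roll(flux, 1))

nx = 100
dx = 1.0 / nx
a = 1.0
dt = 0.5 * dx / a                      # CFL number 0.5 keeps the scheme stable
x = (np.arange(nx) + 0.5) * dx         # cell centers
u = np.exp(-200.0 * (x - 0.3) ** 2)    # initial Gaussian pulse

total_before = u.sum() * dx
for _ in range(100):
    u = upwind_step(u, a, dx, dt)
total_after = u.sum() * dx

# The flux differences telescope, so the total is conserved up to round-off.
print(total_before, total_after)
```

The point of the sketch is the conservation property: whatever leaves one cell enters its neighbor, so the integral of u does not change.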
Example: What others do
• Machine learning

• Examples: TensorFlow, PyTorch

• Also:

• Computer-aided design and engineering (CAD/CAE)

• Computer graphics and visualization

• Hybrid architecture provides both speed and flexibility

• C++ makes it possible to do huge amounts of calculation, e.g.,
distributed computing across thousands of computers

• Python helps describe complex problems in mathematics or science
Crunch real numbers
• Simple example: solve the Laplace equation

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \quad (0 < x < 1;\ 0 < y < 1)$$
$$u(0,y) = 0, \quad u(1,y) = \sin(\pi y) \quad (0 \le y \le 1)$$
$$u(x,0) = 0, \quad u(x,1) = 0 \quad (0 \le x \le 1)$$

• Use a two-dimensional array as the spatial grid

• Point-Jacobi method: 3-level nested loop
import numpy as np

# uoriginal (the initial grid with boundary values) and nx (points per side)
# are defined elsewhere.
def solve_python_loop():
    u = uoriginal.copy()
    un = u.copy()
    converged = False
    step = 0
    # Outer loop.
    while not converged:
        step += 1
        # Inner loops. One for x and the other for y.
        for it in range(1, nx-1):
            for jt in range(1, nx-1):
                un[it,jt] = (u[it+1,jt] + u[it-1,jt]
                             + u[it,jt+1] + u[it,jt-1]) / 4
        # Largest change between two successive iterations.
        norm = np.abs(un-u).max()
        u[...] = un[...]
        converged = norm < 1.e-5
    return u, step, norm
Non-trivial boundary condition: u(1,y) = sin(πy)
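The solver reads `uoriginal` and `nx` without showing how they are built. The sketch below is one possible setup, not the talk's own: the helper names `make_grid` and `exact_solution` and the grid size `nx = 51` are assumptions. It also includes the closed-form solution of this boundary-value problem, sinh(πx) sin(πy) / sinh(π), which is handy for checking a numerical result.

```python
import numpy as np

def make_grid(nx):
    """Initial grid; u[i, j] approximates u(x_i, y_j) on the unit square."""
    y = np.linspace(0.0, 1.0, nx)
    u = np.zeros((nx, nx), dtype=np.float64)
    u[-1, :] = np.sin(np.pi * y)   # u(1, y) = sin(pi y); the other edges stay 0
    return u

def exact_solution(nx):
    """Closed-form solution sinh(pi x) sin(pi y) / sinh(pi) on the same grid."""
    x = np.linspace(0.0, 1.0, nx)
    gx, gy = np.meshgrid(x, x, indexing="ij")
    return np.sinh(np.pi * gx) * np.sin(np.pi * gy) / np.sinh(np.pi)

nx = 51                       # assumed grid size, not given on the slide
uoriginal = make_grid(nx)
exact = exact_solution(nx)
```

With these names in place, the result of `solve_python_loop()` can be compared against `exact`; the two should differ only by the discretization error and the loose convergence tolerance.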
Power of Numpy and C++
def solve_numpy_array():
    u = uoriginal.copy()
    un = u.copy()
    converged = False
    step = 0
    while not converged:
        step += 1
        # One Jacobi sweep over all interior points, done with array slices.
        un[1:nx-1,1:nx-1] = (u[2:nx,1:nx-1] + u[0:nx-2,1:nx-1] +
                             u[1:nx-1,2:nx] + u[1:nx-1,0:nx-2]) / 4
        norm = np.abs(un-u).max()
        u[...] = un[...]
        converged = norm < 1.e-5
    return u, step, norm
CPU times: user 62.1 ms, sys: 1.6 ms, total: 63.7 ms
Wall time: 63.1 ms: Pretty good!
For comparison, solve_python_loop() (the nested-loop code from the previous slide):
CPU times: user 5.24 s, sys: 22.5 ms, total: 5.26 s
Wall time: 5280 ms: Poor speed
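The slicing expression replaces the two inner loops with four shifted views of the same array. A small check, assuming nothing beyond numpy, that one sweep written both ways produces the same result (the function names `sweep_loop` and `sweep_slice` are made up for this sketch):

```python
import numpy as np

def sweep_loop(u):
    """One Jacobi sweep with explicit Python loops."""
    nx = u.shape[0]
    un = u.copy()
    for it in range(1, nx-1):
        for jt in range(1, nx-1):
            un[it, jt] = (u[it+1, jt] + u[it-1, jt] + u[it, jt+1] + u[it, jt-1]) / 4
    return un

def sweep_slice(u):
    """The same sweep with array slices: each slice is the grid shifted by one."""
    nx = u.shape[0]
    un = u.copy()
    un[1:nx-1, 1:nx-1] = (u[2:nx, 1:nx-1] + u[0:nx-2, 1:nx-1] +
                          u[1:nx-1, 2:nx] + u[1:nx-1, 0:nx-2]) / 4
    return un

rng = np.random.default_rng(0)
u = rng.random((6, 6))            # small random square grid
same = np.allclose(sweep_loop(u), sweep_slice(u))
print(same)
```

The arithmetic is identical; only the place where the loop runs changes, from the Python interpreter to numpy's compiled code.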
#include <cstddef>
#include <tuple>
#include <xtensor/xarray.hpp>
#include <xtensor/xmath.hpp>

std::tuple<xt::xarray<double>, size_t, double>
solve_cpp(xt::xarray<double> u)
{
    const size_t nx = u.shape(0);
    xt::xarray<double> un = u;
    bool converged = false;
    size_t step = 0;
    double norm = 0;
    while (!converged)
    {
        ++step;
        // One Jacobi sweep over the interior points.
        for (size_t it=1; it<nx-1; ++it)
        {
            for (size_t jt=1; jt<nx-1; ++jt)
            {
                un(it,jt) = (u(it+1,jt) + u(it-1,jt) + u(it,jt+1) + u(it,jt-1)) / 4;
            }
        }
        // Largest change between two successive iterations.
        norm = xt::amax(xt::abs(un-u))();
        if (norm < 1.e-5) { converged = true; }
        u = un;
    }
    return std::make_tuple(u, step, norm);
}
CPU times: user 29.7 ms, sys: 506 µs, total: 30.2 ms
Wall time: 29.9 ms: Definitely good!
Pure Python 5280 ms
Numpy 63.1 ms
C++ 29.9 ms
Pure Python → Numpy: 83.7x
Numpy → C++: 2.1x
Pure Python → C++: 176.6x
The speed is the reason

1000 computers → 5.67 computers

Save a lot of $
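The ratios above follow directly from the three wall times. The short calculation below reproduces them; the dictionary keys are made-up labels. Dividing 1000 machines by the unrounded pure-Python-to-C++ ratio gives about 5.66, slightly below the 5.67 shown on the slide, presumably because the slide worked from differently rounded timings.

```python
# Wall times reported on the slide, in milliseconds.
wall_ms = {"pure_python": 5280.0, "numpy": 63.1, "cpp": 29.9}

python_to_numpy = wall_ms["pure_python"] / wall_ms["numpy"]
numpy_to_cpp = wall_ms["numpy"] / wall_ms["cpp"]
python_to_cpp = wall_ms["pure_python"] / wall_ms["cpp"]

# Machines needed in C++ to match 1000 machines running pure Python.
machines = 1000 / python_to_cpp

print(f"{python_to_numpy:.1f}x {numpy_to_cpp:.1f}x {python_to_cpp:.1f}x")
print(f"{machines:.2f} computers")
```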
Recap: Why Python?
• Python is slow, but numpy may be reasonably fast.

• Coding in C++ is time-consuming.

• C++ is only needed in the computing kernel.

• Most code is supportive code, but it must not slow down the
computing kernel.

• Python makes it easier to organize and structure the code.

This is why high-performance systems usually use a hybrid
architecture (C++ with Python or another scripting language).
Let’s go hybrid, but …
• A dilemma:

• Engineers (domain experts) know the problems but
don’t know C++ and software engineering.

• Computer scientists (programmers) know about C++
and software engineering but not the problems.

• Either side takes years of practice and study.

• Not a lot of people want to play both roles.
NSD: attempt to improve
• Numerical software development: a graduate-level
course

• Train computer scientists in the hybrid architecture
for numerical software

• https://github.com/yungyuc/nsd

• Runnable Jupyter notebooks
• Part 1: Start with Python
• Lecture 1: Introduction

• Lecture 2: Fundamental engineering practices

• Lecture 3: Python and numpy

• Part 2: Computer architecture for performance
• Lecture 4: C++ and computer architecture
• Lecture 5: Matrix operations

• Lecture 6: Cache optimization

• Lecture 7: SIMD

• Part 3: Resource management
• Lecture 8: Memory management

• Lecture 9: Ownership and smart pointers

• Part 4: How to write C++ for Python
• Lecture 10: Modern C++

• Lecture 11: C++ and C for Python

• Lecture 12: Array code in C++

• Lecture 13: Array-oriented design

• Part 5: Conclude with Python
• Lecture 14: Advanced Python

• Term project presentation
Memory hierarchy
• We go to C++ to make it easier to access hardware

• Modern computers have CPUs that are faster than memory

• High performance comes from hiding the memory-access latency

Approximate access latency:
registers (0 cycles)
L1 cache (4 cycles)
L2 cache (10 cycles)
L3 cache (50 cycles)
Main memory (200 cycles)
Disk (storage) (100,000 cycles)
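Memory layout can be observed even from Python. The sketch below, which assumes a C-ordered float64 numpy array and an arbitrary size of 1000 x 1000, sums the same array twice: once walking along contiguous rows and once jumping a full row between consecutive elements. The strided traversal typically runs slower because it uses the caches less effectively, but the actual timings depend on the machine, so none are quoted here.

```python
import time
import numpy as np

n = 1000                                  # arbitrary size for illustration
a = np.ones((n, n), dtype=np.float64)     # C order: each row is contiguous
print(a.strides)                          # bytes to step along each axis

def sum_rows_first(a):
    """Each partial sum reads one contiguous row."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i, :].sum()
    return total

def sum_cols_first(a):
    """Each partial sum reads one column, jumping a whole row between elements."""
    total = 0.0
    for j in range(a.shape[1]):
        total += a[:, j].sum()
    return total

for func in (sum_rows_first, sum_cols_first):
    t0 = time.perf_counter()
    result = func(a)
    elapsed = time.perf_counter() - t0
    print(func.__name__, result, f"{elapsed * 1e3:.1f} ms")
```

Both functions return the same number; only the order in which memory is touched differs.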
Data object
• Numerical software processes
huge amounts of data. Copying
it is expensive.

• Use a pipeline to process the
same block of data

• Use an object to manage the
data: data object

• Data objects may not always be a
good idea in other fields.

• Here we do what it takes for
uncompromised performance.
[Figure: a single Data block is accessed at all phases of the pipeline: field initialization, interior time-marching, boundary condition, parallel data sync, finalization.]
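A minimal sketch of the data-object idea in Python, under stated assumptions: the class `Field`, its method names, and the toy one-dimensional averaging step are all hypothetical and only mirror the phase names in the figure (the parallel data sync phase is left out). What it illustrates is that every phase works in place on the same buffer, so nothing is copied between phases.

```python
import numpy as np

class Field:
    """Hypothetical data object: owns the buffers that every phase reuses."""

    def __init__(self, nx):
        self.u = np.zeros(nx, dtype=np.float64)          # main data, allocated once
        self.work = np.empty(nx - 2, dtype=np.float64)   # scratch space, allocated once

    def initialize(self):
        """Field initialization: a unit spike in the middle."""
        self.u[:] = 0.0
        self.u[self.u.size // 2] = 1.0

    def march(self):
        """Interior time-marching: average the two neighbors, without new arrays."""
        np.add(self.u[:-2], self.u[2:], out=self.work)
        self.work *= 0.5
        self.u[1:-1] = self.work

    def apply_boundary(self):
        """Boundary condition: hold both ends at zero."""
        self.u[0] = 0.0
        self.u[-1] = 0.0

    def finalize(self):
        """Finalization: reduce the field to one number."""
        return float(self.u.sum())

    def address(self):
        """Memory address of the main buffer, to show that it never moves."""
        return self.u.__array_interface__["data"][0]

def run(field, steps):
    """Drive all phases over the same data object."""
    field.initialize()
    for _ in range(steps):
        field.march()
        field.apply_boundary()
    return field.finalize()

field = Field(11)
before = field.address()
total = run(field, 3)
print(field.address() == before, total)
```

The address check is the point of the example: the buffer allocated in the constructor is the one every phase reads and writes.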
Zero-copy: do it where it fits
[Figure: three panels. In each, a Python app and a C++ app work on the same memory buffer (a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn) shared across languages, through an Ndarray and a C++ container; "manage" and "access" labels mark the roles of the two objects.]
• Top (Python) - down (C++)
• Bottom (C++) - up (Python)
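The sharing can be tried out without any C++. In the sketch below, Python's standard `array.array` stands in for the C++ container that owns the memory (an assumption made only to keep the example in one language), and `np.frombuffer` wraps that memory as an ndarray without copying it.

```python
import array
import numpy as np

# Stand-in for a container that owns the memory outside numpy.
buf = array.array("d", [0.0] * 6)

nd = np.frombuffer(buf, dtype=np.float64)   # no copy: nd views buf's memory
mat = nd.reshape(2, 3)                      # same memory again, seen as 2 x 3

mat[1, 2] = 42.0        # write through the ndarray ...
print(buf[5])           # ... and read it back through the owning container
buf[0] = -1.0           # write through the container ...
print(mat[0, 0])        # ... and read it back through the ndarray

shared = np.shares_memory(nd, mat)
owns = nd.flags["OWNDATA"]
print(shared, owns)
```

The ndarray reports that it does not own its data, which matches the "manage" versus "access" split in the figure: one side is responsible for the buffer's lifetime and the other only uses it.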
More detail …
Notes about moving from Python to C++

• Python frame object

• Building Python extensions using pybind11 and cmake

• Inspecting assembly code

• x86 intrinsics

• PyObject, CPython API and pybind11 API

• Shared pointer, unique pointer, raw pointer, and ownership

• Template generic programming

https://tw.pycon.org/2020/en-us/events/talk/1164539411870777736/
How to learn
• Work on a real project.

• Keep in mind that pure Python can be 100x slower than C/C++ (176.6x in the Laplace example).

• Always profile (time).

• Don’t treat Python as simply Python.

• View Python as an interpreter library written in C.

• Use tools to call C/C++: Cython, pybind11, etc.
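"Always profile" can start with the standard library. The sketch below times a pure-Python loop against a numpy call that does the same arithmetic in compiled code; the function names and the problem size are arbitrary choices for illustration, and the timings it prints will vary from machine to machine.

```python
import timeit
import numpy as np

def sum_squares_loop(values):
    """Sum of squares with an interpreted Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_squares_numpy(values):
    """The same sum, delegated to numpy's compiled dot product."""
    return float(np.dot(values, values))

values = np.arange(100_000, dtype=np.float64)

for func in (sum_squares_loop, sum_squares_numpy):
    # Take the best of three runs to reduce noise from other processes.
    seconds = min(timeit.repeat(lambda: func(values), number=1, repeat=3))
    print(f"{func.__name__}: {seconds * 1e3:.3f} ms")
```

Measuring before and after each change is what shows whether moving a piece of code out of the interpreter actually paid off.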
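What we want
[Figure: workflow diagram with the stages "See problems", "Formulate the problems", "Get something working", "Automate", "Prototype", and "Reusable software"; two steps are marked "?". Note on the figure: one-time programs may happen.]
Thanks!
Questions?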
