Nature | Vol 585 | 17 September 2020 | 357
Review
Array programming with NumPy
Charles R. Harris1, K. Jarrod Millman2,3,4 ✉, Stéfan J. van der Walt2,4,5 ✉, Ralf Gommers6 ✉,
Pauli Virtanen7,8, David Cournapeau9, Eric Wieser10, Julian Taylor11, Sebastian Berg4,
Nathaniel J. Smith12, Robert Kern13, Matti Picus4, Stephan Hoyer14, Marten H. van Kerkwijk15,
Matthew Brett2,16, Allan Haldane17, Jaime Fernández del Río18, Mark Wiebe19,20,
Pearu Peterson6,21,22, Pierre Gérard-Marchant23,24, Kevin Sheppard25, Tyler Reddy26,
Warren Weckesser4, Hameer Abbasi6, Christoph Gohlke27 & Travis E. Oliphant6
Array programming provides a powerful, compact and expressive syntax for
accessing, manipulating and operating on data in vectors, matrices and
higher-dimensional arrays. NumPy is the primary array programming library for the
Python language. It has an essential role in research analysis pipelines in fields as
diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials
science, engineering, finance and economics. For example, in astronomy, NumPy was
an important part of the software stack used in the discovery of gravitational waves1
and in the first imaging of a black hole2. Here we review how a few fundamental array
concepts lead to a simple and powerful programming paradigm for organizing,
exploring and analysing scientific data. NumPy is the foundation upon which the
scientific Python ecosystem is constructed. It is so pervasive that several projects,
targeting audiences with specialized needs, have developed their own NumPy-like
interfaces and array objects. Owing to its central position in the ecosystem, NumPy
increasingly acts as an interoperability layer between such array computation
libraries and, together with its application programming interface (API), provides a
flexible framework to support the next decade of scientific and industrial analysis.
Two Python array packages existed before NumPy. The Numeric package
was developed in the mid-1990s and provided array objects and
array-aware functions in Python. It was written in C and linked to standard
fast implementations of linear algebra3,4. One of its earliest uses was
to steer C++ applications for inertial confinement fusion research at
Lawrence Livermore National Laboratory5. To handle large astronomical
images coming from the Hubble Space Telescope, a reimplementation
of Numeric, called Numarray, added support for structured arrays,
flexible indexing, memory mapping, byte-order variants, more efficient
memory use, flexible IEEE 754-standard error-handling capabilities, and
better type-casting rules6. Although Numarray was highly compatible
with Numeric, the two packages had enough differences that it divided
the community; however, in 2005 NumPy emerged as a ‘best of both
worlds’ unification7, combining the features of Numarray with the
small-array performance of Numeric and its rich C API.
Now, 15 years later, NumPy underpins almost every Python library
that does scientific or numerical computation8–11, including SciPy12,
Matplotlib13, pandas14, scikit-learn15 and scikit-image16. NumPy is a
community-developed, open-source library, which provides a
multidimensional Python array object along with array-aware functions
that operate on it. Because of its inherent simplicity, the NumPy array
is the de facto exchange format for array data in Python.
NumPy operates on in-memory arrays using the central processing
unit (CPU). To utilize modern, specialized storage and hardware, there
has been a recent proliferation of Python array packages. Unlike with
the Numarray–Numeric divide, it is now much harder for these new
libraries to fracture the user community, given how much work is
already built on top of NumPy. However, to provide the community with
access to new and exploratory technologies, NumPy is transitioning
into a central coordinating mechanism that specifies a well defined
array programming API and dispatches it, as appropriate, to
specialized array implementations.
NumPy arrays
The NumPy array is a data structure that efficiently stores and accesses
multidimensional arrays17 (also known as tensors), and enables a wide
variety of scientific computation. It consists of a pointer to memory,
along with metadata used to interpret the data stored there, notably
‘data type’, ‘shape’ and ‘strides’ (Fig. 1a).
https://doi.org/10.1038/s41586-020-2649-2
Received: 21 February 2020
Accepted: 17 June 2020
Published online: 16 September 2020
Open access
1 Independent researcher, Logan, UT, USA.
2 Brain Imaging Center, University of California, Berkeley, Berkeley, CA, USA.
3 Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA.
4 Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA.
5 Applied Mathematics, Stellenbosch University, Stellenbosch, South Africa.
6 Quansight, Austin, TX, USA.
7 Department of Physics, University of Jyväskylä, Jyväskylä, Finland.
8 Nanoscience Center, University of Jyväskylä, Jyväskylä, Finland.
9 Mercari JP, Tokyo, Japan.
10 Department of Engineering, University of Cambridge, Cambridge, UK.
11 Independent researcher, Karlsruhe, Germany.
12 Independent researcher, Berkeley, CA, USA.
13 Enthought, Austin, TX, USA.
14 Google Research, Mountain View, CA, USA.
15 Department of Astronomy and Astrophysics, University of Toronto, Toronto, Ontario, Canada.
16 School of Psychology, University of Birmingham, Edgbaston, Birmingham, UK.
17 Department of Physics, Temple University, Philadelphia, PA, USA.
18 Google, Zurich, Switzerland.
19 Department of Physics and Astronomy, The University of British Columbia, Vancouver, British Columbia, Canada.
20 Amazon, Seattle, WA, USA.
21 Independent researcher, Saue, Estonia.
22 Department of Mechanics and Applied Mathematics, Institute of Cybernetics at Tallinn Technical University, Tallinn, Estonia.
23 Department of Biological and Agricultural Engineering, University of Georgia, Athens, GA, USA.
24 France-IX Services, Paris, France.
25 Department of Economics, University of Oxford, Oxford, UK.
26 CCS-7, Los Alamos National Laboratory, Los Alamos, NM, USA.
27 Laboratory for Fluorescence Dynamics, Biomedical Engineering Department, University of California, Irvine, Irvine, CA, USA.
✉e-mail: millman@berkeley.edu; stefanv@berkeley.edu; ralf.gommers@gmail.com
The data type describes the nature of elements stored in an array.
An array has a single data type, and each element of an array occupies
the same number of bytes in memory. Examples of data types include
real and complex numbers (of lower and higher precision), strings,
timestamps and pointers to Python objects.
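As a minimal sketch of the data-type metadata described above (the variable names and sample values are illustrative assumptions, not taken from the Review), the following creates arrays of several data types and prints the fixed number of bytes each element occupies:

```python
import numpy as np

# Every array has exactly one data type; all of its elements have the same width.
reals = np.array([1.0, 2.0, 3.0], dtype=np.float32)        # lower-precision reals
cplx = np.array([1 + 2j, 3 - 1j], dtype=np.complex128)     # higher-precision complex
stamps = np.array(["2020-09-17"], dtype="datetime64[D]")   # timestamps
objs = np.array([{"a": 1}, None], dtype=object)            # pointers to Python objects

for arr in (reals, cplx, stamps, objs):
    # itemsize is the number of bytes occupied by each element
    print(arr.dtype, arr.itemsize)
```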
The shape of an array determines the number of elements along
each axis, and the number of axes is the dimensionality of the array.
For example, a vector of numbers can be stored as a one-dimensional
array of shape N, whereas colour videos are four-dimensional arrays
of shape (T, M, N, 3).
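To make the shape examples concrete, here is a small sketch; the sizes chosen for N, T and M (5, 10 and 48 by 64 pixels) are arbitrary assumptions for illustration:

```python
import numpy as np

# A vector of N = 5 numbers: one axis, so the array is one-dimensional.
vector = np.zeros(5)

# A colour video with T = 10 frames of M x N = 48 x 64 pixels and 3 colour channels.
video = np.zeros((10, 48, 64, 3), dtype=np.uint8)

print(vector.shape, vector.ndim)
print(video.shape, video.ndim)
```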
Strides are necessary to interpret computer memory, which stores
elements linearly, as multidimensional arrays. They describe the number
of bytes to move forward in memory to jump from row to row, column
to column, and so forth. Consider, for example, a two-dimensional
array of floating-point numbers with shape (4, 3), where each element
occupies 8 bytes in memory. To move between consecutive columns,
we need to jump forward 8 bytes in memory, and to access the next row,
3 × 8 = 24 bytes. The strides of that array are therefore (24, 8). NumPy
can store arrays in either C or Fortran memory order, iterating first over
either rows or columns. This allows external libraries written in those
languages to access NumPy array data in memory directly.
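The stride arithmetic in this paragraph can be checked directly. The sketch below builds the (4, 3) array of 8-byte floats from the text, then converts it to Fortran order with `np.asfortranarray`; the index `(2, 1)` used for the offset calculation is an arbitrary choice for illustration:

```python
import numpy as np

# C (row-major) order is the default: 24 bytes to the next row, 8 to the next column.
x = np.zeros((4, 3), dtype=np.float64)
print(x.strides)        # (24, 8)

# Fortran (column-major) order stores each column contiguously instead.
f = np.asfortranarray(x)
print(f.strides)        # (8, 32)

# The byte offset of element (i, j) is the index multiplied by the strides.
i, j = 2, 1
offset = i * x.strides[0] + j * x.strides[1]
print(offset)           # 56
```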
Users interact with NumPy arrays using ‘indexing’ (to access
subarrays or individual elements), ‘operators’ (for example, +, − and ×
for vectorized operations and @ for matrix multiplication), as well
as ‘array-aware functions’; together, these provide an easily readable,
expressive, high-level API for array programming while NumPy deals
with the underlying mechanics of making operations fast.
Indexing an array returns single elements, subarrays or elements
that satisfy a specific condition (Fig. 1b). Arrays can even be indexed
using other arrays (Fig. 1c). Wherever possible, indexing that retrieves a
subarray returns a ‘view’ on the original array such that data are shared
between the two arrays. This provides a powerful way to operate on
subsets of array data while limiting memory usage.
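The difference between views and copies can be observed with `np.shares_memory`. This sketch reuses the 4 × 3 example array of Fig. 1; the value 100 written through the view is an arbitrary choice:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

view = x[:, 1:]                    # slicing returns a view on x
mask_copy = x[x > 9]               # boolean-mask indexing returns a copy
fancy_copy = x[[0, 1], [1, 2]]     # integer-array indexing returns a copy

print(np.shares_memory(x, view))       # True
print(np.shares_memory(x, mask_copy))  # False

# Writing through the view changes the original array, because the data are shared.
view[0, 0] = 100
print(x[0, 1])                         # 100
```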
To complement the array syntax, NumPy includes functions that
perform vectorized calculations on arrays, including arithmetic,
statistics and trigonometry (Fig. 1d). Vectorization (operating on
entire arrays rather than their individual elements) is essential to array
programming. This means that operations that would take many tens
of lines to express in languages such as C can often be implemented as
a single, clear Python expression. This results in concise code and frees
users to focus on the details of their analysis, while NumPy handles
looping over array elements near-optimally, for example by taking
strides into consideration to best utilize the computer’s fast cache
memory.
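As an illustration of vectorization, the sketch below computes the same element-wise sum twice: once with explicit Python loops, in the style a C programmer might write, and once as a single array expression. The array contents are arbitrary:

```python
import numpy as np

a = np.arange(8, dtype=np.float64).reshape(4, 2)
b = np.ones((4, 2))

# Element-by-element version: explicit loops over both axes.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        looped[i, j] = a[i, j] + b[i, j]

# Vectorized version: one expression, with the looping done inside NumPy.
vectorized = a + b

print(np.array_equal(looped, vectorized))  # True
```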
When performing a vectorized operation (such as addition) on two
arrays with the same shape, it is clear what should happen. Through
‘broadcasting’ NumPy allows the dimensions to differ, and produces
results that appeal to intuition. A trivial example is the addition of a
scalar value to an array, but broadcasting also generalizes to more
complex examples such as scaling each column of an array or generating
a grid of coordinates. In broadcasting, one or both arrays are virtually
duplicated (that is, without copying any data in memory), so that the
shapes of the operands match (Fig. 1e). Broadcasting is also applied
when an array is indexed using arrays of indices (Fig. 1c).
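The three broadcasting cases mentioned above (adding a scalar, scaling each column, and generating a grid of coordinates) might look as follows. The scale factors and the grid encoding are illustrative assumptions; `np.broadcast_to` is used only to make the virtual duplication visible through a zero stride:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

# Scalar broadcast: the value 1 is virtually repeated for every element.
plus_one = x + 1

# A shape (3,) array is broadcast against shape (4, 3), scaling each column.
scaled = x * np.array([1, 10, 100])

# A (4, 1) column combined with a (1, 3) row broadcasts to a (4, 3) grid.
rows = np.arange(4).reshape(4, 1)
cols = np.arange(3).reshape(1, 3)
grid = rows * 10 + cols

# The duplication is virtual: the broadcast axis has a stride of 0 bytes,
# so every row refers to the same three values in memory.
virtual = np.broadcast_to(np.array([1, 10, 100]), (4, 3))
print(virtual.strides[0])  # 0
```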
Other array-aware functions, such as sum, mean and maximum,
perform element-by-element ‘reductions’, aggregating results across
one, multiple or all axes of a single array. For example, summing an
n-dimensional array over d axes results in an array of dimension n − d
(Fig. 1f).
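The reductions shown in Fig. 1f can be reproduced with the same 4 × 3 example array. In this sketch, each result has as many dimensions as the input (n = 2) minus the number of axes summed over:

```python
import numpy as np

x = np.arange(12).reshape(4, 3)

row_sums = x.sum(axis=1)       # sum over 1 axis  -> 1-dimensional result
col_sums = x.sum(axis=0)       # sum over 1 axis  -> 1-dimensional result
total = x.sum(axis=(0, 1))     # sum over 2 axes  -> 0-dimensional result

print(row_sums)   # [ 3 12 21 30]
print(col_sums)   # [18 22 26]
print(total)      # 66
```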
NumPy also includes array-aware functions for creating, reshaping,
concatenating and padding arrays; searching, sorting and counting
data; and reading and writing files. It provides extensive support for
generating pseudorandom numbers, includes an assortment of
probability distributions, and performs accelerated linear algebra, using
one of several backends such as OpenBLAS18,19 or Intel MKL optimized
for the CPUs at hand (see Supplementary Methods for more details).
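A brief tour of these utility functions is sketched below. The particular functions and inputs are one possible selection chosen for illustration, and an in-memory `io.BytesIO` buffer stands in for a file on disk:

```python
import io
import numpy as np

a = np.arange(6).reshape(2, 3)                  # creating and reshaping
stacked = np.concatenate([a, a], axis=0)        # concatenating -> shape (4, 3)
padded = np.pad(a, pad_width=1)                 # zero padding  -> shape (4, 5)

order = np.argsort(np.array([3, 1, 2]))         # sorting (indices of sorted order)
values, counts = np.unique(np.array([1, 1, 2]), return_counts=True)  # counting

# Writing and reading back in NumPy's .npy format.
buffer = io.BytesIO()
np.save(buffer, a)
buffer.seek(0)
loaded = np.load(buffer)

# Pseudorandom numbers drawn from a probability distribution.
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)

# Linear algebra: solve the system A @ solution = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
solution = np.linalg.solve(A, b)
```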
Altogether, the combination of a simple in-memory array
representation, a syntax that closely mimics mathematics, and a variety
of array-aware utility functions forms a productive and powerfully
expressive array programming language.
In [1]: import numpy as np
In [2]: x = np.arange(12)
In [3]: x = x.reshape(4, 3)
In [4]: x
Out[4]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
In [5]: np.mean(x, axis=0)
Out[5]: array([4.5, 5.5, 6.5])
In [6]: x = x - np.mean(x, axis=0)
In [7]: x
Out[7]:
array([[-4.5, -4.5, -4.5],
       [-1.5, -1.5, -1.5],
       [ 1.5,  1.5,  1.5],
       [ 4.5,  4.5,  4.5]])
[Fig. 1 panels, for the 4 × 3 example array x holding the values 0 to 11:
a, Data structure: data, data type (8-byte integer), shape (4, 3), strides (24, 8); 8 bytes per element, 3 × 8 = 24 bytes to jump one row down.
b, Indexing (view): x[:,1:] with slices; x[:,::2] with slices with steps. Slices are start:end:step, any of which can be left blank.
c, Indexing (copy): x[1,2] → 5 with scalars; x[x > 9] → 10 11 with masks; x[0,1],x[1,2] → 1 5 with arrays; with arrays with broadcasting.
d, Vectorization.
e, Broadcasting.
f, Reduction: sum over axis 1 → 3 12 21 30; sum over axis 0 → 18 22 26; sum over axis (0,1) → 66.
g, Example (the code listing above).]
Fig. 1 | The NumPy array incorporates several fundamental array concepts.
a, The NumPy array data structure and its associated metadata fields.
b, Indexing an array with slices and steps. These operations return a ‘view’ of
the original data. c, Indexing an array with masks, scalar coordinates or other
arrays, so that it returns a ‘copy’ of the original data. In the bottom example, an
array is indexed with other arrays; this broadcasts the indexing arguments
before performing the lookup. d, Vectorization efficiently applies operations
to groups of elements. e, Broadcasting in the multiplication of two-dimensional
arrays. f, Reduction operations act along one or more axes. In this example,
an array is summed along select axes to produce a vector, or along two axes
consecutively to produce a scalar. g, Example NumPy code, illustrating some of
these concepts.
Scientific Python ecosystem
Python is an open-source, general-purpose interpreted programming
language well suited to standard programming tasks such as cleaning
data, interacting with web resources and parsing text. Adding fast array
operations and linear algebra enables scientists to do all their work
within a single programming language, one that has the advantage of
being famously easy to learn and teach, as witnessed by its adoption
as a primary learning language in many universities.
Even though NumPy is not part of Python’s standard library, it
benefits from a good relationship with the Python developers. Over the
years, the Python language has added new features and special syntax
so that NumPy would have a more succinct and easier-to-read array
notation. However, because it is not part of the standard library, NumPy
is able to dictate its own release policies and development patterns.
SciPy and Matplotlib are tightly coupled with NumPy in terms of
history, development and use. SciPy provides fundamental algorithms for
scientific computing, including mathematical, scientific and engineering
routines. Matplotlib generates publication-ready figures and
visualizations. The combination of NumPy, SciPy and Matplotlib, together
with an advanced interactive environment such as IPython20 or Jupyter21,
provides a solid foundation for array programming in Python. The
scientific Python ecosystem (Fig. 2) builds on top of this foundation to
provide several widely used technique-specific libraries15,16,22 that in
turn underlie numerous domain-specific projects23–28. NumPy, at the
base of the ecosystem of array-aware libraries, sets documentation
standards, provides array testing infrastructure and adds build
support for Fortran and other compilers.
Many research groups have designed large, complex scientific libraries
that add application-specific functionality to the ecosystem. For
example, the eht-imaging library29, developed by the Event Horizon
Telescope collaboration for radio interferometry imaging, analysis
and simulation, relies on many lower-level components of the scientific
Python ecosystem. In particular, the EHT collaboration used this library
for the first imaging of a black hole. Within eht-imaging, NumPy arrays
are used to store and manipulate numerical data at every step in the
processing chain: from raw data through calibration and image
reconstruction. SciPy supplies tools for general image-processing tasks such
as filtering and image alignment, and scikit-image, an image-processing
library that extends SciPy, provides higher-level functionality such
as edge filters and Hough transforms. The ‘scipy.optimize’ module
performs mathematical optimization. NetworkX22, a package for complex
network analysis, is used to verify image comparison consistency.
Astropy23,24 handles standard astronomical file formats and computes
time–coordinate transformations. Matplotlib is used to visualize data
and to generate the final image of the black hole.
The interactive environment created by the array programming
foundation and the surrounding ecosystem of tools (inside of IPython or
Jupyter) is ideally suited to exploratory data analysis. Users can fluidly
inspect, manipulate and visualize their data, and rapidly iterate to refine
programming statements. These statements are then stitched together
into imperative or functional programs, or notebooks containing both
computation and narrative. Scientific computing beyond exploratory
work is often done in a text editor or an integrated development
environment (IDE) such as Spyder. This rich and productive environment
has made Python popular for scientific research.
To complement this facility for exploratory work and rapid
prototyping, NumPy has developed a culture of using time-tested software
engineering practices to improve collaboration and reduce error30. This
culture is not only adopted by leaders in the project but also
enthusiastically taught to newcomers. The NumPy team was early to adopt
distributed revision control and code review to improve collaboration
on code, and continuous testing that runs an extensive battery of
automated tests for every proposed change to NumPy. The project also
has comprehensive, high-quality documentation, integrated with the
source code31–33.
[Fig. 2 diagram, by layer:
Foundation: Python (language), NumPy (arrays), SciPy (algorithms), Matplotlib (plots), IPython / Jupyter (interactive environments).
Technique-specific: scikit-learn (machine learning), NetworkX (network analysis), pandas and statsmodels (statistics), scikit-image (image processing).
Domain-specific: cantera (chemistry), Biopython (biology), Astropy (astronomy), simpeg (geophysics), NLTK (linguistics), QuantEcon (economics).
Application-specific: PsychoPy, khmer, Qiime2, FiPy, deepchem, librosa, PyWavelets, SunPy, QuTiP, yt, nibabel, yellowbrick, mne-python, scikit-HEP, eht-imaging, MDAnalysis, iris, cesium, PyChrono.
New array implementations connect to NumPy through the NumPy API and array protocols.]
Fig. 2 | NumPy is the base of the scientific Python ecosystem. Essential libraries and projects that depend on NumPy’s API gain access to new array
implementations that support NumPy’s array protocols (Fig. 3).
This culture of using best practices for producing reliable scientific
software has been adopted by the ecosystem of libraries that build on
NumPy. For example, in a recent award given by the Royal Astronomical
Society to Astropy, they state: “The Astropy Project has provided
hundreds of junior scientists with experience in professional-standard
software development practices including use of version control, unit
testing, code review and issue tracking procedures. This is a vital skill
set for modern researchers that is often missing from formal university
education in physics or astronomy”34. Community members explicitly
work to address this lack of formal education through courses and
workshops35–37.
The recent rapid growth of data science, machine learning and
artificial intelligence has further and dramatically boosted the scientific
use of Python. Examples of its important applications, such as the
eht-imaging library, now exist in almost every discipline in the
natural and social sciences. These tools have become the primary software
environment in many fields. NumPy and its ecosystem are commonly
taught in university courses, boot camps and summer schools, and
are the focus of community conferences and workshops worldwide.
NumPy and its API have become truly ubiquitous.
Array proliferation and interoperability
NumPy provides in-memory, multidimensional, homogeneously typed
(that is, single-pointer and strided) arrays on CPUs. It runs on machines
ranging from embedded devices to the world’s largest supercomputers,
with performance approaching that of compiled languages. For most
of its existence, NumPy addressed the vast majority of array
computation use cases.
However, scientific datasets now routinely exceed the memory
capacity of a single machine and may be stored on multiple machines or in
the cloud. In addition, the recent need to accelerate deep-learning and
artificial intelligence applications has led to the emergence of
specialized accelerator hardware, including graphics processing units (GPUs),
tensor processing units (TPUs) and field-programmable gate arrays
(FPGAs). Owing to its in-memory data model, NumPy is currently unable
to directly utilize such storage and specialized hardware. However,
both distributed data and the parallel execution of GPUs, TPUs
and FPGAs map well to the paradigm of array programming, which
leaves a gap between available modern hardware architectures and
the tools necessary to leverage their computational power.
The community’s efforts to fill this gap led to a proliferation of new
array implementations. For example, each deep-learning framework
created its own arrays; the PyTorch38, TensorFlow39, Apache MXNet40
and JAX arrays all have the capability to run on CPUs and GPUs in a
distributed fashion, using lazy evaluation to allow for additional
performance optimizations. SciPy and PyData/Sparse both provide sparse
arrays, which typically contain few non-zero values and store only those
in memory for efficiency. In addition, there are projects that build on
NumPy arrays as data containers, and extend its capabilities.
Distributed arrays are made possible that way by Dask, and labelled arrays
(referring to dimensions of an array by name rather than by index for
clarity, compare x[:, 1] versus x.loc[:, 'time']) by xarray41.
Such libraries often mimic the NumPy API, because this lowers the
barrier to entry for newcomers and provides the wider community with
a stable array programming interface. This, in turn, prevents disruptive
schisms such as the divergence between Numeric and Numarray. But
exploring new ways of working with arrays is experimental by nature
and, in fact, several promising libraries (such as Theano and Caffe) have
already ceased development. And each time that a user decides to try a
new technology, they must change import statements and ensure that the
new library implements all the parts of the NumPy API they currently use.
Ideally, operating on specialized arrays using NumPy functions or
semantics would simply work, so that users could write code once,
and would then benefit from switching between NumPy arrays, GPU
arrays, distributed arrays and so forth as appropriate. To support array
operations between external array objects, NumPy therefore added
the capability to act as a central coordination mechanism with a well
specified API (Fig. 2).
To facilitate this interoperability, NumPy provides ‘protocols’ (or
contracts of operation), that allow for specialized arrays to be passed to
NumPy functions (Fig. 3). NumPy, in turn, dispatches operations to the
originating library, as required. Over four hundred of the most popular
NumPy functions are supported. The protocols are implemented by
widely used libraries such as Dask, CuPy, xarray and PyData/Sparse.
Thanks to these developments, users can now, for example, scale their
computation from a single machine to distributed systems using Dask.
The protocols also compose well, allowing users to redeploy NumPy
code at scale on distributed, multi-GPU systems via, for instance, CuPy
arrays embedded in Dask arrays. Using NumPy’s high-level API, users
can leverage highly parallel code execution on multiple systems with
millions of cores, all with minimal code changes42.
These array protocols are now a key feature of NumPy, and are
expected to only increase in importance. The NumPy developers—
many of whom are authors of this Review—iteratively refine and add
protocol designs to improve utility and simplify adoption.
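To illustrate how such a protocol lets a foreign array type take part in NumPy function calls, here is a minimal sketch using the `__array_function__` hook. The `WrappedArray` class is a hypothetical toy container invented for this example, not part of NumPy or any library named above. A real implementation such as Dask would run its own algorithm inside the hook rather than delegate back to NumPy, as is assumed here for brevity:

```python
import numpy as np


class WrappedArray:
    """Hypothetical toy container (not a real library) that opts in to
    NumPy's __array_function__ protocol."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of running its own implementation.
        # This toy version unwraps its containers, reuses NumPy's function on
        # the plain arrays, and re-wraps array results as WrappedArray objects.
        unwrapped = [a.data if isinstance(a, WrappedArray) else a for a in args]
        result = func(*unwrapped, **kwargs)
        if isinstance(result, np.ndarray):
            return WrappedArray(result)
        return result


x = WrappedArray(np.arange(12).reshape(4, 3))
col_means = np.mean(x, axis=0)   # dispatched to WrappedArray.__array_function__

print(type(col_means).__name__, col_means.data)
```

The caller writes ordinary `np.mean(...)` code, and the result stays in the originating array type, which is the behaviour the Dask example in Fig. 3 relies on.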
[Fig. 3 diagram: input arrays are passed to functions of the NumPy API (np.stack, np.reshape, np.transpose, np.argmin, np.mean, np.std, np.max, np.cos, np.arctan, np.log, np.cumsum, np.diff, ...); the NumPy array protocols dispatch each call to an array implementation (NumPy, Dask, CuPy, PyData/Sparse, ...), which produces the output arrays.]
In [1]: import numpy as np
In [2]: import dask.array as da
In [3]: x = da.arange(12)
In [4]: x = np.reshape(x, (4, 3))
In [5]: x
Out[5]: dask.array<..., shape=(4, 3), ...>
In [6]: np.mean(x, axis=0)
Out[6]: dask.array<..., shape=(3,), ...>
In [7]: x = x - np.mean(x, axis=0)
In [8]: x
Out[8]: dask.array<..., shape=(4, 3), ...>
Fig. 3 | NumPy’s API and array protocols expose new arrays to the
ecosystem. In this example, NumPy’s ‘mean’ function is called on a Dask array.
The call succeeds by dispatching to the appropriate library implementation (in
this case, Dask) and results in a new Dask array. Compare this code to the
example code in Fig. 1g.
Discussion
NumPy combines the expressive power of array programming, the
performance of C, and the readability, usability and versatility of Python
in a mature, well tested, well documented and community-developed
library. Libraries in the scientific Python ecosystem provide fast
implementations of most important algorithms. Where extreme
optimization is warranted, compiled languages can be used, such as Cython43,
Numba44 and Pythran45; these languages extend Python and
transparently accelerate bottlenecks. Owing to NumPy’s simple memory
model, it is easy to write low-level, hand-optimized code, usually in C
or Fortran, to manipulate NumPy arrays and pass them back to Python.
Furthermore, using array protocols, it is possible to utilize the full
spectrum of specialized hardware acceleration with minimal changes
to existing code.
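The simple memory model can be seen from Python itself, without writing any C. In this sketch, which is an illustration rather than a recommended way to access array data, the standard-library `memoryview` exposes the same format, shape and strides that compiled code would receive, and `ctypes` reads one element directly from its memory address:

```python
import ctypes
import numpy as np

x = np.arange(12, dtype=np.float64).reshape(4, 3)

# The buffer protocol describes the array as a pointer plus metadata.
buf = memoryview(x)
print(buf.format, buf.shape, buf.strides)   # d (4, 3) (24, 8)

# Read element x[1, 0] straight from memory: base address plus one row stride.
address = x.ctypes.data + 1 * x.strides[0]
raw = ctypes.c_double.from_address(address).value
print(raw)                                   # 3.0
```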
NumPy was initially developed by students, faculty and researchers
to provide an advanced, open-source array programming library for
Python, which was free to use and unencumbered by license servers and
software protection dongles. There was a sense of building something
consequential together for the benefit of many others. Participating
in such an endeavour, within a welcoming community of like-minded
individuals, held a powerful attraction for many early contributors.
These user–developers frequently had to write code from scratch
to solve their own or their colleagues’ problems, often in low-level
languages that preceded Python, such as Fortran46 and C. To them,
the advantages of an interactive, high-level array library were evident.
The design of this new tool was informed by other powerful interactive
programming languages for scientific computing such as Basis47–50,
Yorick51, R52 and APL53, as well as commercial languages and
environments such as IDL (Interactive Data Language) and MATLAB.
What began as an attempt to add an array object to Python became
the foundation of a vibrant ecosystem of tools. Now, a large amount of
scientific work depends on NumPy being correct, fast and stable. It is
no longer a small community project, but core scientific infrastructure.
The developer culture has matured: although initial development was
highly informal, NumPy now has a roadmap and a process for
proposing and discussing large changes. The project has formal governance
structures and is fiscally sponsored by NumFOCUS, a nonprofit that
promotes open practices in research, data and scientific computing.
Over the past few years, the project attracted its first funded
development, sponsored by the Moore and Sloan Foundations, and received
an award as part of the Chan Zuckerberg Initiative’s Essentials of Open
Source Software programme. With this funding, the project was (and
is) able to have sustained focus over multiple months to implement
substantial new features and improvements. That said, the
development of NumPy still depends heavily on contributions made by
graduate students and researchers in their free time (see Supplementary
Methods for more details).
NumPy is no longer merely the foundational array library underlying
the scientific Python ecosystem, but it has become the standard API for
tensor computation and a central coordinating mechanism between
array types and technologies in Python. Work continues to expand on
and improve these interoperability features.
Over the next decade, NumPy developers will face several challenges.
New devices will be developed, and existing specialized hardware will
evolve to meet diminishing returns on Moore’s law. There will be more,
and a wider variety of, data science practitioners, a large proportion of
whom will use NumPy. The scale of scientific data gathering will
continue to increase, with the adoption of devices and instruments such
as light-sheet microscopes and the Large Synoptic Survey Telescope
(LSST)54. New generation languages, interpreters and compilers, such as
Rust55, Julia56 and LLVM57, will create new concepts and data structures,
and determine their viability.
Through the mechanisms described in this Review, NumPy is poised
to embrace such a changing landscape, and to continue playing a
leading part in interactive scientific computation, although to do so
will require sustained funding from government, academia and
industry. But, importantly, for NumPy to meet the needs of the next decade
of data science, it will also need a new generation of graduate students
and community contributors to drive it forward.
1.	 Abbott, B. P. et al. Observation of gravitational waves from a binary black hole merger.
Phys. Rev. Lett. 116, 061102 (2016).
2.	 Chael, A. et al. High-resolution linear polarimetric imaging for the Event Horizon
Telescope. Astrophys. J. 286, 11 (2016).
3.	 Dubois, P. F., Hinsen, K. & Hugunin, J. Numerical Python. Comput. Phys. 10, 262–267 (1996).
4.	 Ascher, D., Dubois, P. F., Hinsen, K., Hugunin, J. & Oliphant, T. E. An Open Source Project:
Numerical Python (Lawrence Livermore National Laboratory, 2001).
5.	 Yang, T.-Y., Furnish, G. & Dubois, P. F. Steering object-oriented scientific computations. In
Proc. TOOLS USA 97. Intl Conf. Technology of Object Oriented Systems and Languages
(eds Ege, R., Singh, M. & Meyer, B.) 112–119 (IEEE, 1997).
6.	 Greenfield, P., Miller, J. T., Hsu, J. & White, R. L. numarray: a new scientific array package
for Python. In PyCon DC 2003 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.9899 (2003).
7.	 Oliphant, T. E. Guide to NumPy 1st edn (Trelgol Publishing, 2006).
8.	 Dubois, P. F. Python: batteries included. Comput. Sci. Eng. 9, 7–9 (2007).
9.	 Oliphant, T. E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).
10.	 Millman, K. J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12
(2011).
11.	 Pérez, F., Granger, B. E. & Hunter, J. D. Python: an ecosystem for scientific computing.
Comput. Sci. Eng. 13, 13–21 (2011).
Explains why the scientific Python ecosystem is a highly productive environment for
research.
12.	 Virtanen, P. et al. SciPy 1.0—fundamental algorithms for scientific computing in Python.
Nat. Methods 17, 261–272 (2020); correction 17, 352 (2020).
Introduces the SciPy library and includes a more detailed history of NumPy and SciPy.
13.	 Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
14.	 McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in
Science Conf. (eds van der Walt, S. & Millman, K. J.) 56–61 (2010).
15.	 Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12,
2825–2830 (2011).
16.	 van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
17.	 van der Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient
numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
Discusses the NumPy array data structure with a focus on how it enables efficient
computation.
18.	 Wang, Q., Zhang, X., Zhang, Y. & Yi, Q. AUGEM: automatically generate high performance
dense linear algebra kernels on x86 CPUs. In SC’13: Proc. Intl Conf. High Performance
Computing, Networking, Storage and Analysis 25 (IEEE, 2013).
19.	 Xianyi, Z., Qian, W. & Yunquan, Z. Model-driven level 3 BLAS performance optimization
on Loongson 3A processor. In 2012 IEEE 18th Intl Conf. Parallel and Distributed Systems
684–691 (IEEE, 2012).
20.	 Pérez, F. & Granger, B. E. IPython: a system for interactive scientific computing. Comput.
Sci. Eng. 9, 21–29 (2007).
21.	 Kluyver, T. et al. Jupyter Notebooks—a publishing format for reproducible computational
workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas
(eds Loizides, F. & Schmidt, B.) 87–90 (IOS Press, 2016).
22.	 Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and
function using NetworkX. In Proc. 7th Python in Science Conf. (eds Varoquaux, G.,
Vaught, T. & Millman, K. J.) 11–15 (2008).
23.	 Astropy Collaboration et al. Astropy: a community Python package for astronomy. Astron.
Astrophys. 558, A33 (2013).
24.	 Price-Whelan, A. M. et al. The Astropy Project: building an open-science project and
status of the v2.0 core package. Astron. J. 156, 123 (2018).
25.	 Cock, P. J. et al. Biopython: freely available Python tools for computational molecular
biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
26.	 Millman, K. J. & Brett, M. Analysis of functional magnetic resonance imaging in Python.
Comput. Sci. Eng. 9, 52–55 (2007).
27.	 The SunPy Community et al. SunPy—Python for solar physics. Comput. Sci. Discov. 8,
014009 (2015).
28.	 Hamman, J., Rocklin, M. & Abernathy, R. Pangeo: a big-data ecosystem for scalable Earth
system science. In EGU General Assembly Conf. Abstracts 12146 (2018).
29.	 Chael, A. A. et al. ehtim: imaging, analysis, and simulation software for radio
interferometry. Astrophysics Source Code Library https://ascl.net/1904.004 (2019).
30.	 Millman, K. J. & Pérez, F. Developing open source scientific practice. In Implementing
Reproducible Research (eds Stodden, V., Leisch, F. & Peng, R. D.) 149–183 (CRC Press, 2014).
Describes the software engineering practices embraced by the NumPy and SciPy
communities with a focus on how these practices improve research.
31.	 van der Walt, S. The SciPy Documentation Project (technical overview). In Proc. 7th Python
in Science Conf. (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 27–28 (2008).
32.	 Harrington, J. The SciPy Documentation Project. In Proc. 7th Python in Science
Conference (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 33–35 (2008).
33.	 Harrington, J. & Goldsmith, D. Progress report: NumPy and SciPy documentation in 2009.
In Proc. 8th Python in Science Conf. (SciPy 2009) (eds Varoquaux, G., van der Walt, S. &
Millman, K. J.) 84–87 (2009).
34.	 Royal Astronomical Society Report of the RAS ‘A’ Awards Committee 2020: Astropy
Project: 2020 Group Achievement Award (A) https://ras.ac.uk/sites/default/files/2020-01/Group%20Award%20-%20Astropy.pdf (2020).
35.	 Wilson, G. Software carpentry: getting scientists to write better code by making them
more productive. Comput. Sci. Eng. 8, 66–69 (2006).
36.	 Hannay, J. E. et al. How do scientists develop and use scientific software? In Proc. 2009
ICSE Workshop on Software Engineering for Computational Science and Engineering 1–8
(IEEE, 2009).
37.	 Millman, K. J., Brett, M., Barnowski, R. & Poline, J.-B. Teaching computational
reproducibility for neuroimaging. Front. Neurosci. 12, 727 (2018).
38.	 Paszke, A. et al. Pytorch: an imperative style, high-performance deep learning library. In
Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035
(Neural Information Processing Systems, 2019).
39.	 Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In OSDI’16: Proc.
12th USENIX Conf. Operating Systems Design and Implementation (chairs Keeton, K. &
Roscoe, T.) 265–283 (USENIX Association, 2016).
40.	 Chen, T. et al. MXNet: a flexible and efficient machine learning library for heterogeneous
distributed systems. Preprint at http://www.arxiv.org/abs/1512.01274 (2015).
41.	 Hoyer, S. & Hamman, J. xarray: N–D labeled arrays and datasets in Python. J. Open Res.
Softw. 5, 10 (2017).
42.	 Entschev, P. Distributed multi-GPU computing with Dask, CuPy and RAPIDS. In EuroPython
2019 https://ep2019.europython.eu/media/conference/slides/fX8dJsD-distributed-multi-gpu-computing-with-dask-cupy-and-rapids.pdf (2019).
43.	 Behnel, S. et al. Cython: the best of both worlds. Comput. Sci. Eng. 13, 31–39 (2011).
44.	 Lam, S. K., Pitrou, A. & Seibert, S. Numba: a LLVM-based Python JIT compiler. In Proc.
Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM ’15 7:1–7:6 (ACM, 2015).
45.	 Guelton, S. et al. Pythran: enabling static optimization of scientific Python programs.
Comput. Sci. Discov. 8, 014001 (2015).
46.	 Dongarra, J., Golub, G. H., Grosse, E., Moler, C. & Moore, K. Netlib and NA-Net: building a
scientific computing community. IEEE Ann. Hist. Comput. 30, 30–41 (2008).
47.	 Barrett, K. A., Chiu, Y. H., Painter, J. F., Motteler, Z. C. & Dubois, P. F. Basis System, Part I:
Running a Basis Program—A Tutorial for Beginners UCRL-MA-118543, Vol. 1 (Lawrence
Livermore National Laboratory, 1995).
48.	 Dubois, P. F. & Motteler, Z. Basis System, Part II: Basis Language Reference Manual
UCRL-MA-118543, Vol. 2 (Lawrence Livermore National Laboratory, 1995).
49.	 Chiu, Y. H. & Dubois, P. F. Basis System, Part III: EZN User Manual UCRL-MA-118543, Vol. 3
(Lawrence Livermore National Laboratory, 1995).
50.	 Chiu, Y. H. & Dubois, P. F. Basis System, Part IV: EZD User Manual UCRL-MA-118543, Vol. 4
(Lawrence Livermore National Laboratory, 1995).
51.	 Munro, D. H. & Dubois, P. F. Using the Yorick interpreted language. Comput. Phys. 9,
609–615 (1995).
52.	 Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph.
Stat. 5, 299–314 (1996).
53.	 Iverson, K. E. A programming language. In Proc. 1962 Spring Joint Computer Conf.
345–351 (1962).
54.	 Jenness, T. et al. LSST data management software development practices and tools. In
Proc. SPIE 10707, Software and Cyberinfrastructure for Astronomy V 1070709 (SPIE and
International Society for Optics and Photonics, 2018).
55.	 Matsakis, N. D. & Klock, F. S. The Rust language. Ada Letters 34, 103–104 (2014).
56.	 Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: a fresh approach to numerical
computing. SIAM Rev. 59, 65–98 (2017).
57.	 Lattner, C. & Adve, V. LLVM: a compilation framework for lifelong program analysis and
transformation. In Proc. 2004 Intl Symp. Code Generation and Optimization (CGO’04)
75–88 (IEEE, 2004).
Acknowledgements We thank R. Barnowski, P. Dubois, M. Eickenberg, and P. Greenfield, who
suggested text and provided helpful feedback on the manuscript. K.J.M. and S.J.v.d.W. were
funded in part by the Gordon and Betty Moore Foundation through grant GBMF3834 and by
the Alfred P. Sloan Foundation through grant 2013-10-27 to the University of California,
Berkeley. S.J.v.d.W., S.B., M.P. and W.W. were funded in part by the Gordon and Betty Moore
Foundation through grant GBMF5447 and by the Alfred P. Sloan Foundation through grant
G-2017-9960 to the University of California, Berkeley.
Author contributions K.J.M. and S.J.v.d.W. composed the manuscript with input from
others. S.B., R.G., K.S., W.W., M.B. and T.R. contributed text. All authors contributed
substantial code, documentation and/or expertise to the NumPy project. All authors
reviewed the manuscript.
Competing interests The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2649-2.
Correspondence and requests for materials should be addressed to K.J.M., S.J.v.d.W. or R.G.
Peer review information Nature thanks Edouard Duchesnay, Alan Edelman and the other,
anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution
4.0 International License, which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license,
and indicate if changes were made. The images or other third party material in this article are
included in the article’s Creative Commons license, unless indicated otherwise in a credit line
to the material. If material is not included in the article’s Creative Commons license and your
intended use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a copy of this license,
visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2020

Array programming with Numpy

  • 1. Nature | Vol 585 | 17 September 2020 | 357 Review ArrayprogrammingwithNumPy Charles R. Harris1 , K. Jarrod Millman2,3,4 ✉, Stéfan J. van der Walt2,4,5 ✉, Ralf Gommers6 ✉, Pauli Virtanen7,8 , David Cournapeau9 , Eric Wieser10 , Julian Taylor11 , Sebastian Berg4 , Nathaniel J. Smith12 , Robert Kern13 , Matti Picus4 , Stephan Hoyer14 , Marten H. van Kerkwijk15 , Matthew Brett2,16 , Allan Haldane17 , Jaime Fernández del Río18 , Mark Wiebe19,20 , Pearu Peterson6,21,22 , Pierre Gérard-Marchant23,24 , Kevin Sheppard25 , Tyler Reddy26 , Warren Weckesser4 , Hameer Abbasi6 , Christoph Gohlke27 & Travis E. Oliphant6 Arrayprogrammingprovidesapowerful,compactandexpressivesyntaxfor accessing,manipulatingandoperatingondatainvectors,matricesand higher-dimensionalarrays.NumPyistheprimaryarrayprogramminglibraryforthe Pythonlanguage.Ithasanessentialroleinresearchanalysispipelinesinfieldsas diverseasphysics,chemistry,astronomy,geoscience,biology,psychology,materials science,engineering,financeandeconomics.Forexample,inastronomy,NumPywas animportantpartofthesoftwarestackusedinthediscoveryofgravitationalwaves1 andinthefirstimagingofablackhole2 .Herewereviewhowafewfundamentalarray conceptsleadtoasimpleandpowerfulprogrammingparadigmfororganizing, exploringandanalysingscientificdata.NumPyisthefoundationuponwhichthe scientificPythonecosystemisconstructed.Itissopervasivethatseveralprojects, targetingaudienceswithspecializedneeds,havedevelopedtheirownNumPy-like interfacesandarrayobjects.Owingtoitscentralpositionintheecosystem,NumPy increasinglyactsasaninteroperabilitylayerbetweensucharraycomputation librariesand,togetherwithitsapplicationprogramminginterface(API),providesa flexibleframeworktosupportthenextdecadeofscientificandindustrialanalysis. 
TwoPythonarraypackagesexistedbeforeNumPy.TheNumericpack- age was developed in the mid-1990s and provided array objects and array-awarefunctionsinPython.ItwaswritteninCandlinkedtostand- ardfastimplementationsoflinearalgebra3,4 .Oneofitsearliestuseswas to steer C++ applications for inertial confinement fusion research at LawrenceLivermoreNationalLaboratory5 .Tohandlelargeastronomi- calimagescomingfromtheHubbleSpaceTelescope,areimplementa- tionofNumeric,calledNumarray,addedsupportforstructuredarrays, flexibleindexing,memorymapping,byte-ordervariants,moreefficient memoryuse,flexibleIEEE754-standarderror-handlingcapabilities,and bettertype-castingrules6 .AlthoughNumarraywashighlycompatible withNumeric,thetwopackageshadenoughdifferencesthatitdivided the community; however, in 2005 NumPy emerged as a ‘best of both worlds’ unification7 —combining the features of Numarray with the small-array performance of Numeric and its rich C API. Now, 15 years later, NumPy underpins almost every Python library that does scientific or numerical computation8–11 , including SciPy12 , Matplotlib13 , pandas14 , scikit-learn15 and scikit-image16 . NumPy is a community-developed, open-source library, which provides a mul- tidimensional Python array object along with array-aware functions thatoperateonit.Becauseofitsinherentsimplicity,theNumPyarray is the de facto exchange format for array data in Python. NumPyoperatesonin-memoryarraysusingthecentralprocessing unit(CPU).Toutilizemodern,specializedstorageandhardware,there has been a recent proliferation of Python array packages. 
Unlike with the Numarray–Numeric divide, it is now much harder for these new libraries to fracture the user community—given how much work is alreadybuiltontopofNumPy.However,toprovidethecommunitywith access to new and exploratory technologies, NumPy is transitioning into a central coordinating mechanism that specifies a well defined array programming API and dispatches it, as appropriate, to special- ized array implementations. NumPyarrays TheNumPyarrayisadatastructurethatefficientlystoresandaccesses multidimensionalarrays17 (alsoknownastensors),andenablesawide variety of scientific computation. It consists of a pointer to memory, along with metadata used to interpret the data stored there, notably ‘data type’, ‘shape’ and ‘strides’ (Fig. 1a). https://doi.org/10.1038/s41586-020-2649-2 Received: 21 February 2020 Accepted: 17 June 2020 Published online: 16 September 2020 Open access Check for updates 1 Independent researcher, Logan, UT, USA. 2 Brain Imaging Center, University of California, Berkeley, Berkeley, CA, USA. 3 Division of Biostatistics, University of California, Berkeley, Berkeley, CA, USA. 4 Berkeley Institute for Data Science, University of California, Berkeley, Berkeley, CA, USA. 5 Applied Mathematics, Stellenbosch University, Stellenbosch, South Africa. 6 Quansight, Austin, TX, USA. 7 Department of Physics, University of Jyväskylä, Jyväskylä, Finland. 8 Nanoscience Center, University of Jyväskylä, Jyväskylä, Finland. 9 Mercari JP, Tokyo, Japan. 10 Department of Engineering, University of Cambridge, Cambridge, UK. 11 Independent researcher, Karlsruhe, Germany. 12 Independent researcher, Berkeley, CA, USA. 13 Enthought, Austin, TX, USA. 14 Google Research, Mountain View, CA, USA. 15 Department of Astronomy and Astrophysics, University of Toronto, Toronto, Ontario, Canada. 16 School of Psychology, University of Birmingham, Edgbaston, Birmingham, UK. 17 Department of Physics, Temple University, Philadelphia, PA, USA. 
18 Google, Zurich, Switzerland. 19 Department of Physics and Astronomy, The University of British Columbia, Vancouver, British Columbia, Canada. 20 Amazon, Seattle, WA, USA. 21 Independent researcher, Saue, Estonia. 22 Department of Mechanics and Applied Mathematics, Institute of Cybernetics at Tallinn Technical University, Tallinn, Estonia. 23 Department of Biological and Agricultural Engineering, University of Georgia, Athens, GA, USA. 24 France-IX Services, Paris, France. 25 Department of Economics, University of Oxford, Oxford, UK. 26 CCS-7, Los Alamos National Laboratory, Los Alamos, NM, USA. 27 Laboratory for Fluorescence Dynamics, Biomedical Engineering Department, University of California, Irvine, Irvine, CA, USA. ✉e-mail: millman@berkeley.edu; stefanv@berkeley.edu; ralf.gommers@gmail.com
  • 2. 358 | Nature | Vol 585 | 17 September 2020 Review The data type describes the nature of elements stored in an array. Anarrayhasasingledatatype,andeachelementofanarrayoccupies thesamenumberofbytesinmemory.Examplesofdatatypesinclude real and complex numbers (of lower and higher precision), strings, timestamps and pointers to Python objects. The shape of an array determines the number of elements along each axis, and the number of axes is the dimensionality of the array. For example, a vector of numbers can be stored as a one-dimensional array of shape N, whereas colour videos are four-dimensional arrays of shape (T, M, N, 3). Strides are necessary to interpret computer memory, which stores elementslinearly,asmultidimensionalarrays.Theydescribethenum- berofbytestomoveforwardinmemorytojumpfromrowtorow,col- umntocolumn,andsoforth.Consider,forexample,atwo-dimensional arrayoffloating-pointnumberswithshape(4, 3),whereeachelement occupies 8 bytes in memory. To move between consecutive columns, weneedtojumpforward8 bytesinmemory,andtoaccessthenextrow, 3 × 8 = 24 bytes. The strides of that array are therefore (24, 8). NumPy canstorearraysineitherCorFortranmemoryorder,iteratingfirstover either rows or columns. This allows external libraries written in those languages to access NumPy array data in memory directly. Users interact with NumPy arrays using ‘indexing’ (to access sub- arrays or individual elements), ‘operators’ (for example, +, − and × for vectorized operations and @ for matrix multiplication), as well as‘array-awarefunctions’;together,theseprovideaneasilyreadable, expressive, high-level API for array programming while NumPy deals with the underlying mechanics of making operations fast. Indexing an array returns single elements, subarrays or elements that satisfy a specific condition (Fig. 1b). Arrays can even be indexed usingotherarrays(Fig. 
1c).Whereverpossible,indexingthatretrievesa subarrayreturnsa‘view’ontheoriginalarraysuchthatdataareshared between the two arrays. This provides a powerful way to operate on subsets of array data while limiting memory usage. To complement the array syntax, NumPy includes functions that perform vectorized calculations on arrays, including arithmetic, statistics and trigonometry (Fig. 1d). Vectorization—operating on entirearraysratherthantheirindividualelements—isessentialtoarray programming.Thismeansthatoperationsthatwouldtakemanytens oflinestoexpressinlanguagessuchasCcanoftenbeimplementedas asingle,clearPythonexpression.Thisresultsinconcisecodeandfrees users to focus on the details of their analysis, while NumPy handles looping over array elements near-optimally—for example, taking strides into consideration to best utilize the computer’s fast cache memory. Whenperformingavectorizedoperation(suchasaddition)ontwo arrays with the same shape, it is clear what should happen. Through ‘broadcasting’ NumPy allows the dimensions to differ, and produces results that appeal to intuition. A trivial example is the addition of a scalarvaluetoanarray,butbroadcastingalsogeneralizestomorecom- plex examples such as scaling each column of an array or generating agridofcoordinates.Inbroadcasting,oneorbotharraysarevirtually duplicated (that is, without copying any data in memory), so that the shapes of the operands match (Fig. 1d). Broadcasting is also applied when an array is indexed using arrays of indices (Fig. 1c). Other array-aware functions, such as sum, mean and maximum, performelement-by-element‘reductions’,aggregatingresultsacross one, multiple or all axes of a single array. For example, summing an n-dimensional array over d axes results in an array of dimension n − d (Fig. 1f). NumPyalsoincludesarray-awarefunctionsforcreating,reshaping, concatenating and padding arrays; searching, sorting and counting data; and reading and writing files. 
It provides extensive support for generatingpseudorandomnumbers,includesanassortmentofprob- ability distributions, and performs accelerated linear algebra, using oneofseveralbackendssuchasOpenBLAS18,19 orIntelMKLoptimized for the CPUs at hand (see Supplementary Methods for more details). Altogether, the combination of a simple in-memory array repre- sentation, a syntax that closely mimics mathematics, and a variety of array-aware utility functions forms a productive and powerfully expressive array programming language. In [1]: import numpy as np In [2]: x = np.arange(12) In [3]: x = x.reshape(4, 3) In [4]: x Out[4]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [5]: np.mean(x, axis=0) Out[5]: array([4.5, 5.5, 6.5]) In [6]: x = x - np.mean(x, axis=0) In [7]: x Out[7]: array([[-4.5, -4.5, -4.5], [-1.5, -1.5, -1.5], [ 1.5, 1.5, 1.5], [ 4.5, 4.5, 4.5]]) a Data structure g Example x = 0 1 2 3 4 5 6 7 8 9 10 11 data data type shape strides 8-byte integer (4, 3) (24, 8) 1 2 3 4 5 6 70 8 9 10 11 8 bytes per element 3 × 8 = 24 bytes to jump one row down b Indexing (view) 10 1199 x[:,1:] → with slices 1 2 4 5 7 8 00 33 66 x[:,::2]→ with slices with steps 0 2 3 5 6 8 9 11 0 11 2 3 44 5 6 77 8 9 1010 11 Slices are start:end:step, any of which can be left blank d Vectorization + → 0 1 3 4 6 7 9 10 1 1 1 1 1 1 1 1 1 2 4 5 7 8 10 11 e Broadcasting × 3 6 0 9 1 2 → 0 0 3 6 6 12 9 18 f Reduction 0 1 3 4 6 7 9 10 2 5 8 11 3 12 21 30 sum axis 1 18 22 26 sum axis 0 66 sum axis (0,1) c Indexing (copy) 4 3 7 6 with arrays with broadcasting →x → ,2 1 1 0 x , 1 1 2 2 1 0 1 0 x with arraysx[0,1],x[1,2] 1 5→ →0 1 1 2 , x[x > 9] with masks10 11→→ 5 with scalarsx[1,2] Fig.1|TheNumPyarrayincorporatesseveralfundamentalarrayconcepts. a,TheNumPyarraydatastructureanditsassociatedmetadatafields. 
b,Indexinganarraywithslicesandsteps.Theseoperationsreturna‘view’of theoriginaldata.c,Indexinganarraywithmasks,scalarcoordinatesorother arrays,sothatitreturnsa‘copy’oftheoriginaldata.Inthebottomexample,an arrayisindexedwithotherarrays;thisbroadcaststheindexingarguments beforeperformingthelookup.d,Vectorizationefficientlyappliesoperations togroupsofelements.e,Broadcastinginthemultiplicationoftwo-dimensional arrays.f,Reductionoperationsactalongoneormoreaxes.Inthisexample, anarrayissummedalongselectaxestoproduceavector,oralongtwoaxes consecutivelytoproduceascalar.g,ExampleNumPycode,illustratingsomeof theseconcepts.
  • 3. Nature | Vol 585 | 17 September 2020 | 359 ScientificPythonecosystem Pythonisanopen-source,general-purposeinterpretedprogramming languagewellsuitedtostandardprogrammingtaskssuchascleaning data,interactingwithwebresourcesandparsingtext.Addingfastarray operations and linear algebra enables scientists to do all their work withinasingleprogramminglanguage—onethathastheadvantageof being famously easy to learn and teach, as witnessed by its adoption as a primary learning language in many universities. Even though NumPy is not part of Python’s standard library, it ben- efits from a good relationship with the Python developers. Over the years,thePythonlanguagehasaddednewfeaturesandspecialsyntax so that NumPy would have a more succinct and easier-to-read array notation.However,becauseitisnotpartofthestandardlibrary,NumPy is able to dictate its own release policies and development patterns. SciPyandMatplotlibaretightlycoupledwithNumPyintermsofhis- tory,developmentanduse.SciPyprovidesfundamentalalgorithmsfor scientificcomputing,includingmathematical,scientificandengineer- ingroutines.Matplotlibgeneratespublication-readyfiguresandvisu- alizations.ThecombinationofNumPy,SciPyandMatplotlib,together with an advanced interactive environment such as IPython20 or Jupy- ter21 ,providesasolidfoundationforarrayprogramminginPython.The scientificPythonecosystem(Fig. 2)buildsontopofthisfoundationto provide several, widely used technique-specific libraries15,16,22 , that in turn underlie numerous domain-specific projects23–28 . NumPy, at the base of the ecosystem of array-aware libraries, sets documentation standards, provides array testing infrastructure and adds build sup- port for Fortran and other compilers. Manyresearchgroupshavedesignedlarge,complexscientificlibrar- ies that add application-specific functionality to the ecosystem. 
For example, the eht-imaging library29 , developed by the Event Horizon Telescope collaboration for radio interferometry imaging, analysis andsimulation,reliesonmanylower-levelcomponentsofthescientific Pythonecosystem.Inparticular,theEHTcollaborationusedthislibrary forthefirstimagingofablackhole.Withineht-imaging,NumPyarrays are used to store and manipulate numerical data at every step in the processingchain:fromrawdatathroughcalibrationandimagerecon- struction.SciPysuppliestoolsforgeneralimage-processingtaskssuch asfilteringandimagealignment,andscikit-image,animage-processing library that extends SciPy, provides higher-level functionality such as edge filters and Hough transforms. The ‘scipy.optimize’ module performsmathematicaloptimization.NetworkX22 ,apackageforcom- plexnetworkanalysis,isusedtoverifyimagecomparisonconsistency. Astropy23,24 handlesstandardastronomicalfileformatsandcomputes time–coordinatetransformations.Matplotlibisusedtovisualizedata and to generate the final image of the black hole. Theinteractiveenvironmentcreatedbythearray programmingfoun- dation and the surrounding ecosystem of tools—inside of IPython or Jupyter—isideallysuitedtoexploratorydataanalysis.Userscanfluidly inspect,manipulateandvisualizetheirdata,andrapidlyiteratetorefine programmingstatements.Thesestatementsarethenstitchedtogether intoimperativeorfunctionalprograms,ornotebookscontainingboth computationandnarrative.Scientificcomputingbeyondexploratory workisoftendoneinatexteditororanintegrateddevelopmentenvi- ronment (IDE) such as Spyder. This rich and productive environment has made Python popular for scientific research. To complement this facility for exploratory work and rapid proto- typing,NumPyhasdevelopedacultureofusingtime-testedsoftware engineeringpracticestoimprovecollaborationandreduceerror30 .This culture is not only adopted by leaders in the project but also enthusi- astically taught to newcomers. 
The NumPy team was early to adopt distributed revision control and code review to improve collaboration on code, and continuous testing that runs an extensive battery of automated tests for every proposed change to NumPy. The project also has comprehensive, high-quality documentation, integrated with the source code31–33.

Fig. 2 | NumPy is the base of the scientific Python ecosystem. Essential libraries and projects that depend on NumPy’s API gain access to new array implementations that support NumPy’s array protocols (Fig. 3).
[Figure layers, from base to top. Foundation: Python language; IPython / Jupyter interactive environments; NumPy arrays, API and array protocols; new array implementations. Technique-specific: SciPy (algorithms), Matplotlib (plots), scikit-learn (machine learning), NetworkX (network analysis), pandas and statsmodels (statistics), scikit-image (image processing). Domain-specific: cantera (chemistry), Biopython (biology), Astropy (astronomy), simpeg (geophysics), NLTK (linguistics), QuantEcon (economics). Application-specific: PsychoPy, khmer, Qiime2, FiPy, deepchem, librosa, PyWavelets, SunPy, QuTiP, yt, nibabel, yellowbrick, mne-python, scikit-HEP, eht-imaging, MDAnalysis, iris, cesium, PyChrono.]

This culture of using best practices for producing reliable scientific software has been adopted by the ecosystem of libraries that build on NumPy. For example, the citation for a recent award given by the Royal Astronomical Society to Astropy states: “The Astropy Project has provided hundreds of junior scientists with experience in professional-standard software development practices including use of version control, unit testing, code review and issue tracking procedures. This is a vital skill set for modern researchers that is often missing from formal university education in physics or astronomy”34. Community members explicitly work to address this lack of formal education through courses and workshops35–37.

The recent rapid growth of data science, machine learning and artificial intelligence has further and dramatically boosted the scientific use of Python. Examples of its important applications, such as the eht-imaging library, now exist in almost every discipline in the natural and social sciences. These tools have become the primary software environment in many fields. NumPy and its ecosystem are commonly taught in university courses, boot camps and summer schools, and are the focus of community conferences and workshops worldwide. NumPy and its API have become truly ubiquitous.

Array proliferation and interoperability

NumPy provides in-memory, multidimensional, homogeneously typed (that is, single-pointer and strided) arrays on CPUs. It runs on machines ranging from embedded devices to the world’s largest supercomputers, with performance approaching that of compiled languages. For most of its existence, NumPy addressed the vast majority of array computation use cases.
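The phrase "homogeneously typed (that is, single-pointer and strided)" can be made concrete with a minimal sketch. It inspects the element type and byte strides of a small array and shows that transposing and slicing give views onto the same memory instead of copies. Only standard NumPy attributes (`dtype`, `itemsize`, `strides`) and `np.shares_memory` are used; the variable names are illustrative, and the stride values in the comments assume 8-byte `int64` elements in C order.

```python
import numpy as np

# One contiguous buffer of 12 elements, all of the same type.
x = np.arange(12, dtype=np.int64).reshape(4, 3)

# Homogeneous typing: every element occupies the same number of bytes.
print(x.dtype, x.itemsize)   # int64 8

# Strides: bytes to step to reach the next element along each axis.
print(x.strides)             # (24, 8)

# Transposing only swaps the strides; no data is moved.
xt = x.T
print(xt.strides)            # (8, 24)

# Taking every other row doubles the row stride; this is still a view.
every_other_row = x[::2]
print(every_other_row.strides)   # (48, 8)

# Because views share the buffer, writing through one is visible in the other.
every_other_row[0, 0] = 100
print(x[0, 0])                                # 100
print(np.shares_memory(x, every_other_row))   # True
```

Because shape and strides are just metadata over a single data pointer, operations such as transposition and regular slicing cost almost nothing regardless of array size.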
However, scientific datasets now routinely exceed the memory capacity of a single machine and may be stored on multiple machines or in the cloud. In addition, the recent need to accelerate deep-learning and artificial intelligence applications has led to the emergence of specialized accelerator hardware, including graphics processing units (GPUs), tensor processing units (TPUs) and field-programmable gate arrays (FPGAs). Owing to its in-memory data model, NumPy is currently unable to directly utilize such storage and specialized hardware. However, both distributed data and the parallel execution of GPUs, TPUs and FPGAs map well to the paradigm of array programming, which leaves a gap between available modern hardware architectures and the tools necessary to leverage their computational power.

The community’s efforts to fill this gap led to a proliferation of new array implementations. For example, each deep-learning framework created its own arrays; the PyTorch38, TensorFlow39, Apache MXNet40 and JAX arrays all have the capability to run on CPUs and GPUs in a distributed fashion, using lazy evaluation to allow for additional performance optimizations. SciPy and PyData/Sparse both provide sparse arrays, which typically contain few non-zero values and store only those in memory for efficiency. In addition, there are projects that build on NumPy arrays as data containers, and extend its capabilities. Dask makes distributed arrays possible in this way, and xarray41 does the same for labelled arrays, which refer to dimensions of an array by name rather than by index for clarity (compare x[:, 1] with x.loc[:, 'time']).

Such libraries often mimic the NumPy API, because this lowers the barrier to entry for newcomers and provides the wider community with a stable array programming interface. This, in turn, prevents disruptive schisms such as the divergence between Numeric and Numarray.
But exploring new ways of working with arrays is experimental by nature and, in fact, several promising libraries (such as Theano and Caffe) have already ceased development. And each time that a user decides to try a new technology, they must change import statements and ensure that the new library implements all the parts of the NumPy API they currently use.

Ideally, operating on specialized arrays using NumPy functions or semantics would simply work, so that users could write code once, and would then benefit from switching between NumPy arrays, GPU arrays, distributed arrays and so forth as appropriate. To support array operations between external array objects, NumPy therefore added the capability to act as a central coordination mechanism with a well specified API (Fig. 2).

To facilitate this interoperability, NumPy provides ‘protocols’ (or contracts of operation), that allow for specialized arrays to be passed to NumPy functions (Fig. 3). NumPy, in turn, dispatches operations to the originating library, as required. Over four hundred of the most popular NumPy functions are supported. The protocols are implemented by widely used libraries such as Dask, CuPy, xarray and PyData/Sparse. Thanks to these developments, users can now, for example, scale their computation from a single machine to distributed systems using Dask. The protocols also compose well, allowing users to redeploy NumPy code at scale on distributed, multi-GPU systems via, for instance, CuPy arrays embedded in Dask arrays. Using NumPy’s high-level API, users can leverage highly parallel code execution on multiple systems with millions of cores, all with minimal code changes42.

These array protocols are now a key feature of NumPy, and are expected to only increase in importance. The NumPy developers, many of whom are authors of this Review, iteratively refine and add protocol designs to improve utility and simplify adoption.

[Figure panels: input arrays (NumPy, Dask, CuPy, PyData/Sparse, ...) pass through the NumPy API (np.stack, np.reshape, np.transpose, np.argmin, np.mean, np.std, np.max, np.cos, np.arctan, np.log, np.cumsum, np.diff, ...) and the NumPy array protocols to the matching array implementation, which returns output arrays.]

    In [1]: import numpy as np
    In [2]: import dask.array as da
    In [3]: x = da.arange(12)
    In [4]: x = np.reshape(x, (4, 3))
    In [5]: x
    Out[5]: dask.array<..., shape=(4, 3), ...>
    In [6]: np.mean(x, axis=0)
    Out[6]: dask.array<..., shape=(3,), ...>
    In [7]: x = x - np.mean(x, axis=0)
    In [8]: x
    Out[8]: dask.array<..., shape=(4, 3), ...>

Fig. 3 | NumPy’s API and array protocols expose new arrays to the ecosystem. In this example, NumPy’s ‘mean’ function is called on a Dask array. The call succeeds by dispatching to the appropriate library implementation (in this case, Dask) and results in a new Dask array. Compare this code to the example code in Fig. 1g.
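The dispatch illustrated in Fig. 3 can be imitated without Dask. The sketch below defines a hypothetical wrapper class, `WrappedArray` (an assumption for illustration, not part of NumPy or of any library named in this Review), that opts in to two of NumPy's protocols: `__array_function__` for API functions such as `np.reshape` and `np.mean`, and `__array_ufunc__` for universal functions and arithmetic operators. The helper names `_unwrap` and `_wrap` are likewise invented here. Real array libraries implement these hooks with far more care; this is only meant to show the shape of the mechanism.

```python
import numpy as np
from numpy.lib.mixins import NDArrayOperatorsMixin


def _unwrap(obj):
    """Replace WrappedArray instances (also inside tuples, lists, dicts) by ndarrays."""
    if isinstance(obj, WrappedArray):
        return obj.data
    if isinstance(obj, (tuple, list)):
        return type(obj)(_unwrap(item) for item in obj)
    if isinstance(obj, dict):
        return {key: _unwrap(value) for key, value in obj.items()}
    return obj


def _wrap(result):
    """Re-wrap ndarray results so that they stay in the custom array type."""
    return WrappedArray(result) if isinstance(result, np.ndarray) else result


class WrappedArray(NDArrayOperatorsMixin):
    """Hypothetical array type that delegates all computation to NumPy."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __repr__(self):
        return f"WrappedArray(shape={self.data.shape})"

    # Called by NumPy API functions (np.mean, np.reshape, ...).
    def __array_function__(self, func, types, args, kwargs):
        # Decline politely if an unknown array type is involved.
        if not all(issubclass(t, (WrappedArray, np.ndarray)) for t in types):
            return NotImplemented
        return _wrap(func(*_unwrap(args), **_unwrap(kwargs)))

    # Called by ufuncs (np.cos, np.subtract, ...); the mixin routes
    # operators such as `a - b` through this method as well.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return _wrap(getattr(ufunc, method)(*_unwrap(inputs), **_unwrap(kwargs)))


# Same steps as the session in Fig. 3, using the hypothetical type.
x = WrappedArray(np.arange(12))
x = np.reshape(x, (4, 3))          # dispatched via __array_function__
col_means = np.mean(x, axis=0)     # dispatched via __array_function__
centered = x - col_means           # dispatched via __array_ufunc__
print(x, col_means, centered)
```

As in Fig. 3, each result stays in the originating array type instead of decaying to a plain NumPy array, which is what lets unchanged NumPy-style code run on other array implementations.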
Discussion

NumPy combines the expressive power of array programming, the performance of C, and the readability, usability and versatility of Python in a mature, well tested, well documented and community-developed library. Libraries in the scientific Python ecosystem provide fast implementations of most important algorithms. Where extreme optimization is warranted, compiled languages and compilers such as Cython43, Numba44 and Pythran45 can be used; these tools extend Python and transparently accelerate bottlenecks. Owing to NumPy’s simple memory model, it is easy to write low-level, hand-optimized code, usually in C or Fortran, to manipulate NumPy arrays and pass them back to Python. Furthermore, using array protocols, it is possible to utilize the full spectrum of specialized hardware acceleration with minimal changes to existing code.

NumPy was initially developed by students, faculty and researchers to provide an advanced, open-source array programming library for Python, which was free to use and unencumbered by license servers and software protection dongles. There was a sense of building something consequential together for the benefit of many others. Participating in such an endeavour, within a welcoming community of like-minded individuals, held a powerful attraction for many early contributors.

These user–developers frequently had to write code from scratch to solve their own or their colleagues’ problems, often in low-level languages that preceded Python, such as Fortran46 and C. To them, the advantages of an interactive, high-level array library were evident. The design of this new tool was informed by other powerful interactive programming languages for scientific computing such as Basis47–50, Yorick51, R52 and APL53, as well as commercial languages and environments such as IDL (Interactive Data Language) and MATLAB.

What began as an attempt to add an array object to Python became the foundation of a vibrant ecosystem of tools. Now, a large amount of scientific work depends on NumPy being correct, fast and stable.
It is no longer a small community project, but core scientific infrastructure.

The developer culture has matured: although initial development was highly informal, NumPy now has a roadmap and a process for proposing and discussing large changes. The project has formal governance structures and is fiscally sponsored by NumFOCUS, a nonprofit that promotes open practices in research, data and scientific computing. Over the past few years, the project attracted its first funded development, sponsored by the Moore and Sloan Foundations, and received an award as part of the Chan Zuckerberg Initiative’s Essentials of Open Source Software programme. With this funding, the project was (and is) able to have sustained focus over multiple months to implement substantial new features and improvements. That said, the development of NumPy still depends heavily on contributions made by graduate students and researchers in their free time (see Supplementary Methods for more details).

NumPy is no longer merely the foundational array library underlying the scientific Python ecosystem; it has also become the standard API for tensor computation and a central coordinating mechanism between array types and technologies in Python. Work continues to expand on and improve these interoperability features.

Over the next decade, NumPy developers will face several challenges. New devices will be developed, and existing specialized hardware will evolve to meet diminishing returns on Moore’s law. There will be more, and a wider variety of, data science practitioners, a large proportion of whom will use NumPy. The scale of scientific data gathering will continue to increase, with the adoption of devices and instruments such as light-sheet microscopes and the Large Synoptic Survey Telescope (LSST)54. New generation languages, interpreters and compilers, such as Rust55, Julia56 and LLVM57, will create new concepts and data structures, and determine their viability.
Through the mechanisms described in this Review, NumPy is poised to embrace such a changing landscape, and to continue playing a leading part in interactive scientific computation, although to do so will require sustained funding from government, academia and industry. But, importantly, for NumPy to meet the needs of the next decade of data science, it will also need a new generation of graduate students and community contributors to drive it forward.

1. Abbott, B. P. et al. Observation of gravitational waves from a binary black hole merger. Phys. Rev. Lett. 116, 061102 (2016).
2. Chael, A. et al. High-resolution linear polarimetric imaging for the Event Horizon Telescope. Astrophys. J. 286, 11 (2016).
3. Dubois, P. F., Hinsen, K. & Hugunin, J. Numerical Python. Comput. Phys. 10, 262–267 (1996).
4. Ascher, D., Dubois, P. F., Hinsen, K., Hugunin, J. & Oliphant, T. E. An Open Source Project: Numerical Python (Lawrence Livermore National Laboratory, 2001).
5. Yang, T.-Y., Furnish, G. & Dubois, P. F. Steering object-oriented scientific computations. In Proc. TOOLS USA 97. Intl Conf. Technology of Object Oriented Systems and Languages (eds Ege, R., Singh, M. & Meyer, B.) 112–119 (IEEE, 1997).
6. Greenfield, P., Miller, J. T., Hsu, J. & White, R. L. numarray: a new scientific array package for Python. In PyCon DC 2003 http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.112.9899 (2003).
7. Oliphant, T. E. Guide to NumPy 1st edn (Trelgol Publishing, 2006).
8. Dubois, P. F. Python: batteries included. Comput. Sci. Eng. 9, 7–9 (2007).
9. Oliphant, T. E. Python for scientific computing. Comput. Sci. Eng. 9, 10–20 (2007).
10. Millman, K. J. & Aivazis, M. Python for scientists and engineers. Comput. Sci. Eng. 13, 9–12 (2011).
11. Pérez, F., Granger, B. E. & Hunter, J. D. Python: an ecosystem for scientific computing. Comput. Sci. Eng. 13, 13–21 (2011). Explains why the scientific Python ecosystem is a highly productive environment for research.
12. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020); correction 17, 352 (2020). Introduces the SciPy library and includes a more detailed history of NumPy and SciPy.
13. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
14. McKinney, W. Data structures for statistical computing in Python. In Proc. 9th Python in Science Conf. (eds van der Walt, S. & Millman, K. J.) 56–61 (2010).
15. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
16. van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
17. van der Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011). Discusses the NumPy array data structure with a focus on how it enables efficient computation.
18. Wang, Q., Zhang, X., Zhang, Y. & Yi, Q. AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs. In SC’13: Proc. Intl Conf. High Performance Computing, Networking, Storage and Analysis 25 (IEEE, 2013).
19. Xianyi, Z., Qian, W. & Yunquan, Z. Model-driven level 3 BLAS performance optimization on Loongson 3A processor. In 2012 IEEE 18th Intl Conf. Parallel and Distributed Systems 684–691 (IEEE, 2012).
20. Pérez, F. & Granger, B. E. IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9, 21–29 (2007).
21. Kluyver, T. et al. Jupyter Notebooks: a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas (eds Loizides, F. & Schmidt, B.) 87–90 (IOS Press, 2016).
22. Hagberg, A. A., Schult, D. A. & Swart, P. J. Exploring network structure, dynamics, and function using NetworkX. In Proc. 7th Python in Science Conf. (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 11–15 (2008).
23. Astropy Collaboration et al. Astropy: a community Python package for astronomy. Astron. Astrophys. 558, A33 (2013).
24. Price-Whelan, A. M. et al. The Astropy Project: building an open-science project and status of the v2.0 core package. Astron. J. 156, 123 (2018).
25. Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
26. Millman, K. J. & Brett, M. Analysis of functional magnetic resonance imaging in Python. Comput. Sci. Eng. 9, 52–55 (2007).
27. The SunPy Community et al. SunPy: Python for solar physics. Comput. Sci. Discov. 8, 014009 (2015).
28. Hamman, J., Rocklin, M. & Abernathy, R. Pangeo: a big-data ecosystem for scalable Earth system science. In EGU General Assembly Conf. Abstracts 12146 (2018).
29. Chael, A. A. et al. ehtim: imaging, analysis, and simulation software for radio interferometry. Astrophysics Source Code Library https://ascl.net/1904.004 (2019).
30. Millman, K. J. & Pérez, F. Developing open source scientific practice. In Implementing Reproducible Research (eds Stodden, V., Leisch, F. & Peng, R. D.) 149–183 (CRC Press, 2014). Describes the software engineering practices embraced by the NumPy and SciPy communities with a focus on how these practices improve research.
31. van der Walt, S. The SciPy Documentation Project (technical overview). In Proc. 7th Python in Science Conf. (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 27–28 (2008).
32. Harrington, J. The SciPy Documentation Project. In Proc. 7th Python in Science Conference (SciPy 2008) (eds Varoquaux, G., Vaught, T. & Millman, K. J.) 33–35 (2008).
33. Harrington, J. & Goldsmith, D. Progress report: NumPy and SciPy documentation in 2009. In Proc. 8th Python in Science Conf. (SciPy 2009) (eds Varoquaux, G., van der Walt, S. & Millman, K. J.) 84–87 (2009).
34. Royal Astronomical Society Report of the RAS ‘A’ Awards Committee 2020: Astropy Project: 2020 Group Achievement Award (A) https://ras.ac.uk/sites/default/files/2020-01/Group%20Award%20-%20Astropy.pdf (2020).
35. Wilson, G. Software carpentry: getting scientists to write better code by making them more productive. Comput. Sci. Eng. 8, 66–69 (2006).
36. Hannay, J. E. et al. How do scientists develop and use scientific software? In Proc. 2009 ICSE Workshop on Software Engineering for Computational Science and Engineering 1–8 (IEEE, 2009).
37. Millman, K. J., Brett, M., Barnowski, R. & Poline, J.-B. Teaching computational reproducibility for neuroimaging. Front. Neurosci. 12, 727 (2018).
38. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.) 8024–8035 (Neural Information Processing Systems, 2019).
39. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In OSDI’16: Proc. 12th USENIX Conf. Operating Systems Design and Implementation (chairs Keeton, K. & Roscoe, T.) 265–283 (USENIX Association, 2016).
40. Chen, T. et al. MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. Preprint at http://www.arxiv.org/abs/1512.01274 (2015).
41. Hoyer, S. & Hamman, J. xarray: N–D labeled arrays and datasets in Python. J. Open Res. Softw. 5, 10 (2017).
42. Entschev, P. Distributed multi-GPU computing with Dask, CuPy and RAPIDS. In EuroPython 2019 https://ep2019.europython.eu/media/conference/slides/fX8dJsD-distributed-multi-gpu-computing-with-dask-cupy-and-rapids.pdf (2019).
43. Behnel, S. et al. Cython: the best of both worlds. Comput. Sci. Eng. 13, 31–39 (2011).
44. Lam, S. K., Pitrou, A. & Seibert, S. Numba: a LLVM-based Python JIT compiler. In Proc. Second Workshop on the LLVM Compiler Infrastructure in HPC, LLVM ’15 7:1–7:6 (ACM, 2015).
45. Guelton, S. et al. Pythran: enabling static optimization of scientific Python programs. Comput. Sci. Discov. 8, 014001 (2015).
46. Dongarra, J., Golub, G. H., Grosse, E., Moler, C. & Moore, K. Netlib and NA-Net: building a scientific computing community. IEEE Ann. Hist. Comput. 30, 30–41 (2008).
47. Barrett, K. A., Chiu, Y. H., Painter, J. F., Motteler, Z. C. & Dubois, P. F. Basis System, Part I: Running a Basis Program—A Tutorial for Beginners UCRL-MA-118543, Vol. 1 (Lawrence Livermore National Laboratory, 1995).
48. Dubois, P. F. & Motteler, Z. Basis System, Part II: Basis Language Reference Manual UCRL-MA-118543, Vol. 2 (Lawrence Livermore National Laboratory, 1995).
49. Chiu, Y. H. & Dubois, P. F. Basis System, Part III: EZN User Manual UCRL-MA-118543, Vol. 3 (Lawrence Livermore National Laboratory, 1995).
50. Chiu, Y. H. & Dubois, P. F. Basis System, Part IV: EZD User Manual UCRL-MA-118543, Vol. 4 (Lawrence Livermore National Laboratory, 1995).
51. Munro, D. H. & Dubois, P. F. Using the Yorick interpreted language. Comput. Phys. 9, 609–615 (1995).
52. Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
53. Iverson, K. E. A programming language. In Proc. 1962 Spring Joint Computer Conf. 345–351 (1962).
54. Jenness, T. et al. LSST data management software development practices and tools. In Proc. SPIE 10707, Software and Cyberinfrastructure for Astronomy V 1070709 (SPIE and International Society for Optics and Photonics, 2018).
55. Matsakis, N. D. & Klock, F. S. The Rust language. Ada Letters 34, 103–104 (2014).
56. Bezanson, J., Edelman, A., Karpinski, S. & Shah, V. B. Julia: a fresh approach to numerical computing. SIAM Rev. 59, 65–98 (2017).
57. Lattner, C. & Adve, V. LLVM: a compilation framework for lifelong program analysis and transformation. In Proc. 2004 Intl Symp. Code Generation and Optimization (CGO’04) 75–88 (IEEE, 2004).

Acknowledgements We thank R. Barnowski, P. Dubois, M. Eickenberg, and P. Greenfield, who suggested text and provided helpful feedback on the manuscript. K.J.M. and S.J.v.d.W. were funded in part by the Gordon and Betty Moore Foundation through grant GBMF3834 and by the Alfred P. Sloan Foundation through grant 2013-10-27 to the University of California, Berkeley.
S.J.v.d.W., S.B., M.P. and W.W. were funded in part by the Gordon and Betty Moore Foundation through grant GBMF5447 and by the Alfred P. Sloan Foundation through grant G-2017-9960 to the University of California, Berkeley.

Author contributions K.J.M. and S.J.v.d.W. composed the manuscript with input from others. S.B., R.G., K.S., W.W., M.B. and T.R. contributed text. All authors contributed substantial code, documentation and/or expertise to the NumPy project. All authors reviewed the manuscript.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2649-2.
Correspondence and requests for materials should be addressed to K.J.M., S.J.v.d.W. or R.G.
Peer review information Nature thanks Edouard Duchesnay, Alan Edelman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020