The Joy of SciPy

   David Kammeyer
 PUB February 14, 2013
Brief History

   Person                                      Package                    Year
   Jim Fulton                                  Matrix Object in Python    1994
   Jim Hugunin                                 Numeric                    1995
   Perry Greenfield, Rick White, Todd Miller   Numarray                   2001
   Travis Oliphant                             NumPy                      2005
SciPy 2001
Founded in 2001 with Travis Vaught

   Travis Oliphant:  optimize, sparse, interpolate, integrate, special, signal, stats, fftpack, misc
   Eric Jones:       weave, cluster, GA*
   Pearu Peterson:   linalg, interpolate, f2py
SciPy Ecosystem
Community effort
•   Chuck Harris
•   Pauli Virtanen
•   David Cournapeau
•   Stefan van der Walt
•   Dag Sverre Seljebotn
•   Robert Kern
•   Warren Weckesser
•   Ralf Gommers
•   Mark Wiebe
•   Nathaniel Smith
Why Python for Technical Computing
• Syntax (it gets out of your way)
• Over-loadable operators
• Complex numbers built-in early
• Just enough language support for arrays
• “Occasional” programmers can grok it
• Supports multiple programming styles
• Expert programmers can also use it effectively
• Has a simple, extensible implementation
• General-purpose language --- can build a system
• Critical mass
Putting Science back in Comp Sci
 • Much of the software stack is for systems
   programming --- C++, Java, .NET, ObjC, web
    - Complex numbers?
    - Vectorized primitives?
 • Array-oriented programming has been
   supplanted by Object-oriented programming
 • Software stack for scientists is not as helpful
   as it should be
 • Fortran is still where many scientists end up
NumPy: an Array-Oriented Extension
• Data: the array object
  – slicing and shaping
  – data-types map to bytes

• Fast Math:
  – vectorization
  – broadcasting
  – aggregations
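
The bullets above can be sketched in a few lines. This is a minimal illustration, not material from the talk: the array contents and variable names are made up, and only standard NumPy calls are used.

```python
import numpy as np

# The array object: one typed block of memory plus shape metadata.
a = np.arange(12, dtype=np.int32)

# Slicing and shaping give views onto the same bytes, not copies.
grid = a.reshape(3, 4)
evens = a[::2]

# The data-type fixes how many bytes each element occupies.
bytes_per_item = a.dtype.itemsize   # 4 bytes for int32
total_bytes = a.nbytes              # 12 elements * 4 bytes

# Vectorization: one expression applies to every element.
squared = grid * grid

# Broadcasting: a length-4 row is stretched across all 3 rows.
row = np.array([10, 20, 30, 40], dtype=np.int32)
shifted = grid + row

# Aggregations reduce along an axis.
col_sums = grid.sum(axis=0)
row_max = grid.max(axis=1)
```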
Zen of NumPy
•   strided is better than scattered
•   contiguous is better than strided
•   descriptive is better than imperative
•   array-oriented is better than object-oriented
•   broadcasting is a great idea
•   vectorized is better than an explicit loop
•   unless it’s too complicated --- then use Cython/Numba
•   think in higher dimensions
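
A rough sketch of what "strided", "contiguous", "vectorized" and "higher dimensions" can look like in practice. The arrays and the `loop_norm` helper are hypothetical, chosen only for illustration.

```python
import numpy as np

m = np.arange(12, dtype=np.float64).reshape(3, 4)   # C-contiguous block

# Strides are byte steps per axis: one row is 4 * 8 bytes, one item is 8 bytes.
row_step, item_step = m.strides

# A column slice is strided: a regular step, but no longer contiguous.
col = m[:, 1]
col_is_contiguous = col.flags["C_CONTIGUOUS"]

# Copying into contiguous memory packs the elements back to back.
packed = np.ascontiguousarray(col)

# Explicit loop (hypothetical helper) versus the vectorized form.
def loop_norm(x):
    total = 0.0
    for v in x:
        total += v * v
    return total ** 0.5

vec_norm = np.sqrt((col * col).sum())

# Thinking in higher dimensions: all pairwise distances via broadcasting,
# by turning a 1-D array into a column and a row.
x = np.array([0.0, 1.0, 3.0])
pairwise = np.abs(x[:, None] - x[None, :])
```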
Memory using Object-oriented

[Diagram: six separate objects scattered through memory, each carrying its own Attr1, Attr2, Attr3]
Array-oriented (Table) approach
             Attr1   Attr2   Attr3
   Object1
   Object2
   Object3
   Object4
   Object5
   Object6
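
To make the contrast concrete, here is a small sketch under stated assumptions: `Obj` is a hypothetical record class with three attributes, and the values are invented. The same six records are held once as separate Python objects and once as a single table.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Obj:                      # hypothetical record type, for illustration only
    attr1: float
    attr2: float
    attr3: float

# Object-oriented: six separate objects, each reached through a reference.
objects = [Obj(float(i), 2.0 * i, 3.0 * i) for i in range(6)]
oo_total = sum(o.attr2 for o in objects)

# Array-oriented: one table, one row per object, one column per attribute,
# stored in a single contiguous block.
table = np.array([(o.attr1, o.attr2, o.attr3) for o in objects],
                 dtype=np.float64)
ao_total = table[:, 1].sum()    # whole-column operation, no Python loop
```

Both totals agree; the difference is that the table version touches one block of memory and leaves the loop to NumPy.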
Benefits of Array-oriented

• Many technical problems are naturally
  array-oriented (easy to vectorize)
• Algorithms can be expressed at a high level
• These algorithms can be parallelized more
  simply (quite often much information is lost in
  the translation to typical “compiled” languages)
• Array-oriented algorithms map to modern
  hardware caches and pipelines.
We need more focus on
compiled array-oriented
languages with fast compilers!
What is good about NumPy?
• Array-oriented
• Extensive Dtype System (including structures)
• C-API
• Simple-to-understand data structure
• Memory mapping
• Syntax support from Python
• Large community of users
• Broadcasting
• Easy to interface with C/C++/Fortran code
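
Two of these points, the dtype system with structures and memory mapping, combine naturally. The sketch below is an illustration only: the field names, file name and record count are assumptions, not something shown in the talk.

```python
import os
import tempfile

import numpy as np

# A structured dtype describes a record, much like a C struct.
# The field names are hypothetical.
record = np.dtype([("id", np.int32), ("x", np.float64), ("y", np.float64)])

path = os.path.join(tempfile.mkdtemp(), "points.dat")

# Memory-map a new file as an array of 5 records, fill it, flush to disk.
out = np.memmap(path, dtype=record, mode="w+", shape=(5,))
out["id"] = np.arange(5)
out["x"] = np.linspace(0.0, 1.0, 5)
out["y"] = out["x"] ** 2
out.flush()
del out

# Reopen read-only; the record count is inferred from the file size.
back = np.memmap(path, dtype=record, mode="r")
```

The file is just the raw records laid end to end, which is also what makes this layout easy to share with C or Fortran code.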
New Project



 NumPy
               Blaze
         Next Generation NumPy
              Out-of-core
           Distributed Tables
Overview

[Diagram: a Main Script linked through "Code" to several Processing Nodes]
Timeline (Available on GitHub Now!)

     Date             Milestone
     July 2012        Pre-alpha release
     December 2012    Early Beta Release
     June 2013        Version 1.0
Spectrogram Demo
Introducing Numba
(lots of kernels to write)
NumPy Users

 • Want to be able to write Python to get fast
     code that works on arrays and scalars
 •   Need access to a boat-load of C-extensions
     (NumPy is just the beginning)


              PyPy doesn’t cut it for us!
NumPy Runtime

[Diagram: a Python function goes through dynamic compilation to yield a function pointer, which plugs into the NumPy runtime components below]

 • Ufuncs
 • Generalized UFuncs
 • Window Kernel Funcs
 • Function-based Indexing
 • Memory Filters
 • I/O Filters
 • Reduction Filters
 • Computed Columns
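
As a loose illustration of a Python function being handed to the NumPy runtime as a ufunc, plain NumPy already offers `np.frompyfunc`. This sketch involves no compilation, so it shows only the plumbing, not the speed-up the slide is after. `clip_unit` is a hypothetical kernel.

```python
import numpy as np

def clip_unit(v):
    """Hypothetical scalar kernel: clamp one value to the range [0, 1]."""
    return min(max(v, 0.0), 1.0)

# Wrap the scalar function as a ufunc with 1 input and 1 output.
clip_ufunc = np.frompyfunc(clip_unit, 1, 1)

data = np.array([[-0.5, 0.25], [0.75, 1.5]])

# frompyfunc ufuncs return object arrays, so cast back to float.
clipped = clip_ufunc(data).astype(np.float64)
```

The wrapped function still runs at Python speed per element; replacing it with a compiled function pointer is the gap dynamic compilation is meant to close.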
SciPy needs a Python compiler

     optimize     integrate     special     ode

     writing more of SciPy at a high level
Numba -- a Python compiler

 • Replays byte-code on a stack with simple type
   inference
 • Translates to LLVM (using LLVM-py)
 • Uses LLVM for code-gen
 • Resulting C-level function pointer can be
   inserted into the NumPy runtime
 • Understands NumPy arrays
 • Is NumPy / SciPy aware
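
A minimal sketch of the idea, with caveats: it uses the `njit` decorator from current Numba releases, which is an assumption and may differ from the interface available when these slides were given. `moving_sum` is a hypothetical kernel. If Numba is not installed, the sketch falls back to plain Python so it still runs.

```python
import numpy as np

try:
    from numba import njit          # assumed: a recent Numba release
except ImportError:                 # fall back to uncompiled Python
    def njit(func):
        return func

@njit
def moving_sum(x, width):
    """Sum over each length-`width` window of a 1-D float array."""
    n = x.shape[0] - width + 1
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        acc = 0.0
        for j in range(width):
            acc += x[i + j]
        out[i] = acc
    return out

x = np.arange(6, dtype=np.float64)
result = moving_sum(x, 3)
```

The explicit nested loop is the kind of kernel that is slow in pure Python; with Numba present, the first call compiles it to machine code.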
NumPy + Mamba = Numba

[Diagram: Python Function → LLVM-PY → LLVM 3.1 → Machine Code; shown alongside LLVM: ISPC, OpenCL, OpenMP, CUDA, CLANG (Intel, AMD, Nvidia, Apple)]
Examples
Software Stack Future?
         Plateaus of Code re-use + DSLs

   SQL       TDPL      R      Matlab
                 Python
   FORTRAN   OBJC      C      C++
                  LLVM
Wakari/Numba Demo
How to pay for all this?
Dual strategy




                Blaze
NumFOCUS
Num(Py) Foundation for Open Code for Usable Science
NumFOCUS

• Mission
  • To initiate and support educational programs
    furthering the use of open source software in
    science.
  • To promote the use of high-level languages and
    open source in science, engineering, and math
    research
  • To encourage reproducible scientific research
  • To provide infrastructure and support for open
    source projects for technical computing
NumFOCUS

Core Projects



  NumPy            SciPy         IPython      Matplotlib

Other Projects (seeking more --- need representatives)


                        Scikits Image
•   Large-scale data analysis products
•   Anaconda, SciPy in a Box
•   Wakari.io -- Cloud Hosted SciPy
•   Python training (data analysis and
    development)
•   NumPy support and consulting
•   Blaze, Numba, and More Development
