An Introduction to Apache Arrow for Python Programmers
03 June 2022
Presented by: Marlene Mhangami
What is Arrow?
database developers and dataframe library maintainers
data represented in memory
What is Arrow?
Arrow is fundamentally about 3 things:
1. Format
2. Libraries
3. Ecosystem
arrow.apache.org
Problem:
How can we take advantage of all the advancements in modern hardware?
Multiple Cores
- The average CPU has 4-8 cores
- Modern CPUs have between 2 and 64 cores
- GPUs can contain thousands of cores
SIMD (single instruction, multiple data)
Benefits of Arrow
- Take advantage of modern hardware
- Store data in columns, not rows (better for analytics work)
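As a rough illustration of the columns-versus-rows point, the sketch below stores the same three records both ways using only the Python standard library. The field names and values are made up for the example, and the `array` buffers are a stand-in for the idea of a typed, contiguous column rather than Arrow's actual memory layout.

```python
from array import array

# Row-oriented: every record keeps all of its fields together.
rows = [
    {"id": 1, "price": 9.5, "qty": 2},
    {"id": 2, "price": 3.0, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 4},
]

# Column-oriented: each field lives in its own contiguous, typed buffer.
columns = {
    "id": array("q", [r["id"] for r in rows]),        # 64-bit integers
    "price": array("d", [r["price"] for r in rows]),  # 64-bit floats
    "qty": array("q", [r["qty"] for r in rows]),      # 64-bit integers
}

# An analytics-style query usually needs one field across all records.
total_qty_rows = sum(r["qty"] for r in rows)  # visits every whole record
total_qty_cols = sum(columns["qty"])          # scans a single buffer
```

Both totals agree. The difference is that the columnar version reads only the values it needs, which is the access pattern analytics work tends to favour.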
1. Columnar Format
limit potential
Problem:
How can we move data efficiently between projects, languages, or systems?
CSV FILES
● EXPENSIVE TO READ AND WRITE TO
● CONVERT FROM STRING FORMAT TO
● NO RICH SUPPORT FOR TYPES
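The CSV points above can be seen in a small round trip. This is a minimal sketch using only the standard library; the sample records are invented for the example. It writes an integer, a float, and a date to CSV, reads them back, and shows that everything returns as a string that has to be parsed again.

```python
import csv
import io
from datetime import date

# Typed records: an int, a float, and a date per row.
records = [(1, 2.5, date(2022, 6, 3)), (2, 4.0, date(2022, 6, 4))]

# Writing to CSV turns every value into text.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(records)

# Reading it back yields strings only; the original types are gone.
buffer.seek(0)
round_tripped = list(csv.reader(buffer))
types_after = {type(value).__name__ for row in round_tripped for value in row}

# The reader must know the intended type of each column and re-parse it.
restored = [
    (int(i), float(x), date.fromisoformat(d)) for i, x, d in round_tripped
]
```

The re-parsing step is where both the cost and the risk sit: the file itself carries no record of which column was meant to be a date or a number.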
2. Libraries
Arrow implementations
marlenemhangami.com
Arrow
● Columnar format
● Binary
● Language agnostic
● Memory mapped
● Zero copy
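To give a feel for what "binary", "memory mapped" and "zero copy" mean together, here is a hedged standard-library sketch. It does not use Arrow's own file format; the file name `column.bin` and the single 64-bit integer column are assumptions made for the example. The column is written as raw bytes, mapped back into memory, and read through a typed view without parsing or copying the bytes.

```python
import mmap
import os
import tempfile
from array import array

# A binary "column" of 64-bit integers (hypothetical sample data).
values = array("q", range(1_000))

# Write the raw bytes to disk; no text encoding is involved.
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    values.tofile(f)

# Map the file into memory and view it as 64-bit integers.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # The typed view reinterprets the mapped bytes in place.
        with memoryview(mapped) as raw, raw.cast("q") as view:
            total = sum(view)
            first, last = view[0], view[-1]
```

Because the on-disk bytes already match the in-memory layout, reading is mostly a matter of pointing at them. Arrow applies the same idea with a standardised layout, so different languages can share the same buffers.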
Problem:
How can we make the libraries as fast as possible?
3. Ecosystem
Community
- Large, language-agnostic community
- Open source
- Continuous improvement
In-memory analytics
PyArrow
`pip install pyarrow`
`import pyarrow as pa`
`import pandas as pd`
`pd.read_parquet("x.parquet", engine="pyarrow")`
Projects Using Arrow
Ray
Hugging Face
Graphistry
Apache Spark
Dask
https://arrow.apache.org/powered_by/
Thanks
Contact
General contact: marlene@voltrondata.com
Twitter: marlene_zw
LinkedIn: Marlene Mhangami
