bonobo
Simple ETL in Python 3.5+
Romain Dorgueil
@rdorgueil
CTO/Hacker in Residence
Technical Co-founder
(Solo) Founder
Eng. Manager
Developer
L’Atelier BNP Paribas
WeAreTheShops
RDC Dist. Agency
Sensio/SensioLabs
AffiliationWizard
Felt too young in a Linux Cauldron
Dismantler of Atari computers
Basic literacy using a Minitel
Guitars & accordions
Off by one baby
Inception
#Grayscales
#Open-Source
#Trapeze shapes
#Data Engineering
STARTUP ACCELERATION PROGRAMS
NO HYPE, JUST BUSINESS
launchpad.atelier.net
bonobo
Simple ETL in Python 3.5+
• History & Market
• Extract, transform, load
• Basics – Bonobo One-O-One
• Concepts & Candies
• The Future – State & Plans
Once upon a time…
Extract Transform Load
• Not new. Popular concept in the 1970s [1] [2]
• Everywhere. Commerce, websites, marketing, finance, …
• Update secondary data-stores from a master data-store.
• Insert initial data somewhere
• Bi-directional API calls
• Every day, compute time reports, generate invoices when a threshold is reached, then send an e-mail.
• …
[1] https://en.wikipedia.org/wiki/Extract,_transform,_load
[2] https://www.sas.com/en_us/insights/data-management/what-is-etl.html
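To make the three letters concrete, here is a minimal sketch of the "update a secondary data-store from a master data-store" case, using only the Python standard library (no Bonobo). The inlined CSV data, the `clients` table and the three function names are hypothetical, chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical "master" data: a CSV export, inlined to keep the sketch self-contained.
MASTER_CSV = """guid,name,country
a1,Alice,fr
b2,Bob,de
c3,Carol,fr
"""


def extract(text):
    """Extract: read rows from the master CSV as dicts."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Transform: keep French customers only and normalize their names."""
    for row in rows:
        if row["country"] == "fr":
            yield {"guid": row["guid"], "name": row["name"].upper()}


def load(rows, conn):
    """Load: write rows into a secondary store (an in-memory SQLite table)."""
    conn.execute("CREATE TABLE IF NOT EXISTS clients (guid TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT OR REPLACE INTO clients (guid, name) VALUES (:guid, :name)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(MASTER_CSV)), conn)
print(conn.execute("SELECT guid, name FROM clients ORDER BY guid").fetchall())
# → [('a1', 'ALICE'), ('c3', 'CAROL')]
```

Each step is a generator, so rows flow one by one from extract to load instead of being materialized in memory first.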
Small Automation Tools
• Mostly aimed at simple recurring tasks.
• Cloud / SaaS only.

Data Integration Tools
• Pentaho Data Integration (IDE/Java)
• Talend Open Studio (IDE/Java)
• CloverETL (IDE/Java)
Talend Open Studio
Simple ETL in python 3.5+ with Bonobo, Romain Dorgueil
Data Integration Tools
• Java + IDE based, for most of them
• Data transformations are blocks
• IO flow managed by connections
• Execution
In the Python world …
• Bubbles (https://github.com/stiivi/bubbles)
• PETL (https://github.com/alimanfoo/petl)
• and now… Bonobo (https://www.bonobo-project.org/)
You can also use amazing libraries, including Joblib, Dask, Pandas and Toolz, but ETL is not their main focus.
Big Data Tools
• Can do anything. And probably more. Fast.
• Either need dedicated infrastructure or run in the cloud.
Story time
Partner 1 Data Integration
WE GOT DEALS!!!
Partner 1 Partner 2 Partner 3 Partner 4 Partner 5
Partner 6 Partner 7 Partner 8 Partner 9 …
Tiny bug there…
Can you fix it?
I want…
• A data integration / ETL tool using code as configuration.
• Preferably Python code.
• Something that can be tested (I mean, by a machine).
• Something that can use inheritance.
• Fast to install on a laptop, but designed to run on servers too.
And that’s Bonobo
It is …
• A framework to write ETL jobs in Python 3 (3.5+)
• Using the same concepts as the old ETLs
• But for coders!
• You can use OOP!
It is NOT …
• Pandas / R Dataframes
• Dask
• Luigi / Airflow
• Hadoop / Big Data
• A monkey
Bonobo - One - O - One
Let’s create a project
~ $ pip install bonobo cookiecutter
~ $ bonobo init pyparis
~ $ cd pyparis
~/pyparis $ ls -lsah
total 32
0 drwxr-xr-x 7 rd staff 238B 30 mai 18:02 .
0 drwxr-xr-x 4 rd staff 136B 30 mai 18:03 ..
0 -rw-r--r-- 1 rd staff 0B 30 mai 18:02 .env
8 -rw-r--r-- 1 rd staff 6B 30 mai 18:02 .gitignore
8 -rw-r--r-- 1 rd staff 333B 30 mai 18:02 main.py
8 -rw-r--r-- 1 rd staff 140B 30 mai 18:02 _services.py
8 -rw-r--r-- 1 rd staff 7B 30 mai 18:02 requirements.txt
It works!
$ bonobo run .
1
3
5
...
37
39
41
- range in=1 out=42
- Filter in=42 out=21
- print in=21
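The statistics printed at the end count the rows entering and leaving each node. The sketch below reproduces those numbers with a deliberately simplified, sequential model; it is not Bonobo's executor, which runs nodes concurrently. The assumption that the first node receives a single start token is what explains `range in=1`, and `run_linear` is a hypothetical name.

```python
from collections import Counter


def run_linear(nodes, stats):
    """Push data through a linear chain, counting rows in and out of each node.

    Each node takes one value and returns an iterable of zero or more outputs.
    Stages run one after another here (a simplification): the first node gets
    a single start token, which is why the extractor reports in=1.
    """
    stream = [None]  # hypothetical start token for the first node
    for name, node in nodes:
        outputs = []
        for value in stream:
            stats[name, "in"] += 1
            for result in node(value):
                stats[name, "out"] += 1
                outputs.append(result)
        stream = outputs
    return stream


stats = Counter()
printed = []  # stands in for stdout, to keep the output short
run_linear(
    [
        ("range", lambda _: range(42)),                # extract: emit 0..41
        ("Filter", lambda n: [n] if n % 2 else []),    # keep odd numbers
        ("print", lambda n: printed.append(n) or []),  # load: side effect only
    ],
    stats,
)
for name in ("range", "Filter", "print"):
    print("- {} in={} out={}".format(name, stats[name, "in"], stats[name, "out"]))
# - range in=1 out=42
# - Filter in=42 out=21
# - print in=21 out=0
```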
Let’s rewrite main.py
import bonobo

def extract():
    for i in range(42):
        yield i

def transform(n):
    if n % 2:
        yield n

def load(n):
    print(n)

graph = bonobo.Graph(
    extract,
    transform,
    load,
)

if __name__ == '__main__':
    bonobo.run(graph)

Each function can be replaced by a built-in or a Bonobo transformation:
extract → range(42)
transform → bonobo.Filter(lambda n: n % 2)
load → print

graph = bonobo.Graph(
    range(42),
    bonobo.Filter(filter=lambda x: x % 2),
    print,
)

if __name__ == '__main__':
    bonobo.run(graph)
graph = bonobo.Graph(…)
[Diagram: BEGIN → CsvReader('clients.csv') → InsertOrUpdate('db.site', 'clients', key='guid'), followed by update_crm and retrieve_orders]
bonobo.run(graph)
or in a shell…
$ bonobo run main.py
[Diagram: the same graph during execution, each node wrapped in its own Context + Thread]
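The "Context + Thread" idea can be approximated with the standard library: each node runs in a dedicated thread, nodes are connected by FIFO queues, and a sentinel object marks the end of the stream. This is a sketch under those assumptions, limited to a linear chain; it is not Bonobo's implementation, and `run_threaded`, `node_worker` and `END` are hypothetical names.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream


def node_worker(func, inbox, outbox):
    """Run one node in its own thread: read from inbox, write results to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            break
        for result in func(item):
            outbox.put(result)
    outbox.put(END)  # propagate end-of-stream downstream


def run_threaded(funcs):
    """Chain nodes with queues; each node gets a dedicated thread."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=node_worker, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    queues[0].put(None)  # single start token for the extractor
    queues[0].put(END)
    for thread in threads:
        thread.join()
    # Drain the last queue up to the END sentinel.
    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            return results
        results.append(item)


print(run_threaded([
    lambda _: range(10),              # extract
    lambda n: [n] if n % 2 else [],   # filter odd numbers
    lambda n: [n * 10],               # transform
]))
# → [10, 30, 50, 70, 90]
```

Because every node reads its input queue in order and queues are FIFO, output order matches input order. Unbounded queues keep the sketch simple; a real engine would also have to deal with back-pressure and errors raised inside a node.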
Transformations?
a.k.a. nodes in the graph
Functions!
def get_more_infos(api, **row):
    more = api.query(row.get('id'))
    return {
        **row,
        **(more or {}),
    }
Generators!
def join_orders(order_api, **row):
    for order in order_api.get(row.get('customer_id')):
        yield {
            **row,
            **order,
        }
Iterators!
extract = (
    'foo',
    'bar',
    'baz',
)
extract = range(0, 1001, 7)
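Functions, generators and iterables can coexist in one graph because each can be seen as "one input, zero or more outputs". The hypothetical `as_node` helper below shows one way to normalize them; it illustrates the idea and is not Bonobo's internals. Treating a function that returns None as producing no output is an assumption of the sketch.

```python
import inspect


def as_node(obj):
    """Adapt a function, a generator function or an iterable to one shape:
    a callable taking a row (as keyword arguments) and returning an iterator.

    Hypothetical helper, not Bonobo's internals.
    """
    if inspect.isgeneratorfunction(obj):
        return obj  # already yields zero or more rows per input row
    if callable(obj):
        def wrapper(*args, **kwargs):
            result = obj(*args, **kwargs)
            # Assumption: returning None means "no output row".
            return iter(()) if result is None else iter((result,))
        return wrapper
    # Plain iterable (tuple, range, ...): acts as an extractor.
    return lambda *args, **kwargs: iter(obj)


def run(*nodes):
    """Push a single empty start row through a linear chain of nodes."""
    rows = [{}]
    for node in map(as_node, nodes):
        rows = [out for row in rows for out in node(**row)]
    return rows


def add_flag(**row):  # function: one row in, one row out
    return {**row, "flag": True}


def explode(**row):  # generator: one row in, zero or more rows out
    for tag in row["tags"]:
        yield {"id": row["id"], "tag": tag}


rows = run(
    ({"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}),  # iterable
    explode,
    add_flag,
)
print(rows)
# → [{'id': 1, 'tag': 'a', 'flag': True}, {'id': 1, 'tag': 'b', 'flag': True}]
```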
Classes!
from bonobo.config import Configurable, Option, Service

class QueryDatabase(Configurable):
    # Simple string option
    table_name = Option(str, default='customers')
    # External dependency
    database = Service('database.default')

    def call(self, database, **row):
        customer = database.query(self.table_name, customer_id=row['clientId'])
        return {
            **row,
            'is_customer': bool(customer),
        }
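Declaring options and services as class attributes lets one class be reused with different configurations. The stdlib-only sketch below imitates that declarative style with hypothetical `TypedOption`, `ServiceRef` and `Component` classes; `bonobo.config` works differently internally, so treat this as an illustration of the pattern only.

```python
class TypedOption:
    """Hypothetical option declaration: expected type plus default value."""

    def __init__(self, type_, default=None):
        self.type = type_
        self.default = default


class ServiceRef:
    """Hypothetical external dependency, referenced by service name."""

    def __init__(self, name):
        self.name = name


class Component:
    """Hypothetical base class reading declared options from keyword arguments."""

    def __init__(self, **options):
        for attr in dir(type(self)):
            decl = getattr(type(self), attr)
            if isinstance(decl, TypedOption):
                value = options.pop(attr, decl.default)
                if not isinstance(value, decl.type):
                    raise TypeError("{} must be a {}".format(attr, decl.type.__name__))
                setattr(self, attr, value)  # instance value shadows the declaration
        if options:
            raise TypeError("unknown options: " + ", ".join(sorted(options)))

    def services(self):
        """Map each service attribute to the service name it depends on."""
        cls = type(self)
        return {
            attr: getattr(cls, attr).name
            for attr in dir(cls)
            if isinstance(getattr(cls, attr), ServiceRef)
        }


class QueryDatabase(Component):
    table_name = TypedOption(str, default="customers")  # simple string option
    database = ServiceRef("database.default")            # external dependency


print(QueryDatabase().table_name)                        # → customers
print(QueryDatabase(table_name="prospects").table_name)  # → prospects
print(QueryDatabase().services())                        # → {'database': 'database.default'}
```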
Services?
get_services()
from os import environ, path

import bonobo
import coinbase.wallet.client
import krakenex
import poloniex

# get_google_api_client() is assumed to be defined elsewhere in the project.

COINBASE_API_KEY, COINBASE_API_SECRET = environ.get('...'), environ.get('...')
KRAKEN_API_KEY, KRAKEN_API_SECRET = environ.get('...'), environ.get('...')
POLONIEX_API_KEY, POLONIEX_API_SECRET = environ.get('...'), environ.get('...')

def get_services():
    return {
        'fs': bonobo.open_fs(
            path.join(path.dirname(__file__), './data')
        ),
        'coinbase.client': coinbase.wallet.client.Client(
            COINBASE_API_KEY, COINBASE_API_SECRET
        ),
        'kraken.client': krakenex.API(
            KRAKEN_API_KEY, KRAKEN_API_SECRET
        ),
        'poloniex.client': poloniex.Poloniex(
            apikey=POLONIEX_API_KEY, secret=POLONIEX_API_SECRET
        ),
        'google.client': get_google_api_client(),
    }
run() will handle injection
import bonobo
from bonobo.commands.run import get_default_services

graph = bonobo.Graph()

if __name__ == '__main__':
    bonobo.run(
        graph,
        services=get_default_services(__file__)
    )
…
class QueryDatabase(Configurable):
    # External dependency
    database = Service('database.default')

    def call(self, database, **row):
        return { … }
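The runner is what connects declared services to the objects returned by get_services(). Here is a hypothetical sketch of that step: the node declares which service name each `call()` argument needs, and `run_rows` resolves every name once, before any row is processed, so a missing service fails fast with a KeyError. The `services` class attribute and `run_rows` are assumptions made for illustration, not Bonobo's API.

```python
def get_services():
    """Hypothetical registry, in the spirit of _services.py: name -> object."""
    return {"database.default": {"c-1": {"name": "ACME"}}}  # a dict stands in for a DB


class QueryDatabase:
    # Hypothetical declaration: call() argument name -> service name.
    services = {"database": "database.default"}

    def call(self, database, **row):
        return {**row, "is_customer": row["clientId"] in database}


def run_rows(node, rows, services):
    """Resolve the node's services once, then call it for each row."""
    # Resolving before the first row means a missing service fails fast (KeyError).
    resolved = {arg: services[name] for arg, name in node.services.items()}
    return [node.call(**resolved, **row) for row in rows]


print(run_rows(QueryDatabase(), [{"clientId": "c-1"}, {"clientId": "c-2"}], get_services()))
# → [{'clientId': 'c-1', 'is_customer': True}, {'clientId': 'c-2', 'is_customer': False}]
```

Keeping service construction in one function makes it easy to swap a real database for a stub in tests, which is the point of injecting dependencies rather than importing them inside the transformation.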
Bananas!
Files
bonobo.FileReader(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str
)
bonobo.FileWriter(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str
)
CSV
bonobo.CsvReader(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str,
    delimiter: str,
    quotechar: str,
    headers: tuple,
    skip: int
)
bonobo.CsvWriter(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str,
    delimiter: str,
    quotechar: str,
    headers: tuple
)
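As a rough idea of what a CSV extractor does with `delimiter`, `quotechar`, `headers` and `skip`, here is a standard-library sketch. `read_csv_rows` is hypothetical, and the exact meaning of `skip` used here (data lines dropped after the header line) is an assumption, not a statement about Bonobo's CsvReader.

```python
import csv
import io


def read_csv_rows(file, *, delimiter=",", quotechar='"', headers=None, skip=0):
    """Yield each CSV line as a dict keyed by header.

    Hypothetical stdlib approximation of a CSV extractor. Assumptions: when
    `headers` is None, the first line provides them; `skip` then drops that
    many data lines.
    """
    reader = csv.reader(file, delimiter=delimiter, quotechar=quotechar)
    if headers is None:
        first = next(reader, None)
        if first is None:
            return  # empty file: no rows
        headers = tuple(first)
    for _ in range(skip):
        next(reader, None)
    for line in reader:
        yield dict(zip(headers, line))


data = io.StringIO('guid;name\nx0;"skip me"\na1;"Dupont; Jean"\n')
print(list(read_csv_rows(data, delimiter=";", skip=1)))
# → [{'guid': 'a1', 'name': 'Dupont; Jean'}]
```

The quoted field shows why `quotechar` matters: the `;` inside `"Dupont; Jean"` is data, not a delimiter.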
JSON
bonobo.JsonReader(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str
)
bonobo.JsonWriter(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    mode: str
)
Pickle
bonobo.PickleReader(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    item_names: tuple,
    mode: str
)
bonobo.PickleWriter(
    path: str,
    *,
    eol: str,
    encoding: str,
    ioformat,
    item_names: tuple,
    mode: str
)
bonobo.Limit(limit)
bonobo.PrettyPrinter()
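`bonobo.Limit(limit)` needs state: it has to remember how many rows it has already seen. A minimal sketch of such a stateful node built with a closure (the `make_limit` name is hypothetical and this is not Bonobo's code):

```python
def make_limit(limit):
    """Hypothetical stateful node: pass through the first `limit` rows, drop the rest."""
    seen = 0

    def node(row):
        nonlocal seen
        seen += 1
        if seen <= limit:
            yield row

    return node


limit3 = make_limit(3)
print([out for row in "abcdef" for out in limit3(row)])
# → ['a', 'b', 'c']
```

Unlike `itertools.islice`, which wraps the whole iterator, a node only ever sees one row at a time, so the counter has to live outside each call; that is also why a fresh instance is needed for every run.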
Filters
@bonobo.Filter
def NoneShallPass(username):
    return username == 'romain'

bonobo.Graph(
    # ...
    NoneShallPass(),
    # ...
)

# Inline form, with a lambda:
bonobo.Filter(lambda username: username == 'romain')
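A filter forwards a row unchanged when its predicate is truthy and emits nothing otherwise. The sketch below expresses that contract as a generator over keyword-argument rows; `filter_node` is a hypothetical name, and the `**_` in the predicate (to ignore extra columns) is an assumption of the sketch.

```python
def filter_node(predicate):
    """Hypothetical filter: forward the row if predicate(**row) is truthy, else emit nothing."""
    def node(**row):
        if predicate(**row):
            yield row
    return node


only_romain = filter_node(lambda username, **_: username == "romain")
rows = [{"username": "romain"}, {"username": "bob"}]
print([out for row in rows for out in only_romain(**row)])
# → [{'username': 'romain'}]
```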
More bananas!
Console & Logging
Jupyter Notebook
SQLAlchemy Extension
bonobo_sqlalchemy.Select(
    query,
    *,
    pack_size=1000,
    limit=None
)
bonobo_sqlalchemy.InsertOrUpdate(
    table_name,
    *,
    fetch_columns,
    insert_only_fields,
    discriminant,
    …
)
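InsertOrUpdate's `discriminant` suggests the classic select-then-write pattern: look a row up by its key columns, update it if found, insert it otherwise. Below is a hypothetical sqlite3 sketch of that pattern (not the extension's code, which works through SQLAlchemy). Table and column names are interpolated into SQL, so they must come from trusted code.

```python
import sqlite3


def insert_or_update(conn, table, row, discriminant=("id",)):
    """Update the row matching the `discriminant` columns, or insert it.

    Hypothetical sketch of the select-then-write pattern. Table and column
    names are interpolated into SQL, so they must come from trusted code;
    values always go through "?" placeholders.
    """
    where = " AND ".join("{} = ?".format(col) for col in discriminant)
    keys = [row[col] for col in discriminant]
    found = conn.execute("SELECT 1 FROM {} WHERE {}".format(table, where), keys).fetchone()
    if found:
        others = [col for col in row if col not in discriminant]
        if others:  # nothing to update when the row only holds key columns
            sets = ", ".join("{} = ?".format(col) for col in others)
            conn.execute(
                "UPDATE {} SET {} WHERE {}".format(table, sets, where),
                [row[col] for col in others] + keys,
            )
        return "update"
    columns = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    conn.execute(
        "INSERT INTO {} ({}) VALUES ({})".format(table, columns, marks),
        list(row.values()),
    )
    return "insert"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (guid TEXT PRIMARY KEY, name TEXT)")
print(insert_or_update(conn, "clients", {"guid": "a1", "name": "Alice"}, ("guid",)))   # → insert
print(insert_or_update(conn, "clients", {"guid": "a1", "name": "Alicia"}, ("guid",)))  # → update
print(conn.execute("SELECT guid, name FROM clients").fetchall())  # → [('a1', 'Alicia')]
```

One lookup per row keeps the sketch simple; for large loads, batching the lookups would be the obvious improvement.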
PREVIEW
Docker Extension
$ pip install bonobo[docker]
$ bonobo runc myjob.py
PREVIEW
State & Plans
RIP rdc.etl
• 2013-2015, Python 2.7.
• Brings most of the execution algorithms.
• Different context.
• Learn from mistakes.
Bonobo is young
• First commit: December 2016
• 21 releases, ~400 commits, 4 contributors
• Current stable release: 0.4.1, released the day before this talk
__future__
• Stay 100% Open-Source
• Light & focused: graphs and executions
• The rest goes to Extensions.
Pre 1.0 Roadmap
• Towards stable API
• More tests, more documentation
• Developer experience first
Target: 1.0 before 2018
Post 1.0 Roadmap
• Scheduling
• Monitoring
• More execution strategies
• Optimizations
Hacker News is crazy
Data Processing for Humans
www.bonobo-project.org
docs.bonobo-project.org
bonobo-slack.herokuapp.com
github.com/python-bonobo
Let me know what you think!
Thank you!
@monkcage @rdorgueil
https://goo.gl/e25eoa