bonobo
Simple ETL in Python 3.5+
Romain Dorgueil
@rdorgueil
CTO/Hacker in Residence: L’Atelier BNP Paribas
Technical Co-founder: WeAreTheShops
(Solo) Founder: RDC Dist. Agency
Eng. Manager: Sensio/SensioLabs
Developer: AffiliationWizard
Felt too young in a Linux Cauldron
Dismantler of Atari computers
Basic literacy using a Minitel
Guitars & accordions
Off by one baby
Inception
Once upon a time…
Extract Transform Load
• Not new. Popular concept in the 1970s [1] [2]
• Everywhere. Commerce, websites, marketing, finance, …
[1] https://en.wikipedia.org/wiki/Extract,_transform,_load
[2] https://www.sas.com/en_us/insights/data-management/what-is-etl.html
Extract Transform Load
foo
bar
baz
[Diagram: a larger graph with Extract, Transform, Join and Load nodes, plus "more"; outputs labelled DB, HTTP POST, log?]
Let’s see
Init
~ $ pip install bonobo
~ $ bonobo init polyconf
Code
import bonobo

# Titles are truncated on the slide; the cut is marked with "…"
input_data = (
    'Gérard Chamayou, La Géode, sa sphère miroir, in: Le Paris des Centraliens, pp…',
    'Armelle Lavalou (2000), La Villette, Paris: Éditions du Patrimoine, ISBN 2858…',
    'Jean Marie Pérouse de Montclos (1994), Le guide du Patrimoine: Paris, Ministè…',
)

def transform(line):
    # Keep only the text before the first comma
    return line.split(',')[0]

graph = bonobo.Graph(
    input_data,
    transform,
    print
)
Run
$ bonobo run polyconf/
Gérard Chamayou
Armelle Lavalou (2000)
Jean Marie Pérouse de Montclos (1994)
- tuple in=1 out=3
- transform in=3 out=3
- print in=3
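What this run does can be approximated in plain Python, without bonobo: each item goes from the input tuple through transform and then to print. A minimal sketch, assuming shortened stand-ins for the truncated titles; `run_chain` is a hypothetical helper and not part of bonobo.

```python
# Plain-Python approximation of the three-node chain; not bonobo's implementation.
input_data = (
    # Shortened stand-ins for the titles that are truncated on the slide
    'Gérard Chamayou, La Géode',
    'Armelle Lavalou (2000), La Villette',
    'Jean Marie Pérouse de Montclos (1994), Le guide du Patrimoine',
)


def transform(line):
    # Keep only the text before the first comma
    return line.split(',')[0]


def run_chain(source, *steps):
    """Hypothetical helper: pass every item of source through each step in order."""
    results = []
    for item in source:
        for step in steps:
            item = step(item)
        results.append(item)
    return results


names = run_chain(input_data, transform)
for name in names:
    print(name)
```

Three items go in and three come out of transform, which matches the in=3 out=3 counters in the run report.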
graph = bonobo.Graph(…)
BEGIN
CsvReader(
'clients.csv'
)
InsertOrUpdate(
'db.site',
'clients',
key='guid'
)
update_crm
retrieve_orders
bonobo.run(graph)
or in a shell…
$ bonobo run main.py
Context + Thread (shown four times, once per node)
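The slide pairs each node with its own context and thread. A rough sketch of that idea using only the standard library, assuming nodes are linked by FIFO queues and that a sentinel object marks the end of the stream. `END`, `run_node` and `run_chain` are hypothetical names; this is not bonobo's internal code.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker


def run_node(func, inbox, outbox):
    """Apply func to every item read from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)  # propagate the end of the stream downstream
            break
        outbox.put(func(item))


def run_chain(source, *funcs):
    """Run each function in its own thread, linked by queues, and collect the output."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=run_node, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()

    # Feed the first queue, then signal that nothing more will come
    for item in source:
        queues[0].put(item)
    queues[0].put(END)

    # Drain the last queue until the end marker arrives
    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results


chain_result = run_chain(['foo', 'bar', 'baz'], str.upper, lambda s: s + '!')
print(chain_result)
```

With one thread per node and FIFO queues between them, items keep their order while several nodes can work on different items at the same time.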
Transformations ?
a.k.a
nodes in the graph
Functions
def get_more_infos(api, **row):
    # Enrich the row with whatever the API returns, if anything
    more = api.query(row.get('id'))
    return {
        **row,
        **(more or {}),
    }
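To see what such a function returns, it can be called directly with a stand-in for the api argument. In this sketch `FakeApi` and its data are hypothetical and exist only to exercise the function.

```python
class FakeApi:
    """Hypothetical stand-in for the api service passed to the function."""

    def __init__(self, records):
        self.records = records

    def query(self, key):
        # Returns None when nothing is known about the key
        return self.records.get(key)


def get_more_infos(api, **row):
    more = api.query(row.get('id'))
    return {
        **row,
        **(more or {}),
    }


api = FakeApi({1: {'city': 'Paris'}})
known = get_more_infos(api, id=1, name='foo')
unknown = get_more_infos(api, id=2, name='bar')
print(known)
print(unknown)
```

The `more or {}` guard is what lets the row pass through unchanged when the lookup returns nothing.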
Generators
def join_orders(order_api, **row):
    # Yield one output row per order found for this customer
    for order in order_api.get(row.get('customer_id')):
        yield {
            **row,
            **order,
        }
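A generator node can emit zero, one or many rows for each input row. A small sketch of that behaviour, assuming a hypothetical `FakeOrderApi` whose `get` returns a list of order dicts.

```python
class FakeOrderApi:
    """Hypothetical stand-in for the order_api service."""

    def __init__(self, orders_by_customer):
        self.orders_by_customer = orders_by_customer

    def get(self, customer_id):
        # An unknown customer simply has no orders
        return self.orders_by_customer.get(customer_id, [])


def join_orders(order_api, **row):
    for order in order_api.get(row.get('customer_id')):
        yield {
            **row,
            **order,
        }


order_api = FakeOrderApi({42: [{'order_id': 'A'}, {'order_id': 'B'}]})
joined = list(join_orders(order_api, customer_id=42, name='foo'))
no_orders = list(join_orders(order_api, customer_id=7, name='bar'))
print(joined)
print(no_orders)
```

One customer row becomes two joined rows here, and a customer without orders produces no output at all.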
Iterators
extract = (
    'foo',
    'bar',
    'baz',
)

extract = range(0, 1001, 7)
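Both forms are ordinary Python iterables, so their contents can be inspected before they are used as a source. A quick check of what the range produces:

```python
# Multiples of 7 from 0 up to, but not including, 1001
numbers = range(0, 1001, 7)

first_three = list(numbers)[:3]
last = numbers[-1]
count = len(numbers)

print(first_three, last, count)
```

The stop value is exclusive, so 1001 itself (7 × 143) is not emitted and the last value is 994.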
Classes
from bonobo.config import Configurable, Option, Service

class QueryDatabase(Configurable):
    table_name = Option(str, default='customers')
    database = Service('database.default')

    def call(self, database, **row):
        customer = database.query(self.table_name, customer_id=row['clientId'])
        return {
            **row,
            'is_customer': bool(customer),
        }
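The same pattern, a configurable object that is called once per row, can be sketched in plain Python. This version relies on `__init__` and `__call__` instead of bonobo's Configurable, Option and Service, and both `QueryDatabaseSketch` and `FakeDatabase` are hypothetical.

```python
class FakeDatabase:
    """Hypothetical in-memory database exposing a query(table, customer_id=...) method."""

    def __init__(self, tables):
        self.tables = tables

    def query(self, table_name, customer_id):
        rows = self.tables.get(table_name, [])
        return [row for row in rows if row['customer_id'] == customer_id]


class QueryDatabaseSketch:
    """Plain-Python analogue: the option is set once, the object is called per row."""

    def __init__(self, table_name='customers'):
        self.table_name = table_name

    def __call__(self, database, **row):
        customer = database.query(self.table_name, customer_id=row['clientId'])
        return {
            **row,
            'is_customer': bool(customer),
        }


database = FakeDatabase({'customers': [{'customer_id': 1}]})
node = QueryDatabaseSketch()
hit = node(database, clientId=1)
miss = node(database, clientId=2)
print(hit)
print(miss)
```

Keeping the table name as configuration means the same class can be reused against another table without touching the per-row logic.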
Services ?
Define as names
class QueryDatabase(Configurable):
    database = Service('database.default')

    def call(self, database, **row):
        return { … }
Runtime injection
import bonobo

graph = bonobo.Graph(...)

def get_services():
    return {
        'database.default': MyDatabaseImpl()
    }
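The idea behind naming a service and injecting it at runtime can be illustrated with a dictionary lookup. This is a sketch of the principle only: `call_with_service` is a hypothetical helper, the body of `MyDatabaseImpl` is invented for the example, and none of this reflects how bonobo resolves services internally.

```python
class MyDatabaseImpl:
    """Hypothetical implementation; the slide does not show its body."""

    def query(self, table_name, **criteria):
        # Pretend that nothing ever matches
        return []


def get_services():
    return {
        'database.default': MyDatabaseImpl(),
    }


def call_with_service(func, service_name, services, **row):
    """Hypothetical helper: look the service up by name and pass it as first argument."""
    return func(services[service_name], **row)


def is_customer(database, **row):
    found = database.query('customers', customer_id=row['clientId'])
    return {**row, 'is_customer': bool(found)}


result = call_with_service(is_customer, 'database.default', get_services(), clientId=3)
print(result)
```

Because the transformation only knows the name 'database.default', a test or another environment can supply a different implementation without changing the transformation.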
Candies
Library
bonobo.FileReader(…)
bonobo.CsvReader(…)
bonobo.JsonReader(…)
bonobo.PickleReader(…)
bonobo.ExcelReader(…)
bonobo.XMLReader(…)
… more to come
bonobo.FileWriter(…)
bonobo.CsvWriter(…)
bonobo.JsonWriter(…)
bonobo.PickleWriter(…)
bonobo.ExcelWriter(…)
bonobo.XMLWriter(…)
… more to come
Library
bonobo.Limit(limit)
bonobo.PrettyPrinter()
bonobo.Filter(…)
… more to come
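As a rough standard-library analogy for what limiting, filtering and pretty-printing a stream of rows mean (this does not use bonobo and is not how its nodes are implemented):

```python
from itertools import islice
from pprint import pformat

rows = [{'id': n} for n in range(10)]

# Filtering: keep only the rows that satisfy a predicate
even_rows = filter(lambda row: row['id'] % 2 == 0, rows)

# Limiting: stop after a fixed number of rows
first_two = list(islice(even_rows, 2))

# Pretty-printing: format rows for human inspection
printed = pformat(first_two)
print(printed)
```

Both `filter` and `islice` are lazy, so rows beyond the limit are never evaluated.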
Console Plugin
Jupyter Plugin
SQLAlchemy Extension
bonobo_sqlalchemy.Select(
    query,
    *,
    pack_size=1000,
    limit=None
)
bonobo_sqlalchemy.InsertOrUpdate(
    table_name,
    *,
    fetch_columns,
    insert_only_fields,
    discriminant,
    …
)
PREVIEW
Docker Extension
$ pip install bonobo[docker]
$ bonobo runc myjob.py
PREVIEW
Wrap up
Young
• First commit: December 2016
• 23 releases, ~420 commits, 4 contributors
• Current "stable": 0.4.3
• Target: 1.0 early 2018
1.0
• 100% Open-Source.
• Light & Focused.
• Very few dependencies.
• Comprehensive standard library.
• The rest goes to plugins and extensions.
Small scale
• 1 minute to install
• Easy to deploy
• NOT: Big Data, Statistics, Analytics …
• IS: Lean manufacturing for data






www.bonobo-project.org
github.com/python-bonobo
Let me know what you think!
Data Processing for Humans
Thank you!
@monkcage @rdorgueil
https://goo.gl/e25eoa
bonobo
@monkcage

Simple ETL in Python 3.5+ - PolyConf Paris 2017 - Lightning Talk (10 minutes)