Introducing Pipe2Py: Converting Yahoo Pipes to Python Code
Original code: Greg Gaughan
Additional development: Tuukka Hastrup
Based on an original idea by: Tony Hirst, Dept of Communication and Systems, The Open University
pipes.yahoo.com
But what happens if Yahoo Pipes dies?
Pipe2Py github.com/ggaughan/pipe2py
Yahoo pipelines are translated into pipelines of Python generators* to give a close match to the original data flow.
* based on ideas by David Beazley, http://www.dabeaz.com/generators-uk
Each Yahoo module is coded as a separate Python module.
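As a rough illustration of the idea (not pipe2py's actual modules), the sketch below chains three generator functions that share the (context, _INPUT, conf) calling shape seen in the compiled code later in these slides. The stage names, the conf keys and the sample titles are all made up for the example, and it is written for Python 3 rather than the Python 2 used elsewhere in the deck.

```python
# Hypothetical stand-ins for pipe2py modules; names and conf keys are illustrative only.

def pipe_source(context, _INPUT, conf=None):
    """Source stage: yields one dict per item (stands in for a feed fetch)."""
    conf = conf or {}
    for title in conf.get("titles", []):
        yield {"title": title}


def pipe_filter(context, _INPUT, conf=None):
    """Filter stage: passes on only items whose title contains conf['contains']."""
    conf = conf or {}
    needle = conf.get("contains", "")
    for item in _INPUT:
        if needle in item["title"]:
            yield item


def pipe_truncate(context, _INPUT, conf=None):
    """Truncate stage: stops after conf['count'] items."""
    conf = conf or {}
    count = conf.get("count", 0)
    for index, item in enumerate(_INPUT):
        if index >= count:
            break
        yield item


def build_pipeline(context=None):
    """Wire the stages together, the output of one feeding the next."""
    sample = ["pipes intro", "python tips", "pipes to python", "pipes again"]
    source = pipe_source(context, None, conf={"titles": sample})
    filtered = pipe_filter(context, source, conf={"contains": "pipes"})
    return pipe_truncate(context, filtered, conf={"count": 2})


titles = [item["title"] for item in build_pipeline()]
print(titles)
```

Because every stage takes an iterable in and yields items out, stages can be rearranged in the same way that modules are rewired in the graphical editor.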
So what?
You can use Yahoo Pipes as a graphical rapid prototyping application, and then generate a Python code equivalent that you can host yourself.
installation
download code from http://github.com/ggaughan/pipe2py to dev8d/pipes/pipe2py
set path:
    export PYTHONPATH=dev8d/pipes
dependencies
simplejson*
    sudo easy_install simplejson
* only needed for Python pre 2.6
unit tests
Run from the test directory:
    python testbasics.py
compilation - direct from Yahoo Pipes
    python compile.py -p pipelineid
generates pipe_pipelineid.py
compilation - from a file
    python compile.py pipelinefile.json
generates pipelinefile.py
command line execution
    python pipe_pipelineid.py
runs pipe_pipelineid.py
compiled code of the form...

    from pipe2py import Context
    from pipe2py.modules import *

    def pipe_404411a8d22104920f3fc1f428f33642(context, _INPUT, conf=None, **kwargs):
        "Pipeline"
        if conf is None:
            conf = {}

        forever = pipeforever.pipe_forever(context, None, conf=None)
        sw_502 = pipefetch.pipe_fetch(context, forever, conf={u'URL': {u'type': u'url', u'value': u'http://blog.ouseful.info/feed'}})
        _OUTPUT = pipeoutput.pipe_output(context, sw_502, conf={})
        return _OUTPUT
Each call to the final generator ripples back through the pipeline, issuing .next() calls on the previous generator until the source is exhausted.
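The pull order can be made visible with a small trace. This is a minimal sketch, not pipe2py code: the two stages and the log list are invented for the illustration, and it uses the Python 3 built-in next(gen) where the slides' Python 2 would call gen.next().

```python
log = []  # records the order in which the stages actually run


def source(n):
    """Source stage: yields 0..n-1, logging each item as it is produced."""
    for i in range(n):
        log.append("source yields %d" % i)
        yield i


def double(items):
    """Downstream stage: pulls one item at a time from the previous generator."""
    for x in items:
        log.append("double gets %d" % x)
        yield x * 2


pipeline = double(source(2))
log_before = list(log)   # building the pipeline runs no stage code at all

first = next(pipeline)   # one request pulls exactly one item through both stages
log_after_first = list(log)

rest = list(pipeline)    # keep pulling until the source is exhausted

print(log_before)
print(log_after_first)
print(rest)
```

Nothing is computed until the consumer asks for an item, and each request touches only as much of the source as is needed to produce that one item.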
Each item is typically passed through the whole pipeline one at a time, so:
- memory usage is kept to a minimum
- no module is waiting on an earlier module to finish processing the whole data set
- by adding queues between the modules they could easily be made to run in parallel, each on a different CPU, to give great scalability
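The last point could be sketched roughly as follows, with each stage running in its own thread and a queue between neighbouring stages. This is an assumption about how such wiring might look, not something pipe2py provides: run_stage, the _DONE sentinel and the two toy stages are hypothetical. Threads are used to keep the example short; in standard CPython they mostly interleave for pure-Python work, so putting each stage on a different CPU would more likely call for the multiprocessing module with the same queue layout.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(stage, inbox, outbox):
    """Run one generator stage in its own thread, reading inbox and writing outbox."""
    def drain(q):
        # Present the queue to the stage as an ordinary iterable.
        while True:
            item = q.get()
            if item is _DONE:
                return
            yield item

    def worker():
        for result in stage(drain(inbox)):
            outbox.put(result)
        outbox.put(_DONE)  # tell the next stage that no more items are coming

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def add_one(items):
    for x in items:
        yield x + 1


def square(items):
    for x in items:
        yield x * x


# Bounded queues between stages stop a fast producer running far ahead.
q_in, q_mid, q_out = queue.Queue(maxsize=4), queue.Queue(maxsize=4), queue.Queue()
threads = [run_stage(add_one, q_in, q_mid), run_stage(square, q_mid, q_out)]

for value in [1, 2, 3]:
    q_in.put(value)
q_in.put(_DONE)

for t in threads:
    t.join()

results = []
while True:
    item = q_out.get()
    if item is _DONE:
        break
    results.append(item)

print(results)
```

The stage functions themselves are unchanged generators; only the plumbing between them differs from the single-threaded case.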
usage - compiled pipe

    from pipe2py import Context
    import pipe_9dc8014dcfd34c834a960321afde68d9 as p

    C = Context()
    r = p.pipe_9dc8014dcfd34c834a960321afde68d9(C, None)
    for i in r:
        print i
        print i['title']
usage - interpreted pipe

    from pipe2py.compile import parse_and_build_pipe
    from pipe2py import Context

    pipe_def = """json representation of the pipe"""
    p = parse_and_build_pipe(Context(), pipe_def)
    for i in p:
        print i
usage - user inputs #1: identifying console prompts

    context = Context(describe_input=True)
    p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(context, None)
    need_inputs = p
    print need_inputs
    >>> [(u'0', u'username', u'Twitter username', u'text', u''),
    ...  (u'1', u'statustitle', u'Status title [string] or [logo] means twitter icon', u'text', u'logo')]
    '''
    That is, tuples of the form
        (position, name, prompt, type, default)
    '''
usage - user inputs #2: avoiding console prompts

    C = Context(inputs={'username': 'greg', 'statustitle': 'logo'},
                console=False)
    p = pipe_ac45e9eb9b0174a4e53f23c4c9903c3f(C, None)
    for i in p:
        print i
Yahoo Pipes modules: Pipe2Py implementation progress
;-) One more thing...
pipe2py hosting on Google App Engine: pipes-engine.appspot.com
How can you help?
- generate test pipes that work, of increasing complexity
- generate test pipes that don't work
- commit pipe2py patches for test pipes that don't work
How else can you help?
- simplify installation (easy_install?)
- identify a good convention for integrating pipe2py compiled pipes in arbitrary code
- identify a good convention for inserting arbitrary Python functions into, or in-between, compiled pipe2py pipelines
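One possible shape for the last item, offered only as a sketch and not as an agreed pipe2py convention: wrap an ordinary item-to-item function so that it looks like any other generator stage. The as_stage helper, the stand-in for a compiled pipe and the shout function are all hypothetical, and the code is written for Python 3.

```python
def as_stage(func):
    """Wrap a plain item -> item function as a generator stage with the
    (context, _INPUT, conf) calling shape used by the compiled code."""
    def stage(context, _INPUT, conf=None, **kwargs):
        for item in _INPUT:
            result = func(item)
            if result is not None:  # returning None drops the item
                yield result
    return stage


def fake_compiled_pipe(context, _INPUT, conf=None, **kwargs):
    """Stand-in for a compiled pipe: yields dict items with a 'title' key."""
    for title in ["first post", "second post"]:
        yield {"title": title}


def shout(item):
    """Arbitrary user function: returns a copy of the item with an upper-cased title."""
    item = dict(item)
    item["title"] = item["title"].upper()
    return item


upstream = fake_compiled_pipe(None, None)
custom = as_stage(shout)(None, upstream)
titles = [i["title"] for i in custom]
print(titles)
```

Since the wrapped function both consumes and produces an iterator of items, it could sit after a compiled pipe as here, or between two stages inside one.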
Anything else?
The next step: produce an open source front end visual editor? WireIt? pypes?
Anything more else?
Generate a ready-to-run instance of a Google App Engine configuration based around a compiled pipe?
Pipe2Py github.com/ggaughan/pipe2py

Dev8d 2011-pipe2 py
