PyPedia
The free programming environment
       that anyone can edit!
       Alexandros Kanterakis



        Genomics Coordination Center, Department of Genetics,
        University Medical Center, Groningen, The Netherlands
Introduction
How not to be a bioinformatician
•   Stay low level at every level
•   Be open source without being open
•   Make tools that make no sense to scientists
•   Do not ever share your results and do not reuse
•   Never maintain your databases and web services
•   Be unreachable and isolated
So, you think you can be a
             bioinformatician…
• Imagine you only have: A personal computer
  with a browser and an Internet connection
• Answer the following question:
     - Who is the current prime minister of Latvia?
SYTYCBAB (So You Think You Can Be A Bioinformatician)
• Imagine you only have: A personal computer with
  a browser and an Internet connection
• Answer the following question:
 Compute Hardy-Weinberg equilibrium statistics for a set of
 genotypes
[Screenshots: PyPedia article pages, each with Execute, Source and Documentation buttons]
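To make the Hardy-Weinberg question concrete, here is a minimal sketch of a chi-square Hardy-Weinberg test for one biallelic locus. It is not the code of any PyPedia article; the function name `hardy_weinberg_chi2` and the genotypes-as-tuples input format are assumptions, chosen to match the genotype tuples used later in these slides.

```python
from collections import Counter


def hardy_weinberg_chi2(genotypes):
    """Chi-square statistic for Hardy-Weinberg equilibrium at one biallelic locus.

    genotypes: list of 2-tuples of allele symbols, e.g. [('A', 'A'), ('A', 'G')].
    Returns (p, observed, expected, chi2). p is the frequency of the
    alphabetically first allele; observed/expected are genotype counts in the
    order (homozygous first allele, heterozygous, homozygous second allele).
    """
    alleles = sorted({allele for g in genotypes for allele in g})
    if len(alleles) != 2:
        raise ValueError("expected exactly two distinct alleles")
    a1, a2 = alleles
    n = len(genotypes)
    # Genotypes are unordered: ('G', 'A') is counted together with ('A', 'G').
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    observed = [counts[(a1, a1)], counts[(a1, a2)], counts[(a2, a2)]]
    # Allele frequency from genotype counts (each individual carries two alleles).
    p = (2 * observed[0] + observed[1]) / (2 * n)
    q = 1 - p
    # Expected genotype counts under HWE: p^2, 2pq, q^2 times the sample size.
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, observed, expected, chi2


genotypes = [('A', 'A')] * 25 + [('A', 'G')] * 50 + [('G', 'G')] * 25
p, observed, expected, chi2 = hardy_weinberg_chi2(genotypes)
print(p, observed, chi2)  # 0.5 [25, 50, 25] 0.0
```

With one degree of freedom, a statistic above about 3.84 indicates a departure from equilibrium at the 5% level; an exact test is usually preferred for small samples.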
But what about…
• Web environment, online execution
• Open Source
• Integrate with other tools
• Edit a method and share it
• Examples and Unit tests
• Deploy in the cloud
• Frequency of new releases
A Python sandbox to the rescue
From:
http://wiki.python.org/moin/SandboxedPython




So:
Google App Engine + MediaWiki = PyPedia
www.pypedia.com
A Kanterakis - PyPedia: a python crowdsourcing development environment for bioinformatics and computational biology
Code as wiki
HTML input as wiki
Executing a method on a remote computer

• Edit your user page and add an “ssh” section:

                      ==ssh==
                      host=ec2-107-22-59-115.compute-1.amazonaws.com
                      username=JohnDoe
                      path=/home/JohnDoe/runPyPedia




• This content is NOT shown to anyone
• Install the PyPedia client on the remote
  computer (details on pypedia.com)
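The ssh section is a plain key=value block under a wiki heading. The slides do not show how the PyPedia client reads it, so the following is only a hypothetical sketch: the helper name `parse_ssh_section` and the rule that parsing stops at the next `==...==` heading are assumptions. The `username@host` string it builds is the standard ssh login target form.

```python
def parse_ssh_section(wikitext):
    """Collect key=value lines found under a '==ssh==' wiki heading.

    Hypothetical helper: the real PyPedia client may parse this differently.
    Parsing stops at the next '==...==' heading.
    """
    settings = {}
    in_section = False
    for raw in wikitext.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            # A heading: enter the ssh section, or leave it on any other heading.
            in_section = line.strip("=").strip().lower() == "ssh"
            continue
        if in_section and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


user_page = """Some text on the user page
==ssh==
host=ec2-107-22-59-115.compute-1.amazonaws.com
username=JohnDoe
path=/home/JohnDoe/runPyPedia
"""
ssh = parse_ssh_section(user_page)
target = f"{ssh['username']}@{ssh['host']}"  # user@host, as passed to ssh
```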
“Execute on remote computer”

Example:
Fixed_point_user_JohnDoe


The cloud instance contains:
numpy, scipy, matplotlib


Like SAGE but with custom
execution environments
(e.g. BioPython, PyCogent, …)
Cool, but I want to call the function from my local computer…

• Install the PyPedia Python library:
git clone https://github.com/kantale/pypedia.git

• Load the function in Python:
 import pypedia
 from pypedia import Pairwise_linkage_disequilibrium
Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'),
   ('G','A')], [('A','A'), ('A','G'), ('G','G'), ('A','A')])

   {'haplotypes': [('AA', 0.49999999997393502, 0.3125), ('AG',
   2.606498430642265e-11, 0.1875), ('GA', 0.12500000002606498,
   0.3125), ('GG', 0.37499999997393502, 0.1875)], 'R_sq':
   0.59999999983318408, 'Dprime': 0.99999999986098675}
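The numbers in this output can be checked by hand. In each 'haplotypes' tuple, the third value equals the product of the two allele frequencies (0.5 for A at the first locus, 0.625 for A at the second), i.e. the haplotype frequency expected under linkage equilibrium; assuming the second value is the estimated haplotype frequency, the sketch below recomputes D, D' and r² from it. It does not reproduce how the library estimates haplotype frequencies from unphased genotypes, and the function name `ld_from_haplotypes` is hypothetical.

```python
def ld_from_haplotypes(freqs):
    """Return (D, D', r^2) for two biallelic loci.

    freqs maps two-letter haplotypes (allele at locus 1 + allele at locus 2)
    to estimated frequencies, e.g. {'AA': 0.5, 'AG': 0.0, 'GA': 0.125, 'GG': 0.375}.
    The alphabetically first allele at each locus is the reference allele.
    """
    a = min(h[0] for h in freqs)  # reference allele at locus 1
    b = min(h[1] for h in freqs)  # reference allele at locus 2
    p_a = sum(f for h, f in freqs.items() if h[0] == a)
    p_b = sum(f for h, f in freqs.items() if h[1] == b)
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    # D: observed reference haplotype frequency minus its expectation
    # under linkage equilibrium (the product of the allele frequencies).
    d = freqs.get(a + b, 0.0) - p_a * p_b
    # D' normalises |D| by its maximum possible value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_sq = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_sq


# Haplotype frequencies printed by Pairwise_linkage_disequilibrium above.
slide = {'AA': 0.49999999997393502, 'AG': 2.606498430642265e-11,
         'GA': 0.12500000002606498, 'GG': 0.37499999997393502}
d, d_prime, r_sq = ld_from_haplotypes(slide)
print(round(r_sq, 6), round(d_prime, 6))  # 0.6 1.0
```

Unrounded, the recomputed r² and D' agree with the 'R_sq' and 'Dprime' values above to within floating-point rounding.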


• You can call the method of any user and your method can be
  called by anyone.
• Edit locally, push changes.
• At the top of each article there is a button:

• It creates a personalized version of the article that only
  you can edit.

• This is similar to GitHub’s “fork” feature.
Using PyPedia for open science
• A complete analysis can be hosted in PyPedia

• Any finding generated or published should be
  easily shared and reproduced.

• The reproduction of a finding takes time even
  when the source code is released.
Reproducible science
• PyPedia offers a REST interface:
• www.pypedia.com/index.php?b_timestamp=YYYYMMDDHHMMSS&get_code=<python code>
• Returns the most recent version of the Python
  code that was edited before the timestamp.

• Reproduce the analysis by sharing a single URL (spaces, quotes and
  brackets in the code are percent-encoded when the URL is sent):
  http://www.pypedia.com/index.php?b_timestamp=20120102101010&get_code=print(
  Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'), ('G','A')], [('A','A'), ('A','G'), ('G','G'),
  ('A','A')]))
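Building such a URL by hand is error-prone because the code has to be percent-encoded. A minimal sketch using only the standard library: the parameter names `b_timestamp` and `get_code` come from the slide, while the helper name `reproducible_url` and the use of a `datetime` for the timestamp are assumptions.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlencode, urlsplit

PYPEDIA_INDEX = "http://www.pypedia.com/index.php"


def reproducible_url(code, when):
    """URL asking for PyPedia code as it was before `when` (hypothetical helper)."""
    params = {
        "b_timestamp": when.strftime("%Y%m%d%H%M%S"),  # YYYYMMDDHHMMSS
        "get_code": code,  # urlencode percent-encodes spaces, quotes, brackets
    }
    return PYPEDIA_INDEX + "?" + urlencode(params)


url = reproducible_url(
    "print(Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'), ('G','A')], "
    "[('A','A'), ('A','G'), ('G','G'), ('A','A')]))",
    datetime(2012, 1, 2, 10, 10, 10),
)
# Decoding the query string gives back the original parameters.
decoded = parse_qs(urlsplit(url).query)
```

Sharing `url` is then enough for someone else to request the same code snapshot.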
Reproducing an experiment
$ curl \
    --data-urlencode 'b_timestamp=20120501010101' \
    --data-urlencode "get_code=print(Pairwise_linkage_disequilibrium([('A','A'), ('A','G'), ('G','G'),
('G','A')], [('A','A'), ('A','G'), ('G','G'), ('A','A')]))" \
    --output code.py \
    http://www.pypedia.com/index.php

$ python code.py
{'haplotypes': [('AA', 0.49999999997393502, 0.3125), ('AG',
    2.606498430642265e-11, 0.1875), ('GA', 0.12500000002606498, 0.3125),
    ('GG', 0.37499999997393502, 0.1875)], 'R_sq': 0.59999999983318408,
    'Dprime': 0.99999999986098675}
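The same request can be made from Python without curl. `curl --data-urlencode` sends a form-encoded POST, and passing `data` to `urllib.request.Request` does the same. A minimal sketch; the helper names `fetch_code_request` and `save_code` are hypothetical, and `save_code` needs network access to the PyPedia server.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request, urlopen


def fetch_code_request(code, timestamp, url="http://www.pypedia.com/index.php"):
    """Build the same form-encoded POST that `curl --data-urlencode` sends."""
    body = urlencode({"b_timestamp": timestamp, "get_code": code}).encode("ascii")
    # Supplying data makes urllib send a POST; it adds the header
    # Content-Type: application/x-www-form-urlencoded when the request is sent.
    return Request(url, data=body)


def save_code(request, path="code.py"):
    """Send the request and write the returned Python source to `path`."""
    with urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())


req = fetch_code_request("print(1 + 1)", "20120501010101")
print(req.get_method())  # POST
print(parse_qs(req.data.decode("ascii")))
# {'b_timestamp': ['20120501010101'], 'get_code': ['print(1 + 1)']}
```

Calling `save_code(req)` corresponds to the `--output code.py` part of the curl command.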
Meta-webserver
• HTML injection is allowed
  and encouraged!
http://www.pypedia.com/index.php/Draw_face_user_Kantale



• Example: run HTML code
  posted on a gist:
    http://www.pypedia.com/index.php?run_code=
            import urllib2
            print urllib2.urlopen(
                'https://raw.github.com/gist/2689822/bbea0c43b278d7c4c04b3f7a23ba43f558fba98b/index_full.html').read()
• All content is under the Simplified BSD License
• Two namespaces:
  – Validated articles, e.g. Minor_allele_frequency
     • Safe: only admins can edit them
  – User articles, e.g. Minor_allele_frequency_user_John
     • Unsafe: edited by the individual user
  – High-quality articles from the User namespace are
    promoted to the Validated namespace
  – Validated articles cannot call User articles (duh…)
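The rule that Validated articles cannot call User articles can be checked statically. A minimal sketch, not how PyPedia actually enforces it: it assumes user articles follow the `<Article>_user_<Name>` naming seen in Minor_allele_frequency_user_John and Fixed_point_user_JohnDoe, and the helper names are hypothetical. It only detects direct calls by name; calls through aliases or `getattr` slip through.

```python
import ast
import re

# Assumed naming convention for User-namespace articles: <Article>_user_<Name>
USER_ARTICLE = re.compile(r"\w+_user_\w+")


def is_user_article(name):
    return USER_ARTICLE.fullmatch(name) is not None


def called_user_articles(source):
    """Names of User-namespace functions called directly in `source`."""
    calls = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if is_user_article(node.func.id):
                calls.add(node.func.id)
    return calls


validated_source = """
def Some_validated_article(genotypes):
    maf = Minor_allele_frequency(genotypes)
    return Minor_allele_frequency_user_John(genotypes)
"""
violations = called_user_articles(validated_source)  # would be rejected
```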
Some thoughts
    (in the embarrassing event that I have a few minutes left)

Code as wiki, program as wiki concept
• Multidimensional expansion
• As Mao said: Let a thousand flowers, er, scripts bloom (and
   some of them rot in hell)
• Minimize the distance:
D_sanity(SCRIPT_made_by_IT_guy, SCRIPT_useful_to_biologists)
• Encyclopedialize™ your scripts because open source isn’t
   enough!

Future steps:
• Attract editors, make communities!
• If it can be done in python, why not Ruby, …?
• Contact: admin@pypedia.com
• Source code license: GPL v3
• Content license: Simplified BSD license
• Join us on Google Groups:
  http://groups.google.com/group/pypedia
• Twitter: @PyPedia

• PyPedia’s source code:
    – MediaWiki extension:
       https://github.com/kantale/PyPedia_server
    – Python library:
       https://github.com/kantale/pypedia

• Acknowledgements:
    – Despoina Antonakaki
    – Kostas Tselios
    – Morris A. Swertz

• Posters:
    – BOSC: 11
    – ISMB: E12
