Web Services for Bioinformatics
Chris Dwan
The BioTeam
http://bioteam.net
Totally Unscientific Impression
The vast majority of CPU cycles (clusters, SMP
machines, and grids) in the life sciences either sit
idle, or are dominated by a very few power users.
• Because:
– Most users aren’t aware of what they have
– Or, they don’t know how to use it
– Or, they’ve tried to use it, and it’s difficult
– Or, it doesn’t read their Excel data
– Or, they tried to use it last year, and it gave them incorrect
results
Bioinformatics in the XXI Century
Lincoln Stein’s “Bioinformatics Nation”
Convergence
• Web interfaces, currently human-
friendly, will become machine-friendly
• Data formats and interfaces will begin
to standardize
• Heterogeneous platforms,
applications, and systems will begin to
interoperate
• Machines will begin to communicate
with each other in profound and
powerful new ways.
Computing For Science
• Many user models
• Many applications, mostly open source,
some quite proprietary
• Cooperative, collaborative, yet competitive
• Compute and data intensive
• Rapid rate of growth / change
• There is no single solution.
• Many skill levels: from physicist to MD
No single solution
Core Problems
• Distribution
Data and applications are created and controlled by
autonomous groups all over the world
• Biology is difficult and messy:
Large collections of data, many data types and tools
developed in a massively distributed environment.
• Research code is different from business code
Rapid development, flexibility, “interactive” development
Web Services
The World Wide Web is increasingly used for application-to-application communication. The programmatic interfaces exposed this way are called Web Services.
• WSDL (advertisement)
– Machine-readable
– An “interface contract” defining what services are available via a particular server
• SOAP (access)
– Independent of platform, language, and transport protocol
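Because WSDL is machine-readable, a client can discover a server's operations without any human-written documentation. A minimal Python sketch using only the standard library; the tiny WSDL 1.1 document is hypothetical (real files also declare types, messages, bindings and the endpoint address), and `get_result` is an invented operation name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WSDL 1.1 document. Real WSDL files also
# declare types, messages, bindings and the service endpoint address.
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    name="ExampleService">
  <portType name="ExamplePortType">
    <operation name="blastall_simple"/>
    <operation name="get_result"/>
  </portType>
</definitions>"""

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"


def list_operations(wsdl_text):
    """Return the operation names advertised in every portType."""
    root = ET.fromstring(wsdl_text)
    ops = []
    # ElementTree spells namespaced tags as "{namespace-uri}localname".
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for op in port_type.findall(f"{{{WSDL_NS}}}operation"):
            ops.append(op.get("name"))
    return ops


if __name__ == "__main__":
    print(list_operations(WSDL))  # ['blastall_simple', 'get_result']
```

The same lookup works on any WSDL 1.1 file, which is what lets graphical workflow tools build a palette of callable services automatically.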
Why Web Services?
• Why not the alternatives?
– CORBA, RMI, bytecodes, relocatable libraries, the Grid, opportunistic computing, metacomputing …
• Selfish benefit to both publishers and users
– Easy publishing (no user interface to build)
– Choice of client (from the command line to integrated workflow environments)
– Minimal buy-in
Web Services Adoption?
• Languages
– Perl, C, C++, Objective-C, Ruby, Java, AppleScript, Python, …
• Open Source Graphical Clients
– Taverna
• Commercial SOAP / WSDL Clients
– Inforsense, Pipeline Pilot, TurboWorx, VIBE, OS X, Mathematica, Spotfire, …
Bioinformatic Web Services
• EBI: Soaplab, EMBOSS, Ensembl, …
• KEGG: pathways
• GO: Gene Ontology
• BioMOBY: objects for modeling data
• NCBI: Netblast
• iNquiry: clustered tools
As more organizations adopt common standards,
those standards become more valuable
The BioTeam
• Consulting company:
– Scientists,
Developers, IT
Professionals
• Expertise:
– Scientific, parallel,
distributed computing
– Infrastructure
– Optimization
BioTeam’s iNquiry
• iNquiry is two things:
– “Instant” cluster deployment kit
• Scheduler, Web Browser, integrated configuration
– Web portal for Bioinformatics
• 170+ applications pre-installed
• HTML interface
• SOAP / Web Services interface, integrated with Cluster tools
• OS X / Apple, HP, Linux, SGI, Orion Multisystems
• 190+ installations worldwide
– 170+ are Apple
– 2 -> 240 nodes
iNquiry (2004)
• All interfaces defined by “PISE” XML
documents
– /usr/local/lib/Pise/5.a/Xml
– Other files created by scripts
[Diagram: PISE XML, PISE scripts, Perl modules, CGI scripts, HTML, cluster]
iNquiry Interface
blastall.xml
<pise>
<head>
<title>BLASTALL</title>
<version>2.2.1</version>
<description>with gaps</description>
<authors>Altschul, Madden, Schaeffer, Zhang, Miller, Lipman</authors>
<category>NCBI</category>
<reference>Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaeffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25:3389-3402.</reference>
<doclink>http://www.ncbi.nih.gov/Education/BLASTinfo/information3.html</doclink>
</head>
<!-- ... parameter definitions omitted ... -->
</pise>
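Since every interface is generated from this one XML description, a generator only has to read fields like these to label an HTML form or a service listing. The iNquiry pipeline itself used PISE scripts and Perl modules; the Python sketch below only illustrates the idea, and `describe_program` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <head> section of blastall.xml, wrapped in <pise> so it parses.
PISE_XML = """<pise>
  <head>
    <title>BLASTALL</title>
    <version>2.2.1</version>
    <description>with gaps</description>
    <category>NCBI</category>
  </head>
</pise>"""


def describe_program(pise_text):
    """Collect the header fields an interface generator would display."""
    head = ET.fromstring(pise_text).find("head")
    fields = ("title", "version", "description", "category")
    # findtext returns the default when an optional field is missing.
    return {name: head.findtext(name, default="") for name in fields}


info = describe_program(PISE_XML)
print(f"{info['title']} {info['version']} ({info['description']}) [{info['category']}]")
# BLASTALL 2.2.1 (with gaps) [NCBI]
```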
iNquiry Web Services
• Released, summer 2004
• Actually in use at Novartis, BMS, VBI
• Called from Perl, Java, Taverna, Inforsense, Pipeline Pilot, VIBE, Apple Automator, AppleScript, …
[Diagram: the 2004 stack (PISE XML, PISE scripts, Perl modules, CGI scripts, HTML, cluster) extended with a SOAP interface and WSDL]
A Vision for Web Services-Based Computing
[Diagram: layered stack, top to bottom: Scientific Questions (sequence analysis, genomic profiling, computational biology/chemistry); Workflow Tools/Scripts (Pipeline Pilot, Perl); Web Services (iNquiry/PISE); Job Distribution/Management (LSF / SGE); Clustered Computing. Side labels: Expert Users, System Administrators, Novice Users.]
What Web Services Don’t Do
• Traditional scheduler tasks:
– Job Control
– Queuing
– Scheduling
– Failure handling
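Because the web service layer does no queuing or scheduling of its own, a client that fans out many requests has to bound its own concurrency or it can swamp the server. A minimal Python sketch; `call_service` is a hypothetical stand-in for one remote call, and the limit of 4 in-flight requests is an arbitrary assumption.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumed politeness limit; tune per service

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0


def call_service(query_id):
    """Hypothetical stand-in for one remote web service call."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    try:
        time.sleep(0.01)  # simulate network and compute latency
        return f"result for {query_id}"
    finally:
        with _lock:
            _in_flight -= 1


def run_batch(query_ids, limit=MAX_IN_FLIGHT):
    # The pool size acts as a client-side queue: at most `limit`
    # requests are outstanding at once; the rest wait their turn.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(call_service, query_ids))


results = run_batch([f"seq{i}" for i in range(20)])
print(len(results), peak_in_flight <= MAX_IN_FLIGHT)  # 20 True
```

Real job control (priorities, fair share, restarts) still belongs to a scheduler such as LSF or SGE; this only keeps one client from overloading the service.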
What Web Services Do Not Do
• Semantics
– Service ‘X’ must still be
interpreted and used in
some context.
– No OMG-like object
model imposed by
default!
– In bioinformatics, other
related projects
(BioMOBY, etc.) attempt
to deal with semantic
issues.
What Web Services Do
• Standard interface to arbitrary resources
• Allow someone else to write the interface
• Allow someone else to build the infrastructure
Completely split the interface from the service
provision
Divide and conquer
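Splitting the interface from the service provision means client code depends only on the contract, so the provider behind it can change freely. A minimal Python sketch of that separation using `typing.Protocol`; `SequenceSearch`, `LocalSearch`, `RemoteSearch` and the endpoint URL are all hypothetical, and no network call is made.

```python
from typing import Protocol


class SequenceSearch(Protocol):
    """The interface contract: all the client is allowed to rely on."""

    def search(self, query: str, database: str) -> str: ...


class LocalSearch:
    """Hypothetical provider that would run the search on this machine."""

    def search(self, query: str, database: str) -> str:
        return f"local hits for {query} in {database}"


class RemoteSearch:
    """Hypothetical provider that would forward the call to a web service."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def search(self, query: str, database: str) -> str:
        return f"hits for {query} in {database} from {self.endpoint}"


def annotate(searcher: SequenceSearch, query: str) -> str:
    # Client code is written once against the interface, not the provider.
    return searcher.search(query, "yeast.nt")


for provider in (LocalSearch(), RemoteSearch("http://example.org/soap")):
    print(annotate(provider, "seq1"))
```

Swapping providers requires no change to `annotate`, which is the divide-and-conquer benefit the slide describes.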
Perl Web Service Client
# Assumes `use SOAP::Lite;`, that $server is a SOAP::Lite proxy for the
# iNquiry service, and that $ticket and $query_id are already set.
my $res = $server->blastall_simple(
    SOAP::Data->name("TICKET")->value($ticket),
    SOAP::Data->name("BLOCKING")->value(0),    # 0: non-blocking call (per the parameter name)
    SOAP::Data->name("blastall")->value("blastn"),
    SOAP::Data->name("query")->value($query_id),
    SOAP::Data->name("protein_db")->value("yeast.nt"),
    SOAP::Data->name("nucleotid_db")->value("yeast.nt"),
    SOAP::Data->name("tmp_outfile")->value($query_id . ".blastx")
);
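Under a call like this, SOAP::Lite serializes each `SOAP::Data` name/value pair, roughly, as a child element of the operation inside a SOAP envelope. The Python sketch below builds a comparable SOAP 1.1 envelope by hand so the wire format is visible. It sends nothing, the ticket and query values are made up, and the real service very likely also namespace-qualifies the operation element and adds type attributes, which are omitted here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(operation, params):
    """Build a SOAP 1.1 envelope whose body holds one element per named parameter."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, operation)  # unqualified: service namespace not shown in the talk
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


query_id = "seq1"  # hypothetical query identifier
request = build_request("blastall_simple", {
    "TICKET": "abc123",  # hypothetical ticket value
    "BLOCKING": 0,
    "blastall": "blastn",
    "query": query_id,
    "protein_db": "yeast.nt",
    "nucleotid_db": "yeast.nt",
    "tmp_outfile": query_id + ".blastx",
})
print(request)
```

To actually send it, a client would POST this text over HTTP with `Content-Type: text/xml` and a `SOAPAction` header, which is exactly the plumbing SOAP::Lite hides.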
Example Taverna Workflow: Running BLAST
Inforsense Workflow - Microarray Normalization
Pipeline Pilot Web Service Plugin
OS X Tiger - Automator
Re-publication
• Most high level tools
can publish their
protocols as web
services
• All can also call
published web services
• It’s turtles all the way
down.
This can lead to difficulties
Sneak Preview
Excel Web Services Plugin
Stumbling Blocks
• Pass by reference (URL)
– SOAP data bloat
– MIME encode / decode
• System security
– Inadvertent DoS attacks are
easy
• Blocking / Timeouts
– Reattach
• Complex Data Types
• Service Relocation
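When a job outlives the client's timeout, the client should not resubmit; it should keep the job identifier and reattach later. A minimal polling sketch, assuming a service with a non-blocking submit (in the spirit of the `BLOCKING` parameter in the Perl client) plus a status call; `FakeJobService`, `submit`, `status` and `wait_for` are hypothetical names, not iNquiry's API.

```python
import time


class FakeJobService:
    """Hypothetical in-memory stand-in for a service with non-blocking submission."""

    def __init__(self, run_seconds):
        self.run_seconds = run_seconds
        self.jobs = {}

    def submit(self, query):
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = (time.monotonic(), query)
        return job_id  # returns at once instead of waiting for the result

    def status(self, job_id):
        started, query = self.jobs[job_id]
        if time.monotonic() - started < self.run_seconds:
            return "RUNNING", None
        return "DONE", f"hits for {query}"


def wait_for(service, job_id, timeout, poll_interval=0.01):
    """Poll until the job finishes or `timeout` seconds pass; never resubmits."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state, result = service.status(job_id)
        if state == "DONE":
            return result
        time.sleep(poll_interval)
    return None  # caller keeps job_id and can reattach later


service = FakeJobService(run_seconds=0.05)
job_id = service.submit("seq1")
first = wait_for(service, job_id, timeout=0.01)  # client gives up early
later = wait_for(service, job_id, timeout=1.0)   # reattach with the same job_id
print(first, "|", later)
```

Keeping the job identifier, rather than the open connection, as the handle is what makes reattachment possible after a timeout or a client restart.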
Plan For Failure
• Miron Livny (U. Wisconsin-Madison)
– Condor project: 20+ years of distributed
computing
– Management (pessimistic) rather than
engineering (optimistic) assumptions.
• Scheduling is complete when the job finishes, not
when it starts.
• Double check all results
• Assume each element will fail.
• Double-schedule the critical path
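These pessimistic assumptions translate fairly directly into client code. A minimal Python sketch, not Condor's implementation: each step is retried on failure, and the critical-path step is run twice with the copies compared as a double check. `flaky_gc_content` is a made-up task that fails once to simulate a node failure, and running the replicas as local threads stands in for scheduling them on separate resources.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(task, arg, attempts=3):
    """Assume each element will fail: retry up to `attempts` times before giving up."""
    last_error = None
    for _ in range(attempts):
        try:
            return task(arg)
        except Exception as err:  # a real client would catch narrower errors
            last_error = err
    raise RuntimeError(f"{attempts} attempts failed") from last_error


def double_scheduled(task, arg):
    """Run a critical-path step twice in parallel; accept it only if both copies agree."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first, second = pool.map(lambda _: run_with_retries(task, arg), range(2))
    if first != second:
        raise ValueError("replicas disagree; result not trusted")
    return first


calls = {"n": 0}
lock = threading.Lock()


def flaky_gc_content(seq):
    """Hypothetical task: fails on its very first call, then computes GC fraction."""
    with lock:
        calls["n"] += 1
        first_call = calls["n"] == 1
    if first_call:
        raise ConnectionError("simulated node failure")
    return round(sum(base in "GC" for base in seq) / len(seq), 3)


print(double_scheduled(flaky_gc_content, "ATGCGC"))  # 0.667
```

The step counts as done only when a checked result comes back, matching "scheduling is complete when the job finishes, not when it starts."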
Users (Research) are the Point
• Maximize user freedom
– Let users help each other:
• shared repository of workflows, codes, etc.
• mailing lists, chat rooms
– If at all possible, provide source code
– The key problems are social / managerial
• Technical issues are simple by comparison.
• Include all possible resources
– Never try to get in the way of your users
Assume that users know what they’re doing
Take Home
• Biology is difficult and messy
• IT and HPC are difficult and
messy
• Federate, don’t integrate (divide
and conquer)
• Web Services (WSDL and
SOAP) are the standard of
choice.
• If your resources are sitting idle,
there is a problem, and it’s not
the users.
Thank You
• Early adopters (iNquiry web services):
– Nathan Siemers (Bristol-Myers Squibb)
– John Davies, Jeremy Jenkins (Novartis IBR)
– Dustin Machai (VBI)
– Tim Kunau*, Michael Heuer (CCGB, University of Minnesota)
• Collaborators & Partners:
– Tom Oinn (Taverna), Scitegic, Inforsense
• The BioTeam
– Michael Athanas, Chris Dagdigian, Stan Gloss, Bill Van Etten,
Jiesheng Zhang
• Bio-IT World / Life Sciences Expo
2006 bio it web services