Biothings.api
https://github.com/SuLab/biothings.api
Generalizing MyGene and MyVariant
Motivation
• Isolate the common aspects of the MyGene and MyVariant codebases and make them available in a separate framework: biothings.api
• Makes it easier to develop additional biothings APIs (Disease, Drug/Chemical, GO, Species…), i.e. any data that can be represented as JSON documents aggregated on a single field
• Makes the current biothings (gene, variant) easier to maintain and develop.
System Overview
• The Tornado HTTP server consists of handlers that contain the code to run when a particular URL pattern is matched, e.g. /variant/ or /metadata.
• The biothings codebase essentially connects the appropriate Tornado HTTP RequestHandler for each request to the Elasticsearch query that executes that request.
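The URL-pattern-to-handler step can be illustrated with a standard-library sketch. It is not Tornado: the `route` function and the handler classes below are hypothetical stand-ins. In a real project the `(pattern, handler)` list would be passed to `tornado.web.Application`, which likewise matches each regular expression against the whole request path and passes captured groups to `get()` as positional arguments.

```python
import re


# Hypothetical handler classes standing in for Tornado RequestHandler subclasses.
class BiothingHandler:
    def get(self, bid):
        return f"annotation for {bid}"


class QueryHandler:
    def get(self):
        return "query results"


class MetaDataHandler:
    def get(self):
        return "metadata"


# (pattern, handler) pairs, in the same shape a tornado.web.Application receives.
URL_PATTERNS = [
    (r"/variant/([^/]+)/?", BiothingHandler),
    (r"/query/?", QueryHandler),
    (r"/metadata/?", MetaDataHandler),
]


def route(path):
    """Call the first handler whose pattern matches the whole path."""
    for pattern, handler_cls in URL_PATTERNS:
        match = re.fullmatch(pattern, path)
        if match:
            # Captured groups (e.g. the variant ID) become positional arguments.
            return handler_cls().get(*match.groups())
    raise LookupError(f"no handler for {path}")
```

Routing is purely by pattern order, so more specific patterns should come first.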
Biothings – HTTP Handling
• tornado.web.RequestHandler: the base Tornado class for HTTP request handling. Important class methods: get/post, get_arguments, write
• biothings.www.helper.BaseHandler: contains methods common to all biothings RequestHandlers. Important class methods: get_query_params, return_json
• biothings.www.api.handlers.QueryHandler: implements the biothings query endpoint. Important class methods: get, post, _examine_kwargs
• biothings.www.api.handlers.BiothingHandler: implements the biothings annotation endpoint. Important class methods: get, post, _examine_kwargs
• biothings.www.api.handlers.MetaDataHandler: implements the metadata endpoint
• biothings.www.api.handlers.StatusHandler: implements a status endpoint for AWS Elastic Load Balancing (ELB) health checks
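The layering of these classes can be sketched without a running server. In this sketch `FakeRequestHandler` is a hypothetical stand-in for `tornado.web.RequestHandler` (it fakes only `get_arguments` and `write`), and the bodies of `get_query_params`, `return_json`, and `_examine_kwargs` are illustrative guesses at their roles, not the biothings source.

```python
import json


class FakeRequestHandler:
    """Hypothetical stand-in for tornado.web.RequestHandler (no HTTP involved)."""

    def __init__(self, arguments):
        # As in Tornado, each argument name maps to a list of values.
        self._arguments = arguments
        self.output = []

    def get_arguments(self, name):
        return self._arguments.get(name, [])

    def write(self, chunk):
        self.output.append(chunk)


class BaseHandler(FakeRequestHandler):
    """Methods shared by every biothings handler (illustrative bodies)."""

    def get_query_params(self):
        # Keep the last value supplied for each argument name.
        return {name: self.get_arguments(name)[-1]
                for name in self._arguments if self.get_arguments(name)}

    def return_json(self, data):
        self.write(json.dumps(data))


class BiothingHandler(BaseHandler):
    def _examine_kwargs(self, kwargs):
        # Hypothetical normalization: turn "fields=a,b" into ["a", "b"].
        if "fields" in kwargs:
            kwargs["fields"] = kwargs["fields"].split(",")
        return kwargs

    def get(self, bid):
        kwargs = self._examine_kwargs(self.get_query_params())
        # In biothings, this is where ESQuery.get_biothing(bid, **kwargs) would be called.
        self.return_json({"_id": bid, "fields": kwargs.get("fields")})
```

Keeping parameter parsing and JSON output in `BaseHandler` is what lets each endpoint handler stay small.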
Biothings – HTTP Handling
• biothings.www.api.handlers.BiothingHandler:
– GET request (e.g. /variant/chr6:g.152708291G>A)
– POST request (e.g. /variant/)
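The two annotation requests differ mainly in where the IDs come from: a GET carries one ID in the URL path, while a POST carries a list in the request body. The sketch below assumes a form-encoded body with a comma-separated `ids` parameter (the convention MyVariant.info uses for batch POSTs, stated here as an assumption) and only reports which ESQuery method the request would map to; `dispatch_annotation` is a hypothetical helper.

```python
from urllib.parse import parse_qs


def dispatch_annotation(method, path_id=None, body=""):
    """Return the (ESQuery method name, argument) an annotation request maps to."""
    if method == "GET":
        # GET /variant/<id>: a single ID taken from the URL path.
        return ("get_biothing", path_id)
    if method == "POST":
        # POST /variant/ with a form body such as "ids=id1,id2" (assumed format).
        form = parse_qs(body)
        raw = form.get("ids", [""])[0]
        bid_list = [bid.strip() for bid in raw.split(",") if bid.strip()]
        return ("mget_biothings", bid_list)
    raise ValueError(f"unsupported method {method}")
```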
Biothings – HTTP Handling
• biothings.www.api.handlers.QueryHandler:
– GET request (e.g. /query?q=_exists_:dbsnp)
– POST request (e.g. /query/)
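On the Elasticsearch side, a GET such as /query?q=_exists_:dbsnp maps naturally onto a `query_string` query, since `_exists_:field` is Lucene query-string syntax. The sketch builds request bodies only (no cluster needed). The helper names, the default `size`, and the use of a per-term `multi_match` over a list of "scopes" fields for POST batches are assumptions for illustration, not the biothings implementation.

```python
def build_query_body(q, size=10, fields=None):
    """Request body for a GET /query?q=... search (hypothetical helper)."""
    body = {"query": {"query_string": {"query": q}}, "size": size}
    if fields:
        # Source filtering: return only the requested fields.
        body["_source"] = fields
    return body


def build_batch_bodies(terms, scopes):
    """One request body per term for a POST /query/ batch (hypothetical helper).

    Each term is matched against the given fields ("scopes") with multi_match.
    """
    return [
        {"query": {"multi_match": {"query": term, "fields": scopes}}}
        for term in terms
    ]
```

The batch bodies could then be sent together with Elasticsearch's multi-search API, so a POST costs one round trip rather than one per term.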
Biothings – Elasticsearch query
• biothings.www.api.es.ESQuery: contains the Python code for constructing the Elasticsearch query and formatting the resulting data
– query(q, **kwargs): the Elasticsearch query to run with data obtained from a GET or POST to the /query/ endpoint.
– get_biothing(bid, **kwargs): the Elasticsearch query to run with data obtained from a GET to the /annotation/ endpoint.
– mget_biothings(bid_list, **kwargs): the Elasticsearch query to run with data obtained from a POST to the /annotation/ endpoint.
– _cleaned_res(res): formats the return object for get_biothing and mget_biothings.
– _cleaned_res2(res): formats the return object for query.
– _get_biothingdoc(hit): formats a single biothing object from any Elasticsearch query. Called by _cleaned_res and _cleaned_res2.
– _modify_biothingdoc(doc): modifies a biothing doc. Called in _get_biothingdoc. Currently empty; intended to be overridden in subclasses.
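The formatting chain described above (_cleaned_res / _cleaned_res2 → _get_biothingdoc → _modify_biothingdoc) can be sketched against plain dictionaries shaped like Elasticsearch get, mget, and search responses. The method names follow the slide; the bodies, the `notfound` marker for missing IDs, and the `MyVariantESQuery` subclass are illustrative assumptions, and the actual Elasticsearch calls are omitted.

```python
class ESQuery:
    """Sketch of the result-formatting chain only; no Elasticsearch calls."""

    def _modify_biothingdoc(self, doc):
        # Empty hook: project subclasses override this to adjust each document.
        return doc

    def _get_biothingdoc(self, hit):
        # Merge the document ID into the stored source, then apply the hook.
        doc = dict(hit.get("_source", {}))
        doc["_id"] = hit["_id"]
        return self._modify_biothingdoc(doc)

    def _cleaned_res(self, res):
        # get_biothing returns one document; mget_biothings returns {"docs": [...]}.
        if "docs" in res:
            return [
                self._get_biothingdoc(d) if d.get("found")
                else {"query": d["_id"], "notfound": True}
                for d in res["docs"]
            ]
        return self._get_biothingdoc(res) if res.get("found") else None

    def _cleaned_res2(self, res):
        # query returns a search response with the documents under hits.hits.
        hits = res["hits"]
        return {
            "total": hits["total"],
            "hits": [self._get_biothingdoc(h) for h in hits["hits"]],
        }


class MyVariantESQuery(ESQuery):
    """Hypothetical project subclass that fills in the hook."""

    def _modify_biothingdoc(self, doc):
        doc["source"] = "myvariant"
        return doc
```

Because every code path funnels through `_get_biothingdoc`, overriding the single hook changes the output of all three endpoints at once.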
Biothings - Settings
• Problem: So far we have left out how to refer to things that MUST be project specific (e.g., the name of the Elasticsearch index to search, the document type, etc.). How do we do this?
• Solution: We created a settings module in biothings that all code within biothings refers to. That module looks for an environment variable called BIOTHING_SETTINGS containing the name of an importable module that sets the project-specific variables.
– export BIOTHING_SETTINGS='biothings.config'
• Similar to Django, which locates its settings module through the DJANGO_SETTINGS_MODULE environment variable.
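A Django-style lazy settings object is one way to implement this. The sketch below is an assumption about the shape of such a module, not the biothings source: it reads BIOTHING_SETTINGS on first attribute access, imports the named module with `importlib.import_module`, and forwards attribute lookups to it. The setting name `ES_INDEX_NAME` used in the usage note is hypothetical.

```python
import importlib
import os


class BiothingSettings:
    """Lazy settings proxy: values come from the module named in an env var."""

    def __init__(self, env_var="BIOTHING_SETTINGS"):
        self._env_var = env_var
        self._module = None

    def __getattr__(self, name):
        # Only called for attributes not found normally, e.g. ES_INDEX_NAME.
        if self._module is None:
            module_name = os.environ.get(self._env_var)
            if not module_name:
                raise RuntimeError(f"{self._env_var} is not set")
            self._module = importlib.import_module(module_name)
        return getattr(self._module, name)


# Shared instance that the rest of the code base would import.
settings = BiothingSettings()
```

Deferring the import until first use means modules can `from ... import settings` at import time, before the environment variable is consulted. With `BIOTHING_SETTINGS` pointing at a project config that defines `ES_INDEX_NAME`, any code can then read `settings.ES_INDEX_NAME`.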
[Slide 9 (image): Biothings – Settings]
Biothings – Project template
• At this point, we have the tools necessary to easily create and subclass the 4 types of biothings handlers (BiothingHandler, QueryHandler, MetaDataHandler, StatusHandler) and the Elasticsearch query class (ESQuery).
• We could stop here and still have a useful tool, but we wanted to make creating a new project even easier (this also enforces a uniform project structure across all biothings APIs).
• To do this, we have a project template folder containing the project directory structure and some skeleton code:
– config.py
– URL patterns to Handlers connection
– Handlers to ESQuery connection
Biothings - Project template
• To create the actual project directory from the template, we wrote a small script: start-project.py
– Usage: python start-project.py <path-to-project-directory> <biothing-object-name>
– Example: python start-project.py ~ variant
• Every folder and file in the template directory is created in the project directory. The contents of each file are passed through Python's string.Template substitution before being written to the project directory.
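The copy-and-substitute step can be sketched with `os.walk` and `string.Template`. This is a hedged illustration rather than the actual start-project.py: the function name `start_project` and the placeholder `$biothing_object` are hypothetical. `safe_substitute` is used so that any other `$name` text in a template file is left untouched instead of raising `KeyError`.

```python
import os
from string import Template


def start_project(template_dir, project_dir, biothing_name):
    """Recreate template_dir under project_dir, rendering each file's contents."""
    # os.walk is top-down, so each directory is created before its files.
    for root, _dirs, files in os.walk(template_dir):
        rel = os.path.relpath(root, template_dir)
        target_root = os.path.normpath(os.path.join(project_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for filename in files:
            with open(os.path.join(root, filename), encoding="utf-8") as src:
                # Replace the hypothetical $biothing_object placeholder.
                rendered = Template(src.read()).safe_substitute(
                    biothing_object=biothing_name)
            with open(os.path.join(target_root, filename), "w",
                      encoding="utf-8") as dst:
                dst.write(rendered)
```

Note that `string.Template` still turns `$$` into `$` even under `safe_substitute`, so template files that need a literal `$$` must write `$$$$`.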
[Slide 12 (image): Biothings – Project template, www.api.handlers]
[Slide 13 (image): Biothings – MyVariant Project, www.api.handlers]
[Slide 14 (image): Biothings – MyVariant Project, www.api.handlers, Part 1]
[Slide 15 (image): Biothings – MyVariant Project, www.api.handlers, Part 2]
[Slide 16 (image): Biothings – MyVariant Project, www.api.es, Part 1]
[Slide 17 (image): Biothings – MyVariant Project, www.api.es, Part 2]
Recreating MyVariant.info using biothings.api
• Recreated the current MyVariant.info service using the biothings.api framework
– Very little extra code required (~100 lines)
– Less than a day to create the web front end from scratch
– https://github.com/cyrus0824/myvariant.info_new
• It seems disingenuous to gauge the utility of a tool by recreating a codebase when the tool was itself extracted from that codebase => we should try implementing other APIs, especially MyGene.info (which has more varied, gene-specific query options), and modify biothings as needed.
Future work
• Integrate data load and data index functions into biothings
• Documentation! Projects like this need very good documentation to be of any use to an API developer (on the level of Tornado’s excellent documentation: http://www.tornadoweb.org/en/stable/web.html)
• Auto-generate clients (Python client, R client)
• Auto-generate an Ansible playbook to create cluster hardware on AWS
• One-click API…
