Accessing ENCODE project data
using a REST API and JSON objects.
Cricket A Sloan¹, Esther T Chan¹, Venkat S Malladi¹, Jean M Davidson¹, Eurie L Hong¹, J Seth Strattan¹, Laurence D Rowe¹, Ben C Hitz¹, Nikhil R Podduturi¹, Forrest Tanaka¹, Brian T Lee², Marcus Ho¹, Stuart Miyasato¹, Matt Simison¹, W James Kent², J Michael Cherry¹
¹ Stanford University School of Medicine, Department of Genetics, Stanford, CA
² University of California at Santa Cruz, Center for Biomolecular Science and Engineering, Santa Cruz, CA
The Encyclopedia of DNA Elements (ENCODE) project has been producing data for over eight years to investigate DNA- and RNA-binding proteins, chromatin
structure, transcriptional activity and DNA methylation in a variety of human and mouse tissues and cell lines. As the complexity and diversity of the data grow,
the tools required to organize, search and access the data in meaningful ways need to become more sophisticated. The ENCODE Data Coordination Center (DCC) has
incorporated a representational state transfer application programming interface (REST API) with JSON (JavaScript Object Notation) objects to facilitate
access to ENCODE experimental metadata through a web portal. Metadata can be accessed and data can be searched at http://www.encodeproject.org/ using
an HTTP request from a script or the curl command. We further expand this access by allowing the metadata to be filtered with search
URLs. This system allows external researchers to write their own interfaces to access, analyze and visualize the ENCODE data. It also facilitates the integration of
the ENCODE data with other similar large-scale data sets such as Roadmap Epigenomics and modENCODE. Here we present our JSON schemas, examples of the
REST API and use cases for the search functions. Our goal is for the genomics community to use the released ENCODE data available through these methods for
data mining and integration.
Data from the ENCODE project can be accessed via the ENCODE portal (http://www.encodeproject.org), and documentation for the REST API is available at
https://www.encodeproject.org/help/rest-api.
@ENCODE-DCC
encode-help@lists.stanford.edu
ENCODE DCC
https://www.encodeproject.org
Metadata returned in JSON format
Sample code
https://github.com/ENCODE-DCC/submission_sample_scripts
ENCODE REST API Documentation
https://www.encodeproject.org/help/rest-api
Each search or page is a JSON object
curl -H "Accept: application/json" -X GET \
  "https://www.encodeproject.org/search/?type=experiment&assay_term_name=RNA-seq&organ_slims=lung&replicates.library.biosample.life_stage=fetal"
Use search URLs with any HTTP GET access
A search produces a JSON object with an “@graph” field that is a list of minimal identifying information about each result in the search.
A summary page includes all of the details and sub-objects in its JSON object.
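The same request can be made from a script. The following is a minimal Python sketch using only the standard library, not the DCC's own client code. It sends the search URL from the curl example with an Accept header and walks the “@graph” list. The helper names `fetch_json`, `graph_ids` and `main` are hypothetical, and the assumption that each “@graph” entry carries an “@id” key comes from general JSON-LD usage rather than from this poster.

```python
import json
import urllib.request

# Search URL taken from the curl example in the text.
SEARCH_URL = (
    "https://www.encodeproject.org/search/"
    "?type=experiment&assay_term_name=RNA-seq&organ_slims=lung"
    "&replicates.library.biosample.life_stage=fetal"
)


def fetch_json(url):
    """GET a portal URL, asking for JSON through the Accept header."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def graph_ids(search_result):
    """Return the "@id" of each entry in a search result's "@graph" list.

    Assumption: every entry has an "@id" key; entries without one yield None.
    """
    return [item.get("@id") for item in search_result.get("@graph", [])]


def main():
    """Run the search over the network and print one identifier per result."""
    for object_id in graph_ids(fetch_json(SEARCH_URL)):
        print(object_id)
```

Calling `main()` performs the network request; `graph_ids` can also be applied to any search result that has already been saved and parsed.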
More search examples
Every object that matches the string “CTCF”:
https://www.encodeproject.org/search/?searchTerm=CTCF&format=json&frame=object
All the fastq file objects from a particular experiment, ENCSR000AKS (with reference objects embedded):
https://www.encodeproject.org/search/?type=file&dataset=/experiments/ENCSR000AKS/&file_format=fastq&format=json&frame=embedded&limit=all
All biosamples (abbreviated metadata):
https://www.encodeproject.org/search/?type=biosample&limit=all&format=json
All biosamples (full metadata with object references):
https://www.encodeproject.org/search/?type=biosample&frame=object&limit=all&format=json
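Search URLs like these can also be assembled from a dictionary of filters rather than typed by hand. The sketch below is one possible way to do that in Python; `build_search_url` is a hypothetical helper, not part of any ENCODE library, and only the parameter names already shown in the examples (`type`, `searchTerm`, `dataset`, `file_format`, `frame`, `limit`, `format`) are used.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.encodeproject.org"


def build_search_url(params, base_url=BASE_URL):
    """Build a portal search URL from a dict of query parameters.

    safe="/" keeps slashes in values such as /experiments/ENCSR000AKS/
    unescaped, matching the URLs shown in the examples.
    """
    return f"{base_url}/search/?{urlencode(params, safe='/')}"


# Every object that matches the string "CTCF".
ctcf_url = build_search_url(
    {"searchTerm": "CTCF", "format": "json", "frame": "object"}
)

# All fastq files from experiment ENCSR000AKS, with referenced objects embedded.
fastq_url = build_search_url(
    {
        "type": "file",
        "dataset": "/experiments/ENCSR000AKS/",
        "file_format": "fastq",
        "format": "json",
        "frame": "embedded",
        "limit": "all",
    }
)
```

Keeping the filters in a dictionary makes it easy to switch `frame` between `object` and `embedded`, or to drop `limit=all` while testing a query.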
Schema access
Schema profiles can be found on the site at https://www.encodeproject.org/profiles/*.json, where * is replaced by the name of the object of interest. A table of the most
relevant objects can be found below. A complete listing of all the current schemas can be found in our GitHub repository,
https://github.com/ENCODE-DCC/encoded/blob/master/src/encoded/schemas/.
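As a small illustration of the profile URL pattern, the sketch below substitutes an object name for the `*` and lists the field names a schema declares. `profile_url` and `property_names` are hypothetical helpers, and the assumption that each profile is a JSON Schema document with a top-level `properties` key is ours; it is not stated on this poster.

```python
BASE_URL = "https://www.encodeproject.org"


def profile_url(object_name, base_url=BASE_URL):
    """Return the schema profile URL for one object type, e.g. "biosample"."""
    return f"{base_url}/profiles/{object_name}.json"


def property_names(schema):
    """List the field names declared in an already parsed schema.

    Assumption: the schema follows JSON Schema and keeps its fields
    under a top-level "properties" key.
    """
    return sorted(schema.get("properties", {}))


# A made-up schema fragment, used only to show the shape of the call.
example_schema = {"properties": {"accession": {}, "donor": {}, "organism": {}}}
example_fields = property_names(example_schema)
```

The result of `profile_url("biosample")` can be fetched with any HTTP GET client and parsed as JSON before being passed to `property_names`.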
Object relationships in the metadata model
Many objects are needed to describe the varied assays, biosamples, and data processing steps that
are involved in the ENCODE project. We are incorporating JSON-LD to link and embed these
relationships. Keeping separate objects for donors, biosamples, antibodies, etc. lets us model exactly
where an object is shared (for example, one donor referenced by several biosamples).
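To make the sharing relationship concrete, the sketch below groups biosample records by the donor they point to. It is an illustration under stated assumptions, not the portal's data model: the `donor` and `accession` field names, the “@id” key on embedded objects, and all identifiers in the sample data are hypothetical. It assumes a linked object may appear either as a reference string (as with frame=object) or as an embedded object (as with frame=embedded).

```python
from collections import defaultdict


def linked_id(value):
    """Return the identifier of a linked object.

    A link may be a plain reference string or an embedded dict that
    carries its identifier under "@id" (assumed JSON-LD convention).
    """
    if isinstance(value, dict):
        return value["@id"]
    return value


def biosamples_by_donor(biosamples):
    """Map each donor identifier to the accessions of the biosamples that share it."""
    groups = defaultdict(list)
    for biosample in biosamples:
        groups[linked_id(biosample["donor"])].append(biosample["accession"])
    return dict(groups)


# Made-up records: two biosamples share one donor, a third has its own.
example_biosamples = [
    {"accession": "BIOSAMPLE_A", "donor": "/donors/DONOR_1/"},
    {"accession": "BIOSAMPLE_B", "donor": {"@id": "/donors/DONOR_1/"}},
    {"accession": "BIOSAMPLE_C", "donor": "/donors/DONOR_2/"},
]
shared = biosamples_by_donor(example_biosamples)
```

Because the donor is a separate object, both forms of the link resolve to the same identifier, and the shared donor shows up as a single key with two biosamples.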
Batch download of files
File metadata, including the href access information for the file itself, is found by querying the ENCODE portal for the file JSON object. If the file
accession is ENCFF002CTW, then the metadata object can be found at https://www.encodeproject.org/files/ENCFF002CTW/. The href field in that object,
/files/ENCFF002CTW/@@download/ENCFF002CTW.narrowPeak.gz, is appended to the site URL to download the file itself:
https://www.encodeproject.org/files/ENCFF002CTW/@@download/ENCFF002CTW.narrowPeak.gz
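The two steps described above, reading the href from the file's JSON object and appending it to the site URL, might be scripted as follows. This is a sketch using the Python standard library; `download_url`, `fetch_file_metadata` and `download_file` are hypothetical helper names, and error handling and authentication are left out.

```python
import json
import shutil
import urllib.request
from urllib.parse import urljoin

SITE_URL = "https://www.encodeproject.org"


def download_url(file_metadata, site_url=SITE_URL):
    """Append the href field of a file's metadata object to the site URL."""
    return urljoin(site_url, file_metadata["href"])


def fetch_file_metadata(accession, site_url=SITE_URL):
    """GET the JSON metadata object for one file accession (network call)."""
    request = urllib.request.Request(
        f"{site_url}/files/{accession}/",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def download_file(accession, destination):
    """Look up a file's href, then stream the file to a local path."""
    url = download_url(fetch_file_metadata(accession))
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


# The href value quoted in the text, used here without touching the network.
example_url = download_url(
    {"href": "/files/ENCFF002CTW/@@download/ENCFF002CTW.narrowPeak.gz"}
)
```

For a batch download, `download_file` could be called once per accession returned by a file search such as the fastq example.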

The ENCODE Portal REST API
