InterMine 
Integrated Data Warehouse 
Use Cases: Arabidopsis & Medicago Genome Projects 
Vivek Krishnakumar 
Plant Genomics Group (EUK) 
IFX Research WIPS Meeting, 03 October 2014
Overview 
• Introduction 
• InterMine 
  ◦ Integrated data warehouse, Extensible data model, Flexible query system 
  ◦ Web and Programmatic Interface 
  ◦ Other InterMine instances 
• Use cases 
  ◦ Arabidopsis Information Portal (AIP) 
  ◦ Medicago truncatula Genome Database (MTGD) 
• Summary 
  ◦ Advantages 
  ◦ Caveats
Introduction 
For genome projects that wish to expose their data via the web (query, visualize, warehouse) to foster scientific collaboration, several technologies are available: 
• JCVI-developed software 
  ◦ Manatee (backed by an RDBMS) 
• Externally developed software 
  ◦ BioMart (federated across various databases) 
  ◦ Tripal (powered by Drupal, backed by a CHADO database) 
  ◦ InterMine
InterMine 
• Functions as a data warehouse for the integration of complex biological data. Data types are integrated on a common identifier (e.g. gene primary ID) 
• Uses a flexible and extensible data model, defined in XML files and driven by ontologies (Sequence Ontology [SO], Gene Ontology [GO], etc.) 
  ◦ Genomics, Proteomics, Interactions, Homology, Expression, Pathways (and more data types) 
  ◦ Parsers for commonly used biological data formats 
  ◦ Framework for adding your own data 
• Offers a flexible query system, optimized via precomputed tables (no need to denormalize the schema) 
Smith, R.N. et al. InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics (2012) 28 (23): 3163-3165
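The integration step described above, merging data from different sources on a shared identifier, can be illustrated with a toy sketch. It is not InterMine's implementation (which is written in Java and backed by PostgreSQL); the source names, field names and `gene-N` identifiers are hypothetical, and conflicting field values are simply overwritten by whichever source is read last.

```python
from collections import defaultdict


def integrate(sources, key="primaryIdentifier"):
    """Merge records from several sources into one object per shared identifier.

    sources: dict mapping a (hypothetical) source name to a list of record dicts.
    Returns {identifier: {"fields": {...}, "sources": [source names]}}.
    """
    merged = defaultdict(lambda: {"fields": {}, "sources": []})
    for name, records in sources.items():
        for record in records:
            entry = merged[record[key]]
            # Later sources overwrite earlier values for the same field name.
            entry["fields"].update(record)
            entry["sources"].append(name)
    return dict(merged)


# Hypothetical input: three data types that only share the gene identifier.
sources = {
    "annotation": [
        {"primaryIdentifier": "gene-1", "symbol": "ABC1"},
        {"primaryIdentifier": "gene-2", "symbol": "XYZ2"},
    ],
    "expression": [{"primaryIdentifier": "gene-1", "tissue": "root"}],
    "interactions": [{"primaryIdentifier": "gene-2", "interactsWith": "gene-1"}],
}

genes = integrate(sources)
print(genes["gene-1"])
```

Each gene ends up as a single object carrying fields from every source that mentioned its identifier, which is what lets one query span several data types.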
InterMine (contd.) 
• Provides a user-friendly web interface exposing powerful features: 
  ◦ Analysis of lists (facilitates enrichment studies) 
  ◦ Full-featured report pages (one-stop shop) 
  ◦ Interactive result tables (sort, filter, summarize) 
  ◦ Visual query builder (no need to write SQL!) 
  ◦ Quick search and region-based search 
• Fosters development of external applications that use data hosted within InterMine, via Application Programming Interfaces (APIs): 
  ◦ RESTful web services 
  ◦ Client libraries in Perl, Python, Ruby, Java, JavaScript 
Kalderimis, A. et al. InterMine: extensive web services for modern biology. Nucl. Acids Res. (1 July 2014) 42 (W1): W468-W472
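As a minimal sketch of the RESTful interface, the Python below builds a PathQuery XML document and the GET URL for an InterMine query-results call, without sending any request. It assumes the service root is the ThaleMine address with `/service` appended (that host may no longer be live) and uses a hypothetical gene symbol; for real work, the client libraries listed above are the more convenient route.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def path_query_xml(root_class, views, constraints, model="genomic"):
    """Build a PathQuery XML string.

    views: attribute names relative to root_class, e.g. ["primaryIdentifier"].
    constraints: (path, op, value) tuples, with paths relative to root_class.
    Each constraint gets a letter code (A, B, ...).
    """
    query = ET.Element("query", {
        "model": model,
        "view": " ".join(f"{root_class}.{v}" for v in views),
    })
    for i, (path, op, value) in enumerate(constraints):
        ET.SubElement(query, "constraint", {
            "path": f"{root_class}.{path}",
            "op": op,
            "value": value,
            "code": chr(ord("A") + i),
        })
    return ET.tostring(query, encoding="unicode")


def results_url(service_root, query_xml, fmt="tab"):
    """Return the GET URL for the query-results service (no request is sent)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service_root}/query/results?{params}"


# Hypothetical example: identifiers and symbols of genes whose symbol is "ABC1".
xml = path_query_xml("Gene", ["primaryIdentifier", "symbol"],
                     [("symbol", "=", "ABC1")])
url = results_url("https://apps.araport.org/thalemine/service", xml)
print(url)
```

Because the query is plain XML sent over HTTP, any language with an HTTP client can consume a mine this way.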
Public “Mines” 
• InterMine supports querying across mines for cross-database integration 
• A large number of InterMine-powered warehouses already exist
Arabidopsis Information Portal (AIP) 
• AIP origins 
  ◦ Funded by NSF in response to community needs, following the termination of funding for TAIR 
• AIP objectives 
  ◦ Develop a community web resource that… 
    – is sustainable, fundable, and community-extensible 
    – hosts analysis & visualization tools and user data spaces 
  ◦ Federation: integrate diverse data sets from distributed data sources; foster development of tools for and by the community 
  ◦ Maintain the Col-0 gold standard annotation 
• AIP methods 
  ◦ Assimilate TAIR data 
  ◦ Host an InterMine instance devoted to Arabidopsis (thale cress) 
  ◦ Offer and consume RESTful web services 
  ◦ Integrate and utilize iPlant resources
ThaleMine 
https://apps.araport.org/thalemine 
• An InterMine interface to Arabidopsis genomic data 
• Integrates a wide variety of data types (A-E, H), some of which are warehoused while others are federated via web services 
• Embedded elements visualizing gene structure (JBrowse, not shown), interaction networks (F), and expression patterns (G)
Visual Query Builder 
Image created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
Interactive Result Tables 
Region-based search 
Images created by Benjamin Rosen (Bioinformatics Analyst, Plant Genomics Group)
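Region-based search returns the features whose coordinates overlap a given interval. A minimal sketch of that overlap test follows, assuming 1-based inclusive coordinates, a hypothetical `Chr1:800-1000` input notation and made-up features; it is not how ThaleMine implements the search internally.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    identifier: str
    chromosome: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive (assumed convention)


def overlaps(feature, chromosome, start, end):
    """True if the feature shares at least one base with the query interval."""
    return (feature.chromosome == chromosome
            and feature.start <= end
            and feature.end >= start)


def region_search(features, region):
    """Return identifiers of features overlapping a region such as 'Chr1:800-1000'."""
    chromosome, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start, end = int(start_text), int(end_text)
    return [f.identifier for f in features if overlaps(f, chromosome, start, end)]


# Hypothetical features on two chromosomes.
features = [
    Feature("gene-1", "Chr1", 100, 900),
    Feature("gene-2", "Chr1", 950, 2000),
    Feature("gene-3", "Chr2", 100, 900),
]

print(region_search(features, "Chr1:800-1000"))
```

A linear scan is enough to show the overlap rule; a warehouse serving whole genomes would rely on indexed range queries instead.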
MedicMine 
http://medicmine.jcvi.org 
• NSF-funded project to assist with the curation of the Medicago truncatula genome assembly and annotation (funding ended August 2014) 
• To warehouse the project data and prolong its availability, an InterMine interface for Medicago was implemented (backed by a CHADO database) 
• Provides functionality similar to that of ThaleMine
Summary 
• Advantages 
  ◦ InterMine is a powerful biological data warehouse 
  ◦ Performs complex data integration 
  ◦ Allows fast and flexible querying 
  ◦ Well-documented programmatic interface 
  ◦ Cookie-cutter, user-friendly web interface 
  ◦ Facilitates cross-talk between “mines” 
• Caveats 
  ◦ Adding more data requires a full database rebuild; incremental loading is not possible because of the integration step 
• About InterMine: 
  ◦ Developed by the Micklem Lab at the University of Cambridge, UK 
  ◦ Written in Java, backed by PostgreSQL, deployed under Tomcat 
Documentation and downloads available at http://www.intermine.org
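The rebuild caveat follows from how integration works: when several sources describe the same gene, a priority order decides which value wins, so a newly added source can change objects that are already integrated. The toy sketch below, with hypothetical sources and identifiers, makes that concrete by always rebuilding from all sources rather than appending; its priority handling is a simplification, not InterMine's own configuration mechanism.

```python
def build_warehouse(sources, key="primaryIdentifier"):
    """Integrate all sources from scratch.

    sources: list of (name, records) ordered from highest to lowest priority.
    When two sources set the same field, the higher-priority value wins.
    """
    warehouse = {}
    # Apply the lowest-priority source first so higher-priority ones overwrite it.
    for _name, records in reversed(sources):
        for record in records:
            warehouse.setdefault(record[key], {}).update(record)
    return warehouse


# Hypothetical sources describing the same gene.
gff = [{"primaryIdentifier": "gene-1", "symbol": "old-symbol", "chromosome": "Chr1"}]
curated = [{"primaryIdentifier": "gene-1", "symbol": "ABC1"}]

v1 = build_warehouse([("gff", gff)])
# Adding a higher-priority source changes an already-integrated object,
# so the whole build is re-run instead of appending the new records.
v2 = build_warehouse([("curated", curated), ("gff", gff)])
print(v1["gene-1"]["symbol"], "->", v2["gene-1"]["symbol"])
```

Appending the curated records to v1 without re-running the merge would either duplicate gene-1 or require the same conflict resolution, which is why the build is repeated in full.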
Chris Town, PI 
Chris Nelson, PM 
Lisa McDonald, Education and Outreach Coordinator 
Jason Miller, Co-PI, Technical Lead 
Erik Ferlanti, SE 
Vivek Krishnakumar, BE 
Svetlana Karamycheva, BE 
Maria Kim, BE 
Gos Micklem, co-PI 
Sergio Contrino, Software Engineer 
Eva Huala, Project lead, TAIR 
Bob Muller, Technical lead, TAIR 
Matt Vaughn, co-PI 
Steve Mock, Advanced Computing Interfaces 
Rion Dooley, Web and Cloud Services 
Matt Hanlon, Web and Mobile Applications 
Ben Rosen, BA
Quick Intro to InterMine within AIP and MTGD - JCVI Research Works-in-Progress Meeting