TranSMART Core
From tool to ecosystem
Kees van Bochove
tranSMART Workshop Amsterdam
June 17, 2013
Today, we have a chance to
write history.
•Microarray data analysis support
•Load public microarray data from GEO
•Store and retrieve saved analyses
•Search on gene name, disease name etc.
•Genomic variants and VCF support
•Load TCGA studies we have access to
•Load 1000 Genomes data

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
There has to be a better way.
costs $ 0!
No-brainer!

Ehm.. wait a minute…
Let’s have a look at how these
scientists in academia are doing.
They love to collaborate, right?!
In 2003…
(Ancient history; before Facebook)
Yet Another ‘New’ Web-based Solution for
the Management of Microarray Data?!
Not Invented Here Syndrome

Image from Rob Hooft, CTO Netherlands Bioinformatics Centre
http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html
What about all these great FP6,
FP7, IMI, … projects?
Source code of major projects is
readily available on GitHub
But… I’m afraid it’s still up to you and
me to put the pieces together.
Phenotype Database
Written in Grails, supports several types of
omics data, provides data integration and
visualization, has R, Groovy and PHP APIs.
Sounds familiar?

http://phenotypefoundation.org
share

reuse

specialize
Writing good software is hard.
So far…
• TranSMART has huge business potential. It’s
no silver bullet though.
• Scientists sometimes have trouble reusing each other’s work, especially when it
comes to open source software.
Do they?
Time to look at some success stories.
R and Bioconductor
Who doesn’t love R?
Website looks as if it dates from the Stone Age.
Must be those LaTeX-loving physicists.
Very active community, and…
lots of packages.
Governance of R community
Brian Ripley: “The R Project is governed by a
self-perpetuating oligarchy, a group with a lot of
power. R was principally developed for the
benefit of the core team.”

As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-onthe-r-development-process.html
Galaxy
Galaxy is the most widely used open
source bioinformatics web interface AFAIK.
Probably in no small part thanks to
their continuous dedication to
improving the UI.
But there’s something else.
Galaxy Toolshed
• An open source CMS (Content Management
System) written in Python, nowadays backing
thousands of production-grade websites
• Started by 2 developers in 2000, now an active
open source project with hundreds of
active developers
• In 2004, the Plone Foundation was formed to
formalize IP and secure the future of Plone
• Plone Collective has hundreds of plugins
tranSMART Community Meeting 5-7 Nov 13 - Session 1: Chilly-Mazarin Meeting Objectives
What do all these success stories
have in common?
Bioconductor Packages
Galaxy Toolshed
Plone Collective
Drupal Modules
Lessons for tranSMART
TranSMART needs a marketplace and a
thriving community to survive.
To get to a functioning marketplace, we
need a well-designed core.
There is also another reason.
TranSMART Contributions - Pharma
• Janssen
– Initial version of tranSMART
– Genomics viewer using IGV and GenePattern
– Faceted Search interface (results browsing)

• Millennium
– Loading TCGA and many GEO studies
– R interface for interacting with data directly in R
– Several R analyses available directly in GUI
TranSMART Contributions - Pharma
• Sanofi
– Cleaner user interface
– Added metadata layer for all concepts
– Study/Program categorization & file management

• Pfizer
– GWAS upload (VCF), data storage and analysis
– Enhanced data export capabilities
This is a mess.
Another reason why we need that
core.
Start the Core: I2B2 Refactoring
1. I2B2 was integrated with tranSMART, but the
I2B2 API abstractions were leaked all over the
place in the tranSMART application.
2. We agreed in the London meeting that all
parties would set aside some time
for working on the core.
3. Combined, it made sense to start working on
the clinical data API, properly using the I2B2
API where possible, and re-implement all I2B2
functionality in a new ‘core-db’ plugin.
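The idea behind point 3 can be sketched roughly as follows. This is an illustration only, not the actual tranSMART core API: the names `ClinicalDataResource`, `CoreDbClinicalData`, `Observation` and `count_patients` are hypothetical, and the row keys merely mimic an i2b2-style fact table. The point is that the storage layout stays inside one implementation class and callers depend only on the interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One clinical fact, expressed without storage-specific columns."""
    patient_id: int
    concept_path: str
    value: str


class ClinicalDataResource(ABC):
    """Hypothetical core API: the only thing application code depends on."""

    @abstractmethod
    def observations_for(self, concept_path: str) -> list[Observation]:
        ...


class CoreDbClinicalData(ClinicalDataResource):
    """Hypothetical 'core-db' style implementation.

    The i2b2-style row layout (assumed keys: patient_num, concept_cd,
    tval_char) is private to this class instead of leaking into callers.
    """

    def __init__(self, fact_rows: list[dict], concept_codes: dict[str, str]):
        self._fact_rows = fact_rows          # rows of the fact table
        self._concept_codes = concept_codes  # concept path -> concept code

    def observations_for(self, concept_path: str) -> list[Observation]:
        code = self._concept_codes.get(concept_path)
        if code is None:
            return []  # unknown concept: no observations
        return [
            Observation(row["patient_num"], concept_path, row["tval_char"])
            for row in self._fact_rows
            if row["concept_cd"] == code
        ]


def count_patients(resource: ClinicalDataResource, concept_path: str) -> int:
    """Application code: sees only the API, never the table layout."""
    return len({o.patient_id for o in resource.observations_for(concept_path)})
```

With this shape, replacing the storage backend would mean writing another `ClinicalDataResource` subclass; `count_patients` and similar callers would not change.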
The first version of core-integration
was completed mid-April.
By then, all web service calls to what was formerly
an outdated version of the I2B2 Ontology
and CRC cells were handled by the
newly implemented core-db plugin.
Also, a set of tests was written in the
process and API documentation generated.
In the long run, I believe forming a
good distributed working group on the
core API is a more important
deliverable of this workshop
than cranking out a stable 1.1
version.
That’s how we write that history.
Current tranSMART Architecture

Kees van Bochove - The Hyve
TranSMART’s Strong Points
• Powerful, ready-to-go user interface
for common analyses (survival analysis, gene
expression heatmaps etc.)
• Leverages i2b2 data model for clinical data and
offers unified view over different studies
• Uses a lot of good open
source technology under the hood (Grails, R,
SOLR, Pentaho) →
leveraging existing community developments
TranSMART Building Blocks
• R: open source statistics package with CRAN,
an active repository in which many algorithms
and statistical packages are published
• Grails: a rapid application development
framework in Groovy leveraging Java
technology such as Hibernate, Spring, Quartz
• I2b2: domain specific open source package for
storing and querying clinical data
• GenePattern, maybe soon: Galaxy, KNIME?
TranSMART’s Weaknesses
• Large monolithic codebase
with little modularization beyond the
standard Grails MVC setup
• Code quality is problematic, especially JavaScript
• Test coverage is low, no functional / web tests and
few unit and integration tests
• No clear internal APIs, only a service level that
does the plumbing.
• I2b2 integration violates i2b2 abstractions
tranSMART Plans
• Use a clearly modularized architecture with
separation of clinical, high dimensional, search
and metadata storage; workflow execution
engines and knowledge repository
• Define clear API and rewrite current
implementations with good test coverage
• Use i2b2 data model, re-harmonize with latest
i2b2 APIs, and don’t use i2b2 binaries directly
• Separate analysis definitions and abstract from
workflow execution engine
http://prezi.com/t6twshyctdsk/transmart-core-refactoring
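The last plan item, separating analysis definitions from the workflow execution engine, might look something like the sketch below. Everything here is hypothetical and for illustration only (`AnalysisDefinition`, `ExecutionEngine`, `InProcessEngine`, `trimmed_mean`); the slides do not prescribe a design. The definition is plain data, and the engine is the only part that knows how to run it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class AnalysisDefinition:
    """Declarative description of an analysis; knows nothing about engines."""
    name: str
    parameters: dict = field(default_factory=dict)


class ExecutionEngine(Protocol):
    """Anything that can run a definition against some data."""

    def run(self, definition: AnalysisDefinition, data: list[float]) -> float:
        ...


class InProcessEngine:
    """Toy engine that runs registered Python functions in-process."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[..., float]] = {}

    def register(self, name: str, handler: Callable[..., float]) -> None:
        self._handlers[name] = handler

    def run(self, definition: AnalysisDefinition, data: list[float]) -> float:
        handler = self._handlers[definition.name]
        return handler(data, **definition.parameters)


def trimmed_mean(data: list[float], trim: int = 0) -> float:
    """Mean after dropping `trim` values from each end of the sorted data."""
    ordered = sorted(data)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)


engine = InProcessEngine()
engine.register("trimmed_mean", trimmed_mean)
definition = AnalysisDefinition("trimmed_mean", {"trim": 1})
result = engine.run(definition, [1.0, 2.0, 3.0, 100.0])
```

An engine that hands the same definition to R, GenePattern or Galaxy would only need to provide a compatible `run` method; the definition itself would stay unchanged.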
Target tranSMART Architecture

Kees van Bochove - The Hyve
Further reading
• Description of core API efforts:
http://thehyve.nl/rewiring-transmart
• In-depth description of i2b2 refactoring:
http://thehyve.nl/inital-work-on-transmarts-core
• Overview of tranSMART Core API so far:
http://thehyve.github.io/transmart-core-api/
• Example of continuous integration test suite
(of core-db): https://ci.ctmmtrait.nl/browse/TMCOREDB-JOB1-51/test
