the power of graphs for analyzing biological datasets

                       Davy Suvee

                    Janssen Pharmaceutica
about me

                 who am i ...
                 ➡ working as an it lead / software architect @ janssen pharmaceutica
                   • dealing with big scientific data sets
                   • hands-on expertise in big data and NoSQL technologies



                 ➡ founder of datablend
                   • provide big data and NoSQL consultancy
                   • share practical knowledge and big data use cases via blog

                 Davy Suvee (@DSUVEE)
outline


➡ getting visual insights into big data sets
  ★ gene expression clustering (mongodb, Neo4j, Gephi)
  ★ Mutation prevalence (cassandra, Neo4j, Gephi)



➡ fluxgraph, a time machine for your graphs ...
insights in big data
➡ typical approach through warehousing
  ★ star schema with fact tables and dimension tables
insights in big data


  ★ real-time visualization
  ★ filtering
  ★ metrics
  ★ layouting
  ★ modular
  (see references 1, 2)




1. http://gephi.org/plugins/neo4j-graph-database-support/   2. http://github.com/datablend/gephi-blueprints-plugin
gene expression clustering

                        ➡ oncology data set:
                          ★ 4,800 samples
                          ★ 27,000 genes


                        ➡ Question:
                          ★ for a particular subset of samples,
                          which genes are co-expressed?
mongodb for storing gene expressions
{ "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3"} ,
  "sample_name" : "122551hp133a21.cel" ,
  "genomics_id" : 122551 ,
  "sample_id" : 343981 ,
  "donor_id" : 143981 ,
  "sample_type" : "Tissue" ,
  "sample_site" : "Ascending colon" ,
  "pathology_category" : "MALIGNANT" ,
  "pathology_morphology" : "Adenocarcinoma" ,
  "pathology_type" : "Primary malignant neoplasm of colon" ,
  "primary_site" : "Colon" ,
  "expressions" : [ { "gene" : "X1_at" , "expression" : 5.54217719084415} ,
                    { "gene" : "X10_at" , "expression" : 3.92335121981739} ,
                    { "gene" : "X100_at" , "expression" : 7.81638155662255} ,
                    { "gene" : "X1000_at" , "expression" : 5.44318512260619} ,
                     … ]
}
pearson correlation through map-reduce

     x    y
    43   99
    21   65
    25   79
    42   75
    57   87
    59   81

pearson correlation (x, y) ≈ 0.53
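
As a rough illustration of the number on this slide, the sketch below computes the Pearson correlation of the six (x, y) pairs in plain Java. It is not the map-reduce job from the talk; the class name `PearsonSketch` and the method `pearson` are made up for this example. The five running sums it keeps are additive, so partial sums from separate chunks of data could in principle be combined, which is what makes the calculation a candidate for map-reduce.

```java
// Plain-Java sketch of Pearson correlation (NOT the MongoDB map-reduce job from the talk).
// PearsonSketch and pearson(...) are hypothetical names used only for illustration.
final class PearsonSketch {

    /** Pearson correlation coefficient of two equally long series. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two series of equal length >= 2");
        }
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        // The denominator is 0 (and the result NaN) when either series is constant.
        double denominator =
            Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // The six value pairs shown on the slide.
        double[] x = {43, 21, 25, 42, 57, 59};
        double[] y = {99, 65, 79, 75, 87, 81};
        System.out.printf("r = %.4f%n", pearson(x, y));
    }
}
```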
co-expression graph


➡ create a node for each gene
➡ if correlation between two genes >= 0.8, draw an edge between both nodes
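
The two rules above can be sketched in a few lines of Java. This is a minimal illustration, assuming the pairwise correlations are already available as a symmetric matrix; the class `CoExpressionSketch`, the method `edges` and the correlation values are invented for the example, and the talk itself stores the graph in Neo4j rather than in a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive co-expression edges from a precomputed correlation matrix.
final class CoExpressionSketch {

    /** Returns one "geneA -- geneB" edge per gene pair whose correlation is >= threshold. */
    static List<String> edges(String[] genes, double[][] correlation, double threshold) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < genes.length; i++) {
            for (int j = i + 1; j < genes.length; j++) { // visit each unordered pair once
                if (correlation[i][j] >= threshold) {
                    result.add(genes[i] + " -- " + genes[j]);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] genes = {"X1_at", "X10_at", "X100_at"};
        double[][] correlation = { // made-up values, for illustration only
            {1.00, 0.85, 0.10},
            {0.85, 1.00, 0.80},
            {0.10, 0.80, 1.00}
        };
        System.out.println(edges(genes, correlation, 0.8));
    }
}
```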
graphs and time ...
➡ reproducible graph state

➡ towards a time-aware graph ...

➡ fluxgraph: a blueprints-compatible graph on top of Datomic

➡ make FluxGraph fully time-aware
   ★ travel your graph through time
   ★ time-scoped iteration of vertices and edges
   ★ temporal graph comparison
travel through time
FluxGraph fg = new FluxGraph();

Vertex davy = fg.addVertex();
davy.setProperty("name", "Davy");
Vertex peter = ...
Vertex michael = ...

Edge e1 =
  fg.addEdge(davy, peter, "knows");

[figure: Davy --knows--> Peter; Michael not yet connected]

Date checkpoint = new Date();

davy.setProperty("name", "David");

Edge e2 =
  fg.addEdge(davy, michael, "knows");

[figure: David --knows--> Peter, David --knows--> Michael]
travel through time
[figure: time axis with two graph states]
  ★ checkpoint: Davy --knows--> Peter; Michael not connected
  ★ current (by default): David --knows--> Peter, David --knows--> Michael

fg.setCheckpointTime(checkpoint);
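
To make the idea of reading at a checkpoint concrete, here is a small, self-contained Java sketch of an "as of" lookup for a single property. It is not FluxGraph or Datomic code: `VersionedProperty`, `set`, `asOf` and `current` are hypothetical names, and the timestamps are arbitrary. Each write is kept with its timestamp, and a read at a given instant returns the latest write at or before that instant.

```java
import java.util.Date;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one time-aware property; NOT FluxGraph code.
final class VersionedProperty {
    // Write time in milliseconds -> value written at that time.
    private final TreeMap<Long, String> history = new TreeMap<>();

    void set(Date when, String value) {
        history.put(when.getTime(), value);
    }

    /** Value as of the given instant: the latest write at or before it, or null if none. */
    String asOf(Date when) {
        Map.Entry<Long, String> entry = history.floorEntry(when.getTime());
        return entry == null ? null : entry.getValue();
    }

    /** Default view: the most recent value, or null if nothing was written. */
    String current() {
        return history.isEmpty() ? null : history.lastEntry().getValue();
    }

    public static void main(String[] args) {
        VersionedProperty name = new VersionedProperty();
        name.set(new Date(1_000L), "Davy");     // written before the checkpoint
        Date checkpoint = new Date(2_000L);
        name.set(new Date(3_000L), "David");    // written after the checkpoint
        System.out.println(name.current());        // David
        System.out.println(name.asOf(checkpoint)); // Davy
    }
}
```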
time-scoped iteration
➡ how to find the version of the vertex you are interested in?

[figure: versions of one vertex over time; each change creates a new version, and versions are linked by next / previous]

     t1            t2             t3              tcurrent
    Davy   <->    Davy’   <->    Davy’’   <->    Davy’’’

Vertex previousDavy = davy.getPreviousVersion();
Iterable<Vertex> allDavy = davy.getNextVersions();
Iterable<Vertex> selDavy = davy.getPreviousVersions(filter);
Interval valid = davy.getTimerInterval();
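
One way to picture the next / previous links is as a doubly linked chain of versions. The Java sketch below is only an illustration of that data structure under this assumption; `VersionNode` and its methods are hypothetical names that loosely mirror the calls on the slide and say nothing about how FluxGraph stores versions internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical doubly linked version chain; NOT the FluxGraph implementation.
final class VersionNode {
    final String name;
    private VersionNode previous;
    private VersionNode next;

    VersionNode(String name) {
        this.name = name;
    }

    /** Appends a newer version after this one (call it on the newest version only). */
    VersionNode change(String newName) {
        VersionNode newer = new VersionNode(newName);
        newer.previous = this;
        this.next = newer;
        return newer;
    }

    /** The directly preceding version, or null for the first version. */
    VersionNode previousVersion() {
        return previous;
    }

    /** All newer versions, oldest first. */
    List<VersionNode> nextVersions() {
        List<VersionNode> result = new ArrayList<>();
        for (VersionNode v = next; v != null; v = v.next) {
            result.add(v);
        }
        return result;
    }

    /** Older versions that satisfy the filter, newest first. */
    List<VersionNode> previousVersions(Predicate<VersionNode> filter) {
        List<VersionNode> result = new ArrayList<>();
        for (VersionNode v = previous; v != null; v = v.previous) {
            if (filter.test(v)) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        VersionNode v1 = new VersionNode("Davy");
        VersionNode v4 = v1.change("Davy'").change("Davy''").change("Davy'''");
        System.out.println(v4.previousVersion().name);
        System.out.println(v1.nextVersions().size());
    }
}
```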
time-scoped iteration
➡ When does an element change?

➡ vertex:
   ★ setting or removing a property
   ★ being added to or removed from an edge
   ★ being removed

➡ edge:
   ★ setting or removing a property
   ★ being removed

➡ ... and each element is time-scoped!
temporal graph comparison
[figure: two graph states side by side]
  ★ current: David --knows--> Peter, David --knows--> Michael
  ★ checkpoint: Davy --knows--> Peter; Michael not connected

➡ what changed?
temporal graph comparison
➡ difference (A , B) = union (A , B) - B
➡ ... as a (immutable) graph!

[figure: difference ( , ) = a graph that contains only the vertex David and one knows edge]
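
The formula can be tried out on plain sets. The Java sketch below treats each graph state as a set of facts written as strings, which is an assumption made purely for illustration; `GraphDifferenceSketch` and the fact encoding are hypothetical and unrelated to the FluxGraph API. Note that union (A , B) - B contains exactly the elements of A that are not in B.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: graph states modelled as sets of string facts.
final class GraphDifferenceSketch {

    /** difference(A, B) = union(A, B) - B, returned as an unmodifiable set. */
    static Set<String> difference(Set<String> a, Set<String> b) {
        Set<String> union = new LinkedHashSet<>(a);
        union.addAll(b);      // union(A, B)
        union.removeAll(b);   // ... minus B
        return Collections.unmodifiableSet(union);
    }

    public static void main(String[] args) {
        Set<String> current = new LinkedHashSet<>(Arrays.asList(
            "davy.name=David", "davy -knows-> peter", "davy -knows-> michael"));
        Set<String> checkpoint = new LinkedHashSet<>(Arrays.asList(
            "davy.name=Davy", "davy -knows-> peter"));
        // Only the facts that are new in the current state remain.
        System.out.println(difference(current, checkpoint));
    }
}
```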
use case: longitudinal patient data

              t1        t2        t3        t4        t5
  patient     -         smoking   smoking   cancer    cancer, death
use case: longitudinal patient data

➡ historical data for 15,000 patients over a period of 10 years (2001-2010)

➡ example analysis:
   ★ if a male patient is no longer smoking in 2005
   ★ what are the chances of getting lung cancer in 2010, comparing
        • patients that smoked before 2005
        • patients that never smoked
use case: longitudinal patient data
➡ get all male non-smokers in 2005

fg.setCheckpointTime(new DateTime(2005,12,31).toDate());

Iterator<Vertex> males =
  fg.getVertices("gender", "male").iterator();

while (males.hasNext()) {
   Vertex p2005 = males.next();
   // a patient with an outgoing smokingStatus edge is a smoker at the checkpoint
   boolean smoking2005 =
     p2005.getEdges(OUT,"smokingStatus").iterator().hasNext();
}
use case: longitudinal patient data
➡ which patients were smoking before 2005?


boolean smokingBefore2005 =
  ((FluxVertex)p2005).getPreviousVersions(new TimeAwareFilter() {

    public TimeAwareElement filter(TimeAwareVertex element) {
      return element.getEdges(OUT, "smokingStatus").iterator().hasNext()
        ? element : null;
    }

  }).iterator().hasNext();
use case: longitudinal patient data
➡ which patients have cancer in 2010

 Graph g =
   fg.difference(smokerws,           // working set of smokers
                 time2010.toDate(),
                 time2005.toDate());

➡ extract the patients that have an edge to the cancer node
Questions?
