© 2019 KNIME AG. All Rights Reserved.
Google BigQuery for analysis of scientific datasets: Interactive exploration and analysis of the data using KNIME Analytics Platform
Greg Landrum
Martyna Pawletta
Jeanette Prinz
greg.landrum@knime.com
@dr_greg_landrum
Acknowledgements
• Steve Boyer (Collabra)
• Lutz Weber (OntoChem)
• Ian Wetherbee (Google)
Google BigQuery?
• A giant collection of tables that I can query with SQL
• If the tables share common keys, I can do interesting things
Might be an oversimplification. ☺
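To make the "common keys" point concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for BigQuery, and the tables (`compounds`, `mentions`), their columns, and the rows are hypothetical illustrations, not the datasets or queries used in this talk. The idea is only that two tables sharing a `compound_id` key can be joined to answer a question neither table answers alone.

```python
import sqlite3

# Hypothetical tables for illustration only; SQLite stands in for BigQuery here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compounds (compound_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE mentions (doc_id TEXT, compound_id TEXT, disease TEXT);
INSERT INTO compounds VALUES
  ('C1', 'aspirin'), ('C2', 'ibuprofen'), ('C3', 'caffeine');
INSERT INTO mentions VALUES
  ('D1', 'C1', 'headache'),
  ('D2', 'C1', 'headache'),
  ('D3', 'C2', 'headache'),
  ('D4', 'C3', 'fatigue');
""")


def compounds_for_disease(conn, disease):
    """Count documents per compound for one disease, joining on the shared compound_id key."""
    rows = conn.execute(
        """
        SELECT c.name, COUNT(DISTINCT m.doc_id) AS n_docs
        FROM mentions AS m
        JOIN compounds AS c ON c.compound_id = m.compound_id
        WHERE m.disease = ?
        GROUP BY c.name
        ORDER BY n_docs DESC, c.name
        """,
        (disease,),
    )
    return rows.fetchall()


print(compounds_for_disease(conn, "headache"))
# → [('aspirin', 2), ('ibuprofen', 1)]
```

The same join-and-aggregate pattern should carry over to much larger tables; only the connection and the table names would change.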
An aside: searching vs exploring
a.k.a. why I’m enthusiastic about this project
• There are definitely arguments for specialized interfaces that are tailored to make answering a particular question super efficient and easy
• But! There are times when I’m still trying to figure out exactly what the question is
• For this it’s nice to have a giant pile of data and a general purpose tool for exploring it
What we’re going to do here
• Do some exploration of the scientific data that’s now in BigQuery…
• … with KNIME
Workflow part 1
Workflow part 2
The first database queries
Picking the disease/condition
Results
Compound classes
