DATA INTELLIGENCE FOR ALL

Adatao Live Demo
at the First Spark Summit
Dec 2, 2013, San Francisco
(Video at the end of this deck)
Christopher Nguyen, PhD
Co-Founder & CEO
Big-Data Compute Engines, Google Apps
Engineering Director, Google Founders’ Award,
HKUST Prof, 2 successful enterprise exits,
Stanford PhD

Deep engineering &
business experience from
Google, Yahoo et al.
PhDs in DM & ML from
UIUC, Georgia Tech,
Stanford, Berkeley, ...

Hadoop distributed/streaming analytics, Yahoo
Hadoop Eng, UIUC PhD

Machine learning & machine vision, US Army
Research Lab, Johns Hopkins PhD
Business Users
Data Scientists
Data Engineers
ONE Integrated Platform for Business & Data Science & Engineering

BIG INSIGHTS

Visually Beautiful
Interactive Data Exploration
Narrative Web App

BIG COMPUTE

Powerful In-Memory Data Mining
Machine Learning Big Analytics Platform

(Hadoop HDFS, Cassandra, SQL DBMS, Streaming Data)

BIG DATA
Architecture Design
One Integrated Platform
for Business & Data Science & Engineering
Business Users

Data Scientists

Data Engineers

Business Users

VS

Data Scientists

Data Engineers

stack for business users

stack for data science

stack for data eng

OTHERS

for Data Scientists & Engineers
Big Data Mining & Machine Learning

Powerful In-Memory Data Mining & Machine Learning: Model Terabytes in Seconds

Interactive, Cluster-Scale Data Munging & Modeling with Native R, RStudio, Python, SQL, and Java Front-ends

Real-Time Scoring Directly From Trained Models	

Share reproducible, live data analysis documents	

Hadoop, Cassandra, RDBMS, Streaming Data
for Business Users
Predictive Decision Making

A Beautiful New Way to Create & Share Visual Narratives of Your Analysis

Perform Ad Hoc Queries in Plain English

Publish Streaming, Interactive Dashboards

Collaborate With Others In Real Time

Query Terabytes in Seconds.
Demo Deployment Diagram
[Diagram: one CLIENT, one MASTER, and four WORKER nodes shown]
Demo Config
Cluster: 8-node x 8-core x 30GB RAM x 1TB Disk
Data Sets: 12GB-100GB, 100M-1B rows
Airline Arrival Data, 1988-2008 from DoT
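
A rough sizing sketch for the configuration above. It assumes the cluster line gives per-node figures (8 cores, 30GB RAM, and 1TB disk on each of 8 nodes) and that the smallest data set (12GB) pairs with the smallest row count (100M) and the largest (100GB) with the largest (1B). The slide states neither, and the names below are illustrative only.

```python
# Figures copied from the Demo Config slide; per-node reading is an assumption.
NODES = 8
CORES_PER_NODE = 8
RAM_GB_PER_NODE = 30
DISK_TB_PER_NODE = 1

total_cores = NODES * CORES_PER_NODE
total_ram_gb = NODES * RAM_GB_PER_NODE
total_disk_tb = NODES * DISK_TB_PER_NODE


def bytes_per_row(size_gb, rows):
    """Average row width in bytes, using decimal gigabytes (1 GB = 10**9 bytes)."""
    return size_gb * 10**9 / rows


def ram_fraction(size_gb):
    """Share of aggregate cluster RAM that the raw data set would occupy."""
    return size_gb / total_ram_gb


# Assumed pairing of the size range with the row-count range (hypothetical).
datasets = {
    "smallest": (12, 100_000_000),
    "largest": (100, 1_000_000_000),
}

if __name__ == "__main__":
    print(f"cluster: {total_cores} cores, {total_ram_gb} GB RAM, {total_disk_tb} TB disk")
    for name, (size_gb, rows) in datasets.items():
        print(
            f"{name}: {bytes_per_row(size_gb, rows):.0f} bytes/row, "
            f"{ram_fraction(size_gb):.0%} of aggregate RAM"
        )
```

Under these assumptions even the largest raw data set is smaller than the aggregate RAM, which is consistent with the in-memory positioning of the demo. Actual in-memory footprints depend on the storage format and are not given in the deck.
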
Algorithms
- LM & supporting statistics (AIC, log-likelihood, R², cross-validation)

- Binning

- Classification metrics: confusion matrix, ROC, AUC, F1

- Logistic Regression with Ref Level for Categorical Vars

- k-Means

- Random Forest

- Naive Bayes

- Linear SVM
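
To make the classification metrics in the list above concrete, here is a minimal single-machine sketch of a confusion matrix, F1, and ROC/AUC in plain Python. It is not Adatao's implementation, which the deck does not show. The function names, the 0/1 label encoding, and the toy data are all assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_points(y_true, scores):
    """ROC points (FPR, TPR) from sweeping the threshold down over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all examples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy data, invented for this example.
    y_true = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]

    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
    print("F1:", f1_score(tp, fp, fn))
    print("AUC:", auc(roc_points(y_true, scores)))
```

The confusion matrix and F1 depend on the chosen threshold (0.5 here), while ROC and AUC summarize the classifier's ranking across all thresholds.
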
Algorithm Roadmap
- Hierarchical Clustering

- Text Mining (token, POS, LDA, …)

- SVD

- Markov Chain Models

- Ensemble Models

-…
Thank you!
See demo video at
http://youtu.be/5UAdk7oHoPE?t=7m
