BIG DATA SCIENCE 
Chandan Rajah [ @ChandanRajah ] 
“The price of light is far less than the cost of darkness”
COST SPEED 
BENEFITS OF BIG DATA 
AGILITY CAPABILITY
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
What is Big Data ? 
Big Data ≠ Data Volume 
Big Data = Crude Oil 
Think of data like ‘Crude Oil’ 
Big Data is about extracting ‘crude oil’; transporting it in ‘pipelines’; storing it in ‘mega tanks’
What is Data Science ? 
Data Science ≠ Statistical Analysis 
Data Science = Oil Refinery 
Data science is about ‘treating’ data; applying ‘science’ to the data; 
refining the ‘results’; and combining them to form ‘insight’
Knowns, Unknowns & DIKUW FTW! 
known knowns 
we know we know 
known unknowns 
we know we don’t know 
unknown unknowns 
we don’t know we don’t know 
D: DATA 
I: INFORMATION 
K: KNOWLEDGE 
U: UNDERSTANDING 
W: WISDOM 
PAST FUTURE 
Data Engineer Data Analyst Data Miner Data Scientist 
raw what how to why when 
numbers description experience cause & effect prediction 
letters context tested proven what’s best 
symbols relationship instruction 
known knowns 
known unknowns unknown unknowns 
signals reports programs models
Data Analytics to Data Discovery ? 
data you know 
data you don’t know 
questions you’re asking 
questions you’re not asking 
Data Analyst 
Data Scientist 
Data 
Analytics 
Data Discovery 
DATA MODELLING 
Y ← F( X, random noise, parameters ) 
ALGORITHMIC MODELLING 
Y ← [ BLACK BOX ] ← X
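The contrast between the two modelling approaches can be sketched in a few lines. This is an illustration only and is not taken from the slides: the toy data, the straight-line form and the nearest-neighbour predictor are all assumptions chosen for brevity.

```python
# Toy data (assumed for illustration): y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 9.0]

def fit_line(xs, ys):
    """Data modelling: assume Y = a*X + b + noise and estimate the parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def nearest_neighbour(xs, ys, x_new):
    """Algorithmic modelling: no assumed form, predict from the closest observed X."""
    i = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[i]

a, b = fit_line(xs, ys)                          # interpretable parameters
line_prediction = a * 2.5 + b                    # prediction from the fitted formula
box_prediction = nearest_neighbour(xs, ys, 2.6)  # prediction from the "black box"
```

The first function yields parameters that can be read and questioned; the second only yields predictions, which is the trade-off the slide points at.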
DIVIDE 
SCATTER 
Split Data into Blocks 
Replicate and Store 
Petabytes of Resilience 
CONQUER 
EXPLORE 
1000s of Parallel Threads 
Explore Every Path 
Machine Learning 
INSIGHT 
GATHER 
Real Time Action 
Periodic Dashboards 
Iterative Evolution 
What is the Big Idea ?
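The divide, conquer and gather steps can be sketched on a single machine. This is a minimal sketch under stated assumptions: the function names `scatter`, `explore` and `gather` are hypothetical labels borrowed from the slide, a thread pool stands in for a cluster, and the "insight" is just a mean.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter(records, block_size):
    """DIVIDE: split the data into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def explore(block):
    """CONQUER: work on one block independently (here: a partial sum and count)."""
    return sum(block), len(block)

def gather(partials):
    """INSIGHT: combine the partial results into one answer (here: the mean)."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

records = list(range(1, 101))
blocks = scatter(records, block_size=10)

# Each block is processed without reference to any other block.
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(explore, blocks))

mean = gather(partials)
```

Because `explore` never looks outside its own block, adding more workers changes the speed but not the result.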
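Divide = HDFS 
WRITE 
Client: 1. Create Metadata (Name Node); 2. Put Blocks (Data Nodes) 
Name Node: Control / Monitoring of Data Nodes 
Data Nodes: blocks 1, 2, 3 (diagram shows two copies of each) 
READ 
Client: 1. Get Metadata (Name Node); 2. Fetch Blocks (Data Nodes) 
Name Node: Control / Monitoring of Data Nodes 
Data Nodes: blocks 1, 2, 3, 4 (diagram shows three copies of each)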
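The write and read paths above can be mimicked with a toy model. This is not the HDFS API: `ToyNameNode`, `write_file` and `read_file` are hypothetical names, each data node is a plain dictionary, and copies are placed round-robin purely for illustration.

```python
import itertools

class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where the copies live."""
    def __init__(self, data_nodes, replication=3):
        self.data_nodes = data_nodes
        self.replication = replication
        self.metadata = {}  # file name -> list of (block_id, [node indexes])
        self._next_node = itertools.cycle(range(len(data_nodes)))

    def create(self, name, n_blocks):
        placement = []
        for block_id in range(n_blocks):
            nodes = [next(self._next_node) for _ in range(self.replication)]
            placement.append((block_id, nodes))
        self.metadata[name] = placement
        return placement

    def lookup(self, name):
        return self.metadata[name]

def write_file(name_node, name, data, block_size):
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = name_node.create(name, len(blocks))    # 1. create metadata
    for block_id, nodes in placement:                   # 2. put blocks
        for n in nodes:
            name_node.data_nodes[n][(name, block_id)] = blocks[block_id]

def read_file(name_node, name, failed=()):
    parts = []
    for block_id, nodes in name_node.lookup(name):      # 1. get metadata
        live = [n for n in nodes if n not in failed]    # skip unavailable nodes
        parts.append(name_node.data_nodes[live[0]][(name, block_id)])  # 2. fetch block
    return "".join(parts)

data_nodes = [{} for _ in range(4)]
name_node = ToyNameNode(data_nodes, replication=3)
write_file(name_node, "demo.txt", "the price of light", block_size=5)
restored = read_file(name_node, "demo.txt", failed={0})
```

With three copies of every block spread over four nodes, the file can still be read in full when node 0 is treated as failed, which is the resilience the replication step is buying.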
Conquer = MapReduce
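A word count is the usual way to show the MapReduce pattern. The sketch below runs in a single process and does not use Hadoop; the function names and the two input lines are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is crude oil", "data science is the refinery"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a cluster the map calls would run where the blocks are stored and the reduce calls would run per key; here they run in sequence, but the data flow is the same.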
Insight = Functional Paradigm
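The functional building blocks behind this can be shown with Python's built-in `map` and `filter` and `functools.reduce`. The sample values are assumed; the point is that each step is a pure function that leaves its input untouched.

```python
from functools import reduce

readings = [12, 7, 30, 18, 5]   # assumed sample values

doubled = list(map(lambda r: r * 2, readings))        # transform every element
large = list(filter(lambda r: r > 10, readings))      # keep elements that pass a test
total = reduce(lambda acc, r: acc + r, readings, 0)   # fold everything into one value
```

Functions with no shared state can be applied to separate blocks in any order, which is what makes the parallel map and reduce steps safe.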
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
Why is Big Data needed ? 
VOLUME VELOCITY VARIETY 
Exponential growth; 2x in 2 yrs 
PB (1000 TB) is now common 
Event streams; never at rest 
640k GB per internet minute 
100s of data sources 
85% not in a table
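The volume and velocity figures can be turned into rough daily and multi-year numbers. This is back-of-envelope arithmetic on the slide's own figures; it assumes 1 TB = 1000 GB alongside the slide's 1 PB = 1000 TB, and `growth_factor` is a hypothetical helper.

```python
GB_PER_MINUTE = 640_000      # slide figure: 640k GB per internet minute
GB_PER_PB = 1000 * 1000      # 1 PB = 1000 TB (slide); 1 TB = 1000 GB (assumed)

gb_per_day = GB_PER_MINUTE * 60 * 24
pb_per_day = gb_per_day / GB_PER_PB

def growth_factor(years, doubling_period=2):
    """Growth multiple if volume doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

ten_year_factor = growth_factor(10)
```

Under these assumptions the per-minute figure works out to 921.6 PB per day, and doubling every two years compounds to a 32-fold increase over ten years.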
Where in the Value Chain ? 
Generation Transport Knowledge Output Value 
BIG DATA SCIENCE 
Straddles all four Challenge Areas
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
Big Data Heat Map – Gartner 2012
Big Data Potential by Sector – McKinsey for USBLS, 2011
Big Data Investment by Industry – Gartner, 2012
Top Big Data Challenges – Gartner, 2012
Survey on Big Data Investments – IDG Survey, 2013
Survey on Main Drivers to Invest – IDG Survey, 2014
WHAT WHY 
Steps to the EPIPHANY 
WHERE 
DEMO
DEMO
COST SPEED 
RECAP OF BENEFITS 
AGILITY CAPABILITY
TIME VALUE OF DATA KNOWLEDGE IS POWER 
LAST WORDS OF WISDOM 
NOT ALL ROADS LEAD TO ROME 
I AM AN INDIVIDUAL
“The price of light is far less than the cost of darkness”

Big Data Science at the Digital Catapult


Editor's Notes

  • #3: COST – 20x less per TB vs Teradata, Netezza, Oracle; 75% less average marginal cost per capacity. SPEED – 10x faster than Teradata, Netezza. AGILITY – 115% lower average cost per data source vs Oracle. SCIENCE – Machine learning, prediction.
  • #4: WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  • #13: WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  • #16: WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  • #23: WHAT - What is Big Data Science? WHY - Why is it needed? WHERE - Where is it being used? HOW - How will it evolve?
  • #25: COST – 20x less per TB vs Teradata, Netezza, Oracle; 75% less average marginal cost per capacity. SPEED – 10x faster than Teradata, Netezza. AGILITY – 115% lower average cost per data source vs Oracle. SCIENCE – Machine learning, prediction.
  • #26: TIME VALUE - Yesterday’s data is less valuable than today’s data; historical data is more valuable than the present alone. POWER - Getting from unknown unknowns to known unknowns or known knowns is powerful. LEAD TO ROME - Exploring with no direct business impact is not a bad thing. INDIVIDUAL - Treat every customer as an individual, not an aggregate, and analyse; aggregate only individual insights.