EXPERIENCES WITH BIG DATA
SRINIVASAN SESHADRI, FOUNDER ZETTATA
WORLD BEFORE BIG DATA
It is a capital mistake to theorize before one has data – Sherlock Holmes
HOWEVER, DO NOT WANT TO BE HERE
AXIOMS
• Measure, Measure, Measure
• Garbage in, Garbage out
• Correlation is not Causation
• More Data Beats Cleverer Algorithms
• Algorithms that do better with more data are more interesting
• Independent sources of data add new signals
• Feature engineering is the key to being a good data scientist
• How do machines and humans interact in Big Data?
• Learn many models ‐ ensembles
• Outliers are always interesting
MEASURE, MEASURE, MEASURE
• Have a Hypothesis
• Create a metric to determine if hypothesis is correct
• Build a solution that can be measured 
• Iterate
If you cannot measure it, you cannot improve it – Lord Kelvin
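The loop above (hypothesis, metric, measurable solution, iterate) can be sketched in a few lines. This is only an illustration under stated assumptions: the hypothesis ("a new variant raises click-through rate"), the metric, the function names and the counts are all hypothetical and do not come from the talk.

```python
def click_through_rate(clicks, impressions):
    """Metric: fraction of impressions that led to a click."""
    if impressions == 0:
        raise ValueError("no impressions recorded")
    return clicks / impressions


def relative_lift(baseline, candidate):
    """Relative change of the candidate metric over the baseline metric."""
    return (candidate - baseline) / baseline


# Hypothetical counts from one iteration of an experiment (assumed values).
baseline_ctr = click_through_rate(clicks=120, impressions=4000)
candidate_ctr = click_through_rate(clicks=150, impressions=4000)
lift = relative_lift(baseline_ctr, candidate_ctr)

# The hypothesis is kept or dropped based on the measured metric, then the loop repeats.
print(f"baseline={baseline_ctr:.4f} candidate={candidate_ctr:.4f} lift={lift:+.1%}")
```

A real experiment would also need a significance check before acting on the lift; that step is left out to keep the sketch short.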
GARBAGE IN GARBAGE OUT
WHAT DO YOU WANT THE ANSWER TO BE?
CORRELATIONS
CORRELATION IS NOT CAUSATION
• Correlation in Data Need Not Imply Correlation in Real Life
• Can find random correlations in large amounts of data
• Correlation Does Not Imply Causation
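The second bullet can be illustrated with a small simulation. The sketch below assumes pure Gaussian noise and hypothetical function names: it correlates one random target series against many unrelated random series and reports the strongest correlation found. Nothing in the data is related to anything else, yet the best match grows as more series are searched.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_random_correlation(n_series, n_points, seed=0):
    """Largest |correlation| between a random target and n_series unrelated series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Same seed, so the large search includes every series the small one saw.
few = strongest_random_correlation(n_series=10, n_points=20)
many = strongest_random_correlation(n_series=5000, n_points=20)
print(f"best of 10 series:   {few:.2f}")
print(f"best of 5000 series: {many:.2f}")
```

Because both runs share a seed, the larger search can only match or beat the smaller one, which is the point: searching more data for correlations finds stronger ones by chance alone.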
CORRELATION IS NOT CAUSATION
CORRELATION STRIKES AGAIN!!
MORE DATA BEATS CLEVERER ALGORITHMS
• Adding IMDb data for the Netflix Prize
• Adding Protein Expression Data or Patient Data to Gene Expression Data
• Bag of Words Approach for Word Sense Disambiguation
WORD SENSE DISAMBIGUATION
• Bank
• Sloping land alongside a river or a lake. It typically has thick vegetation growing...
• A financial institution that takes deposits from some customers and gives loans to others who require the money.
To disambiguate a word in a typical sentence, look for co‐occurrences of the sentence's words with the words in each definition.
This is unsupervised learning: bootstrap a model.
The pilot landed the plane on the Hudson River amongst several boats, and an appreciative audience cheered from the banks of the river.
He issued a check and took it to the bank so he could transfer money.
One can then look for words that frequently co‐occur with each sense of the word (boats and check respectively) and build a larger bag of words with which to disambiguate.
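A minimal sketch of this idea, in the spirit of a simplified Lesk overlap count, might look as follows. The stopword list, the sense names and the function names are assumptions made for illustration. The bootstrap step here adds every context word of a confidently labelled sentence to the winning sense's bag, which is cruder than keeping only frequent co‐occurrences.

```python
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "it", "on", "so", "he",
             "from", "that", "some", "who", "has", "in", "could"}

# Definitions of the two senses, taken from the slide.
SENSES = {
    "river bank": "Sloping land alongside a river or a lake. "
                  "It typically has thick vegetation growing.",
    "financial bank": "A financial institution that takes deposits from some customers "
                      "and gives loans to others who require the money.",
}

SENTENCES = [
    "The pilot landed the plane on the Hudson River amongst several boats and an "
    "appreciative audience cheered from the banks of the river.",
    "He issued a check and took it to the bank so he could transfer money.",
]


def tokenize(text):
    """Lowercase, strip punctuation, drop stopwords; return a set of words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(sentence, bags, target="bank"):
    """Return the sense whose bag overlaps the sentence most, or None if unclear."""
    context = tokenize(sentence) - {target, target + "s"}
    scores = {name: len(context & bag) for name, bag in bags.items()}
    ranked = sorted(scores.values(), reverse=True)
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None  # no overlap at all, or a tie between senses
    return max(scores, key=scores.get)


def bootstrap(sentences, senses, target="bank"):
    """Start from the definitions and grow each bag with confidently labelled context."""
    bags = {name: tokenize(gloss) for name, gloss in senses.items()}
    for sentence in sentences:
        sense = disambiguate(sentence, bags, target)
        if sense is not None:
            bags[sense] |= tokenize(sentence) - {target, target + "s"}
    return bags


bags = bootstrap(SENTENCES, SENSES)
print(disambiguate("Two boats drifted past the bank", bags))
print(disambiguate("She cashed a check at the bank", bags))
```

With the definitions alone, neither of the last two sentences shares a word with either sense. After bootstrapping, "boats" and "check" sit in the respective bags, so both sentences can be resolved.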
WORD SENSE DISAMBIGUATION
FEATURE ENGINEERING
Cannot expect the computer to learn arbitrarily complex models
FEATURE ENGINEERING
CITY 1 LAT.   CITY 1 LNG.   CITY 2 LAT.   CITY 2 LNG.   DRIVABLE?
123.24        46.71         121.33        47.34         Yes
123.24        56.91         121.33        55.23         Yes
123.24        46.71         121.33        55.34         No
123.24        46.71         130.99        47.34         No
FEATURE ENGINEERING
DISTANCE (MI.)   DRIVABLE?
14               Yes
28               Yes
705              No
2432             No
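The step from four raw coordinates to a single distance feature can be sketched as below. The haversine formula and the Earth radius are standard; the 300-mile cut-off, the function names and the example coordinates (approximate positions of San Francisco, San Jose and New York) are assumptions for illustration and are not the values in the tables above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def drivable(distance_miles, threshold_miles=300):
    """Toy rule on the engineered feature; the threshold is an assumed value."""
    return distance_miles <= threshold_miles


# Approximate coordinates, used only as an example.
san_francisco = (37.7749, -122.4194)
san_jose = (37.3382, -121.8863)
new_york = (40.7128, -74.0060)

d_near = haversine_miles(*san_francisco, *san_jose)
d_far = haversine_miles(*san_francisco, *new_york)
print(f"{d_near:.0f} mi, drivable: {drivable(d_near)}")
print(f"{d_far:.0f} mi, drivable: {drivable(d_far)}")
```

Once the distance is a feature, a single threshold separates the classes. Learning the same boundary from four raw coordinates would require the model to discover the distance formula on its own.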
OF HUMANS AND MACHINES
• Partnership is important
• The aha moment and the strategy come from humans
• Machines do the hard work of calculating fast and do not tire
• Maybe some day machines will be able to do more than they are explicitly asked to do. Today, explicit instructions are the norm.
ENSEMBLES ‐ OUTLIERS ARE NOT INTERESTING – FOR CLASSIFIERS
• Learn many models from random subsets of training data 
• Effect of outliers is reduced on a majority of the models
• Random Forests
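The first two bullets can be sketched with a toy ensemble. This is not a random forest: as a stand-in, each model is a one-dimensional nearest-centroid classifier trained on a small random subset, and the ensemble takes a majority vote. The data, the subset size, the number of models and all names are assumptions chosen so that a single extreme point (x = 60 labelled class 0) distorts a model trained on everything.

```python
import random
from collections import Counter


def fit_centroids(sample):
    """Nearest-centroid model: the mean x of each class present in the sample."""
    sums, counts = Counter(), Counter()
    for x, label in sample:
        sums[label] += x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_one(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def fit_ensemble(data, n_models=201, subset_size=4, seed=0):
    """Train each model on its own small random subset of the training data."""
    rng = random.Random(seed)
    return [fit_centroids(rng.sample(data, subset_size)) for _ in range(n_models)]


def predict_majority(models, x):
    """Majority vote over the individual model predictions."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


class_0 = [(x, 0) for x in (1.0, 1.5, 2.0, 2.5, 3.0, 3.5)]
class_1 = [(x, 1) for x in (7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0)]
data = class_0 + class_1 + [(60.0, 0)]  # one outlier labelled class 0

single = fit_centroids(data)   # the outlier drags the class-0 centroid to 10.5
ensemble = fit_ensemble(data)  # most subsets never see the outlier

print("single model, x=3.0:", predict_one(single, 3.0))
print("ensemble,     x=3.0:", predict_majority(ensemble, 3.0))
```

The single model misclassifies x = 3.0 because its class-0 centroid sits at 10.5, farther away than the class-1 centroid at 8.5. Each subset holds 4 of the 15 points, so only about a quarter of the models ever see the outlier and the rest outvote them.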
OUTLIERS ARE ALWAYS INTERESTING FOR RANKING PROBLEMS
• You have to be so good that they cannot ignore you
• My personal thesis: Average in everything is boring. Be outstanding in something.
• Outliers along some dimension always carry interesting information whenever you combine multiple variables into one global rank
• Search
• Job Interviews!
UNKNOWN UNKNOWNS – VERY INTERESTING TO A BUSINESS – OUTLIERS
BIG DATA AND HEALTHCARE
ARE YOU IN THE JOB MARKET?
www.zettata.com
sesh@zetatta.com
Thanks!

Experiences with big data by Srinivasan Seshadri