4 Pieces of Advice for Your Big Data Initiative
Jari Koister
Talk at IEEE Big Data/Cloud Conference, June 28th 2013
Complexity and Direction of Predictive Big Data
A few learnings that may increase your likelihood of
success.
Infochimps about challenges….
Brownelles November 14, 2012
Complex Environment
Data Science
Big Data
Predictive Analysis
Machine Learning
Marketing Analytics
Sales Analytics
Columnar Databases
Data Cubes
Hadoop
Hive
Spark
Pig
Impala
ETL
Web Analytics
Churn
Segmentation
Clustering
Drill
Propensity
Uplift
Business Intelligence
Chief Intelligence Officer
Data Warehouse
Information Valuation
Entity Linkage
De-duplication
Immutable Store
Mesos
Supervised
Unsupervised
Non-parametric
Big Data
Gartner believes big data is neither a technology
nor a distinct and uniquely measured market of
products. We believe it is a phenomenon brought
about by rapid data growth, complex new data
types and parallel advancements in technology,
all combining to enable people to analyze
information in new ways to produce more useful
insights about the world around them.
Hype, Maturity, Potential…
Gartner Hype Cycle for Big Data, 2012
What is changing?
Audience: Experts, Intermediate, Beginners
Data Sources: A Few, Tens, Hundreds, Many
Algorithms: Experimental, Value Focused
Complexity and Direction of Predictive Big Data
A few learnings that may increase your likelihood of
success.
1st (of 4) Advice: Don’t get bogged down
in technology.
Chart axes: Data Access (Query Expressiveness) and Scale
Technologies plotted: HDFS, HBase, ParAccel, RedShift, Cassandra, CouchBase, Cascading, Riak, MySQL, Vertica, InfoBright, VectorWise, Spark, CitusData, WibiData, Phoenix, MSSQL, MSAS, Mahout, Map/Reduce, R, MatLab, SciPy, Snow, Hive, Impala, Drill, Pig
2nd (of 4) Advice: Find a DQE provider
Complex: entity linkage, fuzzy matching, external data, de-duplication
Repetitive & Scale: continuous, lots of data
Common: necessary but not unique
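To make "fuzzy matching" and "de-duplication" concrete, here is a minimal sketch using only Python's standard library. It is not the method of any particular data-quality provider; the customer names, the `deduplicate` helper, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def deduplicate(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Greedy grouping: a record joins the first cluster whose first member
    is similar enough, otherwise it starts a new cluster."""
    clusters: list[list[str]] = []
    for record in records:
        for cluster in clusters:
            if similarity(record, cluster[0]) >= threshold:
                cluster.append(record)
                break
        else:
            clusters.append([record])
    return clusters


# Hypothetical customer names, including near-duplicates.
customers = ["Jane Smith", "jane  smith", "Jane Smyth", "Robert Jones", "Bob Brown"]
clusters = deduplicate(customers)
for cluster in clusters:
    print(cluster)
```

Even this toy version compares every record against every cluster, which hints at why the slide files the problem under both "Complex" and "Repetitive & Scale": at production volume, with data arriving continuously, the matching has to be blocked, indexed, and maintained over time.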
3rd (of 4) Advice: Be Realistic
Chart labels: Narrow solution, Customized; Low Investment, High Investment
*Size indicates return
4th (of 4) Advice: Scale is expensive,
sample when you can.
http://www.agilone.com/email-marketing/what-you-shouldnt-need-to-know-about-big-data-and-machine-learning/
            Relation
Sample      Simple     Complex   Noisy   Biased
Big Data    Overkill   ✓         ✓       N/A
Large       Overkill   ✓         ✓       ≈✓
Small       ✓          ✗         ✗       ✗
                          Data set of
                          Learning   Scoring
Propensity to buy         Sample     Complete
Customer clustering       Sample     Complete
Customer segmentation     Sample     Complete
U2P Recommendation        Sample     Complete
P2P Recommendations       Complete   Complete
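The "learn on a sample, score the complete set" pattern in the table can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: the customer data is synthetic, the segment names and purchase rates are invented, and the "model" is simply a per-segment purchase rate standing in for a real propensity-to-buy model.

```python
import random
from collections import defaultdict

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical purchase rates per segment, used only to generate synthetic data.
SEGMENT_RATES = {"new": 0.05, "returning": 0.20, "loyal": 0.50}
segments = list(SEGMENT_RATES)

# Synthetic customer base: (customer_id, segment, bought)
customers = []
for customer_id in range(100_000):
    segment = random.choice(segments)
    bought = random.random() < SEGMENT_RATES[segment]
    customers.append((customer_id, segment, bought))


def learn_propensity(rows):
    """Estimate the purchase rate per segment from the given rows."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [buyers, total]
    for _, segment, bought in rows:
        counts[segment][0] += int(bought)
        counts[segment][1] += 1
    return {seg: buyers / total for seg, (buyers, total) in counts.items()}


# Learning: fit on a 5% random sample.
sample = random.sample(customers, k=5_000)
model = learn_propensity(sample)

# Scoring: apply the learned rates to the complete data set.
scores = {cid: model[segment] for cid, segment, _ in customers}

# Sanity check: compare against a model learned on everything.
full_model = learn_propensity(customers)
max_gap = max(abs(model[seg] - full_model[seg]) for seg in full_model)
print(f"customers scored: {len(scores)}, largest rate difference: {max_gap:.4f}")
```

For a simple relation like this one, the sampled model should land close to the model learned on all rows at a fraction of the cost. The table's last row is the exception to keep in mind: P2P recommendations are listed as needing the complete data set for learning as well.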
Bonus Advice: Orchestration is a ….
Timing labels: Batch, Real-time, Dead-line-time, Speed-of-thought, Eventual
Bubble labels: L revenue impact, M revenue impact, S revenue impact
*Size indicates number of customers immediately impacted
Thank you for listening
jari@agilone.com

Talk at IEEE Big Data/Cloud conference in Santa Clara, June 28th, 2013.
