Hadoop: What’s Next?
Mike Olson
Reflections On You
12+ months of use on average
114.5TB average size
66 nodes in use on average
500+ certified on Hadoop in 1 year
60+PB Total
Data from pre-conference survey
Keynote - Cloudera - Mike Olson - Hadoop World 2010
Immutable Law of Data
RDBMS
Hadoop
Volume, Variety, Velocity increase
Geopbytes
Brontobytes
Yottabytes
Zettabytes
Exabytes
Terabytes
Linked
Complex
Unstructured
Pre-relational
Raw
Detailed
Heterogeneous
Dirty
Graphs
Large
Schemaless
Hadoop Was Built for Data.
Proven at Scale
Room to Grow
Open Source Wins.
Hadoop: The Core of a Platform
A Platform Built by You
Hue, Hue SDK
Oozie
HBase, Flume, Sqoop
ZooKeeper / Avro
Pig / Hive
The Vendor Ecosystem
A Platform Enabling Applications…
Query & Reporting
Complex ETL
Trade Compliance
POS Analysis
Search Quality
Click Stream Analysis
Machine Learning
Graph Analysis
And More…
Fraud Detection
Archive
Scientific Security
Solving Critical Business Problems
• Modeling true risk
• Customer churn analysis
• Recommendation engine
• Ad targeting
• PoS transaction analysis
• Analyzing network data to predict failure
• Threat analysis
• Trade surveillance
• Search quality
• Data “sandbox”
• Capture critical IT data
• Monitoring usage
• Driving bottom line value
• Risk analysis
• Customer insight
• Drive growth
• Customer intimacy
• Precision targeting
• Driving top line growth
So Much To See Today!
• Optimizing search
• Advanced analytics in the Army
• Using Flume & Hive for log data
• Analyzing VOIP data with R
What’s Next?
Market
• Adoption
• Agility
• Flexibility
Technology
• Accelerated innovation from community
• More tools e.g., monitoring
• More automation
• More stability
• More interfaces
• At the core of the open source platform for data
• Four years old and going strong!
Organizational Impact
• More knobs and dials
• Fine grain control
• Achieve previously impossible / impractical
• Save money
• Save time
• Greater flexibility with data
Copyright 2010 Cloudera Inc. All rights reserved
Hadoop World Keynote (NOTES)
• Themes
– Hadoop is already a big deal
• Keep in mind the why
• Solving real problems now
– It is about the platform with Hadoop at the
core
• Why
• Helps you profit
• More accessible now than ever, real people with
enterprise ops and enterprise skills, no longer the
exclusive domain of the PhDs
– What’s on the Horizon for Hadoop
Hadoop is Having a
Transformative Impact (notes)
• Continued growth and excitement
• Transformative to your career, your enterprise, your market
– Star maker
– Get ready for Hadoop being a big deal for your companies
– Your market – hyper personalization
– Use data to interact in a more customized fashion
– “It’s hard not to have a TB of data” – Mike
– Operability and SLAs for a critical enterprise platform
– Education and training
– A new stack for analytics (CEP (flume) CDH (Sqoop) dbms/BI)
• Future is now
– Use cases now and impact it is having and where it will be, look at
Facebook, Yahoo, eBay etc.
What is on the Horizon for
Hadoop (notes)
• Continued growth and excitement
• Transformative to your career, your enterprise, your market
– Star maker –
• good for your career, help make critical changes in the way customers are supported, major new business opportunities etc.
• Pull Cloudera certification #’s
– Get ready for Hadoop being a big deal for your companies
• Enterprises will be more agile and able to capture and analyze more data to better target ads, find fraud, etc.
• Agility – impacts the things that matter to you
• What’s happened before the transaction
– Your market – hyper personalization
• 100s of vertical apps to be created (developers are you listening?)
• Trend that crosses? Any other trend we can compare to? DBMS growth? Improvements in operations,
• How detailed sources have changed
• Devices, understanding how people interact with your business – retail, online entertainment, fin serv, government
– Use data to interact in a more customized fashion
– “It’s hard not to have a TB of data” – Mike
– Operability and SLAs for a critical enterprise platform
– Education and training
– A new stack for analytics (CEP (flume) CDH (sqoop) dbms/BI)
• Future is now
– Use cases now and impact it is having and where it will be, look at Facebook, Yahoo, eBay etc.
Emerging Importance
of Data Scientist
• Able to impact business at many
levels
• New conference focused on data and data-related roles: O’Reilly Strata Conference
Unprecedented Data Volume,
Velocity and Variety
Data Growth Outpacing Processing Power
Data: 61% CAGR
Transistors: 42% CAGR
Organizations Swamped and Turning to Hadoop
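To make the gap between the two growth rates on this slide concrete, here is a minimal Python sketch of the compounding arithmetic. It assumes the 61% CAGR belongs to data and the 42% CAGR to transistors (read from the label order and the slide title, not stated explicitly); the function names are hypothetical.

```python
import math

# Growth rates read off the slide; pairing 61% with data and 42% with
# transistors is an assumption based on label order and the slide title.
DATA_CAGR = 0.61
TRANSISTOR_CAGR = 0.42


def growth_factor(cagr: float, years: float) -> float:
    """Compound growth multiple after `years` at annual rate `cagr`."""
    return (1.0 + cagr) ** years


def gap_ratio(years: float) -> float:
    """How many times more data has grown than transistor counts."""
    return growth_factor(DATA_CAGR, years) / growth_factor(TRANSISTOR_CAGR, years)


def years_until_gap_doubles() -> float:
    """Years until data has outgrown transistors by a factor of two."""
    relative_rate = (1.0 + DATA_CAGR) / (1.0 + TRANSISTOR_CAGR)
    return math.log(2.0) / math.log(relative_rate)


for n in (1, 5, 10):
    print(f"after {n:2d} years: data/transistor gap = {gap_ratio(n):.2f}x")
print(f"gap doubles roughly every {years_until_gap_doubles():.1f} years")
```

The point of the sketch is only that a modest difference in annual rates compounds: under these assumed figures, data volume pulls away from per-chip processing capacity by a factor of two in well under a decade.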
Transforming Analytic
Requirements
• Insight into this data needs more than simple
tabular analysis
– More is needed for meaningful answers
• You can and will do deeper and more
introspective analysis
– Machine learning, natural language processing, clustering,
sophisticated statistical analysis, modeling and back testing
• Looking for patterns
– You can see patterns in lots of data that are invisible in less
data. You need pattern discovery tools
Hadoop: Already a Big Deal!!
Massive Adoption
Vibrant & Growing Community
100’s of PB Under Management
1000’s of Implementations
Benefitting From a Dynamic
OS Community
• Community around
Hadoop is proliferating
and expanding
• > ½ Hadoop sub-projects
promoted to TLPs
• Dozens of related projects
• 100’s of developers
& growing
Interest in Hadoop Has Exploded
More are looking for it
Leading analysts report
significant growth
in inquiries
Major increase
in coverage
A Data Management Platform
Applications
Market Impact
• Hyper personalization
• Extreme targeting
• Expand competitive advantages
• Better retention of customers
• Improved risk analysis


Editor's Notes

  • #3:
    – 180 public, self-declared "powered by" users of Apache Hadoop and related projects (Apache "Powered By" wiki); that’s a small fraction of users
    – There are 900+ people here
    – Attendees right now: 915
    – Responses: 151
    – Avg usage in months: > 12mo (for non-0 responses), 8.76mo overall
    – Average storage size: 114.5TB (for non-0 responses), 78TB overall
    – Total data under management: 11.79PB for responses, 104.7PB extrapolated
    – Average # nodes: 66.25 (for non-0 responses), 46 overall
    – Total nodes under management: 6957 for responses, 60625 extrapolated
    – Longest time spent on Hadoop: 36 months
    – Most data: 2PB
    – Largest cluster: 1300 nodes
    – Percentage over 10 nodes: 55%
    – Percentage over 10TB: 38.5%
    – 40% are here to start learning about Hadoop
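The extrapolated totals in the note for slide #3 appear consistent with multiplying the non-zero averages by the attendee count. That is an inference from the numbers, not a method the notes state. A minimal Python sketch of that arithmetic, with hypothetical names and assuming 1 PB = 1000 TB:

```python
# Figures copied from the editor's note for slide #3.
ATTENDEES = 915
AVG_STORAGE_TB = 114.5   # average cluster size among non-zero responses
AVG_NODES = 66.25        # average node count among non-zero responses

TB_PER_PB = 1000         # assumption: decimal petabytes


def extrapolate(per_respondent_average: float, attendees: int) -> float:
    """Scale a per-respondent average up to the whole audience."""
    return per_respondent_average * attendees


estimated_pb = extrapolate(AVG_STORAGE_TB, ATTENDEES) / TB_PER_PB
estimated_nodes = extrapolate(AVG_NODES, ATTENDEES)

print(f"storage: about {estimated_pb:.1f} PB")
print(f"nodes:   about {estimated_nodes:.0f}")
```

If this is how the totals were produced, the estimate treats every attendee as running an average non-empty cluster, so it likely overstates the real total; the same note reports lower overall averages (78TB and 46 nodes).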
  • #4: Last year, we told you that Hadoop was going to be a big deal. It is. You’re here, so you know that. You’re the reason why. Shout outs: Doug, Tom (new book!); Yahoo! for its early and continued contribution (security). Global community. Three things driving success.
  • #5: The big story is data. There’s a lot of it. The increase is fastest in varieties of data poorly suited to RDBMS’ tabular storage. 2.5 zettabytes (1 zettabyte = 1,000 exabytes) of new information in 2012. Digital universe grew by 62% last year to 800 exabytes and will grow to 1.2 zettabytes this year.
  • #6: In case you want to know what comes next. Keynotes will sound increasingly ridiculous.
  • #7: Corollary: With volume, variety and velocity, data is messy. In the 80s and 90s, RDBMS was the answer. Now, we see row stores and column stores (Vertica), emerging class of distributed hash-style systems (Membase, HBase, Cassandra), Hadoop. Major shift in the data center!
  • #8: Hadoop was built for data. Cheap storage lets you deal with volume, but also: variety and velocity. This technology was invented by the biggest names on the Web – Google, Facebook, Yahoo! – to solve their business problems. But those are also your business problems. Those companies weren’t different than you; they were just a little bit in the future. Hadoop is the high-value analytics engine for today’s businesses.
  • #9: It is famous for getting big – no “up to”.
  • #10: Variety and velocity sometimes come before volume. RECALL: First reason for Hadoop’s success: data.
  • #11: Second reason: Open source. Don’t hand Oracle a loaded gun. Community is thrashing proprietary. Bill Joy: All the smartest people work somewhere else. Linux, Postgres, MySQL, JBoss, XenSource: Commoditized existing markets. Not so, Hadoop! No existing commercial product does large-scale, powerful analytics over a mixture of complex and structured data. First truly innovative OSS platform I know.
  • #12: Third reason: No longer just HDFS and MapReduce. Talking to Kellan Elliott-McCrea, VP Eng Etsy. “Toolchain is mature.” Talks like an engineer, but he’s right. Last year: Hadoop. This year: A mature, flexible, powerful data management platform ready for enterprise-grade use. Cloudera has been in the market for two years, solving real problems. We’ve badly needed this platform to emerge from the pieces. We believe deeply in the open source ethos. Not political: Gives us a huge unfair advantage in the market. We spent two years listening to you.
  • #13: Today, we are announcing the newest release of CDHv3. Integrated the critical pieces of the platform from the global open source community. Incorporated Yahoo!’s excellent work on security – see Todd Lipcon’s talk today. Deeply committed to OSS. This is and will always be a 100% pure Apache-licensed platform. No one can turn off your access to your data, and no one can take away your critical business analytics.
  • #14: Hadoop: Innovative, powerful, platform, but just one piece of the puzzle in your data center. Needs to play nicely with all the other systems and tools you run. Established vendors, with Cloudera, have announced integration with Hadoop. You’ll see this list grow quickly over the next year. “Right tool for the job.”
  • #15: This platform is ripe for creation of new vertical apps and analytical tools. Expect to see an explosion in the number and variety of solutions built on it.
  • #16: So what are people doing? You’re surrounded by more than 900 other people today. They’ll tell you how they’re using Hadoop to solve real business problems. Not science projects; not a toy. The platform is driving costs down and letting companies extract profit from data in ways that were impossible just a few years ago. You’ll hear about real use cases from smart people.
  • #17: You'll hear General Electric talk about how they're able to capture and monitor data on their mission-critical IT systems. They're able to track load and usage, monitor what users are doing and watch for impending failures. GE runs a large internal cloud; with the Hadoop platform, they're squeezing costs out of that cloud, and driving value straight to the bottom line.
  • #18: You'll hear Bank of America talk about its very fast-growing Hadoop cluster. BofA uses the platform to better understand its customers, and to more precisely score risk in its individual user accounts and in its portfolio as a whole. These days, the player in the market who is the most right about risk wins, and wins big. Banks can reduce their reserve capital requirements, freeing up money to invest elsewhere, and can avoid loss and drive gains by choosing market positions with the best information available.
  • #19: You'll hear eBay describe how they're using this platform to get much closer to customers. They're able to understand much better the preferences and behavior of individual users. That means they can deliver precisely-targeted products in auctions, tailored exactly to the individual. That precise targeting turns into more purchases, and more purchases drives eBay's top line.
  • #20: Forty-two sessions across a very long day. Unlike last year, we’ve scheduled lots of breaks for networking and hallway conversations – always the real reason to attend a conference! Hilary Mason from bit.ly, Flip Kromer of Infochimps, Drew Conway, Terry Jones and Clay Johnson are the young Turks transforming the industry, defining a whole new profession of "data scientist" and creating new frameworks and tools for understanding data and creating insight and profit out of information. Hit the coffee breaks and work the hallways hard.
  • #21: We are, all together, in the middle of a dramatic change in how the world works. Data is central to government, business, security and leisure. That's more true today than it was when we saw you here last year. It will be still more true next year. We have all, together, created this amazing, powerful, innovative new open source software platform. Over the next twelve months, you’ll see the market realize this even more broadly. You’ll see the platform continue to evolve and improve. You’ll see more companies take advantage of the power of Hadoop to unlock information from data and drive value and profit.
  • #22: Look how far Hadoop has come! Data, open source and the emergence of the platform have made the software mainstream. It’s enterprise-ready, and works with enterprise tools. Look at the progress since Doug created the project back in 2006. Imagine the transformation we’ll see over the next four years. And I love how, even with all that progress, the little yellow elephant is still pushing! Go out and enjoy the day. Hear some great talks and make new friends. Get a book signed by Tom. We’ll see you back here this afternoon – and don’t forget the reception afterward, one floor down.
  • #23: Tim is the founder and CEO of O’Reilly Media. Its goal: to be a catalyst for technology change by capturing and transmitting the knowledge of "alpha geeks" and other innovators. Privileged to know him for about fourteen years. Learned a great deal about open source, community and technology from conversations, his speeches and his writings. Created a number of very important industry conferences – OSCON, Etech and others. New in February – Strata, aimed squarely at data scientists. Thank Tim for coming.
  • #26: Let's also try to drive home the message that we expect hundreds of vertical apps to be built on top of CDH over the next year or two. So it's not just about the users, it's about the developers (, developers, developers!!!).
  • #28: Opportunities for individuals to find and solve interesting, professionally rewarding problems. Help make critical changes in the way customers are supported and interacted with. Contribute to the bottom line. Major new business opportunities, etc.
  • #29: New kinds of data introduced daily – mashups, sensors, log files, videos etc. Data created at machine speed. I think it's worth highlighting that this platform acknowledges the three biggest trends in enterprise data management today: growth in data volumes, growth in unstructured data, and clouds/commodity hardware. Analytics and apps are compelling – see the stories at this conference.
  • #35: Creation of new leaders. Separate competitors.