BIG DATA IS CHANGING THE WORLD

© Copyright 2010 EMC Corporation. All rights reserved.
IN THIS DECADE THE DIGITAL UNIVERSE WILL GROW 44X
FROM 0.9 ZETTABYTES TO 35.2 ZETTABYTES

Source: 2010 IDC Digital Universe Study



90% OF THE DIGITAL UNIVERSE IS UNSTRUCTURED

Source: 2011 IDC Digital Universe Study



Big Data Has Arrived
      • Electronic Payments
      • Video Rendering
      • Video Surveillance
      • Mobile Sensors
      • Social Media
      • Medical Imaging
      • Gene Sequencing
      • Geophysical Exploration
      • Smart Grids


Deliver Better Healthcare With Big Data
      Billion Dollar Specialty Care Service Provider
      [Chart: Quality Of Patient Care, comparing Legacy System & Traditional Data with New System & Big Data. Labels: Treatment Pathways On Summary Data; Treatment Pathways On All The Data; Social & Economic Factors; International Results; Individual Patient History]




Increase Profit Margins With Big Data
      Retail Banking Firm Aligns Offers To Customers
      [Chart: Customer Profit, comparing Legacy System & Traditional Data with New System & Big Data. Labels: Agent “Best Guess”; User Based Recommendations; Profit-Based Recommendations; Identify “At-Risk” Customers]




Classifying and segmenting Big Data
          • Rich content stores—original intellectual property or value-added
                   – Media, VOD, content creation, special effects, satellite imagery, GIS data

          • Generated from workflow—must be managed/processed quickly & cheaply
                   – Manufacturing, simulation, electronic design

          • Develop new intellectual property based on big data
                   – Pharmaceutical companies doing customised drug development

          • Companies, public sector, utilities mining data for business advantage
          • Some mine consumer data—higher-volume and potentially higher-value



Big Data is File & Unstructured Data
         [Chart: storage capacity in exabytes (axis 0 to 90), 2009 to 2014. File Based: 60.7% CAGR; Block Based: 21.8% CAGR]

                           By 2012, 80% of all storage capacity sold will be for file-based data

         Source: IDC
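
As a rough check on what the two growth rates in the chart imply, compound annual growth can be worked through in a few lines. This is only a sketch: the five-year span (2009 to 2014) is read off the chart's axis, and the function name `growth_multiple` is illustrative rather than anything taken from IDC.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


YEARS = 5  # assumption: 2009 through 2014, as on the chart's axis

file_multiple = growth_multiple(0.607, YEARS)   # file-based capacity
block_multiple = growth_multiple(0.218, YEARS)  # block-based capacity

print(f"File-based:  x{file_multiple:.1f} over {YEARS} years")
print(f"Block-based: x{block_multiple:.1f} over {YEARS} years")
```

Under these assumptions, file-based capacity grows roughly tenfold over the period while block-based capacity grows a little less than threefold.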
Why is Big Data appearing now?




                                                         Source: IDC



Gartner’s 3 V’s of Big Data




“The Internet of Things”
          • Massive explosion of smart devices, all sending, receiving, storing data
                   – Human-oriented devices: handhelds, tablets, cameras
                   – Non-human-oriented devices: sensors, embedded CPUs
          • Social networking messages & data grow exponentially
                   – Twitter feeds, Facebook updates, LinkedIn messages
          • Increasingly, business is conducted digitally – or digitized
          • Big Data is global – any source to any target


Source: GoGlobe



Companies want to store big data—Why?
          • Google – Originally thought of as a “search engine”
                   – Now: Storing the Internet, storing every search query
          • Facebook, Twitter – Just social media?
                   – Storing every message you send, monitoring every
                     market trend
          • Amazon – your every purchase, forever
          • Carriers – Storing location-based data on everyone

Social Networking Analysis
                                                         Courtesy of NSF Workshop on Social Modeling


The race is on
      • Big Data leads to the Optimised Organisation
      • It takes a long time to build a functioning data
        warehouse and analytics tools and to connect them to the business
      • Many companies have a head start
      • Every CIO needs to consider Big Data in their
        strategy to stay ahead
               – How to manage, how to leverage

A little retailer I once knew
      • Why can Amazon beat everyone on price?
      • Purchase information used to adjust supply chain
      • Shipping and logistics adjusted according to conditions on
        the ground and supply chain
      • Other customers’ information used to provide
        recommendations, improve experience
      • Not just Amazon: Tesco, Carrefour, Metro, etc. are all taking
        advantage

How do we make decisions?
      • Good data is hard to get, so decisions are often made on no data at all
      • Often made on information from peers, colleagues, or
        reports, or because it’s always been done that way
      • Many companies fail because they fail to detect
        shifts in consumer demand
      • The Internet has made customers more segmented, and
        causes customer choice to change faster

Moving to a Data-Driven Model
      • Managing with the facts
      • Making a science out of data!
      • Experimental model, different from BI
      • Moving from “gut feel” to rational, scientific decisions


Big-Data-based Decisions
      • Unlock value by making information transparent
        and useable at higher frequency
      • More accurate information (e.g. inventories, trends)
      • Tailor products more precisely
      • Sophisticated analytics makes for better decisions
      • Better products (via web feedback, sensors, etc)
                                                         Source: McKinsey



What holds back big data?
      • Not ICT—compute & storage getting
        bigger, cheaper, easier
      • Not the quantity of data (see slide 2)
      • Not the value—large-scale Big Data
        projects generally have great ROI
      • Real problems are organisational
        change and talent acquisition


How are people doing it?
         • Enterprises will be ingesting more than 1 PB of data per day within 5 years
         • Big data is often largely unstructured
         • Hadoop is a framework for storing and analyzing big data across clusters of machines
                   –     open source, Java-based

         • Big data can mean billions to trillions of files
                   –     Each file can be gigabytes to terabytes in size
         • Directed graph analysis, collaborative filtering, A/B testing, association rule learning, classification, natural language
           processing, data mining, pattern matching, sentiment analysis, comparative effectiveness, and clinical decision support are
           examples of big data techniques

         • This means petabytes to exabytes of data
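
Hadoop's best-known processing model is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) in plain Python on a tiny in-memory data set. It does not use Hadoop itself, and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key to get a count per word."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a real cluster each document (or block) would be mapped on a different node;
    # here everything runs in one process to keep the sketch self-contained.
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(emitted))


docs = ["big data is big", "data is everywhere"]  # hypothetical input
counts = word_count(docs)
print(counts)
```

The point of the split is that the map and reduce steps only ever look at one document or one key at a time, which is what lets a framework spread them across many machines.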




How do you manage and design for Big Data?
          • Scale and parallelism are the keys
                   – Big data is far too big to process sequentially
                   – Too much coming in too quickly
                   – Example: Banks seeking to process market data
                     more quickly, reducing decision making time from
                     days to minutes
          • Answer: Scale-out storage and scale-out processing
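
The idea of replacing one sequential pass with many independent workers can be sketched as follows. This is an illustration under stated assumptions, not a production design: threads stand in for separate nodes, the "trade sizes" are invented data, and the helper names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(items: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Divide the data into roughly equal slices, one per worker."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunk_total(chunk: Sequence[float]) -> float:
    """Work done independently on one slice of the data."""
    return sum(chunk)


def parallel_total(items: Sequence[float], workers: int = 4) -> float:
    chunks = split_into_chunks(items, workers)
    # Threads stand in for separate nodes here; a real scale-out system
    # would run each chunk on a different machine next to its own storage.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_total, chunks))
    return sum(partials)


trades = [float(i) for i in range(1, 1001)]  # hypothetical trade sizes
total = parallel_total(trades)
print(total)
```

Because each chunk is processed without reference to the others, adding workers (or nodes) shortens the wall-clock time without changing the result.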


Cramming big data onto traditional models
       [Diagram: Server, Network, Storage]
       • Scalability
       • Performance
       • Management
       • Availability
       • Cost




A different idea – scale-out
       [Diagram: Server, Network, Storage]
       • Scalability
       • Performance
       • Management
       • Availability
       • Cost




Enterprise Hadoop: Greenplum & Isilon
      • Easier and more reliable
               – Packaged Hadoop distribution with Isilon storage
      • Purpose-built Hadoop infrastructure
               – Faster, less risk
      • Sharing expertise to address the talent gap
               – Architecture, data science, and roadmap services
      • Proven at scale with worldwide support
               – 24x7 one call Hadoop support from EMC
               – Key component of Greenplum UAP
               – Unstructured data processing


Increasing Demand for Advanced Analytics
      • Complex
               – Deep, rich analysis of big data sets
               – Ad hoc, interactive analysis, not structured reports
      • Timely
               – On-going, frequent analysis (e.g. daily, weekly)
               – Insights delivered in minutes/seconds
      • Actionable
               – Forward looking, predictive insight
               – Create new business value


EMC Greenplum: Purpose-built for Big Data
      • EMC Greenplum is a shared nothing, massively parallel
        processing (MPP) data warehouse system
      • Core principle of data computing is to move the processing
        dramatically closer to the data and to the people

      Fast Data Loading; Extreme Performance & Elastic Scalability; Unified Data Access


MPP Shared-Nothing Architecture
    Greenplum’s Massively Parallel Processing (MPP) Database has extreme
    scalability on general purpose systems
    Automatic parallelization
         – Load and query like any database
    Scan and process in parallel
         – Extremely scalable and I/O optimized
    Linear scalability by adding nodes
         – Each adds storage, query performance and loading performance

    [Diagram, top to bottom: MapReduce; Master Servers (query planning and dispatch); Network Interconnect; Segment Servers (storage and query processing); External Sources (MPP loading, streaming, etc.)]
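
The division of labour on this slide, with a master that plans and dispatches while segments store and process their own data, can be illustrated with a toy model. This is a sketch of the general shared-nothing pattern, not Greenplum's actual implementation: the CRC32 hash, the `(customer_id, amount)` table and every function name are assumptions made for the example.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (customer_id, amount): a hypothetical table


def segment_for(key: str, n_segments: int) -> int:
    """Pick a segment by hashing the distribution key (stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % n_segments


def distribute(rows: List[Row], n_segments: int) -> List[List[Row]]:
    """Each segment stores only its own rows; nothing is shared."""
    segments: List[List[Row]] = [[] for _ in range(n_segments)]
    for row in rows:
        segments[segment_for(row[0], n_segments)].append(row)
    return segments


def segment_sum(rows: List[Row]) -> Dict[str, float]:
    """Partial aggregate computed locally on one segment."""
    totals: Dict[str, float] = defaultdict(float)
    for customer_id, amount in rows:
        totals[customer_id] += amount
    return dict(totals)


def master_query(segments: List[List[Row]]) -> Dict[str, float]:
    """The master dispatches the query and merges the segments' partial results."""
    merged: Dict[str, float] = defaultdict(float)
    for partial in (segment_sum(seg) for seg in segments):
        for customer_id, total in partial.items():
            merged[customer_id] += total
    return dict(merged)


rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0)]
segments = distribute(rows, n_segments=3)
result = master_query(segments)
print(result)
```

Since every row for a given key lands on the same segment, adding segments spreads both storage and query work, which is the intuition behind the slide's claim of linear scalability by adding nodes.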



EMC Hadoop. Open Source. Fully Supported By EMC.
The EMC Big Data “Stack”
      4  Collaborative: Act (Documentum xCP) ?
      3  Real Time: Analyze (Greenplum, Hadoop)
      2  Structured & Unstructured, and
      1  Petabyte Scale: Store (Isilon and Atmos)



THANK YOU
HAVE A GREAT CONFERENCE!

Rob anderson
