It’s Not About the “Big” in Big Data
$26.5 BILLION

  Trader            Bank                Loss
  Nick Leeson       Barings Bank        $1.4 billion
  Jerome Kerviel    Societe Generale    $6 billion
  Kweku Adoboli     UBS                 $1.4 billion
  Bruno Iksil       JPMC                $17.5 billion

  NASA              $17.6B
  Rogue Traders     $26.5B
Hadoop in 2009
Hadoop in 2012
Hype Cycle
  [Figure: Hype Cycle curve, marked "We are here"]
Crossing the Chasm
  [Figure: Crossing the Chasm curve, marked "We are here"]
Hadoop - Enabler
Database Hard Drive

  Unstructured (61.7% growth)

  time to find one record         =   log_b(N) * 10 ms
  log_100(100,000,000) * 10 ms    =   40 ms
  time to read record             =   10 ms
  10,000,000 * 50 ms              =   5.8 days
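
The arithmetic on this slide can be checked with a short sketch. It assumes, as the slide appears to, that each index level and each record read costs one 10 ms disk access and that the index has a fan-out of 100. All names (`index_levels`, `SEEK_MS`, `FANOUT`, and so on) are illustrative and do not come from the deck.

```python
SEEK_MS = 10                  # cost of one random disk access, per the slide
FANOUT = 100                  # index branching factor b (assumed from log_100)
INDEXED_ROWS = 100_000_000    # N, the number of rows covered by the index
LOOKUPS = 10_000_000          # records fetched one at a time
SECONDS_PER_DAY = 86_400


def index_levels(n_rows: int, fanout: int) -> int:
    """Return ceil(log_fanout(n_rows)), using integer math to avoid float error."""
    levels, capacity = 0, 1
    while capacity < n_rows:
        capacity *= fanout
        levels += 1
    return levels


find_ms = index_levels(INDEXED_ROWS, FANOUT) * SEEK_MS   # walk the index
read_ms = SEEK_MS                                        # fetch the record itself
per_record_ms = find_ms + read_ms
total_days = LOOKUPS * per_record_ms / 1000 / SECONDS_PER_DAY

print(f"{find_ms} ms to find + {read_ms} ms to read = {per_record_ms} ms per record")
print(f"{LOOKUPS:,} lookups take about {total_days:.1f} days")
```

The 50 ms per record is the 40 ms index walk plus the 10 ms read, which is where the 5.8-day total comes from.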
Hadoop Hard Drive

  throughput                 =   10 MB/s
  time to transfer record    =   10 ms
  10,000,000 * 10 ms         =   about 1.2 days
  random reads               =   (5.8 days)
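
A minimal sketch of the sequential case, under the slide's stated figures: 10 MB/s of throughput and 10 ms to transfer one record. Together these imply a record of roughly 100 KB, which is an inference and is not stated in the deck. The 50 ms random-access cost is carried over from the database slide, and all names are hypothetical.

```python
THROUGHPUT_MB_PER_S = 10      # sequential throughput, per the slide
TRANSFER_MS = 10              # time to stream one record, per the slide
RANDOM_MS = 50                # find + read cost from the database slide
RECORDS = 10_000_000
SECONDS_PER_DAY = 86_400


def total_days(records: int, ms_per_record: int) -> float:
    """Convert a per-record cost in milliseconds into total elapsed days."""
    return records * ms_per_record / 1000 / SECONDS_PER_DAY


# Record size implied by throughput * transfer time (using 1 MB = 1000 KB).
implied_record_kb = THROUGHPUT_MB_PER_S * 1000 * TRANSFER_MS // 1000

sequential_days = total_days(RECORDS, TRANSFER_MS)
random_days = total_days(RECORDS, RANDOM_MS)
speedup = RANDOM_MS / TRANSFER_MS

print(f"implied record size: {implied_record_kb} KB")
print(f"sequential scan: {sequential_days:.2f} days")
print(f"random reads:    {random_days:.2f} days")
print(f"sequential is {speedup:.0f}x faster")
```

Under these assumptions, 10,000,000 transfers at 10 ms each add up to 100,000 seconds, or a little under 1.2 days, against 5.8 days for random reads.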
Laws of Physics

  Values/sec.    Random          Sequential
  Disk           316             53,200,000
  SSD            1,924           42,200,000
  Memory         36,700,000      358,200,000

  Source: Adam Jacobs, "The Pathologies of Big Data"
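
To make the gap between the two access patterns concrete, the sketch below computes the sequential-to-random ratio for each medium from the figures on this slide. The dictionary layout and the function name `sequential_advantage` are illustrative choices and do not come from the deck or from the cited article.

```python
# Values per second, copied from the chart on this slide.
VALUES_PER_SEC = {
    "disk":   {"random": 316,        "sequential": 53_200_000},
    "ssd":    {"random": 1_924,      "sequential": 42_200_000},
    "memory": {"random": 36_700_000, "sequential": 358_200_000},
}


def sequential_advantage(medium: str) -> float:
    """How many times faster sequential access is than random access."""
    rates = VALUES_PER_SEC[medium]
    return rates["sequential"] / rates["random"]


for medium in VALUES_PER_SEC:
    print(f"{medium:<6} sequential is {sequential_advantage(medium):,.1f}x faster than random")

# Sequential reads from disk outpace random reads from memory.
disk_seq_beats_memory_random = (
    VALUES_PER_SEC["disk"]["sequential"] > VALUES_PER_SEC["memory"]["random"]
)
print("disk sequential > memory random:", disk_seq_beats_memory_random)
```

The last comparison shows how much the access pattern matters: in these figures, streaming from disk delivers more values per second than random access in memory.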
People cost




Hardware Cost
Time to insight




Time for decision
Volume: Data Size
Velocity: Speed of Change
Variety: Data Sources
Data Complexity
Impossible: 360 View
Game Changer

  Slow        Static            Barrier
  ETL         Data Warehouse    Business Intelligence

  Fast        Dynamic           View
  Raw Load    Hadoop            Data Pipeline
NO - SQL

  RDBMS     Standard SQL    Structured Data      Response in seconds
  Hadoop    No SQL          Unstructured Data    Batch
Common Applications

  Asset Management Analytics
  Security Analytics
  Product Cohort Analytics
  Advanced Web Analytics

  Structured + Unstructured Data    Many Data Sources    Decision Makers
Follow us: @datameer, @stefanGroschupf

