Open source Big Data case study: Building a platform for remote device support at NetApp (Part II – Technical)
Topics
• Big Data Perspective
• Case Study: NetApp AutoSupport
• Technology Primer
• Design Overview

Copyright © 2012 Accenture All rights reserved.
Big Data

The concept is disruptive. The technology is disruptive. And markets and clients are being impacted.

¹ Wordle for Credit Suisse, Does Size Matter Only?, September 2011


Shifts in Data and Analytics
The changing landscape and required winning strategies are creating shifts within Big Data collection and analytics.

Data Explosion
• Unstructured data is doubling every 3 months
• 2011 saw 47% growth overall
• By 2015, the number of networked devices will be twice the global population

Monetization
• Growth of enterprise data monetization services
• Large retailers monetizing their own data to provide insights to suppliers

Data-led Innovation
• De-coupling data from applications
• Disparate external data shaping context
• Cost-effective mobilization of massive-scale data

Social Media
• Growing market for scrubbed, aggregate data from social media and blogs
• Greater focus on data that provides insight into a customer’s digital persona

Technology
• Commodity-priced storage and compute
• Emergence of open source and big data technologies solving production problems at scale

Data Mobilization
• Novel approaches to analyzing unstructured data, shortening the time from data to insight
• Shift towards data consumption in multiple environments (business apps, mobile, social)


The Big Data Approach

• Treat data as a strategic asset; seek to maximize its value to the organization
• Invest in common services, data platforms and tools
• Rapidly prototype, deliver, and measure value-added data services; evolve over time

Culture
• Data-driven decision making
• Experimentation and continuous improvement with academic rigor
• End-to-end ownership of services
• Sharing of platform, tools and code
Client Context

NetApp, Inc.
• Industry: Data storage, data management
• 77% of Fortune 500 companies are customers
• Creator of Data ONTAP: industry-leading storage OS




AutoSupport

• Secure automated “call-home” service
• Catch issues before they become critical
• System monitoring and alerting
• RMA requests without customer action
• Faster incident management

[Diagram: Storage Devices → AutoSupport Messages → AutoSupport Data Warehouse]
Business Challenges
• Increase in response times / lower availability for services
• Incoming data volume doubling every 16 months
• Proliferation of ad hoc datamarts and point solutions
• Unable to analyze full AutoSupport contents efficiently

[Diagram: existing AutoSupport architecture in six layers (Source, Extract, Stage, Transform, Integrate, Presentation): ASUP messages and other sources feed multiple parsers, custom ETL jobs, several data warehouses (DW 2, DW 3), an ODS and ad hoc databases, with rules embedded in multiple modules, behind presentation tools including SAP CRM, MyASUP, eBI, STOR, ASUP Tools and Analytics & Mining]

[Chart: AutoSupport Flat-File Storage Requirement; Total Usage and Projected Total Usage (TB, 0 to 3,500), Jan-05 to Jan-16, annotated “Doubles”]
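A doubling every 16 months compounds quickly. The sketch below assumes smooth exponential growth, V(t) = V₀ · 2^(t/16) with t in months, and converts that rate into an annual growth factor and into the time needed to grow by a given multiple. The function names are illustrative, and the 100 TB starting volume in the usage example is a made-up figure, not one read from the chart.

```python
import math


def projected_volume(initial_tb: float, months: float, doubling_months: float = 16.0) -> float:
    """Volume after `months`, assuming pure exponential growth V(t) = V0 * 2**(t / T)."""
    return initial_tb * 2 ** (months / doubling_months)


def annual_growth_factor(doubling_months: float = 16.0) -> float:
    """Multiplier applied to the volume every 12 months under the same assumption."""
    return 2 ** (12 / doubling_months)


def months_to_reach(initial_tb: float, target_tb: float, doubling_months: float = 16.0) -> float:
    """Months until `initial_tb` grows to `target_tb` (the inverse of projected_volume)."""
    if initial_tb <= 0 or target_tb < initial_tb:
        raise ValueError("need 0 < initial_tb <= target_tb")
    return doubling_months * math.log2(target_tb / initial_tb)


if __name__ == "__main__":
    # Hypothetical starting point of 100 TB, for illustration only.
    print(f"annual growth factor: {annual_growth_factor():.2f}")  # 1.68
    print(f"after 4 years: {projected_volume(100, 48):.0f} TB")    # 800
    print(f"months to 8x: {months_to_reach(100, 800):.0f}")        # 48
```

In other words, a 16-month doubling period means roughly 68% growth per year, or an eightfold increase in about four years.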


Solution Design Goals
Improve data access, performance and technology cost-effectiveness.

• Improve system response times and data availability
• Expose common data services for consumption across business units
• Standardize key business metrics into a common rules repository
• Lower operational costs as the ecosystem continues to scale
• Provide more granular analytical capabilities

Role of Open Source
The platform is composed of open source technologies purpose-built for large-scale storage, processing and analysis.

[Figure: ¹ Actual Big Data Solution Blueprint for a hybrid deployment]




Technology Primer – Hadoop

Hadoop Distributed Filesystem (HDFS)
• Divides files into smaller “blocks”, stored across machines
• Automated replication, fault tolerance

Hadoop MapReduce
• Parallel processing for large datasets across machines
• Breaks a job into tasks, using a simple map() and reduce() paradigm for data flows
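To make the HDFS bullets concrete, here is a toy model of splitting a file into fixed-size blocks and keeping several copies of each block on different machines. It is a simplification: the node names are hypothetical, the block size is tiny so the output stays readable (real HDFS blocks are tens of megabytes or more), and replicas are placed round-robin, whereas HDFS's actual placement policy is rack-aware. Only the replication factor of 3 matches the HDFS default.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (not HDFS's real policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    """True if every block still has at least one replica on a surviving node."""
    return all(any(node != failed for node in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"AutoSupport payload", block_size=8)
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    print(blocks)     # [b'AutoSupp', b'ort payl', b'oad']
    print(placement)  # {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
    print(readable_after_failure(placement, "node2"))  # True
```

Because each block lives on three different nodes, losing any single node (or even two) still leaves a readable copy of every block, which is the fault-tolerance property the slide refers to.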




Technology Primer – MapReduce

MapReduce (Simple Example – Word Count)
Map(key, value)
Reduce(key, List<value> values)

Input: “One fish, two fish, red fish, blue fish.”

Map phase: four mappers (m), one per input line, each emit <word, 1> pairs:
<one,1> <fish,1> | <two,1> <fish,1> | <red,1> <fish,1> | <blue,1> <fish,1>

Shuffle phase: pairs are grouped by key and sent to the reducers (r), which sum the values for each key:
<one,1> <two,1> <red,1> <blue,1> <fish,4>
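The word-count flow above can be simulated in plain Python to show what each phase contributes. This is a single-process sketch, not Hadoop code: the function names are illustrative, the mapper's key is the line index (Hadoop's text input would pass a byte offset), and words are lowercased with punctuation stripped, which is why “One” is counted as “one”.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Emit <word, 1> for every word in one input line (key = line index, unused)."""
    for word in re.findall(r"[a-z]+", value.lower()):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    mapped = [pair for offset, line in enumerate(lines) for pair in map_phase(offset, line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    text = ["One fish,", "two fish,", "red fish,", "blue fish."]
    print(word_count(text))  # {'one': 1, 'fish': 4, 'two': 1, 'red': 1, 'blue': 1}
```

In a real cluster the map calls run in parallel on different machines, and the shuffle moves every pair with the same key to the same reducer over the network; the dictionary grouping here stands in for that step.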
Technology Primer – NoSQL

• “Not only” SQL
   • Catch-all term for various non-relational database systems
• Typical areas of differentiation
   • Data model semantics
      • e.g. Database, Document, Key-Value
   • CAP trade-offs
      • Consistency, Availability, Partition-Tolerance
   • Scale-out architecture
      • e.g. Sharding, Distributed hash
   • Query language

Examples: HBase, Cassandra, mongoDB, Neo4j, etc.
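One common way to implement the “distributed hash” style of scale-out is consistent hashing: keys and servers are hashed onto the same ring, and each key belongs to the next server position clockwise. The sketch below is a generic illustration, not the partitioning scheme of any database named above (HBase, for example, splits tables into row-key ranges instead); the node names, the virtual-node count and the choice of MD5 are assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent-hash ring with virtual nodes to spread keys more evenly."""

    def __init__(self, nodes: List[str], vnodes: int = 16) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        points: List[Tuple[int, str]] = [
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        ]
        points.sort()
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an unsigned integer position on the ring.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping to the start.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[idx]


if __name__ == "__main__":
    before = HashRing(["node-a", "node-b", "node-c"])
    after = HashRing(["node-a", "node-b", "node-c", "node-d"])
    keys = [f"system-{i}" for i in range(1000)]
    moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
    # Every key that moved now lives on the new node; the rest stay put.
    print(len(moved), all(after.node_for(k) == "node-d" for k in moved))
```

Adding a node only takes over keys adjacent to its own ring positions, so in the four-node example about a quarter of the keys move. A naive `hash(key) % n` scheme would remap about three quarters of them when going from three to four nodes, which is why ring-style hashing suits clusters that grow incrementally.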
Data Pipeline Overview

[Diagram: Incoming Messages → Ingestion → Core Data Processing, which feeds the Data Service Interface, Ad hoc analytics and ETL]
Data Ingestion
Technologies
• Apache Flume, Apache Hadoop, Drools BRMS, JMS
Capabilities
• Handle dynamic data volumes
• Normalization of disparate file formats
• Real-time aggregation of documents
• JMS alerts for critical messages

[Diagram: documents from the front-end HTTP/SMTP gateway enter a routing tier (Flume client plus rules engine), pass through a parsing tier of Flume agents, then an aggregation & sink tier of Flume agents that writes aggregated files to HDFS; notifications are published over JMS]
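The routing and aggregation tiers can be illustrated with a small single-process sketch. Everything here is a stand-in: the `Document` fields and the “CRITICAL” subject rule are hypothetical, a Python predicate plays the role of a Drools rule, appending to a list plays the role of publishing to JMS, and `Aggregator.flushed` plays the role of files written to HDFS. The idea shown is batching many small documents into fewer, larger files, which suits HDFS better than storing each message as its own small file.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Document:
    system_id: str  # hypothetical field: reporting storage system
    subject: str    # hypothetical field: message subject line
    body: bytes


# A rule is a name plus a predicate; it stands in for a Drools rule.
Rule = Tuple[str, Callable[[Document], bool]]

# Hypothetical example rule; real rules would live in the rules engine.
RULES: List[Rule] = [("critical", lambda d: "CRITICAL" in d.subject.upper())]


def route(doc: Document, rules: List[Rule], alerts: List[Tuple[str, str]]) -> Document:
    """Routing tier: record an alert for every matching rule, then pass the document on."""
    for name, predicate in rules:
        if predicate(doc):
            alerts.append((name, doc.system_id))  # stands in for publishing a JMS message
    return doc


@dataclass
class Aggregator:
    """Aggregation tier: batch small documents into fewer, larger output files."""
    max_bytes: int
    flushed: List[bytes] = field(default_factory=list)  # stands in for files written to HDFS
    buffer: List[bytes] = field(default_factory=list, init=False)
    size: int = field(default=0, init=False)

    def add(self, doc: Document) -> None:
        self.buffer.append(doc.body)
        self.size += len(doc.body)
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(b"".join(self.buffer))
            self.buffer = []
            self.size = 0


if __name__ == "__main__":
    alerts: List[Tuple[str, str]] = []
    sink = Aggregator(max_bytes=10)
    incoming = [
        Document("sys-1", "Weekly log", b"aaaaa"),
        Document("sys-2", "CRITICAL: disk failure", b"bbbbbb"),
        Document("sys-3", "Weekly log", b"cc"),
    ]
    for doc in incoming:
        sink.add(route(doc, RULES, alerts))
    sink.flush()  # flush whatever is left at shutdown
    print(alerts)        # [('critical', 'sys-2')]
    print(sink.flushed)  # [b'aaaaabbbbbb', b'cc']
```

A size threshold is only one possible flush trigger; a real aggregation tier would usually also flush on a timer so that low-volume periods do not leave documents buffered indefinitely.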

Core Data Processing
Technologies
• MapReduce, HBase, Solr, Avro
Capabilities
• Parallel processing for increased throughput
• Efficient storage of complex data objects in Avro

[Diagram: documents gathered from Flume are read by Map tasks that parse the text contents; Reduce tasks transform and derive data objects and write them to the data stores: HDFS, Solr (search indexes), HBase (primary storage) and Hive (data warehouse)]
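A minimal sketch of the parse-then-derive split, assuming a hypothetical plain-text format of `Key: value` lines with `System_ID` and `Timestamp` fields (real AutoSupport content is far richer). The map step parses each document into a flat record keyed by system; the reduce step derives one summary object per system. Timestamps are assumed to be ISO-8601 strings, so comparing them as text orders them chronologically. Serialization to Avro and the writes to HBase, Solr or Hive are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]


def parse_document(text: str) -> Record:
    """Map-side parsing: turn hypothetical 'Key: value' lines into a flat record."""
    record: Record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            record[key.strip().lower()] = value.strip()
    return record


def map_fn(text: str) -> Iterator[Tuple[str, Record]]:
    """Emit (system_id, record); documents without a system id are dropped."""
    record = parse_document(text)
    if "system_id" in record:
        yield record["system_id"], record


def reduce_fn(system_id: str, records: List[Record]) -> Dict[str, object]:
    """Reduce-side derivation: one summary object per system."""
    latest = max(records, key=lambda r: r.get("timestamp", ""))
    return {"system_id": system_id, "document_count": len(records), "latest": latest}


def run(docs: Iterable[str]) -> Dict[str, Dict[str, object]]:
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for text in docs:
        for system_id, record in map_fn(text):
            grouped[system_id].append(record)
    return {sid: reduce_fn(sid, recs) for sid, recs in grouped.items()}


if __name__ == "__main__":
    docs = [
        "System_ID: 42\nTimestamp: 2012-05-01T10:00\nVersion: 8.0",
        "System_ID: 42\nTimestamp: 2012-05-02T10:00\nVersion: 8.1",
        "free text with no fields",
    ]
    summary = run(docs)["42"]
    print(summary["document_count"], summary["latest"]["version"])  # 2 8.1
```

Keeping parsing in the map step and derivation in the reduce step means the expensive text handling is spread across all mappers, while each reducer only sees the already-structured records for the systems assigned to it.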
Data Services
Technologies
• Apache HBase, Solr, Tomcat
Capabilities
• Unified web services API for end users
• Support for complex queries and searches across multiple dimensions with Solr
• Access to both raw and derived content for a given system
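Multi-dimensional searches of this kind typically reach Solr through its standard `select` request handler, with a free-text `q` parameter and one `fq` (filter query) per dimension. The helper below only builds such a URL. The base URL, core name and field names (`os_version`, `region`) are hypothetical, filter values are assumed not to need Solr query escaping, and the source does not describe how the service layer on Tomcat actually issued its queries.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def solr_select_url(base_url: str, text_query: str, filters: Dict[str, str], rows: int = 10) -> str:
    """Build a Solr /select URL: free-text query plus one fq clause per filter dimension."""
    params: List[Tuple[str, str]] = [("q", text_query), ("rows", str(rows)), ("wt", "json")]
    for field_name, value in sorted(filters.items()):
        # Filter queries narrow the result set without affecting relevance scoring.
        params.append(("fq", f"{field_name}:{value}"))
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = solr_select_url(
        "http://solr.example.internal:8983/solr/asup",  # hypothetical host and core
        "disk failure",
        {"os_version": "8.1", "region": "emea"},  # hypothetical fields
        rows=5,
    )
    print(url)
```

Putting each dimension in its own `fq` clause, rather than folding everything into `q`, keeps the free-text relevance ranking separate from the structured filters.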




Analytics / ETL
Technologies
• Apache Hive, Pig, Datameer (ad hoc analytics)
• Pentaho (ETL / data integration)
Capabilities
• Analytical environment for both business analysts and “power users”
   • Hive or Pig as higher-level query languages
   • Datameer for analytics with a spreadsheet UI
• ETL through Pentaho MapReduce
   • (runs the Pentaho ETL server inside a MapReduce job)



Successes and Challenges
Successes
• Web service interface contracts simplified integration with user tools and allowed flexibility in the internal implementation
• Open source core allowed for rapid iteration
• Met or exceeded all SLAs using commodity hardware, significantly driving down costs
Challenges
• Monitoring a large distributed system requires discipline and a strong operations team
• Shared storage systems and Big Data technologies don’t always play well together
• “Schemaless” systems can become a headache to maintain, especially with complex data models

Thank you

Jonathan Bender
Consultant, Accenture Technology Labs
jonathan.bender@accenture.com




HPMC 2014 - Thirteen tenets of a successful digital business - Acquity group
HPMC 2014 - Digital business edge - Accenture
HPMC 2014 - The New Energy Consumer - Accenture Netherlands
HPMC 2013 - Philips
HPMC 2013 - Nationale Nederlanden
HPMC 2013 - Shell
HPMC 2013 - Oracle
HPMC 2013 - Microsoft Avanade
OSC2012: Identity Analytics: Exploiting Digital Breadcrumbs
OSC2012: How a 2000-Year Old Knot Untangles Legacy

OSC2012: Big Data Using Open Source: Netapp Project - Technical

  • 1. Open source Big Data case study: Building a platform for remote device support at NetApp (Part II – Technical)
  • 2. Topics  Big Data Perspective  Case Study: NetApp AutoSupport  Technology Primer  Design Overview Copyright © 2012 Accenture All rights reserved. 2
  • 3. Big Data
      The concept is disruptive. The technology is disruptive. And markets and clients are being impacted.
      (Word-cloud figure¹)
      ¹ Wordle for Credit Suisse, "Does Size Matter Only?", September 2011
  • 4. Shifts in Data and Analytics
      The changing landscape and required winning strategies are creating shifts within Big Data collection and analytics:
      Data Explosion
          • Unstructured data is doubling every 3 months
          • 2011 saw 47% growth overall
          • By 2015, the number of networked devices will be 2x the global population
      Monetization
          • Growth of enterprise data monetization services
          • Large retailers monetizing their own data to provide insights to suppliers
      Data-led Innovation
          • De-coupling data from applications
          • Disparate external data shaping context
          • Cost-effective mobilization of massive-scale data
      Social Media
          • Growing market for scrubbed, aggregate data from social media and blogs
          • Greater focus on data that provides insight into a customer's digital persona
      Technology
          • Commodity-priced storage and compute
          • Emergence of open source and Big Data technologies solving production problems at scale
      Data Mobilization
          • Novel approaches to analyzing unstructured data, shortening the time from data to insight
          • Shift towards data consumption in multiple environments (business apps, mobile, social)
  • 5. The Big Data Approach
      • Treat data as a strategic asset; seek to maximize its value to the organization
      • Invest in common services, data platforms and tools
      • Rapidly prototype, deliver, and measure value-added data services; evolve them over time
      Culture
          • Data-driven decision making
          • End-to-end ownership of services
          • Experimentation and continuous improvement with academic rigor
          • Sharing of platform, tools and code
  • 6. Topics  Big Data Perspective  Case Study: NetApp AutoSupport  Technology Primer  Design Overview Copyright © 2012 Accenture All rights reserved. 6
  • 7. Client Context
      NetApp, Inc.
      • Industry: data storage, data management
      • 77% of Fortune 500 companies are customers
      • Creator of Data ONTAP, an industry-leading storage OS
  • 8. AutoSupport
      • Secure, automated "call-home" service
      • Catch issues before they become critical
      • System monitoring and alerting
      • RMA (return merchandise authorization) requests without customer action
      • Faster incident management
      (Diagram: Storage Devices → AutoSupport Messages → AutoSupport Data Warehouse)
  • 9. Business Challenges
      • Increased response times / lower availability for services
      • Incoming data volume doubling every 16 months
      • Proliferation of ad hoc datamarts and point solutions
      • Unable to analyze full AutoSupport contents efficiently
      (Diagram: legacy AutoSupport architecture in Source, Extract, Stage, Transform, Integrate and Presentation layers, built from multiple custom parsers, custom ETL jobs, rules modules, data warehouses and ad hoc databases, feeding tools such as SAP CRM, MyASUP, eBI and ASUP Tools)
      (Chart: AutoSupport Flat-File Storage Requirement; Total Usage (TB) and Projected Total Usage (TB), Jan 2005 to Jan 2016, annotated "Doubles")
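To put the 16-month doubling rate in perspective, the arithmetic below converts it into growth multipliers over other horizons. It is a minimal Python sketch that assumes steady exponential growth at exactly that rate. The function names are hypothetical, and because the deck gives no base volume, the sketch computes only multipliers.

```python
import math

DOUBLING_MONTHS = 16  # doubling period stated on the slide


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative growth after `months` if volume doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)


def months_to_multiply(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed to grow by `factor` at the given doubling period."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    print(f"Growth per year:    {growth_factor(12):.2f}x")  # 2**(12/16) = 2**0.75
    print(f"Growth over 4 years: {growth_factor(48):.1f}x")  # 2**(48/16) = 2**3
    print(f"Months to grow 10x: {months_to_multiply(10):.1f}")
```

At that rate, volume grows by roughly 1.68x per year and 8x over four years. This is why the legacy flat-file approach in the chart could not keep pace.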
  • 10. Solution Design Goals
      Improve data access, and the cost-effectiveness and performance of the technology.
      • Improve system response times and data availability
      • Expose common data services for consumption across business units
      • Standardize key business metrics into a common rules repository
      • Lower operational costs as the ecosystem continues to scale
      • Provide more granular analytical capabilities
  • 11. Role of Open Source
      The platform is composed of open source technologies purpose-built for large-scale storage, processing and analysis.
      (Figure: actual Big Data solution blueprint for a hybrid deployment)
  • 12. Topics  Big Data Perspective  Case Study: NetApp AutoSupport  Technology Primer  Design Overview Copyright © 2012 Accenture All rights reserved. 12
  • 13. Technology Primer – Hadoop
      Hadoop Distributed Filesystem (HDFS)
          • Divides files into smaller "blocks", stored across machines
          • Automated replication, fault tolerance
      Hadoop MapReduce
          • Parallel processing for large datasets across machines
          • Breaks a job into tasks, using a simple map() and reduce() paradigm for data flows
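The HDFS behavior summarized above can be illustrated with a small planning function. It cuts a file into fixed-size blocks and gives each block several replicas on different machines. This is a simplified Python sketch, not HDFS code. The 64 MB block size and replication factor of 3 are assumptions (common Hadoop 1.x defaults), and the node names are hypothetical. Replicas are placed round-robin rather than with HDFS's rack-aware policy.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 64 * 1024 * 1024  # assumed: 64 MB, the Hadoop 1.x default
REPLICATION = 3                # assumed: the HDFS default replication factor


@dataclass
class Block:
    index: int
    offset: int          # byte offset of the block within the file
    length: int          # bytes in this block (the last block may be shorter)
    replicas: List[str]  # nodes holding a copy of this block


def plan_blocks(file_size: int, nodes: List[str],
                block_size: int = BLOCK_SIZE,
                replication: int = REPLICATION) -> List[Block]:
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes, round-robin (a simplification of
    HDFS's rack-aware placement)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks: List[Block] = []
    offset = 0
    while offset < file_size:
        index = len(blocks)
        length = min(block_size, file_size - offset)
        replicas = [nodes[(index + r) % len(nodes)] for r in range(replication)]
        blocks.append(Block(index, offset, length, replicas))
        offset += length
    return blocks


def readable_after_failure(blocks: List[Block], failed: str) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node != failed for node in b.replicas) for b in blocks)


if __name__ == "__main__":
    MB = 1024 * 1024
    nodes = ["node1", "node2", "node3", "node4"]
    plan = plan_blocks(200 * MB, nodes)
    for b in plan:
        print(b.index, b.length // MB, "MB", b.replicas)
    print(all(readable_after_failure(plan, n) for n in nodes))
```

A 200 MB file becomes three 64 MB blocks and one 8 MB block. Because each block lives on three different nodes, losing any single node leaves every block readable.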
  • 14. Technology Primer – MapReduce
      Map(key, value)
      Reduce(key, List<value> values)
      Simple example: word count
          • Input: "One fish, two fish, red fish, blue fish."
          • Map phase: map tasks emit a <word, 1> pair for every word: <one,1>, <fish,1>, <two,1>, <fish,1>, <red,1>, <fish,1>, <blue,1>, <fish,1>
          • Shuffle phase: pairs are grouped by key, so all values for one word reach the same reduce task
          • Reduce phase: each reduce task sums the values for its keys, e.g. <fish,4>
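The word-count walk-through can be simulated in a single process to make the three phases concrete. The Python sketch below is not Hadoop's Java Mapper/Reducer API. It only mirrors the map(key, value) → shuffle → reduce(key, List<value>) data flow. The tokenization rule (lowercase letters and apostrophes) is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """map(key, value): emit <word, 1> for each word in the input line."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """reduce(key, List<value> values): sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["One fish, two fish, red fish, blue fish."]))
    # prints {'one': 1, 'fish': 4, 'two': 1, 'red': 1, 'blue': 1}
```

In a real cluster the map calls run in parallel on different HDFS blocks, and the shuffle moves data across the network. The per-key logic stays this simple.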
  • 15. Technology Primer – NoSQL
      • "Not only" SQL
      • Catch-all term for various non-relational database systems
      • Typical areas of differentiation:
          • Data model semantics, e.g. Database, Document, Key-Value
          • CAP trade-offs: Consistency, Availability, Partition tolerance
          • Scale-out architecture, e.g. sharding, distributed hash
          • Query language
      Examples: HBase, Cassandra, MongoDB, Neo4j, etc.
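One common way to implement the "distributed hash" style of scale-out is consistent hashing, in which keys and shards are placed on the same hash ring. The Python sketch below is a generic illustration under that assumption. It is not the partitioning scheme of any particular product; HBase, for example, splits tables into row-key ranges rather than hashing. The shard names and the number of virtual nodes are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # MD5 is used only to spread keys uniformly, not for security
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Assign keys to shards on a hash ring; each shard owns many virtual nodes."""

    def __init__(self, shards: List[str], vnodes: int = 100):
        self._ring: List[int] = []       # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> shard
        for shard in shards:
            for v in range(vnodes):
                position = _hash(f"{shard}#{v}")
                self._owner[position] = shard
                bisect.insort(self._ring, position)

    def shard_for(self, key: str) -> str:
        # Walk clockwise to the first ring position at or after the key's hash,
        # wrapping around to the start of the ring if necessary
        i = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[i]]


if __name__ == "__main__":
    before = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
    after = ConsistentHashRing(["shard-a", "shard-b", "shard-c", "shard-d"])
    keys = [f"system-{i}" for i in range(1000)]
    moved = sum(before.shard_for(k) != after.shard_for(k) for k in keys)
    print(f"{moved} of {len(keys)} keys moved after adding shard-d")
```

The property that matters for scale-out is this: adding a shard moves only the keys that land on the new shard's ring segments. Every other key keeps its old assignment, so growing the cluster does not reshuffle all the data.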
  • 16. Topics  Big Data Perspective  Case Study: NetApp AutoSupport  Technology Primer  Design Overview Copyright © 2012 Accenture All rights reserved. 16
  • 17. Data Pipeline Overview
      (Diagram: Incoming Messages → Ingestion → Core Data Processing → Ad hoc analytics / ETL, with a Data Service Interface)
  • 18. Data Ingestion
      Technologies
          • Apache Flume, Apache Hadoop, Drools BRMS, JMS
      Capabilities
          • Handle dynamic data volumes
          • Normalization of disparate file formats
          • Real-time aggregation of documents
          • JMS alerts for critical messages
      (Diagram: documents from the front-end HTTP/SMTP gateway enter through a Flume client and pass through three tiers of Flume agents: a routing tier, a parsing tier, and an aggregation & sink tier that writes aggregated files to HDFS; a rules engine raises JMS notifications)
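The capabilities above can be sketched as one in-process pipeline stage: normalizing disparate formats, alerting on critical messages, and aggregating documents before they reach HDFS. This Python sketch uses neither Flume, Drools nor JMS. The `alert` and `sink` callbacks are hypothetical stand-ins for the JMS notification and the HDFS sink tier. The two input formats, the `severity` field and the "critical" rule are also assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


def normalize(raw: str) -> Record:
    """Normalize two hypothetical formats, JSON or 'key=value' lines, into one dict shape."""
    raw = raw.strip()
    if raw.startswith("{"):
        return {k: str(v) for k, v in json.loads(raw).items()}
    record: Record = {}
    for line in raw.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            record[key.strip()] = value.strip()
    return record


@dataclass
class IngestStage:
    batch_size: int
    sink: Callable[[List[Record]], None]  # stand-in for the HDFS sink tier
    alert: Callable[[Record], None]       # stand-in for a JMS notification
    _buffer: List[Record] = field(default_factory=list, init=False)

    @staticmethod
    def is_critical(record: Record) -> bool:
        # Hypothetical rule standing in for a rules-engine rule
        return record.get("severity", "").lower() == "critical"

    def accept(self, raw: str) -> None:
        record = normalize(raw)
        if self.is_critical(record):
            self.alert(record)  # alert immediately; do not wait for the batch
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand the current batch to the sink and start a new one."""
        if self._buffer:
            self.sink(self._buffer)
            self._buffer = []
```

For example, with `batch_size=2`, three incoming messages produce one full batch plus a final partial batch on `flush()`. A critical message is alerted on as soon as it arrives, independently of batching. This separation of the alert path from the bulk storage path is the reasoning behind routing critical messages to JMS.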
  • 19. Core Data Processing
      Technologies
          • MapReduce, HBase, Solr, Avro
      Capabilities
          • Parallel processing for increased throughput
          • Efficient storage of complex data objects in Avro
      (Diagram: documents gathered from Flume into HDFS are read by map tasks that parse text contents and transform and derive data objects; reduce tasks write the derived objects to data stores: HBase (primary storage), Solr (search indexes) and Hive (data warehouse))
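The flow in this design can be illustrated with plain Python: raw documents are parsed into derived objects, which are then written to several stores. Everything here is hypothetical: the document layout, the field names, and the in-memory structures standing in for HBase (keyed lookups), Solr (an inverted index) and Hive (flat warehouse rows). Avro serialization is omitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Section = Tuple[str, str]  # (section name, text)


def map_document(doc: str) -> Iterator[Tuple[str, Section]]:
    """Map: parse one raw document into (system_id, (section, text)) pairs.
    Hypothetical layout: first line 'system_id=<id>', then '<section>: <text>' lines."""
    lines = doc.strip().splitlines()
    system_id = lines[0].split("=", 1)[1].strip()
    for line in lines[1:]:
        name, _, text = line.partition(":")
        yield system_id, (name.strip(), text.strip())


def reduce_system(system_id: str, sections: List[Section]) -> Dict[str, object]:
    """Reduce: merge all sections for one system into a single derived object."""
    merged = {name: text for name, text in sections}
    return {"system_id": system_id, "sections": merged, "section_count": len(merged)}


def process(docs: Iterable[str]):
    """Run map -> group by key -> reduce, then fan derived objects out to three stores."""
    grouped: Dict[str, List[Section]] = defaultdict(list)
    for doc in docs:
        for key, value in map_document(doc):
            grouped[key].append(value)

    primary: Dict[str, Dict[str, object]] = {}      # stand-in for HBase: lookup by key
    search: Dict[str, Set[str]] = defaultdict(set)   # stand-in for Solr: token -> system ids
    warehouse: List[Tuple[str, str, int]] = []       # stand-in for Hive: flat rows
    for system_id, sections in grouped.items():
        obj = reduce_system(system_id, sections)
        primary[system_id] = obj
        for name, text in obj["sections"].items():
            for token in text.lower().split():
                search[token].add(system_id)
            warehouse.append((system_id, name, len(text)))
    return primary, dict(search), warehouse
```

Each store gets a shape suited to its access pattern: whole objects by key, tokens for search, and flat rows for SQL-style analysis. That is the reason the slide writes the same derived objects to more than one place.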
  • 20. Data Services
      Technologies
          • Apache HBase, Solr, Tomcat
      Capabilities
          • Unified web services API for end users
          • Support for complex queries and searches across multiple dimensions with Solr
          • Access to both raw and derived content for a given system
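A unified API over several backends is essentially a facade: callers ask for a system's data or run a search without knowing which store answers. The Python sketch below shows only that facade logic, using assumed in-memory stores. The class and method names are hypothetical, and the HTTP layer that the deck runs on Tomcat is left out.

```python
from typing import Dict, List, Optional, Set


class DataService:
    """Hypothetical facade: one API over a key-value store (HBase's role)
    and an inverted index (Solr's role), hiding which backend answers."""

    def __init__(self, raw: Dict[str, str], derived: Dict[str, dict],
                 index: Dict[str, Set[str]]):
        self._raw = raw          # raw content keyed by system id
        self._derived = derived  # derived objects keyed by system id
        self._index = index      # search token -> set of system ids

    def get_system(self, system_id: str, include_raw: bool = False) -> Optional[dict]:
        """Return derived content for one system, optionally with its raw content."""
        derived = self._derived.get(system_id)
        if derived is None:
            return None
        result = {"system_id": system_id, "derived": derived}
        if include_raw:
            result["raw"] = self._raw.get(system_id)
        return result

    def search(self, *terms: str) -> List[str]:
        """AND-search: ids of systems whose indexed text contains every term."""
        if not terms:
            return []
        hits = set.intersection(*(self._index.get(t.lower(), set()) for t in terms))
        return sorted(hits)
```

Because callers depend only on `get_system` and `search`, the stores behind them can change without breaking client tools. This matches the deck's observation that web service interface contracts allowed flexibility in the internal implementation.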
  • 21. Analytics / ETL
      Technologies
          • Apache Hive, Pig, Datameer (ad hoc analytics)
          • Pentaho (ETL / data integration)
      Capabilities
          • Analytical environment for both business analysts and "power users"
          • Hive or Pig as higher-level query languages
          • Datameer for analytics with a spreadsheet UI
          • ETL through Pentaho MapReduce (runs the Pentaho ETL server inside a MapReduce job)
  • 22. Successes and Challenges
      Successes
          • Web service interface contracts simplified integration with user tools and allowed flexibility in the internal implementation
          • The open source core allowed for rapid iteration
          • Met or exceeded all SLAs using commodity hardware, significantly driving down costs
      Challenges
          • Monitoring a large distributed system requires discipline and a strong operations team
          • Shared storage systems and Big Data technologies don't always play well together
          • "Schemaless" systems can become a headache to maintain, especially with complex data models
  • 23. Thank you
      Jonathan Bender
      Consultant, Accenture Technology Labs
      jonathan.bender@accenture.com