WANdisco and Hadoop:
The Future of Big Data
     December 11, 2012
• WANdisco: Wide Area Network Distributed Computing
• Patented technology for active-active replication
• Leader in tools for software engineers (Subversion)
• No venture capital, angel investors or private equity funding
• Listed on the London Stock Exchange on June 1, 2012 in a highly
  successful IPO (LSE:WAND)
• Offices in San Ramon (CA), Boston (MA), Sheffield (UK), Belfast
  (UK), Chengdu (China), Tokyo (Japan)




                                                                    2
Diagram: WANdisco Technology compared with the Traditional Approach




                                             3
"unlike conventional solutions, the multi-site
computing system architecture does not rely on a central
transaction coordinator that is known to be a single-point-of-failure."
                                                                  4
“Big Data is the new definitive source of competitive advantage across
all industries. For those organizations that understand and embrace the
new reality of Big Data, the possibilities for new innovation, improved
agility, and increased profitability are nearly endless”
- Wikibon




                             5
• Fixing specific problems:
    • Easy-to-use appliance
    • High availability
    • Disaster recovery / zero time
      to recovery (over a WAN)
• Highly differentiated
    • Nobody else can do
      active-active replication
      over a WAN




                                      6
Dr. Konstantin Shvachko
•   Co-founder of AltoStor, acquired by WANdisco
•   Was part of the team that invented Hadoop at Yahoo in 2006 and went on to
    become the Principal Big Data architect at eBay
•   Credited with the creation and maintenance of the Hadoop Distributed File
    System (HDFS), which is at the very core of both Hadoop and any replication
    solution for Hadoop
Jagane Sundar
•   Co-founder of AltoStor, acquired by WANdisco
•   Was responsible for conceiving, architecting and managing the development
    of AltoScale’s Hadoop As A Service platform before selling it to VertiCloud
•   Visionary behind AltoStor’s Cloud and Big Data Storage Appliance
•   Former Director of Hadoop Engineering at Yahoo!, where he managed the
    development of Hadoop 0.20.204 with Disk Fail In Place
Complementary IP and skills
•   Ideal fit with our patented active-active replication technology
•   AltoStor's founders have faced the problems (scaling, performance and
    high availability) that we are planning to solve

                                                                                    7
• 24-by-7 Reliability, Availability, Scalability and Performance
• Planned as well as unplanned outages are extremely expensive
• Steep learning curve and dearth of trained specialists
• Many enterprises forced to rely on public cloud options such as Amazon
    •   Expensive hourly billing models
    •   Vendor lock-in with difficult migration paths
    •   Periodic availability and performance problems
    •   Data security concerns with cloud-based deployments
• Moving from batch model to real-time transaction model
• Our product suite will be designed to meet all of these challenges



                                                                           8
• Plug-and-play pre-packaged software eliminates the need for
  specialized Hadoop skills
• Wizard-based deployment, monitoring and management
• Supports migration from Amazon to private in-house clouds
• S3-enabled filesystem unique to WANdisco’s AltoStor appliance
   • Allows searches for any kind of data (images, videos, etc.) based
      on descriptive characteristics
• HBase support for real-time transaction processing




                                                                         9
Diagram: three NameNodes and HDFS Data




                       10
• Works over a LAN within a single data center
• Works over a WAN across data centers thousands of miles apart
• Supports simultaneous read and write access on every server

Diagram: two NameNodes and HDFS Data




                                                                 11
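Simultaneous read and write access on every server only works if every NameNode ends up with the same namespace. One common way to get there is for each server to apply the same operations in the same agreed order. The Python sketch below illustrates that idea only. It assumes the agreed order already exists (how it is agreed is not shown), and `NamespaceReplica` and the operation tuples are hypothetical names, not WANdisco or HDFS APIs.

```python
class NamespaceReplica:
    """Toy stand-in for one NameNode's namespace (hypothetical, not HDFS code)."""

    def __init__(self, name):
        self.name = name
        self.paths = set()   # paths that currently exist
        self.applied = 0     # number of agreed operations applied so far

    def apply(self, op):
        """Apply one agreed operation deterministically."""
        kind, path = op
        if kind == "create":
            self.paths.add(path)
        elif kind == "delete":
            self.paths.discard(path)
        else:
            raise ValueError(f"unknown operation: {kind}")
        self.applied += 1

    def catch_up(self, agreed_log):
        """Apply every agreed operation this replica has not seen yet."""
        for op in agreed_log[self.applied:]:
            self.apply(op)


# Writes accepted at different sites, already placed in one agreed order.
agreed_log = [
    ("create", "/data/a"),   # submitted via the first site
    ("create", "/data/b"),   # submitted via the second site
    ("delete", "/data/a"),   # submitted via the first site
]

site_1 = NamespaceReplica("site-1")
site_2 = NamespaceReplica("site-2")

site_1.catch_up(agreed_log)       # applies all three operations at once
site_2.catch_up(agreed_log[:2])   # a slower site sees only a prefix first
site_2.catch_up(agreed_log)       # then catches up with the rest

print(sorted(site_1.paths), sorted(site_2.paths))
```

Because both replicas apply the same sequence, they converge on the same namespace even though the second one lagged behind for a while.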
Architecture diagram (labels from the slide):
• Step 1: WANdisco AltoStor Appliance (Hadoop Mgmt Server)
    • Deploy Hadoop(s)
    • Manage Hadoop
    • Monitor Hadoop
• Step 2: Enterprise Active Directory, plus Physical (e.g. rack of Dell servers) or
  Virtual Infrastructure (e.g. VMware VI) for WANdisco AltoStor to use in building Hadoop
• Step 3: AltoStor Hadoop
• Interfaces: S3 API, HBase API, HDFS API, JobTracker
• Applications: Public Cloud S3 Apps for the private cloud (e.g. JungleDisk, SmugMug,
  Senduit, Zmanda, etc.), HBase Apps, Traditional Hadoop M/R Apps

                                                                                                         12
• WANdisco AltoStor appliance can be deployed on virtualized
  infrastructure such as VMware
• Advantages:
    • Extra level of reliability
    • Elasticity – live cluster shrinking and expansion
    • Extremely high hardware resource utilization
    • Resource isolation
    • Ease of management – preconfigured VMs




                                                                            13
Diagram: HDFS Clients connect to an HDFS Cluster made up of DataNodes, an
Active NameNode and a Standby NameNode; the two NameNodes use Shared Storage



                                                   14
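The active/standby arrangement in this diagram can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not Hadoop's actual HA code: `SharedEditLog`, `NameNode` and `fail_over` are hypothetical names, and the shared storage is modelled as a plain in-memory list of edits.

```python
class SharedEditLog:
    """Toy stand-in for the Shared Storage box in the diagram (hypothetical)."""

    def __init__(self):
        self.edits = []


class NameNode:
    def __init__(self, name, shared_log):
        self.name = name
        self.shared_log = shared_log
        self.namespace = set()
        self.replayed = 0      # how many shared edits this node has applied
        self.active = False

    def write(self, path):
        """Only the active node may record a namespace change."""
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and rejects writes")
        self.shared_log.edits.append(path)   # persist to shared storage first
        self.replay()                        # then apply locally

    def replay(self):
        """Apply any shared edits that this node has not applied yet."""
        for path in self.shared_log.edits[self.replayed:]:
            self.namespace.add(path)
        self.replayed = len(self.shared_log.edits)


def fail_over(failed, standby):
    """Promote the standby after it has replayed everything in shared storage."""
    failed.active = False
    standby.replay()
    standby.active = True


log = SharedEditLog()
nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
nn1.active = True

nn1.write("/a")
try:
    nn2.write("/b")            # the standby cannot take writes
    standby_rejected = False
except RuntimeError:
    standby_rejected = True

fail_over(nn1, nn2)
nn2.write("/b")                # now the former standby serves writes
print(standby_rejected, sorted(nn2.namespace))
```

In this sketch only one NameNode accepts writes at any moment, and both depend on the same shared edit log.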
Diagram: several Clients, each connected to its own Active NameNode; every
Active NameNode has a Proposal Handler and a Dispatch component, and all of
them are coordinated through WANdisco PAXOS


                                                                                                                         15
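The slide names Paxos but does not describe WANdisco's implementation. For orientation only, here is a minimal Python sketch of textbook single-decree Paxos, in which any site can propose and a value is chosen once a majority accepts it. It makes no claim to match the patented product: `Acceptor` and `propose` are hypothetical names, ballots are picked by hand, and message passing is replaced by direct method calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Acceptor:
    promised: int = -1                     # highest ballot promised so far
    accepted_ballot: int = -1              # ballot of the accepted value, if any
    accepted_value: Optional[str] = None

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots; report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(reachable: List[Acceptor], cluster_size: int,
            ballot: int, value: str) -> Optional[str]:
    """Run one proposal; return the chosen value, or None without a majority."""
    quorum = cluster_size // 2 + 1
    replies = [a.prepare(ballot) for a in reachable]
    promises = [(b, v) for ok, b, v in replies if ok]
    if len(promises) < quorum:
        return None
    # Safety rule: adopt the value accepted under the highest earlier ballot.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_ballot >= 0:
        value = prior_value
    acks = sum(a.accept(ballot, value) for a in reachable)
    return value if acks >= quorum else None


sites = [Acceptor() for _ in range(3)]

# A proposal reaches sites 0 and 1 only: 2 of 3 is still a majority.
first = propose(sites[:2], 3, ballot=1, value="mkdir /a")
# A later proposal via sites 1 and 2 must adopt the already chosen value.
second = propose(sites[1:], 3, ballot=2, value="mkdir /b")
# A single isolated site cannot decide anything on its own.
third = propose(sites[:1], 3, ballot=3, value="mkdir /c")

print(first, second, third)
```

No site plays a fixed coordinator role here: whichever site proposes needs only a majority, and a later proposer is forced to keep a value that was already chosen.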
Questions and Answers
Thank You


http://blogs.wandisco.com/author/jagane-sundar

Visit us: www.wandisco.com

Editor's Notes

  • #11: SPOFs (single points of failure): NameNode, HBase, YARN
  • #12: SPOFs (single points of failure): NameNode, HBase, YARN