Modern Data Architecture
…for Non-Stop Hadoop

© Hortonworks Inc. 2013

Your Presenters
• Jagane Sundar (@jagane)
–  CTO of Big Data at WANdisco
–  Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group
–  Managed Hadoop 0.20.204 release for Yahoo

• Rohit Bakhshi (@Rohit2b)
–  Product Management at Hortonworks
–  Focus on HDP Platform Services, Hadoop Core and Windows enablement
–  Enjoys live jazz and espresso

Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• WANdisco’s role in the MDA
• Q&A

Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DATA SYSTEM
–  REPOSITORIES: RDBMS, EDW, MPP
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Existing Data Architecture
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DATA SYSTEM
–  REPOSITORIES: RDBMS, EDW, MPP
• SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Data growth (Source: IDC)
• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020
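To put the two IDC volume figures on this slide (2.8 ZB in 2012, 40 ZB by 2020) on a per-year basis, the short calculation below derives the implied compound annual growth rate. It is only arithmetic on the numbers quoted on the slide; the variable names are illustrative and the steady-growth assumption is ours, not IDC's.

```python
# Implied compound annual growth from the two IDC figures quoted on the slide.
# Assumption: growth is treated as a constant yearly rate between the two points.
start_zb = 2.8            # zettabytes in 2012
end_zb = 40.0             # zettabytes projected for 2020
years = 2020 - 2012

growth_factor = end_zb / start_zb
cagr = growth_factor ** (1 / years) - 1

print(f"Overall growth: {growth_factor:.1f}x over {years} years")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under that assumption the data volume grows roughly fourteen-fold over the eight years, which is close to 40% per year.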
  

Modern Data Architecture Enabled
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DATA SYSTEM
–  REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
–  Existing Sources (CRM, ERP, Clickstream, Logs)
–  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

© Hortonworks Inc. 2013 - Confidential

Drivers of Hadoop Adoption
Architectural
A Modern Data Architecture

New Business Applications

Complement your existing data systems: the right workload in the right place

Types of Big Data
•  CRM, ERP
•  Server logs
•  Clickstream
•  Sentiment/Social
•  Machine/Sensor
•  Geo-locations

Opportunity in types of data
1.  Sentiment
Understand how your customers feel about your brand and products – right now

2.  Clickstream
Capture and analyze website visitors’ data trails and optimize your website

3.  Sensor/Machine
Discover patterns in data streaming automatically from remote sensors and machines

4.  Geographic
Analyze location-based data to manage operations where they occur

5.  Server Logs
Research logs to diagnose process failures and prevent security breaches

6.  Unstructured (text, video, pictures, etc.)
Understand patterns in files across millions of web pages, emails, and documents

(Slide graphic label: Value)

3 Requirements for Hadoop Adoption
Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics

Requirements for Enterprise Hadoop
1.  Key Services: Platform, operational and data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
3.  Integrated: Engineered with existing data center investments

HORTONWORKS DATA PLATFORM (HDP)
• OPERATIONAL SERVICES: AMBARI, FALCON*, OOZIE
• DATA SERVICES: FLUME, SQOOP, PIG, HIVE & HCATALOG, HBASE
• LOAD & EXTRACT: NFS, WebHDFS, KNOX*
• CORE: HDFS, YARN, MAP REDUCE, TEZ
• PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots)
• OS/VM, Cloud, Appliance
  
Requirements for Enterprise Hadoop
1.  Key Services: Platform, operational and data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
–  DEVELOP: COLLECT, PROCESS, BUILD
–  ANALYZE: EXPLORE, QUERY, DELIVER
–  OPERATE: PROVISION, MANAGE, MONITOR
3.  Integration: Engineered with existing data center investments

Familiar and Existing Tools
1.  Key Services: Platform, operational and data services essential for the enterprise
2.  Skills: Leverage your existing skills: development, analytics, operations
–  DEVELOP: COLLECT, PROCESS, BUILD
–  ANALYZE: EXPLORE, QUERY, DELIVER
–  OPERATE: PROVISION, MANAGE, MONITOR
–  Example tool shown: BusinessObjects BI
3.  Integration: Interoperable with existing data center investments

Requirements for Enterprise Hadoop
• APPLICATIONS: Custom Applications, Business Analytics, Packaged Applications
• DEV & DATA TOOLS: BUILD & TEST
• OPERATIONAL TOOLS: MANAGE & MONITOR
• DATA SYSTEM
–  REPOSITORIES: RDBMS, EDW, MPP
• SOURCES
–  Existing Sources (CRM, ERP, Clickstream, Logs)
–  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

3.  Integration: Engineered with existing data center investments
Integrated with
–  Applications: Business Intelligence, Developer IDEs, Data Integration
–  Systems: Data Systems & Storage, Systems Management
–  Platforms: Operating Systems, Virtualization, Cloud, Appliances

WANdisco in the Modern Data Architecture
• APPLICATIONS: BusinessObjects BI
• DEV & DATA TOOLS
• OPERATIONAL TOOLS
• DATA SYSTEM: RDBMS, EDW, HANA, MPP
• INFRASTRUCTURE
• SOURCES
–  Existing Sources (CRM, ERP, Clickstream, Logs)
–  Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
  

Non-Stop Hadoop for Hortonworks
•  Non-stop technology delivers continuous uptime with no data loss
•  One Hadoop cluster across data centers at any distance
•  Eliminates the bottleneck of a single active NameNode
•  Automatic backup, failover and recovery within and across data centers
•  LAN-speed read and write

Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• WANdisco’s role in the MDA
• Q&A

WANdisco Background
• WANdisco: Wide Area Network Distributed Computing
–  Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
• Leader in tools for software engineers – Subversion
–  Apache Software Foundation sponsor
• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
• US patent for active-active replication technology granted, November 2012
• Global locations
–  San Ramon (CA)
–  Chengdu (China)
–  Tokyo (Japan)
–  Boston (MA)
–  Sheffield (UK)
–  Belfast (UK)

© WANdisco 2013

Customers

WANdisco
• Overarching theme - We’re enabling global protection against:
–  Data loss
–  Downtime
–  Loss of Intellectual Property
–  Loss of revenue/time to market
–  Falling behind the competition

Non-Stop Hadoop
Extending HDFS across Data Centers
• Single HDFS that spans multiple Data Centers across the world
• Provides 100% Uptime for Hadoop
• Built as an extension on top of Apache Hadoop HDFS
• 100% HDFS / 100% compatibility with Hadoop applications – Applications run unmodified
• Applications can run in any Data Center
• Not Simple Mirroring or a Copy

WANdisco DConE
Distributed Coordination Engine
• WANdisco’s patented WAN-capable Paxos implementation
–  Mathematically proven
–  Provides distributed coordination of file system metadata
    •  Create, Modify, Delete
    •  Active-Active (All locations)
    •  Shared nothing (No Leader)
• No restrictions on distance between data centers
–  US Patent granted for time-independent implementation of Paxos
• Not based on SAN block device synchronization such as EMC SRDF
–  SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
–  Possible distribution of corrupted blocks
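The idea of coordinating file system metadata across sites can be illustrated with a small sketch. It is not DConE and does not implement Paxos: the agreement step is simply assumed to have produced one global order of Create, Modify and Delete operations, and the code only shows why every replica that applies that same order ends up with the same namespace. The class names, function name and data center names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataOp:
    kind: str    # "create", "modify" or "delete"
    path: str    # file system path the operation targets
    origin: str  # hypothetical name of the data center that proposed it


@dataclass
class NamespaceReplica:
    name: str
    files: dict = field(default_factory=dict)  # path -> number of modifications

    def apply(self, op: MetadataOp) -> None:
        if op.kind == "create":
            self.files.setdefault(op.path, 0)
        elif op.kind == "modify":
            if op.path in self.files:
                self.files[op.path] += 1
        elif op.kind == "delete":
            self.files.pop(op.path, None)
        else:
            raise ValueError(f"unknown operation: {op.kind}")


def has_majority(reachable: int, total: int) -> bool:
    """Classic Paxos needs a majority of participants to reach agreement."""
    return reachable > total // 2


# Assumed output of the agreement step: one global order, whatever site proposed each op.
agreed_order = [
    MetadataOp("create", "/logs/a", "san_ramon"),
    MetadataOp("modify", "/logs/a", "tokyo"),
    MetadataOp("delete", "/logs/a", "san_ramon"),
    MetadataOp("create", "/logs/b", "sheffield"),
]

# Every replica applies the agreed order locally (active-active, no leader replica).
replicas = [NamespaceReplica(n) for n in ("san_ramon", "sheffield", "tokyo")]
for replica in replicas:
    for op in agreed_order:
        replica.apply(op)

# Counter-example: the same operations applied in a different order diverge.
out_of_order = NamespaceReplica("out_of_order")
for op in reversed(agreed_order):
    out_of_order.apply(op)

print(all(r.files == replicas[0].files for r in replicas))  # True
print(out_of_order.files == replicas[0].files)              # False
print(has_majority(2, 3), has_majority(1, 3))                # True False
```

The counter-example is the point of the sketch: without an agreed order, a delete and a create for the same path can be applied in opposite orders at two sites and leave them with different namespaces.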

Apache Hadoop
(Slides 22 to 25: diagrams only)

Non-Stop Hadoop over WAN
Continuous availability
(Slides 26 to 29: diagrams only)

Non-Stop Hadoop over WAN
Unlimited performance and scalability
(Slide 30: diagram only)

Non-Stop Hadoop over WAN
Automated failover and recovery
(Slides 31 to 38: diagrams only)

Non-Stop Hadoop
• Architecture
–  Non-Intrusive - Not Simple Mirroring or a Copy
–  Does not modify Apache Hadoop
–  Runs on HDP 2 and later
• Provides 100% Uptime for Hadoop
–  Provides Continuous Availability of HDFS Data
–  Guarantees 100% Uptime of HDFS During all 4 Categories of Failures
• Enables HDFS to be Deployed Globally – Across the WAN
–  Extends HDFS Across Multiple Data Centers
–  Unifies the HDFS Namespace
–  Exceeds Business Continuity Requirements for SLAs and Compliance
• Load Balances NameNode Traffic for Increased Scalability

DEMO

Use Cases for Non-Stop Hadoop with Hortonworks
• Disaster Recovery
–  Data is as current as possible (no periodic synchronizations)
–  Virtually zero downtime to recover from regional data center failure
–  Regulatory compliance
• Load Balancing
• Multi Data Center Ingest
–  Information doesn’t need to be sent to one DC and then copied back to the other using DistCp
–  Parallel ingest methods don’t require redirected data streams
• Global MapReduce
–  Global Click Stream Analysis
–  Global Log Analysis
–  Etc.
• Maximize Resource Utilization
–  All data centers can be used to run different jobs concurrently

Key Takeaways
Non-Stop Hadoop for Hortonworks
• Non-Stop Hadoop makes Hadoop Enterprise/Production Ready
• Load balancing eliminates the bottleneck of a single NameNode
• Active-Active replication solves the Hadoop high availability issue
• No job restarts or lost time for NameNode failures (Continuous Availability)
• Single HDFS across multiple data centers
–  No out of sync issues
–  No Load Balancer maintenance problems
• Data Centers can be located at any distance from each other
• If any Data Center fails, applications can be run on any other replicated Data Center
• If a Data Center is completely lost, any other replica of that Data Center can be used to restore it

Next Steps:
More about Non-Stop Hadoop for Hortonworks
http://www.wandisco.com/hadoop/non-stop-hadoophortonworks

Get started on Hadoop with Hortonworks Sandbox
http://hortonworks.com/hadoop-tutorial/

Try Non-Stop Hadoop for Hortonworks
Contact us: WANdisco@hortonworks.com

© Hortonworks Inc. 2013

Page 43

More Related Content

PPTX
Backup and Disaster Recovery in Hadoop
PPTX
Top Hadoop Big Data Interview Questions and Answers for Fresher
PPTX
A New "Sparkitecture" for modernizing your data warehouse
PPTX
Evolving HDFS to a Generalized Storage Subsystem
PPTX
Big data Hadoop
PPTX
Data warehousing with Hadoop
PDF
Combine SAS High-Performance Capabilities with Hadoop YARN
PPTX
Hadoop crash course workshop at Hadoop Summit
Backup and Disaster Recovery in Hadoop
Top Hadoop Big Data Interview Questions and Answers for Fresher
A New "Sparkitecture" for modernizing your data warehouse
Evolving HDFS to a Generalized Storage Subsystem
Big data Hadoop
Data warehousing with Hadoop
Combine SAS High-Performance Capabilities with Hadoop YARN
Hadoop crash course workshop at Hadoop Summit

What's hot (20)

PDF
Building a Hadoop Data Warehouse with Impala
PPTX
Hadoop in the Cloud - The what, why and how from the experts
PPT
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
PPTX
Deep Learning using Spark and DL4J for fun and profit
PPTX
Mutable Data in Hive's Immutable World
PDF
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
PPTX
Hadoop and Hive in Enterprises
PPTX
Introduction to Hadoop - The Essentials
PDF
Big Data Architecture Workshop - Vahid Amiri
PPTX
Introduction to Hadoop
PPTX
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
PPT
The Time Has Come for Big-Data-as-a-Service
PDF
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
PDF
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
PDF
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
PPTX
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
PPTX
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
PPTX
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
PPTX
Big Data in the Cloud - The What, Why and How from the Experts
PPTX
Hortonworks Yarn Code Walk Through January 2014
Building a Hadoop Data Warehouse with Impala
Hadoop in the Cloud - The what, why and how from the experts
Best Practices for Deploying Hadoop (BigInsights) in the Cloud
Deep Learning using Spark and DL4J for fun and profit
Mutable Data in Hive's Immutable World
Hortonworks Technical Workshop: Real Time Monitoring with Apache Hadoop
Hadoop and Hive in Enterprises
Introduction to Hadoop - The Essentials
Big Data Architecture Workshop - Vahid Amiri
Introduction to Hadoop
20140228 - Singapore - BDAS - Ensuring Hadoop Production Success
The Time Has Come for Big-Data-as-a-Service
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Falcon for Data Governance in Hadoop
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
YARN: the Key to overcoming the challenges of broad-based Hadoop Adoption
Improving Hadoop Resiliency and Operational Efficiency with EMC Isilon
Big Data in the Cloud - The What, Why and How from the Experts
Hortonworks Yarn Code Walk Through January 2014
Ad

Viewers also liked (20)

PPTX
Hadoop Backup and Disaster Recovery
PPT
Disaster Recovery & Data Backup Strategies
PPTX
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
PDF
Hadoop disaster recovery
PDF
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
PPTX
Selective Data Replication with Geographically Distributed Hadoop
PPTX
Hadoop Operations - Best Practices from the Field
PPT
Disaster Recovery Plan for IT
PPTX
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
PDF
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
PPTX
Biokimia
PPTX
2012 06 hortonworks paris hug
PPTX
Hive data migration (export/import)
PPTX
What the Enterprise Requires - Business Continuity and Visibility
PDF
Integrating Docker with Mesos and Marathon
PPT
Distcp
PPTX
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
PPTX
Hadoop and WANdisco: The Future of Big Data
PDF
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
PPTX
HBase Snapshots
Hadoop Backup and Disaster Recovery
Disaster Recovery & Data Backup Strategies
Solving Hadoop Replication Challenges with an Active-Active Paxos Algorithm
Hadoop disaster recovery
Discover HDP 2.2: Apache Falcon for Hadoop Data Governance
Selective Data Replication with Geographically Distributed Hadoop
Hadoop Operations - Best Practices from the Field
Disaster Recovery Plan for IT
Hive, Impala, and Spark, Oh My: SQL-on-Hadoop in Cloudera 5.5
Modern Data Architecture for a Data Lake with Informatica and Hortonworks Dat...
Biokimia
2012 06 hortonworks paris hug
Hive data migration (export/import)
What the Enterprise Requires - Business Continuity and Visibility
Integrating Docker with Mesos and Marathon
Distcp
03.13.13 WANDisco SVN Training: Advanced Branching & Merging
Hadoop and WANdisco: The Future of Big Data
WANdisco Non-Stop Hadoop: PHXDataConference Presentation Oct 2014
HBase Snapshots
Ad

Similar to Non-Stop Hadoop for Hortonworks (20)

PDF
Modern Data Architecture: In-Memory with Hadoop - the new BI
PDF
Hortonworks kognitio webinar 10 dec 2013
PDF
Apache Hadoop on the Open Cloud
PDF
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
PDF
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
PDF
Building a Modern Data Architecture with Enterprise Hadoop
PDF
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
PPTX
Hortonworks Oracle Big Data Integration
PDF
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
PDF
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
PPTX
Yahoo! Hack Europe
PPTX
Munich HUG 21.11.2013
PPTX
PPTX
Hortonworks.bdb
PPTX
Hadoop Reporting and Analysis - Jaspersoft
PDF
Enterprise Apache Hadoop: State of the Union
PDF
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
PDF
Teradata - Presentation at Hortonworks Booth - Strata 2014
PDF
Supporting Financial Services with a More Flexible Approach to Big Data
PDF
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Modern Data Architecture: In-Memory with Hadoop - the new BI
Hortonworks kognitio webinar 10 dec 2013
Apache Hadoop on the Open Cloud
The Modern Data Architecture for Advanced Business Intelligence with Hortonwo...
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
Building a Modern Data Architecture with Enterprise Hadoop
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Hortonworks Oracle Big Data Integration
Web Briefing: Unlock the power of Hadoop to enable interactive analytics
Apache Hadoop and its role in Big Data architecture - Himanshu Bari
Yahoo! Hack Europe
Munich HUG 21.11.2013
Hortonworks.bdb
Hadoop Reporting and Analysis - Jaspersoft
Enterprise Apache Hadoop: State of the Union
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Teradata - Presentation at Hortonworks Booth - Strata 2014
Supporting Financial Services with a More Flexible Approach to Big Data
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...

More from Hortonworks (20)

PDF
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
PDF
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
PDF
Getting the Most Out of Your Data in the Cloud with Cloudbreak
PDF
Johns Hopkins - Using Hadoop to Secure Access Log Events
PDF
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
PDF
HDF 3.2 - What's New
PPTX
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
PDF
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
PDF
IBM+Hortonworks = Transformation of the Big Data Landscape
PDF
Premier Inside-Out: Apache Druid
PDF
Accelerating Data Science and Real Time Analytics at Scale
PDF
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
PDF
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
PDF
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
PDF
Making Enterprise Big Data Small with Ease
PDF
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
PDF
Driving Digital Transformation Through Global Data Management
PPTX
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
PDF
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
PDF
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Johns Hopkins - Using Hadoop to Secure Access Log Events
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
HDF 3.2 - What's New
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
IBM+Hortonworks = Transformation of the Big Data Landscape
Premier Inside-Out: Apache Druid
Accelerating Data Science and Real Time Analytics at Scale
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Making Enterprise Big Data Small with Ease
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Driving Digital Transformation Through Global Data Management
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Unlock Value from Big Data with Apache NiFi and Streaming CDC

Recently uploaded (20)

PPTX
Configure Apache Mutual Authentication
PDF
Comparative analysis of machine learning models for fake news detection in so...
PDF
Enhancing plagiarism detection using data pre-processing and machine learning...
PPTX
Microsoft Excel 365/2024 Beginner's training
PDF
CloudStack 4.21: First Look Webinar slides
PDF
Five Habits of High-Impact Board Members
PPT
What is a Computer? Input Devices /output devices
PDF
UiPath Agentic Automation session 1: RPA to Agents
PDF
OpenACC and Open Hackathons Monthly Highlights July 2025
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
PDF
sustainability-14-14877-v2.pddhzftheheeeee
PDF
Zenith AI: Advanced Artificial Intelligence
PPTX
The various Industrial Revolutions .pptx
PPTX
Custom Battery Pack Design Considerations for Performance and Safety
PDF
STKI Israel Market Study 2025 version august
PDF
Developing a website for English-speaking practice to English as a foreign la...
PDF
Consumable AI The What, Why & How for Small Teams.pdf
PPTX
Benefits of Physical activity for teenagers.pptx
PPT
Module 1.ppt Iot fundamentals and Architecture
Configure Apache Mutual Authentication
Comparative analysis of machine learning models for fake news detection in so...
Enhancing plagiarism detection using data pre-processing and machine learning...
Microsoft Excel 365/2024 Beginner's training
CloudStack 4.21: First Look Webinar slides
Five Habits of High-Impact Board Members
What is a Computer? Input Devices /output devices
UiPath Agentic Automation session 1: RPA to Agents
OpenACC and Open Hackathons Monthly Highlights July 2025
The influence of sentiment analysis in enhancing early warning system model f...
How ambidextrous entrepreneurial leaders react to the artificial intelligence...
sustainability-14-14877-v2.pddhzftheheeeee
Zenith AI: Advanced Artificial Intelligence
The various Industrial Revolutions .pptx
Custom Battery Pack Design Considerations for Performance and Safety
STKI Israel Market Study 2025 version august
Developing a website for English-speaking practice to English as a foreign la...
Consumable AI The What, Why & How for Small Teams.pdf
Benefits of Physical activity for teenagers.pptx
Module 1.ppt Iot fundamentals and Architecture

Non-Stop Hadoop for Hortonworks

  • 1. Modern Data Architecture …for Non-Stop Hadoop © Hortonworks Inc. 2013 Page 1
  • 2. Your Presenters • Jagane Sundar (@jagane) – CTO of Big Data at WANdisco –  Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group –  Managed Hadoop 0.20.204 release for Yahoo • Rohit Bakhshi (@Rohit2b) – Product Management at Hortonworks –  Focus on HDP Platform Services, Hadoop Core and Windows enablement –  Enjoy live jazz and expresso © Hortonworks Inc. 2013 Page 2
  • 3. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop in the MDA • WANdisco’s role in the MDA • Q&A © Hortonworks Inc. 2013 Page 3
  • 4. APPLICATIONS   Existing Data Architecture Custom   Applica4ons   Business     Analy4cs   Packaged   Applica4ons   DEV  &  DATA   TOOLS   SOURCES   DATA    SYSTEM   BUILD  &   TEST   OPERATIONAL   TOOLS   RDBMS   EDW   MPP   MANAGE  &   MONITOR   REPOSITORIES   Exis4ng  Sources     (CRM,  ERP,  Clickstream,  Logs)   © Hortonworks Inc. 2013 Page 4
  • 5. APPLICATIONS   Existing Data Architecture Custom   Applica4ons   Business     Analy4cs   Packaged   Applica4ons   DATA    SYSTEM   2.8  ZB  in  2012   85%  from  New  Data  Types   RDBMS   EDW   MPP   REPOSITORIES   15x  Machine  Data  by  2020   40  ZB  by  2020   SOURCES   Source: IDC Exis4ng  Sources     (CRM,  ERP,  Clickstream,  Logs)   © Hortonworks Inc. 2013 Page 5
  • 6. APPLICATIONS   Modern Data Architecture Enabled Custom   Applica4ons   Business     Analy4cs   Packaged   Applica4ons   DEV  &  DATA   TOOLS   SOURCES   DATA    SYSTEM   BUILD  &   TEST   OPERATIONAL   TOOLS   RDBMS   EDW   MANAGE  &   MONITOR   MPP   REPOSITORIES   Exis4ng  Sources     (CRM,  ERP,  Clickstream,  Logs)   © Hortonworks Inc. 2013 - Confidential Emerging  Sources     (Sensor,  Sen4ment,  Geo,  Unstructured)   Page 6
  • 7. Drivers of Hadoop Adoption Architectural A Modern Data Architecture New Business Applications Complement your existing data systems: the right workload in the right place Types of Big Data •  CRM, ERP •  Server log •  Clickstream •  Sentiment/Social •  Machine/Sensor •  Geo-locations © Hortonworks Inc. 2013 - Confidential Page 7
  • 8. Opportunity in types of data 1.  Sentiment Understand how your customers feel about your brand and products – right now 2.  Clickstream Capture and analyze website visitors’ data trails and optimize your website 3.  Sensor/Machine Discover patterns in data streaming automatically from remote sensors and machines 4.  Geographic Value Analyze location-based data to manage operations where they occur 5.  Server Logs Research logs to diagnose process failures and prevent security breaches 6.  Unstructured (txt, video, pictures, etc..) Understand patterns in files across millions of web pages, emails, and documents © Hortonworks Inc. 2013 - Confidential Page 8
  • 9. 3 Requirements for Hadoop Adoption Requirements for Hadoop’s Role in the Modern Data Architecture Integrated Interoperable with existing data center investments Key Services Skills Platform, operational and data services essential for the enterprise Leverage your existing skills: development, operations, analytics © Hortonworks Inc. 2013 - Confidential Page 9
  • 10. Requirements for Enterprise Hadoop 1 2 3 Key Services Platform, Operational and Data services essential for the enterprise OPERATIONAL   SERVICES   AMBARI   HBASE   CORE   PIG   SQOOP   LOAD  &     EXTRACT   Skills     PLATFORM     SERVICES   Integrated MAP     REDUCE     NFS   TEZ   YARN       WebHDFS   KNOX*   HIVE  &   HCATALOG   HDFS   Enterprise Readiness High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots HORTONWORKS     DATA  PLATFORM  (HDP)   Engineered with existing data center investments OS/VM   © Hortonworks Inc. 2013 - Confidential FLUME   FALCON*   OOZIE   Leverage your existing skills: development, analytics, operations DATA   SERVICES   Cloud   Appliance   Page 10
  • 11. Requirements for Enterprise Hadoop 3 Leverage your existing skills: development, analytics, operations Integration DEVELOP   ANALYZE   2 Skills Platform, operational and data services essential for the enterprise OPERATE   1 Key Services COLLECT   PROCESS   BUILD   EXPLORE   QUERY   DELIVER   PROVISION   MANAGE   MONITOR   Engineered with existing data center investments © Hortonworks Inc. 2013 - Confidential Page 11
  • 12. Familiar and Existing Tools 3 Leverage your existing skills: development, analytics, operations Integration DEVELOP   ANALYZE   2 Skills Platform, operational and data services essential for the enterprise OPERATE   1 Key Services COLLECT   PROCESS   BUILD   EXPLORE   QUERY   DELIVER   BusinessObjects BI PROVISION   MANAGE   MONITOR   Interoperable with existing data center investments © Hortonworks Inc. 2013 - Confidential Page 12
  • 13. APPLICATIONS   Requirements for Enterprise Hadoop Custom   Applica4ons   Business     Analy4cs   Packaged   Applica4ons   Integrated with DEV  &  DATA   TOOLS   Applications BUILD  &   DATA    SYSTEM   Business Intelligence, TEST   Developer IDEs, Data Integration SOURCES   3 OPERATIONAL   TOOLS   RDBMS   EDW   MANAGE  &   Systems MONITOR   MPP   Data Systems & Storage, Systems Management REPOSITORIES   Platforms Integration   Exis4ng  Sources   Engineered with Lexisting (CRM,  ERP,  Clickstream,   ogs)   data center investments © Hortonworks Inc. 2013 - Confidential Emerging  Sources     (Sensor,  Sen4ment,  Geo,  Unstructured)   Operating Systems, Virtualization, Cloud, Appliances Page 13
  • 14. DATA  SYSTEM   APPLICATIONS   WANdisco in the Modern Data Architecture BusinessObjects BI DEV  &  DATA  TOOLS   OPERATIONAL  TOOLS   RDBMS   EDW   HANA MPP   SOURCES   INFRASTRUCTURE   Exis4ng  Sources     (CRM,  ERP,  Clickstream,  Logs)   © Hortonworks Inc. 2013 - Confidential Emerging  Sources     (Sensor,  Sen4ment,  Geo,  Unstructured)   Page 14
  • 15. Non-Stop Hadoop for Hortonworks •  Non-stop technology delivers continuous uptime with no data loss •  One Hadoop cluster across data centers any distance •  Eliminates the bottleneck of a single active NameNode •  Automatic backup, failover and recovery within across data centers •  LAN-speed read and write © Hortonworks Inc. 2013 - Confidential Page 15
  • 16. Today’s Topics • Introduction • Drivers for the Modern Data Architecture (MDA) • Apache Hadoop’s role in the MDA • WANdisco’s role in the MDA • Q&A © Hortonworks Inc. 2013 Page 16
  • 17. WANdisco Background u  WANdisco: Wide Area Network Distributed Computing –  Enterprise ready, high availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability u  Leader in tools for software engineers – Subversion –  Apache Software Foundation sponsor u  Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND) u  US patented active-active replication technology granted, November 2012 u  Global locations –  San Ramon (CA) –  Chengdu (China) –  Tokyo (Japan) –  Boston (MA) –  Sheffield (UK) –  Belfast (UK) © WANdisco 2013 / page 17
  • 19. WANdisco u  Overarching theme - We’re enabling global protection against: •  Data loss •  Downtime •  Loss of Intellectual Property •  Loss of revenue/time to market •  Falling behind the competition © WANdisco 2013
  • 20. Non-Stop Hadoop: Extending HDFS across Data Centers
      • Single HDFS that spans multiple data centers across the world
      • Provides 100% uptime for Hadoop
      • Built as an extension on top of Apache Hadoop HDFS
      • 100% compatible with HDFS and with Hadoop applications
          – Applications run unmodified
      • Applications can run in any data center
      • Not simple mirroring or a copy
  • 21. WANdisco DConE: Distributed Coordination Engine
      • WANdisco’s patented, WAN-capable Paxos implementation
          – Mathematically proven
          – Provides distributed coordination of file system metadata (create, modify, delete)
      • Active-active (all locations)
      • Shared nothing (no leader)
      • No restrictions on distance between data centers
          – US patent granted for time-independent implementation of Paxos
      • Not based on SAN block device synchronization such as EMC SRDF
          – SAN block replication has distance limits because file systems such as NTFS and ext4 cannot tolerate long round-trip times (RTTs) to block storage
          – SAN block replication can also distribute corrupted blocks
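
The slide names Paxos as the basis of DConE but does not describe its internals. As a rough illustration only, the following is a minimal sketch of textbook single-decree Paxos, run in memory with three acceptors standing in for three data centers. It is not WANdisco's implementation: the `Acceptor` class, the `propose` function, and the example operation strings are hypothetical names chosen for this sketch, and real deployments add networking, retries, durable storage, and a sequence of decisions rather than a single one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One voting node (hypothetically, one data center)."""
    promised: int = -1                     # highest proposal number promised
    accepted_n: int = -1                   # number of the accepted proposal, if any
    accepted_value: Optional[str] = None   # value of the accepted proposal, if any

    def prepare(self, n: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1b: promise to ignore proposals numbered below n and
        # report any proposal this acceptor has already accepted.
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n: int, value: str) -> bool:
        # Phase 2b: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: str) -> Optional[str]:
    """Run one proposal round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1

    # Phase 1a: ask every acceptor for a promise.
    replies = [a.prepare(n) for a in acceptors]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, the proposer must adopt the
    # value of the highest-numbered accepted proposal instead of its own.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0 and prior_value is not None:
        value = prior_value

    # Phase 2a: ask every acceptor to accept the (possibly adopted) value.
    acks = sum(1 for a in acceptors if a.accept(n, value))
    return value if acks >= majority else None


data_centers = [Acceptor() for _ in range(3)]
first = propose(data_centers, 1, "create /logs/a")
second = propose(data_centers, 2, "delete /logs/a")
stale = propose(data_centers, 1, "create /tmp/x")
```

In this run the first proposal is chosen, the second proposer is forced to adopt the already chosen value rather than its own, and the third round fails because its proposal number is stale. No node acts as a fixed leader: any node may call `propose`, which is the property the slide's "no leader" bullet points at.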
  • 22. Apache Hadoop
  • 23. Apache Hadoop
  • 24. Apache Hadoop
  • 25. Apache Hadoop
  • 26. Non-Stop Hadoop over WAN: Continuous availability
  • 27. Non-Stop Hadoop over WAN: Continuous availability
  • 28. Non-Stop Hadoop over WAN: Continuous availability
  • 29. Non-Stop Hadoop over WAN: Continuous availability
  • 30. Non-Stop Hadoop over WAN: Unlimited performance and scalability
  • 31. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 32. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 33. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 34. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 35. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 36. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 37. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 38. Non-Stop Hadoop over WAN: Automated failover and recovery
  • 39. Non-Stop Hadoop
      • Architecture
          – Non-intrusive: not simple mirroring or a copy
          – Does not modify Apache Hadoop
          – Runs on HDP 2 and later
      • Provides 100% uptime for Hadoop
          – Provides continuous availability of HDFS data
          – Guarantees 100% uptime of HDFS during all 4 categories of failures
      • Enables HDFS to be deployed globally, across the WAN
          – Extends HDFS across multiple data centers
          – Unifies the HDFS namespace
          – Exceeds business continuity requirements for SLAs and compliance
      • Load balances NameNode traffic for increased scalability
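
One common way to picture a unified namespace across data centers is as a replicated state machine: every site applies the same agreed sequence of metadata operations, so every site ends up with the same namespace. The sketch below illustrates that general idea only. The slides do not say how Non-Stop Hadoop does this internally, and `NamespaceReplica`, `catch_up`, the `(kind, path, payload)` record and the sample paths are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical operation record: (kind, path, payload).
Op = Tuple[str, str, str]


class NamespaceReplica:
    """A toy copy of the file system namespace held at one data center."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, str] = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, op: Op) -> None:
        # Each operation is deterministic, so replicas that apply the
        # same operations in the same order reach the same state.
        kind, path, payload = op
        if kind == "create":
            self.files.setdefault(path, payload)
        elif kind == "modify":
            if path in self.files:
                self.files[path] = payload
        elif kind == "delete":
            self.files.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {kind}")
        self.applied += 1

    def catch_up(self, log: List[Op]) -> None:
        # Apply only the entries this replica has not seen yet.
        for op in log[self.applied:]:
            self.apply(op)


# The agreed order of operations (assumed to come from a coordination
# layer such as a Paxos-style engine; here it is just a list).
agreed_log: List[Op] = [
    ("create", "/logs/a", "v1"),
    ("create", "/logs/b", "v1"),
    ("modify", "/logs/a", "v2"),
    ("delete", "/logs/b", ""),
]

dc1 = NamespaceReplica("dc1")
dc1.catch_up(agreed_log)

dc2 = NamespaceReplica("dc2")
dc2.catch_up(agreed_log[:2])   # dc2 is cut off after two operations
dc2.catch_up(agreed_log)       # and later catches up on the rest

dc3 = NamespaceReplica("dc3")  # a site rebuilt from scratch
dc3.catch_up(agreed_log)
```

A site that was unreachable for a while (`dc2`) and a site rebuilt from nothing (`dc3`) both converge on the same namespace as `dc1`, because recovery is simply replaying the agreed log from the last applied position.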
  • 41. Use Cases for Non-Stop Hadoop with Hortonworks
      • Disaster recovery
          – Data is as current as possible (no periodic synchronizations)
          – Virtually zero downtime to recover from regional data center failure
          – Regulatory compliance
      • Load balancing
      • Multi data center ingest
          – Information doesn’t need to be sent to one data center and then copied back to the other using DistCp
          – Parallel ingest methods don’t require redirected data streams
      • Global MapReduce
          – Global click stream analysis
          – Global log analysis
          – Etc.
      • Maximize resource utilization
          – All data centers can be used to run different jobs concurrently
  • 42. Key Takeaways: Non-Stop Hadoop for Hortonworks
      • Non-Stop Hadoop makes Hadoop enterprise/production ready
      • Load balancing eliminates the bottleneck of a single NameNode
      • Active-active replication solves the Hadoop high availability issue
      • No job restarts or lost time for NameNode failures (continuous availability)
      • Single HDFS across multiple data centers
          – No out-of-sync issues
          – No load balancer maintenance problems
      • Data centers can be located at any distance from each other
      • If any data center fails, applications can be run on any other replicated data center
      • If a data center is completely lost, any other replica of that data center can be used to restore it
  • 43. Next Steps
      • More about Non-Stop Hadoop for Hortonworks: http://www.wandisco.com/hadoop/non-stop-hadoophortonworks
      • Get started on Hadoop with Hortonworks Sandbox: http://hortonworks.com/hadoop-tutorial/
      • Try Non-Stop Hadoop for Hortonworks
      • Contact us: WANdisco@hortonworks.com