Page 1 © Hortonworks Inc. 2014
Discover HDP 2.1
Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks. We do Hadoop.
Page 2 © Hortonworks Inc. 2014
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Rohit Bakhshi
Hortonworks Senior Product Manager & PM for Apache Hadoop & Apache Solr in Hortonworks Data Platform
Vinod Vavilapalli
Foundational Hadoop Architect, Hortonworks Engineer, PMC for Apache Hadoop & Leads YARN Development at Hortonworks
Page 3 © Hortonworks Inc. 2014
Agenda
•  Overview of YARN and HDFS
•  New YARN & HDFS Features in HDP 2.1
•  Q & A
Page 4 © Hortonworks Inc. 2014
A Modern Data Architecture
[Diagram]
•  APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
•  DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP) and ENTERPRISE HADOOP (Governance & Integration, Security, Operations, Data Access, Data Management)
•  SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
•  OPERATIONS TOOLS: Provision, Manage & Monitor
•  DEV & DATA TOOLS: Build & Test
Page 5 © Hortonworks Inc. 2014
HDP 2.1: Enterprise Hadoop
HDP 2.1 Hortonworks Data Platform
[Diagram]
•  GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
•  DATA ACCESS: Batch (MapReduce), Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), Others (In-Memory Analytics, ISV engines)
•  DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N
•  SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox)
•  OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
Page 6 © Hortonworks Inc. 2014
HDP 2.1: Data Management
HDP 2.1 Hortonworks Data Platform
[Diagram: the HDP 2.1 platform stack with its GOVERNANCE & INTEGRATION, DATA ACCESS, SECURITY and OPERATIONS layers]
•  DATA MANAGEMENT: YARN: Data Operating System; HDFS (Hadoop Distributed File System) across nodes 1 to N
Page 7 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 8 © Hortonworks Inc. 2014
Apache Hadoop YARN and HDFS
The Data Operating System for Hadoop 2.0
•  Flexible: Enables other purpose-built data processing models beyond MapReduce (batch), such as interactive and streaming
•  Efficient: Double processing IN Hadoop on the same hardware while providing predictable performance & quality of service
•  Shared: Provides a stable, reliable, secure foundation and shared operational services across multiple workloads
Data Processing Engines Run Natively IN Hadoop
•  BATCH: MapReduce
•  INTERACTIVE: Tez
•  STREAMING: Storm
•  IN-MEMORY: Spark
•  GRAPH: Giraph
•  SAS: LASR, HPA
•  ONLINE: HBase, Accumulo
•  OTHERS
YARN: Cluster Resource Management
HDFS: Redundant, Reliable Storage
Page 9 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 10 © Hortonworks Inc. 2014
HDP 2.1 HDFS: What’s New
HDFS Extended ACLs (Theme: Security)
•  Provides granular access control to datasets in HDFS
HTTPS Wire Encryption (Theme: Security)
•  swebhdfs: HTTPS support for WebHDFS
•  HTTPS support for Hadoop WebUI
HDFS DataNode Caching (Theme: Performance)
•  Enhanced read performance via in-memory caching of files
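To make "granular access control" concrete, here is a minimal sketch of how an extended ACL check can be evaluated. It follows the POSIX-style order that HDFS ACLs are modelled on (owner, named users, groups filtered by the mask, then other). The `Acl` class, the `check_access` function and the user and group names are all hypothetical and are not part of any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class Acl:
    """Simplified, illustrative model of one file's ACL (not a Hadoop class)."""
    owner: str
    group: str
    owner_perms: str                 # e.g. "rwx"
    group_perms: str                 # e.g. "r-x"
    other_perms: str                 # e.g. "---"
    mask: str = "rwx"                # upper bound for named users and all groups
    named_users: dict = field(default_factory=dict)
    named_groups: dict = field(default_factory=dict)


def _grants(perms, wanted):
    """True if every requested permission letter is present in perms."""
    return all(ch in perms for ch in wanted)


def _masked(perms, mask):
    """Keep only the permission characters that the mask also allows."""
    return "".join(ch for ch in perms if ch in mask)


def check_access(acl, user, groups, wanted):
    # 1. The owner is checked against the owner permissions only.
    if user == acl.owner:
        return _grants(acl.owner_perms, wanted)
    # 2. A named user entry applies next, filtered by the mask.
    if user in acl.named_users:
        return _grants(_masked(acl.named_users[user], acl.mask), wanted)
    # 3. The owning group and named groups apply, filtered by the mask.
    matching = []
    if acl.group in groups:
        matching.append(acl.group_perms)
    matching += [p for g, p in acl.named_groups.items() if g in groups]
    if matching:
        return any(_grants(_masked(p, acl.mask), wanted) for p in matching)
    # 4. Everyone else falls through to the "other" permissions.
    return _grants(acl.other_perms, wanted)
```

With this model, a named user entry such as `{"alice": "rw-"}` under a mask of `r-x` lets alice read but not write, which is the kind of per-user exception that plain owner/group/other permissions cannot express.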
  
Page 11 © Hortonworks Inc. 2014
HDFS Coordinated DataNode Caching
•  In-memory cache for HDFS files: enhanced read performance
•  Identify files to be cached through centralized management controls
•  Manage caching through pools and directives
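The pools and directives mentioned above are managed with the `hdfs cacheadmin` command. The sketch below only builds the command lines and does not run them; the pool name `reports` and the path `/data/reports/daily` are made-up examples, and the helper functions are hypothetical.

```python
import shlex


def add_pool_cmd(pool):
    """Command that creates a cache pool (a named group of cache directives)."""
    return ["hdfs", "cacheadmin", "-addPool", pool]


def add_directive_cmd(path, pool, replication=None):
    """Command that asks HDFS to cache a file or directory in the given pool."""
    cmd = ["hdfs", "cacheadmin", "-addDirective", "-path", path, "-pool", pool]
    if replication is not None:
        # Number of cached replicas to keep in DataNode memory.
        cmd += ["-replication", str(replication)]
    return cmd


def render(cmd):
    """Render an argument list as a shell-safe string for display."""
    return " ".join(shlex.quote(part) for part in cmd)


print(render(add_pool_cmd("reports")))
print(render(add_directive_cmd("/data/reports/daily", "reports", replication=2)))
```

On a host with an HDFS client installed, the same argument lists could be passed to `subprocess.run(cmd, check=True)`.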
Page 12 © Hortonworks Inc. 2014
HDP 2.1 YARN: What’s New
Resource Manager High Availability (Theme: Reliability)
•  No service disruption in YARN
Application Timeline Server (Theme: Monitoring)
•  Operational monitoring across all YARN applications
Capacity Scheduler Pre-emption (Theme: Scheduling)
•  Enforce SLAs across applications and organizations
Page 13 © Hortonworks Inc. 2014
YARN Resource Manager (RM) HA
Automated failover: HDP detects and reacts to Resource Manager host & process failures
Active/Standby: Standby ResourceManager with access to shared state store
Fencing: Protection against split brain
Full stack resiliency
- Entire HDP stack certified with ResourceManager HA
- RM Restart enables application recovery
Integrated into HDP stack
- No external HA frameworks
- No external storage needed
Page 14 © Hortonworks Inc. 2014
YARN RM HA: Architecture
[Diagram: Client, Active RM, Standby RM, ZooKeeper Service, and a Cluster of NodeManagers]
•  Active RM: monitor and maintain active lock; store state
•  Standby RM: monitor and try to take active lock
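The active-lock pattern in this architecture can be sketched as a small in-memory simulation. It assumes the lock and the state store behave like a single shared service, and it is a toy model, not a ZooKeeper client or YARN code. The class names `ActiveLock` and `ResourceManager` are hypothetical.

```python
class ActiveLock:
    """In-memory stand-in for the shared lock and state store."""

    def __init__(self):
        self.holder = None
        self.state_store = {}

    def try_take(self, rm_id):
        # The first RM to ask gets the lock; the holder keeps it on later calls.
        if self.holder is None:
            self.holder = rm_id
        return self.holder == rm_id

    def expire(self, rm_id):
        # Models the lock disappearing when its holder dies.
        if self.holder == rm_id:
            self.holder = None


class ResourceManager:
    def __init__(self, rm_id, lock):
        self.rm_id = rm_id
        self.lock = lock
        self.alive = True
        self.active = False

    def heartbeat(self):
        """Active RM maintains the lock; standby RM tries to take it."""
        if self.alive:
            self.active = self.lock.try_take(self.rm_id)
        return self.active

    def submit(self, app_id):
        # Only the lock holder may write state, which guards against split brain.
        if not self.active:
            raise RuntimeError(f"{self.rm_id} is standby and cannot write state")
        self.lock.state_store[app_id] = "RUNNING"

    def crash(self):
        self.alive = False
        self.active = False
        self.lock.expire(self.rm_id)

    def known_apps(self):
        """Applications recoverable from the shared state store."""
        return sorted(self.lock.state_store)


lock = ActiveLock()
rm1, rm2 = ResourceManager("rm1", lock), ResourceManager("rm2", lock)
rm1.heartbeat()          # rm1 becomes active
rm2.heartbeat()          # rm2 stays standby
rm1.submit("app_1")
rm1.crash()              # lock is released
rm2.heartbeat()          # rm2 takes over and sees app_1 in the shared state
```

Because application state lives in the shared store and not inside either RM process, the new active RM can recover running applications after failover.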
Page 15 © Hortonworks Inc. 2014
Application Timeline Server
Entity and Event collection: Applications of all types can create entities and send events
Pluggable store: Depending on site requirements
REST APIs: Applications and user-interfaces can access information via REST
Visualizations: Users can build tools and visualizations using the APIs
Users and Admins: Applications as well as the system entities/events
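As a rough illustration of the entity and event model, the sketch below builds the kind of JSON document an application might send to the Timeline Server over REST. The field names, the `/ws/v1/timeline` path and port 8188 reflect my recollection of the Timeline Server v1 REST API and should be checked against the Hadoop documentation for your version. The host name, entity type and event type are invented, and nothing is sent over the network.

```python
import json
import time


def build_timeline_put(entity_id, entity_type, event_type, info=None, now_ms=None):
    """Build a JSON body describing one entity with one event (assumed schema)."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    entity = {
        "entity": entity_id,          # assumed field name for the entity id
        "entitytype": entity_type,    # assumed field name for the entity type
        "starttime": ts,
        "events": [
            {"timestamp": ts, "eventtype": event_type, "eventinfo": info or {}}
        ],
    }
    return json.dumps({"entities": [entity]})


def timeline_url(host, port=8188):
    """Assumed endpoint layout; the host is supplied by the caller."""
    return f"http://{host}:{port}/ws/v1/timeline"


body = build_timeline_put("job_0001", "MY_APP_JOB", "JOB_STARTED",
                          info={"user": "alice"}, now_ms=1400000000000)
print(timeline_url("ats.example.com"))
print(body)
```

A monitoring client or visualization would read the same entities back through HTTP GET requests against the same service.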
Page 16 © Hortonworks Inc. 2014
Application Timeline Server
[Diagram: App Timeline Server, AMBARI, Custom App Monitoring Client]
Page 17 © Hortonworks Inc. 2014
Capacity Scheduler Preemption
•  Enforce SLAs
•  Preempt across queues
STEP 1: Gather queue state
1.  Current Capacity
2.  Guaranteed Capacity
3.  Pending Requests
STEP 2: Identify set of preemptions
1.  Figure out what is needed to achieve capacity balance
2.  Select applications to preempt: over-capacity queues and FIFO order
3.  Respect bounds on amount of preemption allowed for each round
STEP 3: Preempt application(s)
1.  Remove reservations from the most recently assigned app
2.  Issue preemptions for containers of same app (reverse chronological order, last assigned container first)
3.  App Master pre-emption is last resort.
STEP 4: Kill containers
1.  Track containers that have been issued but not yet executed preemption
2.  After a set of execution periods, forcibly kill these containers
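Steps 1 to 3 above can be sketched as a small selection routine. This is a simplified model under stated assumptions, not the Capacity Scheduler's actual code: capacities are plain integers, reservations and the step 4 kill timer are left out, and the names `Container` and `select_preemptions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    app: str
    assigned_at: int              # larger value = assigned more recently
    size: int                     # resource units held by the container
    is_app_master: bool = False


def select_preemptions(queues, max_per_round):
    """queues maps name -> {"guaranteed": int, "pending": int, "containers": [...]}"""
    # Step 1: gather current usage per queue.
    used = {q: sum(c.size for c in s["containers"]) for q, s in queues.items()}

    # Step 2: work out how much under-served queues still need,
    # then bound it by the amount allowed in one round.
    needed = sum(
        min(s["pending"], max(0, s["guaranteed"] - used[q]))
        for q, s in queues.items()
    )
    budget = min(needed, max_per_round)

    victims = []
    for q, s in queues.items():
        over = used[q] - s["guaranteed"]
        if over <= 0:
            continue  # only over-capacity queues give resources back
        # Step 3: newest containers first, application masters last.
        candidates = sorted(
            s["containers"], key=lambda c: (c.is_app_master, -c.assigned_at)
        )
        for c in candidates:
            if budget <= 0 or over <= 0:
                break
            if c.size > min(over, budget):
                continue  # taking it would overshoot the bound
            victims.append(c)
            over -= c.size
            budget -= c.size
    return victims


queues = {
    "prod": {"guaranteed": 6, "pending": 3,
             "containers": [Container("p1", 1, 2)]},
    "dev": {"guaranteed": 4, "pending": 0,
            "containers": [Container("d1", 1, 1, is_app_master=True),
                           Container("d1", 2, 2), Container("d1", 3, 2),
                           Container("d1", 4, 2), Container("d1", 5, 1)]},
}
chosen = select_preemptions(queues, max_per_round=3)
print([c.assigned_at for c in chosen])
```

In this example the `dev` queue is 4 units over its guarantee and `prod` needs 3, so the two most recently assigned `dev` containers are selected and the application master is left alone.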
  
Page 18 © Hortonworks Inc. 2014
Agenda
Overview Features Q & A
Page 19 © Hortonworks Inc. 2014
Learn More About the Hadoop Operating System
Hortonworks.com/labs/yarn/
Register for the remaining 3 Discover HDP 2.1 Webinars
Hortonworks.com/webinars
Next Webinar:
Apache Solr for Hadoop Search
Thursday, June 12, 10am Pacific
