A modern, flexible approach to Hadoop implementation
HPE Big Data Reference Architecture
Gilles Noisette
HPE EMEA Big Data Center Of Excellence
November 2015
Agenda
• Big Data IT infrastructure trends
• Hadoop Evolution & Architecture trends
• Hadoop YARN Labelling
• Hadoop Storage Tiering
• New HPE Architecture approach to Big Data
• HPE Big Data Reference Architecture
• Scaling Hadoop more efficiently
• HPE BDRA Components
• HPE BDRA in a virtualized context
• HPE Big Data Architecture long term view
IT infrastructures must evolve to handle Big Data demands
Challenges
• Multiple silos with multiple copies of the same data
• Difficult to standardize on a consistent server architecture
• Less elastic than other virtualized or converged infrastructure
• Large scale makes density, cost and power problematic
The Analytic Cycle
The Pace of Change
And how people buy Hadoop is changing too…
Hadoop YARN Labelling
Running applications on a particular set of nodes
YARN Labelling (Node-labels / Hadoop 2.6 / JIRA YARN-796)
Capability to create groups of similar nodes so that each type of application, with its own workload profile, runs on the most appropriate group of nodes
• Admin tags nodes with labels (e.g.: GPU, Storm)
− One node can have more than one label (e.g.: GPU, m710)
• Applications can include labels in container requests
Enabling the next Generation of Hadoop Applications . . .
[Diagram: an Application Master requests "I want a GPU"; three NodeManagers carry the labels [Storm], [GPU, m710] (HPE Moonshot cartridge) and [Analytic, XL170r] (HPE Apollo blades)]
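The label matching described on this slide can be sketched as a toy model. This is only an illustration of the idea, not YARN's scheduler or API: the `Node`, `ContainerRequest` and `eligible_nodes` names and the node names are hypothetical, and treating an unlabelled request as runnable anywhere is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Node:
    """A NodeManager host with the labels an admin has tagged it with."""
    name: str
    labels: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class ContainerRequest:
    """A container request that may carry one node label."""
    app: str
    label: Optional[str] = None


def eligible_nodes(nodes: List[Node], request: ContainerRequest) -> List[str]:
    """Return the names of nodes whose labels satisfy the request.

    Assumption of this sketch: a request without a label may run anywhere.
    """
    return [
        n.name
        for n in nodes
        if request.label is None or request.label in n.labels
    ]


# Hypothetical node names; the labels are the ones shown on the slide.
NODES = [
    Node("nm-storm", frozenset({"Storm"})),
    Node("nm-moonshot", frozenset({"GPU", "m710"})),
    Node("nm-apollo", frozenset({"Analytic", "XL170r"})),
]

if __name__ == "__main__":
    # The Application Master asks for a GPU.
    print(eligible_nodes(NODES, ContainerRequest("demo-app", "GPU")))
```

In this model the "I want a GPU" request can only land on the node tagged [GPU, m710], which is the behaviour the diagram illustrates.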
YARN Labels are used in production
YARN Labelling case studies
Vinod Vavilapalli – @Tshooter
Yahoo! uses machines with GPUs on #Hadoop clusters (#YARN) to model
'beautiful' images on Flickr. #hadoopsummit
1:43 AM - 16 Apr 2015
Vinod Vavilapalli – @Tshooter
.@pcnudde talking about #Yahoo using custom #Hadoop #YARN apps together
with Node labels / High CPU machines for learning. #hadoopsummit
1:49 AM - 16 Apr 2015
Yahoo uses YARN labels
eBay clusters use YARN labels to
• Separate Machine Learning workloads from regular workloads
• Restrict licensed software to designated machines
• Enable GPU workloads
• Separate organizational workloads
Mayank Bansal, eBay
Hadoop Storage tiering
Hadoop Architecture trends
HDFS Tiering / Heterogeneous Storage Tiers (HDFS-2832)
Allows a single cluster to have multiple storage tiers such as ARCHIVE, DISK,
SSD, RAM-disk.
Awareness of the storage media allows HDFS to make better decisions about the
placement of block data, with input from applications. Replicas can be
distributed according to their performance and durability requirements.
• Phase 2:
– HDFS-5682 - Application APIs for heterogeneous storage
– HDFS-7228 - SSD storage tier
– HDFS-5851 - Memory as a storage tier
HDFS Archival Storage Design (HDFS-6584)
– Introduces a new concept of storage policies. To accommodate future storage
technologies and different cluster characteristics, cluster administrators will be able to
modify the predefined storage policies and/or define custom ones.
– Data policy names: Very Hot, Hot, Warm, Lukewarm, Cold
eBay uses tiered storage for its Hadoop cluster
HDFS Tiering case study
• The 40 PB / 2000-node cluster was getting full
HDFS Tiering features
• Data resides on the same cluster in standard HDFS
• Data can easily move to and from the archive
• Tiered storage is operated using storage types and storage policies
• Archival policy is based on access pattern
– Antony Benoy, eBay
[Diagram: one HDFS with a DISK tier of 40 PB / 2000 nodes and an ARCHIVAL tier of 10 PB / 48 nodes]
Hadoop gets asymmetric
but I thought we were taking the work to the data…
[Diagram: a Hadoop cluster in which node labels isolate applications on designated nodes, and DataNode storage is split into DISK and ARCHIVE tiers]
Storage policies:
• Hot: all replicas on DISK
• Warm: 1 replica on DISK, others on ARCHIVE
• Cold: all replicas on ARCHIVE
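The Hot, Warm and Cold rules above can be expressed as a small placement function. This is a sketch of the rules as stated on the slide, not HDFS code: `replica_placement` is a hypothetical helper, and the default of three replicas is the usual HDFS replication factor, assumed here rather than taken from the slide.

```python
from typing import Callable, Dict, List

# Each policy maps a replication factor to the storage type of every replica.
POLICIES: Dict[str, Callable[[int], List[str]]] = {
    "Hot": lambda r: ["DISK"] * r,                       # all replicas on DISK
    "Warm": lambda r: ["DISK"] + ["ARCHIVE"] * (r - 1),  # 1 on DISK, rest on ARCHIVE
    "Cold": lambda r: ["ARCHIVE"] * r,                   # all replicas on ARCHIVE
}


def replica_placement(policy: str, replication: int = 3) -> List[str]:
    """Return the storage type chosen for each replica of a block."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if policy not in POLICIES:
        raise KeyError(f"unknown storage policy: {policy}")
    return POLICIES[policy](replication)


if __name__ == "__main__":
    for name in POLICIES:
        print(name, replica_placement(name))
```

With three replicas, a Warm block keeps one copy on the DISK tier for fast access and pushes the other two to the denser ARCHIVE tier.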
YARN Labels
Allow applications running in YARN containers to be constrained to designated nodes in the cluster
HDFS Tiering
Allows the creation of storage pools for SSD, HDD, Archive and RAM-disk, leveraging different server configurations
What about data locality?
New complementary approach to address Big Data demands
Storage Optimized Servers
Benefits of HPE Big Data Reference Architecture
HPE Moonshot and Apollo servers address a variety of enterprise big data needs
• Cluster consolidation: multiple big data environments can directly access a shared pool of data
• Flexibility to scale: scale compute and storage independently
• Maximum elasticity: rapidly provision compute without affecting storage
• Breakthrough economics: significantly better density, cost and power through workload optimized components
DFSIO testing on Big Data Reference architecture
Better numbers with optimized IO Servers for HDFS
HPE Big Data Reference Architecture
Hadoop and its ecosystem take advantage of the BDRA
[Diagram labels: Ethernet network switches, East-West networking, Impala]
HPE Hadoop Traditional vs HPE Big Data Reference Architecture
2X Hadoop MapReduce performance with the same footprint
2.5X HBase performance with the same footprint
Note: Comparison configuration is ProLiant DL380 Gen9 servers
Traditional architecture versus Big Data Reference Architecture:
• 2x higher density
• 2.4x memory density
• 46% less power (watts)
1.5 PB configuration example
Comparable Hadoop performance and raw compute (SPECint) power
BDRA compared to 2U rackmount servers:
• Acquisition cost: 3% lower
• Power: 54% lower
• Density (total rack U): 2x density
• 5-year power/cooling savings (assuming $0.20/kWh): $472K
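As a sanity check on the savings figure, the stated $472K over five years at $0.20/kWh can be converted back into an average load reduction. The sketch below is an illustration only: `implied_avg_saving_kw` is a hypothetical helper, and it assumes the equipment runs around the clock for 365-day years, which the slide does not state.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: continuous operation, no leap days


def implied_avg_saving_kw(savings_usd: float, usd_per_kwh: float, years: float) -> float:
    """Average power-plus-cooling load reduction implied by a cost saving."""
    saved_kwh = savings_usd / usd_per_kwh
    return saved_kwh / (years * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Figures from the slide: $472K saved over 5 years at $0.20/kWh.
    kw = implied_avg_saving_kw(472_000, 0.20, 5)
    print(f"Implied average reduction: {kw:.1f} kW")
```

Under these assumptions the figure corresponds to roughly 54 kW of combined power and cooling load avoided on average for the 1.5 PB configuration.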
Independent scaling of compute and storage
[ HPE ProLiant DL380 Gen9 ] vs [ HPE Moonshot for Computing + HPE ProLiant Apollo 4200 for Storage ]
HPE Big Data Reference Architecture configurations compared with the traditional architecture (HOT to COLD):
• 2.8x compute, 97% of the storage capacity, 4x the memory
• 1.6x compute, 1.5x the storage capacity, 2.5x the memory
• 90% of the compute, 2.1x the storage capacity, 1.5x the memory
HPE BDRA Components
Hadoop performance density more than 2x better, power consumption halved (0.5x)
HPE Big Data Reference Architecture
Scale-Out Building blocks
Storage optimized servers: HPE Apollo Scalable System
Cost-effective industry-standard storage server, purpose-built for big data, with converged infrastructure that offers high-density, energy-efficient storage
HPE Network Switches: East-West networking
Compute optimized servers:
• HPE Moonshot System with 45 x m710 compute nodes
• HPE Apollo 2200 with 4 x XL170r Gen9 high-compute nodes
HPE Moonshot 1500
2 internal switches
45 hot-plug cartridges
• 1-node cartridges = 45 servers in a chassis
• 4-node cartridges = 180 servers in a chassis
• HP Moonshot-45G (45 x1Gb port)
• HP Moonshot-180G (180 x1Gb port)
• HP Moonshot-45XG (45 x10Gb port)
Cartridges and their target workloads:
• m400: web cache, 64-bit ARM
• m700: remote PCs, XenDesktop
• m710p: Big Data, Hadoop, video transcoding
• m800: real-time analytics, telecom, finance
• m350: web hosting, 180 servers in 4.3U
• m300: full web infrastructure in a single chassis, dedicated hosting
45 low-power Hadoop compute nodes per enclosure!
Big Data Compute Node
Big data Storage Node
HPE Apollo 4200 - Bringing Big Data storage server density to enterprise
Big data Storage Node for Backup or Archival
HPE Apollo 4510 - Very High density Big Data storage server
• Scalable density: rack-scale storage server density, up to 5.44 PB in a 42U rack
• Lower TCO: cost effective, with 68 LFF HDDs/SSDs in a 4U server chassis for low-cost, power- and space-efficient solutions
• Workload optimized: configuration flexibility to balance capacity, cost and throughput with flexible options for disks, CPUs, I/O and interconnects
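The 5.44 PB per rack figure can be reproduced with simple arithmetic. Only the 68 drives per 4U chassis and the 42U rack come from the slide; the 8 TB drive size and the use of decimal units (1 PB = 1000 TB) are assumptions chosen because they match the stated total, and `rack_capacity_pb` is a hypothetical helper.

```python
DRIVES_PER_CHASSIS = 68   # from the slide: 68 LFF drives per chassis
CHASSIS_HEIGHT_U = 4      # from the slide: 4U server chassis
RACK_HEIGHT_U = 42        # from the slide: 42U rack
ASSUMED_DRIVE_TB = 8      # assumption: not stated on the slide


def rack_capacity_pb(
    rack_u: int = RACK_HEIGHT_U,
    chassis_u: int = CHASSIS_HEIGHT_U,
    drives: int = DRIVES_PER_CHASSIS,
    drive_tb: float = ASSUMED_DRIVE_TB,
) -> float:
    """Raw capacity of a rack filled with whole chassis, in decimal PB."""
    chassis_count = rack_u // chassis_u  # only whole chassis fit in the rack
    return chassis_count * drives * drive_tb / 1000


if __name__ == "__main__":
    print(f"{rack_capacity_pb():.2f} PB raw per rack")
```

Ten 4U chassis fit in 42U, giving 680 drives; at the assumed 8 TB each that is 5440 TB of raw capacity, before any HDFS replication overhead.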
HPE BDRA in a Virtualized context
Usage example
HPE BDRA used for multi-tenancy or Hadoop as a Service
Often based on a virtualized environment
Multi-tenancy and Hadoop as a Service are made easier by separating the
data processing service from the storage management service, as this brings:
– Better workload isolation between YARN applications
– More flexibility by scaling compute and storage independently
– Full elasticity on the computing side
– Rapid provisioning and decommissioning of compute without affecting storage
HPE BDRA used in a fully Elastic Virtualized environment
Compute and Storage nodes are virtualized in a different manner
[Diagram labels: Virtualization Hosts, Host VM, Hadoop Compute Node (x4), BDRA Compute Nodes, Hadoop DataNode, BDRA Storage Node, VMDK, Ext4, 3PAR F400]
Summary and HPE Big Data Architecture long-term view
HPE Big Data Reference Architecture
– The HPE BDRA is a complementary Hadoop reference architecture that brings
• Elasticity: extreme elasticity brought to Hadoop
• Flexibility: adaptive architecture that makes IT more responsive
• Efficiency: scale compute and storage independently
– It takes advantage of new Hadoop trends and features like
• Hadoop YARN Labels
• Hadoop HDFS Tiering
– The target customers are
• Mature Hadoop customers who want to consolidate clusters
• People who need virtualization, multi-tenancy or elasticity, or who want to build a smart Data Lake
• People who want to optimize density and power consumption (breakthrough economics)
– The BDRA works with fully standard Hadoop stacks (no patches, not proprietary)
• Cloudera Enterprise 5
• Hortonworks Data Platform 2
• MapR M5
HPE BDRA Optimized Compute & Storage nodes
Support multiple compute and storage blocks
Converged Infrastructure benefits for Big Data
Hadoop Node Labels feature (jira YARN-796)
• Combined with the HPE Big Data Reference Architecture, compute nodes
can be dynamically assigned as there is no need for data repartitioning
• HPE contributed IP into the Hadoop trunk, working with Hortonworks
• Allows scheduling of YARN containers to specific pools of nodes
HPE BDRA CI for Big Data long term view
Evolve to support multiple compute and storage blocks
Multi-temperature storage using HDFS Tiering and object stores
Workload Optimized compute nodes to accelerate various big data software
Thank you!
Learn more on how your organization can benefit from
HPE Big Data Reference Architecture
HPE Big Data Reference Architecture: Overview
HPE Big Data Reference Architecture: Hortonworks implementation
HPE Big Data Reference Architecture: Cloudera implementation
HPE Big Data Reference Architecture: MapR implementation
Running HBase on the HPE Big Data Reference Architecture
http://www.hpe.com/go/hadoop

Key trends in Big Data and new reference architecture from Hewlett Packard Enterprise / Gilles Noisette (Hewlett Packard)
