© 2009 VMware Inc. All rights reserved
vSphere Big Data Extensions:
Hadoop Reference Architecture and Performance Best Practices
李欣慧
Senior Big Data R&D Engineer
VMware China R&D Center
2
Agenda
▪ Recommended Deployment Topology
▪ Plan Your Cluster
3
Standard Deployment Configuration on Single Worker
[Diagram: one virtualization host running a single Hadoop virtual node (Node 2) that combines a DataNode and a TaskTracker. The OS image VMDK is shown with shared storage (SAN/NAS). The data VMDKs sit on local disks, each formatted as Ext4; mapred.local.dir points to the TaskTracker's Ext4 volumes.]
4
Standard Deployment Configuration on Single Worker
[Diagram: the same single-worker layout without shared storage. One Hadoop virtual node (Node 2) combines a DataNode and a TaskTracker; the OS image VMDK and the data VMDKs all sit on local disks, each data VMDK formatted as Ext4; mapred.local.dir points to the TaskTracker's Ext4 volumes.]
5
Standard Deployment Configuration
[Diagram: one virtualization host running two Hadoop virtual nodes (Node 1 and Node 2), each combining a DataNode and a TaskTracker on Ext4-formatted VMDKs. The OS image VMDKs are shown with shared storage (SAN/NAS); the data VMDKs sit on local disks; mapred.local.dir points to the TaskTracker's Ext4 volumes.]
6
Standard Deployment Configuration
[Diagram: the same two-node layout without shared storage. Node 1 and Node 2 each combine a DataNode and a TaskTracker on Ext4-formatted VMDKs; the OS image VMDKs and the data VMDKs all sit on local disks; mapred.local.dir points to the TaskTracker's Ext4 volumes.]
7
Standard Deployment Configuration for D/C Separation
[Diagram: one virtualization host with shared storage (SAN/NAS) and local disks. Node 1 runs only a TaskTracker; Node 2 runs only a DataNode. Both sets of data VMDKs are formatted as Ext4; the OS image VMDKs are shown separately.]
8
Data Path for Combined vs. Data/Compute Separation
[Diagram: two virtualization hosts side by side, each with two Hadoop virtual nodes running TaskTrackers and connected through a virtual switch, comparing the data path of the combined model with that of data/compute separation.]
▪ Serengeti provides local-storage-based temp space for D/C separation.
  • Each compute VM needs its own temp space
  • The required temp space differs from one application to another
  • This can result in wasted space
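
The wasted-space point can be made concrete with a small calculation. This is only a sketch: the function name, the fixed 100 GB temp disk per compute VM, and the per-VM usage figures are hypothetical and do not come from the deck.

```python
def wasted_temp_gb(provisioned_gb_per_vm, used_gb_per_vm):
    """Return the temp space (GB) that is provisioned but unused across compute VMs.

    provisioned_gb_per_vm -- fixed local temp disk size given to every compute VM
    used_gb_per_vm        -- peak temp usage observed on each VM
    """
    return sum(max(provisioned_gb_per_vm - used, 0) for used in used_gb_per_vm)


# Hypothetical: four compute VMs, each with 100 GB of local temp space,
# running applications with very different temp requirements.
usage = [20, 35, 90, 10]
print(wasted_temp_gb(100, usage))  # 245
```

Sizing every VM for the most demanding application leaves most of the temp space idle on the others, which is the motivation for a shared temp pool.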
9
Recommended Topology of Data/Compute Separation
[Diagram: one virtualization host with shared storage (SAN/NAS) and local disks. Node 1 runs the TaskTracker with an Ext4-formatted VMDK; Node 2 runs the DataNode on eight Ext4-formatted VMDKs; the OS image VMDKs are shown separately.]
10
Data Path for Local TT Storage vs. NFS Temp
[Diagram: two virtualization hosts side by side, each with two Hadoop virtual nodes running TaskTrackers and connected through a virtual switch, comparing the data path of local TaskTracker storage with that of NFS-based temp space.]
▪ Serengeti provides NFS-based temp space for D/C separation
  • Improves local storage space utilization
  • Trades bandwidth efficiency against the overhead of NFS
11
Consolidated Storage on Single DN VM
[Diagram: one virtualization host with shared storage (SAN/NAS) and local disks. Node 2 runs the DataNode on eight Ext4-formatted VMDKs, each with a directory (dir), and hosts the NFS server; Node 1 runs the TaskTracker and acts as the NFS client; the OS image VMDKs are shown separately.]
12
Recommended Topology of Computing Only Cluster
[Diagram: one virtualization host with shared storage (SAN/NAS). Node 2 runs the DataNode on an Ext4-formatted VMDK; Node 1 runs the TaskTracker on eight Ext4-formatted VMDKs; the OS image VMDKs are shown separately.]
13
Plan Your Cluster
▪ Start with a small cluster and grow it as required
  • Begin with just four or six nodes
  • Add compute, storage, and memory capacity as required
  • Available HDFS space = (DFS Remaining value * 95%) / dfs.replication value
▪ Choose the right hardware: master node
  • The NameNode and JobTracker often run on the same machine in smaller clusters
  • Consider HA/FT settings
  • Keep the NameNode and JobTracker on a different host from the slave nodes
  • Use dual power supplies
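
The available-space rule of thumb on this slide is simple arithmetic. The following is a minimal sketch of it; the function name and the 30 TB figure are hypothetical, and the default of 3 replicas is Hadoop's stock dfs.replication setting rather than a value stated in the deck.

```python
def available_hdfs_space(dfs_remaining, replication=3, usable_fraction=0.95):
    """Estimate usable HDFS space from the slide's rule of thumb.

    dfs_remaining   -- the "DFS Remaining" value, in any unit (the result uses the same unit)
    replication     -- the dfs.replication value
    usable_fraction -- the 95% factor from the slide
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return dfs_remaining * usable_fraction / replication


# Hypothetical cluster reporting 30 TB of DFS Remaining with 3 replicas.
print(f"{available_hdfs_space(30):.1f} TB")  # roughly 9.5 TB
```

Because every block is stored dfs.replication times, the space a user can actually fill is only a fraction of the raw remaining capacity.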
14
Plan Your Cluster
▪ Choose the right hardware: slave node
  • At least two quad-core CPUs, with Hyper-Threading (HT) enabled
  • RAM
    • Allow for 6% virtualization overhead
    • 4-8 GB of memory per core is recommended
  • Storage
    • At least 8 disks per host; 12 disks per host may be ideal for absolute performance but probably not for price-performance
    • 1-1.5 disks per core is recommended
    • JBOD with 7,200 RPM SATA disks is fine
    • A good practical maximum is 24 TB or 36 TB per slave node. More than that results in massive network traffic when a node dies and its blocks must be re-replicated.
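
The slave-node rules of thumb above can be checked mechanically against a candidate host. This is a sketch under stated assumptions: the function name and the sample hosts are hypothetical, and subtracting the 6% virtualization overhead from host RAM before computing the per-core ratio is one possible reading of the slide, not something the deck spells out.

```python
def check_slave_node(cores, ram_gb, disks, disk_tb):
    """Return a list of warnings for a host that misses the slide's rules of thumb."""
    warnings = []
    if cores < 8:
        warnings.append("fewer than two quad-core CPUs (8 cores)")
    # Assumption: the 6% virtualization overhead comes off host RAM first.
    ram_per_core = ram_gb * (1 - 0.06) / cores
    if not 4 <= ram_per_core <= 8:
        warnings.append(f"{ram_per_core:.1f} GB per core is outside 4-8 GB")
    if disks < 8:
        warnings.append("fewer than 8 disks per host")
    if not 1 <= disks / cores <= 1.5:
        warnings.append(f"{disks / cores:.2f} disks per core is outside 1-1.5")
    if disks * disk_tb > 36:
        warnings.append("more than 36 TB per node; re-replication traffic will be heavy")
    return warnings


# Hypothetical hosts: the first meets every guideline, the second misses three.
print(check_slave_node(cores=8, ram_gb=64, disks=12, disk_tb=2))  # []
print(len(check_slave_node(cores=8, ram_gb=16, disks=4, disk_tb=4)))  # 3
```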
15
Plan Your Cluster
▪ Networking
  • Use dedicated switches for the Hadoop cluster, and connect the nodes to a top-of-rack switch
  • Connect nodes at a minimum of 1 Gb/s; consider 10 Gb/s for clusters with large volumes of intermediate data
  • Interconnect racks through core switches
  • Connect core switches to top-of-rack switches with dual 10 Gb/s links
  • Use redundant top-of-rack and core switches
  • Separate the management network from the VM network
  • Adopt a vDS with dvPort groups that span hosts, to keep configuration consistent for VMs and for the virtual ports used by vMotion and network storage
  • Leave the management port out of the vDS
16
Networking Configurations – Four 1G NICs
[Diagram: a virtualization host with four 1 Gb/s uplinks (vmnic0 to vmnic3) connected to pSwitch 1 and pSwitch 2. Virtual Switch 0 carries the MGMT (192.168.1.100), VMKERNEL (192.168.2.100), VMOTION (192.168.3.100), and FT (192.168.4.100) ports; Virtual Switch 1 carries the Hadoop cluster VM port group.]
▪ Hadoop VM traffic goes through vSwitch1 (vmnic2 and vmnic3, both active)
▪ On vSwitch0, MGMT and VMkernel traffic goes through vmnic0 (active, with vmnic1 on standby)
▪ vMotion and FT traffic goes through vmnic1 (active, with vmnic0 on standby)
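
The active/standby layout of the four-NIC configuration can be written down as plain data to see which uplink each traffic type uses after a NIC failure. This is an illustrative model only: the `TEAMING` table and the `uplinks_after_failure` function are hypothetical names, and the code does not call any vSphere API.

```python
# Teaming policy per traffic type, taken from the four-NIC slide.
TEAMING = {
    "Hadoop VM": {"active": ["vmnic2", "vmnic3"], "standby": []},
    "MGMT":      {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "VMkernel":  {"active": ["vmnic0"], "standby": ["vmnic1"]},
    "vMotion":   {"active": ["vmnic1"], "standby": ["vmnic0"]},
    "FT":        {"active": ["vmnic1"], "standby": ["vmnic0"]},
}


def uplinks_after_failure(teaming, failed_nic):
    """Return the uplinks each traffic type would use once failed_nic is down."""
    result = {}
    for traffic, policy in teaming.items():
        active = [nic for nic in policy["active"] if nic != failed_nic]
        standby = [nic for nic in policy["standby"] if nic != failed_nic]
        # Standby uplinks take over only when no active uplink survives.
        result[traffic] = active or standby
    return result


after = uplinks_after_failure(TEAMING, "vmnic0")
print(after["MGMT"])       # ['vmnic1']
print(after["Hadoop VM"])  # ['vmnic2', 'vmnic3']
```

With this layout, losing any single NIC leaves every traffic type with at least one working uplink.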
17
Networking Configurations – 10G for Hadoop VMs
[Diagram: a virtualization host connected to pSwitch 1, pSwitch 2, and pSwitch 3. Virtual Switch 0 carries the MGMT (192.168.1.100), VMKERNEL (192.168.2.100), VMOTION (192.168.3.100), and FT (192.168.4.100) ports over two 1 Gb/s uplinks; Virtual Switch 1 carries the Hadoop cluster VM port group over a 10 GbE uplink.]
▪ Hadoop VM traffic goes through vSwitch1 (vmnic3)
▪ Use 10G for the Hadoop cluster VMs
  • Better performance
  • If needed, add redundancy with a second vmnic/pSwitch set
▪ Keep redundancy for the management network
18
vSphere Configurations
▪ Configure the NTP service on the hosts so that the time on all nodes is synchronized
▪ Virtual disk settings
  • One datastore per physical disk
  • The provisioned cluster needs a warm-up
▪ The NUMA scheduler is important for virtualized Hadoop performance
  • Poor configuration can result in a 12% (1) performance degradation
  • Data VMs should preferably be distributed across NUMA nodes
▪ Provision the right VM size
  • Reserve 6% of memory for vSphere
  • Avoid over-commitment
  • Enable NUMA and keep the VM size within a NUMA node
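
The VM sizing guidance can be turned into an upper bound per VM. This is a rough sketch: the function name and the sample host are hypothetical, and it assumes cores and memory are split evenly across NUMA nodes with the 6% vSphere reserve taken from the whole host.

```python
def max_vm_size(host_cores, host_ram_gb, numa_nodes, vsphere_reserve=0.06):
    """Return (vCPUs, RAM in GB) for the largest VM that still fits in one NUMA node."""
    if numa_nodes < 1:
        raise ValueError("numa_nodes must be at least 1")
    cores_per_node = host_cores // numa_nodes
    # Assumption: memory is evenly distributed across NUMA nodes.
    ram_per_node = host_ram_gb * (1 - vsphere_reserve) / numa_nodes
    return cores_per_node, ram_per_node


# Hypothetical host: 16 cores, 128 GB RAM, 2 NUMA nodes.
vcpus, ram_gb = max_vm_size(16, 128, 2)
print(vcpus, f"{ram_gb:.2f}")  # 8 60.16
```

A VM larger than either bound would span NUMA nodes and pay for remote memory access, which is what the last bullet warns against.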
19
For Existing Devices
▪ Roughly fit the existing resource capacity to Hadoop
  • CPU : RAM : throughput = 4*1333 MHz : 32G : 800M/s
▪ Use powerful machines for master and compute nodes
▪ Use high-throughput machines for slave/data nodes
20
Q&A


Editor's Notes

  • #5: The combined model means the TaskTracker runs in the same node as the DataNode. System disks are placed on shared storage (or local storage) and are not split, to take advantage of extensibility, HA/FT, and vMotion. Data disks should be placed on local storage, split across the available datastores and partitioned among peer VMs for performance. Use at least 8 disks per host.
  • #10: SAN/NAS-based temp storage for D/C separation deployments. Pro: the temp space is extensible. Con: lower performance.
  • #13: The DataNode is based on shared storage; the computing-only cluster is based on local storage.
  • #17: For networking, refer to the “VMware vSphere Design” document.