Best Practices in Deploying and Managing
HPC Clusters with Intel® Xeon Phi™
Louise Westoby
WW Marketing Manager, IBM Platform Computing
June 18, 2013
Business Innovation Stressing IT
Objective: Gain competitive advantage
• Innovate with more complex applications / simulations / analytics
• Long processing limits number of iterations in a given time period
• Explosion of data improves results but adds complexity
• Delays and high cost of adding new applications
• Difficult to use systems
End Users / Business
Objective: Reduce cost while maintaining service
• Infrastructure silos to meet peak service level requirements
• CapEx and OpEx budget growth constrained
• Infrastructure issues – power/cooling, space, etc.
• Rise of lower cost resources (x86) and virtualization
• Evolving trend toward heterogeneous, multi-core programming models
IT Organizations
VIRTUALIZED VIEW OF COMPUTE, NETWORK AND STORAGE RESOURCES
Application
Businesses need to overcome infrastructure limitations to
maximize the value of compute and data-intensive applications
Application
Examples
• Simulation
• Analysis
• Design
• Big data
IT constrained
• Long wait times
• Low utilization
• IT Sprawl
IBM Platform Computing
Software
Big Data / Hadoop
Simulation and
Modeling
Analytics
Today Future
Make lots of
computers look like
“one”
Prioritized matching
of supply with
demand
Benefits
• High utilization
• Throughput
• Performance
• Prioritization
• Reduced cost
Repeated for
many
applications
and groups
• Clusters
• Grid
• HPC Cloud
VIRTUALIZED VIEW OF COMPUTE, NETWORK AND STORAGE RESOURCES
Faster time
to results
Use fewer
resources
HPC Cloud / Cluster
Mgmt
Complete range of technical computing management software to
maximize high performance applications
Workload and
Resource
Management
Data
Management
Infrastructure
Management
Platform LSF Family
Batch, MPI workloads with process
mgmt, monitoring, analytics, user
portal, license mgmt
Platform HPC
Simplified, integrated HPC
management software for batch, MPI
workloads integrated with systems
Platform Symphony Family
High throughput, near ‘real time’
parallel compute and Big Data /
MapReduce workloads
Big Data /
Hadoop
Simulation /
Modeling
Analytics
Applications
Heterogeneous
Resources
Compute Storage Network
Virtual, Physical, Desktop, Server, Cloud
Platform Cluster Manager Family
Provision and manage
Single Cluster (Standard) to Dynamic Clouds (Advanced)
General Parallel File System (GPFS)
High performance, distributed parallel file system
System X and Platform Computing: better together
Reference Ecosystem – Leverage the tight integration between IBM System x,
Platform Computing software and Intel technology
RHEL MS
System X
App App App
QLogic
InfiniBand
Intel
Xeon
Intel
Xeon Phi
Intel
Intel Cluster
Ready
IBM Platform
Computing
Leveraging Platform HPC to properly provision and configure
Xeon Phi environment
Add Intel MPSS
packages to the
repository
Create provisioning
template to include
MPSS package
Provision all nodes
with Xeon Phi cards
Generate MPSS
configuration on
nodes with Xeon Phi
Create network
bridge & configure
Xeon Phi network
Start mpss service
automatically on
system boot up
1. Provision nodes and install MPSS
2. Install Intel® Xeon Phi™ compilers and runtime software
3. Configure Platform HPC ELIM
Leveraging Platform LSF or Platform HPC to simplify
scheduling of Intel® Xeon Phi™ jobs
• Job can be submitted by specifying the following
metrics:
– Number of Xeon Phi cards required on each node
– Any metrics the Xeon Phi ELIM collects
• Job will be placed on nodes with available Xeon Phi
cards that meet the resource requirements
– Xeon Phi cards on a node are enumerated, allowing multiple
jobs to run on the same node using designated cards
• Agnostic to Xeon Phi execution mode (offload, native,
etc.)
• Job information
– Indication of which Xeon Phi cards are used
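
The submission options above can be sketched as a small Python helper that assembles a `bsub` command line. This is only an illustration: the resource names `num_mics` and `mic_temp0` are hypothetical placeholders for whatever the Xeon Phi ELIM reports at a given site, and the helper builds the argument list without submitting anything.

```python
import shlex


def build_bsub_command(job_command, cards_per_node,
                       resource_name="num_mics", extra_select=None):
    """Build (but do not run) a bsub argument list that reserves Xeon Phi cards.

    resource_name and extra_select use hypothetical ELIM resource names;
    substitute the names defined in your own cluster configuration.
    """
    # rusage[...] asks LSF to reserve the given amount of a resource per host.
    rusage = f"rusage[{resource_name}={cards_per_node}]"
    # An optional select[...] clause filters hosts on any other collected metric.
    if extra_select:
        res_req = f"select[{extra_select}] {rusage}"
    else:
        res_req = rusage
    return ["bsub", "-R", res_req] + shlex.split(job_command)


if __name__ == "__main__":
    # Request two cards per node, only on hosts whose first card is below 80 C.
    cmd = build_bsub_command("./my_offload_app --steps 10", 2,
                             extra_select="mic_temp0 < 80")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Because the scheduler only sees resource requirements, the same request shape would apply whether the application runs in offload or native mode.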
Collecting Xeon Phi Metrics
• Total number of cards per
node
• Number of cores per
accelerator
• Core temperature (Celsius)
• Frequency (GHz)
• Total power (Watts)
• Total Free memory (MB)
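
As a rough sketch of how such metrics reach the scheduler: an LSF ELIM reports values by writing lines of the form `<count> <name1> <value1> <name2> <value2> ...` to standard output. The Python below only formats one such line from sample readings. The metric names and values are hypothetical, and a real ELIM would query the cards and emit a fresh line at a regular interval.

```python
def format_elim_report(metrics):
    """Format one ELIM report line: '<count> <name1> <value1> ...'.

    metrics maps resource names to their current values; the names used
    in the sample below are illustrative, not the ones shipped by IBM.
    """
    parts = [str(len(metrics))]
    for name, value in metrics.items():
        parts.append(name)
        parts.append(str(value))
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical readings for a node with two cards (card 0 shown).
    sample = {
        "num_mics": 2,         # total number of cards on the node
        "mic_ncores": 60,      # cores per accelerator
        "mic_temp0": 54,       # core temperature, Celsius
        "mic_freq0": 1.1,      # frequency, GHz
        "mic_power0": 115,     # total power, Watts
        "mic_freemem0": 7200,  # free memory, MB
    }
    print(format_elim_report(sample), flush=True)
```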
Cluster Node
Platform HPC monitoring system
• Single agent for both resource monitoring
and resource management
• Based on 20 years of Platform technology
– Lightweight with a small footprint
– Scalable
– Robust
– Extendable
– Fully automated failover
• Added monitoring metrics shown in
Platform HPC web GUI automatically
• Added monitoring metrics can be used to
define alerts
LIM
Xeon Phi
ELIM
GPU ELIM
Other
ELIMs
Management Node
Master
LIM PERF:
Monitoring &
Reporting
Master
Scheduler
Mudpot: Intel® Xeon Phi™ Cluster is used for advanced
computing at the NCAR Wyoming Supercomputing Center
IBM Platform LSF Leveraged at NCAR to manage complex,
heterogeneous compute environment
• From user POV there is one
place to submit jobs, regardless
of resource
• Different queues depending on
job type (e.g. regular, bigmem,
gpgpu)
• Allows multistage jobs to run on
multiple resources
– Large model run on
Yellowstone
– Dependent Data-Analysis Run
on Geyser
• Sharing between projects
managed transparently
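
The multistage pattern above (a model run followed by a dependent analysis run) can be sketched with LSF job names and a `done(...)` dependency condition. This is a hedged illustration, not NCAR's actual setup: the queue names are taken from the examples on this slide, and their mapping to Yellowstone and Geyser, the job names, and the commands are all assumptions. The helper only builds the two `bsub` argument lists.

```python
import shlex


def build_two_stage_submission(model_cmd, analysis_cmd,
                               model_queue="regular",
                               analysis_queue="bigmem",
                               model_job_name="model_run"):
    """Return bsub argument lists for a model job and a dependent analysis job.

    Queue and job names are illustrative assumptions; nothing is submitted.
    """
    # Stage 1: name the model job so the next stage can refer to it.
    stage1 = ["bsub", "-q", model_queue, "-J", model_job_name]
    stage1 += shlex.split(model_cmd)
    # Stage 2: -w "done(<name>)" holds this job until stage 1 finishes successfully.
    stage2 = ["bsub", "-q", analysis_queue, "-w", f"done({model_job_name})"]
    stage2 += shlex.split(analysis_cmd)
    return stage1, stage2


if __name__ == "__main__":
    first, second = build_two_stage_submission("./run_model", "./analyze_output")
    for cmd in (first, second):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Referring to the first stage by job name rather than job ID lets both commands be prepared before anything is submitted.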
Thank you!

Deploying and Managing HPC Clusters with IBM Platform and Intel Xeon Phi Coprocessor
