Voltaire Fabric Collective Accelerator™ (FCA)
Accelerate your Fabric

The Challenge: Collective Operations Performance
  • Collective operations take up a large share of application run time and don’t scale well
  • System/OS “noise” affects scalability
  • Simple offload solutions DON’T address the key problems:
      - Network congestion due to “All-to-All” communication
      - Computation & messaging performance
      - Difficult to manage and orchestrate
  • Result: poor application scalability and low cluster efficiency

Collective Communication Portion of MPI Runtime
  [Chart: collective operations as a percentage of MPI job runtime]

Introducing Voltaire Fabric Collective Accelerator (FCA)
  • CPU in switch used to offload collective operations
  • Collective tree & rank placement optimized to the topology
  • Use of IB multicast for result distribution
  • Inter-core communication optimized
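
To make the traffic argument concrete, here is a minimal Python sketch (not Voltaire code) that counts messages for an N-rank collective under two deliberately simplified models: a naive all-to-all exchange, and a reduction tree whose result is distributed with one multicast. The function names and the cost model are illustrative assumptions, not taken from the slides.

```python
def naive_all_to_all_messages(n_ranks: int) -> int:
    """Every rank sends its contribution to every other rank."""
    return n_ranks * (n_ranks - 1)


def tree_with_multicast_messages(n_ranks: int) -> int:
    """Each non-root rank sends one partial result up the tree;
    the root then distributes the final result with one multicast."""
    if n_ranks <= 1:
        return 0
    return (n_ranks - 1) + 1


if __name__ == "__main__":
    # Print rank count, naive message count, tree + multicast message count.
    for n in (8, 64, 1024):
        print(n, naive_all_to_all_messages(n), tree_with_multicast_messages(n))
```

Under these assumptions the naive exchange grows quadratically with the number of ranks, while the tree-plus-multicast scheme grows linearly, which is the intuition behind reducing congestion with a topology-aware tree.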

FCA Solution Architecture
  • First fully integrated solution to offload collectives
  • Combines intelligence on servers, switches, and management:
      - UFM™: automates fabric collective offload/monitoring and integrates with schedulers
      - Voltaire “smart” switches: switch-based CPUs perform reduction and messaging operations
      - Voltaire OMA (Open MPI plug-in): addresses host-side collectives (multi-core)
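
The split between host-side and switch-side work can be pictured as a staged reduction. The Python sketch below is only a simulation under assumed names (`hierarchical_sum`, the host and switch labels, and the sum operation are all hypothetical); it does not reflect the actual OMA or switch implementation.

```python
from typing import Dict, List


def hierarchical_sum(values_by_host: Dict[str, List[float]],
                     hosts_by_switch: Dict[str, List[str]]) -> float:
    """Sum all rank values in three stages: host, switch, root."""
    # Stage 1: each host combines the values of its local ranks (cores).
    host_partials = {host: sum(vals) for host, vals in values_by_host.items()}
    # Stage 2: each switch combines the partial results of its attached hosts.
    switch_partials = {
        switch: sum(host_partials[h] for h in hosts)
        for switch, hosts in hosts_by_switch.items()
    }
    # Stage 3: the root combines the per-switch partial results.
    return sum(switch_partials.values())


if __name__ == "__main__":
    values = {"h0": [1, 2], "h1": [3, 4], "h2": [5, 6], "h3": [7, 8]}
    topology = {"s0": ["h0", "h1"], "s1": ["h2", "h3"]}
    print(hierarchical_sum(values, topology))  # prints 36
```

In this model each host sends a single partial result to its switch and each switch sends a single partial result onward, regardless of how many cores or hosts sit below it.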

FCA Addressing the Problem End To End
  • Increase performance, reduce congestion
      - Reduce fabric traffic to a single message per wire, dramatically reducing congestion
      - FCA offload “shields” collective operations from node “noise”
      - Enable non-blocking collectives (overlap communication and calculation)
      - Linear scalability to many thousands of nodes with predictable hardware performance
  • Simple, fully integrated
      - No change in application: OMA is a drop-in Open MPI plug-in
      - Switches come equipped with FCA offload code out of the box
      - UFM automates the process and integrates with the scheduler, saving setup burden
      - Fully integrated monitoring capabilities
  • FCA reduced collective operations runtime by up to 100X
  • 11K nodes: MPI collective operations within 25 usec

FCA Preliminary Performance Results
  [Chart; labeled values: 78%, 66%]

FCA: What is the alternative/competitive solution?
A Fabric Wide Challenge requires a Fabric Wide Solution
  [Comparison table; columns: NIC-based offload, FCA]
  • Topology aware
  • Result distribution based on IB multicast
  • OS “noise” reduction
  • Expected MPI job runtime improvement: 1-2%, 30-40%
  • Fabric switches offload computation
  • Network congestion elimination
  • Integration with job schedulers

FCA: Bringing InfiniBand to Capability Clusters
  • Scalability of collective operations has been limiting the reach of InfiniBand when it comes to capability computing
  • FCA is the first and only solution on the market allowing collective operations to scale efficiently to thousands of ranks
  • Voltaire is the only standards-based, high-performance fabric solution suitable for both capability & capacity computing
  [Chart: Price/Complexity vs. Performance, from Capacity to Capability]

Voltaire Fabric Collective Accelerator Summary
  • Fabric computing offload
      - Combination of SW & HW in a single solution
      - Offloading blocking computational tasks
      - Algorithms leveraging the topology for computation (trees)
  • Extreme MPI performance & scalability
      - Capability computing on commodity clusters
      - Up to two orders of magnitude (100X) faster collective runtime
      - Linear scalability (O18)
  • Transparent to the application
      - Standard Open MPI plug-in
      - Plug & play: no need for any code changes
      - Simple SDK for integration with other MPIs
Accelerate your Fabric!

Thank You

Fca Product Overview Feb222010 As
