Fca Product Overview Feb222010 As

Voltaire Fabric Collective Accelerator™ (FCA)
Accelerate your Fabric

The Challenge: Collective Operations Performance
- Collective operations take a large share of application run time and don’t scale well
- System/OS “noise” affects scalability
- Simple offload solutions DON’T address the key problems:
  - Network congestion due to “All-to-All” communication
  - Computation & messaging performance
  - Difficult to manage and orchestrate
- Result: poor application scalability and low cluster efficiency

Collective Communication Portion of MPI Runtime
[Chart: collective operations as a percentage of MPI job runtime]

Introducing Voltaire Fabric Collective Accelerator (FCA)
- CPU in switch used to offload collective operations
- Collective tree & rank placement optimized to the topology
- Use of IB multicast for result distribution
- Inter-core communication optimized
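To make the idea of a topology-shaped collective tree concrete, here is a minimal Python sketch. It is not FCA code: the function name `switch_tree_allreduce`, the two-level leaf/root layout, and the dictionary input are all assumptions chosen for illustration. Each leaf switch combines the values of its attached ranks, a root combines the per-switch partial results, and the final value is handed back to every rank, which stands in for multicast distribution.

```python
from functools import reduce
from operator import add


def switch_tree_allreduce(hosts_by_switch, op=add):
    """Simulate a two-level, switch-based allreduce (illustrative only).

    hosts_by_switch maps a hypothetical leaf-switch name to the non-empty
    list of values contributed by the ranks attached to that switch.
    """
    # Step 1: each leaf switch reduces the contributions of its own ranks.
    partials = {sw: reduce(op, vals) for sw, vals in hosts_by_switch.items()}

    # Step 2: the root combines one partial result per leaf switch.
    total = reduce(op, partials.values())

    # Step 3: every rank receives the same result (stand-in for multicast).
    return {sw: [total] * len(vals) for sw, vals in hosts_by_switch.items()}


if __name__ == "__main__":
    fabric = {"leaf0": [1, 2, 3], "leaf1": [4, 5]}
    print(switch_tree_allreduce(fabric))
```

In this model the root only ever sees one value per leaf switch, regardless of how many ranks sit below it.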

FCA Solution Architecture
- First fully integrated solution to offload collectives
- Combines intelligence on servers, switches, and management:
  - UFM™: automates fabric collective offload and monitoring, and integrates with schedulers
  - Voltaire “smart” switch-based CPUs: perform reduction and messaging operations
  - Voltaire OMA (Open MPI plug-in): addresses host-side (multi-core) collectives

FCA: Addressing the Problem End to End
Increase performance, reduce congestion
- Reduce fabric traffic to a single message per wire, dramatically reducing congestion
- FCA offload “shields” collective operations from node “noise”
- Enable non-blocking collectives (overlap communication and calculation)
- Linear scalability to many thousands of nodes with predictable hardware performance
Simple, fully integrated
- No change in application: OMA is a drop-in Open MPI plug-in
- Switches come equipped with FCA offload code out of the box
- UFM automates the process and integrates with the scheduler, saving setup burden
- Fully integrated monitoring capabilities
FCA reduced collective operations runtime by up to 100X
MPI collective operations on 11K nodes complete within 25 usec
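The "single message per wire" claim can be illustrated with a rough count of messages crossing one leaf-switch uplink. The sketch below is a simplified model, not a measurement: the function name `uplink_messages`, the uniform leaf layout, and the pairwise-exchange baseline are assumptions, and real MPI libraries typically use smarter host-based algorithms than the baseline shown here.

```python
def uplink_messages(hosts_per_leaf, num_leaves):
    """Count messages crossing one leaf uplink during a single collective.

    Returns (pairwise, offloaded):
      pairwise  - every local rank sends directly to every remote rank
                  (an assumed worst-case baseline)
      offloaded - the leaf switch forwards one combined partial result
    """
    total_hosts = hosts_per_leaf * num_leaves
    remote_hosts = total_hosts - hosts_per_leaf

    # Baseline: each local rank sends one message to each remote rank.
    pairwise = hosts_per_leaf * remote_hosts

    # In-switch reduction: one combined message leaves the leaf switch.
    offloaded = 1
    return pairwise, offloaded


if __name__ == "__main__":
    print(uplink_messages(hosts_per_leaf=18, num_leaves=4))
```

Under these assumptions the baseline count grows with the product of local and remote ranks, while the offloaded count stays constant per uplink.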

FCA Preliminary Performance Results
[Chart; surviving data labels: 78%, 66%]

FCA: What is the alternative/competitive solution?
A Fabric Wide Challenge requires a Fabric Wide Solution
Comparison criteria (NIC-based offload vs. FCA):
- Topology aware
- Result distribution based on IB multicast
- OS “noise” reduction
- Fabric switches offload computation
- Network congestion elimination
- Integration with job schedulers
- Expected MPI job runtime improvement: 1-2% (NIC-based offload) vs. 30-40% (FCA)

FCA: Bringing InfiniBand to Capability Clusters
- Scalability of collective operations has been limiting the reach of InfiniBand in capability computing
- FCA is the first and only solution on the market that allows collective operations to scale efficiently to thousands of ranks
- Voltaire is the only standards-based, high-performance fabric solution suitable for both capability & capacity computing
[Chart: price/complexity vs. performance, spanning capacity and capability computing]

Voltaire Fabric Collective Accelerator Summary
Fabric computing offload
- Combination of SW & HW in a single solution
- Offloading blocking computational tasks
- Algorithms leveraging the topology for computation (trees)
Extreme MPI performance & scalability
- Capability computing on commodity clusters
- Two orders of magnitude, ten-times faster in collective runtime
- Linear scalability (O18)
Transparent to the application
- Standard Open MPI plug-in
- Plug & play: no need for any code changes
- Simple SDK for integration with other MPIs
Accelerate your Fabric!
