[9-6-2016] Openlab Poster-v3

Ioannis.Charalampidis@cern.ch
Lazaros.Lazaridis@cern.ch
CERN, June 2016
[Figure: software stack diagram with the labels ALFA, FairMQ, ØMQ, nanomsg, ofi:// Transport, libFabric and usNIC]
Introduction
For the ALICE O2 upgrade, the simulation and
reconstruction software of the ALICE
experiment uses the ALFA framework [1].
Among other abstractions, the ALFA/FairROOT
framework provides a message-queue
abstraction library, FairMQ [2], which is a
lightweight wrapper around the ØMQ and NanoMsg
libraries.
Since the project's goal is to implement a new
transport for FairMQ, we decided to extend the
functionality of one of these libraries.
We chose NanoMsg [3] because of its
clean and modular internals.
[Figure: RDMA* message exchange between two hosts. Labels on each side: buffer, Memory Region, Tx CQ, Rx CQ. Calls shown on each side: fi_send( endpoint, buffer, len, mr_desc, context ); fi_recv( endpoint, buffer, len, mr_desc, context ); fi_cq_read( &event ); Messages exchanged between the hosts: SEND, ACK.]
* libfabric has custom event polling functions
usNIC + Kernel Bypass
One of the powerful features of the usNIC
fabric is that user-space code can bypass the
Linux kernel when using the libFabric
library [4].
This relieves the kernel of the IP stack
overhead, reclaiming its CPU time for more
useful operations.
The ofi:// Transport
The project is implemented as a patch to the
NanoMsg sources [5] that introduces the
OpenFabrics Interfaces (OFI) transport.
The transport translates the POSIX-like API of
NanoMsg into an RDMA-like API for libFabric,
transparently to the user. To achieve this, it
uses a dynamic memory registration (MR)
mechanism that tries to reduce the number of
registrations performed, while remaining
agnostic of the user's intentions.
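The idea of reducing the number of memory registrations can be illustrated with a small registration cache. The sketch below is written in Python rather than C and is only a model: `RegistrationCache`, its `register` and `release` callbacks and the LRU eviction policy are assumptions made for illustration, not the mechanism implemented in the patch. It reuses a registration whenever a buffer with the same address and length is seen again.

```python
from collections import OrderedDict


class RegistrationCache:
    """Toy model of a memory-registration (MR) cache with LRU eviction.

    Hypothetical: `register` and `release` stand in for the expensive
    calls that would register or release a buffer with the fabric.
    """

    def __init__(self, register, release, capacity=4):
        self._register = register
        self._release = release
        self._capacity = capacity
        self._entries = OrderedDict()  # (address, length) -> descriptor
        self.registrations = 0         # number of real registrations made

    def descriptor_for(self, address, length):
        key = (address, length)
        if key in self._entries:
            # Cache hit: reuse the registration and mark it most recent.
            self._entries.move_to_end(key)
            return self._entries[key]
        if len(self._entries) >= self._capacity:
            # Cache full: evict the least recently used registration.
            _, old_desc = self._entries.popitem(last=False)
            self._release(old_desc)
        desc = self._register(address, length)
        self.registrations += 1
        self._entries[key] = desc
        return desc


def demo():
    released = []
    cache = RegistrationCache(
        register=lambda addr, length: ("mr", addr, length),
        release=released.append,
        capacity=2,
    )
    # The same buffer is used three times, then two other buffers once.
    for addr in (0x1000, 0x1000, 0x1000, 0x2000, 0x3000):
        cache.descriptor_for(addr, 4096)
    return cache.registrations, released
```

In `demo()`, five requests lead to three registrations, and the oldest registration is released once the cache of two entries is full.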
1. Technical Design Report for the Upgrade of the Online–Offline Computing System, The ALICE Collaboration
2. https://github.com/FairRootGroup/FairRoot/tree/master/fairmq
3. http://nanomsg.org
4. http://ofiwg.github.io/libfabric
5. https://github.com/wavesoft/nanomsg-transport-ofi
6. https://github.com/wavesoft/robob
7. https://github.com/ofiwg/libfabric/issues?q=author%3Awavesoft
8. https://github.com/nanomsg/nanomsg/pull/612
In a similar manner, it uses the high-level
libFabric polling API instead of the FD-based
NanoMsg polling API, making it possible to
support any fabric without modification.
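The polling-based design can be pictured with a small model. The sketch below is in Python and does not call libFabric; `CompletionQueue`, `post_send`, `progress` and `wait_for` are hypothetical names. It only mirrors the general pattern: an operation is posted together with a context, and the caller later polls a completion queue with a non-blocking read until the matching event appears, instead of waiting on a file descriptor.

```python
from collections import deque


class CompletionQueue:
    """Toy stand-in for a fabric completion queue (CQ)."""

    def __init__(self):
        self._events = deque()

    def push(self, context):
        self._events.append(context)

    def read(self):
        # Non-blocking: returns None when no completion is available.
        return self._events.popleft() if self._events else None


def post_send(tx_cq, pending, context):
    """Pretend to hand a buffer to the hardware; it completes later."""
    pending.append((tx_cq, context))


def progress(pending):
    """Simulate the hardware finishing one outstanding operation."""
    if pending:
        cq, context = pending.pop(0)
        cq.push(context)


def wait_for(cq, context, pending, max_polls=1000):
    """Poll the CQ until the completion for `context` is read.

    Completions for other contexts are simply dropped in this toy model.
    Returns the number of polls that were needed.
    """
    for polls in range(1, max_polls + 1):
        event = cq.read()
        if event == context:
            return polls
        progress(pending)  # a real NIC makes progress on its own
    raise TimeoutError("completion not seen")
```

A typical call sequence is `post_send(tx_cq, pending, "msg-1")` followed by `wait_for(tx_cq, "msg-1", pending)`.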
True Zero-Copy
One of the implementation requirements of the
OFI transport was to ensure that no memcpy
operations take place between the user's
request and the transfer on the wire.
That's reasonable if you consider
that the message sizes vary
from 50Mb to 1Gb.
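The difference between copying and not copying a message buffer can be shown in a few lines. The sketch below uses Python's `memoryview` purely as an illustration; it is not how the OFI transport is implemented, and the helper names are made up for this example. A `bytes` copy duplicates the payload, while a `memoryview` slice refers to the same underlying memory.

```python
def copy_payload(buffer, header_len):
    """Return the payload as a new bytes object (this copies the data)."""
    return bytes(buffer[header_len:])


def view_payload(buffer, header_len):
    """Return the payload as a view on the same memory (no copy)."""
    return memoryview(buffer)[header_len:]


def demo():
    message = bytearray(b"HDR:payload")
    copied = copy_payload(message, 4)
    viewed = view_payload(message, 4)
    # Change the first payload byte in the original buffer.
    message[4] = ord("P")
    # The copy keeps the old byte; the view sees the new one.
    return copied, bytes(viewed)
```

Because the view shares memory with the original buffer, a later change to the buffer is visible through it, which is exactly what a copy would hide.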
By-Products
A useful by-product of this project was the
development of robob [6], a fully automated benchmarking
utility for ensuring the quality of the measured values.
Outcomes
We frequently encountered roadblocks while
working on this project, since we were using
new products and open source components.
We had frequent interactions with NanoMsg
and Cisco developers [7] and we contributed
our own modifications [8].
Nonetheless, we managed to create a
prototype with which we demonstrated the
feasibility of the transport and its
performance.
The chart below presents some preliminary
measurements using the OFI transport
between two Intel Xeon E5-2690 machines
with UCSC-PCIE-C40Q NICs, connected
through a switch with a 40 Gbit copper cable.
[Figure: Throughput (Gbit/s) for Different Message Sizes. Series: OFI [Gbit/s], TCP [Gbit/s], ØMQ [Gbit/s]. Horizontal axis: message sizes 8192, 16384, 32768, 65536, 1048576, 2097152, 4194304, 8388608, 16777216, 33554432, 67108864, 134217728. Vertical axis: 0 to 35.]
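As a rough reference for reading such measurements, the ideal transfer time of a message follows directly from its size and the line rate. The sketch below is plain arithmetic in Python; it ignores all protocol overhead and assumes that the message sizes are given in bytes, which the chart does not state explicitly.

```python
def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8


def ideal_transfer_time(size_bytes, link_gbit_per_s):
    """Seconds needed to push size_bytes through the link at line rate."""
    bits = size_bytes * 8
    return bits / (link_gbit_per_s * 1e9)
```

Under these assumptions a 40 Gbit/s link carries at most 5 GB/s, and the largest message size on the chart, 134217728 bytes, needs at least about 27 ms on the wire.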
www.cern.ch/openlab
Poster by Ioannis Charalampidis. Special thanks
to our supervisor, Predrag Buncic, to Artur
Barczyk for his guidance, and to Mohammad
Al-Turany and Peter Hristov.
