A RINA light implementation
Vincenzo Maffione
20/02/2017
Introduction (1)
● A Free and Open Source light implementation of RINA for Linux
● Implementation split between user-space and kernel-space
● KISS approach → codebase is clean and essential
● Focus:
○ basic functionality - do few things, but do them well
○ stability and performance - support deployments with hundreds of nodes
○ minimality - avoid over-engineering
● Main goal: a baseline implementation for future RINA products
● Code and documentation available at https://github.com/vmaffione/rlite
Introduction (2)
● ~ 27 Klocs (not including blanks)
○ kernel-space: ~ 9 Klocs
○ user-space: ~ 18 Klocs
■ including tools and example applications
● Written mostly in C (some parts are C++ for convenience)
○ C: 14 Klocs
○ C++: 7 Klocs
● Network applications can be written in C
● Python bindings available to write network applications in Python
Introduction (3)
● kernel-space is implemented as a set of out-of-tree kernel modules, which run on the unmodified Linux kernel
○ Linux Kbuild system is used to build the modules against the running kernel
○ Build time (no parallel make): 3-15 seconds
● user-space is implemented as a set of shared libraries and programs
○ CMake is used to configure and build libraries and executables
○ Build time (no parallel make): 15-60 seconds
Basic features (1)
● Applications:
○ Flow allocation and deallocation, with QoS specification
○ Application registration and unregistration
○ Data transfer
● Stack administration:
○ Creation, deletion and configuration of IPCPs
○ Registration and enrollment among IPCPs
○ Monitoring and inspection
■ inspection of IPCPs in the system
■ inspection of RIBs
■ per-flow statistics
Basic features (2)
● QoS (supported through DTCP):
○ Flow control
○ Retransmission control
○ Maximum allowable gap
○ Simple token-bucket rate-limiting
● Decent performance (detailed performance plots to come)
○ About 9.5 Gbps on a 10 Gbit link without flow control and retransmission
○ About 6 Gbps on a 10 Gbit link with flow control
○ A lot of room for optimizations
● Stability indicators
○ Ran 10-day-long VM-based experiments with up to 35 nodes, two levels of normal DIFs and 50 flow allocations per second
○ Ran experiments with up to 10 levels of DIFs
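The "simple token-bucket rate-limiting" item above can be illustrated with a minimal sketch. This is not rlite's code: the struct, the function names, the units (bytes and nanoseconds) and the explicit time argument are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical token bucket; names and units are illustrative only. */
struct token_bucket {
    uint64_t rate_bps; /* refill rate, in bytes per second */
    uint64_t burst;    /* bucket capacity, in bytes */
    uint64_t tokens;   /* bytes that may be sent right now */
    uint64_t last_ns;  /* time of the last refill, in nanoseconds */
};

void tb_init(struct token_bucket *tb, uint64_t rate_bps, uint64_t burst,
             uint64_t now_ns)
{
    assert(tb && rate_bps > 0 && burst > 0);
    tb->rate_bps = rate_bps;
    tb->burst    = burst;
    tb->tokens   = burst; /* start with a full bucket */
    tb->last_ns  = now_ns;
}

/* Returns true (and consumes tokens) if `len` bytes may be sent at `now_ns`.
 * Note: the multiplication below can overflow after very long idle periods;
 * a real implementation would clamp the elapsed time first. */
bool tb_consume(struct token_bucket *tb, uint64_t len, uint64_t now_ns)
{
    uint64_t refill = (now_ns - tb->last_ns) * tb->rate_bps / 1000000000ULL;

    if (refill > 0) {
        /* Add the new tokens, never exceeding the bucket capacity. */
        tb->tokens = (refill > tb->burst - tb->tokens) ? tb->burst
                                                       : tb->tokens + refill;
        tb->last_ns = now_ns;
    }
    if (tb->tokens < len) {
        return false; /* not enough credit: the caller queues or drops */
    }
    tb->tokens -= len;
    return true;
}
```

Passing the current time as an argument keeps the sketch deterministic; a caller would typically feed it a monotonic clock reading.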
Architecture overview (1)
[Figure: layered diagram. user-space: rlite-ctl, uipcps daemon and applications, on top of librlite-conf, librlite-cdap and librlite. Boundary: /dev/rlite and /dev/rlite-io. kernel-space: rlite, normal, shim-eth, shim-loopback, shim-tcp4, shim-udp4, shim-hv]
Architecture overview (2)
● kernel-space
○ Supports control operations
○ Implements datapath
○ Keeps state
● user-space
○ Libraries to abstract interaction with kernel-space functionalities
○ A daemon to implement management part of (many) IPCPs
○ An iproute2-like command-line tool to administer the stack
● Interactions between kernel-space and user-space only happen through character devices → therefore through file descriptors
Kernel-space architecture (1)
● Supported functionalities:
○ IPCP creation, deletion and configuration (kernel keeps a per-IPCP data structure)
○ Flow (de)allocation (kernel keeps a per-flow data structure)
○ Application (un)registration (kernel keeps a data structure for each registered application)
○ RMT, DTP and DTCP components of the normal IPCP
○ Shim IPCP processes (e.g. interaction with network device drivers)
● State is maintained in kernel-space:
○ user-space can crash or be restarted at any time
○ user-space can recover state from kernel
Kernel-space architecture (2)
● User-space interacts with kernel-space only through two character devices
○ /dev/rlite for control operations
○ /dev/rlite-io for data transfer and synchronization
● Consequently, interactions only happen through file descriptors
● Both are “cloning devices”
○ each open() creates a new independent kernel-space instance
● Both devices support blocking and non-blocking operation
○ Standard poll() and select() widely used with the devices
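Since both devices are plain file descriptors, the usual non-blocking pattern applies. The sketch below shows it with generic helpers; the helper names are hypothetical, and the code works on any descriptor (a pipe, for instance). With rlite the descriptor would presumably come from opening /dev/rlite or /dev/rlite-io, which is an assumption about usage rather than something shown here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Switch an open file descriptor to non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int fl;

    assert(fd >= 0);
    fl = fcntl(fd, F_GETFL, 0);
    if (fl < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, fl | O_NONBLOCK);
}

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hang-up. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR); /* retry if interrupted by a signal */

    if (n <= 0) {
        return n; /* 0: timeout, -1: error */
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

In non-blocking mode a read() with nothing pending fails with EAGAIN instead of sleeping, so the caller can go back to poll() and service other descriptors in the meantime.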
Kernel-space architecture (3)
● /dev/rlite used for control operations
○ Flow (de)allocation
○ Application (un)registration
○ IPCP creation, deletion and configuration
○ Management of PDU forwarding table
○ Interactions between user-space and kernel-space parts of IPCPs
○ Inspection and monitoring operations on flows and IPCPs
○ ...
Kernel-space architecture (4)
● Control operations follow a request/response paradigm:
○ write() to the control device to submit a request message
○ Response messages (not always present) can be read through read()
● The control device is used to avoid ioctl() and netlink
○ Easier porting to other OSes (e.g. FreeBSD)
● Request and response messages are represented by packed structs and are serialized/deserialized during the user-space ←→ kernel-space transition
○ support for string (de)serialization
○ support for (apn, api, aen, aei) name (de)serialization
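To make the "packed struct plus serialized strings" idea concrete, here is a minimal sketch of one possible wire layout: a packed fixed-size header followed by a length-prefixed string. The header fields, the 16-bit length prefix and the function names are assumptions made for illustration and do not reproduce rlite's actual message format. Host byte order is used, since both ends run on the same machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed part of a control message (GCC/Clang packed attribute). */
struct msg_hdr {
    uint16_t msg_type;
    uint32_t event_id;
} __attribute__((packed));

/* Layout: header | uint16 string length | string bytes (no terminator).
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t msg_serialize(void *buf, size_t buflen, const struct msg_hdr *hdr,
                     const char *s)
{
    uint8_t *p = buf;
    size_t slen, need;
    uint16_t l16;

    assert(buf && hdr && s);
    slen = strlen(s);
    need = sizeof(*hdr) + sizeof(l16) + slen;
    if (slen > UINT16_MAX || need > buflen) {
        return 0;
    }
    memcpy(p, hdr, sizeof(*hdr));
    p += sizeof(*hdr);
    l16 = (uint16_t)slen;
    memcpy(p, &l16, sizeof(l16));
    p += sizeof(l16);
    memcpy(p, s, slen);
    return need;
}

/* Inverse of msg_serialize(). The string is copied NUL-terminated into
 * `out`. Returns the bytes consumed, or 0 on truncated or oversized input. */
size_t msg_deserialize(const void *buf, size_t buflen, struct msg_hdr *hdr,
                       char *out, size_t outlen)
{
    const uint8_t *p = buf;
    uint16_t l16;

    assert(buf && hdr && out);
    if (buflen < sizeof(*hdr) + sizeof(l16)) {
        return 0;
    }
    memcpy(hdr, p, sizeof(*hdr));
    p += sizeof(*hdr);
    memcpy(&l16, p, sizeof(l16));
    p += sizeof(l16);
    /* Validate the declared length against both buffers before copying. */
    if (buflen - sizeof(*hdr) - sizeof(l16) < l16 || (size_t)l16 + 1 > outlen) {
        return 0;
    }
    memcpy(out, p, l16);
    out[l16] = '\0';
    return sizeof(*hdr) + sizeof(l16) + l16;
}
```

The receiving side never trusts the embedded length, which matters when the message crosses the user/kernel boundary.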
Kernel-space architecture (5)
● /dev/rlite-io for data transfer and synchronization
○ read()
○ write()
○ select(), poll(), epoll()
● Application workflow:
○ Use the control device to allocate a flow (kernel-space object)
○ Bind the flow to a newly-created data transfer file descriptor - this is the only task performed by means of ioctl()
○ Use the data transfer file descriptor to exchange SDUs and/or wait for events
○ Close file descriptor to deallocate the associated flow
● Special binding mode to exchange management SDUs
Kernel-space architecture (6)
● Usual abstract factory pattern to manage different types of IPCPs
○ normal: implementation of the regular IPCP
○ shim-loopback: supports system-local IPC, with optional queued mode to decouple TX and RX code-paths, and optional packet drop emulation
○ shim-eth: uses network device drivers to transmit and receive SDUs, sharing the device with the Linux network stack
○ shim-udp4: tunnels RINA traffic over UDP socket; mostly implemented in user-space, only data transfer is implemented in kernel-space
○ shim-tcp4: same as shim-udp4, but using a TCP socket; deprecated, since it duplicates flow-control and congestion control done in higher layers
○ shim-hv: uses VMPI devices to transmit and receive SDUs
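The abstract factory pattern mentioned above is commonly written in C as a registry of per-type descriptors holding function pointers. The following is a user-space sketch of that general idea, not rlite's kernel code: every type, field and function name is hypothetical, and the loopback "write" is a stand-in that only reports the SDU length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct ipcp;

/* Per-type operations, provided by each (hypothetical) IPCP module. */
struct ipcp_ops {
    int (*sdu_write)(struct ipcp *ipcp, const void *sdu, size_t len);
};

/* One factory per IPCP type, looked up by name. */
struct ipcp_factory {
    const char *dif_type; /* e.g. "normal", "shim-loopback" */
    const struct ipcp_ops *ops;
};

struct ipcp {
    const char *dif_type;
    const struct ipcp_ops *ops;
};

#define MAX_FACTORIES 8
static const struct ipcp_factory *factories[MAX_FACTORIES];
static size_t num_factories;

const struct ipcp_factory *ipcp_factory_find(const char *dif_type)
{
    for (size_t i = 0; i < num_factories; i++) {
        if (strcmp(factories[i]->dif_type, dif_type) == 0) {
            return factories[i];
        }
    }
    return NULL;
}

/* Called by a module at load time. Returns 0, or -1 if full or duplicate. */
int ipcp_factory_register(const struct ipcp_factory *f)
{
    assert(f && f->dif_type && f->ops);
    if (num_factories == MAX_FACTORIES || ipcp_factory_find(f->dif_type)) {
        return -1;
    }
    factories[num_factories++] = f;
    return 0;
}

/* Generic creation path: the caller only names the type it wants. */
struct ipcp *ipcp_create(const char *dif_type)
{
    const struct ipcp_factory *f = ipcp_factory_find(dif_type);
    struct ipcp *ipcp;

    if (!f || !(ipcp = calloc(1, sizeof(*ipcp)))) {
        return NULL;
    }
    ipcp->dif_type = f->dif_type;
    ipcp->ops      = f->ops;
    return ipcp;
}

/* Example module: a loopback stand-in that just reports the SDU length. */
static int loopback_sdu_write(struct ipcp *ipcp, const void *sdu, size_t len)
{
    (void)ipcp;
    (void)sdu;
    return (int)len;
}

static const struct ipcp_ops loopback_ops = { .sdu_write = loopback_sdu_write };
const struct ipcp_factory loopback_factory = {
    .dif_type = "shim-loopback",
    .ops      = &loopback_ops,
};
```

Generic code then calls `ipcp->ops->sdu_write(...)` without knowing which IPCP type it is talking to, which is what lets new shims be added as separate modules.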
Some kernel-space internals
● Reference counters widely used to manage lifetime of objects (e.g. IPCPs, flows, registered applications, PDUs)
● sk_buff-like approach to avoid copies throughout the datapath
● Dynamic allocation of PDU buffers
○ The amount of header space to reserve at allocation time is precomputed by the user-space daemon, depending on the local IPCP dependency graph
● All PDU queues are limited in size to keep memory usage under control
● Deferred work (workqueues) used only when necessary, to keep latency low
○ Example: driver transmission routine directly executes in the context of an application write() system call, when possible
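The header-space reservation described above is what lets each layer prepend its header without copying the payload. A minimal user-space sketch of the idea follows; the struct and function names are hypothetical and the real kernel buffers are certainly richer than this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical PDU buffer with reserved headroom, loosely modeled on the
 * sk_buff idea: headers grow downwards into space reserved in advance. */
struct pdu_buf {
    uint8_t *head; /* start of the allocation */
    uint8_t *data; /* start of the valid bytes */
    size_t   len;  /* number of valid bytes */
};

/* Allocate a buffer holding `payload`, leaving `headroom` free bytes in
 * front of it for the headers that lower layers will add later. */
struct pdu_buf *pdu_alloc(size_t headroom, const void *payload, size_t plen)
{
    struct pdu_buf *pb = malloc(sizeof(*pb));

    if (!pb) {
        return NULL;
    }
    pb->head = malloc(headroom + plen + 1); /* +1 avoids a zero-size malloc */
    if (!pb->head) {
        free(pb);
        return NULL;
    }
    pb->data = pb->head + headroom;
    pb->len  = plen;
    if (plen) {
        memcpy(pb->data, payload, plen);
    }
    return pb;
}

/* Prepend a header in place. Returns the new start of the PDU, or NULL if
 * the reserved headroom is exhausted (the payload is never moved). */
void *pdu_push(struct pdu_buf *pb, const void *hdr, size_t hlen)
{
    assert(pb && hdr);
    if ((size_t)(pb->data - pb->head) < hlen) {
        return NULL;
    }
    pb->data -= hlen;
    pb->len  += hlen;
    memcpy(pb->data, hdr, hlen);
    return pb->data;
}

void pdu_free(struct pdu_buf *pb)
{
    if (pb) {
        free(pb->head);
        free(pb);
    }
}
```

If the headroom is sized for the deepest stack of IPCPs a PDU can traverse, `pdu_push()` never fails on the datapath and no reallocation is needed.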
user-space libraries
● librlite (written in C)
○ main library, abstracts interactions with the rlite control device (/dev/rlite)
○ provides common utilities and helpers (application names, flow specification, control messages, ...)
○ provides an API for RINA applications
● Other libraries
○ librlite-conf (C): extends librlite with kernel-space IPCP management functionalities
○ librlite-cdap (C++): CDAP implementation based on Google Protocol Buffers
librlite - Overview
● librlite provides API calls to interact with control device instances
○ Validation, serialization and deserialization of control messages in both directions (user → kernel, kernel → user)
● It defines a POSIX-like API for applications:
○ Reminiscent of the socket API, to ease porting of existing socket applications...
○ … yet with the full power of the RINA API (QoS support and complete naming scheme)
○ Easy to learn for grown-up network developers!
○ Documentation available at https://github.com/vmaffione/rlite/blob/master/include/rina/api.h
○ Other resources: https://github.com/IRATI/stack/wiki/Application-API
librlite - Application API
● Main API calls:
○ int rina_open() → fd
■ Opens a control device instance, returning a file descriptor.
○ int rina_flow_alloc(dif_name, local_name, remote_name, flowspec, flags) → fd
■ Issues a flow allocation request and possibly waits for the associated response. Returns a file descriptor to be used for data transfer.
○ int rina_register(fd, dif_name, appl_name, flags)
■ Registers an application into a given DIF.
○ int rina_unregister(fd, dif_name, appl_name, flags)
■ Unregisters an application from a given DIF.
○ int rina_flow_accept(fd, flags) → remote_appl, flowspec
■ Waits for and possibly accepts an incoming flow request, where the destination application is one of those registered to the control device referred to by fd. Returns a file descriptor to be used for data transfer.
librlite-conf
● It is the backend for the rlite-ctl stack administration tool
● Exports the management and inspection functionalities:
○ IPCP creation
○ IPCP deletion
○ IPCP configuration
○ Fetch of current flows (with related statistics)
○ Dump state of a specific flow
○ Synchronization with uipcps daemon, to wait for the user-space part of an IPCP to show up
○ ...
librlite-cdap
● CDAP implementation using Google Protocol Buffers as concrete syntax
● Provides CDAP message constructors, serializers and deserializers
● Provides a CDAP connection object to send and receive CDAP messages
● Each CDAP connection wraps a file descriptor
○ In this way CDAP can be used over arbitrary file descriptors
○ Primarily meant to be used with /dev/rlite-io file descriptors
○ No dependencies on other parts of rlite, can be reused as a stand-alone component
Uipcps daemon - Overview
● A multi-threaded single-process daemon that implements the management part of some IPCPs
● When an IPCP is created by the kernel, the daemon gets notified, and creates the corresponding user-space IPCP (uipcp)
● For regular IPCPs, it implements:
○ Flow allocation RIB objects
○ Directory Forwarding Table RIB objects
○ Enrollment RIB objects and enrollment state machines
○ Routing RIB objects
○ Address allocation RIB objects
● For shim-udp4 IPCPs it implements UDP sockets setup and dynamic UDP port allocation
● For shim-tcp4 IPCPs it implements TCP connection setup and teardown for both client and server side (connect(), accept(), etc.)
Uipcps daemon - Internals
● A custom event-loop thread for each IPCP
● An additional thread that implements a UNIX socket server to serve requests coming from the rlite-ctl tool (or other future agents)
● Abstract factory pattern to manage different types of uipcps
● Reference counters used to manage uipcps lifetime
● Subsystems:
○ UNIX socket server, written in C
○ uipcps container for generic uipcp management (creation, deletion, …), written in C
○ shim-udp4 and shim-tcp4 user-space implementation, written in C
○ normal IPCP user-space implementation, written in C++ mainly because of CDAP
● C++ code confined inside the uipcp-normal statically linked library.
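Uipcps daemon - Subsystems
[Figure: block diagram. The uipcps daemon contains the unix server, the uipcps container, and the normal and shim udp4 implementations; it sits on librlite-cdap and librlite, next to rlite-ctl and the application]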
Uipcps daemon - Event loop
● A custom event-loop on top of rlite control devices
● The event-loop thread uses select() to wait on many file descriptors
○ rlite control devices: when events happen on the control device, event-specific callbacks get executed
○ Other file descriptors: when an event is ready on one of those, a user-provided callback gets executed
● Supports timers, which can be used to execute a callback after a certain amount of time
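The core of such a loop, one select() call followed by callback dispatch, can be sketched as follows. This is a generic illustration rather than the daemon's implementation: the type and function names are hypothetical, the table is fixed-size, and timers are reduced to a single timeout argument.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

typedef void (*fd_cb_t)(int fd, void *arg);

struct evloop_entry {
    int     fd;
    fd_cb_t cb;
    void   *arg;
};

#define EVLOOP_MAX_FDS 16

struct evloop {
    struct evloop_entry entries[EVLOOP_MAX_FDS];
    size_t n;
};

/* Register a callback to run whenever `fd` becomes readable. */
int evloop_add(struct evloop *el, int fd, fd_cb_t cb, void *arg)
{
    assert(el && cb);
    if (el->n == EVLOOP_MAX_FDS || fd < 0 || fd >= FD_SETSIZE) {
        return -1;
    }
    el->entries[el->n++] = (struct evloop_entry){ fd, cb, arg };
    return 0;
}

/* One loop iteration: wait up to timeout_ms, then run the callback of every
 * readable descriptor. Returns the number of callbacks run, or -1. */
int evloop_run_once(struct evloop *el, int timeout_ms)
{
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };
    fd_set rfds;
    int maxfd = -1, fired = 0;
    size_t i;

    FD_ZERO(&rfds);
    for (i = 0; i < el->n; i++) {
        FD_SET(el->entries[i].fd, &rfds);
        if (el->entries[i].fd > maxfd) {
            maxfd = el->entries[i].fd;
        }
    }
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0) {
        return errno == EINTR ? 0 : -1;
    }
    for (i = 0; i < el->n; i++) {
        if (FD_ISSET(el->entries[i].fd, &rfds)) {
            el->entries[i].cb(el->entries[i].fd, el->entries[i].arg);
            fired++;
        }
    }
    return fired;
}

/* Example callback: consume one byte and count the event. */
void count_cb(int fd, void *arg)
{
    char c;

    if (read(fd, &c, 1) == 1) {
        (*(int *)arg)++;
    }
}
```

A timer facility fits naturally on top: the loop would pass the time remaining until the earliest pending timer as the select() timeout, then run the expired timer callbacks after dispatching descriptor events.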
Uipcps daemon - Advanced features
● The uipcp-containers module keeps track of the IPCPs in the local system and the flows allocated among them
○ This information is maintained in a graph of local IPCPs
○ A node for each IPCP, an edge for each inter-IPCP flow
○ Graph used for automatic computation of:
■ per-IPCP Maximum SDU size (using the constraints provided by shim DIFs)
■ per-IPCP PCI header space to be reserved at kernel buffer allocation
○ Result of computation is pushed to the kernel for optimized operation
● Optional automatic re-enrollment triggers to create N-1 flows where they are missing
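One plausible way to derive both quantities from the IPCP graph is a recursive walk from each IPCP down to the shims it depends on. The sketch below assumes an acyclic graph and two simple rules that are guesses at the intent, not the daemon's actual algorithm: header space is the IPCP's own header plus the worst case among its lower IPCPs, and the maximum SDU size is the smallest limit among its lower IPCPs minus its own header. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOWER 4

/* Hypothetical node of the local IPCP graph. */
struct ipcp_node {
    unsigned hdr_len;          /* bytes of header (PCI) this IPCP prepends */
    unsigned shim_max_sdu;     /* shims only: largest SDU the medium accepts */
    int      lower[MAX_LOWER]; /* indices of the N-1 IPCPs it has flows over */
    size_t   n_lower;          /* 0 for a shim IPCP */
};

/* Header space to reserve for PDUs written to IPCP `i`: its own header plus
 * the largest requirement among the IPCPs underneath it. */
unsigned hdr_space(const struct ipcp_node *g, int i)
{
    unsigned worst = 0;

    assert(g && i >= 0);
    for (size_t k = 0; k < g[i].n_lower; k++) {
        unsigned h = hdr_space(g, g[i].lower[k]);

        if (h > worst) {
            worst = h;
        }
    }
    return g[i].hdr_len + worst;
}

/* Largest SDU that IPCP `i` can accept: bounded by the most restrictive
 * lower IPCP, minus the header that `i` itself adds. */
unsigned max_sdu(const struct ipcp_node *g, int i)
{
    unsigned best;

    assert(g && i >= 0);
    if (g[i].n_lower == 0) {
        return g[i].shim_max_sdu; /* the constraint comes from the shim */
    }
    best = max_sdu(g, g[i].lower[0]);
    for (size_t k = 1; k < g[i].n_lower; k++) {
        unsigned m = max_sdu(g, g[i].lower[k]);

        if (m < best) {
            best = m;
        }
    }
    return best > g[i].hdr_len ? best - g[i].hdr_len : 0;
}
```

The plain recursion revisits shared lower IPCPs; with the handful of IPCPs found on one host that is harmless, and memoizing per node would remove it.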
rlite-ctl
● An iproute2-like command-line tool to administer and monitor IPCP processes
● Functionalities:
○ IPCP creation and deletion
○ IPCP configuration
○ Registration of an IPCP to a DIF
○ Enrollment between a local IPCP and a remote IPCP
○ Show list of IPCPs
○ Show RIB of a DIF
○ Show list of flows
○ Dump state of a specific flow
Common functionalities
● Common code is compiled both in user-space and kernel-space, to ease maintenance:
○ Serialization and deserialization routines of control messages across user/kernel interface
■ Table-based serialization/deserialization, adding a new message is straightforward
○ Helper functions for RINA names - (APN, API, AEN, AEI) tuples.
Available RINA applications
● Example applications:
○ rinaperf: multi-threaded client/server capable of parallel flow allocation, implementing basic connectivity and performance testing: ping, request-response, unidirectional bandwidth
○ rina-echo-async: single-threaded event-loop based client/server tool, capable of concurrent flow allocation and concurrent flow management
● Real applications
○ nginx: RINA port of the popular Nginx server
○ dropbear: RINA port of the Dropbear ssh client/server
○ rina-gw: Event-loop application acting as an application gateway between a RINA network and an IP network
■ It forwards TCP connections over RINA flows and the other way around
Demo
● RINA/TCP gateway, to make the TCP/IP world interact with the RINA world
● Minimally patched Nginx web server runs over RINA
[Figure: topology with Client host 1 and Client host 2 (web browsers), a TCP/IP network, a proxy host (rina-gw), a RINA network and Server host 1 (patched nginx); legend: RINA flow, TCP connection]
[Figure: VM A (patched nginx) and VM B (rina-gw, browser); layers shown: TCP, n.1.DIF (normal), shim-eth (e.1.DIF)]

Rlite software-architecture (1)

  • 1. A RINA light implementation Vincenzo Maffione 20/02/2017
  • 2. Introduction (1) ● A Free and Open Source light implementation of RINA for Linux ● Implementation splitted between user-space and kernel-space ● KISS approach β†’ codebase is clean and essential ● Focus: β—‹ basic functionality - do few things but to them well β—‹ stability and performance - support deployments with hundreds of nodes β—‹ minimality - avoid over-engineering ● Main goal: a baseline implementation for future RINA products ● Code and documentation available at https://github.com/vmaffione/rlite
  • 3. Introduction (2) ● ~ 27 Klocs (not including blanks) β—‹ kernel-space: ~ 9 Klocs β—‹ user-space: ~ 18 Klocs β–  including tools and example applications ● Written mostly in C (some parts are C++ for convenience) β—‹ C: 14 Klocs β—‹ C++: 7 Klocs ● Network applications can be written in C ● Python bindings available to write network applications in Python
  • 4. Introduction (3) ● kernel-space is implemented as a set of out-of-tree kernel modules, which run on the unmodified Linux kernel. β—‹ Linux Kbuild system is used to build the modules against the running kernel β—‹ Build time (no parallel make): 3-15 seconds ● user-space is implemented as a set of shared libraries and programs β—‹ CMake is used to configure and build libraries and executables β—‹ Build time (no parallel make): 15-60 seconds
  • 5. Basic features (1) ● Applications: β—‹ Flow allocation and deallocation, with QoS specification β—‹ Application registration and unregistration β—‹ Data transfer ● Stack administration: β—‹ Creation, deletion and configuration of IPCPs β—‹ Registration and enrollment among IPCPs β—‹ Monitoring and inspection β–  inspection of IPCPs in the system β–  inspection of RIBs β–  per-flow statistics
  • 6. Basic features (2) ● QoS (supported through DTCP): β—‹ Flow control β—‹ Retransmission control β—‹ Maximum allowable gap β—‹ Simple token-bucket rate-limiting ● Decent performance (detailed performance plots to come) β—‹ About 9.5 Gbps on a 10 Gbit link without flow control and retransmission β—‹ About 6 Gbps on a 10 Gbit link with flow control β—‹ A lot of room for optimizations ● Stability indicators β—‹ Done 10 days long VM-based experiments with up 35 nodes, two levels of normal DIFs and 50 flows allocations per second β—‹ Done experiments with up to 10 levels of DIFs
  • 7. Architecture overview (1) rlite-ctl uipcps daemon application librlite-cdaplibrlite-conf librlite /dev/rlite /dev/rlite-io rlite shim-eth shim-loopback shim-tcp4 normal user-space kernel-space shim-hv shim-udp4
  • 8. Architecture overview (2) ● kernel-space β—‹ Supports control operations β—‹ Implements datapath β—‹ Keeps state ● user-space β—‹ Libraries to abstract interaction with kernel-space functionalities β—‹ A daemon to implement management part of (many) IPCPs β—‹ An iproute2-like command-line tool to administer the stack ● Interactions between kernel-space and user-space only happen through character devices β†’ therefore through file descriptors
  • 9. Kernel-space architecture (1) ● Supported functionalities: β—‹ IPCP creation, deletion and configuration (kernel keeps a per-IPCP data structure) β—‹ Flow (de)allocation (kernel keeps a per-flow data structure) β—‹ Application (un)registration (kernel keeps a data structure for each registered application) β—‹ RMT, DTP and DTCP components of the normal IPCP β—‹ Shim IPCP processes (e.g. interaction with network device drivers) ● State is maintained in kernel-space: β—‹ user-space can crash or be restarted at any time β—‹ user-space can recover state from kernel
  • 10. Kernel-space architecture (2) ● User-space interacts with kernel-space only through two character devices β—‹ /dev/rlite for control operations β—‹ /dev/rlite-io for data transfer and synchronization ● Consequently, interactions only happen through file descriptors ● Both are β€œcloning devices” β—‹ each open() creates a new independent kernel-space instance ● Both devices support blocking and non-blocking operation β—‹ Standard poll() and select() widely used with the devices
  • 11. Kernel-space architecture (3) ● /dev/rlite used for control operations β—‹ flow (de)allocation β—‹ Application (un)registration β—‹ IPCP creation, deletion and configuration β—‹ Management of PDU forwarding table β—‹ interactions between user-space and kernel-space parts of IPCPs β—‹ inspection and monitoring operations on flows and IPCPs β—‹ ...
  • 12. Kernel-space architecture (4) ● Control operations follow a request/response paradigm: β—‹ write() to the control device to submit a request message β—‹ Response messages (not always present) can be read through read() ● The control device is used to avoid ioctls() and netlink β—‹ Easier porting to other OSes (e.g. FreeBSD) ● Request and response messages are represented by packed structs and are serialized/deserialized during the user-space ←→ kernel-space transition β—‹ support for string (de)serialization β—‹ support for (apn, api, aen, aei) name (de)serialization
  • 13. Kernel-space architecture (5) ● /dev/rlite-io for data transfer and synchronization β—‹ read() β—‹ write() β—‹ select(), poll(), epoll() ● Application workflow: β—‹ Use the control device to allocate a flow (kernel-space object) β—‹ Bind the flow to a newly-created data transfer file descriptor - this is the only task performed by means of ioctl() β—‹ Use the data transfer file descriptor to exchange SDUs and/or wait for events β—‹ Close file descriptor to deallocate the associated flow ● Special binding mode to exchange management SDUs
  • 14. Kernel-space architecture (6) ● Usual abstract factory pattern to manage different types of IPCPs β—‹ normal: implementation of the regular IPCP β—‹ shim-loopback: supports system-local IPC, with optional queued mode to decouple TX and RX code-paths, and optional packet drop emulation β—‹ shim-eth: uses network device drivers to transmit and receive SDUs, sharing the device with the Linux network stack β—‹ shim-udp4: tunnels RINA traffic over UDP socket; mostly implemented in user-space, only data transfer is implemented in kernel-space β—‹ shim-tcp4: same as shim-udp4, but using a TCP socket; deprecated, since it duplicates flow-control and congestion control done in higher layers β—‹ shim-hv: uses VMPI devices to transmit and receive SDUs
  • 15. Some kernel-space internals ● Reference counters widely used to manage lifetime of objects (e.g. IPCPs, flows, registered applications, PDUs) ● sk_buff-like approach to avoid copies throughout the datapath ● dynamic allocation of PDU buffers β—‹ The amount of header space to reserve at allocation time is precomputed by the user-space daemon, depending on the local IPCP dependency graph ● All PDU queues are limited in size to keep memory usage under control ● Deferred work (workqueues) used only when necessary, to keep latency low β—‹ Example: driver transmission routine directly executes in the context of an application write() system call, when possible
  • 16. Architecture overview rlite-ctl uipcps daemon application librlite-cdaplibrlite-conf librlite /dev/rlite /dev/rlite-io rlite shim-eth shim-loopback shim-tcp4 normal user-space kernel-space shim-hv shim-udp4
  • 17. user-space libraries ● librlite (written in C) β—‹ main library, abstracts interactions with the rlite control device (/dev/rlite) β—‹ provides common utilities and helpers (application names, flow specification, control messages, ...) β—‹ provides an API for RINA applications ● Other libraries β—‹ librlite-conf (C): extends librlite with kernel-space IPCP management functionalities β—‹ librlite-cdap (C++): CDAP implementation based on Google Protocol Buffer
  • 18. librlite - Overview ● librlite provides API calls to interact with control device instances β—‹ Validation, serialization and deserialization of control messages in both directions (user β†’ kernel, kernel β†’ user) ● It defines a POSIX-like APIs for applications: β—‹ Reminiscent of the socket API, to ease porting of existing socket applications... β—‹ … yet with the full power of RINA API (QoS support and complete naming scheme) β—‹ Easy to learn for grown-up network developers! β—‹ Documentation available at https://github.com/vmaffione/rlite/blob/master/include/rina/api.h β—‹ Other resources: https://github.com/IRATI/stack/wiki/Application-API
  • 19. librlite - Application API ● Main API calls: β—‹ int rina_open() β†’ fd β–  Opens a control device instance, returning a file descriptor. β—‹ int rina_flow_alloc(dif_name, local_name, remote_name, flowspec, flags) β†’ fd β–  Issues a flow allocation request and possibly wait for the associated response. Returns a file descriptor to be used for data transfer. β—‹ int rina_register(fd, dif_name, appl_name, flags) β–  Register an application into a given DIF. β—‹ int rina_register(fd, dif_name, appl_name, flags) β–  Unregister an application from a given DIF. β—‹ int rina_flow_accept(fd, flags) β†’ remote_appl, flowspec β–  Wait and possibly accept an incoming flow request, where the destination application is one of the ones registered to the control device referred by fd. Returns a file descriptor to be used for data transfer.
  • 20. librlite-conf ● It is the backend for the rlite-ctl stack administration tool ● Exports the management and inspection functionalities: ○ IPCP creation ○ IPCP deletion ○ IPCP configuration ○ Fetch of current flows (with related statistics) ○ Dump state of a specific flow ○ Synchronization with the uipcps daemon, to wait for the user-space part of an IPCP to show up ○ ...
  • 21. librlite-cdap ● CDAP implementation using Google Protocol Buffers as concrete syntax ● Provides CDAP message constructors, serializers and deserializers ● Provides CDAP connection objects to send and receive CDAP messages ● Each CDAP connection wraps a file descriptor ○ In this way CDAP can be used over arbitrary file descriptors ○ Primarily meant to be used with /dev/rlite-io file descriptors ○ No dependencies on other parts of rlite, so it can be reused as a stand-alone component
  • 22. Architecture overview [Diagram. Labels: rlite-ctl, uipcps daemon, application; librlite-cdap, librlite-conf, librlite; /dev/rlite, /dev/rlite-io; rlite, normal, shim-eth, shim-loopback, shim-tcp4, shim-udp4, shim-hv; user-space, kernel-space]
  • 23. Uipcps daemon - Overview ● A multi-threaded single-process daemon that implements the management part of some IPCPs ● When an IPCP is created by the kernel, the daemon gets notified, and creates the corresponding user-space IPCP (uipcp) ● For regular IPCPs, it implements: ○ Flow allocation RIB objects ○ Directory Forwarding Table RIB objects ○ Enrollment RIB objects and enrollment state machines ○ Routing RIB objects ○ Address allocation RIB objects ● For shim-udp4 IPCPs it implements UDP socket setup and dynamic UDP port allocation ● For shim-tcp4 IPCPs it implements TCP connection setup and teardown for both client and server side (connect(), accept(), etc.)
  • 24. Uipcps daemon - Internals ● A custom event-loop thread for each IPCP ● An additional thread that implements a UNIX socket server to serve requests coming from the rlite-ctl tool (or other future agents) ● Abstract factory pattern to manage different types of uipcps ● Reference counters used to manage uipcps lifetime ● Subsystems: ○ UNIX socket server, written in C ○ uipcps container for generic uipcp management (creation, deletion, ...), written in C ○ shim-udp4 and shim-tcp4 user-space implementation, written in C ○ normal IPCP user-space implementation, written in C++ mainly because of CDAP ● C++ code is confined inside the uipcp-normal statically linked library.
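The factory and reference-counting points on slide 24 can be illustrated with a minimal sketch. All names here (`uipcp_ops`, `uipcp_create`, `uipcp_get`, `uipcp_put`) are hypothetical and chosen for illustration; they are not taken from the rlite sources. The sketch is also single-threaded, whereas a multi-threaded daemon would have to protect the counter with a lock or atomics.

```c
#include <stdlib.h>
#include <string.h>

struct uipcp;

/* Per-type operations; the factory picks one entry by DIF type.
 * Hypothetical structure, for illustration only. */
struct uipcp_ops {
    const char *dif_type;
    int (*init)(struct uipcp *u);
    void (*fini)(struct uipcp *u);
};

struct uipcp {
    const struct uipcp_ops *ops;
    unsigned int refcnt; /* number of holders of this object */
    void *priv;          /* type-specific state, unused in this sketch */
};

static int generic_init(struct uipcp *u) { u->priv = NULL; return 0; }
static void generic_fini(struct uipcp *u) { u->priv = NULL; }

/* One row per supported uipcp type. */
static const struct uipcp_ops uipcp_ops_table[] = {
    { "normal", generic_init, generic_fini },
    { "shim-udp4", generic_init, generic_fini },
    { "shim-tcp4", generic_init, generic_fini },
};

/* Factory: look up the ops for dif_type and build an object that
 * starts with one reference. Returns NULL for unknown types. */
struct uipcp *uipcp_create(const char *dif_type)
{
    size_t n = sizeof(uipcp_ops_table) / sizeof(uipcp_ops_table[0]);
    size_t i;

    for (i = 0; i < n; i++) {
        struct uipcp *u;

        if (strcmp(uipcp_ops_table[i].dif_type, dif_type) != 0) {
            continue;
        }
        u = malloc(sizeof(*u));
        if (u == NULL) {
            return NULL;
        }
        u->ops = &uipcp_ops_table[i];
        u->refcnt = 1;
        if (u->ops->init(u) != 0) {
            free(u);
            return NULL;
        }
        return u;
    }
    return NULL;
}

/* Take an additional reference. */
struct uipcp *uipcp_get(struct uipcp *u)
{
    u->refcnt++;
    return u;
}

/* Drop a reference; returns 1 if the object was destroyed, else 0. */
int uipcp_put(struct uipcp *u)
{
    if (--u->refcnt > 0) {
        return 0;
    }
    u->ops->fini(u);
    free(u);
    return 1;
}
```

A caller would pair every `uipcp_get()` with a `uipcp_put()`, so the object is released only when the last holder drops it.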
  • 25. Uipcps daemon - Subsystems [Diagram. Labels: rlite-ctl, application; uipcps daemon with unix server, uipcps container, normal, shim-udp4; librlite-cdap, librlite]
  • 26. Uipcps daemon - Event loop ● A custom event-loop on top of rlite control devices ● The event-loop thread uses select() to wait on many file descriptors ○ rlite control devices: when events happen on the control device, event-specific callbacks get executed ○ Other file descriptors: when an event is ready on one of those, a user-provided callback gets executed ● Supports timers, which can be used to execute a callback after a certain amount of time
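The event loop described on slide 26 can be sketched as follows. This is a simplified model, not the rlite code: the `evloop_*` names are hypothetical, only readability events are handled, and a single one-shot timer is supported whose timeout restarts whenever a file descriptor event arrives first.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

#define EVLOOP_MAX_FDS 16

typedef void (*evloop_fd_cb)(int fd, void *opaque);
typedef void (*evloop_timer_cb)(void *opaque);

struct evloop_fd {
    int fd;
    evloop_fd_cb cb;
    void *opaque;
};

struct evloop {
    struct evloop_fd fds[EVLOOP_MAX_FDS];
    int nfds;
    int timer_armed; /* non-zero while the one-shot timer is pending */
    long timer_ms;
    evloop_timer_cb timer_cb;
    void *timer_opaque;
};

void evloop_init(struct evloop *loop)
{
    loop->nfds = 0;
    loop->timer_armed = 0;
}

/* Register a callback to run when fd becomes readable. */
int evloop_add_fd(struct evloop *loop, int fd, evloop_fd_cb cb, void *opaque)
{
    if (loop->nfds >= EVLOOP_MAX_FDS || fd < 0 || fd >= FD_SETSIZE) {
        return -1;
    }
    loop->fds[loop->nfds].fd = fd;
    loop->fds[loop->nfds].cb = cb;
    loop->fds[loop->nfds].opaque = opaque;
    loop->nfds++;
    return 0;
}

/* Arm the one-shot timer to fire after ms milliseconds. */
void evloop_set_timer(struct evloop *loop, long ms, evloop_timer_cb cb,
                      void *opaque)
{
    loop->timer_ms = ms;
    loop->timer_cb = cb;
    loop->timer_opaque = opaque;
    loop->timer_armed = 1;
}

/* One iteration: wait with select(), then dispatch callbacks.
 * Returns the number of callbacks executed, or -1 on error. */
int evloop_run_once(struct evloop *loop)
{
    struct timeval tv;
    struct timeval *tvp = NULL;
    fd_set rdfs;
    int maxfd = -1, fired = 0, n, i;

    FD_ZERO(&rdfs);
    for (i = 0; i < loop->nfds; i++) {
        FD_SET(loop->fds[i].fd, &rdfs);
        if (loop->fds[i].fd > maxfd) {
            maxfd = loop->fds[i].fd;
        }
    }
    if (loop->timer_armed) {
        tv.tv_sec = loop->timer_ms / 1000;
        tv.tv_usec = (loop->timer_ms % 1000) * 1000;
        tvp = &tv;
    }
    n = select(maxfd + 1, &rdfs, NULL, NULL, tvp);
    if (n < 0) {
        return -1;
    }
    if (n == 0) {
        /* select() can only time out here if the timer was armed. */
        loop->timer_armed = 0;
        loop->timer_cb(loop->timer_opaque);
        return 1;
    }
    for (i = 0; i < loop->nfds; i++) {
        if (FD_ISSET(loop->fds[i].fd, &rdfs)) {
            loop->fds[i].cb(loop->fds[i].fd, loop->fds[i].opaque);
            fired++;
        }
    }
    return fired;
}

/* Example callbacks: opaque points to an int counter. */
void example_read_cb(int fd, void *opaque)
{
    char c;

    if (read(fd, &c, 1) == 1) {
        (*(int *)opaque)++;
    }
}

void example_timer_cb(void *opaque)
{
    (*(int *)opaque)++;
}
```

A thread would call `evloop_run_once()` in a loop; any readable file descriptor, for instance a pipe or a socket, can be registered with `evloop_add_fd()`.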
  • 27. Uipcps daemon - Advanced features ● The uipcp-containers module keeps track of the IPCPs in the local system and the flows allocated among them ○ This information is maintained in a graph of local IPCPs ○ A node for each IPCP, an edge for each inter-IPCP flow ○ Graph used for automatic computation of: ■ per-IPCP Maximum SDU size (using the constraints provided by shim DIFs) ■ per-IPCP PCI header space to be reserved at kernel buffer allocation ○ Result of the computation is pushed to the kernel for optimized operation ● Optional automatic re-enrollment triggers to create N-1 flows where they are missing
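One possible way to derive the two per-IPCP values of slide 27 from the graph is sketched below. The rules used are assumptions made for illustration, not the documented rlite algorithm: the maximum SDU size of an IPCP is the smallest maximum SDU size among the IPCPs below it minus its own header length, and the header space to reserve is its own header length plus the largest header space below it. The names (`ipcp_node`, `ipcp_graph_compute`) are hypothetical, and the graph is assumed to be acyclic.

```c
#include <limits.h>

#define MAX_LOWER 4

struct ipcp_node {
    int hdr_len;          /* bytes of header this IPCP prepends */
    int leaf_max_sdu;     /* constraint of a shim (leaf) IPCP */
    int lower[MAX_LOWER]; /* indices of the IPCPs used as N-1 */
    int n_lower;
    int max_sdu;          /* computed; -1 while unknown */
    int hdr_space;        /* computed; -1 while unknown */
};

void ipcp_node_init(struct ipcp_node *n, int hdr_len, int leaf_max_sdu)
{
    n->hdr_len = hdr_len;
    n->leaf_max_sdu = leaf_max_sdu;
    n->n_lower = 0;
    n->max_sdu = -1;
    n->hdr_space = -1;
}

/* Add an edge from n to the lower IPCP at index lower_idx. */
int ipcp_node_add_lower(struct ipcp_node *n, int lower_idx)
{
    if (n->n_lower >= MAX_LOWER) {
        return -1;
    }
    n->lower[n->n_lower++] = lower_idx;
    return 0;
}

/* Depth-first computation with memoization (assumes no cycles). */
static void ipcp_compute(struct ipcp_node *nodes, int i)
{
    struct ipcp_node *n = &nodes[i];
    int min_sdu = INT_MAX, max_hdr = 0, k;

    if (n->max_sdu >= 0) {
        return; /* already computed */
    }
    if (n->n_lower == 0) {
        /* Leaf (shim): the constraint comes from the medium. */
        n->max_sdu = n->leaf_max_sdu;
        n->hdr_space = n->hdr_len;
        return;
    }
    for (k = 0; k < n->n_lower; k++) {
        struct ipcp_node *l = &nodes[n->lower[k]];

        ipcp_compute(nodes, n->lower[k]);
        if (l->max_sdu < min_sdu) {
            min_sdu = l->max_sdu;
        }
        if (l->hdr_space > max_hdr) {
            max_hdr = l->hdr_space;
        }
    }
    n->max_sdu = min_sdu > n->hdr_len ? min_sdu - n->hdr_len : 0;
    n->hdr_space = n->hdr_len + max_hdr;
}

/* Recompute both values for every node of the graph. */
void ipcp_graph_compute(struct ipcp_node *nodes, int count)
{
    int i;

    for (i = 0; i < count; i++) {
        nodes[i].max_sdu = -1;
        nodes[i].hdr_space = -1;
    }
    for (i = 0; i < count; i++) {
        ipcp_compute(nodes, i);
    }
}
```

For example, with a leaf allowing 1500-byte SDUs and adding 14 bytes of header, a node with a 24-byte header stacked on it gets a maximum SDU size of 1476 and a header space of 38 under these rules. The header sizes are made-up figures.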
  • 28. rlite-ctl ● An iproute2-like command-line tool to administer and monitor IPCPs ● Functionalities: ○ IPCP creation and deletion ○ IPCP configuration ○ Registration of an IPCP to a DIF ○ Enrollment between a local IPCP and a remote IPCP ○ Show list of IPCPs ○ Show RIB of a DIF ○ Show list of flows ○ Dump state of a specific flow
  • 29. Common functionalities ● Common code is compiled both in user-space and kernel-space, to ease maintenance: ○ Serialization and deserialization routines of control messages across the user/kernel interface ■ Table-based serialization/deserialization, so adding a new message is straightforward ○ Helper functions for RINA names, i.e. (APN, API, AEN, AEI) tuples.
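The table-based approach of slide 29 can be sketched as follows, under assumptions made for illustration: every message starts with a fixed part that can be copied as-is, followed by a number of string fields, and a table row per message type records both facts. The message types, field names and wire format (16-bit length followed by the bytes, host byte order) are hypothetical and are not the rlite definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { MSG_FLOW_REQ = 0, MSG_IPCP_CREATE = 1, MSG_MAX };

struct msg_hdr {
    uint16_t msg_type;
    uint16_t pad;
    uint32_t event_id;
};

struct msg_flow_req {
    struct msg_hdr hdr;
    uint32_t port_id;
    char *local_appl;  /* strings must come last, in order */
    char *remote_appl;
};

struct msg_ipcp_create {
    struct msg_hdr hdr;
    char *name;
};

/* One row per message type: size of the plain-copy part and number
 * of trailing string fields. A new message only needs a new row. */
struct msg_layout {
    size_t copylen;
    unsigned int strings;
};

static const struct msg_layout msg_table[MSG_MAX] = {
    [MSG_FLOW_REQ] = { offsetof(struct msg_flow_req, local_appl), 2 },
    [MSG_IPCP_CREATE] = { offsetof(struct msg_ipcp_create, name), 1 },
};

/* Returns the number of bytes written to buf, or 0 on error. */
size_t msg_serialize(const void *msg, char *buf, size_t buflen)
{
    const struct msg_hdr *hdr = msg;
    const struct msg_layout *lay;
    char *const *strs;
    size_t off;
    unsigned int i;

    if (hdr->msg_type >= MSG_MAX) {
        return 0;
    }
    lay = &msg_table[hdr->msg_type];
    if (lay->copylen > buflen) {
        return 0;
    }
    memcpy(buf, msg, lay->copylen);
    off = lay->copylen;
    strs = (char *const *)((const char *)msg + lay->copylen);
    for (i = 0; i < lay->strings; i++) {
        uint16_t len = strs[i] ? (uint16_t)strlen(strs[i]) : 0;

        if (off + sizeof(len) + len > buflen) {
            return 0;
        }
        memcpy(buf + off, &len, sizeof(len));
        off += sizeof(len);
        if (len > 0) {
            memcpy(buf + off, strs[i], len);
            off += len;
        }
    }
    return off;
}

/* Fills msg (at least msglen bytes) from buf. Strings are allocated
 * with malloc() and owned by the caller. Returns 0 or -1. */
int msg_deserialize(const char *buf, size_t buflen, void *msg, size_t msglen)
{
    const struct msg_layout *lay;
    struct msg_hdr hdr;
    char **strs;
    size_t off;
    unsigned int i;

    if (buflen < sizeof(hdr)) {
        return -1;
    }
    memcpy(&hdr, buf, sizeof(hdr));
    if (hdr.msg_type >= MSG_MAX) {
        return -1;
    }
    lay = &msg_table[hdr.msg_type];
    if (buflen < lay->copylen ||
        msglen < lay->copylen + lay->strings * sizeof(char *)) {
        return -1;
    }
    memcpy(msg, buf, lay->copylen);
    off = lay->copylen;
    strs = (char **)((char *)msg + lay->copylen);
    for (i = 0; i < lay->strings; i++) {
        strs[i] = NULL;
    }
    for (i = 0; i < lay->strings; i++) {
        uint16_t len;

        if (off + sizeof(len) > buflen) {
            goto fail;
        }
        memcpy(&len, buf + off, sizeof(len));
        off += sizeof(len);
        if (off + len > buflen || (strs[i] = malloc(len + 1)) == NULL) {
            goto fail;
        }
        memcpy(strs[i], buf + off, len);
        strs[i][len] = '\0';
        off += len;
    }
    return 0;
fail:
    for (i = 0; i < lay->strings; i++) {
        free(strs[i]);
        strs[i] = NULL;
    }
    return -1;
}
```

Because both routines are driven only by `msg_table`, the same two functions serve every message type, which is what makes adding a message cheap in this style.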
  • 30. Available RINA applications ● Example applications: ○ rinaperf: multi-threaded client/server capable of parallel flow allocation, implementing basic connectivity and performance testing: ping, request-response, unidirectional bandwidth ○ rina-echo-async: single-threaded event-loop based client/server tool, capable of concurrent flow allocation and concurrent flow management ● Real applications: ○ nginx: RINA port of the popular Nginx server ○ dropbear: RINA port of the Dropbear ssh client/server ○ rina-gw: event-loop application acting as an application gateway between a RINA network and an IP network ■ It forwards TCP connections over RINA flows and the other way around
  • 31. Demo ● RINA/TCP gateway, to make the TCP/IP world interact with the RINA world ● Minimally patched Nginx web server runs over RINA [Diagram. Labels: TCP/IP network, RINA network, Client host 1 (web browser), Client host 2 (web browser), Proxy host (rina-gw), Server host 1 (patched nginx), RINA flow, TCP connection]
  • 32. Demo ● RINA/TCP gateway, to make the TCP/IP world interact with the RINA world ● Minimally patched Nginx web server runs over RINA [Diagram. Labels: VM A (patched nginx), VM B (rina-gw), Browser, n.1.DIF (normal), Shim-eth (e.1.DIF), TCP]