SlideShare a Scribd company logo
Towards High Performance
Packet Processing for 5G
Bal´azs Pinczel, D´aniel G´ehberger, Zolt´an Tur´anyi and Bence Formanek
Ericsson Research, Hungary
Abstract—Network Functions Virtualization aims to implement
networking functions in the cloud – even the ones working on the
user plane. This is envisioned by decomposing today’s monolithic
devices into smaller functions and re-composing them real-time.
This enables flexibility and independent innovation, but may
come at a performance price. In this paper we investigate the
performance aspects of composing user plane functions. We have
developed a prototype for mobile service chaining to experiment
with the function call based composition method provided by the
Click modular router. The prototype allows flexible management
of feature sets, including the support for relocating user chains
using context transfer. We provide a thorough performance
evaluation, draw conclusions and propose a versatile execution
environment, which combines various composition and processing
methods in order to achieve high performance packet processing
for 5G networks.
I. INTRODUCTION
Mobile core networks provide important contribution to the
mobile communication service. They are responsible for man-
aging connectivity sessions, handovers, idle mode and paging,
security, policy and charging. These functions have been
traditionally standardized to ensure interoperability between
the network and the terminals, but also among networking
devices. The standards govern not only functions, protocols
and procedures, but also a grouping of the functions into
nodes connected by standardized interfaces. The traditional
approach to core networks is to implement these nodes in
separate devices. This in turn, resulted in low feature velocity,
since any new feature that impacts more than one node, has
to be first standardized, then implemented by vendors (often
as a new release of a monolithic device), and finally carefully
rolled out – a process that can easily take years. This is very
far from the DevOps model and the methods of innovation
established in other technology domains. Any evolution of
mobile core networks must significantly improve this prob-
lem. Modern IT software technologies and the cloud enable
new ways to implement telecommunications equipment and
represent good sources of inspiration. We adopt the argument
for decomposing node functionality into smaller units. This
argument is fairly common and forms one of the principles of
the Network Function Virtualization (NFV) initiative [1]. The
subject of this paper is real-time re-composition of network
functions, focusing on the user plane. Even though some
network functions are possible to implement in purpose-built
silicon, in our prototype we use general-purpose CPUs, since
we believe this is where rapid innovation can happen. Some
of our results are also applicable to programmable NPUs.
Our concept consists of implementing core network user
plane functionality in small User Plane Functions (UPFs) that
can then be chained by a platform both within a physical server
and between servers. The control plane of the platform is free
to mix and match any set of UPF to a packet flow in any
order running at any location. The platform must take mobility
into account, which may result in the relocation of certain
functions – a generic context transfer facility similar to the
context transfer during X2 LTE handovers [2]. This concept,
called Mobile Service Chaining [3] has the following abilities.
• Customization: specify which flow gets what processing
• Simplification: omit network functions not needed
• Deployment flexibility: control where functions execute
• Extensibility: easy to deploy and apply new functions
Deployment flexibility not only aids in efficient resource
management, but also allows use cases, such as local breakout,
which required extensive standardization before [4]. We also
note that due to the extensibility of the concept radio and
service functions can also be included and managed under
a single platform. That is, some of the virtualized radio
functions, like PDCP or transport encryption employed in LTE
can be handled just as any other core network function. Also,
functions that today are deployed above the (S)Gi interface
can also be made part of a single (mobile) service chain tying
these service functions to the same control structure making
all core network information (such as user identity, policies,
charging, location, etc.) available to them in a uniform manner.
User Plane Functions may come in various complexity,
state, resource and security requirements. Composition must
be performance efficient, so that the flexibility enabled by the
system does not come at a prohibitive cost. Small, simple
UPFs, such as token bucket policing, packet marking or
encapsulation may be invoked all in the same thread, while
more complex UPFs require more isolation or the use of a
different operating system. Our ultimate goal is to address this
diversity in a mobile network context.
We have developed a limited prototype of mobile service
chaining to experiment with a certain kind of composition
method. In particular, we have implemented a chain compos-
itor composing UPFs inside a Click modular router [5]. We
have also implemented forwarding between nodes (carrying
metadata between sites) and support (lossless and reordering-
free) handovers via context transfer. In Section III we describe
the details of our prototyping environment establishing a
baseline for performance. Then, in Section IV we describe
the prototype and report our findings about the kind of perfor-
mance price one pays for flexible composability. In Section V
we demonstrate handover performance, which is an important
aspect in a mobile environment often overlooked in cloud-
related discussions. Finally, in Section VI we conclude the
paper by discussing how the composition method evaluated by
our prototype fits among a wider set of composition methods.
In addition, we outline our vision for a distributed packet
processing execution environment for 5G.
II. RELATED WORK
A. Mobile Service Chaining preliminaries
Since our prototype follows the considerations of the Mobile
Service Chaining 5G Core Network Architecture [3], we
provide a brief overview here. The architecture is essentially an
evolution of service chaining, with added support for mobility
and a formalized set of metadata. It caters for the specifics of
the mobile core environment, such as bearers, requirements on
grouped flow processing (e.g., to provide aggregated maximum
bit-rate policing) and the associated scalability requirements.
The Control Plane (CP) of this architecture includes the
functions of today’s Mobility Management Entity, Policy and
Charging Rules Function and a chain controller. We focus on
the realization of the user plane, however, as we also evaluate
lossless handovers and chain forwarding, a simple Controller
is necessary for the coordination. The architecture defines two
types of functional elements for the actual packet processing
in the User Plane (UP): Forwarding Elements (FE) and User
Plane Functions (UPF). A FE forwards packets between its
ports based on the rules, which are configured by the CP. A
port is either connected to a UPF or to a network interface.
The actual packet processing is done in the UPFs based on the
locally stored per-user contexts. It is not expected that UPFs
know any detail about the higher level topology.
Packets are sent through the system according to which
service chain they need to traverse. Such information is added
to the packets as tags by classifiers, which are also UPFs
managed by the CP. Simple classification can be done based
on header fields, packet direction (uplink or downlink) but it
is also possible to re-use tags assigned by prior UPFs (like
Deep Packet Inspection). The rules in the FEs may apply on
the assigned tags or directly on the header fields of the packet.
It is important that the assigned tags must be carried with the
packets through the network. For handover support UP nodes
have to be able to export and import the rules and UPF contexts
related to the user handing over.
B. Other related work
Service chaining is usually referred to as a combination
of OpenFlow and Virtualized Network Functions (VNF) re-
alized in virtual machines (VMs) or Linux containers [6].
However, it is concluded in studies that the packet transfer
from the OpenFlow switch to the VMs represents a significant
performance overhead [7]. This is especially troublesome
with simple VNFs where the actual processing time can be
comparable with this overhead. While the overhead is very
small for Linux containers [8], for a granular service chaining
architecture – with a lot of small VNFs – the network stack
of the Linux kernel itself can cause bottlenecks. The overhead
of virtualization has been also addressed with configurations
where the VNFs are simple processes on the host computer [9],
[10]. Nevertheless, the inter-VNF communication is still not
lightweight enough, since it is done via Ethernet softswitches,
and requires packet copies along the way.
There are relatively few performance evaluations for service
chaining. Cerrato et al. [9], [10] present results with identical
network functions, which calculate a hash on the first bytes
of the packets. This simplistic approach is also followed by
papers addressing the performance capabilities of OpenFlow
switches. Being the most popular, the literature mostly focuses
on the Open vSwitch. For the measurements, the switch is
usually operated with simple forwarding rules where higher
performance is easier to achieve [7], [11]. After the basic
one-rule cases, some papers also show that the number of
rules affects the performance significantly. There has been
impressive improvements during the last years [12], [13].
The Click modular router [5] provides a flexible packet
processing environment at the cost of some performance
overhead. The overhead is also measurable with high I/O rate
extensions [14]. Click can be used to construct middleboxes
and – besides general feasibility – Dobrescu et al. [15] show
that it is possible to predict the performance with the CPU
cache miss rates. Martins et al. [16] also present results for
several middleboxes, where the complexity varies from simple
forwarding to an intrusion detection system.
Regarding cloud based mobility, it is possible to perform
live migration of virtual machines and Linux containers [17].
However, this is not a solution for handovers in 5G, because
only the context of a single user needs to be migrated, not
the whole user plane node which possibly handles thousands
of users. Wubin et al. [18] describe a solution which provides
high availability for containers by exporting the internal states,
which is a significant step towards our needs, but it is still
missing parts to support the handover of a single user.
We can conclude that in most of the cases packet processing
and service chaining is evaluated involving simple rules or
functions. The papers that evaluate more complex middleboxes
do not consider the composition of these neither chained with
an OpenFlow switch nor inside the execution environment
itself. Furthermore, to the best of our knowledge there are no
public results available about VNFs, which support per-user
context migration.
III. PROTOTYPING ENVIRONMENT
We have selected the Click modular router environment to
investigate the performance of packet processing composition.
In this section we briefly overview Click and select one of its
environments to build our prototype on. We seek to provide the
prototype directly as a virtual machine or a Linux container
to ease the cloud based deployment since this is is expected
for 5G systems.
NIC
Host1
Traffic generator
Click router
Traffic collector
NIC
Host2
Fig. 1. Measurement setup.
The Click modular router enables dynamic composition of
software routers [5]. A router is assembled in run-time –
based on a configuration file – from simple packet processing
elements. Elements can have multiple input and output ports,
and as the internal logic can be written in pure C++, the packet
processing is not limited to the restricted set of actions as in
the case of OpenFlow switches [19].
Packets traverse the chain of elements as sequences of
function calls initiated by the active elements of the actual
configuration (ingress interfaces or elements with buffered
packets, ready to send). Click has multithread support, which
enables the scheduling of active elements to different cores
in parallel – in the prototype, though, we focus on the
performance of one CPU core, using a single Click thread.
The Click framework supports three different environments,
namely, it can run in Linux user space, kernel space and
over the Xen hypervisor as a library operating system called
ClickOS [16]. More recent versions of Click implement sup-
port for netmap – a fast packet I/O solution available as a
kernel module [20] – in user space and for ClickOS. As a
result, a user space deployment can outperform the kernel
based counterparts [14]. It is also possible to have netmap
access in Docker containers with mounting the related device
file and granting privileged access for the container. The
netmap framework also contains a software switch called
VALE, which provides high packet rates between netmap
accelerated interfaces [21].
Before discussing the performance of the software envi-
ronments, we provide a brief overview about the hardware
components used. For all of our measurements we used Linux
(Ubuntu 14.04 LTS) servers, each containing 3.5 GHz 6-core
Intel Xeon E5-1650 CPUs and having 32 GiB RAM. For
user plane traffic we used dedicated 10-Gigabit point-to-point
links between the machines with Intel X540 network interface
cards (NICs). The traffic on these links was directly handled
by either a Click instance or a traffic generator/collector as
it can be seen in Figure 1, without being processed by the
OS network stack. An exception was the ClickOS where
a VALE switch also had to be used. Control plane traffic
used a separate 1-Gigabit network. The Click version used
for the measurements is a recent (April, 2015) commit from
the repository [22], as is netmap [23]. Our ClickOS-based
evaluation was done on Xen 4.4.0. For traffic generation we
use the Pkt-gen tool from the netmap framework to achieve
high packet per second rates.
We tested the multiple different software environments of
Click to select the best for our prototype. In these cases the
Click router was used with a simple configuration, which only
rewrote the Ethernet addresses, thus realizing a simple low
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512
throughput(Kpps)
kernel forwarder user space
Click
kernel space
Click
ClickOS
with netmap
user space Click
with netmap
user space Click
in Docker
with netmap
6% 10% 18% 33%
8% 13% 22% 39%
17% 29%
53%
62%
46%
58%
92%
99%
67%
100%
100%
100%
64%
100%
100%
100%
Fig. 2. Throughput and link utilization in different environments.
0.00
0.05
0.10
0.15
0.20
64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512
delay(ms)
kernel forwarder user space
Click
kernel space
Click
ClickOS
with netmap
user space Click
with netmap
user space Click
in Docker
with netmap
Fig. 3. Delay of different environments.
level forwarder. Figure 2 shows the averaged values of the
throughput results extended with the standard deviation and
the link utilization percentage on the top of the bars. Figure 3
shows the delay for the different environments as boxplots
where the top and bottom 5 percentiles are defined as outliers.
We used various packet sizes from 64 bytes up to 512 bytes
in all the cases.
In order to provide an easily reproducible reference which
can be used to estimate the peak performance of other ma-
chines, we measured packet forwarding in the Linux kernel
as well. This method provides even lower throughput than
the simple user space Click configuration. This is because in
Click we only rewrite the Ethernet addresses without going
through the Linux protocol stack. As the figure shows, the
kernel space based Click router improves the throughput but
adds variability.
The last three cases are accelerated with the netmap frame-
work, which – as expected – improves the performance
significantly. While ClickOS shows a decent improvement,
we reached the best results with the user space Click router
accelerated with the netmap framework. Here the processing
rate goes up until 10 Mpps for the smallest packets, and above
128 bytes the 10 Gigabit link gets saturated. As we noted
before, it is possible to place this software combination into a
Docker container, which is shown as the last environment on
UPF 1 UPF 2 UPF n
Tunnel
encapsulator
Tunnel
decapsulator
Handover
buffer
Interface 1 Interface 2
FE
…
Fig. 4. Mobile Service Chaining 5G Core Network Architecture.
the figure. The container slightly limits the peak performance
for small packets and adds some variation, but the link still
gets saturated quickly as the packet size increases. Based
on the above results, we selected the Click router running
in a Docker container because despite the slightly lower
performance, this setup seamlessly fits into the envisioned
cloud based deployment of 5G systems.
IV. THE 5G CORE NETWORK PROTOTYPE
In our prototype we mapped the proposed logical 5G Core
Network Architecture to nodes on physical computers each
running a Click router instance. Besides a more detailed de-
scription, we provide a performance evaluation in this section.
A. Implementation details
The FEs and the UPFs are realized as Click elements, and
the UPFs are connected to the ports of the single FE as shown
in Figure 4. To carry the packet tags between Click instances,
we implemented a simple tunneling protocol, which uses UDP
encapsulation and carries the tag/value pairs besides payload.
When an encapsulated packet arrives to a Click node, the extra
headers are removed and the tags are placed to the annotation
area defined by the Click for each packet, which provides
efficient tag handling. The implementation is capable of per-
user handovers, utilizing a special buffering element in each
node. This element does nothing for users not in a handover
and will be further described in Section V.
Both the input and the output ports of the UPFs are
connected to the FE, which applies its rules for all the
incoming packets. The actual rules can be defined with a rule
language from the Control Plane where currently it is possible
to match against UPF names and packet tags. Obviously, the
performance of the rule matching determines the performance
of the system, especially for a large rule set. In order to address
this, we implemented a hash table based caching component
to speed up the lookup.
For the evaluation, we implemented the following fully
functional, per-user handover capable UPFs.
a) Counter: This UPF maintains a separate packet
counter for each user, and writes the current value into the
payload at an offset, which is chosen so as to avoid overwriting
any previously placed counter values. While this is a relatively
simple function, it still requires payload parsing, modification
and the recalculation of the UDP checksum.
b) Marker: The marker element uses token buckets to
check whether the traffic conforms to a per-user defined
bandwidth limit. Non-conformant packets are marked with a
tag to be dropped later.
c) Header compressor (IPHC): This element imple-
ments IP/UDP compression and decompression as specified in
RFC2507 [24]. It is able to handle multiple users with multiple
flows per user, assigning context identifiers automatically.
B. Evaluation of the prototype
We evaluate the performance of the prototype running on a
single node, assuming that it is an intermediate element of
a longer chain, and that the incoming packets are already
classified and tagged. To enable this, we modified the Pkt-
gen tool to be able to generate tunneled traffic, tagged with,
e.g., user-ID tag. As these packets are larger than 64 bytes,
we used 128 byte packet size for the rest of the paper.
1) UPF Overhead: In each step of the following exper-
iment, the configuration is extended with additional com-
ponents. Figure 5a shows the average packet rate with the
standard deviation. We also displayed the nominal time re-
quired to process each packet (simply taking the inverse of the
average packet rate). The first setup uses no encapsulation, just
one FE with one rule to forward. Compared to the baseline
measurements of the previous section, this configuration also
removes the Ethernet header and validates the IP header before
entering the FE. After the rule matching, the IP addresses are
rewritten and a new Ethernet header is placed to the packet.
The second setup includes tunneling and handover elements
as shown in Figure 4. The encapsulation step also involves
a configurable next hop selection based on tags. In the rest
of the cases the actual UPFs are added to the chain one
by one, creating more and more complex configurations. As
these functions handle contexts, parse, validate and copy fields
for each incoming packet, it is expected that these CPU and
memory intensive tasks affect the performance. Still, as it can
be seen on the figure, the throughput is stable and remains
in the millions of packets per second range. The per-packet
processing time shows that each component adds around 70–
110 ns overhead, but due to the inverse proportionality, this
effect appears to influence higher packet rates more.
2) Impact of additional users: Current custom-made core
network nodes handle 10–100 k users. Per component load is
expected to be lower in cloud-based deployments of future 5G
systems, but still in the thousands. The following measurement
addresses this by taking the most advanced configuration with
3 UPFs, and increasing the number of users from 1 to 10 k.
The average packet rates with the standard deviation and
the recorded L3 cache load misses are shown in Figure 5b.
Even though the hash table based lookups we used are near
constant time, the increase in the number of users causes some
performance degradation partly due to the L3 cache misses.
3) Handling multiple chains: Flexibility is one of the
strengths of the proposed 5G architecture. The last throughput
related evaluation addresses this, by taking the previous setup
one more step forward. For this measurement we used 1000
0
1000
2000
3000
4000
5000
6000
only FE full node,
no active
UPFs
counter counter,
marker
counter,
marker,
IPHC
throughput(Kpps)
186 ns
281 ns
349 ns
447 ns
559 ns
(a) Throughput of different configurations
0
500
1000
1500
2000
1 10 100 1000 10000
throughput(Kpps)
number of users
114
192
424
821
911554
(b) Throughput with multiple users
0
1000
2000
3000
4000
0 20 40 60 80 100
throughput(Kpps)
% of users with the empty chain
(c) Throughput with a mix of chains
Fig. 5. Prototype performance with various workloads.
UE
BS2
GW
BS1
(a) Packet paths
UE
BS2
GW
BS1
(b) Intermediate step
Fig. 6. Handover from BS1 to BS2.
users and defined 2 chains, having one with three UPFs and
one without any. In terms of FE rules, this requires only one
extra rule, which defines the shortcut for packets arriving on
the less complex chain. We generated the traffic with a set
ratio of users taking the more and the less complex chains.
The leftmost point of Figure 5c is the case when all the traffic
goes through the more complex chain, and with each further
point 5% of the users are moved to the empty one, which
scales up the performance.
This demonstrates that while the prototype – as a general-
purpose packet processing environment – supports more com-
plex feature sets as well, the aggregated performance of
serving multiple chains of various complexity remains pre-
dictable. The results also show that this kind of flexibility
does not sacrifice the scalability: the same compute resources
can be saturated either with more complex services at lower
throughput, or with less complex ones at higher throughput.
V. HANDOVER
As a core requirement for mobile service chaining solutions,
the proposed architecture supports transfer of a user’s UPF
contexts from a source node to a target node. To avoid packet
reordering, the prototype uses buffering in nodes where traffic
arrives on the new (post-handover) path until all the packets
arriving on the old path are processed.
To describe the process of handing over, we consider an
example scenario depicted in Figure 6a, where a gateway
node (GW) has two-way communication with an emulated UE
through either one of the two base stations (BS1,2), with the
possibility of switching between the paths when instructed to
do so by the Control Plane. Figure 7 shows the procedure of a
Control Plane GW BS1 BS2
2: HANDOVER
3: Start buffering uplink
packets from target BS
4: HANDOVER ACK
1: Set up uplink reordering
6: HANDOVER
7: Start buffering
9: CONTEXT_REQ
10: Start forwarding
11: CONTEXT_RESP
(UPF contexts)
12: Delete context 13: Install context
8: Context transfer
14: HANDOVER ACK
5: Handover preparation
16: DL PATH SWITCH
15: Handover execution
Fig. 7. Interaction of core network entities during a handover.
handover from BS1 to BS2. There is a buffer in the GW for the
uplink packets arriving from BS2, as well as one in BS2 for
packets already arriving on the new path. Also in BS2, there is
a separate buffer for the packets arriving on the temporary path
shown in Figure 6b. This buffer stores incoming packets until
the migration of contexts completes. When all the preparatory
steps are completed, the path of downlink packets is switched
in the GW, and after some time – since no new packets take
the old path – the buffered packets can be consumed, until
eventually the buffers get empty, at which point the node
switches to processing incoming packets directly.
In the prototype the bulk of the above functionality is real-
ized in a special element chained before the FE, which man-
ages the lifecycle of per-user handover buffers, and handles
packet forwarding (see Figure 4). The rest of the functionality
(e.g., control-related communication or context transfer) is
spread among the FE, the UPFs, the daemon of the user plane
nodes and the Control Plane.
To test handovers the UE emulated the behavior of a radio
device by replying to the same base station it received the
packet from – the reply being the same 128 byte packet with
only one modified tag. Traffic was generated in the GW at the
rate of 10 kpps, and was captured both in the UE, and upon
-2
0
2
4
6
8
9980 10000 10020 10040 10060
timerelativetohandover(ms)
packet sequence number
DL via BS1
DL via BS2
(a) Captured at the GW
-2
0
2
4
6
8
9980 10000 10020 10040 10060
timerelativetohandover(ms)
packet sequence number
DL via BS1
DL via BS2
(b) Captured at the UE
Fig. 8. Packet arrival times during a handover.
returning to the GW. To examine the effect of a handover
procedure, we look at the disturbance in packet arrival times.
Figure 8 shows these for a single handover selected from our
measurements (all other measurements show highly similar
characteristics).
In normal conditions, the packet arrival times form a di-
agonal line, the slope of which is determined by the rate of
the packets. When a handover is in progress, packets cease
to arrive for a short period (approximately 2 ms), caused by
the buffering during, which the context is migrated. When
packet processing is resumed, the nodes receive larger batches
of packets at a time, because the buffers can be processed
faster than the packet arrival rate. This eventually leads to the
buffers being empty, from which point the arrival times follow
their usual linear pattern. The pattern is already very close to
linear 3.5 ms after the beginning of the handover, and is fully
restored after 6 ms. During this period the curve approaches
its normal slope in gradually shortening steps, which is likely
a scheduling artifact.
The figures also show a single packet that arrives to the UE
through BS1, during the exact time of the context transfer.
The UE sends the reply still to BS1, where the forwarding
causes it to be sent through BS2, before arriving to the GW
suffering the delay of the context transfer. This is normal –
even if rare – behavior, and the continuity in packet sequence
numbers shows that even in this case the handover is lossless
and reordering-free.
VI. DISCUSSION
In the previous sections we have investigated the perfor-
mance aspects of a certain composition model. This model is
characterized by the following attributes
• individual processing functions (UPFs) are invoked
through function calls;
• a (cached) rule database is consulted before selecting the
next UPF to execute;
• UPFs are free to process, duplicate, buffer, drop packets
or even generate new packets; and
• packets can carry metadata between the UPFs.
The main advantage of this composition model is its flexi-
bility. It allows single- or multithreaded operations and has its
internal scheduler to initiate elements that have work to do. It
has some limitations, however. Since all Click elements run in
the same process, very little can be done to realize isolation
– either for resource control or security. Another issue is that
in order to manage relatively high performance and in order
to run in a variety of situations, Click represents a confined
programming environment. There are a lot of rules on how
to handle packets, allocate memory or do I/O. As a result
third party libraries in general, cannot be used, not even the
STL. This makes the development of complex UPFs, such as a
full DPI suite or transparent HTTP proxies quite cumbersome.
These would best run in a separate container or VM, since we
think similar limitations would appear in any system using
function calls as composition method.
At the other end of the scale, even the overhead represented
by function calls and rule lookups may be significant for very
small UPFs. For example, flow counters, bandwidth policers,
simple encapsulation modules (and in general most actions of
an OpenFlow switch) complete in very few cycles. For these
an even more lightweight composition model is better. It is
simply more efficient to copy their code one after another in
as many combinations as needed [25].
Since a versatile programmable packet processing execution
environment may contain a mix of small and large UPFs it
is best, if several composition methods are available to the
programmers. The figure below summarizes our proposal for
composition methods needed. The just-in-time (JIT) linker
method dynamically copies small UPFs one after another. This
method has been shown to achieve the fastest software switch
for simple actions [25]. The function call method has been
the subject of our investigations in this paper and is suitable
for moderately sized, trusted code. The separate container/VM
model passes packets between different containers or VMs on
the same server, and enables the use of any third party library
or even OS (in the case of VMs) at the expense of passing the
packets between contexts [7]. Finally, the last method, chain
forwarding, enables the functions to reside on different hosts
and uses encapsulation to carry chain ID and other metadata
on the wire [26].
The first two methods may be used to support a run-to-
completion mode of operation, whereas the second and third
can split the processing of a single packet among multiple
JIT linking Function call Container / VM
Chain
forwarding
Composition
method
Separate threads or
processes
Run-to-completion
Separate
machines
Processing
method
Click
Fig. 9. Composition methods and the corresponding processing methods.
threads or processes (latter of which can even reside in
different containers/VMs). Presumably a UPF needs to be
developed with its composition method in mind. E.g., for JIT
linked components small fragments of code are required with
linker metadata. Function call components can be provided as
shared libraries, while the rest as apps or VM images.
In general, it is best if the controller of this execution
environment has no knowledge of the type of each UPF. This
leaves the implementation and optimization of the user plane
to the developer of the UPFs and the execution environment.
It is especially good if the distinction is not hard coded in
a standard. For example, an OpenFlow softswitch combined
with VMs could implement a wide variety of packet processing
applications, but it creates an artificial distinctions between
things that can be done in the OF switch and things that
must be done in a separate VM. Even if a particular function
is small enough to fit into the programming environment of
the softswitch, it must be placed in a VM, if the OpenFlow
standard does not include it.
And finally, we note that the presented user plane model,
that is, individual UPFs chained together inside and between
machines is fairly future proof. Controllers have an abstract
view of the user plane, can mix and match standard or de-
facto components. The ability to add components regardless
of their size enables continuous feature innovation, while the
abstraction itself enables continuous performance innovation
(hidden from controllers). Due to the range of composition
models employed high performance can be provided even for
small functions without the controller to understand the details
of the user plane implementation. This level of abstraction
(composable primitives) can fully utilize hardware innovation
without requiring the controller to understand the intricacies
of the underlying user plane node.
REFERENCES
[1] ETSI NFV Industry Specification Group. Network Functions Virtual-
isation – White Paper. Visited Apr 2015. [Online]. Available: https:
//portal.etsi.org/Portals/0/TBpages/NFV/Docs/NFV White Paper3.pdf
[2] “Evolved Universal Terrestrial Radio Access Network (E-UTRAN); X2
general aspects and principles,” (3GPP TS 36.420).
[3] D. Roeland and Z. Fu, “Mobile Service Chaining 5G Core Network
Architecture,” Submitted to IEEE Commun. Mag., 2015.
[4] “Local IP Access and Selected IP Traffic Offload (LIPA-SIPTO),” (3GPP
TR 23.829).
[5] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, “The
Click Modular Router,” ACM Trans. Comput. Syst., vol. 18, no. 3, pp.
263–297, Aug. 2000.
[6] J. Blendin, J. Ruckert, N. Leymann, G. Schyguda, and D. Hausheer,
“Position Paper: Software-Defined Network Service Chaining,” in 2014
Third European Workshop on Software Defined Networks (EWSDN),
Sept 2014, pp. 109–114.
[7] P. Emmerich, D. Raumer, F. Wohlfart, and G. Carle, “Performance
characteristics of virtual switching,” in 2014 IEEE 3rd International
Conference on Cloud Networking (CloudNet), Oct 2014, pp. 120–125.
[8] R. Morabito, J. Kj¨allman, and M. Komu, “Hypervisors vs. Lightweight
Virtualization: A Performance Comparison,” in 2015 IEEE International
Conference on Cloud Engineering (IC2E), March 2015, pp. 386–393.
[9] I. Cerrato, G. Marchetto, F. Risso, R. Sisto, and M. Virgilio, “An efficient
data exchange algorithm for chained network functions,” in 2014 IEEE
15th International Conference on High Performance Switching and
Routing (HPSR), July 2014, pp. 98–105.
[10] I. Cerrato, M. Annarumma, and F. Risso, “Supporting Fine-Grained
Network Functions through Intel DPDK,” in 2014 Third European
Workshop on Software Defined Networks (EWSDN), Sept 2014, pp. 1–6.
[11] A. Mian, A. Mamoon, R. Khan, and A. Anjum, “Effects of Virtualization
on Network and Processor Performance Using Open vSwitch and Xen
Server,” in 2014 IEEE/ACM 7th International Conference on Utility and
Cloud Computing (UCC), Dec 2014, pp. 762–767.
[12] G. Pongr´acz, L. Moln´ar, and Z. Kis, “Removing Roadblocks from
SDN: OpenFlow Software Switch Performance on Intel DPDK,” in 2013
Second European Workshop on Software Defined Networks (EWSDN),
Oct 2013, pp. 62–67.
[13] P. Emmerich, D. Raumer, F. Wohlfart, and G. Carle, “Assessing Soft-
and Hardware Bottlenecks in PC-based Packet Forwarding Systems,” in
ICN 2015: The Fourteenth International Conference on Networks, Apr
2015, pp. 78–83.
[14] L. Rizzo, M. Carbone, and G. Catalli, “Transparent acceleration of soft-
ware packet forwarding using netmap,” in INFOCOM, 2012 Proceedings
IEEE, Mar. 2012, pp. 2471–2479.
[15] M. Dobrescu, K. Argyraki, and S. Ratnasamy, “Toward Predictable
Performance in Software Packet-Processing Platforms,” in Presented as
part of the 9th USENIX Symposium on Networked Systems Design and
Implementation (NSDI 12). San Jose, CA: USENIX, 2012, pp. 141–
154.
[16] J. Martins, M. Ahmed, C. Raiciu, V. Olteanu, M. Honda, R. Bifulco,
and F. Huici, “ClickOS and the Art of Network Function Virtualization,”
in 11th USENIX Symposium on Networked Systems Design and Imple-
mentation (NSDI 14). Seattle, WA: USENIX Association, Apr. 2014,
pp. 459–473.
[17] W. Li and A. Kanso, “Comparing Containers versus Virtual Machines
for Achieving High Availability,” in 2015 IEEE International Conference
on Cloud Engineering (IC2E), March 2015, pp. 353–358.
[18] W. Li, A. Kanso, and A. Gherbi, “Leveraging Linux Containers to
Achieve High Availability for Cloud Services,” in 2015 IEEE Inter-
national Conference on Cloud Engineering (IC2E), March 2015, pp.
76–83.
[19] O. N. Foundation, “OpenFlow Switch Specification
Version 1.5.0,” Dec 2014. [Online]. Available: https:
//www.opennetworking.org/images/stories/downloads/sdn-resources/
onf-specifications/openflow/openflow-switch-v1.5.0.noipr.pdf
[20] L. Rizzo, “netmap: a novel framework for fast packet I/O,” in Proceed-
ings of the 2012 USENIX Annual Technical Conference (USENIX ATC
12). Bellevue, WA: USENIX Association, Aug. 2012, pp. 101–112.
[21] L. Rizzo and G. Lettieri, “VALE, a Switched Ethernet for Virtual
Machines,” in Proceedings of the 8th International Conference on
Emerging Networking Experiments and Technologies, ser. CoNEXT ’12.
New York, NY, USA: ACM, 2012, pp. 61–72.
[22] Click modular router source code repository. Visited Apr 2015.
[Online]. Available: https://github.com/kohler/click/
[23] Netmap source code repository. Visited Apr 2015. [Online]. Available:
https://code.google.com/p/netmap/
[24] M. Degermark, B. Nordgren, and S. Pink. IP Header Compression.
Visited Apr 2015. [Online]. Available: https://tools.ietf.org/html/rfc2507
[25] “Universal Node Initial Benchmarking Documentation,” D5.4 deliver-
able of EU FP7 project UNIFY (will be publicly available at https:
//www.fp7-unify.eu/index.php/results.html#Deliverables).
[26] P. Quinn and T. Nadeau. Problem Statement for Service Function
Chaining. Visited Apr 2015. [Online]. Available: https://tools.ietf.org/
html/rfc7498

More Related Content

PDF
Conference Paper: Universal Node: Towards a high-performance NFV environment
PDF
V25112115
PDF
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
PDF
Hardback solution to accelerate multimedia computation through mgp in cmp
PDF
shashank_hpca1995_00386533
PDF
Address Interleaving in NoCs
PDF
Towards an Open Data Center with an Interoperable Network (ODIN) : Volume 2: ...
PDF
Conference Paper: Cross-platform estimation of Network Function Performance
Conference Paper: Universal Node: Towards a high-performance NFV environment
V25112115
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
Hardback solution to accelerate multimedia computation through mgp in cmp
shashank_hpca1995_00386533
Address Interleaving in NoCs
Towards an Open Data Center with an Interoperable Network (ODIN) : Volume 2: ...
Conference Paper: Cross-platform estimation of Network Function Performance

What's hot (20)

PDF
Performance evaluation of ecc in single and multi( eliptic curve)
PDF
Lecture 3 parallel programming platforms
PDF
2004 qof is_mpls_ospf
PDF
PDF
Performance Evaluation of ad-hoc Network Routing Protocols using ns2 Simulation
PDF
PPT
1 introduction-to-computer-networking
PPT
1 introduction-to-computer-networking
DOCX
Data communication q and a
PDF
An octa core processor with shared memory and message-passing
PDF
Multistage interconnection networks a transition to optical
PPT
1 introduction-to-computer-networking
PDF
Computer Network Performance Evaluation Based on Different Data Packet Size U...
DOCX
Computer network solution
PDF
Lecture 7 cuda execution model
PDF
Ns2 x graphs
DOCX
TransparentInterconnectionsofLotofLinks
PPT
system interconnect architectures in ACA
PDF
PDF
MuMHR: Multi-path, Multi-hop Hierarchical Routing
 
Performance evaluation of ecc in single and multi( eliptic curve)
Lecture 3 parallel programming platforms
2004 qof is_mpls_ospf
Performance Evaluation of ad-hoc Network Routing Protocols using ns2 Simulation
1 introduction-to-computer-networking
1 introduction-to-computer-networking
Data communication q and a
An octa core processor with shared memory and message-passing
Multistage interconnection networks a transition to optical
1 introduction-to-computer-networking
Computer Network Performance Evaluation Based on Different Data Packet Size U...
Computer network solution
Lecture 7 cuda execution model
Ns2 x graphs
TransparentInterconnectionsofLotofLinks
system interconnect architectures in ACA
MuMHR: Multi-path, Multi-hop Hierarchical Routing
 
Ad

Similar to Conference Paper: Towards High Performance Packet Processing for 5G (20)

PDF
Towards achieving-high-performance-in-5g-mobile-packet-cores-user-plane-function
PDF
FLEXIBLE VIRTUAL ROUTING FUNCTION DEPLOYMENT IN NFV-BASED NETWORK WITH MINIMU...
PDF
ENERGY CONSUMPTION REDUCTION IN WIRELESS SENSOR NETWORK BASED ON CLUSTERING
PDF
Active Network Service Composition
PPT
Netkit
PDF
SDN: A New Approach to Networking Technology
PDF
Scheduling wireless virtual networks functions
PDF
Sdn 02
PDF
ONP 2.1 platforms maximize VNF interoperability
PDF
Network Function Virtualisation
PDF
Conference Paper: Network Function Chaining in DCs: the unified recurring con...
PDF
AN ADAPTIVE DIFFSERV APPROACH TO SUPPORT QOS IN NETWORK MOBILITY NEMO ENVIRON...
PDF
5 g architecture
PDF
SD_WAN_NFV_White_Paper
PDF
CONTAINERIZED SERVICES ORCHESTRATION FOR EDGE COMPUTING IN SOFTWARE-DEFINED W...
PDF
IMPROVEMENTS FOR DMM IN SDN AND VIRTUALIZATION-BASED MOBILE NETWORK ARCHITECTURE
PDF
IMPROVEMENTS FOR DMM IN SDN AND VIRTUALIZATION-BASED MOBILE NETWORK ARCHITECTURE
PDF
Improvements for DMM in SDN and Virtualization-Based Mobile Network Architecture
PDF
Enabling Active Flow Manipulation in Silicon-based Network Forwarding Engine
Towards achieving-high-performance-in-5g-mobile-packet-cores-user-plane-function
FLEXIBLE VIRTUAL ROUTING FUNCTION DEPLOYMENT IN NFV-BASED NETWORK WITH MINIMU...
ENERGY CONSUMPTION REDUCTION IN WIRELESS SENSOR NETWORK BASED ON CLUSTERING
Active Network Service Composition
Netkit
SDN: A New Approach to Networking Technology
Scheduling wireless virtual networks functions
Sdn 02
ONP 2.1 platforms maximize VNF interoperability
Network Function Virtualisation
Conference Paper: Network Function Chaining in DCs: the unified recurring con...
AN ADAPTIVE DIFFSERV APPROACH TO SUPPORT QOS IN NETWORK MOBILITY NEMO ENVIRON...
5 g architecture
SD_WAN_NFV_White_Paper
CONTAINERIZED SERVICES ORCHESTRATION FOR EDGE COMPUTING IN SOFTWARE-DEFINED W...
IMPROVEMENTS FOR DMM IN SDN AND VIRTUALIZATION-BASED MOBILE NETWORK ARCHITECTURE
IMPROVEMENTS FOR DMM IN SDN AND VIRTUALIZATION-BASED MOBILE NETWORK ARCHITECTURE
Improvements for DMM in SDN and Virtualization-Based Mobile Network Architecture
Enabling Active Flow Manipulation in Silicon-based Network Forwarding Engine
Ad

More from Ericsson (20)

PDF
Ericsson Technology Review: Versatile Video Coding explained – the future of ...
PDF
Ericsson Technology Review: issue 2, 2020
PDF
Ericsson Technology Review: Integrated access and backhaul – a new type of wi...
PDF
Ericsson Technology Review: Critical IoT connectivity: Ideal for time-critica...
PDF
Ericsson Technology Review: 5G evolution: 3GPP releases 16 & 17 overview (upd...
PDF
Ericsson Technology Review: The future of cloud computing: Highly distributed...
PDF
Ericsson Technology Review: Optimizing UICC modules for IoT applications
PDF
Ericsson Technology Review: issue 1, 2020
PDF
Ericsson Technology Review: 5G BSS: Evolving BSS to fit the 5G economy
PDF
Ericsson Technology Review: 5G migration strategy from EPS to 5G system
PDF
Ericsson Technology Review: Creating the next-generation edge-cloud ecosystem
PDF
Ericsson Technology Review: Issue 2/2019
PDF
Ericsson Technology Review: Spotlight on the Internet of Things
PDF
Ericsson Technology Review - Technology Trends 2019
PDF
Ericsson Technology Review: Driving transformation in the automotive and road...
PDF
SD-WAN Orchestration
PDF
Ericsson Technology Review: 5G-TSN integration meets networking requirements ...
PDF
Ericsson Technology Review: Meeting 5G latency requirements with inactive state
PDF
Ericsson Technology Review: Cloud-native application design in the telecom do...
PDF
Ericsson Technology Review: Service exposure: a critical capability in a 5G w...
Ericsson Technology Review: Versatile Video Coding explained – the future of ...
Ericsson Technology Review: issue 2, 2020
Ericsson Technology Review: Integrated access and backhaul – a new type of wi...
Ericsson Technology Review: Critical IoT connectivity: Ideal for time-critica...
Ericsson Technology Review: 5G evolution: 3GPP releases 16 & 17 overview (upd...
Ericsson Technology Review: The future of cloud computing: Highly distributed...
Ericsson Technology Review: Optimizing UICC modules for IoT applications
Ericsson Technology Review: issue 1, 2020
Ericsson Technology Review: 5G BSS: Evolving BSS to fit the 5G economy
Ericsson Technology Review: 5G migration strategy from EPS to 5G system
Ericsson Technology Review: Creating the next-generation edge-cloud ecosystem
Ericsson Technology Review: Issue 2/2019
Ericsson Technology Review: Spotlight on the Internet of Things
Ericsson Technology Review - Technology Trends 2019
Ericsson Technology Review: Driving transformation in the automotive and road...
SD-WAN Orchestration
Ericsson Technology Review: 5G-TSN integration meets networking requirements ...
Ericsson Technology Review: Meeting 5G latency requirements with inactive state
Ericsson Technology Review: Cloud-native application design in the telecom do...
Ericsson Technology Review: Service exposure: a critical capability in a 5G w...

Recently uploaded (20)

PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
MYSQL Presentation for SQL database connectivity
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
A Presentation on Artificial Intelligence
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Encapsulation theory and applications.pdf
PDF
cuic standard and advanced reporting.pdf
PDF
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Approach and Philosophy of On baking technology
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
Mobile App Security Testing_ A Comprehensive Guide.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Empathic Computing: Creating Shared Understanding
Diabetes mellitus diagnosis method based random forest with bat algorithm
MYSQL Presentation for SQL database connectivity
NewMind AI Weekly Chronicles - August'25 Week I
CIFDAQ's Market Insight: SEC Turns Pro Crypto
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
A Presentation on Artificial Intelligence
Advanced methodologies resolving dimensionality complications for autism neur...
Unlocking AI with Model Context Protocol (MCP)
Encapsulation theory and applications.pdf
cuic standard and advanced reporting.pdf
Bridging biosciences and deep learning for revolutionary discoveries: a compr...
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Electronic commerce courselecture one. Pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Approach and Philosophy of On baking technology
20250228 LYD VKU AI Blended-Learning.pptx

Conference Paper: Towards High Performance Packet Processing for 5G

  • 1. Towards High Performance Packet Processing for 5G Bal´azs Pinczel, D´aniel G´ehberger, Zolt´an Tur´anyi and Bence Formanek Ericsson Research, Hungary Abstract—Network Functions Virtualization aims to implement networking functions in the cloud – even the ones working on the user plane. This is envisioned by decomposing today’s monolithic devices into smaller functions and re-composing them real-time. This enables flexibility and independent innovation, but may come at a performance price. In this paper we investigate the performance aspects of composing user plane functions. We have developed a prototype for mobile service chaining to experiment with the function call based composition method provided by the Click modular router. The prototype allows flexible management of feature sets, including the support for relocating user chains using context transfer. We provide a thorough performance evaluation, draw conclusions and propose a versatile execution environment, which combines various composition and processing methods in order to achieve high performance packet processing for 5G networks. I. INTRODUCTION Mobile core networks provide important contribution to the mobile communication service. They are responsible for man- aging connectivity sessions, handovers, idle mode and paging, security, policy and charging. These functions have been traditionally standardized to ensure interoperability between the network and the terminals, but also among networking devices. The standards govern not only functions, protocols and procedures, but also a grouping of the functions into nodes connected by standardized interfaces. The traditional approach to core networks is to implement these nodes in separate devices. This in turn, resulted in low feature velocity, since any new feature that impacts more than one node, has to be first standardized, then implemented by vendors (often as a new release of a monolithic device), and finally carefully rolled out – a process that can easily take years. This is very far from the DevOps model and the methods of innovation established in other technology domains. Any evolution of mobile core networks must significantly improve this prob- lem. Modern IT software technologies and the cloud enable new ways to implement telecommunications equipment and represent good sources of inspiration. We adopt the argument for decomposing node functionality into smaller units. This argument is fairly common and forms one of the principles of the Network Function Virtualization (NFV) initiative [1]. The subject of this paper is real-time re-composition of network functions, focusing on the user plane. Even though some network functions are possible to implement in purpose-built silicon, in our prototype we use general-purpose CPUs, since we believe this is where rapid innovation can happen. Some of our results are also applicable to programmable NPUs. Our concept consists of implementing core network user plane functionality in small User Plane Functions (UPFs) that can then be chained by a platform both within a physical server and between servers. The control plane of the platform is free to mix and match any set of UPF to a packet flow in any order running at any location. The platform must take mobility into account, which may result in the relocation of certain functions – a generic context transfer facility similar to the context transfer during X2 LTE handovers [2]. This concept, called Mobile Service Chaining [3] has the following abilities. • Customization: specify which flow gets what processing • Simplification: omit network functions not needed • Deployment flexibility: control where functions execute • Extensibility: easy to deploy and apply new functions Deployment flexibility not only aids in efficient resource management, but also allows use cases, such as local breakout, which required extensive standardization before [4]. We also note that due to the extensibility of the concept radio and service functions can also be included and managed under a single platform. That is, some of the virtualized radio functions, like PDCP or transport encryption employed in LTE can be handled just as any other core network function. Also, functions that today are deployed above the (S)Gi interface can also be made part of a single (mobile) service chain tying these service functions to the same control structure making all core network information (such as user identity, policies, charging, location, etc.) available to them in a uniform manner. User Plane Functions may come in various complexity, state, resource and security requirements. Composition must be performance efficient, so that the flexibility enabled by the system does not come at a prohibitive cost. Small, simple UPFs, such as token bucket policing, packet marking or encapsulation may be invoked all in the same thread, while more complex UPFs require more isolation or the use of a different operating system. Our ultimate goal is to address this diversity in a mobile network context. We have developed a limited prototype of mobile service chaining to experiment with a certain kind of composition method. In particular, we have implemented a chain compos- itor composing UPFs inside a Click modular router [5]. We have also implemented forwarding between nodes (carrying metadata between sites) and support (lossless and reordering- free) handovers via context transfer. In Section III we describe the details of our prototyping environment establishing a baseline for performance. Then, in Section IV we describe
  • 2. the prototype and report our findings about the kind of perfor- mance price one pays for flexible composability. In Section V we demonstrate handover performance, which is an important aspect in a mobile environment often overlooked in cloud- related discussions. Finally, in Section VI we conclude the paper by discussing how the composition method evaluated by our prototype fits among a wider set of composition methods. In addition, we outline our vision for a distributed packet processing execution environment for 5G. II. RELATED WORK A. Mobile Service Chaining preliminaries Since our prototype follows the considerations of the Mobile Service Chaining 5G Core Network Architecture [3], we provide a brief overview here. The architecture is essentially an evolution of service chaining, with added support for mobility and a formalized set of metadata. It caters for the specifics of the mobile core environment, such as bearers, requirements on grouped flow processing (e.g., to provide aggregated maximum bit-rate policing) and the associated scalability requirements. The Control Plane (CP) of this architecture includes the functions of today’s Mobility Management Entity, Policy and Charging Rules Function and a chain controller. We focus on the realization of the user plane, however, as we also evaluate lossless handovers and chain forwarding, a simple Controller is necessary for the coordination. The architecture defines two types of functional elements for the actual packet processing in the User Plane (UP): Forwarding Elements (FE) and User Plane Functions (UPF). A FE forwards packets between its ports based on the rules, which are configured by the CP. A port is either connected to a UPF or to a network interface. The actual packet processing is done in the UPFs based on the locally stored per-user contexts. It is not expected that UPFs know any detail about the higher level topology. Packets are sent through the system according to which service chain they need to traverse. Such information is added to the packets as tags by classifiers, which are also UPFs managed by the CP. Simple classification can be done based on header fields, packet direction (uplink or downlink) but it is also possible to re-use tags assigned by prior UPFs (like Deep Packet Inspection). The rules in the FEs may apply on the assigned tags or directly on the header fields of the packet. It is important that the assigned tags must be carried with the packets through the network. For handover support UP nodes have to be able to export and import the rules and UPF contexts related to the user handing over. B. Other related work Service chaining is usually referred to as a combination of OpenFlow and Virtualized Network Functions (VNF) re- alized in virtual machines (VMs) or Linux containers [6]. However, it is concluded in studies that the packet transfer from the OpenFlow switch to the VMs represents a significant performance overhead [7]. This is especially troublesome with simple VNFs where the actual processing time can be comparable with this overhead. While the overhead is very small for Linux containers [8], for a granular service chaining architecture – with a lot of small VNFs – the network stack of the Linux kernel itself can cause bottlenecks. The overhead of virtualization has been also addressed with configurations where the VNFs are simple processes on the host computer [9], [10]. Nevertheless, the inter-VNF communication is still not lightweight enough, since it is done via Ethernet softswitches, and requires packet copies along the way. There are relatively few performance evaluations for service chaining. Cerrato et al. [9], [10] present results with identical network functions, which calculate a hash on the first bytes of the packets. This simplistic approach is also followed by papers addressing the performance capabilities of OpenFlow switches. Being the most popular, the literature mostly focuses on the Open vSwitch. For the measurements, the switch is usually operated with simple forwarding rules where higher performance is easier to achieve [7], [11]. After the basic one-rule cases, some papers also show that the number of rules affects the performance significantly. There has been impressive improvements during the last years [12], [13]. The Click modular router [5] provides a flexible packet processing environment at the cost of some performance overhead. The overhead is also measurable with high I/O rate extensions [14]. Click can be used to construct middleboxes and – besides general feasibility – Dobrescu et al. [15] show that it is possible to predict the performance with the CPU cache miss rates. Martins et al. [16] also present results for several middleboxes, where the complexity varies from simple forwarding to an intrusion detection system. Regarding cloud based mobility, it is possible to perform live migration of virtual machines and Linux containers [17]. However, this is not a solution for handovers in 5G, because only the context of a single user needs to be migrated, not the whole user plane node which possibly handles thousands of users. Wubin et al. [18] describe a solution which provides high availability for containers by exporting the internal states, which is a significant step towards our needs, but it is still missing parts to support the handover of a single user. We can conclude that in most of the cases packet processing and service chaining is evaluated involving simple rules or functions. The papers that evaluate more complex middleboxes do not consider the composition of these neither chained with an OpenFlow switch nor inside the execution environment itself. Furthermore, to the best of our knowledge there are no public results available about VNFs, which support per-user context migration. III. PROTOTYPING ENVIRONMENT We have selected the Click modular router environment to investigate the performance of packet processing composition. In this section we briefly overview Click and select one of its environments to build our prototype on. We seek to provide the prototype directly as a virtual machine or a Linux container to ease the cloud based deployment since this is is expected for 5G systems.
  • 3. NIC Host1 Traffic generator Click router Traffic collector NIC Host2 Fig. 1. Measurement setup. The Click modular router enables dynamic composition of software routers [5]. A router is assembled in run-time – based on a configuration file – from simple packet processing elements. Elements can have multiple input and output ports, and as the internal logic can be written in pure C++, the packet processing is not limited to the restricted set of actions as in the case of OpenFlow switches [19]. Packets traverse the chain of elements as sequences of function calls initiated by the active elements of the actual configuration (ingress interfaces or elements with buffered packets, ready to send). Click has multithread support, which enables the scheduling of active elements to different cores in parallel – in the prototype, though, we focus on the performance of one CPU core, using a single Click thread. The Click framework supports three different environments, namely, it can run in Linux user space, kernel space and over the Xen hypervisor as a library operating system called ClickOS [16]. More recent versions of Click implement sup- port for netmap – a fast packet I/O solution available as a kernel module [20] – in user space and for ClickOS. As a result, a user space deployment can outperform the kernel based counterparts [14]. It is also possible to have netmap access in Docker containers with mounting the related device file and granting privileged access for the container. The netmap framework also contains a software switch called VALE, which provides high packet rates between netmap accelerated interfaces [21]. Before discussing the performance of the software envi- ronments, we provide a brief overview about the hardware components used. For all of our measurements we used Linux (Ubuntu 14.04 LTS) servers, each containing 3.5 GHz 6-core Intel Xeon E5-1650 CPUs and having 32 GiB RAM. For user plane traffic we used dedicated 10-Gigabit point-to-point links between the machines with Intel X540 network interface cards (NICs). The traffic on these links was directly handled by either a Click instance or a traffic generator/collector as it can be seen in Figure 1, without being processed by the OS network stack. An exception was the ClickOS where a VALE switch also had to be used. Control plane traffic used a separate 1-Gigabit network. The Click version used for the measurements is a recent (April, 2015) commit from the repository [22], as is netmap [23]. Our ClickOS-based evaluation was done on Xen 4.4.0. For traffic generation we use the Pkt-gen tool from the netmap framework to achieve high packet per second rates. We tested the multiple different software environments of Click to select the best for our prototype. In these cases the Click router was used with a simple configuration, which only rewrote the Ethernet addresses, thus realizing a simple low 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 throughput(Kpps) kernel forwarder user space Click kernel space Click ClickOS with netmap user space Click with netmap user space Click in Docker with netmap 6% 10% 18% 33% 8% 13% 22% 39% 17% 29% 53% 62% 46% 58% 92% 99% 67% 100% 100% 100% 64% 100% 100% 100% Fig. 2. Throughput and link utilization in different environments. 0.00 0.05 0.10 0.15 0.20 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 64 128 256 512 delay(ms) kernel forwarder user space Click kernel space Click ClickOS with netmap user space Click with netmap user space Click in Docker with netmap Fig. 3. Delay of different environments. level forwarder. Figure 2 shows the averaged values of the throughput results extended with the standard deviation and the link utilization percentage on the top of the bars. Figure 3 shows the delay for the different environments as boxplots where the top and bottom 5 percentiles are defined as outliers. We used various packet sizes from 64 bytes up to 512 bytes in all the cases. In order to provide an easily reproducible reference which can be used to estimate the peak performance of other ma- chines, we measured packet forwarding in the Linux kernel as well. This method provides even lower throughput than the simple user space Click configuration. This is because in Click we only rewrite the Ethernet addresses without going through the Linux protocol stack. As the figure shows, the kernel space based Click router improves the throughput but adds variability. The last three cases are accelerated with the netmap frame- work, which – as expected – improves the performance significantly. While ClickOS shows a decent improvement, we reached the best results with the user space Click router accelerated with the netmap framework. Here the processing rate goes up until 10 Mpps for the smallest packets, and above 128 bytes the 10 Gigabit link gets saturated. As we noted before, it is possible to place this software combination into a Docker container, which is shown as the last environment on
  • 4. UPF 1 UPF 2 UPF n Tunnel encapsulator Tunnel decapsulator Handover buffer Interface 1 Interface 2 FE … Fig. 4. Mobile Service Chaining 5G Core Network Architecture. the figure. The container slightly limits the peak performance for small packets and adds some variation, but the link still gets saturated quickly as the packet size increases. Based on the above results, we selected the Click router running in a Docker container because despite the slightly lower performance, this setup seamlessly fits into the envisioned cloud based deployment of 5G systems. IV. THE 5G CORE NETWORK PROTOTYPE In our prototype we mapped the proposed logical 5G Core Network Architecture to nodes on physical computers each running a Click router instance. Besides a more detailed de- scription, we provide a performance evaluation in this section. A. Implementation details The FEs and the UPFs are realized as Click elements, and the UPFs are connected to the ports of the single FE as shown in Figure 4. To carry the packet tags between Click instances, we implemented a simple tunneling protocol, which uses UDP encapsulation and carries the tag/value pairs besides payload. When an encapsulated packet arrives to a Click node, the extra headers are removed and the tags are placed to the annotation area defined by the Click for each packet, which provides efficient tag handling. The implementation is capable of per- user handovers, utilizing a special buffering element in each node. This element does nothing for users not in a handover and will be further described in Section V. Both the input and the output ports of the UPFs are connected to the FE, which applies its rules for all the incoming packets. The actual rules can be defined with a rule language from the Control Plane where currently it is possible to match against UPF names and packet tags. Obviously, the performance of the rule matching determines the performance of the system, especially for a large rule set. In order to address this, we implemented a hash table based caching component to speed up the lookup. For the evaluation, we implemented the following fully functional, per-user handover capable UPFs. a) Counter: This UPF maintains a separate packet counter for each user, and writes the current value into the payload at an offset, which is chosen so as to avoid overwriting any previously placed counter values. While this is a relatively simple function, it still requires payload parsing, modification and the recalculation of the UDP checksum. b) Marker: The marker element uses token buckets to check whether the traffic conforms to a per-user defined bandwidth limit. Non-conformant packets are marked with a tag to be dropped later. c) Header compressor (IPHC): This element imple- ments IP/UDP compression and decompression as specified in RFC2507 [24]. It is able to handle multiple users with multiple flows per user, assigning context identifiers automatically. B. Evaluation of the prototype We evaluate the performance of the prototype running on a single node, assuming that it is an intermediate element of a longer chain, and that the incoming packets are already classified and tagged. To enable this, we modified the Pkt- gen tool to be able to generate tunneled traffic, tagged with, e.g., user-ID tag. As these packets are larger than 64 bytes, we used 128 byte packet size for the rest of the paper. 1) UPF Overhead: In each step of the following exper- iment, the configuration is extended with additional com- ponents. Figure 5a shows the average packet rate with the standard deviation. We also displayed the nominal time re- quired to process each packet (simply taking the inverse of the average packet rate). The first setup uses no encapsulation, just one FE with one rule to forward. Compared to the baseline measurements of the previous section, this configuration also removes the Ethernet header and validates the IP header before entering the FE. After the rule matching, the IP addresses are rewritten and a new Ethernet header is placed to the packet. The second setup includes tunneling and handover elements as shown in Figure 4. The encapsulation step also involves a configurable next hop selection based on tags. In the rest of the cases the actual UPFs are added to the chain one by one, creating more and more complex configurations. As these functions handle contexts, parse, validate and copy fields for each incoming packet, it is expected that these CPU and memory intensive tasks affect the performance. Still, as it can be seen on the figure, the throughput is stable and remains in the millions of packets per second range. The per-packet processing time shows that each component adds around 70– 110 ns overhead, but due to the inverse proportionality, this effect appears to influence higher packet rates more. 2) Impact of additional users: Current custom-made core network nodes handle 10–100 k users. Per component load is expected to be lower in cloud-based deployments of future 5G systems, but still in the thousands. The following measurement addresses this by taking the most advanced configuration with 3 UPFs, and increasing the number of users from 1 to 10 k. The average packet rates with the standard deviation and the recorded L3 cache load misses are shown in Figure 5b. Even though the hash table based lookups we used are near constant time, the increase in the number of users causes some performance degradation partly due to the L3 cache misses. 3) Handling multiple chains: Flexibility is one of the strengths of the proposed 5G architecture. The last throughput related evaluation addresses this, by taking the previous setup one more step forward. For this measurement we used 1000
  • 5. 0 1000 2000 3000 4000 5000 6000 only FE full node, no active UPFs counter counter, marker counter, marker, IPHC throughput(Kpps) 186 ns 281 ns 349 ns 447 ns 559 ns (a) Throughput of different configurations 0 500 1000 1500 2000 1 10 100 1000 10000 throughput(Kpps) number of users 114 192 424 821 911554 (b) Throughput with multiple users 0 1000 2000 3000 4000 0 20 40 60 80 100 throughput(Kpps) % of users with the empty chain (c) Throughput with a mix of chains Fig. 5. Prototype performance with various workloads. UE BS2 GW BS1 (a) Packet paths UE BS2 GW BS1 (b) Intermediate step Fig. 6. Handover from BS1 to BS2. users and defined 2 chains, having one with three UPFs and one without any. In terms of FE rules, this requires only one extra rule, which defines the shortcut for packets arriving on the less complex chain. We generated the traffic with a set ratio of users taking the more and the less complex chains. The leftmost point of Figure 5c is the case when all the traffic goes through the more complex chain, and with each further point 5% of the users are moved to the empty one, which scales up the performance. This demonstrates that while the prototype – as a general- purpose packet processing environment – supports more com- plex feature sets as well, the aggregated performance of serving multiple chains of various complexity remains pre- dictable. The results also show that this kind of flexibility does not sacrifice the scalability: the same compute resources can be saturated either with more complex services at lower throughput, or with less complex ones at higher throughput. V. HANDOVER As a core requirement for mobile service chaining solutions, the proposed architecture supports transfer of a user’s UPF contexts from a source node to a target node. To avoid packet reordering, the prototype uses buffering in nodes where traffic arrives on the new (post-handover) path until all the packets arriving on the old path are processed. To describe the process of handing over, we consider an example scenario depicted in Figure 6a, where a gateway node (GW) has two-way communication with an emulated UE through either one of the two base stations (BS1,2), with the possibility of switching between the paths when instructed to do so by the Control Plane. Figure 7 shows the procedure of a Control Plane GW BS1 BS2 2: HANDOVER 3: Start buffering uplink packets from target BS 4: HANDOVER ACK 1: Set up uplink reordering 6: HANDOVER 7: Start buffering 9: CONTEXT_REQ 10: Start forwarding 11: CONTEXT_RESP (UPF contexts) 12: Delete context 13: Install context 8: Context transfer 14: HANDOVER ACK 5: Handover preparation 16: DL PATH SWITCH 15: Handover execution Fig. 7. Interaction of core network entities during a handover. handover from BS1 to BS2. There is a buffer in the GW for the uplink packets arriving from BS2, as well as one in BS2 for packets already arriving on the new path. Also in BS2, there is a separate buffer for the packets arriving on the temporary path shown in Figure 6b. This buffer stores incoming packets until the migration of contexts completes. When all the preparatory steps are completed, the path of downlink packets is switched in the GW, and after some time – since no new packets take the old path – the buffered packets can be consumed, until eventually the buffers get empty, at which point the node switches to processing incoming packets directly. In the prototype the bulk of the above functionality is real- ized in a special element chained before the FE, which man- ages the lifecycle of per-user handover buffers, and handles packet forwarding (see Figure 4). The rest of the functionality (e.g., control-related communication or context transfer) is spread among the FE, the UPFs, the daemon of the user plane nodes and the Control Plane. To test handovers the UE emulated the behavior of a radio device by replying to the same base station it received the packet from – the reply being the same 128 byte packet with only one modified tag. Traffic was generated in the GW at the rate of 10 kpps, and was captured both in the UE, and upon
  • 6. -2 0 2 4 6 8 9980 10000 10020 10040 10060 timerelativetohandover(ms) packet sequence number DL via BS1 DL via BS2 (a) Captured at the GW -2 0 2 4 6 8 9980 10000 10020 10040 10060 timerelativetohandover(ms) packet sequence number DL via BS1 DL via BS2 (b) Captured at the UE Fig. 8. Packet arrival times during a handover. returning to the GW. To examine the effect of a handover procedure, we look at the disturbance in packet arrival times. Figure 8 shows these for a single handover selected from our measurements (all other measurements show highly similar characteristics). In normal conditions, the packet arrival times form a di- agonal line, the slope of which is determined by the rate of the packets. When a handover is in progress, packets cease to arrive for a short period (approximately 2 ms), caused by the buffering during, which the context is migrated. When packet processing is resumed, the nodes receive larger batches of packets at a time, because the buffers can be processed faster than the packet arrival rate. This eventually leads to the buffers being empty, from which point the arrival times follow their usual linear pattern. The pattern is already very close to linear 3.5 ms after the beginning of the handover, and is fully restored after 6 ms. During this period the curve approaches its normal slope in gradually shortening steps, which is likely a scheduling artifact. The figures also show a single packet that arrives to the UE through BS1, during the exact time of the context transfer. The UE sends the reply still to BS1, where the forwarding causes it to be sent through BS2, before arriving to the GW suffering the delay of the context transfer. This is normal – even if rare – behavior, and the continuity in packet sequence numbers shows that even in this case the handover is lossless and reordering-free. VI. DISCUSSION In the previous sections we have investigated the perfor- mance aspects of a certain composition model. This model is characterized by the following attributes • individual processing functions (UPFs) are invoked through function calls; • a (cached) rule database is consulted before selecting the next UPF to execute; • UPFs are free to process, duplicate, buffer, drop packets or even generate new packets; and • packets can carry metadata between the UPFs. The main advantage of this composition model is its flexi- bility. It allows single- or multithreaded operations and has its internal scheduler to initiate elements that have work to do. It has some limitations, however. Since all Click elements run in the same process, very little can be done to realize isolation – either for resource control or security. Another issue is that in order to manage relatively high performance and in order to run in a variety of situations, Click represents a confined programming environment. There are a lot of rules on how to handle packets, allocate memory or do I/O. As a result third party libraries in general, cannot be used, not even the STL. This makes the development of complex UPFs, such as a full DPI suite or transparent HTTP proxies quite cumbersome. These would best run in a separate container or VM, since we think similar limitations would appear in any system using function calls as composition method. At the other end of the scale, even the overhead represented by function calls and rule lookups may be significant for very small UPFs. For example, flow counters, bandwidth policers, simple encapsulation modules (and in general most actions of an OpenFlow switch) complete in very few cycles. For these an even more lightweight composition model is better. It is simply more efficient to copy their code one after another in as many combinations as needed [25]. Since a versatile programmable packet processing execution environment may contain a mix of small and large UPFs it is best, if several composition methods are available to the programmers. The figure below summarizes our proposal for composition methods needed. The just-in-time (JIT) linker method dynamically copies small UPFs one after another. This method has been shown to achieve the fastest software switch for simple actions [25]. The function call method has been the subject of our investigations in this paper and is suitable for moderately sized, trusted code. The separate container/VM model passes packets between different containers or VMs on the same server, and enables the use of any third party library or even OS (in the case of VMs) at the expense of passing the packets between contexts [7]. Finally, the last method, chain forwarding, enables the functions to reside on different hosts and uses encapsulation to carry chain ID and other metadata on the wire [26]. The first two methods may be used to support a run-to- completion mode of operation, whereas the second and third can split the processing of a single packet among multiple
  • 7. JIT linking Function call Container / VM Chain forwarding Composition method Separate threads or processes Run-to-completion Separate machines Processing method Click Fig. 9. Composition methods and the corresponding processing methods. threads or processes (latter of which can even reside in different containers/VMs). Presumably a UPF needs to be developed with its composition method in mind. E.g., for JIT linked components small fragments of code are required with linker metadata. Function call components can be provided as shared libraries, while the rest as apps or VM images. In general, it is best if the controller of this execution environment has no knowledge of the type of each UPF. This leaves the implementation and optimization of the user plane to the developer of the UPFs and the execution environment. It is especially good if the distinction is not hard coded in a standard. For example, an OpenFlow softswitch combined with VMs could implement a wide variety of packet processing applications, but it creates an artificial distinctions between things that can be done in the OF switch and things that must be done in a separate VM. Even if a particular function is small enough to fit into the programming environment of the softswitch, it must be placed in a VM, if the OpenFlow standard does not include it. And finally, we note that the presented user plane model, that is, individual UPFs chained together inside and between machines is fairly future proof. Controllers have an abstract view of the user plane, can mix and match standard or de- facto components. The ability to add components regardless of their size enables continuous feature innovation, while the abstraction itself enables continuous performance innovation (hidden from controllers). Due to the range of composition models employed high performance can be provided even for small functions without the controller to understand the details of the user plane implementation. This level of abstraction (composable primitives) can fully utilize hardware innovation without requiring the controller to understand the intricacies of the underlying user plane node. REFERENCES [1] ETSI NFV Industry Specification Group. Network Functions Virtual- isation – White Paper. Visited Apr 2015. [Online]. Available: https: //portal.etsi.org/Portals/0/TBpages/NFV/Docs/NFV White Paper3.pdf [2] “Evolved Universal Terrestrial Radio Access Network (E-UTRAN); X2 general aspects and principles,” (3GPP TS 36.420). [3] D. Roeland and Z. Fu, “Mobile Service Chaining 5G Core Network Architecture,” Submitted to IEEE Commun. Mag., 2015. [4] “Local IP Access and Selected IP Traffic Offload (LIPA-SIPTO),” (3GPP TR 23.829). [5] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek, “The Click Modular Router,” ACM Trans. Comput. Syst., vol. 18, no. 3, pp. 263–297, Aug. 2000. [6] J. Blendin, J. Ruckert, N. Leymann, G. Schyguda, and D. Hausheer, “Position Paper: Software-Defined Network Service Chaining,” in 2014 Third European Workshop on Software Defined Networks (EWSDN), Sept 2014, pp. 109–114. [7] P. Emmerich, D. Raumer, F. Wohlfart, and G. Carle, “Performance characteristics of virtual switching,” in 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), Oct 2014, pp. 120–125. [8] R. Morabito, J. Kj¨allman, and M. Komu, “Hypervisors vs. Lightweight Virtualization: A Performance Comparison,” in 2015 IEEE International Conference on Cloud Engineering (IC2E), March 2015, pp. 386–393. [9] I. Cerrato, G. Marchetto, F. Risso, R. Sisto, and M. Virgilio, “An efficient data exchange algorithm for chained network functions,” in 2014 IEEE 15th International Conference on High Performance Switching and Routing (HPSR), July 2014, pp. 98–105. [10] I. Cerrato, M. Annarumma, and F. Risso, “Supporting Fine-Grained Network Functions through Intel DPDK,” in 2014 Third European Workshop on Software Defined Networks (EWSDN), Sept 2014, pp. 1–6. [11] A. Mian, A. Mamoon, R. Khan, and A. Anjum, “Effects of Virtualization on Network and Processor Performance Using Open vSwitch and Xen Server,” in 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing (UCC), Dec 2014, pp. 762–767. [12] G. Pongr´acz, L. Moln´ar, and Z. Kis, “Removing Roadblocks from SDN: OpenFlow Software Switch Performance on Intel DPDK,” in 2013 Second European Workshop on Software Defined Networks (EWSDN), Oct 2013, pp. 62–67. [13] P. Emmerich, D. Raumer, F. Wohlfart, and G. Carle, “Assessing Soft- and Hardware Bottlenecks in PC-based Packet Forwarding Systems,” in ICN 2015: The Fourteenth International Conference on Networks, Apr 2015, pp. 78–83. [14] L. Rizzo, M. Carbone, and G. Catalli, “Transparent acceleration of soft- ware packet forwarding using netmap,” in INFOCOM, 2012 Proceedings IEEE, Mar. 2012, pp. 2471–2479. [15] M. Dobrescu, K. Argyraki, and S. Ratnasamy, “Toward Predictable Performance in Software Packet-Processing Platforms,” in Presented as part of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12). San Jose, CA: USENIX, 2012, pp. 141– 154. [16] J. Martins, M. Ahmed, C. Raiciu, V. Olteanu, M. Honda, R. Bifulco, and F. Huici, “ClickOS and the Art of Network Function Virtualization,” in 11th USENIX Symposium on Networked Systems Design and Imple- mentation (NSDI 14). Seattle, WA: USENIX Association, Apr. 2014, pp. 459–473. [17] W. Li and A. Kanso, “Comparing Containers versus Virtual Machines for Achieving High Availability,” in 2015 IEEE International Conference on Cloud Engineering (IC2E), March 2015, pp. 353–358. [18] W. Li, A. Kanso, and A. Gherbi, “Leveraging Linux Containers to Achieve High Availability for Cloud Services,” in 2015 IEEE Inter- national Conference on Cloud Engineering (IC2E), March 2015, pp. 76–83. [19] O. N. Foundation, “OpenFlow Switch Specification Version 1.5.0,” Dec 2014. [Online]. Available: https: //www.opennetworking.org/images/stories/downloads/sdn-resources/ onf-specifications/openflow/openflow-switch-v1.5.0.noipr.pdf [20] L. Rizzo, “netmap: a novel framework for fast packet I/O,” in Proceed- ings of the 2012 USENIX Annual Technical Conference (USENIX ATC 12). Bellevue, WA: USENIX Association, Aug. 2012, pp. 101–112. [21] L. Rizzo and G. Lettieri, “VALE, a Switched Ethernet for Virtual Machines,” in Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’12. New York, NY, USA: ACM, 2012, pp. 61–72. [22] Click modular router source code repository. Visited Apr 2015. [Online]. Available: https://github.com/kohler/click/ [23] Netmap source code repository. Visited Apr 2015. [Online]. Available: https://code.google.com/p/netmap/ [24] M. Degermark, B. Nordgren, and S. Pink. IP Header Compression. Visited Apr 2015. [Online]. Available: https://tools.ietf.org/html/rfc2507 [25] “Universal Node Initial Benchmarking Documentation,” D5.4 deliver- able of EU FP7 project UNIFY (will be publicly available at https: //www.fp7-unify.eu/index.php/results.html#Deliverables). [26] P. Quinn and T. Nadeau. Problem Statement for Service Function Chaining. Visited Apr 2015. [Online]. Available: https://tools.ietf.org/ html/rfc7498