Data Plane Development Kit A Guide To The User
Spacebased Fast Network Applications Heqing Zhu
Data Plane Development Kit (DPDK)
A Software Optimization Guide to the
User Space-based Network Applications
Edited by
Heqing Zhu
First edition published 2021
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2021 Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of
their use. The authors and publishers have attempted to trace the copyright holders of all material
reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and
let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.
com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
ISBN: 9780367520175 (hbk)
ISBN: 9780367373955 (pbk)
ISBN: 9780429353512 (ebk)
Typeset in Times
by codeMantra
DPDK uses the FreeBSD license for most of its software code, which runs in user mode. A small amount of code, such as VFIO and KNI, resides in kernel mode and is released under the GPL. The BSD license model gives developers and consumers more flexibility for commercial use. A developer who wants to contribute new software code must follow this license model.
The source code can be downloaded from the DPDK website (www.dpdk.org) for immediate use. DPDK open-source development is still very active today, rapidly evolving with new features and large source code contributions; an open-source package is released every 3 months, a cadence in place since 2016.
Contents

Preface vii
Editor xv
Contributors xvii

SECTION 1 DPDK Foundation

Chapter 1   Introduction (Heqing Zhu and Cunming Liang) 3
Chapter 2   Cache and Memory (Chen Jing and Heqing Zhu) 31
Chapter 3   Core-Based Parallelism (Qun Wan, Heqing Zhu, and Zhihong Wang) 51
Chapter 4   Synchronization (Frank Liu and Heqing Zhu) 65
Chapter 5   Forwarding (Yipeng Wang, Jasvinder Singh, Zhe Tao, Liang Ma, and Heqing Zhu) 85
Chapter 6   PCIe/NIC (Cunming Liang, Hunt David, and Heqing Zhu) 115
Chapter 7   PMD (Helin Zhang and Heqing Zhu) 133
Chapter 8   NIC-Based Parallelism (Jingjing Wu, Xiaolong Ye, and Heqing Zhu) 157
Chapter 9   NIC Offload (Wenzhuo Lu and Heqing Zhu) 183
Chapter 10  Packet Security (Fan Zhang (Roy) and Heqing Zhu) 199

SECTION 2 I/O Virtualization

Chapter 11  Hardware Virtualization (Qian Xu and Rashmin Patel) 219
Chapter 12  Virtio (Tiwei Bie, Changchun Ouyang, and Heqing Zhu) 229
Chapter 13  Vhost-User (Tiwei Bie and Heqing Zhu) 251

SECTION 3 DPDK Applications

Chapter 14  DPDK for NFV (Xuekun Hu, Waterman Cao, and Heqing Zhu) 265
Chapter 15  Virtual Switch (Ciara Loftus, Xuekun Hu, and Heqing Zhu) 277
Chapter 16  Storage Acceleration (Ziye Yang and Heqing Zhu) 291

Index 305
Preface
DPDK (Data Plane Development Kit) started as a small software project at Intel®
about a decade ago. By 2019, it had evolved into a leading open-source project under
the governance of the Linux Foundation. DPDK is known as a kernel-bypass networking
technology, and it has gained huge adoption by cloud, telecom, and enterprise
networking and security systems. Many technical articles and online documents have
shared DPDK technology and its applications, and much of this content is available
on the Internet. It is informative and useful, but scattered across many different
places. As of 2019, general-purpose processors are widely used for cloud and
telecom network infrastructure systems. DPDK has played an important role in delivering
high-speed I/O, and it is widely used in software-defined data centers (Open
vSwitch, VMware NSX, Red Hat virtualization, load balancers, etc.) and telecom
networking appliances such as Virtual EPC, Virtual Router, and NGFW (next-generation
firewall).
In April 2015, the first DPDK China summit was held in Beijing. Network developers
and system designers from China Mobile, China Telecom, Alibaba, IBM,
Intel®, Huawei, and ZTE presented a variety of topics around DPDK technologies
and their use cases. The passionate audience inspired us to write this book about
DPDK, and the first Chinese edition was published in 2016. In the past
3 years, there have been continuous improvements in the DPDK community, and we have
incorporated some of those changes into this English edition.
The DPDK community has spent a great deal of effort on improving the documentation,
and significant progress has been made. This book is intended to make DPDK easy
to use and to cover its design principles, software libraries, and software optimization
approach. We have to admit that our limited knowledge and language
skills may leave this book imperfect. This book is a joint work of many
talented engineers from Intel®, Alibaba, and Huawei; they are DPDK developers
and users. For network engineers and college students who are building
a networking/security system or service that has not used DPDK yet, this book
may help.
The “network function virtualization (NFV)” transformation has established DPDK
as a key data plane technology, a crucial ingredient for building 5G and edge systems
(uCPE/SD-WAN, MEC). DPDK is used as foundational I/O software to deliver
wireless and fixed network services.
SILICON
When I joined Intel® in 2005, a big challenge was how to ensure that software could
gain the benefits of multicore CPUs. Most legacy software was not designed with
parallelization in mind and could not gain performance from multicore processors.
Prior to 2005, CPU design focused on increasing clock frequency, so
software gained performance improvements without any changes; the increased CPU
frequency helped software achieve higher performance. As the industry enters
the multicore journey (almost 15 years so far), CPU innovation focuses on
delivering more cores with each new product release cycle. Table 1 summarizes the
history of core count increases in Intel® Xeon processors.
The server platform evolution has been largely driven by the Intel® Xeon processor. The most
commonly used server platform is based on a dual-socket design, and the number of
logical computing cores increased about 56 times within 13 years. Other CPU suppliers
have pursued similarly aggressive technology development. In addition to the
rapid CPU development, the other silicon components in the server system, such as
memory and the NIC (network interface card), have gone through a similarly significant
capability boost.
Ethernet technology has evolved from early interface speeds of 10/100 Mbps to
gigabit Ethernet (1 Gbps). Today, 10 Gbps/25 Gbps NICs are adopted by new data
center servers. In networking production systems, 100 Gbps ultra-high-speed
network interfaces are also in use, yet CPU frequency has remained at the same
level as 10 years ago. Table 2 summarizes Ethernet development history from https://
ethernetalliance.org/.
Supporting high-speed NICs requires substantial software innovation, in particular
software designed for high parallelism. DPDK was
born in this time of change; one of its design goals is to take advantage of the
multicore architecture to enable high-speed NICs.
DPDK is an open-source project with broad industry support. Many organizations
recognize the power of open source and have started to participate and contribute, with
very clear progress in cloud and telecom/security solutions. DPDK was born at Intel®
first, but it gained broad contributions thanks to the open-source model.
From a cost perspective, server platform cost has been reduced significantly; a
dual-socket server today costs about the same as a high-end laptop 10 years
ago, while its computing capability exceeds what a supercomputer could do at that time.
Such rapid technology development makes the server platform the preferable
TABLE 1
The Intel® Xeon Processor over the Years

CPU Code Name     CPU Process (nm)  Max # Cores per CPU  Release Time  Hyper-Threading  Total # of Cores on 2-CPU Server
WoodCrest         65                2                    2006          No               4
Nehalem-EP        45                4                    2009          Yes              16
Westmere-EP       32                6                    2010          Yes              24
Sandy Bridge-EP   32                8                    2012          Yes              32
Ivy Bridge-EP     22                12                   2013          Yes              48
Haswell-EP        22                18                   2014          Yes              72
Skylake-EP        14                28                   2017          Yes              112
Cascade Lake-EP   14                56                   2019          Yes              224
option to implement software-defined infrastructure (SDI), a common platform to
deliver computing, network, and storage tasks. DPDK and the Intel® server platform
are a great recipe for building out software-defined network or security infrastructure
and services. Early DPDK started from a performance test case for an Intel® NIC.
Today, DPDK software is widely recognized and deployed in cloud and telecom
network infrastructure. DPDK is an important software library for realizing NFV;
it helped prove the concept, and it helps in production deployment.
NETWORK PROCESSOR
In the context of hardware platforms for networking workloads, it is necessary
to discuss the network processor first. Telecom vendors have used network processors
or similar chip technologies as the primary silicon choice for data plane processing.
Intel® was a market leader in this domain. Intel® had a product line known as
Intel® Internet Exchange Architecture (IXA); the silicon products were known as IXP4xx,
IXP12xx, IXP24xx/23xx, and IXP28xx. The technology was successful, and the
product was a market leader. Within IXP silicon, a large number of microengines
sit inside the processor; they are programmable for data plane processing. IXP
has an XScale processor for the control plane, a StrongARM-based silicon
component.
In 2006, AMD gained processor leadership in the x86 domain, and
Intel® had to optimize its research and development organization and investment
portfolio. The network processor business was evaluated, and the internal finding
indicated that IXP's overall business potential was not large enough for long-term
investment. Intel® had the No. 1 market share in 2006, but the market size was not
big enough to support long-term growth. Without specialized network processor
silicon, another solution was required to address high-performance packet processing
TABLE 2
Ethernet Port Speed Evolution

Ethernet Port Speed  Year
10 Mb/s              ~1980
100 Mb/s             ~1995
1 GbE                ~1998
10 GbE               ~2002
25 GbE               ~2016
40 GbE               ~2010
50 GbE               ~2018
100 GbE              ~2011
200 GbE              ~2017
400 GbE              ~2017
workloads. Intel®'s architects predicted that multicore x86 processors would develop
at a faster pace, so it made sense to replace the IXP roadmap with CPU-based
technology. As a result, Intel® stopped IXP product line development and gradually
shifted toward a CPU-based solution, which requires a software approach for
networking workloads. This was an important strategic shift. The business plan was to
converge all networking-related solutions onto x86-based multicore processors. The
data plane workload requirement is different from general computing characteristics;
it needs to be fulfilled with a dedicated software solution, and DPDK is a response
to this strategic shift. IXP technology still exists at Intel®; it contributed
to the accelerator technology known as Intel® QuickAssist Technology (QAT), which
is commonly available as a QAT PCIe card, in the server chipset, or in an SoC (system on chip).
In 2006, networking systems needed to support 10 Gbps I/O. At that time, the
Linux system and kernel drivers could not achieve this. DPDK emerged as
the initial software-based solution, meeting a new silicon trend at the dawn of the
multicore processor architecture. Since its birth, DPDK has been busy keeping up
with growing high-speed I/O such as 25 GbE, 40 GbE, and 100 GbE Ethernet.
In short, DPDK was born in a disruptive time. The software design
focuses on performance and scalability, achieved by using multiple cores.
Together with Intel's tick-tock silicon model, this set a rapid product cadence compared
to most network silicon technologies. Under the tick-tock model, Intel®
released new processors on the following cadence:
• CPU architecture refreshed every 2 years.
• CPU manufacturing process refreshed every 2 years.
At the time, this was a very aggressive product beat rate in the silicon technology
sector. Later, smartphone silicon adopted an even more aggressive schedule.
DPDK HISTORY
A network processor supports packet movement in and out of the system at
line rate. For example, receiving 64-byte packets at a line rate of 10 Gbps
amounts to about 14.88 Mpps (million packets per second). This could not be achieved by
the early Linux kernel on an x86 server platform. Intel's team started with NIC
performance test code; a breakthrough was made with a NIC poll mode driver in Linux
user mode. Traditionally, the NIC driver runs in Linux kernel mode and wakes up the
system for packet processing via an interrupt for every incoming packet.
In the early days, the CPU was faster than the I/O processing unit, so interrupt-based
processing was very effective; CPU resources were expensive and shared among different
I/O and computing tasks. However, high-speed network interfaces now
mandate packet processing at 10 Gbps and above, which exceeds
what the traditional Linux networking software stack can deliver. CPU frequency
still remains at 3 GHz or lower; only gaming platforms overclock the
frequency up to 5 GHz. Networking and communication systems need
to consider energy efficiency because they are always on, running 24 × 7. Therefore,
network infrastructure needs to take power consumption into the total cost
of ownership (TCO) analysis. Today, most network systems run below 2.5 GHz
for energy savings. From the silicon perspective, I/O speed is 10×–100× faster than
before, and the CPU core count on a given platform is also up to 100× higher. One obvious approach is to
assign dedicated cores to poll the high-speed network ports/queues so that the
software can take advantage of the multicore architecture; this concept is a design
foundation of DPDK.
At the very beginning, Intel® shared the early prototype source code only with
limited customers. Intel® shipped the early software package under the FreeBSD
license. 6WIND played an important role in helping with software development and
enhancement under a business contract. From 2011, 6WIND, Wind River, Tieto, and
Radisys announced business services support for Intel® DPDK. Later, Intel® shared
the DPDK code package on its website, free for more developers to download and
use. 6WIND set up the open-source website www.dpdk.org in April 2013, which became
the host website; eventually, DPDK became one of the Linux Foundation projects.
OPEN SOURCE
Today, any developer can submit source code patches via www.dpdk.org. At the
beginning, DPDK focused on enabling the Intel® server platform, such as optimal
use of Intel® processors, chipsets, accelerators, and NICs. The project grew significantly
with broad participation from many other silicon companies; this transformed
DPDK into a project supporting multiple architectures and multiple I/O devices (such as NICs and FPGAs
(field-programmable gate arrays)). Intel® is a foundational member investing in
and growing the software community, together with other member companies and
individual developers such as 6WIND, Redhat, Mellanox, ARM, Microsoft, Cisco,
VMware, and Marvell (Cavium).
DPDK version 2.1 was released in August 2015. All major NIC manufacturers
joined the DPDK community to release NIC PMD support, including Broadcom
(NIC business acquired by Emulex), Mellanox, Chelsio, and Cisco. Beyond NIC drivers,
DPDK expanded into packet-related acceleration technology; Intel® submitted
software modules enabling Intel® QAT for crypto acceleration, which is used for
packet encryption/decryption and data compression.
The DPDK community has made great progress on multi-architecture support.
Dr. Zhu Chao at IBM Research China started migrating DPDK to the
Power architecture. Freescale China's developers joined the code contribution.
Engineers from Tilera and EZchip spent effort making DPDK run on the
Tile architecture. DPDK later added support for the ARM architecture as well.
DPDK became a Linux Foundation project in 2017. “The first release of DPDK
open-source code came out 8 years ago; since that time, we’ve built a vibrant community
around the DPDK project”, wrote Jim St. Leger, DPDK board chair, Intel®.
“We’ve created a series of global DPDK Summit events where the
community developers and code consumers gather. The growth in the number of
code contributions, participating companies, and developers working on the project
continues to reflect the robust, healthy community that the DPDK project is today.”
RedHat integrated DPDK into Fedora Linux first, then added it to RedHat Enterprise
Linux; many other Linux OS distributions followed. VMware engineers joined the DPDK community and took charge of the
maintainer role for VMXNET3-PMD, the de facto high-performance software
virtual interface to guest software on VMware NSX. Canonical has added DPDK
support since Ubuntu 15. For public cloud computing, the netvsc PMD can be used for
Microsoft Hyper-V, and the ENA PMD is available to support the AWS Elastic Network
Adapter.
EXPANSION
DPDK is designed to run in Linux user space; it is intended to stay close to the
application, which usually runs in Linux user space. A DPDK test case indicated
that a single Intel® core can forward packets at approximately 57 Mpps; this
is achieved in an extremely simplified test case. Open vSwitch is a classical open-source
component used by cloud infrastructure servers. OVS integrated
DPDK to accelerate virtual switching performance, which is widely adopted by
large-scale cloud computing and network virtualization systems. DPDK added the
virtio-user interface to connect containers with OVS-DPDK, an overlay
networking acceleration in a vendor-neutral way.
For telecom equipment manufacturers, the server platform and open-source
software, such as Linux and DPDK/VPP/Hyperscan, are important new recipes for
designing and delivering networking and security systems; they are also important recipes
for the cloud service model. Furthermore, the Linux networking stack is also innovating
fast with XDP and AF_XDP. This adds interesting dynamics, as it offers a
bypass path for the Linux kernel stack while returning NIC management to the existing
Linux utilities. It provides a new way to use Linux and DPDK together.
As one of the best open-source networking projects of the past decade, DPDK
became a widely adopted software library for accelerating packet processing performance
(10× or more over Linux kernel networking) on general-purpose server
platforms. It is heavily used in many use cases, such as:
• Building network and security appliances and systems.
• Optimizing virtual switching for cloud infrastructure.
• Optimizing storage systems with high I/O demand, such as NVM devices.
• NFV: building software-centric networking infrastructure with servers.
• Cloud networking and security functions as a service.
BOOK CONTRIBUTION
This book is a joint contribution from many individuals who worked or are working
for Intel®. The early Chinese editions were mainly contributed by Cunming
Liang, Xuekun Hu, Waterman Cao (Huawei), and Heqing Zhu as the main editors.
Each chapter of this book has the following contributors.
Section 1: DPDK Foundation
Chapter 1: Heqing Zhu, Cunming Liang
Chapter 2: Chen Jing (Alibaba), Heqing Zhu
Chapter 4: Frank Liu (Netease), Heqing Zhu
Chapter 5: Yipeng Wang, Zhe Tao (Huawei), Liang Ma, Heqing Zhu
Chapter 6: Cunming Liang, Hunt David, Heqing Zhu
Chapter 7: Helin Zhang, Heqing Zhu
Chapter 8: Jingjing Wu, Xiaolong Ye, Heqing Zhu
Chapter 9: Wenzhuo Lu, Heqing Zhu
Chapter 10: Fan Zhang (Roy), Heqing Zhu
Section 2: I/O Virtualization
Chapter 11: Qian Xu, Rashmin Patel
Chapter 12: Tiwei Bie, Changchun Ouyang (Huawei), Heqing Zhu
Chapter 13: Tiwei Bie, Heqing Zhu
Section 3: DPDK Application
Chapter 14: Xuekun Hu, Waterman Cao (Huawei), Heqing Zhu
Chapter 15: Ciara Loftus, Xuekun Hu, Heqing Zhu
Chapter 16: Ziye Yang, Heqing Zhu
For the DPDK Chinese edition, the draft received review and feedback from the
volunteers below:
• Professor Bei Hua (USTC) and Kai Zhang (USTC, now at Fudan University);
• Professor Yu Chen and Dan Li (Tsinghua University);
• Dr. Liang Ou (China Telecom).
Many people's work led to this book's content.
Intel®: Yong Liu, Tao Yang, De Yu, Qihua Dai, Cunyin Chang, Changpeng Liu,
Jim St. Leger, Ferruh Yigit, Cristian Dumitrescu, Walter Gilmore, Tim O'Driscoll,
Ray Kinsella, Konstantin Ananyev, Declan Doherty, Bruce Richardson, Keith
Wiles, John DiGiglio, Liang-min Wang, and Muthurajan Jayakumar. Alibaba: Xun Li,
Huawei Xie. This book leveraged content from Redhat and VMware open-source
developers and product managers. VMware: William Tu, Justin Pettit; Redhat: Kim
Buck, Anita Tragler, and Franck Baudin.
Special thanks to John McNamara, Lin Zhou, Jokul Li, Brian Ahern, Michael
Hennessy, Xiaomei Zhou, and Timmy Labatte for providing leadership support. Dan
Luo and Lin Li helped in the translation and review from the Chinese edition to the English
version. Roy Zhang was instrumental in guiding the technical review of the English
edition.
DPDK is contributed by worldwide talents, who created the technology and made
it thrive. At Intel®, it was mainly led by Intel® Fellow Venky (who passed away in 2018)
in Oregon and Jim St. Leger in Arizona.
Heqing Zhu
Editor
Heqing Zhu was born in China. He has worked at Intel® for 15 years in roles
including software developer, engineering leader, product manager, solution
architect in telecom and cloud networking, and open-source software developer.
Prior to Intel®, he worked for Alcatel Shanghai Bell and Huawei. Mr. Zhu currently
lives in Chandler, Arizona, in the United States. He graduated from the University
of Electronic Science and Technology of China (UESTC) with a master’s degree in
Information and Communication Systems.
Contributors
Tiwei Bie
Ant Financial
Shanghai
China
Waterman Cao
Huawei
Shanghai
China
Jing Chen
Alibaba
Shanghai
China
Xuekun Hu
Intel®
Shanghai
China
David Hunt
Intel®
Shannon
Ireland
Cunming Liang
Intel®
Shanghai
China
Frank Liu
NetEase
Hangzhou
China
Jijiang (Frank) Liu
NetEase
Hangzhou
China
Ciara Loftus
Intel®
Shannon
Ireland
Wenzhuo Lu
Intel®
Shanghai
China
Liang Ma
Intel®
Shannon
Ireland
Changchun Ouyang
Huawei
Shanghai
China
Rashmin Patel
Intel®
Arizona
USA
Jasvinder Singh
Intel®
Shannon
Ireland
Qun Wan
Intel®
Shanghai
China
Yipeng Wang
Intel®
Oregon
USA
Zhihong Wang
Intel®
Shanghai
China
Jingjing Wu
Intel®
Shanghai
China
Qian Xu
Intel®
Shanghai
China
Ziye Yang
Intel®
Shanghai
China
Xiaolong Ye
Intel®
Shanghai
China
Fan (Roy) Zhang
Intel®
Shannon
Ireland
Helin Zhang
Intel®
Shanghai
China
Zhe Tao
Huawei
Shanghai
China
Section 1
DPDK Foundation
There are ten chapters in this section, focusing on basic concepts and software
libraries, including the CPU scheduler, multicore usage, cache/memory management,
data synchronization, the PMD (poll mode driver) and NIC-related features, and the
software API (application programming interface). The PMD is a very important concept and
the new user space software driver model for NICs. Understanding the DPDK (Data Plane
Development Kit) basics builds a solid foundation before starting on actual
projects.
In the first five chapters, we introduce server platform basics such as cache
use, parallel computing, data synchronization, data movement, and packet forwarding
models and algorithms. Chapter 1 introduces the networking technology evolution
and the network function appliance trend, going from hardware purpose-built
boxes to the software-defined infrastructure present in the cloud. Essentially, it
is silicon advancement that drove the birth of DPDK, and a few basic examples
are given here. Chapter 2 introduces memory and cache in the performance optimization
context: how to use cache and memory wisely, the concepts of HugePages
and NUMA (non-uniform memory access), and cache-aligned data structures.
Chapters 3 and 4 focus on multicore and multi-threading, effective models for
data sharing with high parallelism, and lock-free mechanisms. Chapter 5
moves to packet forwarding models and algorithms, where a decision is required
between the run-to-completion model, the pipeline model, or both.
The next five chapters focus on I/O optimization: PCIe, NIC, and PMD design
and optimization details, which enable DPDK PMDs to deliver high-speed forwarding
rates meeting the demands of 10 GbE, 25 GbE, and 40 GbE up to 100 GbE on a
server platform. Chapter 6 takes a deep dive into PCIe transaction details for
packet movement. Chapter 7 focuses on NIC performance tuning and on platform
and NIC configuration. Chapters 8 and 9 go further into common NIC features
and software usage: multi-queue, flow classification, core assignment, and load
balancing methods that enable highly scalable I/O throughput, plus the NIC offload
features for L2/L3/L4 packet processing. Chapter 10 is about packet security and
crypto processing; securing data in transit is an essential part of Internet security.
How can DPDK add value to “encrypt everywhere”?
1 Introduction
Heqing Zhu and Cunming Liang
Intel®
1.1 WHAT’S PACKET PROCESSING?
Depending on whether the system is a network endpoint or a middlebox, packet
processing (networking) may have a different scope. In general, it consists of packet
reception and transmission, packet header parsing, packet modification, and forwarding,
and it occurs at multiple protocol layers.
CONTENTS
1.1 What’s Packet Processing?
1.2 The Hardware Landscape
    1.2.1 Hardware Accelerator
    1.2.2 Network Processor Unit
    1.2.3 Multicore Processor
1.3 The Software Landscape
    1.3.1 Before DPDK
    1.3.2 DPDK Way
    1.3.3 DPDK Scope
1.4 Performance Limit
    1.4.1 The Performance Metric
    1.4.2 The Processing Budget
1.5 DPDK Use Case
    1.5.1 Accelerated Network
    1.5.2 Accelerated Computing
    1.5.3 Accelerated Storage
1.6 Optimization Principles
1.7 DPDK Samples
    1.7.1 HelloWorld
        1.7.1.1 Initialize the Runtime Environment
        1.7.1.2 Multicore Initialization
    1.7.2 Skeleton
        1.7.2.1 Ethernet Port Initialization
    1.7.3 L3fwd
1.8 Conclusion
Further Reading
• In the endpoint system, the packet is sent to the local application for further processing. Packet encryption and decryption, or tunnel overlay, may be part of the packet processing, as may session establishment and termination.
• In the middlebox system, the packet is forwarded to the next hop in the network. Usually, this system handles a large number of packets in and out of the system, along with packet lookup, access control, and quality of service (QoS).
The packet may go through the hardware components such as I/O (NIC) interface,
bus interconnect (PCIe), memory, and processor; sometimes, it may go through the
hardware accelerator in the system. Most of the packet movements and modifications
can be categorized as follows:
• Data movement, like packet I/O, moves the packet from the PCIe-based NIC device to cache/memory so that the CPU can process it further.
• Table lookup/update is memory access (read/write), used for packet-based access control or routing decisions (which interface the packet should be sent out on).
• Packet modification involves network protocols that are defined at many different layers; just like peeling the layers of an onion, each protocol layer has its own data format, usually defined by an international standard body such as the Internet Engineering Task Force in its Requests for Comments (IETF RFCs). Packet processing often involves packet changes, header removal, or header addition.
1.2 THE HARDWARE LANDSCAPE
Traditionally, the network system is highly complicated and consists of the control, data, signal, and application planes; each plane can be realized with different subsystems. These systems are known as embedded systems, with low power consumption, low memory footprint, and real-time characteristics, and they require hardware and software talents to work together.
In the early 2000s, CPUs had only a single core running at high frequency; the first dual-core processor for general computing emerged in 2004. Prior to that, the multicore, multithread architecture was available in networking silicon, but not in general-purpose processors, and in those early years, x86 was not the preferred choice for packet processing. As of today, the silicon below can be used for a packet processing system; from the perspective of the programming skills required, it can be split into different categories.
• Hardware accelerator (FPGA (field-programmable gate array), ASIC
(application-specific integrated circuit));
• Network processor unit (NPU);
• Multicore general-purpose processor (x86).
These systems are used for different scenarios; each hardware has certain advantages and disadvantages. For large-scale and fixed-function systems, the hardware accelerator is preferred due to its high performance and low cost. The network processor
provides programmable packet processing, thereby striking a balance between flexibility and high performance, but the programming language is vendor specific. In recent years, P4 has emerged as a new programming language for packet processing; it has gained support from the Barefoot switch silicon and FPGA silicon, but it is not common for NPUs.
The multicore general-purpose processor has traditional advantages, such as supporting all generic workloads, and the server platform is commonly equipped with high-speed Ethernet adapters. The server has quickly evolved into the preferred platform for packet processing. It can support complex packet processing together with applications and services, and the application and service software can be written in many different programming languages (C, Java, Go, Python). Over the years, many high-quality open-source projects have emerged for packet processing, such as DPDK, FD.io, OPNFV, and Tungsten.io. The cloud infrastructure has gone down a path known as the NetDevOps approach, taking open source to deliver software-defined networking and security infrastructure and services.
From the perspective of silicon advancement, new accelerators and high-speed I/O units have been integrated with multicore processors. This leads to the generation of the system on chip (SoC). SoC is cost-effective, and its silicon design has long life cycles.
1.2.1 HARDWARE ACCELERATOR
ASIC and FPGA have been widely used in packet processing. Hardware developers are required both to implement the chip and to use it. An ASIC is an integrated circuit designed for a special purpose; such an integrated circuit is designed and manufactured based on the specific requirements of the target systems. Because ASIC is designed for specific users’ needs, it needs large-volume production to afford the high R&D cost; in return, it is smaller in size and has lower power consumption, higher reliability and performance, and reduced cost in comparison with the general-purpose processor. ASIC’s shortcomings are also obvious: it is not flexible or scalable, and it has high development costs and long development cycles. ASIC has led to the development of popular accelerators for functions such as crypto and signal processing. Combining ASIC with general-purpose processors leads to an SoC that provides heterogeneous processing capability. In general, a dedicated board design is needed to use an ASIC.
FPGA is a semi-custom circuit in the ASIC domain. Unlike ASIC, FPGA is programmable and flexible to use, and it is inherently parallel. Its development method greatly differs from software development: FPGA developers require an in-depth understanding of a hardware description language (Verilog or VHDL), whereas general software executes in sequential order on a general-purpose processor, where parallelism is up to the software design. FPGA silicon can include some fixed functions and some programmable functions. FPGA has made great progress in the data center in recent years: it can be used as a cloud service and for smart NICs, and it is often selected to build super-high-speed I/O interfaces, advanced packet parsing and flow filtering, and QoS acceleration. FPGA can be offered as an add-in card; through the PCIe interface, it is easy to plug into a server system, which makes it popular for cloud data center scenarios. FPGA can also be used for a specific purpose, like signal processing, and it is often used in special board designs.
Take the 5G wireless base station as an example: the telecom vendor develops the system in stages and may use FPGA to build the early-stage product. Once the product quality is in good shape, high-volume needs drive a new stage that focuses on using ASIC (SoC) to replace FPGA, driving the cost down for large-scale use.
1.2.2 NETWORK PROCESSOR UNIT
NPU is a programmable chip specifically designed for packet processing. It is usually designed with multicore-based parallel execution logic and dedicated modules for packet I/O, protocol analysis, routing table lookup, voice/data encoding/decoding, access control, QoS, etc. NPU is programmable, but not easy to program: the developer needs to take a deep dive into the chip’s datasheet and learn the vendor-specific instruction set, known as microcode (firmware), and then develop the hardware-based processing pipeline for the target network application. The network applications are realized with loadable microcode (firmware) running on the NPU. NPU generally has built-in high-speed bus and I/O interface technology, and it generally has built-in low-latency memory modules; keeping the forwarding table in on-chip memory makes lookups faster than external DRAM access. NPU can be integrated as part of an SoC; in recent years, NPU-based silicon vendors have been consolidated by CPU or NIC vendors.
Figure 1.1 is a conceptual diagram. The “packet processing engines” are programmable hardware logic, which allows rapid implementation of workload-specific packet processing; the written microcode can run on many parallel engines. The “physical I/O” interface is a fixed function that complies with standardized interface specifications. The “traffic manager” and “classification and queueing” blocks are relatively fixed functions, common features in most network systems; they are built in as specialized hardware units for QoS and packet ordering. “Internal memory” and “TCAM (ternary content-addressable memory)” provide low-latency memory for packet header parsing and forwarding decisions. The memory controller connects external memory chips for larger memory capacity.
NPU has many advantages, such as high performance and programmability, but its cost and workload-specific characteristics limit its market. It is often used in communication systems, and different NPU products have been available from various silicon vendors. As said before, NPU microcode is often vendor specific; thus, it is not easy to use, and it is not easy to hire talented developers who understand a given NPU well. Due to the steep learning curve, time to market is affected by the availability of talent. Because NPU is confined to its limited market, it does not create enough job opportunities, and experienced talents may leave the network processor domain to seek career growth opportunities elsewhere.
FIGURE 1.1 NPU conceptual block.
There have been many attempts to use NPU with common programming languages (like C); technically speaking, this is possible and available, but it is not the best way to use NPU. The other reality is the performance gap between using C and using the microcode language: translation from C to microcode is doable, but it is not the optimal way to get the most performance out of NPU, so this path is feasible but has little real value. If all NPUs from different vendors supported the P4 language, they would be much easier to use, though the performance dilemma might remain a similar concern. The P4 ecosystem is still in its development phase, and broad NPU/P4 support is yet to be seen.
1.2.3 MULTICORE PROCESSOR
In the past 15 years, CPU has delivered a huge boost in processing cycles, thanks to the new era of multicore architecture. With more cores available in the general-purpose processor, it is natural to assign some cores for packet processing, and the CPU finds its way to converge more workloads on its own. From the historic networking system perspective, the (low-power) CPU has always been preferred for the control plane (protocol handling), and some network services (like wireless data service) are compute intensive. A few cores can be assigned to handle the data plane processing, whereas the remaining cores can be used for control and service plane processing, which makes a more efficient and cost-effective system.
Looking back at a telecom appliance in 2005: it was a complicated system (chassis) consisting of ten separate boards; each board was a subsystem with a specific function, built with dedicated silicon and software, so different boards within a single system had different platform designs and silicon components. Later, the idea was to converge all the subsystems into one powerful server system, with the subsystems running as software threads (within virtual machines or containers). This new approach is expected to transform the network industry with fewer silicon components, a less heterogeneous system and platform architecture, and hence lower cost. This was the initial motivation for implementing network functions in software.
On the other hand, the CPU release cycle is short, and new processors come to market on a yearly basis, whereas network processors and accelerators find it difficult to catch up with this fast release cadence; their refresh often happens only every 3 or 4 years. The market requires large-scale shipment of the general-purpose processor, but it does not have a strong business demand to refresh networking silicon. Over the years, this business has driven significant progress on the general-purpose processor at a very competitive cost.
Figure 1.2 describes a dual-socket server platform. Two Xeon processors are interconnected with the UPI (Ultra Path Interconnect) system bus; memory channels are directly connected to both processor units; all external devices are connected via the PCIe interface; and each socket is connected with 2 × 25 GbE Ethernet adapters (NICs), giving 100 Gbps of I/O for data in and out. The Lewisburg PCH is a chipset that serves as the platform controller, supporting an additional management engine, high-speed I/O (USB, SATA, PCIe), a 4 × 10 GbE Ethernet interface, and Intel® QAT (a built-in security accelerator for crypto and compression functions). The processor, memory, and Ethernet devices are the main hardware modules that handle packet processing tasks. The PCIe interface provides I/O extensibility and supports adding differentiated accelerators (or more flexible I/O) into server platforms, e.g., FPGA cards.
The general-purpose processor can be integrated with additional silicon IP; then it evolves into an SoC. Such an SoC often consists of a processor, an integrated memory controller, network I/O modules, and even hardware accelerators such as a security engine or FPGA. Here are a few known SoC examples:
• Intel®: Xeon-D SoC, Atom SoC;
• Tilera: TILE-Gx;
• Cavium Networks: OCTEON & OCTEON II;
• Freescale: QorIQ;
• NetLogic: XLP.
FIGURE 1.2 A dual-socket server platform.
The block diagram shown in Figure 1.3 is one example of an Intel® Atom SoC, a power-efficient silicon; it is a tiny chip consisting of two Atom cores, an internal chipset, Intel® QAT, and a 4 × 10 GbE Ethernet interface. Given that all the hardware units are integrated into a single chip, the SoC is viewed as power efficient and cost-effective; the power consumption of this chip is less than 10 W. DPDK can run on this chip to move packet data from Ethernet to CPU at a line rate of 10 Gbps. SoC is also popular in ARM-based processor design; for example, AWS released the Graviton SoC for cloud computing use cases. It is important to point out that CPU has made huge progress with cloud virtualization and container technologies, which provide more granularity and flexibility to place software workloads with computation resources. CPU-based packet processing reduces the need for hardware talents; it deals mostly with software.
1.3 THE SOFTWARE LANDSCAPE
DPDK was created for high-speed networking (packet processing). Before DPDK was born, most personal computers and server systems were installed with Windows or Linux OS. All these systems had networking capabilities, supported the network protocol stack, and talked to each other via Ethernet and the socket interface; the low-speed network processing capability was good enough for a computation-centric system. There is a big difference between supporting low-speed network processing (10/100 Mbps) and supporting high-speed network processing (1/10/25/40/50/100 Gbps). Before the multicore architecture was common in a CPU, it was not a popular idea to use an Intel® processor for a network processing system.
Let’s take a look at the popular wireline and wireless network protocol stacks, and get a little more specific about the network protocol layers. Figure 1.4 shows an example of the classic wireline network protocol layers (including both the OSI model and the TCP/IP model). The left side shows the OSI 7-layer model, whereas the right
FIGURE 1.3 Intel® Atom SoC diagram.
FIGURE 1.4 The OSI model and TCP/IP model (wireline).
side shows the TCP/IP model. The TCP/IP model is often implemented by Linux kernel systems. By default, the incoming packet goes to the Linux kernel at the “link layer”, and this is often done by the NIC and its driver, which runs in the kernel space. The whole network stack can be handled by the Linux kernel. In the network system, the packet has to be copied from the kernel space to the user space because the application usually resides in the user space, and this application will eventually consume the arrived packet. Before zero copy was introduced to the Linux stack, packet copy was an expensive operation, but it was essential, as the packet is received in the Linux kernel but consumed by the user space application. For middlebox networking systems (like routers), routing functions are largely implemented in the user space as well, not using the Linux kernel space stack. Generally, software development in the kernel space is much harder in terms of debugging and testing; user space software is easier to develop and debug. There are certain systems that handle everything in the kernel, but they are rare.
Figure 1.5 describes the wireless 4G (LTE) user plane network protocol layers. eNodeB is a wireless base station with an air interface; the serving gateway (GW) and PDN GW are the wireless core networking systems. For the 5G system, the user plane stack is very similar to 4G: eNodeB becomes gNodeB, and the serving GW/PDN GW become the UPF. As we can see, the protocol layers are different between the base station and the core system. In the wireless base station (eNodeB), the L1/L2 protocols are different because the wireless interface is essentially an air interface and uses wireless signal processing and codec technology. We will not cover any details here; it is a completely different technology domain. From eNodeB to the serving GW, the packet processing system is similar to a typical wireline network system, where L1/L2 is based on the Ethernet interface and GTP/UDP/IP protocols run on top of Ethernet. The 5G network stack is also similar. 5G prefers to use cloud architecture in order to implement the service-oriented elastic model, so that edge computing
FIGURE 1.5 Wireless user plane network protocol (4G/LTE).
and network slicing are part of the new service. Packet processing and computing services converge at the 5G network infrastructure node, largely thanks to the huge computation power of the multicore CPU. In the early years, not long ago, the computing system focused on a singular workload or singular service provision, where the packet processing requirement was low and the Linux kernel stack was sufficient to handle a sub-1 Gbps Ethernet interface. Later, networking systems needed to support multiple high-speed networking interfaces (far more than 1 Gbps), and they required different software options; Linux kernel networking cannot meet the higher networking demand.
1.3.1 BEFORE DPDK
In the early 2000s, Intel® processor was not widely used for high-speed network
processing. NPU was a silicon choice at Intel®; now the change must happen; a path-
finding effort at Intel® is kicked off.
How does a traditional NIC device process packets in a server system using Linux? The steps are summarized below:
• A packet arrives at the NIC (a PCIe device).
• The NIC completes a DMA (direct memory access) transfer, copying the packet into a host memory region known as a packet buffer.
• The NIC sends an interrupt to wake up the processor.
• The processor reads and writes the packet descriptor and packet buffer.
• The packet is sent to the Linux kernel protocol stack for more protocol processing, like an IP-related access control decision.
• If the application resides in the user space, the packet data (also known as the payload) will be copied from the kernel space to the user space.
• If the application resides in the kernel space, the data will be processed in the kernel mode (a less common case).
In the early system, each incoming packet might trigger an interrupt. The interrupt overhead includes the context switching, and it is affordable if not many packets come into the system in a short period. In the past decade, CPU frequency has remained almost the same, but Ethernet speed has jumped from 10/100 Mbps to 1 Gbps, 10 Gbps, 25 Gbps, 40 Gbps, and 100 Gbps. Software faces the challenge of handling large packet bursts on a growing network interface: a huge number of packets will arrive, and the system cannot afford a high amount of interrupt processing; simply speaking, the overhead is too high. The NAPI (new API) mechanism was introduced into the Linux kernel. It allowed the system, after being woken up by the interrupt, to initiate a software routine that processes multiple packets in a polling manner until all packets are handled, then go back to interrupt mode. The NAPI method can significantly improve packet processing efficiency in a high-burst scenario. Later, Netmap (2011), a well-known high-performance network I/O framework, used a shared pool of packets to reduce packet replication from the kernel space to the user space. This solved the other problem: the high cost of packet copying [1,2].
Netmap and NAPI significantly improved the packet processing capability of the legacy Linux system. Is there any further room for improvement? As a time-sharing operating system, Linux schedules many tasks with a time-slicing mechanism; rather than assigning equal time to all tasks, the Linux scheduler assigns different time slices to different tasks. The number of CPU cores was relatively small in earlier years, and in order to allow every task to be processed in a timely fashion, time sharing was a good strategy for letting multiple tasks share the expensive processor cycles, although this method came at the cost of efficiency. Later, as CPUs gained more cores, it was time to look at new ways to optimize system performance. If the goal is to pursue high performance, time sharing is not the best option; one new idea is to assign dedicated cores to dedicated tasks. Netmap reduces the memory copy from the kernel space to the user space, but the Linux scheduler is still there, so the overhead of the task switch is not eliminated. The additional overhead from the task switch, and the subsequent cache replacement it causes (each task has its own data occupied in cache), will also have an adverse impact on system performance.
By nature, the network workload is latency sensitive, and a packet may traverse many hops through the Internet; as a result, real-time behavior is a critical requirement for a networking system. The path from the NIC interrupt to the software interrupt routine (served by the CPU), and then to the final application that handles the packet payload, is long and takes lots of cycles.
Prior to 2010, the x86 CPU was not a popular silicon choice for designing a high-speed packet processing system. In order to complete the solution transition from NPU-based silicon to x86/software, there were a few fundamental challenges; Intel® engineers needed answers to the following:
• a software path to enable packet processing on the x86 CPU;
• a better software method to do things differently;
• a way to scale performance using the multicore architecture;
• a way to tune the “Linux system” into a packet processing environment.
1.3.2 DPDK WAY
DPDK is the answer to the above challenges; particularly, PMD (polling mode
driver) has been proved as a high-speed packet processing software library on Linux.
DPDK, essentially, is based on a set of software optimization principles, a set of soft-
ware libraries to implement the high-performance packet movement on the multicore
processor. The initial goal is to focus on the high-speed packet I/O on the server
platform. This software demonstrated that the server is good for the networking
system, and it can handle the high-speed data plane. The journey was not easy, and it
was achieved through heavy engineering investment. It is built on the many software
optimization practices. Let’s navigate a few ideas and techniques quickly.
Polling mode: Assign a dedicated core for NIC packet reception and transmission. This core is not shared with other software tasks, and it can run in an endless loop checking whether any packet has just arrived or needs to be sent out, thus reducing the need for interrupt service and its overhead. We will discuss the trade-offs between polling and interrupt mechanisms later; in fact, DPDK supports both mechanisms and even a hybrid-use model.
User space driver: In most scenarios, the packet needs to be delivered to the user space eventually. The Linux NIC driver is mostly kernel based. A user space driver can avoid unnecessary packet memory copies from the kernel space to the user space, and it also saves the cost of system calls. An indirect benefit is that the user space driver is not limited to the packet buffer structure mandated by the Linux kernel: the Linux kernel stack mandates a stable interface, whereas the DPDK-based mbuf (memory buffer) header format can be flexibly defined (because it is new), so that it can be designed in a DMA-optimized way for the NIC. This flexibility adds a performance benefit. The user space driver is flexible and easy to modify, and it meets the rapid development needs of different scenarios.
Core affinity: By setting a thread’s affinity to a particular CPU core, specific tasks can be bound to cores (threads). Without the core affinity assignment, tasks might switch among different cores, and thread switching between cores can easily lead to performance losses due to cache misses and cache write-back. A further step is to exclude a core from the Linux scheduling system, so that the core is only used for the specific task.
Optimized memory: Network processing is an I/O-bound workload. Both CPU and NIC need to access the data in memory (actually cache and/or DRAM) frequently. Optimal memory access includes the use of HugePages and contiguous memory regions. For example, HugePage memory can reduce TLB misses, multichannel-interleaved memory access can improve total bandwidth efficiency, and NUMA-aware memory access can reduce access latency. The key idea is to get the data into cache as quickly as possible, so that the CPU doesn’t stall.
Software tuning: Tuning itself cannot be claimed as a best practice; in fact, it refers to a few known tuning practices, such as cache line alignment of data structures, avoiding false sharing between multiple cores, pre-fetching data in a timely manner, and bulk operations on multiple data items (multi-buffer). These optimization methods are used in every corner of DPDK; a code example can be found in the “l3fwd” case study. It is important to know that these techniques are commonly applicable; beyond DPDK, any software can be optimized with a similar approach.
Using the latest instruction set and platform technologies: The latest instruction sets of the Intel® processor and other new platform features have been one of the innovation sources of DPDK optimization. For example, Intel® DDIO (Direct Data I/O) technology is a hardware platform innovation in DMA and the cache subsystem. DDIO plays a significant role in boosting I/O performance, as the packet data can be placed directly into cache, reducing the CPU’s access latency to DRAM. Without DDIO, the packet is always placed into memory first, and then the CPU needs to fetch the packet data from DRAM into cache, which means extra cycles of waiting. The other example is how to make the best use of SIMD (single-instruction multiple-data) and multiple-buffer (multi-buffer) programming techniques. Some instructions, like CMPXCHG, are the cornerstone of lockless data structure design, and the CRC32 instruction is a good source for efficient hash computation. These contents will be covered in later chapters.
NIC driver tuning: When the packet enters the system memory through the PCIe interface, I/O performance is affected by the transaction efficiency among the PCIe-based device, the bus transaction, and the system memory. For example, packet data coalescing can make a difference by transferring multiple packets together, thus allowing a more efficient use of PCIe bus transactions. Modern NICs also support load balancing mechanisms such as receive side scaling (RSS) and Flow Director (FDir), which enable the NIC’s multiple queues to work with the CPU’s multiple-core model. New NIC offloads can also perform packet header checksum, TCP segmentation offload (TSO), and tunnel header processing. DPDK is designed to take full advantage of NIC features for performance reasons. These contents will be described in Chapters 6–9.
Network virtualization and cloud-native acceleration: The initial DPDK optimization focused on moving packets from I/O to CPU. Later, DPDK provided an optimal way to move packets from the host to tenants (VM or container tenants). This is a crucial ingredient for cloud infrastructure and network function virtualization (NFV). DPDK supports both SR-IOV and vSwitch optimization with the PMD concept.
Security acceleration: DPDK can run from bare metal to virtualized guest and container-based environments. The initial Application Programming Interface (API) abstraction was NIC centric; later, it was extended from Ethernet to crypto, compression, and storage I/O acceleration. The crypto and compression APIs are important software abstractions; they can hide the underlying silicon’s implementation differences.
1.3.3 DPDK SCOPE
Here are the basic modules within DPDK. DPDK implements most common network
functions in software and serves as a foundational layer for developing a packet
processing system (Figure 1.6).
15
Introduction
FIGURE 1.6 DPDK framework and modules.
Core libraries (core libs) provide a Linux system abstraction layer and software
APIs for making use of HugePage memory, the cache pool, timers, lock-free rings,
and other underlying components.
PMD libraries provide the user-space drivers that achieve high network
throughput through polling. Essentially all industry-leading Ethernet vendors
offer PMDs in DPDK. In addition, a variety of virtual NICs for Microsoft (netvsc)
and VMware (vmxnet3), as well as KVM-based virtualized interfaces (virtio), are
implemented.
Classify libraries support exact match, longest prefix match (LPM), wildcard
matching (ACL [access control list]), and cuckoo hash algorithms. They focus on
flow lookup operations for common packet processing.
Accelerator APIs support packet security, data compression, and an event model
for core-to-core communications. FPGA-accelerated functions or SoC units can be
hidden under these abstract software layers.
QoS libraries provide network QoS components such as Meter and Sched.
In addition to these components, DPDK also provides platform features such as
POWER, which allows the CPU clock frequency to change at runtime for energy
saving, and KNI (kernel network interface), which builds a fast channel to the
Linux kernel stack. The Packet Framework and DISTRIB provide the underlying
components for building a more complex multicore pipeline processing model. This
is an incomplete picture, as the DPDK project evolves quickly with a new release
every quarter. Most DPDK components are released under the BSD (Berkeley
Software Distribution) license, making them friendly for further code
modification and commercial adoption.
16 Data Plane Development KIT
1.4 PERFORMANCE LIMIT
The performance limit can be estimated by theoretical analysis, and the
theoretical limit can come from multiple dimensions. Take packet processing as an
example: the first limit is the packet forwarding rate, which is determined by
the physical interface speed, also known as the line speed of a given interface.
When a packet enters memory from the NIC, it goes through the I/O bus (e.g., the
PCIe bus), which has its own transaction-level limit. There is likewise a ceiling
on how fast the CPU can load/store packet data to cache lines; for example, an
Intel® Haswell processor can only load 64 bytes and store 32 bytes per cycle. The
memory controller is limited by its read/write bandwidth. All these hardware
platform boundaries in different dimensions contribute to a workload's
performance limit. Through optimization, the software developer can examine
these dimensions to write high-performing software; the goal is to get closer to
the performance limit.
In theory, the I/O interface, PCIe bus, memory bandwidth, and cache utilization
set quantitative limits; this is easy to say but not easy to do, as it requires
an in-depth system-level understanding. Good developers know how to measure
performance tuning progress and find potential room for improvement. It takes a
lot of effort to tune a workload and get closer to the theoretical limits. If the
software is already extremely optimized, there will be no good return on
continuing to push the boundary. DPDK's design goal is to provide software
libraries that push the performance limit of packet processing. Is it designed in
a way that has already reached the system limit? As we try to gain a better
understanding of DPDK, let's get back to the simple facts. What are the common
performance metrics for measuring packet processing?
1.4.1 THE PERFORMANCE METRIC
Performance indicators for packet processing such as throughput, latency, packet
loss, and jitter are the most common metrics. For packet forwarding, throughput
can be measured as a packet rate—pps (packets per second)—or as a bit rate—bps
(bits per second). Bit rate is often associated with the physical interface, such
as the NIC port speed. Different packet sizes imply different requirements for
packet storing and forwarding capabilities. Let's establish the basic concepts of
effective bandwidth and packet forwarding rate.
The line rate (transmitting at wire speed) is the maximum packet (or frame)
forwarding rate, limited in theory by the speed of the physical interface. Take
Ethernet as an example. Interfaces are defined as 1 Gbps, 10 Gbps, 25 Gbps, 40
Gbps, and 100 Gbps; each represents the maximum transmission rate, measured in
bps. Indeed, not every bit is used to transmit effective data. There is an
inter-packet gap (IPG) between Ethernet frames; the default IPG is 12 bytes. Each
frame also has a 7-byte preamble and a 1-byte Ethernet start frame delimiter
(SFD). The Ethernet frame format is shown in Figure 1.7. The effective data in an
Ethernet frame mainly includes the destination MAC (media access control)
address, the source MAC address, the EtherType, and the payload. The frame tail
is the FCS (frame check sequence), which validates the integrity of the Ethernet
frame (Figure 1.7).
The Ethernet frame forwarding rate and the bit rate can be translated by the equa-
tion shown in Figure 1.8.
As shown in Table 1.1, if data is transmitted at full interface speed, small
packets imply a high packet arrival rate. In a nutshell, small packets impose a
larger processing burden, as more packets arrive in a given time; the per-second
overhead can be more than 10× higher for small packets (64 bytes) than for large
packets (1024 bytes).
1.4.2 THE PROCESSING BUDGET
What is the biggest challenge of processing packets on a general-purpose
processor such as a CPU? Take 40 Gbps Ethernet as an example; the curve shown in
Figure 1.9 indicates the maximum forwarding rate for different packet sizes.
Internet traffic is a mix of packets of many different sizes.
For packet sizes of 64 bytes and 1024 bytes, the CPU instruction cycles available
(the processing budget) to meet the 40 Gbps forwarding requirement are very
different: 33 cycles vs. 417 cycles. The system load is thus very different when
dealing with small versus large packets; the smaller the packet, the shorter the
arrival interval, i.e., 16.8 ns vs. 208.8 ns. Assuming a CPU frequency of 2 GHz,
64-byte and 1024-byte packets can consume, respectively, 33 and 417 clock cycles
while still keeping up with line rate. In the store-and-forward model, packet
receiving, transmitting, and forwarding-table lookup all need to access memory.
The CPU can only wait while a memory access completes; these wait cycles are
known as memory latency. The actual
FIGURE 1.7 Ethernet frame format.
Frame Forwarding Rate (pps) = (Bit Rate / 8) / (IPG + Preamble + SFD + Packet Size)
where Bit Rate / 8 converts bits to bytes; forwarding rates are commonly quoted
in Mpps (million pps).
FIGURE 1.8 Packet forwarding rate: From bps to pps.
TABLE 1.1
Packet Forwarding Rate with Different Packet Sizes

Packet Size  |     10 Gbps        |     25 Gbps        |     40 Gbps
(Bytes)      | Mpps   Arrival(ns) | Mpps   Arrival(ns) | Mpps   Arrival(ns)
64           | 14.88    67.20     | 37.20    26.88     | 59.52    16.80
128          |  8.45   118.40     | 21.11    47.36     | 33.78    29.60
256          |  4.53   220.80     | 11.32    88.32     | 18.12    55.20
512          |  2.35   425.60     |  5.87   170.24     |  9.40   106.40
1024         |  1.20   835.20     |  2.99   334.08     |  4.79   208.80
FIGURE 1.9 Cost of packet instructions at 40 Gbps line rate.
latency depends on where the data is located. If the data resides in the last
level cache (LLC), the access takes about 40 clock cycles. If there is an LLC
miss, an external memory read is needed, with an estimated latency of 70 ns,
which translates to 140 cycles on a 2-GHz system. For a small packet size (64
bytes) at 40 Gbps, the total budget is only about 33 cycles. This poses a
challenge to general-purpose CPU systems for high-performance network workloads.
Does this rule out the CPU for such workloads? The answer is no.
DPDK tackles this challenge with a set of optimization methods that make the
best use of hardware platform features (DDIO, HugePage memory, cache alignment,
thread binding, NUMA (non-uniform memory access) awareness, memory interleaving,
lock-free data structures, data prefetching, use of SIMD instructions, etc.). By
combining all these optimization methods, the memory latency challenge can be
minimized. A single core can forward L3 packets at about 40 Mpps, as measured in
a simplified test case that just moves packets between Ethernet and CPU. Given
the increasing number of available CPU cores, I/O performance can scale up with
more Ethernet adapters inserted into more PCIe slots. L3 forwarding throughput
has been measured on a 4-socket system with Intel® Xeon Scalable processors at
about 1 Tbps, a significant milestone announced by the open-source project
FD.io/VPP in 2017 [3].
DPDK is not yet able to handle extremely high I/O use cases. For example, the
optical transport interface requires 400 Gbps support; this is too much for a
software-only approach on a server platform, where suitable PCIe buses and NICs
are not yet available. DPDK lowers the barrier to network system development, so
that software developers can now implement network functions.
1.5 DPDK USE CASE
DPDK is a new open-source technology, rising together with the server platform,
which is going through rapid development with more CPU cores and higher-speed
Ethernet interfaces. The server platform's cost is very appealing when compared
to mainstream network/storage systems (which are expensive due to their
purpose-built, silicon-based system designs). From the business side, this
open-source project reduces the software investment. Overall, the trend is to
build software-defined infrastructure on server platforms, and DPDK has been
proven to deliver accelerated network, computing, and storage functions.
1.5.1 ACCELERATED NETWORK
DPDK, as open source, can be used immediately without license cost. What's more
exciting, it delivers a performance boost with each new generation of server
platform. A processor refresh means improved IPC (instructions per cycle) at the
per-core level, possibly enhanced with a more effective system bus connecting
more cores. DPDK on a single core can deliver 10/25 Gbps easily. Together with
multi-queue NICs, DPDK drives 40/100 Gbps interfaces using two or more cores.
Xeon processors have many cores, and the packet throughput bottleneck is often
the PCIe (I/O) interface and lanes. Indeed, I/O, not the CPU, is the limit.
Prior to DPDK (or similar technologies), network infrastructure systems were
often highly complicated designs; hardware/software co-design was common,
requiring engineers with both hardware and software skills. Network systems were
therefore mainly built by large engineering organizations at high cost. Such a
system involves a chassis in which different purpose-built boards are connected
as subsystems to meet the needs of signal processing, data plane, control, and
application services. Each board is built with specific embedded software for its
function, requiring dedicated engineering investment.
Decoupling hardware and software development, and building software on top of a
common server system, is a big and disruptive change: the server platform and
high-speed NICs are easy to obtain; it is flexible to add more NICs to meet
higher I/O demand; and the cost is very transparent on the server platform. It is
easy to find software talent who can write, load, and debug programs on a
standard server platform, whereas it is difficult to find designers who
understand NPUs or network ASICs/FPGAs well enough to build such highly complex
systems. In the cloud computing industry, software developers adopted open source
on the server platform, building load balancers and anti-DDoS systems with DPDK
and running the software on standard servers for in-house production deployment.
This was an early example of replacing commercial load balancer systems. Later,
the network gateway could be moved into a virtual machine and offered as an
elastic network service.
A similar idea, known as network function virtualization (NFV), emerged in 2012.
Server virtualization led to cloud computing; it allows multiple tenants to run
workloads in logically isolated environments consolidated on a physically shared
server. Replicating this in the telecom network infrastructure, NFV is intended
to run network functions as tenant workloads and consolidate multiple workloads
on a physically shared server. This allows telecom service providers more
flexibility in choosing vendors and solution suppliers, e.g., decoupling the
hardware and software suppliers. This idea is an important step in building out
5G infrastructure and network services. Software-defined networking (SDN)/NFV is
a big wave transforming network infrastructure; in building such new systems,
DPDK is a valuable part of the software architecture.
1.5.2 ACCELERATED COMPUTING
For network nodes, the value of DPDK is easy to understand. In fact, DPDK is
also very helpful for the cloud computing node, for example in Open vSwitch
acceleration. Open vSwitch is responsible for creating the overlay network for
multiple tenants' isolation needs; any traffic to a tenant workload goes through
Open vSwitch. DPDK is used to accelerate Open vSwitch (OVS), performing better
than the OVS kernel data path and saving CPU cycles on the cloud computing
server.
The Linux kernel protocol stack provides rich and powerful network services.
However, it is also slow, so a user-space stack on top of DPDK is highly
desirable for a further performance boost. Known attempts include porting the BSD
protocol stack into user space on top of DPDK. Tencent, one of the biggest cloud
computing companies, released such a project, F-Stack, in open source; it is
available at http://www.f-stack.org/. Furthermore, web application software
such as Nginx and in-memory database software such as Redis can be easily
integrated with F-Stack. Nginx and Redis are very popular computing workloads
running in cloud data centers.
1.5.3 ACCELERATED STORAGE
In 2016, Intel® open-sourced the Storage Performance Development Kit
(www.SPDK.io). A storage device is similar to a NIC: it is an I/O device, but for
data storage. A user-space PMD is more effective than a Linux kernel driver in
high-performance scenarios; the PMD can drive an NVMe device, allowing the
application faster access to the SSD. SPDK provides other components such as
NVMe-oF, iSCSI, and vhost server support. We will describe storage and network
acceleration in later chapters.
1.6 OPTIMIZATION PRINCIPLES
While DPDK has adopted lots of optimization methods, many of its approaches are
applicable to any software. The core principles, believed to be reusable in other
areas, are as follows:
1. Target Software Optimization for a Specific Workload
Specialized hardware is one way to achieve high performance, but DPDK uses
the general-purpose processor and reaches the desired high-performance
goal with software optimization techniques. In the early phase, the
research efforts covered all possible platform components such as CPU,
chipset, PCIe, and NIC. The optimization is always driven by the network
workload's characteristics.
2. Pursuing Scalable Performance
The multicore era is a significant silicon advance that enables high
parallelism for scalable performance. To avoid data contention, the
system design needs to avoid data races as much as possible; that is,
design data structures as local variables and use lockless designs to
gain high throughput. Focus on an architectural approach that exploits
multicore for performance scaling.
3. Seeking Cache-Centric Design and Optimization
Compared with system and algorithm optimization, code implementation
optimization is much less known and often ignored. It requires developers
to have a good understanding of computer and processor architecture. DPDK
depends on optimization techniques concerning cache usage and memory
access latency. As a programmer, if the code is written with cache
utilization in mind, the software optimization is probably half-finished;
most new software developers do not think about how the cache will behave
at all.
4. Theoretical Analysis with Practice
What is the performance limit? Is there room for performance tuning? Is
it worthy of in-depth study? These questions are sometimes easy to ask
but difficult to answer. Through analysis, inference, prototyping, and
testing over and over, optimization is often an experimental journey of
incremental progress. If time permits, it is always good to have a
performance model and analysis, as it helps set achievable design goals.
Cloud computing is essentially about taking on more workloads in a physical
system; its success is due to CPUs having many cores. Edge computing platforms
are the new trend, i.e., moving the computing closer to the data to provide a
low-latency computing experience and drive new use cases. DPDK can move data
faster and store data quicker.
DPDK, as user-space networking, is known for bypassing the heavy Linux kernel
networking stack. Many of the DPDK optimization principles are also applicable to
the Linux kernel network; there has been excellent progress in the kernel stack,
such as XDP and AF_XDP. As the Linux kernel comes up with its own bypass
mechanisms, more options are available for network developers to choose from.
1.7 DPDK SAMPLES
Having discussed the DPDK concepts, three examples are given here to get started
with the code for a quick look and feel.
1. HelloWorld is a simple example. It sets up a basic running environment for
packet processing. DPDK establishes an environment abstraction layer (EAL)
on a Linux(-like) operating system and optimizes this environment for
packet processing.
2. Skeleton is the most streamlined single-core packet send and receive
example. It may be one of the fastest packet in/out testing codes in the world.
3. L3fwd, Layer 3 forwarding, is one of the main DPDK applications showcasing
the use case, and it is heavily used for performance benchmark tests.
1.7.1 HELLOWORLD
HelloWorld is a simple sample in both code and function. It creates a basic
running environment for multicore (multi-thread) packet processing. Each thread
prints a message "hello from core #", where the core number is managed by the
operating system. Unless otherwise indicated, a DPDK thread in this book is
associated with a hardware thread. The hardware thread can be a logical core
(lcore) or a physical core; one physical core becomes two logical cores if
hyper-threading is turned on. Hyper-threading is an Intel® processor feature that
can be turned on or off via the BIOS/UEFI.
In the code example, rte refers to the runtime environment and eal means
environment abstraction layer (EAL); most DPDK APIs are prefixed with rte_. Like
most parallel systems, DPDK adopts a model of one master thread and multiple
slave threads, with the slaves typically running in an endless loop.
int
main(int argc, char **argv)
{
    int ret;
    unsigned lcore_id;

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_panic("Cannot init EAL\n");

    /* call lcore_hello() on every slave lcore */
    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
    }

    /* call it on master lcore too */
    lcore_hello(NULL);

    rte_eal_mp_wait_lcore();
    return 0;
}
1.7.1.1 Initialize the Runtime Environment
The main entry of the master thread is the main() function, which invokes the
entry function below to start the initialization:

int rte_eal_init(int argc, char **argv);

The entry function supports command-line input, which is a long string of
command combinations. One of the most common parameters is "-c <core mask>";
the core mask assigns which CPU threads (cores) run the DPDK master and slave
threads, with each bit of the mask representing a specific core. "cat
/proc/cpuinfo" can be used to inspect the CPU cores on a given platform. Core
selection needs care on a dual-socket platform, as local cores and remote cores
can yield different workload performance.
As mentioned, rte_eal_init includes a list of complicated tasks, e.g., parsing
the input command parameters, analyzing and configuring DPDK, and setting up the
runtime environment. These tasks can be categorized as follows:
• Configuration initialization;
• Memory initialization;
• Memory pool initialization;
• Queue initialization;
• Alarm initialization;
• Interrupt initialization;
• PCI initialization;
• Timer initialization;
• Memory detection and NUMA awareness;
• Plug-in initialization;
• Master thread initialization;
• Polling device initialization;
• Establishing master-slave thread channels;
• Setting the slave thread to the wait mode;
• Probing and initializing PCIe devices.
For further details, it is recommended to read the online documentation or even
the source code; one place to start is lib/librte_eal/common/eal_common_options.c.
For DPDK users, the initialization is wrapped behind the EAL interface; a deep
dive is only needed for in-depth DPDK customization.
1.7.1.2 Multicore Initialization
DPDK always tries to run with multiple cores for high parallelism, and the
software program is designed to occupy each logical core (lcore) exclusively.
The main function is responsible for creating the multicore operating
environment. As its name suggests, RTE_LCORE_FOREACH_SLAVE(lcore_id) iterates
over all usable lcores designated by EAL and then launches a designated thread
on each lcore through rte_eal_remote_launch.
int rte_eal_remote_launch(int (*f)(void *), void *arg,
                          unsigned slave_id);

"f" is the entry function that the slave thread will execute.
"arg" is the input parameter passed to the slave thread.
"slave_id" is the logical core designated to run as the slave thread.
For example, rte_eal_remote_launch(lcore_hello, NULL, lcore_id) designates the
core lcore_id to run as a slave thread, executing from the entry function
lcore_hello. In this simple example, lcore_hello just reads its own logical core
number (lcore_id) and prints out "hello from core #".
static int
lcore_hello(__attribute__((unused)) void *arg)
{
    unsigned lcore_id;

    lcore_id = rte_lcore_id();
    printf("hello from core %u\n", lcore_id);
    return 0;
}
In this simple example, the slave thread finishes its assigned work and quits
immediately, releasing the core. In most other DPDK-based samples, the slave
thread runs an infinite loop, taking care of the packet processing.
1.7.2 SKELETON
This sample uses only a single core. It is probably the only DPDK sample that
runs on a single core, and it implements the simplest and fastest packet path in
and out of the platform: received packets are transmitted out directly, without
any meaningful packet processing. The code is short, simple, and clean, and it
serves as a test case to measure single-core packet in/out performance on a given
platform.
The code calls rte_eal_init to initialize the runtime environment, checks the
number of Ethernet interfaces, and creates the memory pool via
rte_pktmbuf_pool_create. One input parameter is rte_socket_id(), which specifies
which socket's memory should be used; with no doubt, local memory on the local
socket is always preferred (we will explain this basic concept later). The
sample then calls port_init(portid, mbuf_pool) to initialize the Ethernet port
with the memory configuration, and finally it calls lcore_main() to start the
packet processing.
int main(int argc, char *argv[])
{
    struct rte_mempool *mbuf_pool;
    unsigned nb_ports;
    uint8_t portid;

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);

    /* Check there is an even number of ports to send/receive on. */
    nb_ports = rte_eth_dev_count();
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /* Creates a new mempool in memory to hold the mbufs. */
    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL",
        NUM_MBUFS * nb_ports, MBUF_CACHE_SIZE, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    /* Initialize all ports. */
    for (portid = 0; portid < nb_ports; portid++)
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid);

    /* Call lcore_main on the master core only. */
    lcore_main();

    return 0;
}
1.7.2.1 Ethernet Port Initialization
port_init(uint8_t port, struct rte_mempool *mbuf_pool)
This function is responsible for the Ethernet port configuration, such as queue
configuration; in general, an Ethernet port is configurable with multi-queue
support. Each receive or transmit queue is assigned memory buffers for packets
in and out. The Ethernet device will place received packets into the assigned
memory buffers (via DMA); the buffers are part of the memory pool, which is
assigned at the initialization phase in a socket-aware way.
It is important to configure the number of queues for the designated Ethernet
port. Usually, each port contains many queues; for simplicity, this example
specifies only a single queue. For packet receiving and transmitting, the port,
queue, and memory buffer configurations are separate concepts; if no specific
configuration is given, the default configuration is applied.
Ethernet device configuration: Set the number of receive and transmit queues on
a specified port, and configure the port with the input options.

int rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q,
                          uint16_t nb_tx_q,
                          const struct rte_eth_conf *dev_conf);

Ethernet port/queue setup: Configure a specific queue of a specified port with
the memory buffer, the number of descriptors, etc.

int rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
                           uint16_t nb_rx_desc, unsigned int socket_id,
                           const struct rte_eth_rxconf *rx_conf,
                           struct rte_mempool *mp);

int rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
                           uint16_t nb_tx_desc, unsigned int socket_id,
                           const struct rte_eth_txconf *tx_conf);
After the Ethernet port initialization is completed, the device can be started with
int rte_eth_dev_start(uint8_t port_id).
Upon finishing, the Ethernet port has its physical MAC address, and the port is
turned on in promiscuous mode. In this mode, incoming Ethernet packets are
received into memory, allowing the core to do further processing.
static inline int
port_init(uint8_t port, struct rte_mempool *mbuf_pool)
{
    struct rte_eth_conf port_conf = port_conf_default;
    const uint16_t rx_rings = 1, tx_rings = 1;
    uint16_t q;
    int retval;

    /* Configure the Ethernet device. */
    retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);

    /* Allocate and set up 1 RX queue per Ethernet port. */
    for (q = 0; q < rx_rings; q++) {
        retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
                rte_eth_dev_socket_id(port), NULL, mbuf_pool);
    }

    /* Allocate and set up 1 TX queue per Ethernet port. */
    for (q = 0; q < tx_rings; q++) {
        retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
                rte_eth_dev_socket_id(port), NULL);
    }

    /* Start the Ethernet port. */
    retval = rte_eth_dev_start(port);

    /* Display the port MAC address. */
    struct ether_addr addr;
    rte_eth_macaddr_get(port, &addr);

    /* Enable RX in promiscuous mode for the Ethernet device. */
    rte_eth_promiscuous_enable(port);

    return 0;
}
Packet reception and transmission are done in an endless loop implemented in the
function lcore_main. It is designed with performance in mind and validates that
the assigned CPU cores (lcores) and Ethernet devices are physically on the same
socket. It is highly recommended to use a local CPU with a local NIC on the
local socket; a remote socket is known to bring a negative performance impact
(more details are discussed in a later section). The packet processing is done
with the packet burst functions. On both the receive (rx) and transmit (tx)
sides, four parameters are given: the port, the queue, the packet buffer array,
and the number of burst packets.
Packet RX/TX burst functions:

static inline uint16_t rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
        struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);

static inline uint16_t rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
        struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
Now we have a basic look and feel of the DPDK packet receive and transmit code.
The software has no dependency on a vendor-specific NIC. From the very beginning,
DPDK took software design into account, and the device abstraction layer is well
designed to run across platforms and NICs from multiple vendors.
static __attribute__((noreturn)) void lcore_main(void)
{
    const uint8_t nb_ports = rte_eth_dev_count();
    uint8_t port;

    for (port = 0; port < nb_ports; port++)
        if (rte_eth_dev_socket_id(port) > 0 &&
                rte_eth_dev_socket_id(port) != (int)rte_socket_id())
            printf("WARNING, port %u is on remote NUMA node to "
                   "polling thread.\n\tPerformance will "
                   "not be optimal.\n", port);

    /* Run until the application is quit or killed. */
    for (;;) {
        /*
         * Receive packets on a port and forward them on the paired
         * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
         */
        for (port = 0; port < nb_ports; port++) {
            /* Get burst of RX packets, from first port of pair. */
            struct rte_mbuf *bufs[BURST_SIZE];
            const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
                    bufs, BURST_SIZE);

            if (unlikely(nb_rx == 0))
                continue;

            /* Send burst of TX packets, to second port of pair. */
            const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
                    bufs, nb_rx);

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_rx)) {
                uint16_t buf;
                for (buf = nb_tx; buf < nb_rx; buf++)
                    rte_pktmbuf_free(bufs[buf]);
            }
        }
    }
}
1.7.3 L3FWD
This is a famous and popular DPDK example, frequently used to measure DPDK
performance. In the typical test scenario, a server is installed with high-speed
Ethernet adapters in its PCIe slots; the server's Ethernet ports are connected
to an external hardware packet generator (usually from IXIA or Spirent), and the
l3fwd sample can easily demonstrate a 200 Gbps forwarding rate on a dual-socket
server platform. In this sample, a packet is received from Ethernet; the CPU
checks the IP header for validity and completes the routing table lookup using
the destination IP address. Once the destination port is found, the packet is
sent out with IP header modifications such as the TTL update. Two routing table
lookup mechanisms are implemented: exact match based on the destination IP
address and LPM-based lookup. The l3fwd sample contains more than 2,700 lines of
code (including blank lines and comment lines); the main body is actually a
combination of the HelloWorld and Skeleton samples.
To enable this instance, the command parameters are given in the following format:
./build/l3fwd [EAL options] -- -p PORTMASK [-P]
        --config="(port,queue,lcore)[,(port,queue,lcore)]"
The command line is divided by “--” into two parts.
• The section after “--” contains the command options of the l3fwd sample itself.
• The section before “--” is used for DPDK’s EAL options, mainly for runtime
environment initialization and configuration.
[EAL options] is the configuration to set up the runtime environment, and it is
passed over to rte_eal_init for processing.
• PORTMASK identifies the Ethernet ports for DPDK use. By default, an
Ethernet device is managed by the Linux kernel driver; for example, the
device name may be “eth1”. In today’s DPDK version, the user can bind a
specific device to DPDK, which uses the igb_uio kernel module to allow
device configuration from user space, where DPDK can take control of the
device. A script, known as dpdk-devbind.py, is available to help with the
device bind operation. The example below binds “eth1” for DPDK use.
dpdk-devbind.py --bind=igb_uio eth1
Note: In early DPDK versions, DPDK initialization scanned all known PCIe
devices for use, which could lead to an in-use network port being disconnected.
The l3fwd sample configures a scalable, performant approach on the basis of the
(port, queue, lcore) configuration, which connects an assigned core with an
Ethernet port and queue. In order to achieve a high packet forwarding rate,
multiple CPU cores can work together, each core driving a specific port and
queue for packet I/O (Table 1.2).
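The (port, queue, lcore) triples can be pictured as a small per-lcore table that each polling thread later walks. The sketch below is illustrative only; names such as lcore_conf and assign_queue mirror the idea behind l3fwd's internal tables but are not its actual structures:

```c
#include <stdint.h>

#define MAX_LCORES 8
#define MAX_RX_QUEUES_PER_LCORE 4

/* One (port, queue) pair polled by an lcore. */
struct lcore_rx_queue {
	uint16_t port_id;
	uint16_t queue_id;
};

/* Per-lcore list of queues to poll. */
struct lcore_conf {
	uint16_t n_rx_queue;
	struct lcore_rx_queue rx_queue_list[MAX_RX_QUEUES_PER_LCORE];
};

static struct lcore_conf lcore_conf[MAX_LCORES];

/* Record one --config triple: `lcore` polls (port, queue). Returns -1
 * if the lcore id is out of range or its table is already full. */
static int assign_queue(unsigned int lcore, uint16_t port, uint16_t queue)
{
	if (lcore >= MAX_LCORES)
		return -1;
	struct lcore_conf *c = &lcore_conf[lcore];
	if (c->n_rx_queue >= MAX_RX_QUEUES_PER_LCORE)
		return -1;
	c->rx_queue_list[c->n_rx_queue].port_id = port;
	c->rx_queue_list[c->n_rx_queue].queue_id = queue;
	c->n_rx_queue++;
	return 0;
}
```

Filling this table with the four triples of Table 1.2 leaves each of the four lcores with exactly one (port, queue) pair to poll, so no queue is ever touched by two threads.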
The master thread is similar to that of HelloWorld or Skeleton, and no further
explanation is required here:
Initialize the running environment: rte_eal_init(argc, argv);
Parse the input parameters: parse_args(argc, argv);
Initialize the lcore and port configuration;
Initialize the Ethernet ports and queues, similar to the Skeleton sample;
Start the Ethernet ports;
Invoke the slave threads to execute main_loop().
The slave thread will do the actual packet I/O, and the entry function is known
as main_loop(). It will run as follows:
Reads lcore information to complete configuration;
Reads information about send and receive queues;
Packet loop processing:
{
Sends packets in bulk to the transmit queue;
Receives packets in bulk from the receive queue;
Forwards packets in bulk;
}
Sending packets in bulk (or in bursts) to the designated queue and receiving packets
in bulk from the designated queue are common in DPDK. It is an effective way for
TABLE 1.2
L3fwd Common Options: Port, Queue, Core

Port  Queue  Thread  Characterization
0     0      0       Queue 0 of port 0, processed by thread 0
0     1      2       Queue 1 of port 0, processed by thread 2
1     0      1       Queue 0 of port 1, processed by thread 1
1     1      3       Queue 1 of port 1, processed by thread 3
optimal platform resource use. Batched packet forwarding is done based on either
exact match (hash) or LPM, selected as a compilation option. The example
includes a code implementation based on SSE, known as the “multi-buffer”
principle, which is a known practice to get more performance on Intel® processors.
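To illustrate the LPM side, here is a deliberately simplified longest-prefix-match lookup over a linear route table. This is a conceptual stand-in only: DPDK's actual rte_lpm library uses a multi-level (DIR-24-8) table for near-constant-time lookups rather than a scan, and the names below are invented for the example:

```c
#include <stdint.h>
#include <stddef.h>

/* A route entry: an IPv4 prefix (host byte order), its depth in bits,
 * and the output port for matching packets. */
struct route {
	uint32_t prefix;
	uint8_t  depth;		/* prefix length, 0..32 */
	uint16_t port;
};

/* Longest prefix match by linear scan: among all entries whose prefix
 * covers `ip`, pick the one with the greatest depth. Returns 0 and
 * sets *port on a hit, -1 on a miss. */
static int lpm_lookup(const struct route *tbl, size_t n,
		      uint32_t ip, uint16_t *port)
{
	int best = -1;
	int best_depth = -1;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = tbl[i].depth ?
				~0u << (32 - tbl[i].depth) : 0;
		if ((ip & mask) == (tbl[i].prefix & mask) &&
		    (int)tbl[i].depth > best_depth) {
			best = (int)i;
			best_depth = tbl[i].depth;
		}
	}
	if (best < 0)
		return -1;
	*port = tbl[best].port;
	return 0;
}
```

For example, with routes 10.0.0.0/8 -> port 0 and 10.1.0.0/16 -> port 1, a packet to 10.1.2.3 matches both prefixes, and the deeper /16 entry wins, so the packet is forwarded to port 1.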
So far, a lot of code has been shown, and the intent is to give a quick feel
for what DPDK is. The later chapters of this book will not attempt a deep code
dive. It is difficult to keep up to date with the latest code development, as
the DPDK community is still very active, so the code shown here may be obsolete
by the time this book is published.
1.8 CONCLUSION
What is DPDK? It is a set of software libraries, implemented on the basis of
software optimization principles and practices, and hosted as an open-source
project under the Linux Foundation. It is known for moving packets into the
server platform and into virtual machines (or container tenants). DPDK has been
included in many Linux distributions, such as Red Hat, CentOS, and Ubuntu. DPDK
is established as the leading user space networking open-source project, and it
is widely adopted worldwide for networking and cloud infrastructure, network
appliances, virtual switches, and storage acceleration systems.
FURTHER READING
1. https://wiki.linuxfoundation.org/networking/napi.
2. http://info.iet.unipi.it/~luigi/netmap/.
3. https://fd.io/2017/07/fdio-doubles-packet-throughput-performance-terabit-levels/.
2 Cache and Memory
Chen Jing
Alibaba
Heqing Zhu
Intel®
DPDK was initially optimized for Intel®-based network platforms. This chapter
focuses on cache and memory, using an Intel® processor-based server platform as
the hardware example. This chapter does not cover non-Intel® platforms;
however, the concepts should be very similar and applicable to other
architectures and platforms.
2.1 DATA ACCESS AND LATENCY
Generally, a computer system consists of cache, memory, and storage hardware
modules for data processing, movement, and storage. Cache and DRAM hold the
runtime data that the CPU needs to access, and that data goes away if the
system is powered off. Hard drives, SSDs, optical disks, and USB flash drives
are persistent data storage devices; once data is written, it remains on these
devices even after the system is powered off. Cache and memory controllers have
become an integrated part of modern processors. Typically, processors always
access data from storage device
CONTENTS
2.1 Data Access and Latency 31
2.2 Intel® Xeon Architecture 33
2.3 Intel® Atom® SoC 36
2.4 Cache 37
  2.4.1 Cache Line 38
  2.4.2 Cache Prefetching 38
  2.4.3 Software Prefetching 40
  2.4.4 False Sharing 43
  2.4.5 Cache Coherency 43
  2.4.6 Noisy Tenant and RDT 45
2.5 TLB and HugePage 45
  2.5.1 Reserve HugePage at Boot Time 46
  2.5.2 Reserve HugePage at Runtime 46
2.6 Memory Latency 47
2.7 DDIO 48
2.8 NUMA 48
Further Reading 49
heard so often?”
“Yes, I am Courage, but you will excuse me, won’t you, for
speaking as I did? I only had happened to hear Miss Julia——”
Courage hesitated.
“Oh, yes, dear child, I understand perfectly. You used to hear Miss
Julia speak of me as old Colonel Anderson, and so I am, and I am
not ashamed of it either, although I could not resist the temptation
to tease you a little, which was very rude of me. But now, can it be
that it is to Miss Julia’s estate near Arlington that you are going—to
the home that her Uncle Everett left her when she was just a little
slip of a girl, years before the war?”
“Yes, that is exactly where, but I have never seen it.”
“Well, you will love it when you do. It is the dearest little spot in
the world. I will drive out some day and take luncheon with you and
the children, if I should happen to have an invitation. I could tell you
some interesting things about the old place.”
“Oh, will you come?” exclaimed Mary and Gertrude in one breath,
for with a curiosity as pardonable, I think, as that of old Mr.
Anderson, all of the children had grouped themselves about
Courage, and had listened with keenest interest to every word
spoken. And so one more happy anticipation was added to the many
with which their happy hearts were overflowing.
At last the train steamed into Washington, although at times it had
seemed to the children as though it never would, and then a
carriage was soon secured, and, three on a seat, the little party
crowded into it, and they were off for their eight mile drive to
Arlington.
Data Plane Development Kit A Guide To The User Spacebased Fast Network Applications Heqing Zhu
CHAPTER V.—HOWDY
And meantime what excitement in the little cottage down in
Virginia! Everything was in readiness and everybody was on
the tiptoe of expectation. Everybody meant Mary Duff, (it was
she, you know, who had cared for little Courage through all her
babyhood, and who had been sent down to get everything in order),
and besides Mary Duff, Mary Ann the cook, old Joe and Brevet.
It must be confessed, Brevet had had a little difficulty in winning
his grandmother’s consent to this visit, but he had been able to meet
every objection with such convincing arguments, that he had come
off victor in the encounter.
“You see, Grandnana,” he had confidentially explained, with his
pretty little half-southern, half-darkey accent, “I is a perfec’ stranger
to them now I know, but then everything is strange to them down
here, so don’t you s’pose it would be nice for me to be right there
waiting at the gate, where I can call out ‘How’dy’ just so soon as
ever they come in sight, and so for me not to be a stranger to them
more’n the first minute, and have them find there are folks here who
are very glad to know them right from the start? Besides, the lady—
Mary Duff was her name—told me she just knew those little
Bennetts would love to see me, and that she would surely expect me
down to-day for certain.”
And so “Grandnana” succumbed, not having the heart to nip such
noble hospitality in the bud, and at two o’clock precisely, the best
carriage wheeled up to the door and Mammy and Brevet were
quickly stowed away within it, to say nothing of a basketful of good
things covered with a huge napkin of fine old damask. But who is
Mammy? you ask, and indeed you should have been told pages ago,
for no one for many years had been half so important as Mammy in
the Ellis household. She is an old negro woman, almost as old as Joe
himself, and when on the first of January, 1863, President Lincoln
issued the proclamation that made all the slaves free, she was
among the first to turn her back upon the plantation where she was
raised, and make her way to Washington. It was there that Mrs. Ellis
had found her, when in search of a nurse for her two little boys, and
from that day to this she has been the faithful worshipper of the
whole Ellis family. Now in her old age her one and only duty has
been to care for Brevet, a care constantly lessening as that little
fellow daily proves his ability to look out more and more for himself.
Brevet was not to be allowed, however, on the occasion of this
first visit to their new neighbours, to make the trip alone.
“Grandnana” had been very firm about that, somewhat to his
chagrin, and so, if the truth be told, Mammy’s presence in the
comfortable, old-fashioned carriage was at first simply tolerated. But
that state of affairs did not last long. Try as he would, Brevet was
too happy at heart to cherish any grievance, imaginary or otherwise,
for many minutes together; and soon he and Mammy were chatting
away in the merriest fashion, and the old nurse was looking forward
to the unusual excitement of the day, with quite as much
expectation as her little charge of seven. Had she not devoted the
leisure of two long mornings of preparation to the shelling of
almonds and the stoning of raisins, and then when the day came,
with eager trembling hands, packed all the good things away in the
great, roomy hamper that seemed now to look at her so
complacently from the opposite seat of the phaeton? Yes, indeed, it
was every whit as glad a day for Mammy as Brevet, and she peered
out from the carriage just as anxiously as they drove up to the gate
and Mary Duff came out to greet them. But Mammy had something
to say before making any motion to leave the carriage.
“Are you quite sure, Miss, dat dis yere little pickaninny of ours ain’t
gwine to be in any one’s way or nuffin?” she asked, bowing a how-
do-you-do to Mary, and keeping a restraining hand upon Brevet.
“Oh, perfectly sure.”
“He done told us you wanted him very much,” but in a half-
questioning tone, as though what Brevet “done told them” was
sometimes “suspicioned” of being slightly coloured by what he
himself would like to do, notwithstanding his general high standard
of truthfulness.
“Brevet is perfectly right—we do want him very much,” Mary
answered, heartily.
“Even if you have to take his old Mammy ‘long wid him, kase Miss
Lindy wasn’t quite willin to ‘low him ter come by hisself?”
“And we’re very glad to see you, Mammy,” Mary answered
cordially, and so the last of Mammy’s scruples, which were not as
real as Mammy herself tried to think them, were put to rest, and
Brevet was permitted to scramble out of the carriage, while Mary
Duff lent a hand to Mammy’s more difficult alighting.
“Is dere ere a man ‘bout could lift dis yere basket ter de house for
us?” she asked, looking helplessly up to the hamper, “kase Daniel
dere has instructions from de Missus neber to leave de hosses less’n
dere ain’t no way to help it.”
“Well, I guess dere is,” chuckled a familiar voice behind her back,
and Mammy turned to discover Joe close beside her.
“Well, I klar, you heah!” she exclaimed. “Why, it seems like de
whole county turn out to welcome dese yere little Bennetts. Seems,
too, like some of us goin’ to be in de way sure ‘nuff.”
“Howsomever, some on us don’ take up so much room as oders,”
grunted Joe, surmising, and quite correctly, too, that Mammy
considered his presence on the scene something wholly unnecessary
and undesirable. “I’se heah to help wid de trunks, Mammy,” he then
added; “what you heah to help wid?”
Mammy, scorning the insinuation, turned to Mary Duff as they
walked up the path.
“You know, Honey, de Lord ain’t lef’ no choice ter most on us as
ter what size we’ll be, but pears like you’d better be a fat ole
Mammy like me, than such a ole bag o’ bones as Joe yonder.”
But Joe by that time was depositing his basket in the hall-way of
the cottage, and was fortunately quite beyond the fire of this
personal attack. Mary Duff was naturally much amused at the real
but harmless jealousy of these old coloured folk, and realised for
perhaps the five hundredth time what children we all are, be race
and nationality what they may.
Meantime Brevet had taken his position on the top rail of the gate,
with one arm around a slim little cedar that stood guard beside it.
“May I stand right out here, Miss Duff,” he called back to Mary, “so
as to see them a long way off?”
“Bless your heart, yes!” Mary answered, quite certain in her mind
that since Courage herself was a little girl she never had seen such a
dear child. Brevet’s watch was a brief one.
“They are coming! Hear the wheels! They are coming,” he cried
exultingly, with almost the next breath. In just two minutes more
they really had come, and Brevet was calling out “How’dy, how’dy,
how’dy” at the top of his strong little lungs, to the wide-eyed
amazement of the Bennetts, who had never heard this Southern
abbreviation of the Northern “How-do-you-do.” Then jumping down
from his perch, he ran up to the carriage, repeating over again his
cordial welcoming “How’dy.”
“How’dy, dear little stranger!” replied Courage, waving a greeting
to Mary; “and who are you I would like to know?”
“I’m Howard Stanhope Ellis, but that’s not what you’re to call me,
I have another name. It’s the name they give—” but he did not
finish his sentence, for charming little fellow though he was, he
could not be allowed to monopolise things in this fashion, and Mary
gently pushed him aside to get him out of her way.
“And so here you are at last,” she said joyously; “welcome home,
Miss Courage. How are you, Sylvia?” while she bent down with a
cordial kiss for each friendly little Bennett. Meantime Courage was
making friends with Brevet, and a moment later the children were
crowding close about him.
“My, but I’m glad to see you all,” he exclaimed, with an
emphasising shake of his head, “and I think I know who’s who too. I
believe this is Gertrude,” laying one little brown hand on Gertrude’s
sleeve, “and you are Mary, because Mary’s the oldest, and you
Teddy, because Teddy comes next, and you—you are Allan.” Brevet
had learned his lesson from Mary Duff quite literally by heart, and
altogether vanquished by his joyous, friendly greeting, the children
vied with each other in giving him the loudest kiss and the very
hardest hug, but from that first moment of meeting it was an
accepted fact that Allan held first place. There was no gainsaying the
special joyousness of his “And you—you are Allan.” The boy play-
fellow for whom he had hitherto longed in vain had come, and to
little Brevet it seemed as though the millennium had come with him.
All this while Joe and Mammy, barely tolerating each other’s
presence, waited respectfully in the background, so that Mary had a
chance to explain who they were, as Courage stood in the path,
delightedly looking up at the dear little house that was to be her
home. But Sylvia had already made their acquaintance. After paying
the driver and making sure that nothing had been left in the
carriage, she went straight toward them. “I thought I should find
some of my own people down here in Virginia,” she said, cordially
extending a hand to each as she spoke, “but I did not expect they
would be right on the spot, the very first moment, to welcome me.”
“Miss Duff done tol’ us ‘bout Miss Sylvy bein’ of de party,” said Joe
with great elegance of manner, while Mammy looked daggers at him,
for replying to a remark which she considered addressed chiefly to
herself. It was queer enough, the attitude of these two oldtime
slaves toward each other, and yet to be accounted for, I think, in
their eagerness to be of use to those whom they claimed the
privilege of serving; and each was conscious, by a subtle intuition, of
a determination to outwit the other if possible in this regard—which
was all very well, if they only could have competed in the right sort
of spirit.
But there is no more time in this chapter for Mammy or Joe, nor
anything else for that matter. Indeed, it would take quite a chapter
of itself if I should try to tell you of the unpacking of Grandma Ellis’s
basket, and then of the children’s merry supper; but it seems to me
there are more important things for me to write about, and for you
to read about, than things to eat and of how the children ate them.
By nine o’clock quiet reigned in the little cottage, and “the children
were nestled as snug in their beds” as the little folk in “The Night
before Christmas.” Joe and Mammy and Brevet had long ago gone
home, and Courage and Mary Duff were sitting together in the little
living-room, while Sylvia, in the hall just outside, was busy arranging
the books they had brought with them, on some hanging shelves.
“I think this has been the happiest day in all my life,” said
Courage. “I have simply forgotten everything in the pleasure of
those children.” And then, sitting down at the little cottage piano and
running her hands for a few moments over the keys, she sang softly,
—
“For all the Saints, who from their labour rest,
Thy name, O Jesus, be forever blest.”
The sweet, familiar hymn brought Sylvia to the door.
“Miss Courage,” she said, standing with her arms folded behind
her back, as she had always a way of standing when deeply
interested, “you have forgotten yourself and your sorrow to-day, but
not for one moment have you really forgotten Miss Julia,” and
Courage knew that this was true, and closed the little piano with
tears in her eyes and a wondrous joy and peace in her heart.
CHAPTER VI.—ARLINGTON BEFORE THE WAR.
No sooner were our little New Yorkers settled in their pretty
summer home than they naturally desired that it should have
a name, and after much discussion, according to the Bennett
custom, they all agreed that “Little Homespun,” one of the names
that Courage had suggested, seemed to fit the cosy, unpretentious
little home better than anything else that had been thought of. No
sooner were they settled, either, than they became firm and fast
friends of the household up at Ellismere. It needed but very little time
to bring that about, because everything was—to use a big word
because no smaller one will do—propitious. You can imagine what it
meant to Courage—taking up her home in a new land, and with
cares wholly new to her—to have a dear old lady like Grandma Ellis
call upon her, as she did the very first morning after her arrival. Of
course Courage had to explain how it was she had come way down
there to Virginia with the little Bennett children in charge. Indeed,
almost before she knew it, and in answer to Grandma Ellis’s gentle
inquiries, she had told her all there was to tell—about Miss Julia,
about herself and Mary Duff and Sylvia, and finally, as always with
any new friend, the why and wherefore of her own unusual name.
The tears stood in Grandma Ellis’s eyes many times during the
narration, and her face was aglow with love and sympathy and
admiration as Courage brought her story to a close.
“And now, my dear,” she had said, “I want you should know what
little there is to tell about us. We live just three miles from here, and
in the same old Virginia homestead where my husband was born.
We, means my son Harry, and Brevet and myself. Brevet, as you
already know, perhaps, has neither father nor mother. His mother
died when he was six months old, and his father, my oldest son, was
drowned when the Utopia went down, off the coast of Spain five
years ago. We are doing our best, Harry and I, to make up to Brevet
for his great loss; but it is sad that the little fellow should only know
the love of an old grandmama like me, and never of his own young
mother. But I do not want to burden you with my sorrows, dear
  • 8. Data Plane Development Kit (DPDK): A Software Optimization Guide to the User Space-based Network Applications. Edited by Heqing Zhu.
  • 9. First edition published 2021 by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742, and by CRC Press, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN. © 2021 Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, LLC.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC, please contact mpkbookspermissions@tandf.co.uk.
Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
ISBN: 9780367520175 (hbk); ISBN: 9780367373955 (pbk); ISBN: 9780429353512 (ebk). Typeset in Times by codeMantra.
DPDK uses the FreeBSD license for most of its software code, which runs in user mode. A small amount of code resides in kernel mode, such as VFIO and KNI, and is released under the GPL. The BSD license model gives developers and consumers more flexibility for commercial use. If a developer wants to contribute new software code, the license model needs to be followed. The source code can be downloaded from the DPDK website (www.dpdk.org) for immediate use. DPDK open-source development is still very active today, rapidly evolving with new features and large source-code contributions; a new release is published every three months, a cadence in place since 2016.
  • 10. Contents
Preface ... vii
Editor ... xv
Contributors ... xvii
SECTION 1. DPDK Foundation
Chapter 1. Introduction (Heqing Zhu and Cunming Liang) ... 3
Chapter 2. Cache and Memory (Chen Jing and Heqing Zhu) ... 31
Chapter 3. Core-Based Parallelism (Qun Wan, Heqing Zhu, and Zhihong Wang) ... 51
Chapter 4. Synchronization (Frank Liu and Heqing Zhu) ... 65
Chapter 5. Forwarding (Yipeng Wang, Jasvinder Singh, Zhe Tao, Liang Ma, and Heqing Zhu) ... 85
Chapter 6. PCIe/NIC (Cunming Liang, Hunt David, and Heqing Zhu) ... 115
Chapter 7. PMD (Helin Zhang and Heqing Zhu) ... 133
Chapter 8. NIC-Based Parallelism (Jingjing Wu, Xiaolong Ye, and Heqing Zhu) ... 157
Chapter 9. NIC Offload (Wenzhuo Lu and Heqing Zhu) ... 183
  • 11. Chapter 10. Packet Security (Fan Zhang (Roy) and Heqing Zhu) ... 199
SECTION 2. I/O Virtualization
Chapter 11. Hardware Virtualization (Qian Xu and Rashmin Patel) ... 219
Chapter 12. Virtio (Tiwei Bie, Changchun Ouyang, and Heqing Zhu) ... 229
Chapter 13. Vhost-User (Tiwei Bie and Heqing Zhu) ... 251
SECTION 3. DPDK Applications
Chapter 14. DPDK for NFV (Xuekun Hu, Waterman Cao, and Heqing Zhu) ... 265
Chapter 15. Virtual Switch (Ciara Loftus, Xuekun Hu, and Heqing Zhu) ... 277
Chapter 16. Storage Acceleration (Ziye Yang and Heqing Zhu) ... 291
Index ... 305
Preface

DPDK (Data Plane Development Kit) started as a small software project at Intel® about a decade ago. By 2019, it had evolved into a leading open-source project under the governance of the Linux Foundation. DPDK is known as a kernel-bypass networking technology, and it has gained huge adoption in cloud, telecom, and enterprise networking and security systems. Many technical articles and much online documentation cover DPDK technology and its applications; the content is informative and useful, but scattered across many different places.

As of 2019, general-purpose processors are widely used for cloud and telecom network infrastructure systems. DPDK has played an important role in delivering high-speed I/O, and it is widely used in software-defined data centers (Open vSwitch, VMware NSX, Red Hat virtualization, load balancers, etc.) and telecom networking appliances such as the virtual EPC, virtual router, and NGFW (next-generation firewall).

In April 2015, the first DPDK China summit was held in Beijing. Network developers and system designers from China Mobile, China Telecom, Alibaba, IBM, Intel®, Huawei, and ZTE presented a variety of topics around DPDK technologies and their use cases. The passionate audience inspired us to write this book about DPDK, and the first Chinese edition was published in 2016. In the past 3 years, there have been continuous improvements in the DPDK community, and we have incorporated some of those changes into this English book. The DPDK community has spent a lot of effort on documentation improvement, and significant progress has been made. This book is intended to make DPDK easy to use, and to cover the design principles, the software libraries, and the software optimization approach. We have to admit that our limited knowledge and language skills may leave this book imperfect.
This book is a joint work by many talented engineers from Intel®, Alibaba, and Huawei; they are DPDK developers and users. If you are a network engineer or college student working to build a networking/security system or service that does not yet use DPDK, this book may help you. The "network function virtualization (NFV)" transformation has made DPDK a key data plane technology, a crucial ingredient for building 5G and edge systems (uCPE/SD-WAN, MEC). DPDK is used as foundational I/O software to deliver wireless and fixed network services.

SILICON

When I joined Intel® in 2005, a big challenge was how to ensure that software gained the benefits of multicore CPUs. Most legacy software was not designed with parallelization in mind and could not gain performance from a multicore processor. Prior to 2005, CPU design focused on increasing the running frequency, so software achieved higher performance without any changes. As the industry entered the multicore journey (almost 15 years up to now), CPU innovation has focused on delivering more cores at each new product release cycle. Table 1 summarizes the core count increase on Intel® Xeon processors. The server platform evolution is largely driven by the Intel® Xeon processor; the most commonly used server platform is based on a dual-socket design, and the number of logical computing cores increased about 56 times within 13 years. Other CPU suppliers have pursued a similarly aggressive pace of technology development. In addition to the rapid CPU development, the other silicon components in the server system, such as memory and the NIC (network interface card), have gone through a similar and significant capability boost.

Ethernet technology has evolved from early interface speeds of 10/100 Mbps to gigabit Ethernet (1 Gbps). Today, 10 Gbps/25 Gbps NICs are adopted by new data center servers. In networking production systems, 100 Gbps ultra-high-speed network interfaces are also being used, but CPU frequency has remained at the same level as 10 years ago. Table 2 summarizes the Ethernet development history from https://ethernetalliance.org/. Supporting a high-speed NIC requires substantial software innovation with high parallelism. DPDK was born in this time of change; one of its design goals is to take advantage of the multicore architecture to enable high-speed NICs. DPDK is an open-source project with broad industry support; many organizations recognize the power of open source and participate and contribute, and there is very clear progress in cloud and telecom/security solutions. DPDK was born at Intel® first, but it gained broad contribution due to the open-source model.

From a cost perspective, the server platform cost has been reduced significantly; a dual-socket server today costs about the same as a high-end laptop did 10 years ago.
The computing capability exceeds what a supercomputer could do at that time. Such rapid technology development has made the server platform the preferable option for implementing software-defined infrastructure (SDI), a common platform to deliver the computing, network, and storage tasks. DPDK and the Intel® server platform are great recipes to build out software-defined network or security infrastructure and services.

TABLE 1  The Intel® Xeon Processor over the Years

  CPU Code Name     Process (nm)  Max Cores per CPU  Release  Hyper-Threading  Total Cores on 2-CPU Server
  WoodCrest         65            2                  2006     No               4
  Nehalem-EP        45            4                  2009     Yes              16
  Westmere-EP       32            6                  2010     Yes              24
  Sandy Bridge-EP   32            8                  2012     Yes              32
  Ivy Bridge-EP     22            12                 2013     Yes              48
  Haswell-EP        22            18                 2014     Yes              72
  Skylake-EP        14            28                 2017     Yes              112
  Cascade Lake-EP   14            56                 2019     Yes              224

The early DPDK started from a performance test case of an Intel® NIC. Today, DPDK software is widely recognized and deployed in cloud and telecom network infrastructure. DPDK is an important software library for realizing NFV; it helped prove the concept, and it helps in production deployment.

NETWORK PROCESSOR

In the context of hardware platforms for networking workloads, it is necessary to discuss the network processor first. Telecom vendors have used network processors or similar chip technology as the primary silicon choice for data plane processing. Intel® was a market leader in this domain. Intel® had a product line known as Intel® Internet Exchange Architecture (IXA); the silicon products were known as IXP4xx, IXP12xx, IXP24xx/23xx, and IXP28xx. The technology was successful, and the product was a market leader. Within IXP silicon, a large number of microengines sit inside the processor; they are programmable for data plane processing. IXP has the XScale processor for the control plane, a StrongARM-based silicon component.

In 2006, AMD had gained processor leadership in the x86 domain, and Intel® had to optimize its research and development organization and investment portfolio. The network processor business was evaluated, and the internal finding indicated that IXP's overall business potential was not large enough for long-term investment. Intel® gained the No. 1 market share in 2006, but the market size was not big enough to support long-term growth.
Without the specialized network processor silicon, another solution was required to address high-performance packet processing workloads. Intel®'s architects predicted that the multicore x86 processor would develop at a faster pace, so it made sense to replace the IXP roadmap with CPU-based technology. As a result, Intel® stopped IXP product line development and gradually shifted towards a CPU-based solution, which requires a software approach for networking workloads. This was an important strategic shift. The business plan was to converge all the networking-related solutions onto the x86-based multicore processor. The data plane workload requirement is different from the general computing characteristic; it needs to be fulfilled with a dedicated software solution, and DPDK is the solution that responds to this strategic shift. The IXP still exists at Intel®; it contributed to the accelerator technology known as Intel® QuickAssist Technology (QAT), which is commonly available as a QAT PCIe card, in the server chipset, or in an SoC (system on chip).

TABLE 2  Ethernet Port Speed Evolution

  Ethernet Port Speed   Year
  10 Mb/s               ~1980
  100 Mb/s              ~1995
  1 GbE                 ~1998
  10 GbE                ~2002
  25 GbE                ~2016
  40 GbE                ~2010
  50 GbE                ~2018
  100 GbE               ~2011
  200 GbE               ~2017
  400 GbE               ~2017

In 2006, networking systems needed to support 10 Gbps I/O. At that time, the Linux system and kernel drivers were not able to achieve this. DPDK came up as the initial software-based solution, meeting a new silicon trend at the dawn of the multicore processor architecture. Since its birth, DPDK has been busy keeping up with the growing high-speed I/O, such as 25 GbE, 40 GbE, and 100 GbE Ethernet.

In a short summary, DPDK was born in a disruptive time. The software design focuses on performance and scalability, achieved by using multiple cores. Together with Intel's tick-tock silicon model, this set a rapid product cadence compared to most network silicon technologies. Under the tick-tock model, Intel® released new processors on the following cadence:

• CPU architecture is refreshed every 2 years.
• CPU manufacturing process is refreshed every 2 years.

At the time, this was a very aggressive product beat rate in the silicon technology sector; later, smartphone silicon adopted an even more aggressive schedule.
DPDK HISTORY

A network processor supports packet movement in and out of the system at line rate. For example, a system receiving 64-byte packets at a line rate of 10 Gbps must handle about 14.88 Mpps (million packets per second). This could not be achieved with the early Linux kernel on an x86 server platform. Intel's team started with NIC performance test code, and a breakthrough was made with a NIC poll mode driver in Linux user mode. Traditionally, the NIC driver runs in Linux kernel mode and wakes up the system for packet processing via an interrupt for every incoming packet. In the early days, the CPU was faster than the I/O processing unit, so interrupt-based processing was very effective; CPU resource was expensive, so it was shared by different I/O and computing tasks. However, high-speed network interfaces now mandate packet processing at 10 Gbps and above, which exceeds what the traditional Linux networking software stack can deliver. CPU frequency still remains at 3 GHz or lower; only gaming platforms overclock the frequency up to 5 GHz. Networking and communication systems need to consider energy efficiency because they are always on, running 24 × 7. Therefore, the network infrastructure needs to take power consumption into the total cost
of ownership (TCO) analysis. Today, most network systems run below 2.5 GHz for energy saving. From the silicon perspective, I/O speed is 10×~100× faster than before, and the CPU core count on a given platform is also up to 100× higher. One obvious approach is to assign dedicated cores to poll the high-speed network ports/queues, so that the software can take advantage of the multicore architecture; this concept is a design foundation of DPDK.

At the very beginning, Intel® shared the early prototype source code only with limited customers. Intel® shipped the early software package under the FreeBSD license. 6WIND played an important role in helping with the software development and enhancement under a business contract. From 2011, 6WIND, Wind River, Tieto, and Radisys announced business support services for Intel® DPDK. Later, Intel® shared the DPDK code package on its website, free for more developers to download and use. 6WIND set up the open-source website www.dpdk.org in April 2013, which became the project's host website, and eventually DPDK became one of the Linux Foundation projects.

OPEN SOURCE

Today, any developer can submit source code patches via www.dpdk.org. At the beginning, DPDK focused on enabling the Intel® server platform: the optimal use of Intel® processors, chipsets, accelerators, and NICs. The project grew significantly with broad participation from many other silicon companies; this transformed DPDK to support multiple architectures and multiple I/O devices (such as NICs and FPGAs (field-programmable gate arrays)). Intel® is a foundational member investing in and growing the software community, together with other member companies and individual developers such as 6WIND, Red Hat, Mellanox, Arm, Microsoft, Cisco, VMware, and Marvell (Cavium). DPDK version 2.1 was released in August 2015. All major NIC manufacturers joined the DPDK community to release NIC PMD support, including Broadcom NIC (acquired by Emulex), Mellanox, Chelsio, and Cisco.
Beyond the NIC driver, DPDK expanded into packet-related acceleration technology; Intel® submitted the software modules to enable Intel® QAT for crypto acceleration, which is used for packet encryption/decryption and data compression.

The DPDK community has made great progress on multi-architecture support. Dr. Zhu Chao at IBM Research China started migrating DPDK to the Power architecture. Freescale China's developers joined the code contribution. Engineers from Tilera and EZchip spent effort to make DPDK run on the tile architecture. DPDK later supported the Arm architecture as well.

DPDK became a Linux Foundation project in 2017. "The first release of DPDK open-source code came out 8 years ago; since that time, we've built a vibrant community around the DPDK project", wrote Jim St. Leger, DPDK board chair, Intel®. "We've created a series of global DPDK Summit events where the community developers and code consumers gather. The growth in the number of code contributions, participating companies, and developers working on the project continues to reflect the robust, healthy community that the DPDK project is today."

Red Hat integrated DPDK into Fedora Linux first, then added it to Red Hat Enterprise Linux; many other Linux OS distributions followed. VMware engineers joined the DPDK community and took charge of the maintainer role of VMXNET3-PMD, i.e., the de facto high-performance software virtual interface to the guest software on VMware NSX. Canonical has added DPDK support since Ubuntu 15. For public cloud computing, the netvsc PMD can be used for Microsoft Hyper-V, and the ENA PMD is available to support the AWS Elastic Network Adapter.

EXPANSION

DPDK is designed to run in Linux user space; it is intended to stay close to the application, which usually runs in Linux user space. A DPDK test case indicated that a single Intel® core can forward packets at an approximate speed of 57 Mpps; this was achieved in an extremely simplified test case. Open vSwitch is a classical open-source component used by cloud infrastructure servers. OVS integrated DPDK to accelerate virtual switching performance, which is widely adopted by large-scale cloud computing and network virtualization systems. DPDK added the virtio-user interface to connect containers with OVS-DPDK; this is overlay networking acceleration in a vendor-neutral way.

For telecom equipment manufacturers, the server platform and open-source software such as Linux and DPDK/VPP/Hyperscan are important new recipes to design and deliver networking and security systems; they are also important recipes for the cloud service model. Furthermore, the Linux networking stack is also innovating fast with XDP and AF_XDP. This adds interesting dynamics now, as it offers a bypass path for the Linux kernel stack while returning NIC management to the existing Linux utilities. It provides a new way to use Linux and DPDK together.

As one of the best open-source networking projects of the past decade, DPDK became a widely adopted software library for accelerating packet processing performance (10× more than Linux kernel networking) on general-purpose server platforms.
It is heavily used in many use cases, such as the following:

• Building network and security appliances and systems;
• Optimizing virtual switching for cloud infrastructure;
• Optimizing storage systems with high I/O demand, like NVM devices;
• NFV, building software-centric networking infrastructure with servers;
• Cloud networking and security functions as a service.

BOOK CONTRIBUTION

This book is a joint contribution from many individuals who worked and are working for Intel®. The early Chinese editions were mainly contributed by Cunming Liang, Xuekun Hu, Waterman Cao (Huawei), and Heqing Zhu as the main editors. Each chapter in this book has the following list of contributors.

Section 1: DPDK Foundation

Chapter 1: Heqing Zhu, Cunming Liang
Chapter 2: Chen Jing (Alibaba), Heqing Zhu
Chapter 3: Qun Wan, Heqing Zhu, Zhihong Wang
Chapter 4: Frank Liu (NetEase), Heqing Zhu
Chapter 5: Yipeng Wang, Zhe Tao (Huawei), Liang Ma, Heqing Zhu
Chapter 6: Cunming Liang, David Hunt, Heqing Zhu
Chapter 7: Helin Zhang, Heqing Zhu
Chapter 8: Jingjing Wu, Xiaolong Ye, Heqing Zhu
Chapter 9: Wenzhuo Lu, Heqing Zhu
Chapter 10: Fan Zhang (Roy), Heqing Zhu

Section 2: I/O Virtualization

Chapter 11: Qian Xu, Rashmin Patel
Chapter 12: Tiwei Bie, Changchun Ouyang (Huawei), Heqing Zhu
Chapter 13: Tiwei Bie, Heqing Zhu

Section 3: DPDK Application

Chapter 14: Xuekun Hu, Waterman Cao (Huawei), Heqing Zhu
Chapter 15: Ciara Loftus, Xuekun Hu, Heqing Zhu
Chapter 16: Ziye Yang, Heqing Zhu

For the DPDK Chinese edition, the draft received review and feedback from the volunteers below:

• Professor Bei Hua (USTC) and Kai Zhang (USTC, now at Fudan University);
• Professor Yu Chen and Dan Li (Tsinghua University);
• Dr. Liang Ou (China Telecom).

Many people's work led to this book's content.

Intel®: Yong Liu, Tao Yang, De Yu, Qihua Dai, Cunyin Chang, Changpeng Liu, Jim St. Leger, Ferruh Yigit, Cristian Dumitrescu, Walter Gilmore, Tim O'Driscoll, Ray Kinsella, Konstantin Ananyev, Declan Doherty, Bruce Richardson, Keith Wiles, John DiGiglio, Liang-min Wang, Muthurajan Jayakumar.
Alibaba: Xun Li, Huawei Xie.

This book also leveraged content from Red Hat and VMware open-source developers and product managers. VMware: William Tu, Justin Pettit; Red Hat: Kim Buck, Anita Tragler, and Franck Baudin.

Special thanks to John McNamara, Lin Zhou, Jokul Li, Brian Ahern, Michael Hennessy, Xiaomei Zhou, and Timmy Labatte for providing leadership support. Dan Luo and Lin Li helped in the translation and review from the Chinese edition to the English version. Roy Zhang was very instrumental in guiding the technology review of the English edition.

DPDK is contributed by worldwide talents, who created the technology and made it thrive. At Intel®, it was mainly led by Intel® Fellow Venky (who passed away in 2018) in Oregon and Jim St. Leger in Arizona.

Heqing Zhu
Editor

Heqing Zhu was born in China. He has worked at Intel® for 15 years in roles including software developer, engineering leadership, product management, and solution architect in telecom and cloud networking and open-source software development. Prior to Intel®, he worked for Alcatel Shanghai Bell and Huawei. Mr. Zhu currently lives in Chandler, Arizona, in the United States. He graduated from the University of Electronic Science and Technology of China (UESTC) with a master's degree in Information and Communication Systems.
Contributors

Tiwei Bie, Ant Financial, Shanghai, China
Waterman Cao, Huawei, Shanghai, China
Jing Chen, Alibaba, Shanghai, China
Xuekun Hu, Intel®, Shanghai, China
David Hunt, Intel®, Shannon, Ireland
Cunming Liang, Intel®, Shanghai, China
Frank Liu, NetEase, Hangzhou, China
Jijiang (Frank) Liu, NetEase, Hangzhou, China
Ciara Loftus, Intel®, Shannon, Ireland
Wenzhuo Lu, Intel®, Shanghai, China
Liang Ma, Intel®, Shannon, Ireland
Changchun Ouyang, Huawei, Shanghai, China
Rashmin Patel, Intel®, Arizona, USA
Jasvinder Singh, Intel®, Shannon, Ireland
Qun Wan, Intel®, Shanghai, China
Yipeng Wang, Intel®, Oregon, USA
Zhihong Wang, Intel®, Shanghai, China
Jingjing Wu, Intel®, Shanghai, China
Qian Xu, Intel®, Shanghai, China
Ziye Yang, Intel®, Shanghai, China
Xiaolong Ye, Intel®, Shanghai, China
Fan (Roy) Zhang, Intel®, Shannon, Ireland
Helin Zhang, Intel®, Shanghai, China
Zhe Tao, Huawei, Shanghai, China
Section 1  DPDK Foundation

There are ten chapters in this section, focusing on the basic concepts and software libraries, including the CPU scheduler, multicore usage, cache/memory management, data synchronization, the PMD (poll mode driver) and NIC-related features, and the software API (application programming interface). The PMD is a very important concept and the new user-space software driver for the NIC. Understanding the DPDK (Data Plane Development Kit) basics will build a solid foundation before starting on actual projects.

In the first five chapters, we will introduce the server platform basics such as cache use, parallel computing, data synchronization, data movement, and packet forwarding models and algorithms. Chapter 1 will introduce the networking technology evolution and the network function appliance trend, going from hardware purpose-built boxes to the software-defined infrastructure present in the cloud. Essentially, it is the silicon advancement that drove the birth of DPDK, and a few basic examples are given here. Chapter 2 will introduce memory and cache in the performance optimization context: how to use cache and memory wisely, the concepts of the HugePage and NUMA (non-uniform memory access), and cache-aligned data structures. Chapters 3 and 4 will focus on multicore and multi-thread, the effective model for data sharing with high parallelism, and the lock-free mechanism. Chapter 5 will move to the packet forwarding models and algorithms, where a decision is required to choose the run-to-completion model, the pipeline model, or both.

The next five chapters will focus on I/O optimization; we will talk about PCIe, NIC, and PMD design and optimization details, which enable the DPDK PMD to deliver a high-speed forwarding rate meeting demands at 10 GbE, 25 GbE, and 40 GbE, up to 100 GbE, on a server platform. Chapter 6 will take a deep dive into PCIe transaction details for packet movement. Chapter 7 will focus on NIC
performance tuning and the platform and NIC configuration. Chapters 8 and 9 will go further into common NIC features and their software usage: multi-queue, flow classification, core assignment, and load balancing methods to enable highly scalable I/O throughput, and will introduce the NIC offload features for L2/L3/L4 packet processing. Chapter 10 is about packet security and crypto processing; securing the data in transit is an essential part of Internet security. How can DPDK add value to "encrypt everywhere"?
1  Introduction

Heqing Zhu and Cunming Liang, Intel®

1.1 WHAT'S PACKET PROCESSING?

Depending on whether the system is a network endpoint or middlebox, packet processing (networking) may have different scope. In general, it consists of packet reception and transmission, packet header parsing, packet modification, and forwarding. It occurs at multiple protocol layers.

CONTENTS

1.1 What's Packet Processing?
1.2 The Hardware Landscape
    1.2.1 Hardware Accelerator
    1.2.2 Network Processor Unit
    1.2.3 Multicore Processor
1.3 The Software Landscape
    1.3.1 Before DPDK
    1.3.2 DPDK Way
    1.3.3 DPDK Scope
1.4 Performance Limit
    1.4.1 The Performance Metric
    1.4.2 The Processing Budget
1.5 DPDK Use Case
    1.5.1 Accelerated Network
    1.5.2 Accelerated Computing
    1.5.3 Accelerated Storage
1.6 Optimization Principles
1.7 DPDK Samples
    1.7.1 HelloWorld
        1.7.1.1 Initialize the Runtime Environment
        1.7.1.2 Multicore Initialization
    1.7.2 Skeleton
        1.7.2.1 Ethernet Port Initialization
    1.7.3 L3fwd
1.8 Conclusion
Further Reading
• In an endpoint system, the packet is sent to the local application for further processing. Packet encryption and decryption, or tunnel overlay, may be part of the packet processing, as may session establishment and termination.
• In a middlebox system, the packet is forwarded to the next hop in the network. Usually, this system handles a large number of packets in and out of the system, with packet lookup, access control, and quality of service (QoS).

The packet may go through hardware components such as the I/O (NIC) interface, bus interconnect (PCIe), memory, and processor; sometimes, it may go through a hardware accelerator in the system. Most of the packet movements and modifications can be categorized as follows:

• Data movement, like packet I/O, from the PCIe-based NIC device to cache/memory, so that the CPU can process it further.
• Table lookup/update. This is memory access (read/write), used for packet-based access control or routing decisions (which interface to send the packet out of).
• Packet modification. This involves network protocols defined at many different layers; just like peeling onion layers, each protocol layer has its own data format, usually defined by an international standard such as an IETF RFC (Internet Engineering Task Force Request for Comments). Packet processing often involves packet changes, header removal, or header addition.

1.2 THE HARDWARE LANDSCAPE

Traditionally, the network system is highly complicated and consists of the control, data, signal, and application planes; each plane can be realized with different subsystems. These systems are known as embedded systems, with low power consumption and low memory footprint but real-time characteristics. Such systems require hardware and software talents to work together.

In the early 2000s, CPUs had only a single core with high frequency; the first dual-core processor for general computing emerged in 2004.
Prior to that, the multicore, multi-thread architecture was available in networking silicon, but not in general-purpose processors. In the early years, x86 was not the preferred choice for packet processing. As of today, the silicon below can be used for a packet processing system; based on the programming skills required, the options can be split into different categories:

• Hardware accelerator (FPGA (field-programmable gate array), ASIC (application-specific integrated circuit));
• Network processor unit (NPU);
• Multicore general-purpose processor (x86).

These options are used for different scenarios; each has certain advantages and disadvantages. For large-scale, fixed-function systems, the hardware accelerator is preferred due to its high performance and low cost. The network processor
provides programmable packet processing, thereby striking a balance between flexibility and high performance, but the programming language is vendor-specific. In recent years, P4 has emerged as a new programming language for packet processing; it gained support from Barefoot switches and/or FPGA silicon, but it is not common for NPUs.

The multicore general-purpose processor has the traditional advantage of supporting all generic workloads, and the server platform is commonly equipped with high-speed Ethernet adapters. The server has quickly evolved into the preferred platform for packet processing. It can support complex packet processing together with applications and services, and the application and service software can be written in many different programming languages (C, Java, Go, Python). Over the years, many high-quality open-source projects for packet processing have emerged, such as DPDK, FD.io, OPNFV, and Tungsten.io. Cloud infrastructure has gone down a path known as the NetDevOps approach, using open source to deliver software-defined networking and security infrastructure and services.

From the perspective of silicon advancement, new accelerators and high-speed I/O units have been integrated with multicore processors. This leads to the generation of the system on chip (SoC). SoC is cost-effective, and silicon designs have longer life cycles.

1.2.1 HARDWARE ACCELERATOR

ASIC and FPGA have been widely used in packet processing. Hardware developers are required both to implement the chip and to use it. An ASIC is an integrated circuit designed for a special purpose; it is designed and manufactured based on the specific requirements of the target systems.
An ASIC is designed for specific users' needs; it requires large-volume production to amortize the high R&D cost; it is smaller in size; and it has lower power consumption, high reliability and performance, and reduced cost in comparison with the general-purpose processor. ASIC's shortcomings are also obvious: it is not flexible, not scalable, and has high development costs and long development cycles. ASIC led to the development of popular accelerators such as crypto and signal processing. Combining ASIC with general-purpose processors leads to an SoC that provides heterogeneous processing capability. In general, a dedicated board design is needed to use an ASIC.

FPGA is a semi-custom circuit in the ASIC domain. Unlike ASIC, FPGA is programmable and flexible to use. FPGA is inherently parallel, and its development method greatly differs from software's. FPGA developers require an in-depth understanding of a hardware description language (Verilog or VHDL). General software executes in sequential order on a general-purpose processor, and its parallelism is up to the software design. FPGA silicon can include some fixed functions and some programmable functions. FPGA has made great progress in the data center in recent years, and FPGA can be offered as a cloud service. FPGA can be used for smart NICs. FPGA is often selected to build super-high-speed I/O interfaces, advanced packet parsing and flow filtering, and QoS acceleration. FPGA
can be offered as an add-in card; through the PCIe interface, it is easy to plug into a server system, which makes it popular in cloud data center scenarios. FPGA can also be used for a specific purpose, like signal processing, and it is often used in a special board design. Take the 5G wireless base station as an example: the telecom vendor develops the system in stages and may use FPGA to build the early-stage product. Once the product quality is in good shape, high-volume needs drive a new stage that focuses on replacing the FPGA with an ASIC (SoC), which drives the cost down for large-scale use.

1.2.2 NETWORK PROCESSOR UNIT

NPU is a programmable chip specifically designed for packet processing. It is usually designed with multicore-based parallel execution logic and dedicated modules for packet I/O, protocol analysis, routing table lookup, voice/data encoding/decoding, access control, QoS, etc. NPU is programmable, but not easy to program: the developer needs to take a deep dive into the chip's datasheet, learn the vendor-specific instruction set known as microcode (firmware), and develop the hardware-based processing pipeline for the target network application. The network applications are realized with loadable microcode (firmware) running on the NPU. NPU generally has built-in high-speed bus and I/O interface technology, as well as built-in low-latency memory modules; keeping the forwarding table in on-chip memory makes lookups faster than external DRAM access. NPU can be integrated as part of an SoC; in recent years, NPU-based silicon vendors have been consolidated by CPU or NIC vendors. The diagram below (Figure 1.1) is a conceptual diagram. The "packet processing engines" are programmable hardware logic, which allows the rapid implementation of workload-specific packet processing; the written microcode can run on many parallel engines.
The "physical I/O" interface is a fixed function that complies with standardized interface specifications. The "traffic manager" and "classification and queueing" blocks are relatively fixed functions that are common in most network systems; they are built in as specialized hardware units for QoS and packet ordering. "Internal memory" and "TCAM (ternary content-addressable memory)" provide low-latency memory for packet header parsing and forwarding decisions. The memory controller connects external memory chips for larger memory capacity. NPU has many advantages, such as high performance and programmability, but its cost and workload-specific characteristics imply a limited market. It is often used in communication systems, and different NPU products are available from various silicon vendors. As said before, NPU microcode is often vendor specific; thus, it is not easy to use, and it is not easy to hire talented developers who understand a given NPU well. Due to the steep learning curve, time to market is affected by the availability of talent. Because NPU is confined to its limited market, it does not create enough job opportunities, and experienced engineers may leave the network processor domain to seek career growth elsewhere.
FIGURE 1.1 NPU conceptual block.

There have been many attempts to program NPUs with common languages (like C); technically speaking, it is possible and available, but it is not the best way of using an NPU. The other reality is the performance gap between using C and using the microcode language: translation from C to microcode is doable, but it is not the optimal way to get the most performance out of the NPU. So this path is feasible but has little real value. If all NPUs from different vendors supported the P4 language, they would be much easier to use, though the performance dilemma might remain a similar concern. The P4 ecosystem is still in its development phase, and broad NPU/P4 support is yet to be seen.

1.2.3 MULTICORE PROCESSOR

In the past 15 years, the CPU has delivered a huge boost in processing cycles, thanks to the new era of multicore architecture. With more cores available in the general-purpose processor, it is natural to assign some cores for packet processing; the CPU finds its way to converge more workloads on its own. From the historic networking system perspective, the (low-power) CPU has always been preferred for the control plane (protocol handling), and some network services (like wireless data service) are compute intensive. A few cores can be assigned to handle data plane processing, whereas the remaining cores can be used for control and service plane processing, making a more efficient and cost-effective system. Looking back, take a telecom appliance in 2005: it was a complicated system (chassis) consisting of ten separate boards, each board a subsystem with a specific function, built with dedicated silicon and software. So different boards within a single system had different platform designs and silicon components.
Later, the idea was to converge all the subsystems into a powerful server system, where the subsystems run as software threads (within virtual machines or containers). This new approach is expected to transform the network industry with fewer silicon components, a less heterogeneous system and platform architecture, and hence lower cost. This was the initial motivation for implementing network functions in software. On the other hand, the CPU release cycle is short, and new processors come to market on a yearly basis. But network processors and accelerators are difficult to
catch up with this fast release cadence; their refreshes often happen only every 3 or 4 years. The market requires large-scale shipping of the general-purpose processor, but it does not have a strong business demand to refresh the networking silicon. Over the years, this business has driven significant progress on the general-purpose processor at a very competitive cost.

Figure 1.2 describes a dual-socket server platform. Two Xeon processors are interconnected with the UPI (Ultra Path Interconnect) system bus; memory channels are directly connected to both processor units; all external devices are connected via the PCIe interface; each socket is connected with 2 × 25 GbE Ethernet adapters (NICs), giving 100 Gbps of I/O for data in and out. Lewisburg PCH is a chipset that serves as the platform controller, supporting the additional management engine, high-speed I/O (USB, SATA, PCIe), a 4 × 10 GbE Ethernet interface, and Intel® QAT (a built-in security accelerator for crypto and compression functions). The processor, memory, and Ethernet devices are the main hardware modules that handle packet processing tasks. The PCIe interface provides I/O extensibility and supports adding differentiated accelerators (or more flexible I/O) into server platforms, e.g., FPGA cards. The general-purpose processor can be integrated with additional silicon IP and then evolves into an SoC. An SoC often consists of a processor, an integrated memory controller, network I/O modules, and even hardware accelerators such as a security engine or FPGA. Here are a few known SoC examples:

• Intel®: Xeon-D SoC, Atom SoC;
• Tilera: TILE-Gx;
• Cavium Networks: OCTEON & OCTEON II;
• Freescale: QorIQ;
• NetLogic: XLP.

FIGURE 1.2 A dual-socket server platform.
The block diagram shown in Figure 1.3 is one example of an Intel® Atom SoC, a power-efficient silicon; this tiny chip consists of 2 × Atom cores, an internal chipset, Intel® QAT, and a 4 × 10 GbE Ethernet interface. Given that all the hardware units are integrated into a single chip, the SoC is power efficient and cost-effective: the power consumption of this chip is less than 10 W. DPDK can run on this chip to move packet data from Ethernet to CPU at a line rate of 10 Gbps. SoC is also popular in ARM-based processor design; for example, AWS released the Graviton SoC for cloud computing use cases. It is important to point out that the CPU has made huge progress with cloud virtualization and container technologies, which provide more granularity and flexibility to place software workloads on computation resources. CPU-based packet processing reduces the need for hardware talent; it deals mostly with software.

1.3 THE SOFTWARE LANDSCAPE

DPDK was created for high-speed networking (packet processing). Before DPDK was born, most personal computers and server systems were installed with Windows or Linux. All these systems had networking capabilities, supported the network protocol stack, and talked to each other via Ethernet and the socket interface; low-speed network processing capability is good enough for a computation-centric system. There is a big difference between supporting low-speed network processing (10/100 Mbps) and supporting high-speed network processing (1/10/25/40/50/100 Gbps). Before the multicore architecture was common in a CPU, it was not a popular idea to use an Intel® processor for a network processing system. Let's take a look at the popular wireline and wireless network protocol stacks, and be a little more specific about the network protocol layers. Figure 1.4 shows an example of the classic wireline network protocol layers (including both the OSI model and the TCP/IP model).
FIGURE 1.3 Intel® Atom SoC diagram.

FIGURE 1.4 The OSI model and TCP/IP model (wireline).

The left side shows the OSI 7-layer model, whereas the right side shows the TCP/IP model. The TCP/IP model is often implemented by Linux kernel systems. By default, an incoming packet goes to the Linux kernel at the link layer; this is handled by the NIC and its driver, which runs in kernel space. The whole network stack can be handled by the Linux kernel. In a network system, the packet has to be copied from kernel space to user space because the application usually resides in user space, and this application will eventually consume the arrived packet. Before zero copy was introduced to the Linux stack, the packet copy was expensive processing, but it was essential, as the packet is received in the Linux kernel but consumed by the user space application. For middlebox networking systems (like routers), routing functions are largely implemented in user space as well, not in the Linux kernel stack. Generally, software development in kernel space is much harder in terms of debugging and testing; user space software is easier to develop and debug. There are certain systems that handle everything in the kernel, but they are rare.

Figure 1.5 describes the wireless 4G (LTE) user plane network protocol layers. eNodeB is a wireless base station with an air interface; the serving gateway (GW) and PDN GW are the wireless core networking systems. For the 5G system, the user plane stack is very similar to 4G: eNodeB becomes gNodeB, and the serving GW/PDN GW become the UPF. As we can see, the protocol layers differ between the base station and the core system. In the wireless base station (eNodeB), the L1/L2 protocols are different because the wireless interface is essentially an air interface and uses wireless signal processing and codec technology. We will not cover any details here; it is a completely different technology domain.
From eNodeB to serving GW, the packet processing system is similar to a typical wireline network system, where L1/L2 is based on the Ethernet interface and GTP/UDP/IP protocols run on top of Ethernet. The 5G network stack is also similar. 5G prefers to use a cloud architecture in order to implement the service-oriented elastic model, so that edge computing
and network slicing are part of the new service (Figure 1.5 shows the wireless user plane network protocol layers for 4G/LTE). Packet processing and computing services converge at the 5G network infrastructure node, largely because the multicore CPU has huge computation power. In the early years, not long ago, the computing system focused on a singular workload, or singular service provision, where the packet processing requirement was low and the Linux kernel stack was sufficient to handle a less-than-1 Gbps Ethernet interface. Later, networking systems needed to support multiple high-speed networking interfaces (far more than 1 Gbps), and they required different software options; Linux kernel networking cannot meet the higher networking demand.

1.3.1 BEFORE DPDK

In the early 2000s, the Intel® processor was not widely used for high-speed network processing; NPU was the silicon choice at Intel®. Then the change had to happen, and a path-finding effort at Intel® was kicked off. How does a traditional NIC device process a packet in a server system running Linux? The steps are summarized below:

• A packet arrives at the NIC (a PCIe device).
• The NIC completes the DMA (direct memory access) of the packet into a host memory region known as a packet buffer.
• The NIC sends an interrupt to wake up the processor.
• The processor reads and writes the packet descriptor and packet buffer.
• The packet is sent to the Linux kernel protocol stack for more protocol processing, such as IP-related access control decisions.
• If the application resides in user space, the packet data (also known as the payload) is copied from kernel space to user space.
• If the application resides in kernel space, the data is processed in kernel mode (a smaller percentage of cases).
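The steps above can be modeled in a few lines of C. This is a hedged sketch, not real kernel or NIC code: every name below (nic_dma_receive, service_interrupt, the buffers) is a hypothetical stand-in used only to make the interrupt-plus-copy cost visible.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the legacy receive path; none of these names
 * are real kernel or driver APIs. */
#define PKT_SIZE 64

static unsigned char kernel_pkt_buffer[PKT_SIZE]; /* DMA target in host memory */
static unsigned char user_copy[PKT_SIZE];         /* application's copy        */
static int irq_pending;

/* Steps 1-3: the NIC DMAs the frame into the packet buffer and raises
 * an interrupt to wake the processor. */
static void nic_dma_receive(const unsigned char *wire_frame)
{
    memcpy(kernel_pkt_buffer, wire_frame, PKT_SIZE);
    irq_pending = 1;
}

/* Steps 4-6: the CPU services the interrupt, the kernel stack processes
 * the packet, and the payload is copied from kernel to user space. */
static int service_interrupt(void)
{
    if (!irq_pending)
        return 0;                 /* nothing arrived */
    irq_pending = 0;
    /* ... kernel protocol stack work (IP checks, etc.) would run here ... */
    memcpy(user_copy, kernel_pkt_buffer, PKT_SIZE); /* the costly copy */
    return 1;
}
```

The point of the sketch is that every arriving packet pays for one interrupt (with its context switch) plus one kernel-to-user copy, which is exactly the overhead discussed next.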
In the early system, each incoming packet might trigger an interrupt. The interrupt overhead includes the context switch, and it is affordable if not many packets come into the system in a short period. In the past decade, CPU frequency has remained almost the same, but Ethernet speed has jumped from 10/100 Mbps to 1 Gbps, 10 Gbps to 25 Gbps, and 40 Gbps to 100 Gbps. Software faces the challenge of handling large packet bursts on a growing network interface: a huge number of packets will arrive, and the system cannot afford a high amount of interrupt processing. Simply speaking, the overhead is too high. The NAPI (new API) mechanism was introduced into the Linux kernel. It allows the system, after being woken up by an interrupt, to initiate a software routine that processes multiple packets in a polling manner until all packets are handled, then go back to interrupt mode. The NAPI method can significantly improve packet processing efficiency in a high-burst scenario. Later, Netmap (2011), a well-known high-performance network I/O framework, used a shared pool of packets to reduce packet replication from kernel space to user space. This solves the other problem: the high cost of the packet copy [1,2].

Netmap and NAPI have significantly improved the packet processing capability of a legacy Linux system. Is there any further room for improvement? As a time-sharing operating system, Linux schedules many tasks with a time-slicing mechanism. Rather than assigning equal time to all tasks, the Linux scheduler assigns different time slices to different tasks. The number of CPU cores was relatively small in earlier years, and in order to allow every task to be processed in a timely fashion, time sharing was a good strategy to let multiple tasks share the expensive processor cycles, although this method came at the cost of efficiency.
Later, CPUs gained more cores, so it was time to look at new ways to optimize system performance. If the goal is to pursue high performance, time sharing is not the best option; one new idea is to assign dedicated cores to dedicated tasks. Netmap reduces the memory copy from kernel space to user space, but the Linux scheduler remains, which does not eliminate the overhead of the task switch. The additional overhead from the task switch, and the subsequent cache replacement it causes (each task has its own data occupying the cache), also has an adverse impact on system performance. By nature, the network workload is latency sensitive, and a packet may traverse many hops through the Internet; as a result, real time is a critical requirement for a networking system. It is a long processing path from the NIC interrupt to the software interrupt routine (served by the CPU), and then to the final application that handles the packet payload; this path takes many cycles. Prior to 2010, the x86 CPU was not a popular silicon choice for designing a high-speed packet processing system. In order to complete the solution transition from NPU-based silicon to x86/software, there were a few fundamental challenges. Intel® engineers needed answers to the following:

• A software path to enable packet processing on the x86 CPU;
• A better software method to do things differently;
• A way to scale performance using the multicore architecture;
• How to tune a Linux system into a packet processing environment.
1.3.2 DPDK WAY

DPDK is the answer to the above challenges; in particular, PMD (polling mode driver) has been proven as a high-speed packet processing software library on Linux. DPDK is essentially a set of software libraries, built on a set of software optimization principles, that implement high-performance packet movement on the multicore processor. The initial goal was to focus on high-speed packet I/O on the server platform. This software demonstrated that the server is good for the networking system and can handle the high-speed data plane. The journey was not easy; it was achieved through heavy engineering investment and is built on many software optimization practices. Let's navigate a few ideas and techniques quickly.

Polling mode: Assign a dedicated core for NIC packet reception and transmission. This approach does not share the core with other software tasks, and the core can run in an endless loop to check whether any packet has just arrived or needs to be sent out, thus removing the need for the interrupt service and its overhead. We will discuss the trade-offs between polling and interrupt mechanisms later; in fact, DPDK supports both mechanisms and even a hybrid-use model.

User space driver: In most scenarios, the packet needs to be delivered to user space eventually. The Linux NIC driver is mostly kernel based. A user space driver can avoid unnecessary packet memory copies from kernel space to user space, and it also saves the cost of system calls. An indirect benefit is that the user space driver is not limited to the packet buffer structure mandated by the Linux kernel. The Linux kernel stack mandates a stable interface, whereas the DPDK-based mbuf (memory buffer) header format can be defined flexibly (because it is new) so that it can be designed in a DMA-optimized way for the NIC. This flexibility adds a performance benefit.
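The polling model described above can be sketched as a run-to-completion loop on a dedicated core. This is a hedged, self-contained illustration: rx_burst() and tx_burst() below are stubs standing in for DPDK's real rte_eth_rx_burst()/rte_eth_tx_burst(), and the stub queue simply hands out 100 synthetic packets so the example terminates.

```c
#include <assert.h>

#define BURST_SIZE 32
#define TOTAL_PKTS 100

struct pkt { int id; };                 /* stand-in for struct rte_mbuf */

static struct pkt pool[TOTAL_PKTS];
static int remaining = TOTAL_PKTS;

/* Stub for rte_eth_rx_burst(): returns up to n packets, 0 when idle. */
static unsigned rx_burst(struct pkt *pkts[], unsigned n)
{
    unsigned i = 0;
    while (i < n && remaining > 0) {
        pkts[i++] = &pool[TOTAL_PKTS - remaining];
        remaining--;
    }
    return i;
}

/* Stub for rte_eth_tx_burst(): pretends every packet was sent. */
static unsigned tx_burst(struct pkt *pkts[], unsigned n)
{
    (void)pkts;
    return n;
}

/* The polling loop pinned to a dedicated core; here it exits when the
 * stub queue runs dry, whereas a real PMD loop keeps spinning forever. */
static unsigned long poll_loop(void)
{
    struct pkt *burst[BURST_SIZE];
    unsigned long forwarded = 0;

    for (;;) {
        unsigned nb = rx_burst(burst, BURST_SIZE);
        if (nb == 0)
            break;                      /* real loop: continue polling */
        forwarded += tx_burst(burst, nb);
    }
    return forwarded;
}
```

In a real DPDK application, this loop runs on an lcore reserved through core affinity, so no interrupt or scheduler-driven context switch ever breaks the per-packet cycle budget.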
The user space driver is flexible and easy to modify, and it meets the rapid development needed for different scenarios.

Core affinity: By setting a thread's affinity to a particular CPU core, specific tasks can be bound to cores (threads). Without the core affinity assignment, tasks may switch among different cores, and thread switching between cores can easily lead to performance losses due to cache misses and cache write-backs. One further step is to exclude a core from the Linux scheduler, so that the core is only used for the specific task.

Optimized memory: Network processing is an I/O-bound workload scenario. Both the CPU and the NIC need to access the data in memory (actually cache and/or DRAM) frequently. Optimal memory access includes the use of HugePages and contiguous memory regions. For example, HugePage memory can reduce TLB misses, multichannel-interleaved memory access can improve total bandwidth efficiency, and awareness of asymmetric memory access can reduce access latency. The key idea is to get the data into the cache as quickly as possible, so that the CPU doesn't stall.

Software tuning: Tuning itself cannot be claimed as a best practice. In fact, it refers to a few known tuning practices, such as cache line alignment of data structures, avoiding false sharing between multiple cores, pre-fetching data in a timely manner, and bulk operations on multiple data items (multi-buffer). These optimization
methods are used in every corner of DPDK. A code example can be found in the "l3fwd" case study. It is important to know that these techniques are commonly applicable; beyond DPDK, any software can be optimized with a similar approach.

Using the latest instruction sets and platform technologies: The latest instruction sets of the Intel® processor and other new platform features have been one of the innovation sources of DPDK optimization. For example, Intel® DDIO (Direct Data I/O) technology is a hardware platform innovation in DMA and the cache subsystem. DDIO plays a significant role in boosting I/O performance, as the packet data can be placed directly into the cache, thus reducing the CPU access latency on DRAM. Without DDIO, the packet is always placed into memory first, and then the CPU needs to fetch the packet data from DRAM into the cache, which means extra cycles that the CPU spends waiting. Another example is how to make the best use of SIMD (single instruction, multiple data) and multi-buffer programming techniques. Some instructions, like CMPXCHG, are the cornerstone of lock-free data structure design, and the CRC32 instruction is a good source for efficient hash computation. These contents will be covered in later chapters.

NIC driver tuning: When the packet enters system memory through the PCIe interface, I/O performance is affected by the transaction efficiency among the PCIe device, bus transactions, and system memory. For example, packet data coalescence can make a difference by transferring multiple packets together, allowing a more efficient use of PCIe bus transactions. Modern NICs also support load balancing mechanisms such as receive side scaling (RSS) and Flow Director (FDir), which enable the NIC's multiple queues to work with the CPU's multiple-core model. New NIC offloads can also perform packet header checksum, TCP segmentation offload (TSO), and tunnel header processing.
DPDK is designed to take full advantage of these NIC features for performance reasons; these contents will be described in Chapters 6–9.

Network virtualization and cloud-native acceleration: The initial DPDK optimization focused on moving packets from I/O to CPU. Later, DPDK provided an optimal way to move packets from the host to tenants (VM or container tenants). This is a crucial ingredient for cloud infrastructure and network function virtualization (NFV). DPDK supports both SR-IOV and vSwitch optimization with the PMD concept.

Security acceleration: DPDK can run from bare metal to virtualized guest and container-based environments. The initial Application Programming Interface (API) abstraction was NIC centric; later, it was extended from Ethernet to crypto, compression, and storage I/O acceleration. The crypto and compression APIs are important software abstractions; they can hide the underlying silicon's implementation differences.

1.3.3 DPDK SCOPE

Here are the basic modules within DPDK. It mimics most network functions in software and serves as a foundational layer on which to develop a packet processing system (Figure 1.6).
FIGURE 1.6 DPDK framework and modules.

Core libraries (core libs) provide the Linux system abstraction layer and the software APIs for HugePage memory, cache pools, timers, lock-free rings, and other underlying components.

PMD libraries provide the user-space drivers needed to obtain high network throughput via polling. Basically, all industry-leading Ethernet companies offer PMDs in DPDK. In addition, a variety of virtual NICs for Microsoft (netvsc) and VMware (vmxnet3), as well as KVM-based virtualized interfaces (virtio), are also implemented.

Classify libraries support exact match, longest prefix match (LPM), wildcard matching (ACL [access control list]), and the cuckoo hash algorithm. They focus on flow lookup operations for common packet processing.

Accelerator APIs support packet security, data compression, and an event model for core-to-core communication. FPGA-accelerated functions or SoC units can be hidden under the abstract software layers here.

QoS libraries provide network QoS components such as Meter and Sched.

In addition to these components, DPDK also provides platform features such as POWER, which allows the CPU clock frequency to change at runtime for energy saving, and KNI (kernel network interface), which builds a fast channel to the Linux kernel stack. The Packet Framework and DISTRIB provide the underlying components for building a more complex multicore pipeline processing model. This is an incomplete picture, as the DPDK project evolves quickly with a new release every quarter. Most DPDK components are under the BSD (Berkeley Software Distribution) license, making them friendly for further code modification and commercial adoption.
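To illustrate the "lock-free rings" mentioned among the core libraries, here is a minimal single-producer/single-consumer ring in plain C. It sketches only the idea behind DPDK's rte_ring and is not DPDK's implementation: rte_ring additionally supports multi-producer/multi-consumer modes using compare-and-swap, and real concurrent code needs memory barriers at the points noted in the comments.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8                    /* power of two so index masking works */
#define RING_MASK (RING_SIZE - 1)

struct spsc_ring {
    void     *slots[RING_SIZE];
    unsigned  head;                    /* producer writes here */
    unsigned  tail;                    /* consumer reads here  */
};

/* Single producer: returns 0 on success, -1 if the ring is full.
 * Unsigned wraparound makes head - tail the current occupancy. */
static int ring_enqueue(struct spsc_ring *r, void *obj)
{
    if (r->head - r->tail == RING_SIZE)
        return -1;                     /* full */
    r->slots[r->head & RING_MASK] = obj;
    r->head++;                         /* concurrent code: release barrier here */
    return 0;
}

/* Single consumer: returns 0 on success, -1 if the ring is empty. */
static int ring_dequeue(struct spsc_ring *r, void **obj)
{
    if (r->head == r->tail)
        return -1;                     /* empty */
    *obj = r->slots[r->tail & RING_MASK];
    r->tail++;                         /* concurrent code: release barrier here */
    return 0;
}
```

With one producer thread and one consumer thread, head and tail are each written by only one side, which is what lets this structure avoid locks entirely.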
1.4 PERFORMANCE LIMIT

The performance limit can be estimated by theoretical analysis, and the theoretical limit can come from multiple dimensions. Take packet processing as an example: the first limit is the packet forwarding rate, which is determined by the physical interface speed, also known as the line speed of a given interface. When the packet enters memory from the NIC, it goes through the I/O bus (e.g., the PCIe bus); there is a limit at the PCIe bus transaction level. Of course, there is also a ceiling on how fast the CPU can load/store packet data to cache lines; for example, the Intel® Haswell processor can only load 64 bytes and store 32 bytes in a cycle. The memory controller is limited by its memory read/write bandwidth. All these hardware platform boundaries in different dimensions contribute to a workload's performance limit. Through optimization, the software developer can look at these dimensions to write high-performing software; the goal is to get closer to the performance limit. In theory, the I/O interface, PCIe bus, memory bandwidth, and cache utilization set quantitative limits; this is easy to say but not easy to do, as it requires in-depth system-level understanding. Good developers know how to measure performance tuning progress and find potential room for improvement. It takes a lot of effort to tune a workload to get closer to the theoretical limits, and if the software is already extremely optimized, there is little return in continuing to push the boundary. DPDK has a design goal of providing software libraries that push the performance limit of packet processing. Is it designed in a way that has already reached the system limit? As we try to gain a better understanding of DPDK, let's get back to the simple facts: what are the common performance metrics used to measure packet processing?
1.4.1 THE PERFORMANCE METRIC

The most common performance indicators for packet processing are throughput, latency, packet loss, and jitter. For packet forwarding, the throughput can be measured as a packet rate, in pps (packets per second), or as a bit rate, in bps (bits per second). The bit rate is often associated with the physical interface, like the NIC port speed. Different packet sizes often indicate different requirements for packet storing and forwarding capabilities. Let's establish a basic concept of effective bandwidth and packet forwarding rates.

The line rate (transmitting at wire speed) is the maximum packet (or frame) forwarding rate, which is limited, theoretically, by the speed of the physical interface. Take Ethernet as an example. An interface defined as 1 Gbps, 10 Gbps, 25 Gbps, 40 Gbps, or 100 Gbps represents the maximum transmission rate, measured in bps. Indeed, not every bit is used to transmit effective data. There is an inter-packet gap (IPG) between Ethernet frames, 12 bytes by default, and each frame also has a 7-byte preamble and a 1-byte Ethernet start frame delimiter (SFD). The Ethernet frame format is shown in Figure 1.7. The effective data in the Ethernet frame mainly includes the destination MAC (media access control) address, source MAC address, Ethernet type, and payload. The packet tail is known as the FCS (frame checksum) code, which validates the Ethernet frame's integrity (Figure 1.7).
FIGURE 1.7 Ethernet frame format.

The Ethernet frame forwarding rate and the bit rate can be translated by the equation shown in Figure 1.8:

    Frame Forwarding Rate (pps) = (Bit Rate / 8) / (IPG + Preamble + SFD + Packet Size)

FIGURE 1.8 Packet forwarding rate: From bps to pps.

The bit rate is divided by 8 to convert bits to bytes, and the per-frame overheads (IPG, preamble, SFD) are measured in bytes; the resulting rate is often quoted in Mpps (millions of packets per second). As shown in Table 1.1, if data is transmitted at full interface speed, a small packet size implies a high packet arrival rate. In a nutshell, small packets cause a larger processing burden, as more packets arrive in a given time; the overhead can be 10× higher for a small packet (64 bytes) than for a normal packet.

TABLE 1.1 Packet Forwarding Rate with Different Packet Sizes

    Packet size    10 Gbps               25 Gbps               40 Gbps
    (Byte)         Mpps    arrival (ns)  Mpps    arrival (ns)  Mpps    arrival (ns)
    64             14.88    67.20        37.20    26.88        59.52    16.80
    128             8.45   118.40        21.11    47.36        33.78    29.60
    256             4.53   220.80        11.32    88.32        18.12    55.20
    512             2.35   425.60         5.87   170.24         9.40   106.40
    1024            1.20   835.20         2.99   334.08         4.79   208.80

1.4.2 THE PROCESSING BUDGET

What is the biggest challenge for processing packets on a general-purpose processor like a CPU? Take 40 Gbps Ethernet as an example; the curve shown in Figure 1.9 indicates the maximum forwarding rate for different packet sizes. Internet traffic is a mix of many packets of different sizes.

FIGURE 1.9 Cost of packet instructions at 40 Gbps line rate.

For packet sizes of 64 bytes and 1024 bytes, respectively, the CPU instruction cycles available (the processing budget) to meet the 40 Gbps forwarding requirement are very different: 33 cycles vs. 417 cycles. It is a very different system load when dealing with small versus large packets: the smaller the packet, the shorter the packet interval, i.e., 16.8 ns vs. 208.8 ns. Assuming the CPU frequency is 2 GHz, 64-byte and 1024-byte packets can, respectively, consume 33 and 417 clock cycles while still reaching the line rate. In the store-and-forward model, packet receiving, packet transmitting, and forwarding table lookups all need to access memory. The CPU can only wait when there is a memory access; this access takes cycles, known as the memory latency. The actual latency depends on where the data is located. If the data resides in the last level cache (LLC), the access takes about 40 clock cycles. If there is an LLC miss, an external memory read is needed, with an estimated latency of 70 ns, which translates to 140 cycles on a 2-GHz system. For a small packet size (64 bytes) at 40 Gbps, the total budget is only about 33 cycles. This poses a challenge to using a general-purpose CPU system for high-performance network workloads.

Does this rule out the CPU for high-performance network workloads? The answer is no. DPDK has a solution to tackle the challenge: a set of optimization methods is implemented to make the best use of the hardware platform features (DDIO, HugePage memory, cache alignment, thread binding, NUMA (non-uniform memory access) awareness, memory interleaving, lock-free data structures, data pre-fetching, use of SIMD instructions, etc.). By combining all these optimization methods, the memory latency challenge can be minimized. A single core can achieve an L3 packet forwarding rate of about 40 Mpps, measured in a simplified test case that just moves packets from Ethernet to CPU. Given the increasing number of CPU cores available, I/O performance can scale up with more Ethernet adapters inserted into more PCIe slots. The L3 forwarding throughput has been measured on a 4-socket system, reaching about 1 Tbps on the Intel® Xeon Scalable Processor; this significant milestone was announced by an open-source project, FD.io/VPP, in 2017 [3]. DPDK is not yet able to handle the most extreme I/O use cases. For example, the optical transport interface needs 400 Gbps support; this is too much for a software-only approach on a server platform, as the necessary PCIe bus and NICs are not available yet.
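The frame-rate equation and the cycle-budget numbers above can be checked with a few lines of arithmetic. The constants (12-byte IPG, 7-byte preamble, 1-byte SFD, 2-GHz clock) come straight from the text; the function names are just for illustration.

```c
#include <assert.h>
#include <math.h>

/* Frame Forwarding Rate (Mpps) =
 *     (Bit Rate / 8) / (IPG + Preamble + SFD + Packet Size)           */
static double frame_rate_mpps(double gbps, double frame_bytes)
{
    double wire_bytes = 12 + 7 + 1 + frame_bytes; /* IPG + preamble + SFD + frame */
    return gbps * 1e9 / 8.0 / wire_bytes / 1e6;
}

/* CPU cycles available per packet at a given core clock and packet rate. */
static double cycle_budget(double core_ghz, double mpps)
{
    return core_ghz * 1e9 / (mpps * 1e6);
}
```

For 64-byte frames at 40 Gbps this gives 59.52 Mpps and a budget of about 33.6 cycles per packet on a 2-GHz core; for 1024-byte frames it gives 4.79 Mpps and about 417 cycles, matching Table 1.1 and the discussion above.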
DPDK lowers the barrier to network system development, and now software developers can implement network functions.

1.5 DPDK USE CASE

DPDK is a new open-source technology, and it is rising together with the server platform, which is going through rapid development and gaining more CPU cores and higher-speed Ethernet interfaces. The server platform cost is very appealing when compared to mainstream network/storage systems (which are expensive
due to their purpose-built silicon-based system design). From the business side, this open-source project reduces the software investment. Overall, the trend is to build software-defined infrastructure on server platforms, and DPDK has been proved to deliver accelerated network, computing, and storage functions.

1.5.1 ACCELERATED NETWORK

DPDK, as open source, can be used immediately without license cost. What's more exciting, it delivers a performance boost with each new generation of server platform. A processor refresh means improved IPC (instructions per cycle) at the per-core level, and it may be enhanced with a more effective system bus connected to more cores. DPDK on a single core can deliver 10/25 Gbps easily. Together with multiple queues in the NIC, DPDK drives the 40/100 Gbps interface using two or more cores. The Xeon processor has many cores, and the packet throughput bottleneck is often the PCIe (I/O) interface and lanes. Indeed, I/O is the limit, not the CPU.

Prior to DPDK (or similar technology), a network infrastructure system was often highly complicated; hardware and software co-design was common, and it required engineers with both hardware and software skills. So network systems were mainly built by large engineering organizations at high cost. Such a system involves a chassis in which different, purpose-built boards are connected as subsystems, which may meet the needs of signal processing, data plane, control, and application services. Each board is built with specific embedded software and its function; hence, it requires dedicated engineering investment. Decoupling hardware and software development, and building software on top of a common server system, is a big and disruptive change: the server platform and high-speed NICs are easy to get; it is flexible to add more NICs to meet more I/O demand; and the cost is very transparent on the server platform.
It is easy to find software talent who can write code and load and debug programs on a standard server platform. It is difficult to find designers who understand the NPU or network ASIC/FPGA needed to build a highly complex system. In the cloud computing industry, software developers adopted open source on the server platform: they built load balancers and anti-DDoS systems using DPDK and ran the software on standard servers for in-house production deployment. This was an early example of replacing a commercial load balancer system. Later, the network gateway could be moved to run within a virtual machine and be offered as an elastic network service. A similar idea, known as network function virtualization (NFV), emerged in 2012. Server virtualization led to cloud computing; it allows multiple tenants to run workloads in logically isolated environments consolidated on a physically shared server. By replicating this in the telecom network infrastructure, NFV is intended to run network functions as tenant workloads and consolidate multiple workloads on a physically shared server. This allows telecom service providers more flexibility in choosing vendors and solution suppliers, e.g., decoupling the hardware and software suppliers. This idea is an important step in building out 5G infrastructure and network services. Software-defined networking (SDN)/NFV is a big wave transforming network infrastructure; to build the new systems, DPDK is a valuable part of the software architecture.
1.5.2 ACCELERATED COMPUTING

For network nodes, the value of DPDK is easy to understand. In fact, DPDK is also very helpful for the cloud computing node, e.g., for Open vSwitch acceleration. Open vSwitch is responsible for creating the overlay network that isolates multiple tenants; any traffic to a tenant workload goes through Open vSwitch. DPDK is used to accelerate Open vSwitch, which performs better than the Open vSwitch (OVS) kernel data path and saves CPU cycles for the cloud computing server. The Linux kernel protocol stack provides rich and powerful network services. However, it is also slow. A user space stack on top of DPDK is highly desired for a further performance boost. One known attempt applies the BSD protocol stack in user space, built on DPDK. Tencent, one of the biggest cloud computing companies, released this project, F-Stack, in open source; it is available at http://www.f-stack.org/. Furthermore, web application software like Nginx and in-memory database software like Redis can be easily integrated with F-Stack. Nginx and Redis are very popular computing workloads running in the cloud data center.

1.5.3 ACCELERATED STORAGE

In 2016, Intel® open-sourced the Storage Performance Development Kit (www.SPDK.io). A storage device is similar to a NIC in that it is an I/O device, but for data storage; a user space PMD is more effective than a Linux kernel driver in the high-performance scenario, and the PMD can drive the NVMe device. It allows the application to have faster access to the SSD. SPDK provides other components such as NVMe-oF, iSCSI, and vhost server support. We will describe storage and network acceleration in later chapters.

1.6 OPTIMIZATION PRINCIPLES

While DPDK has adopted lots of optimization methods, many approaches are applicable to any software. The core principles are believed to be reusable in other areas, as follows:

1.
Target Software Optimization for a Specific Workload
Specialized hardware is one way to achieve high performance, but DPDK uses the general-purpose processor and reaches the desired high-performance goal with software optimization techniques. In the early phase, the research efforts covered all relevant platform components such as CPU, chipset, PCIe, and NIC. The optimization always focuses on the network workload characteristics.

2. Pursuing Scalable Performance
The multicore era is a significant silicon advance. It enables high parallelism to achieve scalable performance. To avoid data contention, the system design needs to avoid data races as much as possible. That is to say, design data structures as local variables, and use lockless designs to gain high throughput. Focus on an architectural approach to exploit multicore for performance scaling.
3. Seeking Cache-Centric Design and Optimization
Compared with system and algorithm optimization, code implementation optimization is much less known and often ignored. Code implementation optimization requires developers to have a good understanding of computer and processor architecture. DPDK depends on optimization techniques such as careful cache usage and awareness of memory access latency impacts. As a programmer, if the code is written to take cache utilization into account, the software optimization is probably half-finished. Most new software developers may not think about how the cache will behave at all.

4. Theoretical Analysis with Practice
What is the performance limit? Is there any room for performance tuning? Is it worth in-depth study? Sometimes this is easy to ask but difficult to answer. Through analysis, inference, prototyping, and testing over and over, optimization is often an experimental journey. It is incremental progress, and if time permits, it is always good to have a performance model and analysis, as this helps set achievable design goals.

Cloud computing is essentially taking more workload into a physical system. Its success is due to CPUs having many cores. The edge computing platform is the new trend, i.e., moving the computing closer to the data, which provides a low-latency computing experience and drives new use cases. DPDK can move data faster and store data quicker. DPDK, as user space networking, is known for bypassing the heavy Linux kernel networking stack. Many of the DPDK optimization principles are also applicable to the Linux kernel network; there has been excellent progress in the kernel stack, such as XDP and AF_XDP. As the Linux kernel comes up with its own bypass mechanisms, more options are available for network developers to choose from.

1.7 DPDK SAMPLES

DPDK concepts have been discussed; three examples are given here to get started with the code for a quick look and feel.

1.
HelloWorld is a simple example. It sets up a basic running environment for packet processing. DPDK establishes an environment abstraction layer (EAL), based on a Linux(-like) operating system, and this environment is optimized for packet processing.

2. Skeleton is the most streamlined single-core packet sending and receiving example. It may be one of the fastest packet in/out testing codes in the world.

3. L3fwd, Layer 3 forwarding, is one of the main DPDK applications showcasing the use case, and it is heavily used for performance benchmark tests.

1.7.1 HELLOWORLD

HelloWorld is a simple sample in both code and function. It creates a basic running environment for multicore (multi-thread) packet processing. Each thread will print a message "hello from core #". The core # is managed by the operating system.
Unless otherwise indicated, a DPDK thread in this book is associated with a hardware thread. The hardware thread can be a logical core (lcore) or a physical core. One physical core can become two logical cores if hyper-threading is turned on. Hyper-threading is an Intel® processor feature that can be turned on or off via BIOS/UEFI. In the code example, rte refers to the runtime environment and eal means environment abstraction layer (EAL). Most DPDK APIs are prefixed with rte. Similar to most parallel systems, DPDK adopts a model of one master thread and multiple slave threads, which often run in endless loops.

int main(int argc, char **argv)
{
    int ret;
    unsigned lcore_id;

    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_panic("Cannot init EAL\n");

    /* call lcore_hello() on every slave lcore */
    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
        rte_eal_remote_launch(lcore_hello, NULL, lcore_id);
    }

    /* call it on master lcore too */
    lcore_hello(NULL);

    rte_eal_mp_wait_lcore();
    return 0;
}

1.7.1.1 Initialize the Runtime Environment

The main entry of the master thread is the main function. It invokes the entry function below to start the initialization:

int rte_eal_init(int argc, char **argv);

The entry function supports command line input, which is a long string of command combinations. One of the most common parameters is "-c <core mask>"; the core mask assigns which CPU threads (cores) run the DPDK master and slave threads, with each bit of the mask representing a specific core. "cat /proc/cpuinfo" can be used to inspect the CPU cores on the given platform. Core selection needs care on a dual-socket platform, as local cores and remote cores can deliver different workload performance.
As said, rte_eal_init includes a list of complicated tasks, e.g., parsing the input command parameters, analyzing and configuring DPDK, and setting up the runtime environment. It can be categorized as follows:

• Configuration initialization;
• Memory initialization;
• Memory pool initialization;
• Queue initialization;
• Alarm initialization;
• Interrupt initialization;
• PCI initialization;
• Timer initialization;
• Memory detection and NUMA awareness;
• Plug-in initialization;
• Master thread initialization;
• Polling device initialization;
• Establishing master-slave thread channels;
• Setting the slave threads to wait mode;
• Probing and initializing PCIe devices.

For further details, it is recommended to read the online documentation or even the source code. One place to start is lib/librte_eal/common/eal_common_options.c. For DPDK users, the initialization has been wrapped behind the EAL interface. A deep dive is only needed for in-depth DPDK customization.

1.7.1.2 Multicore Initialization

DPDK always tries to run with multiple cores for high parallelism. The software program is designed to occupy a logical core (lcore) exclusively. The main function is responsible for creating a multicore operating environment. As its name suggests, RTE_LCORE_FOREACH_SLAVE(lcore_id) iterates over all usable lcores designated by EAL and then launches a designated thread on each lcore through rte_eal_remote_launch:

int rte_eal_remote_launch(int (*f)(void *), void *arg, unsigned slave_id);

"f" is the entry function that the slave thread will execute. "arg" is the input parameter passed to the slave thread. "slave_id" is the designated logical core to run as a slave thread. For example, in rte_eal_remote_launch(lcore_hello, NULL, lcore_id), the parameter lcore_id designates a specific core to run as a slave thread, executing from the entry function lcore_hello.
In this simple example, lcore_hello just reads its own logical core number (lcore_id) and prints out "hello from core #".
static int
lcore_hello(__attribute__((unused)) void *arg)
{
    unsigned lcore_id;

    lcore_id = rte_lcore_id();
    printf("hello from core %u\n", lcore_id);
    return 0;
}

In this simple example, the slave thread finishes its assigned work and quits immediately. As a result, the core is released. In most other DPDK-based samples, the slave thread runs as an infinite loop, taking care of the packet processing.

1.7.2 SKELETON

This sample only uses a single core. It is probably the only DPDK sample that runs with a single core, and it is designed to implement the simplest and fastest packet in and out of the platform. The received packets are transmitted out directly, without any meaningful packet processing. The code is short, simple, and clean. This is a test case to measure single-core packet in/out performance on a given platform.

The pseudocode calls rte_eal_init to initialize the runtime environment, checks the number of Ethernet network interfaces, and allocates the memory pool via rte_pktmbuf_pool_create. One input parameter is rte_socket_id(), which specifies which memory to use; with no doubt, it always prefers the local memory on the local socket. We will explain the basic concept later. Then, the sample calls port_init(portid, mbuf_pool) to initialize the Ethernet port with the memory configuration, and finally, it calls lcore_main() to start the packet processing.

int main(int argc, char *argv[])
{
    struct rte_mempool *mbuf_pool;
    unsigned nb_ports;
    uint8_t portid;

    /* Initialize the Environment Abstraction Layer (EAL). */
    int ret = rte_eal_init(argc, argv);

    /* Check there is an even number of ports to send/receive on. */
    nb_ports = rte_eth_dev_count();
    if (nb_ports < 2 || (nb_ports & 1))
        rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");

    /* Creates a new mempool in memory to hold the mbufs. */
    mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
        MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    /* Initialize all ports. */
    for (portid = 0; portid < nb_ports; portid++)
        if (port_init(portid, mbuf_pool) != 0)
            rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu8 "\n", portid);

    /* Call lcore_main on the master core only. */
    lcore_main();
    return 0;
}

1.7.2.1 Ethernet Port Initialization

port_init(uint8_t port, struct rte_mempool *mbuf_pool)

This function is responsible for Ethernet port configuration, such as the queue configuration; in general, an Ethernet port is configurable with multi-queue support. Each receive or transmit queue is assigned memory buffers for packets in and out. The Ethernet device places the received packets into the assigned memory buffers (DMA), and the buffers are part of the memory pool, which is assigned at the initialization phase in a socket-aware way.

It is important to configure the number of queues for the designated Ethernet port. Usually, each port contains many queues. For simplicity, this example only specifies a single queue. For packet receiving and transmitting, port, queue, and memory buffer configuration are separate concepts. If no specific configuration is given, the default configuration is applied.

Ethernet device configuration: Set the number of receiving and transmitting queues on a specified port, and configure the port with the input options.

int rte_eth_dev_configure(uint8_t port_id, uint16_t nb_rx_q,
    uint16_t nb_tx_q, const struct rte_eth_conf *dev_conf)

Ethernet port/queue setup: Configure a specific queue of a specified port with memory buffers, the number of descriptors, etc.
int rte_eth_rx_queue_setup(uint8_t port_id, uint16_t rx_queue_id,
    uint16_t nb_rx_desc, unsigned int socket_id,
    const struct rte_eth_rxconf *rx_conf, struct rte_mempool *mp)

int rte_eth_tx_queue_setup(uint8_t port_id, uint16_t tx_queue_id,
    uint16_t nb_tx_desc, unsigned int socket_id,
    const struct rte_eth_txconf *tx_conf)

After the Ethernet port initialization is completed, the device can be started with

int rte_eth_dev_start(uint8_t port_id).
Upon finishing, the Ethernet port has its physical MAC address, and the port is turned on in promiscuous mode. In this mode, incoming Ethernet packets can be received into memory, allowing the core to do further processing.

static inline int
port_init(uint8_t port, struct rte_mempool *mbuf_pool)
{
    struct rte_eth_conf port_conf = port_conf_default;
    const uint16_t rx_rings = 1, tx_rings = 1;

    /* Configure the Ethernet device. */
    retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);

    /* Allocate and set up 1 RX queue per Ethernet port. */
    for (q = 0; q < rx_rings; q++) {
        retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
            rte_eth_dev_socket_id(port), NULL, mbuf_pool);
    }

    /* Allocate and set up 1 TX queue per Ethernet port. */
    for (q = 0; q < tx_rings; q++) {
        retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
            rte_eth_dev_socket_id(port), NULL);
    }

    /* Start the Ethernet port. */
    retval = rte_eth_dev_start(port);

    /* Display the port MAC address. */
    struct ether_addr addr;
    rte_eth_macaddr_get(port, &addr);

    /* Enable RX in promiscuous mode for the Ethernet device. */
    rte_eth_promiscuous_enable(port);

    return 0;
}

The packet reception and transmission is done in an endless loop, implemented in the function lcore_main. It is designed with performance in mind and validates that the assigned CPU cores (lcores) and Ethernet devices are physically on the same socket. It is highly recommended to use a local CPU and a local NIC on the local socket; it is known that a remote socket brings a negative performance impact (more details will be discussed in a later section). The packet processing is done with the packet burst functions. On both the receive (rx) and transmit (tx) sides, four parameters are given, namely, port, queue, packet buffer, and the number of burst packets.
Packet RX/TX burst functions:

static inline uint16_t rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
    struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)

static inline uint16_t rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
    struct rte_mbuf **tx_pkts, uint16_t nb_pkts)

Now we have the basic look and feel of DPDK packet receiving and transmitting code. The software has no dependency on a vendor-specific NIC. From the very beginning, DPDK takes the software design into account, and the device abstraction layer is well designed to run across platforms and NICs from multiple vendors.

static __attribute__((noreturn)) void
lcore_main(void)
{
    const uint8_t nb_ports = rte_eth_dev_count();
    uint8_t port;

    for (port = 0; port < nb_ports; port++)
        if (rte_eth_dev_socket_id(port) > 0 &&
            rte_eth_dev_socket_id(port) != (int)rte_socket_id())
            printf("WARNING, port %u is on remote NUMA node to "
                "polling thread.\n\tPerformance will "
                "not be optimal.\n", port);

    /* Run until the application is quit or killed. */
    for (;;) {
        /*
         * Receive packets on a port and forward them on the paired
         * port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
         */
        for (port = 0; port < nb_ports; port++) {
            /* Get burst of RX packets, from first port of pair. */
            struct rte_mbuf *bufs[BURST_SIZE];
            const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
                bufs, BURST_SIZE);

            if (unlikely(nb_rx == 0))
                continue;

            /* Send burst of TX packets, to second port of pair. */
            const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
                bufs, nb_rx);

            /* Free any unsent packets. */
            if (unlikely(nb_tx < nb_rx)) {
                uint16_t buf;
                for (buf = nb_tx; buf < nb_rx; buf++)
                    rte_pktmbuf_free(bufs[buf]);
            }
        }
    }
}

1.7.3 L3FWD

This is a famous and popular DPDK example, as it is frequently used to measure DPDK performance metrics. In the typical test scenario, a server is installed with high-speed Ethernet adapters connected via PCIe slots. The Ethernet ports of the server platform are connected to an external hardware packet generator (usually from IXIA or Spirent), and the l3fwd sample can easily demonstrate a 200 Gbps forwarding rate on dual-socket server platforms. In this sample, a packet is received from Ethernet. The CPU checks the IP header for validation and completes the routing table lookup using the destination IP address. Once the destination port is found, the packet is sent out with an IP header modification such as the TTL update. Two routing table lookup mechanisms are implemented: the exact match based on the destination IP address, and the LPM-based lookup. The l3fwd sample contains more than 2,700 lines of code (including blank lines and comment lines), and the main body is actually a combination of HelloWorld and Skeleton.

To run this sample, the command parameters are given in the following format:

./build/l3fwd [EAL options] -- -p PORTMASK [-P]
    --config (port,queue,lcore)[,(port,queue,lcore)]

The command parameters are divided by "--" into two parts.

• The section after "--" contains the command options of the l3fwd sample.
• The section before "--" is used for DPDK's EAL options, mainly for runtime environment initialization and configuration. [EAL options] sets up the runtime environment, and it is passed over to rte_eal_init for processing.
• PORTMASK identifies the Ethernet ports for DPDK use. By default, an Ethernet device is managed by the Linux kernel driver.
For example, the device name is "eth1". In today's DPDK version, the user can bind a specific device to DPDK, which uses the igb_uio kernel module to allow
the device configuration in the user space, where DPDK can take control of the device. A script is available to help with the device bind operation, known as dpdk-devbind.py. The example below binds "eth1" for DPDK use:

dpdk-devbind --bind=igb_uio eth1

Note: In early DPDK versions, the DPDK initialization scanned the known PCIe devices for use, which could lead to an in-use network port being disconnected.

The l3fwd sample options configure a scalable, performant approach on the basis of the (port, queue, lcore) configuration. It connects the assigned core with an Ethernet port and queue. In order to achieve a high packet forwarding rate, multiple CPU cores can work together, and each core drives a specific port and queue for packet I/O (Table 1.2).

TABLE 1.2 L3fwd Common Options: Port, Queue, Core

Port   Queue   Thread   Characterization
0      0       0        Queue 0 of processing port 0, thread 0
0      1       2        Queue 1 of processing port 0, thread 2
1      0       1        Queue 0 of processing port 1, thread 1
1      1       3        Queue 1 of processing port 1, thread 3

The master thread is similar to HelloWorld or Skeleton, and no further explanation is required here:

Initialize the running environment: rte_eal_init(argc, argv);
Parse the input parameters: parse_args(argc, argv)
Initialize lcore and port configuration
Initialize Ethernet ports and queues, similar to the Skeleton sample
Start the Ethernet port
Invoke the slave threads to execute main_loop()

The slave threads do the actual packet I/O, and the entry function is known as main_loop(). It runs as follows:

Read lcore information to complete configuration;
Read information about send and receive queues;
Packet loop processing:
{
    Send packets in bulk to the transmit queue;
    Receive packets in bulk from the receive queue;
    Forward packets in bulk;
}

Sending packets in bulk (or in bursts) to the designated queue and receiving packets in bulk from the designated queue are common in DPDK. It is an effective way for
optimal platform resource use. Batched packet forwarding is based on either the exact match (hash) or the LPM lookup, selected as a compilation option. The example includes a code implementation based on SSE, known as the "multi-buffer" principle, which is a known practice to get more performance on Intel® processors.

So far, lots of code has been shown here, and the intent is to give a quick feel for what DPDK is. In the later chapters, this book will not attempt much code deep dive. It is difficult to keep up to date with the latest code development, as the DPDK community is still very active, so the quoted code might be obsolete by the time this book is published.

1.8 CONCLUSION

What is DPDK? It is a set of software libraries, implemented on the basis of software optimization principles and practices, hosted as an open-source project under the Linux Foundation. It is known for moving packets into the server platform, and for moving packets into virtual machines (or container tenants). DPDK has been included in many Linux distributions, such as Red Hat, CentOS, and Ubuntu. DPDK is established as the leading user space networking open-source project. DPDK is widely adopted for networking and cloud infrastructure, network appliances, and virtual switch and storage acceleration systems worldwide.

FURTHER READING

1. https://wiki.linuxfoundation.org/networking/napi.
2. http://info.iet.unipi.it/~luigi/netmap/.
3. https://fd.io/2017/07/fdio-doubles-packet-throughput-performance-terabit-levels/.
2 Cache and Memory

Chen Jing, Alibaba
Heqing Zhu, Intel®

CONTENTS
2.1 Data Access and Latency ............................................ 31
2.2 Intel® Xeon Architecture ........................................... 33
2.3 Intel® Atom® SoC ................................................... 36
2.4 Cache .............................................................. 37
    2.4.1 Cache Line ................................................... 38
    2.4.2 Cache Prefetching ............................................ 38
    2.4.3 Software Prefetching ......................................... 40
    2.4.4 False Sharing ................................................ 43
    2.4.5 Cache Coherency .............................................. 43
    2.4.6 Noisy Tenant and RDT ......................................... 45
2.5 TLB and HugePage ................................................... 45
    2.5.1 Reserve HugePage at Boot Time ................................ 46
    2.5.2 Reserve HugePage at Runtime .................................. 46
2.6 Memory Latency ..................................................... 47
2.7 DDIO ............................................................... 48
2.8 NUMA ............................................................... 48
Further Reading ........................................................ 49

DPDK is initially optimized for Intel®-based network platforms. This chapter focuses on cache and memory, using the Intel® processor-based server platform as the hardware example. This chapter does not cover non-Intel® platforms; however, the concepts should be very similar and applicable to other architectures and platforms.

2.1 DATA ACCESS AND LATENCY

Generally, the computer system consists of cache, memory, and storage hardware modules for data processing, movement, and storage. Cache and DRAM hold the runtime data that the CPU needs to access, and the data goes away if the system is powered off. Hard drives, SSDs, optical disks, and USB flash drives are persistent data storage devices; once the data is written, it exists on these devices even after the system is powered off. Cache and memory controllers have been an integrated part of modern processors. Typically, processors always access data from storage device
  • 55. Random documents with unrelated content Scribd suggests to you:
  • 56. T CHAPTER II.—COURAGE TAKES HEART. his time, as before, there is a story to tell because of something braved and dared for Miss Julia’s sake; something that needed less nerve, perhaps, than the leap Courage took that night on the drawbridge, but something that called not only for a world of a different sort of courage, but for infinite patience as well, and that claimed the whole summer for its doing. The reason for it all lay in four little words—Miss Julia was dead. Beautiful, strong, radiant Miss Julia! why, no one had thought of death for her, save as years and years away in the serene twilight of a calm old age; and yet it had come, suddenly, after a week’s brief illness, and Courage was simply broken-hearted. She felt she had no right to her name now, and never should have again. Miss Julia had been teacher, mother, friend to her, one or the other almost since her babyhood, and to care for Miss Julia in return, now that she herself was grown up, to let every thing else “come second,” had been her only thought. And now to find her hands suddenly empty, and all the sunshine gone out of her life—was it strange that she felt despairing and desolate and that nothing whatever was left? “But we are left,” pleaded a chorus of little voices, and Courage seemed to see four brighteyed little children; bright-eyed because God had made them so, but with faces almost as sad as her own. “Yes, we are left,” they continued pleading. “Miss Julia was going to do so much for us this summer; could not you do it in her place for her sake?” Courage shook her head gravely as in answer to her own thoughts.
  • 57. “No, I cannot,” she said, firmly. “Everything that I leaned on is gone; nothing is left to me—nothing.” “But could you not try just for her sake?” chorused the little voices over and over in her heart, day after day, in all the sad hours of waking, and sometimes even in sleeping, until at last she bravely brushed the tears away and made answer, “Yes, for her sake I will!” She remembered the day of her six-year-old christening, when her remarkable name had been given her and she had asked: “Is courage something that people have, Papa? Have I got it?” and he had told her, “Courage is something that people have, dear, something fine, and I hope you will have it.” Yes, she would try, even in this dark hour, to live up to her father’s hope for her, and so her resolve was taken. But the four bright-eyed little children knew nothing of any resolve; they would not have understood what it meant if they had, and as for their singing a pathetic little chorus in any one’s heart, they were altogether unconscious of that as well. But one thing they did know, and that was they should never see Miss Julia again in this world, and they thought they also knew that a beautiful plan she had made for them could never be carried out. The wisest thing, therefore, for these four little people was to put, so far as possible, all thought of the plan from their minds, and Mary, the eldest of the four, said as much to the others. “Oh, don’t let us think about it any more,” she urged, earnestly. “If we only could have Miss Julia back what would we care for anything else? Besides, when you think what has happened, it seems selfish, and as though we did not have any hearts, to grieve over our own little plans for a moment.” “But it wasn’t just over our own little plan,” insisted her younger brother Teddy, “it was Miss Julia’s plan for us, and I don’t think it strange a bit that we should grieve over it.” “Neither do I,” urged Allan, who came next to Teddy in age. 
“Of course us boys, not going to the sewing-school, did not know Miss
Julia as well as you, but I just guess there wasn’t a boy who thought more of her than I did. What’s more I loved her; not making a fuss over her, to be sure, like you girls, still I did really love her,” (emphasising the word by a shake of his head, and firm pursing of his lips). “All the same, I think it’s natural we should feel awfully disappointed.” Gertrude who was seven, and the youngest of the four, nodded in approval of the stand Allan had taken, and continued nodding, as he added, “We haven’t travelled so much, seems to me, or had so much change in our lives as to settle back to the idea of a hot summer here in town, instead of going to the country, without feeling it a bit; that is, I don’t think we have.” Mary sighed and said nothing, as though ready to admit, after all, that perhaps it was natural that they should take their disappointment somewhat to heart, but the tears that had sprung suddenly into her eyes were from real longing for Miss Julia and not from the disappointment. This quiet talk in which the little Bennetts were indulging, was being carried on from the backs of two horses—the two girls mounted upon one and the two boys astride the other—but they happened to be the quietest horses in the world; horses that never budged in fact, tailless and headless, and that belonged to the
carpenter who lived on the first floor. The Bennetts lived on the top floor; but whenever there was anything to be talked over, down they trooped to the yard and climbed and helped each other to the backs of these high seats, and when all were able to declare themselves perfectly comfortable the conclave would commence. The little Bennetts were great talkers. They simply loved to discuss things, and this shows, when you stop to consider it, that they must be, on the whole, an amiable little family, for some little people that we hear of are quite too impatient and self-assertive to be willing to discuss things at all. But whatever may have been the faults of the little Bennetts they did have respect for each other’s opinions, and were generally ready to admit that two heads were better than one, and “Four heads,” to quote little Gertrude, “four times as better.” This habit of discussion, for it really amounted to that, was partly no doubt the outcome of a little strategy on the part of their mother. Mary and Teddy and Allan and Gertrude were just a “pair of steps,” as the saying goes, and sometimes the little living-room on the fourth floor seemed all too small for the noisy company, and then Mrs. Bennett would exclaim, and as though the most novel sort of an idea had occurred to her: “Children, why don’t you run down to the yard and have a good talk?” There was no resisting this appeal, such untold delights were implied in Mrs. Bennett’s tone and manner, and the children seldom failed to act upon the advice, and what was more, seldom failed to light upon some interesting thing to talk about; and then, always as a last resort, some one could tell a story. The some one was generally Teddy, for he had the wildest imagination, and could upon any and every occasion invent most thrilling romances, which were quite as much of a surprise to himself as to his hearers.
And so the children had come to love their perch in the corner of the city yard, with the uncertain shade of an old alanthus flickering over them in summer, and the bright sun streaming full upon them in its leafless winter days. And this was how it chanced that the Bennett children found themselves in their old haunt that breezy May morning, and
were easing their heavy little hearts by frankly admitting to one another how very great indeed was their disappointment. Better so, I think. Wrinkles come earlier and plow deeper, and thoughts are apt to grow bitter and morbid, when one broods and broods, and will not take hearts near and dear into one’s confidence. The day never dawns when truly brave hearts cry out for pity, but sympathy is a sweet and blessed thing the world over, and God meant not only that we should have it, but that, if need be, we should reach our hands and grasp it. There was one little Bennett, however, who did not share in the general depression. Too short a time in the world to know aught of its joys or sorrows, Baby Bennett lay comfortably in his mother’s lap, having just dropped off to sleep after a good half hour of rocking. Mrs. Bennett, who had herself grown drowsy with her low crooning over the baby, glanced first at the bustling little clock on the mantel shelf, and then, leaning her head against the back of the chair, closed her eyes; but instead of falling asleep she fell to thinking, and then her face grew very sad and tears made their way from beneath her closed eyelids. So, you see, the mother-heart was heavy as well as the child-hearts in the Bennett family, and for the same reason. It was not because they were not learning to face and accept the thought that Miss Julia, whom they so dearly loved, could not return to them; they were trying to be as brave as Miss Julia herself would have had them. But this was the day, the very day that they were all to have started, and they could not seem to forget it for a moment; neither could somebody else, and soon there came a gentle knock at Mrs. Bennett’s door. “Come in,” she answered, forgetting the tears in her eyes; and, laying the baby in its little clothes-basket of a bed, she turned to greet the newcomer. Courage had mounted the four flights of stairs very bravely, but the sight of the tears in Mrs.
Bennett’s eyes disarmed her, and, sinking into the nearest chair, she found she would best not try to speak for a moment.
“Oh, I’m so sorry, Miss Courage, that you should have seen me,” said Mrs. Bennett, with a world of regret in her voice; “it is so much harder for you than for anybody, but this was the day, you know, almost the very hour.” “Yes, I know,” Courage faltered; “that was why I came.” “It’s like you, Miss Courage; you’ve Miss Julia’s own thoughtfulness, but I’m thinking it will be easier for us all when this day’s over. I got rid of the trunk last week; it seemed to make us all so disheartened to have it standing round.” “You didn’t sell it, did you?” “No, indeed I did not, for it may be the children will have a chance yet some day, for a bit of an outing.” “I have decided they are all to have it yet, Mrs. Bennett, this very summer, and just as Miss Julia planned, too. That’s what I came to tell you, if you will trust them to me.” “Trust you! Oh, my dear! but it would be too much care for those young shoulders; too much by far.” “Mrs. Bennett,” said Courage, so earnestly as to carry conviction, “I thought so at first, too, but the plan has grown to be just as dear to me as it was to Miss Julia, and now, if you do not let me carry it out, I do not see how I can ever live through this first summer.” “Then indeed I will let you,” and then she added slowly, and with an accent on every word, “and you are just Miss Julia’s own child!” and Courage thought them the very sweetest words she had ever heard, or ever could hear again. “May I tell the children?” she asked, eagerly. “Where are they?” Mrs. Bennett did not answer. I believe she could not, but she opened the window and Courage knew that meant the children were below in their favourite corner. “Oh, let me call them, please,” resting one hand on Mrs. Bennett’s arm and leaning far out over the sill.
“Children! come up stairs for a moment, I have something to tell you. Come up quickly.” Courage hardly knew her own voice, it rang out so cheerily. “Oh, Miss Courage!” chorused four little voices, only this time the sound was in her ears as well as in her heart, and as she watched the children tumble helter-skelter from the horses in the yard way down below her, a smile that was almost merry drove the shadows from her face.
CHAPTER III.—A DELIGHTFUL DISCOVERY.

“Why, whatever’s going on here?” exclaimed Brevet. “Oh, yes,” said Joe, turning slowly round, for he knew what had attracted Brevet’s attention. “I done notice it on de way up ter Ellismere fo’ you dis mornin’, an’ den I was so took up with dat fascinatin’ song of yo’s as we drove back, dat I didn’t want to interrupt you long ’nuff to call yo’ attention to it. Looks as dough dere mus’ be some one come ter live in de pretty little house, doesn’t it?” “Why, yes, it does,” said Brevet, very much interested; “and you don’t know who it is, Joe?” “No, I hasn’t knowed nuffin’ ’bout it, till I seed de whole place lookin’ so pert like dis mornin’,” and Joe brought old Jennie to a standstill that they might more fully take in the situation. “Don’t you think I ought to find out, Joe?” “Why, yes, Honey, seems ter me it would be sort of frien’ly,” and suiting the action to the word he took Brevet by the arms and dropped him down over the cart-wheel. The change that had come over this point in the road was indeed remarkable. A little house that had remained untenanted for years, in the midst of an overgrown enclosure, stood this bright June morning with every door and window open to the air and sunshine. The vines which had half hidden it from view had already been cut away, and on every hand were signs that the place was being brought into liveable shape with all possible expedition. No one was in sight, so Brevet noiselessly pushed open the gate, and, making his way to the little front porch, reached upward and lifted the brass
knocker of the open door. The unexpected sound instantly brought a neatly-dressed, elderly-looking woman from some room in the rear. “How’dy,” said Brevet, instantly put at his ease by the kindness of the woman’s face. “What did you say, dear?” she asked, with a puzzled frown. “I said how’dy,” explained Brevet, wondering that the woman’s face still wore the puzzled look. “We just stopped to ask who was
coming. We go by here very often, Joe and I,” pointing to the cart, “and we were wondering what was up seeing this place open that’s been closed so long.” “It can’t be that Miss Julia’s self is a comin’ can it?” called Joe, for the little house was not set so far back from the road but that he could hear every word spoken between the woman and Brevet. “Why, did you know Miss Julia?” she asked, stepping at once to the gate, with Brevet following close behind her. “No, Miss; dat is not personally, but I knowed dat Miss Julia owned dis little plantation, an’ I often wonder dat she never done come to live on it. I can ’member when her Uncle Dave was livin’, an’ it was den des de homiest little homestead in de country.” “You have not heard then of Miss Julia’s death?” “No,” exclaimed Joe, with as much feeling in his voice as though Miss Julia had indeed been an old friend; “you don’ tell me! I’se often heard what a reg’lar lady she was, and often wished I done have a chance to lay eyes on her.” “She was a very good friend to me,” said the woman, sorrowfully, “and she had expected to come down here this summer and open the house, and bring a little family of city children with her who had never spent a day in the real country in their lives.” “You don’t say so!” said Joe, shaking his head sadly. “It’s strange what times de Lord chooses to call de good folks out of dis worl’.” And then he added, after a moment of respectful silence, “But de place here, am it sold to some new party?” “No; Miss Julia left it in her will to a young lady who was just the same as a daughter to her, and she has decided to come down in Miss Julia’s place this summer.” “And bring the little children?” asked Brevet, eagerly. “And bring the little children,” answered the woman, her face brightening. “I have come down to make everything ready for them, and they are coming on Friday.”
“Oh, do you think I could know them?” “Of course you can know them. You must come and see them so soon as ever they come. But you must tell me your name so that I can tell them about you.” “My name is Howard Ellis, but that name isn’t any use now. Everybody calls me Brevet since I and the Captain here have grown to be such friends. It means kind of an officer in the army, and when I grow up I’m going to West Point and learn how to be a real officer, and not just kind of a one at all. But till then everybody’s going to call me Brevet. And now what is your name please, and the children’s, because I want to tell my grandnana all about you?” “Well, my name is Mary Duff, dear, and the children are named Bennett—Mary and Teddy and Allan and Gertrude Bennett.” “Oh, are two of them boys?” and Brevet’s face was radiant. “I haven’t had a boy to play with ever hardly, but I s’pose they’re older boys than me,” he added, a little crestfallen; “almost all boys are.” “Well, Teddy is not very much older, just a little, and Allan is just about your age I should say. Never you fear, Brevet, you’ll have beautiful times with them all, I know.”
“When shall I come then?” wishing to have matters very definitely arranged. “Do you think they would like to have me here to help them feel at home right off at the very first?” “Well, I should not wonder but they would like that very much indeed.” “Then I will come on Friday.” “You mean you will ask your grannana, Brevet,” said Joe, significantly.
“Oh, yes; I mean I will ask if I may come.” This last very quickly and eagerly, remembering his little lecture of the morning. “Well, it’s des a comfort to see de ole place in shape once more, an’ I trus’ you an’ de young lady an’ de chilluns will have des a beautiful summer. P’r’aps some day,” and Joe’s eyes twinkled with the thought, “dey’ll all come up and spen’ de day with me at Arlington. Brevet here alway des loves to come. You know Arlington’s where all de soldiers am buried. I used to be a slave on de place ’fo’ de wah, an’ dere ain’t much happened dere fur de las’ fifty years dat I hasn’t some knowledge of, and dey done tell me” (indulging in a little complacent chuckle) “dat it’s mighty interestin’ ter spen’ de day with Joe at Arlington.” “Well, indeed I should think it would be,” said Mary, very much interested, “and I wish you would stop and see Miss Courage about it the first time you drive by.” “Thank you very much, Miss; and now, Brevet, your grannana will be watchin’ fur us an’ we had bes’ be joggin’ on I’m thinkin’.” “All right, Captain,” clambering into the cart, and then Joe and Brevet courteously touched their caps, in true military fashion, and old Jenny jogged on. “Miss Courage did she say?” asked Brevet, the moment they were out of hearing, just as Joe knew he would. “Yes; it soun’ like dat, Honey, but some day we must make inquiries. Dere mus’ be some ’splanation of a name like dat.”
CHAPTER IV.—EVERYBODY HAPPY.

“It is strange and beautiful,” thought Courage as she moved busily about her room, putting one thing and another into a trunk that stood open before the fireplace; “strange and beautiful how difficulties take to themselves wings, when you once make up your mind what is right to do and then go straight ahead and do it.” “Miss Courage,” said a young coloured girl, who was leaning over the bed trying to fold a black dress in a fashion that should leave no creases to show for its packing, “I felt all along there was nothing else for you to do.” “Then, Sylvia, why did you not say so?” Courage asked, a little sharply. “You knew how hard it was for me to come to any decision. It was not because you were afraid to say so, was it?” “Afraid?” and a merry look shone for a moment in Sylvia’s eyes. “No, I don’t believe I ever could grow afraid of the little curly-headed girl I used to work for when we were both children together. No, indeed; it was only because I thought you ought to see it so yourself. It seemed as though it was just as plain a duty as the hand before your face, and I felt sure you would come to it, as you have, if we only gave you time enough.” It was a comfort to Courage to feel that Sylvia so thoroughly understood her. Indeed, they were far more to each other than mistress and maid; they were true friends these two, whose only home for a while had been Larry Starr’s brave lighter, and for both of whom he had cared in the same kind, fatherly way. Of course you do not understand about Larry or Larry’s lighter, unless you have read “Courage,” but then on the other hand there is no reason why you need to understand. Nor was Sylvia the only one who approved of what Courage had done. The Elversons, Miss Julia’s brother and his wife, and with whom Courage and Miss Julia had lived, were as glad as glad could be to have Courage carry
out Miss Julia’s plan; and so in fact was everybody who saw how sad and lonely Courage was, and what a blessing anything that would occupy her thoughts must be to her. And so, in the light of all this, you can see how sad it would have been if Courage had yielded to her fears, and persistently turned away from a duty, in very truth as plain as the hand before your face, as Sylvia had put it. But Courage had not turned away, nor for one instant wavered from the moment her resolve was taken. And now at last the day for the start had dawned. The little Bennetts had been awake at sunrise. Fancy having three months of Christmas ahead of you—for it seemed just as fine as that to them. It was a wonder they had slept at all. They had read about brooks and hills and valleys, and woods where all manner of beautiful wild things were growing; of herds of cows grazing in grassy pastures; of loads of hay with children riding atop of them, and of the untold delights of a hay-loft. And now they were going to know and enjoy every one of these delights for themselves. Why, they could not even feel sad about leaving their mother, and indeed she was as radiant as they at the thought of their going.
“You see,” she explained to them, “I shall have the baby for company, and such a beautiful time to rest; and your father and I will take a sail now and then down the bay, or go to the park for the day in the very warm weather; and then it is going to be such a comfort to have your father home for two whole months, and that couldn’t have happened either, you know, if you had not been going away for the summer.” The children’s father, Captain Bennett, was one of the pilots who earn their living by bringing the great ocean steamers into the harbour, and often he would be aboard the pilot-boat, at sea for weeks at a time, waiting his turn to take the helm of one of the incoming steamers, and then, as like as not, he would have to put straight to sea again, for there were many to keep, and there was need for every hard-earned dollar. But the Captain’s chance for a vacation had come with the children’s. He could afford to take it, since four of his little family were to be provided for, for the entire summer, and so every one was happy and every one
believed that somehow Miss Julia must know and be so glad for them all. But this was the day for the start, as I told you, and the children had started. They were in the waiting-room at the foot of Cortlandt Street, where Courage was to meet them. “And here she is,” exclaimed Mary, with a great sigh of relief, being the first to espy Courage coming through the gate of the ferry-house, “and doesn’t she look lovely!” Mary was right; Courage did look lovely as, with Sylvia close behind her, she walked the length of the waiting-room to where the little group were standing. Other people thought so too, as she passed, and watched her with keenest interest. Her stylish black dress and black sailor hat were wonderfully becoming, and the face that had been so pale and sad was flushed with pleasure now, and with the rather uncomfortable consciousness that she and her little party could scarcely fail to be the observed of all observers. Mrs. Bennett was there, of course, to see them off, and the baby and the Captain, and it must be confessed that the eyes of both father and mother grew a little misty as they said “Good-bye” to their little flock. The girl contingent was a trifle misty, too, but the baby was the only one who really cried outright. However, I half believe that was because he wanted a banana that hung in a fruit stand near by, and not at all because the children were going to leave him; some babies seem to have so very little feeling. But now it was time to go aboard the boat, and the Captain and Mrs.
Bennett saw the last of the little party as they disappeared within the ferry-boat cabin, and then in fifteen minutes more the same little party was ranged along one side of a parlor car on the “Washington Limited”; then the wheels slowly and noiselessly commenced to turn and they were really off; all of the little party’s hearts thrilling with the thought, and all sitting up as prim as you please, in their drawing-room chairs, quite overawed with the magnificence of their surroundings and the unparalleled importance of the occasion.
Courage, very much amused, watched them for a few moments and then suggested that they should settle themselves for the journey. Bags were stowed away in the racks overhead, coats and hats banished to coat hooks, and one thing and another properly adjusted, until at last four little pair of hands having placed four little footstools at exactly the desired angle, four pair of brand-new russet shoes found a resting-place rather conspicuously atop of them, and the four children leaned comfortably back in the large, upholstered chairs as though now at last permanently established for the entire length of the journey. But of course no amount of adjusting and arranging really meant anything of that sort, or that they could be able to sit still for more than five minutes at a time, and Courage and Sylvia soon had to put their wits to work to think up ways of keeping the restless little company in some sort of order. But fortunately none of the fellow-passengers appeared disturbed thereby. On the contrary, they seemed very much interested, and finally a handsome old gentleman came down the aisle, and leaning over the chair in which Courage was sitting, said courteously: “My dear young lady, if you will pardon an old man’s curiosity, and do not for any reason mind telling me, I should very much like to know what you are doing, and where you are going with this little family?” “And I am very glad to tell you,” answered Courage cordially, for since that summer spent with Larry there had always been such a very warm corner in her heart for all old people; and Teddy, who was sitting next to Courage, had the grace to offer the old gentleman a chair. Then for some time he listened intently, his kind old face glowing with pleasure as Courage told him all about the children, and finally of the cosy little cottage awaiting their coming down in Virginia.
“But in doing all this,” Courage concluded, “I am simply carrying out the plans of my dearest friend, Miss Julia Everett.” “Oh, you don’t mean it!” the old gentleman exclaimed, his voice trembling. “I knew Miss Everett well. She always stopped with me
when she came to Washington.” “Can it be that you are old Colonel Anderson?” “Yes, I am Colonel Anderson, and I suppose I am old,” he added, smiling; “and can it be you are young Miss Courage, of whom I have heard so often?” “Yes, I am Courage, but you will excuse me, won’t you, for speaking as I did? I only had happened to hear Miss Julia——” Courage hesitated. “Oh, yes, dear child, I understand perfectly. You used to hear Miss Julia speak of me as old Colonel Anderson, and so I am, and I am not ashamed of it either, although I could not resist the temptation to tease you a little, which was very rude of me. But now, can it be that it is to Miss Julia’s estate near Arlington that you are going—to the home that her Uncle Everett left her when she was just a little slip of a girl, years before the war?” “Yes, that is exactly where, but I have never seen it.” “Well, you will love it when you do. It is the dearest little spot in the world. I will drive out some day and take luncheon with you and the children, if I should happen to have an invitation. I could tell you some interesting things about the old place.” “Oh, will you come?” exclaimed Mary and Gertrude in one breath, for with a curiosity as pardonable, I think, as that of old Mr. Anderson, all of the children had grouped themselves about Courage, and had listened with keenest interest to every word spoken. And so one more happy anticipation was added to the many with which their happy hearts were overflowing. At last the train steamed into Washington, although at times it had seemed to the children as though it never would, and then a carriage was soon secured, and, three on a seat, the little party crowded into it, and they were off for their eight-mile drive to Arlington.
CHAPTER V.—HOW’DY.

And meantime what excitement in the little cottage down in Virginia! Everything was in readiness and everybody was on the tiptoe of expectation. Everybody meant Mary Duff, (it was she, you know, who had cared for little Courage through all her babyhood, and who had been sent down to get everything in order), and besides Mary Duff, Mary Ann the cook, old Joe and Brevet. It must be confessed, Brevet had had a little difficulty in winning his grandmother’s consent to this visit, but he had been able to meet every objection with such convincing arguments, that he had come off victor in the encounter. “You see, Grandnana,” he had confidentially explained, with his pretty little half-southern, half-darkey accent, “I is a perfec’ stranger to them now I know, but then everything is strange to them down here, so don’t you s’pose it would be nice for me to be right there waiting at the gate, where I can call out ‘How’dy’ just so soon as ever they come in sight, and so for me not to be a stranger to them more’n the first minute, and have them find there are folks here who are very glad to know them right from the start? Besides, the lady—Mary Duff was her name—told me she just knew those little Bennetts would love to see me, and that she would surely expect me down to-day for certain.” And so “Grandnana” succumbed, not having the heart to nip such noble hospitality in the bud, and at two o’clock precisely, the best carriage wheeled up to the door and Mammy and Brevet were quickly stowed away within it, to say nothing of a basketful of good things covered with a huge napkin of fine old damask. But who is Mammy? you ask, and indeed you should have been told pages ago, for no one for many years had been half so important as Mammy in the Ellis household. She is an old negro woman, almost as old as Joe
himself, and when on the first of January, 1863, President Lincoln issued the proclamation that made all the slaves free, she was among the first to turn her back upon the plantation where she was raised, and make her way to Washington. It was there that Mrs. Ellis had found her, when in search of a nurse for her two little boys, and from that day to this she has been the faithful worshipper of the whole Ellis family. Now in her old age her one and only duty has been to care for Brevet, a care constantly lessening as that little fellow daily proves his ability to look out more and more for himself. Brevet was not to be allowed, however, on the occasion of this first visit to their new neighbours, to make the trip alone. “Grandnana” had been very firm about that, somewhat to his chagrin, and so, if the truth be told, Mammy’s presence in the comfortable, old-fashioned carriage was at first simply tolerated. But that state of affairs did not last long. Try as he would, Brevet was too happy at heart to cherish any grievance, imaginary or otherwise, for many minutes together; and soon he and Mammy were chatting away in the merriest fashion, and the old nurse was looking forward to the unusual excitement of the day, with quite as much expectation as her little charge of seven. Had she not devoted the leisure of two long mornings of preparation to the shelling of almonds and the stoning of raisins, and then when the day came, with eager trembling hands, packed all the good things away in the great, roomy hamper that seemed now to look at her so complacently from the opposite seat of the phaeton? Yes, indeed, it was every whit as glad a day for Mammy as Brevet, and she peered out from the carriage just as anxiously as they drove up to the gate and Mary Duff came out to greet them. But Mammy had something to say before making any motion to leave the carriage.
“Are you quite sure, Miss, dat dis yere little pickaninny of ours ain’t gwine to be in any one’s way or nuffin?” she asked, bowing a how-do-you-do to Mary, and keeping a restraining hand upon Brevet. “Oh, perfectly sure.”
“He done told us you wanted him very much,” but in a half-questioning tone, as though what Brevet “done told them” was sometimes “suspicioned” of being slightly coloured by what he himself would like to do, notwithstanding his general high standard of truthfulness. “Brevet is perfectly right—we do want him very much,” Mary answered, heartily. “Even if you have to take his old Mammy ’long wid him, kase Miss Lindy wasn’t quite willin to ’low him ter come by hisself?” “And we’re very glad to see you, Mammy,” Mary answered cordially, and so the last of Mammy’s scruples, which were not as real as Mammy herself tried to think them, were put to rest, and Brevet was permitted to scramble out of the carriage, while Mary Duff lent a hand to Mammy’s more difficult alighting. “Is dere ere a man ’bout could lift dis yere basket ter de house for us?” she asked, looking helplessly up to the hamper, “kase Daniel dere has instructions from de Missus neber to leave de hosses less’n dere ain’t no way to help it.” “Well, I guess dere is,” chuckled a familiar voice behind her back, and Mammy turned to discover Joe close beside her. “Well, I klar, you heah!” she exclaimed. “Why, it seems like de whole county turn out to welcome dese yere little Bennetts. Seems, too, like some of us goin’ to be in de way sure ’nuff.” “Howsomever, some on us don’ take up so much room as oders,” grunted Joe, surmising, and quite correctly, too, that Mammy considered his presence on the scene something wholly unnecessary and undesirable. “I’se heah to help wid de trunks, Mammy,” he then added; “what you heah to help wid?” Mammy, scorning the insinuation, turned to Mary Duff as they walked up the path. “You know, Honey, de Lord ain’t lef’ no choice ter most on us as ter what size we’ll be, but ’pears like you’d better be a fat ole
Mammy like me, than such a ole bag o’ bones as Joe yonder.” But Joe by that time was depositing his basket in the hall-way of the cottage, and was fortunately quite beyond the fire of this personal attack. Mary Duff was naturally much amused at the real but harmless jealousy of these old coloured folk, and realised for perhaps the five hundredth time what children we all are, be race and nationality what they may.
Meantime Brevet had taken his position on the top rail of the gate, with one arm around a slim little cedar that stood guard beside it. “May I stand right out here, Miss Duff,” he called back to Mary, “so as to see them a long way off?” “Bless your heart, yes!” Mary answered, quite certain in her mind that since Courage herself was a little girl she never had seen such a dear child. Brevet’s watch was a brief one. “They are coming! Hear the wheels! They are coming,” he cried exultingly, with almost the next breath. In just two minutes more they really had come, and Brevet was calling out “How’dy, how’dy, how’dy” at the top of his strong little lungs, to the wide-eyed amazement of the Bennetts, who had never heard this Southern abbreviation of the Northern “How-do-you-do.” Then jumping down from his perch, he ran up to the carriage, repeating over again his cordial welcoming “How’dy.” “How’dy, dear little stranger!” replied Courage, waving a greeting to Mary; “and who are you I would like to know?” “I’m Howard Stanhope Ellis, but that’s not what you’re to call me, I have another name. It’s the name they give—” but he did not finish his sentence, for charming little fellow though he was, he could not be allowed to monopolise things in this fashion, and Mary gently pushed him aside to get him out of her way. “And so here you are at last,” she said joyously; “welcome home, Miss Courage. How are you, Sylvia?” while she bent down with a cordial kiss for each friendly little Bennett. Meantime Courage was making friends with Brevet, and a moment later the children were crowding close about him. “My, but I’m glad to see you all,” he exclaimed, with an emphasising shake of his head, “and I think I know who’s who too.
I believe this is Gertrude,” laying one little brown hand on Gertrude’s sleeve, “and you are Mary, because Mary’s the oldest, and you Teddy, because Teddy comes next, and you—you are Allan.” Brevet had learned his lesson from Mary Duff quite literally by heart, and
Altogether vanquished by his joyous, friendly greeting, the children vied with each other in giving him the loudest kiss and the very hardest hug, but from that first moment of meeting it was an accepted fact that Allan held first place. There was no gainsaying the special joyousness of his “And you—you are Allan.” The boy playfellow for whom he had hitherto longed in vain had come, and to little Brevet it seemed as though the millennium had come with him. All this while Joe and Mammy, barely tolerating each other’s presence, waited respectfully in the background, so that Mary had a chance to explain who they were, as Courage stood in the path, delightedly looking up at the dear little house that was to be her home. But Sylvia had already made their acquaintance. After paying the driver and making sure that nothing had been left in the carriage, she went straight toward them. “I thought I should find some of my own people down here in Virginia,” she said, cordially extending a hand to each as she spoke, “but I did not expect they would be right on the spot, the very first moment, to welcome me.” “Miss Duff done tol’ us ’bout Miss Sylvy bein’ of de party,” said Joe with great elegance of manner, while Mammy looked daggers at him, for replying to a remark which she considered addressed chiefly to herself. It was queer enough, the attitude of these two old-time slaves toward each other, and yet to be accounted for, I think, in their eagerness to be of use to those whom they claimed the privilege of serving; and each was conscious, by a subtle intuition, of a determination to outwit the other if possible in this regard—which was all very well, if they only could have competed in the right sort of spirit. But there is no more time in this chapter for Mammy or Joe, nor anything else for that matter.
Indeed, it would take quite a chapter of itself if I should try to tell you of the unpacking of Grandma Ellis’s basket, and then of the children’s merry supper; but it seems to me there are more important things for me to write about, and for you to read about, than things to eat and of how the children ate them. By nine o’clock quiet reigned in the little cottage, and “the children were nestled as snug in their beds” as the little folk in “The Night before Christmas.” Joe and Mammy and Brevet had long ago gone home, and Courage and Mary Duff were sitting together in the little living-room, while Sylvia, in the hall just outside, was busy arranging the books they had brought with them, on some hanging shelves. “I think this has been the happiest day in all my life,” said Courage. “I have simply forgotten everything in the pleasure of those children.” And then, sitting down at the little cottage piano and running her hands for a few moments over the keys, she sang softly, — “For all the Saints, who from their labour rest, Thy name, O Jesus, be forever blest.” The sweet, familiar hymn brought Sylvia to the door. “Miss Courage,” she said, standing with her arms folded behind her back, as she had always a way of standing when deeply interested, “you have forgotten yourself and your sorrow to-day, but not for one moment have you really forgotten Miss Julia,” and Courage knew that this was true, and closed the little piano with tears in her eyes and a wondrous joy and peace in her heart.
CHAPTER VI.—ARLINGTON BEFORE THE WAR. No sooner were our little New Yorkers settled in their pretty summer home than they naturally desired that it should have a name, and after much discussion, according to the Bennett custom, they all agreed that “Little Homespun,” one of the names that Courage had suggested, seemed to fit the cosy, unpretentious little home better than anything else that had been thought of. No sooner were they settled either before they became firm and fast friends of the household up at Ellismere. It needed but very little time to bring that about, because everything was—to use a big word because no smaller one will do—propitious. You can imagine what it meant to Courage—taking up her home in a new land, and with cares wholly new to her—to have a dear old lady like Grandma Ellis call upon her, as she did the very first morning after her arrival. Of course Courage had to explain how it was she had come way down there to Virginia with the little Bennett children in charge. Indeed, almost before she knew it, and in answer to Grandma Ellis’s gentle inquiries, she had told her all there was to tell—about Miss Julia, about herself and Mary Duff and Sylvia, and finally, as always with any new friend, the why and wherefore of her own unusual name. The tears stood in Grandma Ellis’s eyes many times during the narration, and her face was aglow with love and sympathy and admiration as Courage brought her story to a close.
“And now, my dear,” she had said, “I want you should know what little there is to tell about us. We live just three miles from here, and in the same old Virginia homestead where my husband was born. We, means my son Harry, and Brevet and myself. Brevet, as you already know, perhaps, has neither father nor mother. His mother died when he was six months old, and his father, my oldest son, was drowned when the Utopia went down, off the coast of Spain five years ago. We are doing our best, Harry and I, to make up to Brevet for his great loss; but it is sad that the little fellow should only know the love of an old grandmama like me, and never of his own young mother. But I do not want to burden you with my sorrows, dear