In Memory Computing with FPGA Key-Value Store
John W. Lockwood, CEO, PhD
December 12, 2018
http://Algo-Logic.com
© 2018 Algo-Logic Systems Inc., All rights reserved.
GDN: Gateware Defined Networking ®
▪ Gateware
  – Programmable as code, like software
  – Compiled to fully parallel logic, not sequential instructions
  – Placed and routed into gates on an FPGA
▪ Data Center Networking
  – Volumes of Ethernet and IP network traffic
  – Searching through streams of data
  – Supporting services for the cloud
▪ Gateware Defined Networking (GDN):
  – Ultra-low-latency (sub-µs)
  – Deterministic (jitter-free)
  – High throughput (10 to 400 Gbps)
  – Fully reprogrammable (like software)
Gateware Defined Networking is a registered trademark of Algo-Logic Systems, Inc.
Gateware Defined Networking® (GDN)
▪ GDN-Search: Key/Value search service
▪ GDN-Switch: Load Balancer
▪ Datacenter services provisioned within rack of standard and/or FPGA-accelerated servers
Algo-Logic’s GDN Products
• IP Core: low-latency MAC, TCP, protocol parsers, order book cores
• FPGA Gateware: pre-programmed apps in multiple FPGA vendor devices
• Integrated Server: pre-programmed apps in multiple FPGA cards
• Scale-Out System: data center deployments to co-location facility
Outline
• Algo-Logic FPGA KVS
– Allows direct access of data from both CPU and network
– Associates values (data) with keys (names) without overhead of SQL
– Implemented completely in Field Programmable Gate Array (FPGA) logic
[Figure: accelerated server (CPU cores, RAM, HDD, SSD, and an FPGA running KVS) connected over 10G and 40G Ethernet to multiple 2U systems, each running KVS on FPGA]
• Scale-Out KVS
– Shares data over Ethernet and PCIe
– Scales out for widescale deployment in datacenters
– Provides ultra-low (sub-microsecond) latency at low cost
• Inference for Real-time Decision Making
– Shares data between CPUs for machine learning
– Key Value Store provides random access to data by name
– Ensures real-time inference with deterministic latency
Key/Value Store
• Key/Value Store (KVS)
– Associates values (data) with keys (names)
– Keys can be: names, numbers, IDs, vertices, etc.
– Data can be: addresses, locations, graph edges, etc.
• KVS is an integral part of network applications
– Sensors { location, movement, ... }
– Social { status, updates, games, ... }
– Media { video/security, audio/music, ... }
– Communication { network status, handoff, messages, ... }
– Database { users, payments, travel, authentication, ... }
– Security { warrants }
Examples (Key → Value):
– Directory: Company → Phone # (Algo-Logic → (408) 707-3740)
– Forwarding Tables: IP Address → Interface : MAC Address (204.2.34.5 → Eth6 : 02:33:29:F2:AB:CC)
– Data De-duplication: Content Hash → Storage Block ID (XYZ → 948830038411)
– Car Location: Vehicle ID → GPS Location (ATY1121 → 37.775, 122.41)
– Graph Search: Vertex → Edge List (v140 → v201, v206, v225)
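The key-to-value associations in the examples can be sketched in software as plain dictionaries. This is only an illustration of the data model, not Algo-Logic's implementation; the table names and the `lookup` helper are assumptions made for the example.

```python
# Illustrative only: software dictionaries standing in for the KVS.
# Each table maps a key (name) to a value (data), using the slide's examples.
directory = {"Algo-Logic": "(408) 707-3740"}                      # company -> phone
forwarding_table = {"204.2.34.5": ("Eth6", "02:33:29:F2:AB:CC")}  # IP -> (interface, MAC)
dedup_index = {"XYZ": 948830038411}                               # content hash -> block ID
car_location = {"ATY1121": (37.775, 122.41)}                      # vehicle ID -> GPS location
graph = {"v140": ["v201", "v206", "v225"]}                        # vertex -> edge list


def lookup(table, key):
    """Exact-match search: return the value stored for key, or None on a miss."""
    return table.get(key)


interface, mac = lookup(forwarding_table, "204.2.34.5")
neighbors = lookup(graph, "v140")
missing = lookup(directory, "No-Such-Company")
```

Every lookup is an exact match on the key, with no query language involved, which is the sense in which a KVS avoids the overhead of SQL.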
Servers Accelerated with FPGA Gateware
• Server with Intel CPU and FPGA hardware
– CPU cores connected to FPGA via a PCIe card
– Or FPGA integrated into the CPU socket (MCP)
• Algo-Logic Gateware Defined Networking (GDN)
– Applications like KVS that run in FPGA logic
– Logic provides optimal latency, power, throughput
[Figure: accelerated server with CPU cores, RAM, HDD, SSD, and an FPGA, with Ethernet (ENet) ports]
Implementation of KVS with Socket I/O, DPDK, and FPGA
• Benchmark same application
– Key-Value Store (KVS)
• Running on the same PC
– Intel i7-4770k CPU, 82598 NIC, and Altera Stratix V A7 FPGA
• With three different implementations
– Socket I/O, Kernel Bypass, FPGA
[Figure: block diagrams of the three implementations, each receiving and sending OCSM packets over 10G Ethernet]
• Socket I/O: Algo-Logic software on Intel 82598 10GE NIC and Core i7-4770k CPU
– Blocks shown: Intel 10G NIC, kernel driver, message process
• Kernel Bypass (DPDK): Algo-Logic software on Intel 82598 10GE NIC (DPDK supported) and Core i7-4770k CPU
– Blocks shown: receive queue, message buffer, message process, store, response generation, transmit queue
– Enqueue and dequeue steps are control handoffs
– Note: message read once into CPU cache
• FPGA: Algo-Logic gateware on Nallatech P385 with Altera Stratix V A7 FPGA
– Request generator: packet parser, OCSM header identifier, key/value extractor
– Exact Match Search Engine (EMSE)
– Response generator: key/value search response decoder, OCSM header reconstruct, packet modifier
Throughput Comparison
[Figure: packets processed versus transmit load; full-rate packet processing is 100% of TX load]
• Software (sockets) drops packets beyond 4M messages/second
• Kernel bypass (DPDK) drops packets beyond 15M messages/second
• Stratix V FPGA drops no packets, even beyond 40M messages/second
• Supports 10GbE line rate
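For context on "10GbE line rate", the sketch below computes the maximum frame rate of a 10 Gbps Ethernet link for minimum-size frames, using the standard Ethernet framing overheads. The function and constant names are illustrative. The result is a frame rate, not a message rate: the slide's 40M messages/second is higher than this, which would be consistent with several messages being carried per packet, although the slide does not state that.

```python
# Standard Ethernet framing constants (bytes on the wire per frame).
PREAMBLE_AND_SFD_BYTES = 8   # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12    # mandatory idle time between frames
MIN_FRAME_BYTES = 64         # minimum Ethernet frame, including FCS


def max_frames_per_second(link_bps, frame_bytes=MIN_FRAME_BYTES):
    """Upper bound on frames/second for a link speed and frame size."""
    wire_bits = (frame_bytes + PREAMBLE_AND_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / wire_bits


line_rate_10g = max_frames_per_second(10e9)
print(f"10GbE, 64-byte frames: {line_rate_10g / 1e6:.2f} M frames/s")
```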
KVS Latency in FPGA, Kernel Bypass, and Sockets
[Figure: Latency Comparison, 100k packets, 1 OCSM per packet, 1k pps. X-axis: latency distribution [µs], 0 to 44. Y-axis: percentage of observed packets, 0% to 50%. Series: RTL, Sockets, DPDK]
• Altera Stratix V RTL average: 0.467 µs
• DPDK average: 6.29 µs
• Sockets average: 41.40 µs
[Figure: Socket Implementation Latency Distribution with One OCSM/Packet. X-axis: latency distribution [µs], 38.0 to 48.0. Y-axis: percentage of packets observed, 0% to 0.70%. Sockets on Intel i7 average: 41.54 µs]
• KVS in FPGA: best latency, no jitter
• KVS with kernel bypass: lowers latency, some jitter
• KVS in software: worst latency, worst jitter
• Lower latency = faster response
• Tighter spread = less jitter
KVS in the Data Center
• Lowers Latency
– 88x faster than Linux networking sockets
– 14x faster than kernel bypass
• Increases Throughput (IOPs)
– 3x to 13x improvement in throughput
– Lowers Capital Expenditures (CapEx)
• Reduces Power
– 13x to 21x reduction in power
– Reduces Operating Expenditures (OpEx)
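As a rough check, the latency ratios can be recomputed from the average latencies reported on the latency comparison chart (0.467 µs for the FPGA, 6.29 µs for DPDK, 41.40 µs for sockets). The variable names are illustrative, and this assumes the speedup claims derive from those averages, which the slides do not state.

```python
# Average latencies from the latency comparison chart, in microseconds.
fpga_us = 0.467
dpdk_us = 6.29
sockets_us = 41.40

# Speedup = baseline latency divided by FPGA latency.
speedup_vs_sockets = sockets_us / fpga_us
speedup_vs_dpdk = dpdk_us / fpga_us

print(f"FPGA vs sockets: {speedup_vs_sockets:.1f}x")
print(f"FPGA vs DPDK:    {speedup_vs_dpdk:.1f}x")
```

These averages give about 88.7x against sockets, in line with the 88x claim, and about 13.5x against kernel bypass, slightly below the stated 14x; the slide's figure may be rounded up or based on a different measurement.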
KVS Network Architecture
[Figure: Internet devices (IoT) connect to mobile data aggregation servers acting as KVS clients, which connect over Ethernet to cloud datacenter nodes used for storage and compute]
• 2U systems with KVS on FPGA, and a CPU server with KVS in FPGA
• Storage in SRAM, DRAM, NVMe
• In-memory databases
• Compute servers for machine learning with GPUs, Skylake, Phi, NNP, and/or FPGAs
• Real-time actions (inference) and real-time decision making
KVS Scale Out Results
− Search rate:
• 150M x 40 = 6 Billion IOPs per rack
− Bandwidth:
• 40G Ethernet x 40 cards = 1.6 Tbps per rack
− Latency:
• 0.342 µs for KVS to network
• Sub-microsecond
− Programming:
• KVS integrates with existing software via open-source APIs
Key Features:
• 4 x 40 Gbps Ethernet line rate support
• 30,000 IOPs per $
[Figure: rack of 1U and 2U systems with KVS on FPGA, 40 GbE, 40G per card]
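The per-rack totals follow directly from the per-card figures. The short calculation below reproduces them, assuming (as the "x 40 cards" wording suggests) that 150M searches per second and 40 Gbps are per-card values; the constant names are illustrative.

```python
# Per-card figures from the slide; treating them as per-card is an assumption.
CARDS_PER_RACK = 40
SEARCHES_PER_CARD_PER_SECOND = 150_000_000   # 150M
ETHERNET_BPS_PER_CARD = 40_000_000_000       # 40G Ethernet

rack_searches_per_second = CARDS_PER_RACK * SEARCHES_PER_CARD_PER_SECOND
rack_bandwidth_bps = CARDS_PER_RACK * ETHERNET_BPS_PER_CARD

print(f"Search rate: {rack_searches_per_second / 1e9:.0f} billion IOPs per rack")
print(f"Bandwidth:   {rack_bandwidth_bps / 1e12:.1f} Tbps per rack")
```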
Algo-Logic’s FPGA Key-Value Store
• Fully implemented in FPGA logic
• Shares data over Ethernet
• Supports fastest in-memory storage for cloud services
• Fully Open Software API
– Create, Read, Update, Delete
• Scales for the Datacenter
– Scale up to provide the best latency, throughput, and power
– Scale out to increase storage capacity, throughput, and redundancy
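The four API operations (Create, Read, Update, Delete) can be illustrated with a minimal in-memory stand-in. This is a hypothetical sketch: the class name, method names, and return conventions are assumptions for illustration and do not represent Algo-Logic's actual API, which the slides do not show.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a CRUD key-value interface."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        """Insert a new key; fail if the key already exists."""
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def read(self, key):
        """Return the value for key, or None if it is absent."""
        return self._data.get(key)

    def update(self, key, value):
        """Replace the value of an existing key; fail if it is absent."""
        if key not in self._data:
            return False
        self._data[key] = value
        return True

    def delete(self, key):
        """Remove a key; report whether it was present."""
        if key not in self._data:
            return False
        del self._data[key]
        return True


kvs = KeyValueStore()
created = kvs.create("ATY1121", (37.775, 122.41))
updated = kvs.update("ATY1121", (37.780, 122.40))
current = kvs.read("ATY1121")
deleted = kvs.delete("ATY1121")
```

In a networked deployment each of these calls would presumably map to a request and response exchanged with the FPGA over Ethernet or PCIe, rather than to a local dictionary operation.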
Algo-Logic Systems
– Web: http://Algo-Logic.com
– Twitter: @Algo_Logic_Inc
– Phone: (408) 707-3740
– Email: info@Algo-Logic.com
– Address: 172 Component Dr., San Jose, CA 95131

In-Memory Key Value Store (KVS) in FPGA for Ultra Low Latency and High Throughput
