OpenCAPI: Next Generation of
Acceleration for the Cognitive Era
—
Brian Allison
OpenCAPI Technology and Enablement
E-mail: ballis@us.ibm.com
Industry Collaboration and Innovation
2
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium
3
Key Messages Throughout
• IBM's Strong History in IO Offerings
• OpenCAPI is an Open IO Standard
• Not tied to Power – Architecture Agnostic
• High Performance – No OS/Hypervisor/FW Overhead with Low Latency and High Bandwidth
• Programming Ease
• Very Low Accelerator Design Overhead
• Supports heterogeneous environments – Use Cases
• Ideal for Accelerated Computing and SCM
• Optimized for use within a single system node
• Products exist today!
4
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium
5
What’s Driving the Creation of a High Perf. Bus
• Historical silicon technology improvements out of steam
• More cores on a processor help but you’ll never have enough; especially for today’s emerging
workloads (analytics, artificial intelligence, machine learning, real time analysis, etc.)
• New advanced memory technologies are changing the economics of computing
• Companies realizing the need to offload the microprocessors from routine algorithms to meet demand and improve system performance → Accelerated Computing
• Accelerators - If you are going to use them, you need a lot of data in/out quickly
6
Computation Data Access
© 2018 IBM Corporation
POWER Processor Technology Roadmap

| Generation | Availability | Process | Focus | Highlights |
|---|---|---|---|---|
| POWER7 | 1H10 | 45 nm | Enterprise | 8 cores, SMT4, eDRAM L3 cache |
| POWER7+ | 2H12 | 32 nm | Enterprise | 2.5x larger L3 cache, on-die acceleration, zero-power core idle state |
| POWER8 Family | 1H14 – 2H16 | 22 nm | Enterprise & Big Data Optimized | Up to 12 cores, SMT8, CAPI acceleration, high bandwidth GPU attach |
| POWER9 Family | 2H17 – 2H18+ | 14 nm | Built for the Cognitive Era | Enhanced core and chip architecture optimized for emerging workloads; processor family with scale-up and scale-out optimized silicon; premier platform for accelerated computing |

7
Open Coherent Accelerator Processor Interface
Industry driving these necessary Accelerator attributes
• High performance (move a lot of data quickly; bandwidth, latency)
• Fulfill various accelerator form factors (e.g., GPUs, FPGAs, ASICs)
• Introduction of device coherency requirements
• Emergence of complex Storage and Memory solutions
• Growing demand for network performance with increased
computational demand
• Need to be architecture agnostic to enable ecosystem growth and adoption
8
Accelerated Computing Example
IBM’s AC922 built for Accelerated Computing
• Classic system topology setup with two POWER9 sockets, each socket having at least two NVIDIA GPUs
Does Accelerated Computing work?
• Summit Supercomputer, an IBM-built supercomputer now running at the
Department of Energy’s (DOE) Oak Ridge National Laboratory, captured the
number one spot with a performance of 143.5 petaflops on High Performance
Linpack (HPL), the benchmark used to rank the TOP500 list. Summit has 4,356
nodes, each one equipped with two 22-core Power9 CPUs, and six NVIDIA Tesla
V100 GPUs. The nodes are linked together with a Mellanox dual-rail EDR
InfiniBand network.
9
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium 10
Why OpenCAPI and what is it?
• OpenCAPI is a new 'bottoms-up' IO standard
• Key Attributes of OpenCAPI 3.0
  • Open IO Standard – Choice for developers and others to contribute and grow an ecosystem
  • Coherent interface – Microprocessor memory, accelerators and caches share the same memory space
  • Architecture agnostic – Not tied to Power; capable of going beyond the Power Architecture
  • High performance – No OS/Hypervisor/FW overhead for Low Latency and High Bandwidth
  • Ease of programming
  • Ease of implementation with minimal accelerator design overhead
  • Ideal for accelerated computing and SCM, including various form factors (FPGA, GPU, ASIC, TPU, etc.)
  • Optimized for use within a single system node
  • Supports heterogeneous environments – Use Cases
• OpenCAPI 3.1
  • Applies OpenCAPI technology to standard DRAM attached off the microprocessor
  • Based on an Open Memory Interface (OMI)
  • Further tuned for extremely low latency
11
OpenCAPI Key Attributes
12
[Diagram: Any OpenCAPI Enabled Processor (TL/DL, 25Gb I/O) attached to an Accelerated OpenCAPI Device containing the Accelerated Function and TLx/DLx]
1. Architecture agnostic bus – Applicable with any system/microprocessor architecture
2. Optimized for High Bandwidth and Low Latency
3. High performance 25Gbps PHY design with zero ‘overhead’
4. Coherency - Attached devices operate natively within application’s user space and coherently with host microprocessor
5. Virtual addressing enables low overhead with no Kernel, hypervisor or firmware involvement; security benefit
6. Wide range of Use Cases and access semantics
7. CPU coherent device memory (Home Agent Memory)
8. Architected for both Classic Memory and emerging Advanced Storage Class Memory
9. Minimal OpenCAPI design overhead (FPGA less than 5%)
[Device-side diagram labels: Caches, Application]
• Storage/Compute/Network, etc.
• ASIC/FPGA/FFSA
• FPGA, SoC, GPU Accelerator
• Load/Store or Block Access
[Memory options shown: Standard System Memory, Advanced SCM Solutions, Buffered System Memory, OpenCAPI Memory Buffers, Device Memory]
Use Cases - A True Heterogeneous Architecture Built Upon OpenCAPI
OpenCAPI 3.0
OpenCAPI 3.1
OpenCAPI specifications are
downloadable from the website
at www.opencapi.org
- Register
- Download
13
© 2018 IBM Corporation
Proposed POWER Processor Technology and I/O Roadmap

| Processor | Year | Cores | Process | Architecture Highlights | Sustained Memory Bandwidth | Standard I/O Interconnect | Advanced I/O Signaling | Advanced I/O Architecture |
|---|---|---|---|---|---|---|---|---|
| POWER7 (POWER7 Architecture) | 2010 | 8 | 45nm | New micro-architecture, new process technology | 65 GB/s | PCIe Gen2 | N/A | N/A |
| POWER7+ | 2012 | 8 | 32nm | Enhanced micro-architecture, new process technology | 65 GB/s | PCIe Gen2 | N/A | N/A |
| POWER8 (POWER8 Architecture) | 2014 | 12 | 22nm | New micro-architecture, new process technology | 210 GB/s | PCIe Gen3 | N/A | CAPI 1.0 |
| POWER8 w/ NVLink | 2016 | 12 | 22nm | Enhanced micro-architecture with NVLink | 210 GB/s | PCIe Gen3 | 20 GT/s, 160 GB/s | CAPI 1.0, NVLink |
| P9 SO (POWER9 Architecture) | 2017 | 12/24 | 14nm | New micro-architecture, direct-attach memory, new process technology | 150 GB/s | PCIe Gen4 x48 | 25 GT/s, 300 GB/s | CAPI 2.0, OpenCAPI 3.0, NVLink |
| P9 SU | 2018 | 12/24 | 14nm | Enhanced micro-architecture, buffered memory | 210 GB/s | PCIe Gen4 x48 | 25 GT/s, 300 GB/s | CAPI 2.0, OpenCAPI 3.0, NVLink |
| P9 w/ Adv. I/O | 2019 | 12/24 | 14nm | Enhanced micro-architecture, new memory subsystem | 350+ GB/s | PCIe Gen4 x48 | 25 GT/s, 300 GB/s | CAPI 2.0, OpenCAPI 4.0, NVLink |
| POWER10 | 2020+ | TBA | TBA | New micro-architecture, new technology | 435+ GB/s | PCIe Gen5 | 32 & 50 GT/s | TBA |

Statement of Direction, Subject to Change
14
POWER9 IO Features
[Diagram: POWER9 silicon die, offered in various packages (scale-out, scale-up)]
• PCIe Gen4 – 8 and 16Gbps PHY; protocols supported: PCIe Gen3 x16 and PCIe Gen4 x8, CAPI 2.0 on PCIe Gen4
• 25Gbps PHY – protocols supported: OpenCAPI 3.0, NVLink 2.0
POWER9 IO Leading the Industry
• PCIe Gen4
• CAPI 2.0 (Power)
• NVLink 2.0
• OpenCAPI 3.0
15
Why do I care about Virtual Addressing?
• An OpenCAPI device operates in the virtual address spaces of the applications that it supports
• Eliminates kernel and device driver software overhead
• Allows device to operate on application memory without kernel-level data copies/pinned pages
• Simplifies programming effort to integrate accelerators into applications
• Culmination => Improves Accelerator Performance
• The Virtual-to-Physical Address Translation occurs in the host CPU
• Reduces design complexity of OpenCAPI accelerator development
• Makes it easier to ensure interoperability between OpenCAPI devices and different CPU architectures
• Security - Since the OpenCAPI device never has access to a physical address, this eliminates the
possibility of a defective or malicious device accessing memory locations belonging to the kernel or
other applications that it is not authorized to access
16
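To make the virtual-addressing benefit concrete, here is a minimal sketch; the descriptor layout and afu_submit() helper are hypothetical and not part of the OpenCAPI specification or the reference designs. The point it illustrates is that a work descriptor can carry plain user-space virtual addresses, with no pinning or bounce-buffer copy, because address translation happens in the host CPU.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work descriptor: the AFU consumes effective (virtual)
 * addresses directly, since the host CPU translates them on its behalf. */
struct afu_work_descriptor {
    uint64_t src_ea;   /* effective address of the source buffer      */
    uint64_t dst_ea;   /* effective address of the destination buffer */
    uint64_t length;   /* bytes to process                             */
    uint64_t flags;    /* operation-specific control bits              */
};

/* Stand-in for the real submission path (for example an MMIO doorbell
 * write issued through the reference user library after attaching).   */
static void afu_submit(struct afu_work_descriptor *queue_slot,
                       const struct afu_work_descriptor *desc)
{
    memcpy(queue_slot, desc, sizeof(*desc));
}

int main(void)
{
    size_t len = 1 << 20;
    char *src = malloc(len), *dst = malloc(len);
    static struct afu_work_descriptor queue_slot;

    if (!src || !dst)
        return 1;

    struct afu_work_descriptor d = {
        .src_ea = (uint64_t)(uintptr_t)src,  /* no pinning, no staging copy */
        .dst_ea = (uint64_t)(uintptr_t)dst,
        .length = len,
        .flags  = 0,
    };
    afu_submit(&queue_slot, &d);  /* accelerator works directly on the
                                     application's own memory           */
    free(src);
    free(dst);
    return 0;
}
```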
17
OpenCAPI vs. I/O Device Driver – Because minimizing SW path length is crucial for performance

Typical I/O Model Flow using a Device Driver:
DD Call → Copy or Pin Source Data → MMIO Notify Accelerator → Acceleration (application dependent) → Poll / Interrupt Completion → Copy or Unpin Result Data → Return From DD Completion
The call/copy/notify path costs roughly 3,000 + 1,000 + 1,000 instructions (~4.9µs) and the completion/copy/return path roughly 300 + 10,000 instructions (~7.9µs), for a total of ~13µs of data prep.

Flow with an OpenCAPI Model:
Shared Memory Notify Accelerator (400 instructions, 0.3µs) → Acceleration (application dependent, equal to the flow above) → Shared-memory completion or fast thread wake-up (100 instructions, 0.06µs)
Total: 0.36µs

[Diagram: Processor with TL/DL 25Gb I/O and Host Memory, attached over OpenCAPI to an FPGA/SoC/GPU exposing Function0 … Functionn behind TLx/DLx, with Device Memory]
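The difference in path length is visible in host code. Below is a minimal sketch under stated assumptions: the left flow models a conventional driver-based path and the right flow models an already-attached OpenCAPI AFU that shares memory with the process. All function names are placeholders (stubs), not a real driver or library interface, and a real implementation would add memory barriers around the doorbell and completion flags.

```c
#include <stdint.h>
#include <stddef.h>

/* --- Classic device-driver flow (~13us of data prep, per the slide). ---
 * Stand-ins for the kernel round trips a driver-based flow requires.     */
static void dd_call_and_pin_source(const void *src, size_t len) { (void)src; (void)len; }
static void mmio_notify_via_driver(void)                        { }
static void poll_or_interrupt_completion(void)                  { }
static void unpin_and_copy_result(void *dst, size_t len)        { (void)dst; (void)len; }

static void classic_io_flow(const void *src, void *dst, size_t len)
{
    dd_call_and_pin_source(src, len);   /* DD call + copy/pin source data      */
    mmio_notify_via_driver();           /* MMIO notify accelerator             */
    /* ... acceleration runs ... */
    poll_or_interrupt_completion();     /* poll / interrupt completion         */
    unpin_and_copy_result(dst, len);    /* copy/unpin result + return from DD  */
}

/* --- OpenCAPI flow (~0.36us): notify and completion are plain
 *     shared-memory operations in the application's address space. --- */
static void opencapi_flow(volatile uint64_t *doorbell, volatile uint64_t *done)
{
    *doorbell = 1;              /* shared-memory notify (~400 instructions)   */
    while (*done == 0)          /* shared-memory completion or fast thread    */
        ;                       /* wake-up (~100 instructions)                */
}
```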
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium 18
So How is Accelerated Computing Leveraged?
Okay, I'm sold. But how do I leverage this technology?
1. Isolate and identify your heavily used workloads, algorithms and/or applications
2. Place this workload onto an accelerator
3. The path of least resistance and greatest flexibility is a programmable FPGA
4. Purchase any of the OpenCAPI vendor FPGA cards and an OpenCAPI-enabled system and start developing your solution!
19
Comparison of Memory Paradigms
[Diagrams: Basic DDR attach (Processor Chip → DDR4/5 data); OpenCAPI 3.1 attach (Processor Chip → DLx/TLx → emerging Storage Class Memory data); Tiered Memory (Processor Chip → DLx/TLx → DDR4/5 as Tier 1 Memory and DLx/TLx → SCM as Tier 2 Memory)]
• OpenCAPI 3.1 architecture: an ultra-low-latency ASIC memory buffer chip adds only ~7–10 ns on top of native DDR direct connect
• Storage Class Memory can be tiered with traditional DDR memory, all built upon the OpenCAPI 3.1 & 3.0 architecture
• Load/Store semantics are still available
• Storage Class Memories have the potential to be the next disruptive technology; examples include ReRAM, MRAM and Z-NAND, all racing to become the de facto main memory
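On Linux, OpenCAPI home-agent or OMI-attached memory is typically surfaced as ordinary system memory, often as its own NUMA node, so "load/store semantics" really does mean plain pointers. The sketch below assumes such a NUMA node exists (node 1 here is an illustrative assumption) and uses libnuma only to place the buffer there; after allocation, access is normal loads and stores.

```c
#include <numa.h>      /* link with -lnuma */
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "libnuma not available on this system\n");
        return 1;
    }

    /* Assumption for illustration: the OpenCAPI/OMI-attached memory
     * is exposed as NUMA node 1 on this machine. */
    const int ocapi_node = 1;
    const size_t sz = 1 << 20;

    char *buf = numa_alloc_onnode(sz, ocapi_node);
    if (!buf)
        return 1;

    /* Plain loads and stores: no special API once the memory is part
     * of the system address space. */
    memset(buf, 0xA5, sz);
    printf("first byte: 0x%02x\n", (unsigned char)buf[0]);

    numa_free(buf, sz);
    return 0;
}
```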
Acceleration Paradigms with Great Performance
21
• Egress Transform – Examples: Encryption, Compression, Erasure prior to delivering data to the network or storage
• Bi-Directional Transform – Examples: NoSQL such as Neo4J with Graph Node Traversals, etc.
• Memory Transform – Examples: Machine or Deep Learning such as Natural Language Processing, sentiment analysis, or other Actionable Intelligence using OpenCAPI attached memory
• Basic work offload
• Needle-in-a-Haystack Engine (only the needles from a large haystack of data are sent to the processor) – Examples: Database searches, joins, intersections, merges
• Ingress Transform – Examples: Video Analytics, Network Security, Deep Packet Inspection, Data Plane Accelerator, Video Encoding (H.265), High Frequency Trading, etc.
[Each paradigm diagram shows a Processor Chip attached over DLx/TLx to an accelerator (Acc) with its data]
OpenCAPI is ideal for acceleration due to the bandwidth to/from accelerators, best-of-breed latency, and the flexibility of an open architecture
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium 22
IBM AC922
Air Cooled
23
Power9 Systems with OpenCAPI
System Details
• 2-Socket 2U
• Up to 40 cores
• Up to 2TB memory (16 DDR4 DIMMs)
• 4 Gen4 PCIe Slots, 3 CAPI2.0 Enabled
• 2 2.5” SFF Drive Bays
• 4 OpenPOWER Mezzanine Sockets
• Up to 4 NVLink V100 GPUs
• Up to 4 socketed OpenCAPI Adapters*
• Up to 1 cabled OpenCAPI card w/ SlimSAS adapter*
* Future Support
24
Power9 Systems with OpenCAPI
• Dual Socket
• OCP 48V
• Up to 4TB memory (32 DDR4 DIMMs)
• 5 PCIe Gen4 Slots, 3 CAPI2.0 Enabled
• 2 x8 OCP Gen4 PCIe Mezz Slots CAPI2.0 Enabled
• 4 x8 25 Gbps SlimSAS Accelerator Ports
• Supports up to 4 OpenCAPI cabled adapters
Google & Rackspace Zaius
Motherboard
Rackspace Barreleye G2 Server
25
• Zaius Motherboard
• Full-depth 48V
Open Rack V2
• 2OU Chassis
• High Density, Hot
Swap Storage Bay
• 4 x8 25Gb/s
Coherent Attach
Ports
“The OpenCAPI accelerator and
software ecosystem is growing
rapidly. With collateral available via
the Open Compute website,
accelerator developers find it easy to
design and test their solutions on our
platform.”
Adi Gangidi, Senior Design
Engineer with Rackspace
Power9 Systems with OpenCAPI
26
• 2 Socket 2U
• Up to 48 cores
• Up to 4TB memory (32 DDR4 DIMMs)
• 4 Gen4 PCIe Slots, CAPI2.0 Enabled
• 6 Gen3 PCIe Slots
• Up to 24 SFF / 12 LFF Drives
• 4 x8 25 Gbps Ports
• Up to 4 cabled OpenCAPI
Adapters*
Mihawk
“In order to provide the best backend architecture in AI, Big Data, and
Cloud applications, Wistron POWER9 system design incorporates OpenCAPI
technology through 25Gbps high speed link to dramatically change the
traditional data transition method. This design not only improves GPU
performance, but also utilizes next generation advanced memory, coherent
network, storage, and FPGA. This is an ideal system infrastructure to
meet next decade computing world challenges.”
Donald Hwang, CTO and President of EBG at Wistron Corporation.
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium 27
28
OpenCAPI and CAPI2 Adapters
Nallatech 250S+
Storage Expansion
• Xilinx US+ KU15P FPGA
• 4 GB DDR4
• PCIe Gen4 x8 and CAPI2
• 4x M.2 Slots
• M.2 to MiniSAS or Oculink for U.2
drive support
CAPI Flash API, Accelerated DB, Burst
Buffer
Nallatech 250-SoC
Multipurpose Converged Network /
Storage
• Xilinx Zynq US+ ZU19EG FPGA
• 8/16 GB DDR4, 4/8 GB DDR4 ARM
• PCIe Gen4 x8 or Gen3 x16, CAPI2
• 4 x8 Oculink Ports support NVMe,
Network, or OpenCAPI
• 2 100Gb QSFP28 Cages
Mellanox Innova2
Network + FPGA
• Xilinx US+ KU15P FPGA
• Mellanox CX5 NIC
• 16 GB DDR4
• PCIe Gen4 x8
• 2 25Gb SFP Cages
• x8 25Gb/s OpenCAPI Support
Network Acceleration (NFV, Packet
Classification), Security Acceleration
29
OpenCAPI and CAPI2 Adapters
AlphaData ADM-9V3
High Performance Reconfigurable
Computing
• Xilinx US+ VU3P FPGA
• 16 / 32 GB DDR4
• PCIe Gen3 x16 or Gen4 x8 and CAPI2
• 2 QSFP28 Cages
• x8 25Gb/s OpenCAPI SlimSAS
Data Center, Network Accel, HPC, HFT
Bittware XUPVV4
Massive FPGA
• Xilinx US+ VU13P FPGA
• 4 288-pin DIMM Slots, DDR4 or Dual QDR
• Up to 512GB DDR4
• PCIe Gen3 x16, CAPI2 Capable
• 4 100Gb QSFP28 Cages
• 2 x8 25Gb/s OpenCAPI Support
Optimized for Thermal Performance for
Large acceleration in the Data Center
AlphaData ADM-9H7
Large FPGA with 8GB HBM
• Xilinx US+ VU37P FPGA + HBM
• 8GB High Bandwidth Memory
• PCIe Gen4 x8 or Gen3 x16, CAPI2
• 2 x8 25 Gb/s OpenCAPI Ports (support
up to 50 GB/s)
• 4 100Gb QSFP28 Cages
AlphaData ADM-9H3
Large FPGA with 8GB HBM
• Xilinx Virtex US+ VU33P-3 FPGA + HBM
• 8GB High Bandwidth Memory
• PCIe Gen4 x8 or Gen3 x16, CAPI2
• 1 x8 25 Gb/s OpenCAPI Ports (support
up to 50 GB/s)
• 2 100Gb QSFP28 Cages
ML/DL, Inference, System Modeling, HPC
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
OpenCAPI Based Systems
OpenCAPI & CAPI2 Adapters
Design Enablement
Performance Metrics
OpenCAPI Consortium
30
TLx and DLx Reference Designs in an FPGA
• TLx and DLx will be provided as reference designs to OpenCAPI consortium members
  • Associated reference design specifications for TLx and DLx will also be delivered along with the RTL
• Open Verilog – free to enhance, improve or leverage pieces of the reference design for your own accelerator development
• Designed for 64B packet flow running at 400MHz
• Xilinx Vivado 2017.1 TLx and DLx statistics on a VU3P device (table below)
31

| VU3P Resources | CLB FlipFlops | LUT as Logic | LUT Memory | Block RAM Tile |
|---|---|---|---|---|
| DLx | 9392/788160 (1.19%) | 19026/394080 (4.82%) | 0/197280 (0%) | 7.5/720 (1.0%) |
| TLx | 13806/788160 (1.75%) | 8463/394080 (2.14%) | 2156/197280 (1.09%) | 0/720 (0%) |
| Total | 23108/788160 (2.94%) | 27849/394080 (6.98%) | 2156/197280 (1.09%) | 7.5/720 (1.0%) |

Because power efficiency and size do matter
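As a rough sanity check on the 64B-per-cycle, 400 MHz design point noted above (assuming 8 lanes at ~25 Gb/s each, and ignoring protocol overhead):

```latex
\[
64\,\mathrm{B} \times 400\,\mathrm{MHz} = 25.6\ \mathrm{GB/s}
\;\approx\;
\frac{8 \times 25\,\mathrm{Gb/s}}{8\ \mathrm{bits/B}} = 25\ \mathrm{GB/s}
\]
```

So the reference TLx/DLx datapath can keep pace with the raw bandwidth of the x8 25Gb/s link.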
Elements of the OpenCAPI Simulation Environment
32
OpenCAPI Device
• Customer application and accelerator
• Operating system enablement
  • Little Endian Linux
  • Reference kernel driver (ocxl)
  • Reference user library (libocxl)
• Hardware and reference designs to enable coherent acceleration
[Diagram: host side with Core, OS, App (software), libocxl, ocxl, TL/DL over the 25G link and coherent memory, connected by a cable to the device side with TLx/DLx, Accelerated Function(s) and coherent memory]
• OCSE (OpenCAPI Simulation Environment) models the red-outlined area of the diagram
• OCSE enables AFU and application co-simulation when the reference libocxl and reference TLx/DLx are used
• Contributed to the Enablement Workgroup under the OpenCAPI consortium
Exerciser Examples – Provided to OCC Members
• MemCopy
  • The MemCopy example is a data mover from a source address to a destination address using Virtual Addressing and includes these features:
    • Configuration and MMIO register space
    • acTag table used for Bus/Device/Function and Process ID identification
    • 512 processes/contexts and 32 engines supporting up to 2K transfers using 64B, 128B, or 256B operations
• Memory Home Agent
  • The Memory Home Agent example implements memory off the endpoint OpenCAPI accelerator to act as a coherent extension to the host processor memory
  • The Memory Home Agent example includes these features:
    • Configuration and MMIO register space
    • Individual and pipelined operation for memory loads and stores
    • Interrupts, with error details reported to software through MMIO registers
    • Sparse Address Mapping feature to extend 1 MB of real space to 4 TB of address space
• Open examples – free to enhance, improve or leverage pieces of the exerciser examples for your own accelerator development (see the sketch below)
33
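As an illustration of how a MemCopy-style exerciser is typically driven from the host, the sketch below maps the AFU's per-process MMIO space (for example via the reference user library after attaching), writes source/destination virtual addresses and a length, rings a control register, and polls a status register. The register offsets and names here are invented for the sketch; the real MemCopy exerciser documents its own register map.

```c
#include <stdint.h>

/* Hypothetical per-process MMIO register offsets for a MemCopy-style AFU. */
enum {
    REG_SRC_EA  = 0x00,  /* source effective (virtual) address        */
    REG_DST_EA  = 0x08,  /* destination effective (virtual) address   */
    REG_LENGTH  = 0x10,  /* transfer size (moved as 64B/128B/256B ops) */
    REG_CONTROL = 0x18,  /* write 1 to start an engine                */
    REG_STATUS  = 0x20,  /* non-zero once the copy has completed      */
};

static inline void mmio_write64(volatile uint8_t *mmio, uint64_t off, uint64_t val)
{
    *(volatile uint64_t *)(mmio + off) = val;
}

static inline uint64_t mmio_read64(volatile uint8_t *mmio, uint64_t off)
{
    return *(volatile uint64_t *)(mmio + off);
}

/* Drive one copy through the AFU; 'mmio' is the per-process MMIO area
 * that the application mapped after opening and attaching to the AFU. */
void afu_memcopy(volatile uint8_t *mmio, const void *src, void *dst, uint64_t len)
{
    mmio_write64(mmio, REG_SRC_EA,  (uint64_t)(uintptr_t)src);
    mmio_write64(mmio, REG_DST_EA,  (uint64_t)(uintptr_t)dst);
    mmio_write64(mmio, REG_LENGTH,  len);
    mmio_write64(mmio, REG_CONTROL, 1);          /* kick the engine   */
    while (mmio_read64(mmio, REG_STATUS) == 0)   /* poll completion   */
        ;
}
```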
Reference Card Design Number 1
• Definition of the FPGA reference card(s) is driven as part of the Enablement Work Group
• Definition of the cable(s) is driven as part of the PHY Mechanical Work Group
• A representative diagram is shown below
34
[Diagram: FPGA card attached through the Amphenol SlimSAS OpenCAPI connector; the OpenCAPI protocol, including discovery, configuration and enumeration, runs over the 25Gbps link; the convenient fixture supplies power and ground only]
Reference Card Design Number 2
• Definition of the FPGA reference card(s) is driven as part of the Enablement Work Group
• Definition of the cable(s) will be driven as part of the PHY Mechanical Work Group
• A representative diagram is shown below
35
[Diagram: Amphenol SlimSAS OpenCAPI connector, mezzanine-based form factor]
Table of Enablement Deliveries
36

| Item | Delivery Name | Where to Obtain | Available |
|---|---|---|---|
| OpenCAPI 3.0 TLx and DLx Reference Xilinx FPGA Designs (RTL and Specifications) | <snapshot>.tar.gz | Enablement WG | Today |
| Xilinx Vivado Project Build with Memcopy Exerciser | Vivado Project Flow | Enablement WG | Today |
| Device Discovery and Configuration Specification and RTL | OpenCAPI 3.0 Configuration Sub-System Reference Design Specification | Enablement WG Causeway | Today |
| AFU Interface Specification | TLX 3.0 Reference Design.pdf | Enablement WG Causeway | Today |
| 25Gbps PHY Signal Specification | OC PHY 25G Specification | PHY Signaling WG Causeway | Today |
| 25Gbps PHY Mechanical Specification | 25Gbps Interface Mechanical Spec | PHY Mechanical WG Causeway | Today |
| OpenCAPI Simulation Environment (OCSE) | ocse-<version>.tar.gz, OpenCAPIDemokit.pdf | Enablement WG | Today |
| Memcopy and Memory Home Agent Exercisers | MCP3 and LPC, <snapshot>.tar.gz | Enablement WG | Today |
| Reference Driver | LIBOCXL | Ubuntu 18.04, GitHub | Today |
SmartDV OpenCAPI VIP Environment
contains:
• Complete regression suite
• Usage examples
• Detailed documentation
• User’s Guide and Release Notes
Benefits
• Complete Verification of
OpenCAPI Design
• Easy to Use
• Simplify Result Analysis
• Runs in every sim environment
37
SmartDV
OpenCAPI Verification IP
http://www.smart-dv.com/vip/opencapi.html
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
Design Enablement
Performance Metrics
OpenCAPI Consortium
38
CAPI and OpenCAPI Performance
39

| DMA Operation | CAPI 1.0 (PCIe Gen3 x8, measured BW @8Gb/s) | CAPI 2.0 (PCIe Gen4 x8, measured BW @16Gb/s) | OpenCAPI 3.0 (25Gb/s x8, measured BW @25Gb/s) |
|---|---|---|---|
| 128B DMA Read | 3.81 GB/s | 12.57 GB/s | 22.1 GB/s |
| 128B DMA Write | 4.16 GB/s | 11.85 GB/s | 21.6 GB/s |
| 256B DMA Read | N/A | 13.94 GB/s | 22.1 GB/s |
| 256B DMA Write | N/A | 14.04 GB/s | 22.0 GB/s |

CAPI 1.0: POWER8, introduced in 2013. CAPI 2.0: POWER9, second generation. OpenCAPI 3.0: POWER9, an open architecture designed from a clean slate and focused on bandwidth and latency. Measured with Xilinx KU60/VU3P FPGAs.
Latency Test Results

| Link | Host / FPGA | Measured CPU Turnaround | Jitter | Adapter Stack | Total Latency |
|---|---|---|---|---|---|
| OpenCAPI Link | P9 OpenCAPI (3.9GHz Core, 2.4GHz Nest) / Xilinx FPGA VU3P | 298ns‡ (includes host TL, DL, PHY) | 2ns | TLx, DLx, PHYx (80ns‖) | 378ns† |
| PCIe G4 Link | P9 PCIe Gen4 / Xilinx FPGA VU3P | est. <337ns | – | Xilinx PCIe HIP (218ns¶) | est. <555ns§ |
| PCIe G3 Link | P9 PCIe Gen3 (3.9GHz Core, 2.4GHz Nest) / Altera FPGA Stratix V | 337ns | 7ns | Altera PCIe HIP (400ns¶) | 737ns§ |
| PCIe G3 Link | Kaby Lake PCIe Gen3* / Altera FPGA Stratix V | 376ns | 31ns | Altera PCIe HIP (400ns¶) | 776ns§ |

* Intel Core i7 7700 Quad-Core 3.6GHz (4.2GHz Turbo Boost)
† Derived from round-trip time minus simulated FPGA app time
‡ Derived from round-trip time minus simulated FPGA app time and simulated FPGA TLx/DLx/PHYx time
§ Derived from measured CPU turnaround time plus vendor-provided HIP latency
‖ Derived from simulation
¶ Vendor-provided latency statistic
RACE TO ZERO LATENCY
BECAUSE JITTER MATTERS
40
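Per the footnotes, each total-latency figure is simply the measured (or estimated) CPU turnaround plus the corresponding adapter-stack latency:

```latex
\begin{aligned}
\text{OpenCAPI: } & 298\,\mathrm{ns} + 80\,\mathrm{ns} = 378\,\mathrm{ns}\\
\text{PCIe Gen4 (est.): } & 337\,\mathrm{ns} + 218\,\mathrm{ns} = 555\,\mathrm{ns}\\
\text{PCIe Gen3 (POWER9): } & 337\,\mathrm{ns} + 400\,\mathrm{ns} = 737\,\mathrm{ns}\\
\text{PCIe Gen3 (Kaby Lake): } & 376\,\mathrm{ns} + 400\,\mathrm{ns} = 776\,\mathrm{ns}
\end{aligned}
```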
OpenCAPI Topics
Key Messages
Industry Background
Technology Overview
Possible Solutions
Design Enablement
Performance Metrics
OpenCAPI Consortium
41
OpenCAPI Consortium Overview
Goals
1. Provide a forum that gives the industry the ability to innovate on the next-generation bus protocol
2. Drive hardware/software innovation to enable choice and efficiency in data center
architectures
3. Build an ecosystem with the flexibility to build servers and data centers best suited for their
computational demands
Mission
Create an open coherent high performance bus interface based on a new bus standard called Open
Coherent Accelerator Processor Interface (OpenCAPI) and grow the ecosystem that utilizes this interface.
Incorporated September 13, 2016
Announced October 14, 2016
42
OpenCAPI Consortium
• Open forum founded by AMD, Google, IBM, Mellanox, and Micron
• Manage the OpenCAPI specification, establish enablement, grow the ecosystem
• Currently about 40 members
• Why its own consortium? It is architecture agnostic and thus capable of going beyond the Power Architecture
• Consortium now established
• Established Board of Directors (AMD, Google, IBM, Mellanox Technologies, Micron, NVIDIA, Western
Digital, Xilinx)
• Governing Documents (Bylaws, IPR Policy, Membership) with established Membership Levels
• Technical Steering Committee and Marketing/Communications Committee
• Website www.opencapi.org
• OpenCAPI 3.0 and 3.1 specifications, as contributed to the consortium, are available as the starting point for the Work Groups
• OpenCAPI 4.0 has now been added to the website!
• AFU Coherent Data Caching of System Memory
• AFU Address Translation Caching (allows posted operations to system memory)
Incorporated September 13, 2016
Announced October 14, 2016
43
OpenCAPI Workgroup Status
46

| Item | Availability | Specs Available for Review |
|---|---|---|
| OpenCAPI Technical Steering Committee | Up and running | WG Process Spec |
| Marketing & Communications Committee | Up and running | Regular Newsletters |
| PHY Signaling Workgroup | Up and running | Today |
| PHY Mechanical Workgroup | Up and running | May 2019 |
| TL Architecture Specification Workgroup | Up and running | Today |
| DL Architecture Specification Workgroup | Up and running | April 2019 |
| Enablement Workgroup | Up and running | Today |
| Compliance Workgroup | Up and running | Ongoing |
| Accelerator/Memory Workgroup | Forthcoming | – |
Cross Industry Collaboration and Innovation
47
OpenCAPI Protocol
Welcoming new members in all areas of the ecosystem: systems and software, SW, research & academic, products and services, deployment, SoC, and accelerator solutions
Membership Entitlement Details
Strategic level - $25K
• Draft and Final Specifications and
enablement
• License for Product development
• Workgroup participation and voting
• TSC participation
• Vote on new Board Members
• Nominate and/or run for officer election
• Prominent listing in appropriate materials
Observing level - $5K
• Final Specifications and enablement
• License for Product development
Contributor level - $15K
• Draft and Final Specifications and
enablement
• License for Product development
• Workgroup participation and voting
• TSC participation
• Submit proposals
Academic and Non-Profit level - Free
• Final Specifications and enablement
• Workgroup participation and voting
48
OpenCAPI Consortium Next Steps
JOIN TODAY!
www.opencapi.org
49