Dec 7, 2013
CARL2013@Davis, CA

PyCoRAM
Yet Another Implementation of CoRAM Memory
Architecture for Modern FPGA-based Computing

Shinya Takamaeda-Yamazaki†‡, Kenji Kise†, James C. Hoe*
†Tokyo Institute of Technology
‡JSPS Research Fellow
*Carnegie Mellon University

Agenda

■ Background
■ PyCoRAM Overview
■ PyCoRAM Microarchitecture
■ Evaluation
■ Conclusion

Dec 7, 2013

Shinya T-Y. Tokyo Tech

2
Background

FPGA as SoC
■ Put together various components on a single FPGA
  ● CPU core
    • MicroBlaze (soft macro)
    • Cortex-A9 (hard macro)
  ● Hardware accelerator logic
    • Modeled in traditional RTL
      – Verilog HDL, VHDL
    • Modeled in new modeling tools
      – Bluespec, AutoESL, Chisel, …
  ● DDRx DRAM interface
  ● PCI Express
  ● Ethernet, …

[Figure: FPGA block diagram with a CPU, two HW Acc blocks, an interconnect, Ether, DRAM I/F, and PCI-E]

Portability Issue of Application Design
■ How to support various FPGA platforms?
  ● Different logic sizes, memory interfaces, peripherals, and I/O properties

[Figure: three example platforms: Digilent Atlys (Xilinx Spartan-6 LX45); ScalableCore System, our FPGA system (Xilinx Spartan-6 LX16 × 128 nodes); Xilinx ML605 (Xilinx Virtex-6 LX240T)]

IP-core Based System Development
■ To build a system, add IP-cores and connect them
  ● IP-cores are connected through a standard on-chip interconnect
  ● EDK automatically generates the on-chip interconnect and (some) device-dependent interfaces
    • No (or few) annoying steps!

[Figure: Xilinx Platform Studio (XPS) showing the IP-core list and IP-core instances, next to an FPGA block diagram with a CPU, two HW Acc blocks, Ether, DRAM I/F, and PCI-E on an interconnect, plus off-chip DRAM]

Abstract Memory System for FPGAs
■ CoRAM (Connected RAM) [FPGA’11]
  ● High-level abstraction for memory management
    • Decouples computing logic from memory access behavior
    • Memory access patterns are written in a software model (C language)

[Figure: HW kernels (computing logic) read and write CoRAM Memories (abstracted on-chip memories) and CoRAM Channels (communication FIFOs/registers); control threads (memory access pattern in C) manage the reads and writes between CoRAM Memory and off-chip memory]

What "Runs" CoRAM?
(From the CoRAM Tutorial @FPGA’13)

[Figure: labels from the tutorial slide. Architecture: core logic, SRAM, control thread programs. Microarchitecture on the FPGA: network-on-chip, memory translation (TLBs), memory interfaces and caches, cluster DMA, control logic, fabric. Tools: RTL conversion; high-level synthesis from C to RTL using LLVM; CONNECT NoC generator]

PyCoRAM Overview

Motivation: CoRAM for EDK
■ Integrate the CoRAM memory architecture into the modern EDK-based development flow with standard IP-cores

[Figure: accelerator logic (portable application design with CoRAM) on top of the CoRAM abstraction, alongside a standard IP-core and a CPU core (cooperation with standard IP-cores), all on the standard on-chip interconnect and device-dependent interfaces]

PyCoRAM
■ Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
  ● CoRAM memory abstraction for the EDK development flow

■ Key features
  ● Control threads in Python
    • We developed a Python-to-Verilog HLS compiler from scratch
  ● AMBA AXI4 as the on-chip interconnect
    • For IP-core based development on Xilinx Platform Studio (XPS)
  ● Parameterized RTL design support for user logic
    • Generate statements and parameter statements are analyzed by our original Verilog analysis tool-chain (Pyverilog)

Comparison with Original CoRAM
Item                                        CoRAM                               PyCoRAM
Language for control thread                 C                                   Python
Supported memory operations                 Blocking/non-blocking read/write    Blocking/non-blocking read/write
On-chip interconnect                        CONNECT NoC [FPGA’12]               AMBA AXI4
FSM granularity in control thread           LLVM-IR                             Python AST node
Generate-statement support for user logic   No                                  Yes
Supported FPGAs                             Xilinx ML605, Altera Terasic DE-4   Any FPGA supporting the AXI bus
Lines of code                               11,682 (w/o CONNECT)                4,922 (w/o Pyverilog)

FSM: Finite State Machine
LLVM-IR: Low Level Virtual Machine Intermediate Representation
AST: Abstract Syntax Tree

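
The comparison lists "Python AST node" as the FSM granularity of a PyCoRAM control thread. A minimal sketch of that idea, using only the standard `ast` module, numbers each statement node as one state. The function `assign_states` and the one-state-per-statement rule are illustrative assumptions, not PyCoRAM's actual compiler.

```python
import ast

# Hypothetical control-thread source; it is only parsed here, never executed.
SOURCE = """
ram = CoramMemory(idx=0, datawidth=32, size=1024)
addr = 0
for i in range(4):
    ram.write(0, addr, 128)
    addr += 512
"""

def assign_states(source):
    """Return (state number, statement kind, line number) for every statement.

    Assumption: one FSM state per statement node, numbered in source order.
    """
    tree = ast.parse(source)
    stmts = [node for node in ast.walk(tree) if isinstance(node, ast.stmt)]
    stmts.sort(key=lambda node: (node.lineno, node.col_offset))
    return [(state, type(node).__name__, node.lineno)
            for state, node in enumerate(stmts)]

states = assign_states(SOURCE)
for state, kind, lineno in states:
    print(f"state {state}: {kind} (line {lineno})")
```

A real compiler would also need states for loop tests and for waiting on blocking operations; this sketch only shows where the state boundaries could come from.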
PyCoRAM Development Flow
■ PyCoRAM generates an IP-core package from user-logic RTL and control thread scripts written in Python
  ● Each part can be replaced with the corresponding component of the original CoRAM

[Figure: development flow]
Inputs (portable application design): user logic (Verilog HDL) and control threads (Python)
PyCoRAM tool-chain:
  RTL conversion: logic hierarchy analysis, control signal insertion, control signal port addition
  Python-to-Verilog HLS: Python-to-Verilog compilation
  IP-core generation with AXI4 interface: IP-core packing (RTL, .mpd, and .pao)
Vendor EDA flow: top design synthesis with AXI4 (IP-core integration on EDK), synthesis, FPGA bit file

FPGA Accelerator with PyCoRAM IP-core
[Figure: an FPGA in which the PyCoRAM IP (application) contains HW kernels (computing logic), CoRAM Memories grouped into DMA clusters, a CoRAM Channel, CoRAM Streams, and a control thread (FSM). Each DMAC has an AXI interface to the AXI4 interconnect, which also connects other AXI IP-cores or a CPU and the DRAM controller for off-chip DRAM]

PyCoRAM Microarchitecture

PyCoRAM Microarchitecture (Logical View)
[Figure: user logic (modeled in RTL, Verilog HDL) with user I/O and GPIO, and a control thread (FSM; memory access pattern in Python). Between them sit a CoRAM Register, a CoRAM Channel, a CoRAM Memory with its DMAC, and a CoRAM Stream with its DMAC]

PyCoRAM Microarchitecture (Physical View)
[Figure: inside the FPGA, the PyCoRAM IP contains the user logic (parameterized RTL design support), the control thread (FSM; control thread in Python), a CoRAM Register, a CoRAM Channel, a CoRAM Memory, and a CoRAM Stream. The DMACs connect through AXI interfaces (AXI4 master interface) to the AXI4 interconnect and the DRAM controller; user I/O and GPIO attach to the PyCoRAM IP]

Control Thread in Python
■ Operations for CoRAM objects
  ● To/from CoRAM Memory
    • Data movement pattern with DMA operations between on-chip CoRAM memory and DRAM
  ● To/from CoRAM Channel
    • Token communication between user logic and the control thread

[Figure: user logic with user I/O, connected to the control thread (FSM) through a CoRAM Channel and a CoRAM Memory with its DMAC]

    def calc_sum(times):
        ram = CoramMemory(idx=0, datawidth=32, size=1024)
        channel = CoramChannel(idx=0, datawidth=32)
        addr = 0
        sum = 0
        for i in range(times):
            ram.write(0, addr, 128)   # Transfer (off-chip DRAM to BRAM)
            channel.write(addr)       # Notification to user logic
            sum += channel.read()     # Wait for notification from user logic
            addr += 128 * (32/8)
        print('sum=', sum)            # $display Verilog system task

    calc_sum(8)
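
To see what this control thread does without any hardware, the following is a software-only emulation. `MockCoramMemory`, `MockCoramChannel`, the `dram` list, and the instant-reply "user logic" are hypothetical stand-ins; the meaning of `ram.write(bram_addr, dram_addr, size)` is inferred from the slide's comment (off-chip DRAM to BRAM), not taken from PyCoRAM's source.

```python
# Software-only emulation of the calc_sum control thread (assumed semantics).
WORD_BYTES = 32 // 8  # 32-bit words

class MockCoramMemory:
    """Stand-in for an on-chip CoRAM memory backed by a fake DRAM word list."""
    def __init__(self, dram, size):
        self.dram = dram
        self.bram = [0] * size

    def write(self, bram_addr, dram_addr, size):
        # Assumed meaning: copy `size` words from DRAM byte address `dram_addr`
        # into the on-chip memory starting at word `bram_addr`.
        start = dram_addr // WORD_BYTES
        self.bram[bram_addr:bram_addr + size] = self.dram[start:start + size]

class MockCoramChannel:
    """Stand-in for the channel; the pretend user logic answers immediately."""
    def __init__(self, ram, burst):
        self.ram = ram
        self.burst = burst
        self.reply = None

    def write(self, token):
        # Pretend user logic: sum the words the DMA just placed in the memory.
        self.reply = sum(self.ram.bram[:self.burst])

    def read(self):
        return self.reply

def calc_sum(times, dram, burst=128):
    ram = MockCoramMemory(dram, size=1024)
    channel = MockCoramChannel(ram, burst)
    addr = 0
    total = 0
    for _ in range(times):
        ram.write(0, addr, burst)      # DRAM -> on-chip memory
        channel.write(addr)            # notify the user logic
        total += channel.read()        # wait for its partial sum
        addr += burst * WORD_BYTES     # advance 512 bytes per iteration
    return total

dram = list(range(8 * 128))            # 1024 words of test data
print("sum=", calc_sum(8, dram))
```

Each iteration moves 128 words of 4 bytes, which is why the DRAM address advances by 512 bytes, matching `128 * (32/8)` in the slide.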
CoRAM objects in User Logic
■ CoRAM objects as standard BRAM or FIFO
  ● Interface very similar to the standard memory components
  ● User logic can access their contents in the same way

■ Essential parameters to define object characteristics
  ● Thread name, ID, data width, address length, …

(a) CoRAM Memory

    CoramMemory1P
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(ADDR_LEN),
      .CORAM_DATA_WIDTH(DATA_WIDTH)
    )
    inst_memory
    (.CLK(CLK),
     .ADDR(mem_addr),
     .D(mem_d),
     .WE(mem_we),
     .Q(mem_q)
    );

(b) CoRAM Channel

    CoramChannel
    #(
      .CORAM_THREAD_NAME("thread_name"),
      .CORAM_ID(0),
      .CORAM_ADDR_LEN(CHANNEL_ADDR_LEN),
      .CORAM_DATA_WIDTH(CHANNEL_DATA_WIDTH)
    )
    inst_channel
    (.CLK(CLK),
     .RST(RST),
     .D(comm_d),
     .ENQ(comm_enq),
     .FULL(comm_full),
     .Q(comm_q),
     .DEQ(comm_deq),
     .EMPTY(comm_empty)
    );

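
The channel ports (ENQ, FULL, DEQ, EMPTY) follow the usual FIFO handshake. A small behavioral model of that handshake is sketched below. The class `ChannelFifoModel` is hypothetical, and deriving the depth as 2 to the power of the address length is an assumption about how `CORAM_ADDR_LEN` is used.

```python
from collections import deque

class ChannelFifoModel:
    """Behavioral model of a FIFO with ENQ/DEQ/FULL/EMPTY-style handshakes."""
    def __init__(self, addr_len):
        self.depth = 2 ** addr_len   # assumption: depth derives from the address length
        self.items = deque()

    @property
    def full(self):
        return len(self.items) >= self.depth

    @property
    def empty(self):
        return not self.items

    def enq(self, data):
        if self.full:
            return False             # producer must hold the data and retry
        self.items.append(data)
        return True

    def deq(self):
        if self.empty:
            return None              # consumer must wait
        return self.items.popleft()

fifo = ChannelFifoModel(addr_len=2)
accepted = [fifo.enq(v) for v in range(5)]   # fifth enqueue hits FULL
drained = [fifo.deq() for _ in range(5)]     # fifth dequeue hits EMPTY
print(accepted, drained)
```

In the user logic, the same rule applies in RTL: assert ENQ only while FULL is low, and DEQ only while EMPTY is low.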
AXI4 Master Interface
■ The DMA controller works as an AXI4 master IP-core interface

[Figure: the control thread (FSM) issues requests (BramAddr, DramAddr, Size; Busy, Ready) to the DMA controllers of a DMA cluster and exchanges tokens with the CoRAM Channel (WrData, Enque, AlmFull; RdData, Deque, Empty). The DMA controllers and the HW kernels (computing logic) access the CoRAM Memories (BRAM) through Addr, WrEn, WrData, and RdData ports. The AXI master interface (protocol conversion) connects the DMA controllers to the AXI4 interconnect through the write address, write data, read address, and read data channels. AXI signals shown: AWADDR, AWLEN, AWVALID, AWREADY, WDATA, WVALID, WREADY, BVALID, ARADDR, ARLEN, ARVALID, ARREADY, RDATA, RVALID, RREADY]

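
The figure shows ARLEN and AWLEN among the signals driven by the AXI master interface. In AXI4, that field holds the number of beats minus one, an incrementing burst is limited to 256 beats, and a burst must not cross a 4 KB address boundary. The sketch below splits one DMA request into legal bursts under those rules. The function `split_into_bursts` is hypothetical; the slides do not say how PyCoRAM's DMA controller actually splits transfers.

```python
def split_into_bursts(byte_addr, num_beats, bus_bytes, max_beats=256):
    """Split a transfer into (start address, AxLEN) pairs.

    AxLEN is beats - 1. No burst exceeds max_beats or crosses a 4 KB boundary.
    Assumes byte_addr is aligned to bus_bytes.
    """
    bursts = []
    while num_beats > 0:
        # Beats that still fit before the next 4 KB boundary.
        to_boundary = (4096 - byte_addr % 4096) // bus_bytes
        beats = min(num_beats, max_beats, to_boundary)
        bursts.append((byte_addr, beats - 1))
        byte_addr += beats * bus_bytes
        num_beats -= beats
    return bursts

# 128 beats on a 4-byte bus fit in one burst.
print(split_into_bursts(0, 128, 4))
# A transfer that starts 64 bytes below a 4 KB boundary is split there.
print(split_into_bursts(4096 - 64, 32, 16))
```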
For Parameterized RTL design support
■ Generate-statement support through an advanced RTL analyzer
  ● Not supported by the original CoRAM compiler

■ Pyverilog: Python-based tool-chain for Verilog HDL design
  ● Parser
  ● Dataflow analysis
  ● Optimization
  ● RTL code generation
  ● Control-flow analysis
  ● Graphical output

[Figure: example graphical outputs: a dataflow graph and a state machine]

Evaluation

Evaluation
■ Point: maximum memory bandwidth utilization
  ● PyCoRAM is a memory abstraction framework

■ Setup
  ● 2 FPGA boards
    • Digilent Atlys
      – Spartan-6 LX45
      – DDR2-800 DRAM 128MB (1.2GB/s*)
      – AXI4 128-bit, 100MHz (1.6GB/s)
      *Due to 300MHz operation
    • Xilinx ML605
      – Virtex-6 LX240T
      – DDR3-800 DRAM 512MB (6.4GB/s)
      – AXI4 256-bit, 200MHz (6.4GB/s)
  ● EDK
    • Xilinx Platform Studio (14.6)

[Figure: the Digilent Atlys (Xilinx Spartan-6 LX45) and Xilinx ML605 (Xilinx Virtex-6 LX240T) boards]

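
The AXI4 figures in the setup follow from bus width times clock frequency: 128 bits at 100 MHz gives 1.6 GB/s, and 256 bits at 200 MHz gives 6.4 GB/s. The short calculation below reproduces them. The DRAM peaks are copied from the slide, and taking the smaller of the two limits as the "ceiling" is an assumption; the slides do not state which baseline the utilization is measured against.

```python
def axi_peak_gb_per_s(width_bits, clock_mhz):
    """Peak AXI4 data bandwidth: one bus-width beat per clock cycle."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9

boards = {
    # name: (AXI width [bit], AXI clock [MHz], DRAM peak [GB/s] as quoted on the slide)
    "Atlys": (128, 100, 1.2),
    "ML605": (256, 200, 6.4),
}

for name, (width, clock, dram_peak) in boards.items():
    axi_peak = axi_peak_gb_per_s(width, clock)
    ceiling = min(axi_peak, dram_peak)   # assumed bottleneck: the slower of the two
    print(f"{name}: AXI4 {axi_peak:.1f} GB/s, DRAM {dram_peak:.1f} GB/s, "
          f"ceiling {ceiling:.1f} GB/s")
```

On the Atlys the DRAM is the tighter limit; on the ML605 the two limits are equal.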
Evaluation: Application
■ Array-sum: calculates the sum of an array
  ● Two CoRAM memories used as a double buffer
  ● The SIMD width (= number of simultaneous operations) is varied to check its effect on memory bandwidth utilization
    • 4, 8, 16, 32, 64 (bytes)

[Figure: array-sum datapath. CoRAM Memory 0 (filled from DMA Controller 0) and CoRAM Memory 1 (filled from DMA Controller 1) each supply words D[0] to D[3]; a MUX selects one memory; four adders accumulate into s0 to s3; a MUX and a final adder produce the output sum]

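
The array-sum kernel can be sketched in software as per-lane accumulators fed from two alternating buffers. The function `simd_array_sum` and its parameters are hypothetical, and the sequential loop only imitates the order of operations: in hardware the refill of the idle buffer would overlap with the computation on the active one.

```python
def simd_array_sum(data, lanes=4, buffer_words=8):
    """Sum `data` with per-lane accumulators and two alternating buffers."""
    assert buffer_words % lanes == 0 and len(data) % buffer_words == 0
    acc = [0] * lanes                      # s0..s3 in the figure when lanes=4
    buffers = [None, None]                 # stand-ins for CoRAM Memory 0 and 1
    chunks = [data[i:i + buffer_words] for i in range(0, len(data), buffer_words)]
    buffers[0] = chunks[0] if chunks else []
    for n in range(len(chunks)):
        active = n % 2
        # Refill the idle buffer with the next chunk (overlapped in hardware).
        if n + 1 < len(chunks):
            buffers[1 - active] = chunks[n + 1]
        buf = buffers[active]
        # Each step consumes `lanes` words, one per accumulator.
        for base in range(0, buffer_words, lanes):
            for lane in range(lanes):
                acc[lane] += buf[base + lane]
    return sum(acc)                        # final reduction of the lane sums

values = list(range(64))
print(simd_array_sum(values), sum(values))
```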
Memory Bandwidth Utilization
■ Good bandwidth utilization
  ● Atlys: 85.5% (at 16-byte)
  ● ML605: 84.9% (at 64-byte)

■ Degradation reasons
  ● Sequential (single) transaction for each DMA controller
    • Memory latency directly and adversely affects performance

[Figure: two plots of bandwidth utilization (0 to 1) versus SIMD size [byte]: Atlys (Spartan-6) for 4, 8, 16, and 32 bytes; ML605 (Virtex-6) for 4, 8, 16, 32, and 64 bytes]

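
The degradation reason can be illustrated with a simple model: if a DMA controller issues one transaction at a time, every transfer pays the full memory latency before the next one starts. The function `utilization` and the latency values below are hypothetical and chosen only for illustration; they are not measurements from the evaluation.

```python
def utilization(burst_bytes, peak_gb_per_s, latency_ns):
    """Fraction of peak bandwidth when each transfer waits for the previous one."""
    transfer_ns = burst_bytes / peak_gb_per_s   # 1 GB/s moves 1 byte per ns
    return transfer_ns / (transfer_ns + latency_ns)

# Hypothetical latencies for a 512-byte transfer at a 1.2 GB/s peak.
for latency_ns in (0, 50, 100, 200):
    print(f"latency {latency_ns:3d} ns -> utilization {utilization(512, 1.2, latency_ns):.2f}")
```

Under this model, longer transfers or several outstanding transactions would hide more of the latency.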
Conclusion and …

Conclusion
■ PyCoRAM: Python-based implementation of the CoRAM memory architecture for modern FPGA EDKs
■ Future work
  ● Further evaluation on more realistic applications
  ● AXI4 slave feature for the control thread
  ● Tutorial slides

[Figure: accelerator logic (portable application design with CoRAM) on top of the CoRAM abstraction, alongside a standard IP-core and a CPU core (cooperation with standard IP-cores); the standard on-chip interconnect and device-dependent interfaces are automatically managed by EDK]

PyCoRAM and Pyverilog are publicly available!

■ PyCoRAM (0.7.0-public)
  ● https://github.com/shtaxxx/PyCoRAM

■ Pyverilog (0.6.0-public)
  ● https://github.com/shtaxxx/Pyverilog

Thanks!

PyCoRAM: Yet Another Implementation of CoRAM Memory Architecture for Modern FPGA-based Computing (CARL2013 co-located with MICRO-46)
