A Random Forest using a Multi-valued
Decision Diagram on an FPGA
Hiroki Nakahara (1), Akira Jinguji (1), Shimpei Sato (1),
Tsutomu Sasao (2)
(1) Tokyo Institute of Technology, JP; (2) Meiji University, JP
May 22, 2017
@ISMVL2017
Outline
• Background
• Random forest (RF)
• Multi-valued decision diagram (MDD)
• RF using MDDs
• Experimental results
• Conclusion
Machine Learning
Abundant computation power and big data
(Left): “Single-Threaded Integer Performance,” 2016
(Right): Nakahara, “Trend of Search Engine on modern Internet,” 2014
Machine Learning Algorithms
M. Warrick, “How to get started with machine learning,” PyCon2014
Introduction
• Random Forest (RF)
• Ensemble learning method
• Consists of multiple decision trees (DTs)
• Applications: Segmentation, human pose
detection
• It is based on binary DTs (BDTs)
• A node is evaluated by an if-then-else
statement
• The same variable may appear several times
• Multiple-valued decision diagram (MDD)
• Each variable appears only once on a path
Introduction (Contʼd)
• Target platform
• CPU: Too slow
• GPU: Not well suited to the RF → slow, and
consumes much power
• FPGA: Faster, low power, but long turnaround time (TAT)
• High-level synthesis (HLS) for the RF using
MDDs on an FPGA
• Low power, high performance,
short design time
Random Forest
Classification by a Binary
Decision Tree (BDT)
• Partition of the feature map
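[Figure: the (X1, X2) feature space, both axes from 0.00 to 1.00, partitioned into class regions C1 and C2 by the thresholds X2 = 0.29, 0.53 and X1 = 0.09, 0.63, 0.71, next to the equivalent BDT. The root tests X2<0.53?; the other nodes test X2<0.29?, X1<0.09?, X1<0.63? and X1<0.71?; the leaves are C1 and C2.]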
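Classification by a BDT amounts to a chain of if-then-else tests from the root to a leaf. The following is a minimal Python sketch of that walk. The node tests are taken from the slide, but the exact tree shape and the class assigned to each leaf are assumptions made for illustration, since the extracted figure does not preserve the layout.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class name, e.g. "C1"


@dataclass
class Node:
    feature: str              # variable tested at this node
    threshold: float          # test is: x[feature] < threshold
    yes: Union["Node", Leaf]  # subtree taken when the test holds
    no: Union["Node", Leaf]   # subtree taken otherwise


def classify(tree, x):
    """Walk from the root to a leaf; return (class label, number of tests)."""
    tests = 0
    while isinstance(tree, Node):
        tree = tree.yes if x[tree.feature] < tree.threshold else tree.no
        tests += 1
    return tree.label, tests


# Assumed tree shape and leaf labels (illustrative only).
BDT = Node("X2", 0.53,
           yes=Node("X2", 0.29,
                    yes=Leaf("C1"),
                    no=Node("X1", 0.63, yes=Leaf("C2"), no=Leaf("C1"))),
           no=Node("X1", 0.09,
                   yes=Leaf("C1"),
                   no=Node("X1", 0.71, yes=Leaf("C2"), no=Leaf("C1"))))

label, tests = classify(BDT, {"X1": 0.5, "X2": 0.4})
```

The number of tests returned by `classify` is the path length for that input, which is the quantity the later slides try to reduce.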
Training of a BDT
• It is built by randomized samples
• Recursively partition the dataset to maximize the
information gain (i.e., to reduce entropy) → The same variable may appear several times
[Figure: the same feature-space partition and BDT as on the previous slide.]
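One common way to choose the test at each node is to pick the (variable, threshold) pair with the largest entropy-based information gain. The sketch below illustrates that criterion on a tiny made-up dataset; the slides do not say which impurity measure or implementation the authors used, so the function names and the data are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, feature, threshold):
    """Entropy reduction obtained by the test x[feature] < threshold."""
    left = [y for x, y in samples if x[feature] < threshold]
    right = [y for x, y in samples if x[feature] >= threshold]
    if not left or not right:
        return 0.0  # the test does not split the data
    n = len(samples)
    parent = entropy([y for _, y in samples])
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return parent - children


def best_split(samples, features):
    """Try every observed value as a threshold; return (gain, feature, threshold)."""
    candidates = ((information_gain(samples, f, x[f]), f, x[f])
                  for f in features for x, _ in samples)
    return max(candidates)


# Hypothetical toy dataset: (feature vector, class label).
SAMPLES = [({"X1": 0.1, "X2": 0.2}, "C1"),
           ({"X1": 0.8, "X2": 0.3}, "C1"),
           ({"X1": 0.2, "X2": 0.7}, "C2"),
           ({"X1": 0.9, "X2": 0.9}, "C2")]

gain, feature, threshold = best_split(SAMPLES, ["X1", "X2"])
```

Because the best test is chosen independently at every node, nothing prevents the same variable from being selected again further down the same path.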
Random Forest (RF)
• Ensemble learning
• Classification and regression
• Consists of multiple BDTs
[Figure: a random forest. The input is fed to Tree 1, Tree 2, ..., Tree n; each tree is a BDT (Tree 1 is shown with the tests X1<0.53?, X3<0.71?, X2<0.63? and X3<0.72?) and outputs a class (C1, C2, C1 in the example); a voter outputs the final class C1.]
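The voter can be sketched as a plain majority vote over the per-tree outputs. In this Python sketch the three "trees" are hypothetical one-test stand-ins, and the tie-breaking rule (smallest label wins) is an assumption; the slides do not specify how ties are resolved.

```python
from collections import Counter


def vote(predictions):
    """Majority vote; ties go to the lexicographically smallest label (assumed)."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def random_forest_predict(trees, x):
    """Evaluate every tree on the same input and let the voter decide."""
    return vote([tree(x) for tree in trees])


# Hypothetical single-test trees standing in for full BDTs.
TREES = [
    lambda x: "C1" if x["X1"] < 0.53 else "C2",
    lambda x: "C2" if x["X2"] < 0.63 else "C1",
    lambda x: "C1" if x["X3"] < 0.72 else "C3",
]

predicted = random_forest_predict(TREES, {"X1": 0.2, "X2": 0.5, "X3": 0.1})
```

The trees do not depend on each other, so they can all be evaluated in parallel before the vote.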
Applications
• Key point matching [Lepetit et al., 2006]
• Object detector [Shotton et al., 2008][Gall et al., 2011]
• Handwritten character recognition [Amit & Geman, 1997]
• Visual word clustering
[Moosmann et al.,2006]
• Pose recognition
[Yamashita et al., 2010]
• Human detector
[Mitsui et al., 2011]
[Dahang et al., 2012]
• Human pose estimation
[Shotton 2011]
Known Problem
• Build BDTs from randomized samples
• The same variable may appear on a path
• Tends to be slow, even on GPUs
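[Figure: a BDT whose root tests X2<0.53? and whose next nodes test X2<0.29? and X2<0.09?, followed by X1<0.63? and X1<0.71?, so X2 is tested more than once on a path. The leaves are C1 and C2. Each node corresponds to the following statement:]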
if X2 < 0.09 then
output C1;
else
goto Child_node;
Multi-valued Decision Diagram
Binary Decision Diagram (BDD)
• Recursively apply Shannon expansion to a
given logic function
• Non-terminal node: If-then-else statement
• Terminal node: Set functional value
[Figure: a BDT-like diagram over the variables x1 to x6 with the terminal nodes 0 and 1; non-terminal nodes and terminal nodes are labeled.]
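The Shannon expansion f = x̄i·f(xi=0) + xi·f(xi=1) can be applied recursively to build a reduced BDD. The Python sketch below does this by brute force, which takes time exponential in the number of variables and is meant only as an illustration. The example function x1x2 ∨ x3x4 ∨ x5x6 is an assumption; the slide does not state which function its figure shows.

```python
from itertools import product


def build_bdd(f, n):
    """Build a reduced BDD of f over n variables by recursive Shannon expansion.

    Returns (root id, nodes), where nodes maps id -> (variable index, low, high).
    Ids 0 and 1 are the terminal nodes.
    """
    unique = {}  # (variable, low, high) -> node id, used to share equal nodes
    nodes = {}

    def expand(assignment):
        i = len(assignment)
        if i == n:
            return int(bool(f(*assignment)))  # terminal node: function value
        low = expand(assignment + (0,))       # cofactor with x_i = 0
        high = expand(assignment + (1,))      # cofactor with x_i = 1
        if low == high:                       # redundant test, skip the node
            return low
        key = (i, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    return expand(()), nodes


def evaluate(root, nodes, x):
    """Each non-terminal node acts as an if-then-else on one variable."""
    node = root
    while node > 1:
        variable, low, high = nodes[node]
        node = high if x[variable] else low
    return node


def f(x1, x2, x3, x4, x5, x6):
    return (x1 & x2) | (x3 & x4) | (x5 & x6)  # assumed example function


root, nodes = build_bdd(f, 6)
agrees = all(evaluate(root, nodes, x) == f(*x) for x in product((0, 1), repeat=6))
```

For this function and variable order the reduced BDD has one non-terminal node per variable.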
Measurement of BDD
Memory size: # of nodes × size of a node
Worst-case performance: LPL (Longest Path Length)
→ Dedicated fully pipelined hardware
[Figure: the BDD over x1 to x6 with terminal nodes 0 and 1, as on the previous slide.]
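Both measures can be computed directly from a diagram's node table. The Python sketch below does so for a small hand-written BDD. The node layout (one variable index plus two child pointers per node) and the resulting bit widths are assumptions for illustration, not the encoding used by the authors.

```python
from functools import lru_cache

TERMINALS = frozenset({"0", "1"})

# Hypothetical BDD for x1x2 + x3x4 + x5x6: id -> (variable index, (low, high)).
DIAGRAM = {
    "a": (0, ("c", "b")),
    "b": (1, ("c", "1")),
    "c": (2, ("e", "d")),
    "d": (3, ("e", "1")),
    "e": (4, ("0", "f")),
    "f": (5, ("0", "1")),
}


def longest_path_length(diagram, root):
    """LPL: the largest number of non-terminal nodes on any root-to-terminal path."""
    @lru_cache(maxsize=None)
    def depth(node):
        if node in TERMINALS:
            return 0
        _, children = diagram[node]
        return 1 + max(depth(child) for child in children)

    return depth(root)


def memory_bits(diagram, num_variables):
    """# of nodes x size of a node, under the assumed node layout."""
    index_bits = (num_variables - 1).bit_length()
    pointer_bits = (len(diagram) + len(TERMINALS) - 1).bit_length()
    fanout = max(len(children) for _, children in diagram.values())
    return len(diagram) * (index_bits + fanout * pointer_bits)


lpl = longest_path_length(DIAGRAM, "a")
bits = memory_bits(DIAGRAM, 6)
```

In a fully pipelined realization, the LPL bounds the number of pipeline stages an input has to pass through.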
Multi-Valued Decision Diagram (MDD)
• MDD(k): 2^k outgoing edges
• Evaluates k variables at a time
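[Figure: a BDD over x1 to x6 (left) and the equivalent MDD(2) (right), whose super variables are X1 = {x1,x2}, X2 = {x3,x4} and X3 = {x5,x6}. Both have the terminal nodes 0 and 1.]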
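Grouping k binary variables into one super variable gives each node 2^k outgoing edges and shortens every path by a factor of k. The Python sketch below builds an MDD(k) by brute-force expansion, the same illustrative approach one would use for a small BDD. The example function x1x2 ∨ x3x4 ∨ x5x6 is an assumption, not necessarily the function in the figure.

```python
from itertools import product


def build_mdd(f, n, k):
    """Build a reduced MDD(k) of f over n binary variables (n divisible by k).

    Returns (root id, nodes); nodes maps id -> (level, tuple of 2**k children).
    Ids 0 and 1 are the terminal nodes.
    """
    assert n % k == 0
    unique, nodes = {}, {}

    def expand(assignment):
        if len(assignment) == n:
            return int(bool(f(*assignment)))
        # One child per value of the k-bit super variable.
        children = tuple(expand(assignment + bits)
                         for bits in product((0, 1), repeat=k))
        if len(set(children)) == 1:  # node does not affect the result
            return children[0]
        key = (len(assignment) // k, children)
        if key not in unique:
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    return expand(()), nodes


def evaluate_mdd(root, nodes, x, k):
    """Consume k input bits per node instead of one."""
    node = root
    while node > 1:
        level, children = nodes[node]
        bits = x[level * k:(level + 1) * k]
        node = children[int("".join(map(str, bits)), 2)]
    return node


def f(x1, x2, x3, x4, x5, x6):
    return (x1 & x2) | (x3 & x4) | (x5 & x6)  # assumed example function


root, nodes = build_mdd(f, 6, 2)
agrees = all(evaluate_mdd(root, nodes, x, 2) == f(*x)
             for x in product((0, 1), repeat=6))
```

Here the MDD(2) needs three nodes with four edges each, so any input is classified in at most three steps.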
Comparison of the BDT with the MDD
[Figure: the BDT from the earlier slides (left) and an equivalent MDD (right). The MDD has one X2 node and two X1 nodes; its edges are labeled with threshold intervals (<0.29, <0.53, <0.63, <0.71, <1.00) and lead to the terminals C1 and C2.]
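The idea behind the MDD form is that the thresholds of each variable cut its range into intervals, so each variable only needs to be looked at once to find its interval. The Python sketch below flattens a BDT into a lookup table indexed by interval numbers. It is a decision table rather than a reduced MDD with shared nodes, the leaf labels are assumed, and features are assumed to lie in [0, 1) as on the slide's axes.

```python
from bisect import bisect_right
from itertools import product

# BDT as nested tuples (feature, threshold, yes, no); a string is a leaf.
# Tree shape and leaf labels are assumptions for illustration.
BDT = ("X2", 0.53,
       ("X2", 0.29, "C1", ("X1", 0.63, "C2", "C1")),
       ("X1", 0.09, "C1", ("X1", 0.71, "C2", "C1")))


def bdt_classify(tree, x):
    while not isinstance(tree, str):
        feature, threshold, yes, no = tree
        tree = yes if x[feature] < threshold else no
    return tree


def collect_thresholds(tree, acc=None):
    """Gather the set of thresholds used for each feature."""
    acc = {} if acc is None else acc
    if not isinstance(tree, str):
        feature, threshold, yes, no = tree
        acc.setdefault(feature, set()).add(threshold)
        collect_thresholds(yes, acc)
        collect_thresholds(no, acc)
    return acc


def build_interval_table(tree):
    """Evaluate the BDT once per cell of the interval grid."""
    cuts = {f: sorted(t) for f, t in collect_thresholds(tree).items()}
    features = sorted(cuts)
    table = {}
    for cell in product(*(range(len(cuts[f]) + 1) for f in features)):
        point = {}
        for f, i in zip(features, cell):
            edges = [0.0] + cuts[f] + [1.0]
            point[f] = (edges[i] + edges[i + 1]) / 2  # midpoint of the interval
        table[cell] = bdt_classify(tree, point)
    return features, cuts, table


def mdd_classify(features, cuts, table, x):
    """One interval search per variable, then a single table lookup."""
    cell = tuple(bisect_right(cuts[f], x[f]) for f in features)
    return table[cell]


features, cuts, table = build_interval_table(BDT)
```

The table has one entry per combination of intervals, which hints at why the node count of the MDD form can grow quickly while its path length stays bounded by the number of variables.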
# of Nodes
[Figure: the feature-space partition (thresholds X2 = 0.29, 0.53 and X1 = 0.09, 0.63, 0.71; regions C1 and C2) shown once for the BDT and once for the MDD.]
Complexities of the BDT
and the MDD
        # Nodes      LPL
BDT     O(Σ|Xi|)     O(Σ|Xi|)
MDD     O(|Xi|^k)    O(n)
The RF prefers shallow decision trees to avoid overfitting
Random Forest
using MDDs on an FPGA
FPGA (Field Programmable
Gate Array)
• Reconfigurable architecture
• Look-up Table (LUT)
• Configurable channel
• Advantages
• Faster than a CPU
• Dissipates less power than a GPU
• Shorter design time than an ASIC
Fully Pipelined Circuit
[Figure: the input X is fed to Tree 1, Tree 2, ..., Tree b; their outputs (C1, C2, ..., C1 in the example) go to a voter, which outputs the class C1.]
MUX-based Realization
System Design Tool
1. Behavior design
+ pragmas
2. Profile analysis
3. IP core generation by HLS
4. Bitstream generation by
FPGA CAD tool
5. Middleware generation
↓
Automatically done
Proposed Tool Flow
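[Figure: tool flow. scikit-learn trains the random forest from the training dataset, with hyperparameters chosen by grid search. RF2AOC converts the random forest into host code and kernel code. gcc compiles the host code into a binary for the host PC; aoc (Intel SDK for OpenCL) compiles the kernel code into an aocx file for the FPGA board.]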
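The step from a trained tree to compilable code can be pictured as a small source-to-source generator. The Python sketch below emits a C-style function of nested if-else statements from a tree given as nested tuples. It is a hypothetical illustration only: it is not the RF2AOC tool, its output is a plain C-style function rather than a complete OpenCL kernel, and the tree and its leaf labels are made up.

```python
def emit_tree(tree, indent=1):
    """Emit nested if-else statements for a tree given as nested tuples.

    An internal node is (feature index, threshold, yes, no);
    a leaf is an integer class index.
    """
    pad = "    " * indent
    if isinstance(tree, int):
        return f"{pad}return {tree};\n"
    feature, threshold, yes, no = tree
    return (f"{pad}if (x[{feature}] < {threshold}f) {{\n"
            + emit_tree(yes, indent + 1)
            + f"{pad}}} else {{\n"
            + emit_tree(no, indent + 1)
            + f"{pad}}}\n")


def emit_function(name, tree):
    """Wrap the emitted statements in a C-style function definition."""
    return f"int {name}(const float *x) {{\n" + emit_tree(tree) + "}\n"


# Hypothetical tree: feature 0 is X1, feature 1 is X2; classes 0 (C1) and 1 (C2).
TREE = (1, 0.53,
        (1, 0.29, 0, (0, 0.63, 1, 0)),
        (0, 0.09, 0, (0, 0.71, 1, 0)))

source = emit_function("tree0", TREE)
```

Generating one such function per tree, plus a voter, would give the behavioral description that a high-level synthesis tool takes as input.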
Experimental Results
Comparison of the MDD-based RF with the BDT-based RF
                        BDT                           MDD
Name                    Path len.    #Nodes   Max.    Path len.    #Nodes
                        (Perform.)   (Mem.)   path    (Perform.)   (Mem.)
Dermatology             720          676      15      322          118336
Contraceptive Method    600          1055     9       198          7360
Glass Identification    952          1260     10      268          17204
Hayes-Roth              480          577      5       73           448
Hepatitis               720          1040     15      357          145664
Ionosphere              1196         1077     20      381          671744
Iris                    1056         777      4       199          517
Dataset: UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets.html
Comparison of Platforms
• Implemented the RF on the following devices
• CPU: Intel Core i7 650
• GPU: NVIDIA GeForce GTX Titan
• FPGA: Terasic DE5-NET
• Measure dynamic power including
the host PC
• Test bench: 10,000 random vectors
• Execution time including
communication time between
the host PC and devices
[Photos: the GPU and the FPGA board.]
Comparison of Platforms
                        GPU @ 86 W          CPU @ 13 W          FPGA @ 15 W
                        GeForce Titan       Xeon(R) E5607       Stratix V A7
Name                    LPS       LPS/W     LPS       LPS/W     LPS        LPS/W
Dermatology             336.2     3.9       211.6     16.3      3221.2     214.7
Contraceptive Method    521.9     6.1       286.4     22.0      10924.3    728.3
Glass Identification    726.7     8.5       587.5     45.2      6442.3     429.5
Hayes-Roth              1512.9    17.6      1165.5    89.7      12884.6    859.0
Hepatitis               739.1     8.6       662.7     51.0      8209.9     547.3
Ionosphere              821.0     9.5       595.9     45.8      9663.5     644.2
Iris                    446.6     5.2       436.7     33.6      4831.7     322.1
LPS: number of lookups per second
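The LPS/W columns follow from dividing each LPS value by the power figure in the column header. The short Python sketch below recomputes them from the table and also derives per-dataset throughput ratios. The variable and function names are illustrative, and the sketch makes no claim about how the summary speedups in the conclusion were aggregated.

```python
# Power figures (watts) and LPS values copied from the table above.
POWER_W = {"GPU": 86, "CPU": 13, "FPGA": 15}

LPS = {
    "Dermatology":          {"GPU": 336.2,  "CPU": 211.6,  "FPGA": 3221.2},
    "Contraceptive Method": {"GPU": 521.9,  "CPU": 286.4,  "FPGA": 10924.3},
    "Glass Identification": {"GPU": 726.7,  "CPU": 587.5,  "FPGA": 6442.3},
    "Hayes-Roth":           {"GPU": 1512.9, "CPU": 1165.5, "FPGA": 12884.6},
    "Hepatitis":            {"GPU": 739.1,  "CPU": 662.7,  "FPGA": 8209.9},
    "Ionosphere":           {"GPU": 821.0,  "CPU": 595.9,  "FPGA": 9663.5},
    "Iris":                 {"GPU": 446.6,  "CPU": 436.7,  "FPGA": 4831.7},
}


def lps_per_watt(dataset, platform):
    """Performance per power: LPS divided by the platform's power."""
    return LPS[dataset][platform] / POWER_W[platform]


def fpga_speedup(dataset, baseline):
    """Throughput ratio of the FPGA over the given baseline platform."""
    return LPS[dataset]["FPGA"] / LPS[dataset][baseline]


for name in LPS:
    print(f"{name:22s}"
          f" FPGA LPS/W = {lps_per_watt(name, 'FPGA'):7.1f}"
          f"  vs GPU x{fpga_speedup(name, 'GPU'):5.1f}"
          f"  vs CPU x{fpga_speedup(name, 'CPU'):5.1f}")
```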
Conclusion
• Proposed the RF using MDDs
• Reduced the path length
• Increased the column multiplicity
• # of nodes: O(|X|^k)
• A shallow decision diagram is
recommended to avoid overfitting
• Developed the high-level synthesis design
flow toward the FPGA realization
• 10.7x faster than the GPU
• 14.0x faster than the CPU