Looking into LeFlow
LeFlow
Enabling Flexible FPGA High-Level Synthesis
of Tensorflow Deep Neural Networks
@Vengineer
2018/07/22 (LeFlow addendum)
[TensorFlow XLA logo]
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/compiler/xla
Blog (since 2007): Vengineerの戯言
 http://blogs.yahoo.co.jp/verification_engineer
SlideShare:
 https://www.slideshare.net/ssuser479fa3
Twitter (since 2009):
@Vengineer
Enabling Flexible FPGA High-Level Synthesis of Tensorflow Deep Neural Networks
Paper: https://arxiv.org/abs/1807.05317
Submitted on 14 Jul 2018
GitHub: https://github.com/danielholanda/LeFlow
Optimization
 LeFlow takes the LLVM IR that XLA emits with XLA's own optimizations turned off, and leaves the optimization to LegUp.
LeFlow
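LeFlow at a glance
・TensorFlow code becomes Verilog HDL!
・No changes to TensorFlow XLA itself
 (although the TensorFlow source code does need a patch,
 covering 2 files)
・LegUp can be used as-is
・Sample code for Intel (Altera) is included
(Xilinx also appears to work)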
Structure of the generated HDL
Optimization
Algorithm 1
 %0 = bitcast i8** %params to [2 x float]**
 %arg0 = load [2 x float]** %0, align 8
=> Turn arg0 into a global (@arg0) and initialize it with zeros
Algorithm 2
 @arg0 = global [2 x float] zeroinitializer, align 8
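The rewrite from Algorithm 1 to Algorithm 2 can be sketched as a text transformation over the function body. This is a hypothetical regex-based sketch, not LeFlow's actual code. It assumes the old-syntax IR exactly as listed above (a bitcast of %params followed by a load) and handles only that pattern. The returned list puts the new globals first; in a real module they would sit at module scope, outside the function.

```python
import re

# Hypothetical patterns for the old-syntax parameter idiom in Algorithm 1:
#   %0 = bitcast i8** %params to [2 x float]**
#   %arg0 = load [2 x float]** %0, align 8
BITCAST = re.compile(r"^\s*(%[\w.]+) = bitcast i8\*\* %params to (\[.*\])\*\*\s*$")
LOAD = re.compile(r"^\s*%([\w.]+) = load (\[.*\])\*\* (%[\w.]+), align (\d+)\s*$")


def params_to_globals(body_lines):
    """Replace bitcast+load of %params with zero-initialized globals (sketch)."""
    casts = {}       # bitcast result -> (array type, position in `kept`)
    folded = set()   # positions of bitcasts absorbed into a global
    new_globals, kept, renamed = [], [], []
    for line in body_lines:
        m = BITCAST.match(line)
        if m:
            casts[m.group(1)] = (m.group(2), len(kept))
            kept.append(line)
            continue
        m = LOAD.match(line)
        if m and m.group(3) in casts and casts[m.group(3)][0] == m.group(2):
            name, ty, src, align = m.groups()
            new_globals.append(f"@{name} = global {ty} zeroinitializer, align {align}")
            folded.add(casts[src][1])
            renamed.append(name)
            continue  # the load disappears; its uses will point at the global
        kept.append(line)
    # Drop only the bitcasts that were folded into a global.
    kept = [l for i, l in enumerate(kept) if i not in folded]
    # The local %argN (a [N x T]* value) becomes @argN (also a [N x T]*).
    for name in renamed:
        use = re.compile(rf"%{re.escape(name)}(?![\w.])")
        kept = [use.sub(f"@{name}", l) for l in kept]
    return new_globals + kept
```

Bitcasts that are not followed by a matching load are left untouched, so the pass is a no-op on IR that does not contain this idiom.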
MNIST
import tensorflow as tf
# input, weights and bias are defined elsewhere in the MNIST example
with tf.device("device:XLA_CPU:0"):
    y = tf.nn.softmax(tf.add(tf.matmul(input, weights)[0], bias))
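To check simulation results from the generated hardware against the graph, a plain-Python reference of the same computation can help. The sketch below is not part of LeFlow. It uses hypothetical helper names and nested lists instead of tensors, and it computes softmax(matmul(input, weights)[0] + bias) without TensorFlow.

```python
import math


def matmul(a, b):
    """Naive product of a (n x k) and b (k x m), both given as nested lists."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(v):
    """Softmax over a 1-D list, shifted by the max for numerical stability."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]


def mnist_reference(inp, weights, bias):
    # Mirrors y = softmax(add(matmul(input, weights)[0], bias)):
    # take row 0 of the product, add the bias vector, then apply softmax.
    logits = [x + b for x, b in zip(matmul(inp, weights)[0], bias)]
    return softmax(logits)
```

For example, with all-zero weights and bias, every class gets probability 1/N.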
What LeFlow does
src/LeFlow
# Create folder and generate Makefile
# Clean folder to erase previously generated files
# Generate IR from tensorflow
# Remove unused files and name things properly
# Convert to old LLVM syntax (Tensorflow and LegUp use different versions of LLVM)
# Unrolling loops
# Restructure function signature, change variables scope and reorganize code
# Rewrite unsupported operations
# Partitioning the memories
# Convert human-readable .ll file to bitcode (.bc file)
# Start LegUp compilation
# Instrumenting Verilog testbench
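The "Convert to old LLVM syntax" step exists because newer LLVM releases (3.7 onward) write load and getelementptr with an explicit element type, while the older LLVM that LegUp builds on does not. The following is a minimal regex sketch of that one difference. It is an assumption about what such a conversion involves and is not LeFlow's script. It handles only simple scalar and array types.

```python
import re

# Newer LLVM writes "load T, T* %p" and "getelementptr T, T* %p, ...";
# older LLVM expects "load T* %p" and "getelementptr T* %p, ...".
# Only types without commas or parentheses are handled (no structs, no
# function pointers), which covers scalars and arrays.
_EXPLICIT_TYPE = re.compile(
    r"\b(load (?:volatile )?|getelementptr (?:inbounds )?\(?)"
    r"(?P<t>[^,()]+), (?P=t)\*"
)


def to_old_llvm_syntax(line):
    """Drop the explicit element type from load/getelementptr on one IR line."""
    return _EXPLICIT_TYPE.sub(lambda m: m.group(1) + m.group("t") + "*", line)
```

Lines already in old syntax pass through unchanged, because the pattern requires the `T, T*` pair.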
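XLA-related options
src/LeFlow
--xla_dump_ir_to="+project_folder+"ir "
Sets where the IR dump files are written (project_folder + "ir")
--xla_llvm_enable_invariant_load_metadata=false
--xla_llvm_enable_noalias_metadata=false
--xla_llvm_enable_alias_scope_metadata=false
--xla_enable_fast_math=false
--xla_backend_optimization_level=0
These options keep XLA from optimizing the LLVM IR it emits (no invariant-load or alias metadata, no fast math, backend optimization level 0)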
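The listing above does not show how LeFlow hands these flags to TensorFlow. The sketch below collects them into one string and assumes they are passed through an environment variable whose name you supply. Which variable TensorFlow reads depends on its version, so treat that as an assumption to verify. The function names are hypothetical.

```python
import os


def xla_flags(project_folder):
    """Join the XLA options listed above into one flag string (values from the slide)."""
    return " ".join([
        # Dump the LLVM IR generated by XLA into project_folder + "ir"
        "--xla_dump_ir_to=" + project_folder + "ir",
        # Keep optimization-oriented metadata and fast math out of the IR
        "--xla_llvm_enable_invariant_load_metadata=false",
        "--xla_llvm_enable_noalias_metadata=false",
        "--xla_llvm_enable_alias_scope_metadata=false",
        "--xla_enable_fast_math=false",
        # Disable XLA's LLVM backend optimization passes
        "--xla_backend_optimization_level=0",
    ])


def env_with_xla_flags(project_folder, env_var):
    """Copy of os.environ with the flags stored under env_var (name is an assumption)."""
    env = dict(os.environ)
    env[env_var] = xla_flags(project_folder)
    return env
```

A possible use is `subprocess.run(["python", "mnist.py"], env=env_with_xla_flags("mnist/", "XLA_FLAGS"))`, where the script name and variable name are placeholders.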
What LeFlow does
src/LeFlow
# Create folder and generate Makefile
# Clean folder to erase previously generated files
# Generate IR from tensorflow
# Remove unused files and name things properly
# Convert to old LLVM syntax (Tensorflow and LegUp use different versions of LLVM)
# Unrolling loops
# Restructure function signature, change variables scope and reorganize code
# Rewrite unsupported operations
# Partitioning the memories <= This is the key step (the memories are split into partitions)
# Convert human-readable .ll file to bitcode (.bc file)
# Start LegUp compilation
# Instrumenting Verilog testbench
Structure of the generated HDL
← memory partitioning
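The slides do not show which partitioning scheme LeFlow and LegUp choose. As a generic illustration, the two classic HLS schemes are block and cyclic partitioning, and each maps an array index to a (bank, offset) pair. Reads that land in different banks can be served in the same cycle, assuming one port per bank. The function names below are hypothetical.

```python
def cyclic_partition(index, banks):
    """Cyclic: consecutive elements go to consecutive banks."""
    return index % banks, index // banks


def block_partition(index, size, banks):
    """Block: the array is cut into `banks` contiguous chunks."""
    chunk = -(-size // banks)  # ceiling division
    return index // chunk, index % chunk


def banks_touched(indices, mapper):
    """Set of banks hit by one group of parallel reads."""
    return {mapper(i)[0] for i in indices}
```

Take an 8-element array split into 4 banks. Reads of elements 0 to 3 hit four different banks under cyclic partitioning, but only banks 0 and 1 under block partitioning, so the block layout would have to serialize them.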
Current Limitations and Opportunities
1). LeFlow currently uses kernels that were implemented in XLA and were originally meant to be used by CPUs. Although compiler optimizations and scheduling are able to retrieve a substantial amount of parallelism from those implementations, LeFlow would heavily benefit from an XLA back-end with kernels targeting FPGAs;
2). The high dimensionality of inputs/weights and the amount of parallel accesses that are typical in machine learning applications are a challenge for modern automatic memory partitioning algorithms. LeFlow would especially benefit from an automatic memory partitioning algorithm specific to machine learning.
3). One of the key possibilities that make deep learning networks efficient in FPGAs is the opportunity to use a customizable fixed-point bit width. Adding fixed-point support to LeFlow will be an important step in the development of this toolkit. Additionally, techniques to automatically profile the application and choose the appropriate representation could be easily explored in software with Tensorflow and deployed in hardware.
4). Although it is straightforward to use Tensorflow to debug the functionality of an implementation, it is currently difficult for software developers to debug the generated hardware in terms of the original Python code. A performance debugging infrastructure suitable for software developers is another interesting avenue for research.
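Item 3 on fixed-point bit widths can be made concrete with a small quantization sketch. This is a generic illustration and not a LeFlow feature, since the text says fixed-point support is still to be added. The sketch converts a float to a signed fixed-point integer with `total_bits` bits, `frac_bits` of which are fractional, and saturates at the range limits. It then measures the worst-case error over sample values as a crude stand-in for the profiling the item mentions. All names are hypothetical.

```python
def to_fixed(x, total_bits, frac_bits):
    """Quantize x to a signed fixed-point integer, saturating at the range limits."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = round(x * (1 << frac_bits))  # Python's round: nearest, ties to even
    return max(lo, min(hi, q))


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def max_error(values, total_bits, frac_bits):
    """Largest absolute quantization error over sample values (crude profiling)."""
    return max(abs(v - from_fixed(to_fixed(v, total_bits, frac_bits), frac_bits))
               for v in values)
```

Running `max_error` over recorded weights or activations for several `(total_bits, frac_bits)` pairs gives a rough way to pick the narrowest width with acceptable error.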
Thank you

