Glow user review @96cfc41
Kuan Hsu Chen (Zakk)
Goal
1. Introduce Compilation Flow
2. Introduce IR Design
3. Pros and Cons
4. TODO Features
5. Make you want to do something on Glow!?
https://www.books.com.tw/products/0010750924
Compilation Flow
1. Importer: a Caffe2/ONNX model (or an ONNXIFI request) is imported as builtin HIR.
2. HIR optimizer: graph optimizations run on the builtin HIR.
3. Backend pre-lowering: the backend rewrites the graph before lowering (builtin or custom HIR).
4. Node lowering: high-level nodes are decomposed into simpler ones (builtin or custom HIR).
5. Backend post-lowering: a second backend-specific rewrite pass (builtin or custom HIR).
6. GenLIR + scheduler + LIR optimizer: glow::generateAndOptimizeIR(Function *F, bool shouldShareBuffers) generates LIR from the graph, schedules nodes, and optimizes the resulting builtin or custom LIR.
7. Backend compile: produces a glow::CompiledFunction for execution; alternatively, backend save emits a standalone executable bundle of backend code.
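To make the flow concrete, here is a minimal sketch that drives the whole pipeline through the ExecutionEngine, assuming the API around this commit (signatures have churned across Glow versions, so treat the exact calls as assumptions):

#include "glow/ExecutionEngine/ExecutionEngine.h"
#include "glow/Graph/Graph.h"

using namespace glow;

int main() {
  // The interpreter is the reference backend; a real deployment would
  // pick a hardware backend here.
  ExecutionEngine EE(BackendKind::Interpreter);
  auto &mod = EE.getModule();
  Function *F = mod.createFunction("main");

  // Build (or import) an HIR graph: one input placeholder, one op, one save.
  auto *input =
      mod.createPlaceholder(ElemKind::FloatTy, {1, 8}, "input", false);
  auto *relu = F->createRELU("relu", input);
  F->createSave("save_relu", relu);

  // compile() runs the pipeline above: HIR optimization, lowering,
  // IRGen/scheduling, LIR optimization, and backend code generation.
  EE.compile(CompilationMode::Infer, F);
  return 0;
}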
3 Level IR
High Level IR (HIR)
1. Dataflow, node-based graph representation.
2. There are two kinds of variable: placeholders and constants. All nodes can access all variables in the Module. Glow currently cannot tell whether a placeholder is an input or an output; the importer follows a naming convention when creating output placeholders, so the "save_" name prefix is used as a workaround (see the sketch after this list).
3. Glow uses NHWC order (Caffe2/ONNX use NCHW). If the backend only supports NCHW, remember to insert transpose nodes in pre-lowering.
4. Some model formats only support float, yet the user may want a quantized model. After optimization, the graph has quantize/dequantize nodes inserted after the input and before the output placeholders. In some cases, though, you would rather perform quantize/dequantize on the user-program side.
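A hedged sketch of point 2, reusing mod and F from the sketch above (whether createSave itself applies the "save_" prefix or the importer passes it in explicitly is an implementation detail; the sketch passes it explicitly):

// Placeholders are symbolic inputs/outputs; constants hold trained weights.
// Both live in the Module, so every node can reference them.
auto *in = mod.createPlaceholder(ElemKind::FloatTy, {1, 8}, "data",
                                 /*isTrainable=*/false);
auto *weights = mod.createConstant(ElemKind::FloatTy, {8, 8}, "weights");
(void)weights;

auto *relu = F->createRELU("relu", in);

// The output placeholder is created by the SaveNode. Its "save_" name
// prefix is the only thing distinguishing it from an input placeholder.
SaveNode *save = F->createSave("save_relu", relu);
Placeholder *out = save->getPlaceholder();
(void)out;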
Low Level IR (LIR)
1. Instruction-based representation; unlike an HIR node, an LIR instruction allows multiple outputs (e.g. a loss function).
2. Operands carry @in, @out, and @inout qualifiers, so an instruction that appears in a value's users list may actually be writing that value.
3. All LIR instructions can be built-in ops.
4. LIR uses allocactivation/deallocactivation instructions to delimit the live range of each activation. The LIR optimizer sinks allocs and hoists deallocs to reduce memory pressure. On some backends, though, extra allocactivations are inserted because the weights occupy memory as well.
5. Glow provides a simple static (compile-time) first-fit memory allocator for backend use; a sketch of the strategy follows this list.
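Point 5 refers to Glow's MemoryAllocator class; rather than guess its exact interface, here is a self-contained, hypothetical illustration of the first-fit strategy itself:

#include <cstdint>
#include <map>

// Illustrative first-fit allocator over a fixed pool (hypothetical helper,
// not Glow's MemoryAllocator API).
class FirstFit {
  uint64_t poolSize_;
  std::map<uint64_t, uint64_t> live_; // offset -> size, sorted by offset
public:
  explicit FirstFit(uint64_t poolSize) : poolSize_(poolSize) {}

  // Return the first offset whose gap fits `size`, or UINT64_MAX on failure.
  uint64_t allocate(uint64_t size) {
    uint64_t cursor = 0;
    for (auto &seg : live_) {
      if (seg.first - cursor >= size)
        break;                         // the gap before this segment fits
      cursor = seg.first + seg.second; // otherwise skip past the segment
    }
    if (cursor + size > poolSize_)
      return UINT64_MAX;
    live_[cursor] = size;
    return cursor;
  }

  void free(uint64_t offset) { live_.erase(offset); }
};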
Backend in Glow
1. tools/ClassGen/Backends/: define backend-specific Nodes and Instructions for the backend, in the style of LLVM's tablegen (see the example below).
2. Node attributes:
● .addOverwrittenInput("Output")
● .setHasSideEffects(true)
3. Instr attributes:
● .autoIRGen(): the framework generates the HIR->LIR translation code for the backend.
● .inplaceOperand({"Dest", "Batch"})
● .dataParallel()
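For instance, an instruction definition looks roughly like this (patterned on Glow's builtin ElementAdd definition in tools/ClassGen; exact builder calls may differ at other commits):

// In the backend's ClassGen file: declare an element-wise add instruction.
BB.newInstr("ElementAdd")
    .addOperand("Dest", OperandKind::Out)
    .addOperand("LHS", OperandKind::In)
    .addOperand("RHS", OperandKind::In)
    // Dest may alias either input, enabling in-place buffer reuse.
    .inplaceOperand({"Dest", "LHS", "RHS"})
    // Element-wise and independent per element, so it can be parallelized.
    .dataParallel()
    // Generate the HIR(AddNode) -> LIR(ElementAddInst) translation code.
    .autoIRGen("Add");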
Backend in Glow
2. lib/Backends/: implement derived classes for Backend and CompiledFunction.
a. Backend abstract class
i. bool transformPreLowering/transformPostLowering
ii. bool shouldLower(const Node *N) const;
iii. bool shouldShareBuffers() const;
iv. compile/save
v. isOpSupported(Kinded::Kind opKind, ElemKind elementTy) const;
b. CompiledFunction
i. execute() = 0;
ii. setupRuns(), beforeRun(), afterRun(), tearDownRuns();
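Putting the hooks together, a hedged skeleton of a backend subclass (method names follow the slide; exact signatures depend on the Glow commit, and emitASICCode() is a hypothetical helper):

#include "glow/Backends/Backend.h"

using namespace glow;

class MyASICBackend final : public Backend {
public:
  // Advertise only what the ASIC implements.
  bool isOpSupported(Kinded::Kind opKind, ElemKind elementTy) const override {
    return opKind == Kinded::Kind::ConvolutionNodeKind &&
           elementTy == ElemKind::Int8QTy;
  }

  // Keep FullyConnected as one node instead of lowering it to
  // MatMul + BatchedAdd, so it can map to a single ASIC op.
  bool shouldLower(const Node *N) const override {
    return N->getKind() != Kinded::Kind::FullyConnectedNodeKind;
  }

  // Disable buffer sharing if the ASIC keeps weights and activations in
  // separate memory spaces (see "Cons (?)" later in this deck).
  bool shouldShareBuffers() const override { return false; }

  // Hook for backend-specific graph rewrites, e.g. inserting Transpose
  // nodes when the ASIC only supports NCHW (Glow graphs are NHWC).
  bool transformPostLowering(Function *F, CompilationMode mode) const override {
    return false; // return true only if the graph was changed
  }

  std::unique_ptr<CompiledFunction> compile(Function *F) const override {
    auto IR = generateAndOptimizeIR(F, shouldShareBuffers());
    return emitASICCode(std::move(IR)); // emitASICCode() is hypothetical
  }
};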
Pros and Cons
Pros:
1. Supports training and inference compilation
2. Supports the quantization feature
3. Supports many HIR and LIR optimizations, which also work on custom nodes/instructions.
4. Supports "dump DAG"
5. Supports ASIC-friendly IR and helper functions
6. more...
Cons:
1. No Python interface, but users can achieve one via ONNXIFI.
2. No ASIC backend exists for reference.
3. Some builtin operators are missing.
4. more ...
Quantization feature
1. Quantization nodes in HIR.
a. QuantizationProfile
b. Quantize/Dequantize/RescaleQuantized
c. IntLookupTable
d. RowwiseQuantizedFullyConnected
2. Support related optimizations.
a. Quantize(Dequantize(X)) -> RescaleQuantized(X)
b. Dequantize(Quantize(X)) -> X
c. Quantize(Constant) -> Constant
d. PoolingNode(Rescale(X)) -> Rescale(PoolingNode(X)).
e. more...
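To see how these nodes bracket a float graph (cf. point 4 in the HIR section), a hedged sketch reusing mod and F from the earlier sketches (uniqueType/createQuantize names as of this commit):

// Int8 type whose scale/offset come from a QuantizationProfile run.
auto qTy = mod.uniqueType(ElemKind::Int8QTy, {1, 8}, /*scale=*/0.5f,
                          /*offset=*/0);
auto *in = mod.createPlaceholder(ElemKind::FloatTy, {1, 8}, "input", false);

// After optimization the float I/O placeholders end up bracketed like
// this; alternatively the user program performs the conversion itself.
auto *q = F->createQuantize("quantize", in, qTy);
auto *r = F->createRELU("relu", q);
auto *dq = F->createDequantize("dequantize", r);
F->createSave("save_output", dq);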
Optimizations
1. Graph optimizer (HIR)
a. DCE, CSE.
b. Optimizations for specific nodes:
i. Concat(Slice(X, 0..10), Slice(X, 10..20)) -> X
ii. merge Transpose into MatMul
iii. Relu(MaxPool(X)) -> MaxPool(Relu(X))
iv. merge batch normalization operations. (Inference)
v. more …
2. IR optimizer (LIR)
a. Reduce memory usage
i. sinkAllocas/hoistDealloc/sinkTensorViews
ii. eliminate copy instructions
b. Eliminate redundant instructions
c. Peephole optimizations
d. more...
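Most HIR rewrites share the same shape: match a node chain, build the replacement, and redirect users. A hedged sketch of rewrite b.iii above, in the style of Glow's graph optimizer (node class names and accessors are assumptions for this era of the code):

// Sink Relu below MaxPool: Relu(MaxPool(X)) -> MaxPool(Relu(X)).
// Both orders compute the same values; the swap exposes more fusion.
using llvm::dyn_cast;

for (auto &node : F->getNodes()) {
  auto *relu = dyn_cast<ReluNode>(&node);
  if (!relu)
    continue;
  auto *pool = dyn_cast<MaxPoolNode>(relu->getInput().getNode());
  if (!pool)
    continue;
  auto *newRelu = F->createRELU("relu", pool->getInput());
  auto *newPool = F->createMaxPool("pool", newRelu, pool->getKernels(),
                                   pool->getStrides(), pool->getPads());
  relu->getResult().replaceAllUsesOfWith(newPool->getResult());
  // The now-dead original nodes are cleaned up later by DCE.
}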
Support ASIC-friendly IR and helper functions
1. Slice/InsertTensor/Tile/Gather/Scatter (HIR)
2. TensorView (LIR): a view of an existing tensor; it does not allocate any new memory
3. Tensor class: represents a contiguous n-dimensional array (copyRawFrom/copySlice/Transpose)
4. Handles: easy access to, and operations on, a Tensor
/// Create a tensor of type Float, of the shape {4 x 2}.
Tensor inputs(ElemKind::FloatTy, {4, 2});
/// Create a handle to the tensor.
auto I = inputs.getHandle<float>();
/// Store an element to the tensor at index {0, 0}.
I.at({0, 0}) = 13.1;
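Continuing the snippet, handles also make it easy to loop over elements:

/// Fill the {4 x 2} tensor and sum its elements via the handle.
float sum = 0;
for (size_t r = 0; r < 4; r++) {
  for (size_t c = 0; c < 2; c++) {
    I.at({r, c}) = float(r * 2 + c);
    sum += I.at({r, c});
  }
}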
Cons (?)
1. There is only the ShareBuffers flag to enable or disable the whole optimization.
2. There is only one memory space in an LIR function. If your backend has two memory spaces in one LIR function, some ShareBuffers optimizations will generate unwanted results.
(diagram: an IRFunction whose placeholder and weight buffers sit in separate memory spaces)
Cons (?)
3. We do not see any of the advanced optimizations found in TVM or in-house compilers, e.g. activation/weight partitioning when memory is insufficient, reusing activations to avoid memory movement, overlapping computation with data movement, and more.
Make you want to do something on Glow!?
You can try to
1. Add a real ASIC backend
2. Add more advanced optimizations
3. Offload subgraphs to a different backend
a. how to co-work with the CPU
4. Improve JIT performance
a. How to support dynamic input shapes?
b. How to support an ROI pooling layer? (because the layer parameters are runtime information)
5. How to debug an optimized model
6. Advanced scheduler
7. Advanced memory allocator
8. more..
e.g. helper functions for advanced optimizations
Glow user review
https://sampl.cs.washington.edu/tvmconf/slides/Thierry-Moreau-VTA.pdf
References
1. https://github.com/pytorch/glow
2. https://sophon-edge.gitbook.io/project/getting-started/bmnnsdk-framework
3. https://devblogs.nvidia.com/production-deep-learning-nvidia-gpu-inference-engine/
4. Sophon backend
5. DAG graph sponsor
Q&A