Series Editor: Natalie Enright Jerger, University of Toronto
Robotic Computing on FPGAs
Shaoshan Liu, PerceptIn
Zishen Wan, Georgia Institute of Technology
Bo Yu, PerceptIn
Yu Wang, Tsinghua University
This book provides a thorough overview of the state-of-the-art field-programmable gate
array (FPGA)-based robotic computing accelerator designs and summarizes their adopted
optimized techniques. This book consists of ten chapters, delving into the details of how
FPGAs have been utilized in robotic perception, localization, planning, and multi-robot
collaboration tasks. In addition to individual robotic tasks, this book provides detailed
descriptions of how FPGAs have been used in robotic products, including commercial
autonomous vehicles and space exploration robots.
store.morganclaypool.com
About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis
books provide concise, original presentations of important research and
development topics, published quickly, in digital and print formats.
Synthesis Lectures on Computer Architecture
Series ISSN: 1935-3235
Natalie Enright Jerger, Series Editor
Robotic Computing
on FPGAs
Synthesis Lectures on
Computer Architecture
Editor
Natalie Enright Jerger, University of Toronto
Editor Emerita
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page books on topics pertaining to
the science and art of designing, analyzing, selecting, and interconnecting hardware components to
create computers that meet functional, performance, and cost goals. The scope will largely follow
the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and
ASPLOS.
Robotic Computing on FPGAs
Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
2021
AI for Computer Architecture: Principles, Practice, and Prospects
Lizhong Chen, Drew Penney, and Daniel Jiménez
2020
Deep Learning Systems: Algorithms, Compilers, and Processors for Large-Scale
Production
Andres Rodriguez
2020
Parallel Processing, 1980 to 2020
Robert Kuhn and David Padua
2020
Data Orchestration in Deep Learning Accelerators
Tushar Krishna, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, and Ananda Samajdar
2020
Efficient Processing of Deep Neural Networks
Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
2020
Quantum Computer Systems: Research for Noisy Intermediate-Scale Quantum
Computers
Yongshan Ding and Frederic T. Chong
2020
A Primer on Memory Consistency and Cache Coherence, Second Edition
Vijay Nagarajan, Daniel J. Sorin, Mark D. Hill, and David Wood
2020
Innovations in the Memory System
Rajeev Balasubramonian
2019
Cache Replacement Policies
Akanksha Jain and Calvin Lin
2019
The Datacenter as a Computer: Designing Warehouse-Scale Machines, Third Edition
Luiz André Barroso, Urs Hölzle, and Parthasarathy Ranganathan
2018
Principles of Secure Processor Architecture Design
Jakub Szefer
2018
General-Purpose Graphics Processor Architectures
Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018
Compiling Algorithms for Heterogeneous Systems
Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018
Architectural and Operating System Support for Virtual Memory
Abhishek Bhattacharjee and Daniel Lustig
2017
Deep Learning for Computer Architects
Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017
On-Chip Networks, Second Edition
Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017
Space-Time Computing with Temporal Neural Networks
James E. Smith
2017
Hardware and Software Support for Virtualization
Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017
Datacenter Design and Management: A Computer Architect’s Perspective
Benjamin C. Lee
2016
A Primer on Compression in the Memory Hierarchy
Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015
Research Infrastructures for Hardware Accelerators
Yakun Sophia Shao and David Brooks
2015
Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015
Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015
Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015
Single-Instruction Multiple-Data Execution
Christopher J. Hughes
2015
Power-Efficient Computer Architectures: Recent Advances
Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014
FPGA-Accelerated Simulation of Computer Systems
Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014
A Primer on Hardware Prefetching
Babak Falsafi and Thomas F. Wenisch
2014
On-Chip Photonic Interconnects: A Computer Architect’s Perspective
Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013
Optimization and Mathematical Modeling in Computer Architecture
Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and
David Wood
2013
Security Basics for Computer Architects
Ruby B. Lee
2013
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale
Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013
Shared-Memory Synchronization
Michael L. Scott
2013
Resilient Architecture Design for Voltage Variation
Vijay Janapa Reddi and Meeta Sharma Gupta
2013
Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013
Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012
Automatic Parallelization: An Overview of Fundamental Compiler Techniques
Samuel P. Midkiff
2012
Phase Change Memory: From Devices to Systems
Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011
Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011
Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011
Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011
High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011
Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010
Transactional Memory, Second Edition
Tim Harris, James Larus, and Ravi Rajwar
2010
Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010
Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009
On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009
The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009
Fault Tolerant Computer Architecture
Daniel J. Sorin
2009
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale
Machines
Luiz André Barroso and Urs Hölzle
2009
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
2008
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Kunle Olukotun, Lance Hammond, and James Laudon
2007
Transactional Memory
James R. Larus and Ravi Rajwar
2006
Quantum Computing for Computer Architects
Tzvetan S. Metodi and Frederic T. Chong
2006
Copyright © 2021 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
Robotic Computing on FPGAs
Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
www.morganclaypool.com
ISBN: 9781636391656 paperback
ISBN: 9781636391663 ebook
ISBN: 9781636391670 hardcover
DOI 10.2200/S01101ED1V01Y202105CAC056
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE
Lecture #56
Series Editor: Natalie Enright Jerger, University of Toronto
Editor Emerita: Margaret Martonosi, Princeton University
Founding Editor Emeritus: Mark D. Hill, University of Wisconsin, Madison
Series ISSN
Print 1935-3235 Electronic 1935-3243
Robotic Computing
on FPGAs
Shaoshan Liu
PerceptIn
Zishen Wan
Georgia Institute of Technology
Bo Yu
PerceptIn
Yu Wang
Tsinghua University
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #56
Morgan & Claypool Publishers
ABSTRACT
This book provides a thorough overview of the state-of-the-art field-programmable gate array
(FPGA)-based robotic computing accelerator designs and summarizes their adopted optimized
techniques. This book consists of ten chapters, delving into the details of how FPGAs have been
utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In
addition to individual robotic tasks, this book provides detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.
KEYWORDS
robotics, FPGAs, autonomous machines, perception, localization, planning, control, space exploration, deep learning
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 Introduction and Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Planning and Control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 FPGAs in Robotic Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The Deep Processing Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 FPGA Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 An Introduction to FPGA Technologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Types of FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 FPGA Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.3 Commercial Applications of FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Partial Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 What is Partial Reconfiguration? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.2 How to Use Partial Reconfiguration? . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Achieving High Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.4 Real-World Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Robot Operating System (ROS) on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3.1 Robot Operating System (ROS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.3.2 ROS-Compliant FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.3 Optimizing Communication Latency for the ROS-Compliant
FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Perception on FPGAs – Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1 Why Choose FPGAs for Deep Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Preliminary: Deep Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Design Methodology and Criteria. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Hardware-Oriented Model Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.1 Data Quantization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.2 Weight Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.5 Hardware Design: Efficient Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.1 Computation Unit Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5.2 Loop Unrolling Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.5.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.6 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4 Perception on FPGAs – Stereo Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 Perception in Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Stereo Vision in Robotics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Local Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4 Global Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.5 Semi-Global Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.1 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.5.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Efficient Large-Scale Stereo Matching on FPGAs . . . . . . . . . . . . . . . . . . . . . 63
4.6.1 ELAS Algorithm Framework. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6.2 FPGA Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.7 Evaluation and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7.1 Dataset and Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7.2 Power and Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Localization on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Preliminary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.1 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1.2 Algorithm Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.2 Algorithm Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Frontend FPGA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.2 Exploiting Task-Level Parallelisms . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4 Backend FPGA Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.2 Resource Consumption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.5.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6 Planning on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1 Motion Planning Context Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.1.1 Probabilistic Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.1.2 Rapidly Exploring Random Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.2 Collision Detection on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.2.1 Motion Planning Compute Time Profiling . . . . . . . . . . . . . . . . . . . . . 94
6.2.2 General Purpose Processor-Based Solutions . . . . . . . . . . . . . . . . . . . . 95
6.2.3 Specialized Hardware Accelerator-Based Solutions . . . . . . . . . . . . . . 97
6.2.4 Evaluation and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.3 Graph Search on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7 Multi-Robot Collaboration on FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.1 Multi-Robot Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.2 INCAME Framework for Multi-Task on FPGAs . . . . . . . . . . . . . . . . . . . . . 113
7.2.1 Hardware Resource Conflicts in ROS . . . . . . . . . . . . . . . . . . . . . . . . 113
7.2.2 Interruptible Accelerator with ROS (INCAME) . . . . . . . . . . . . . . . 115
7.3 Virtual Instruction-Based Accelerator Interrupt . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.1 Instruction Driven Accelerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
7.3.2 How to Interrupt: Virtual Instruction . . . . . . . . . . . . . . . . . . . . . . . . 119
7.3.3 Where to Interrupt: After SAVE/CALC_F . . . . . . . . . . . . . . . . . . . 121
7.3.4 Latency Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3.5 Virtual Instruction ISA (VI-ISA) . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.3.6 Instruction Arrangement Unit (IAU) . . . . . . . . . . . . . . . . . . . . . . . . 125
7.3.7 Example of Virtual Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7.4 Evaluation and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.4.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.4.2 Virtual Instruction-Based Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . 128
7.4.3 ROS-Based MR-Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8 Autonomous Vehicles Powered by FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.1 The PerceptIn Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.2 Design Constraints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.2.1 Overview of the Vehicle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
8.2.2 Performance Requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.2.3 Energy and Cost Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.3 Software Pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.4 On Vehicle Processing System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.4.1 Hardware Design Space Exploration . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.4.2 Hardware Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.4.3 Sensor Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.4.4 Performance Characterizations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
9 Space Robots Powered by FPGAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9.1 Radiation Tolerance for Space Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
9.2 Space Robotic Algorithm Acceleration on FPGAs . . . . . . . . . . . . . . . . . . . . 151
9.2.1 Feature Detection and Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
9.2.2 Stereo Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.2.3 Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.3 Utilization of FPGAs in Space Robotic Missions . . . . . . . . . . . . . . . . . . . . . 154
9.3.1 Mars Exploration Rover Missions . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
9.3.2 Mars Science Laboratory Mission . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3.3 Mars 2020 Mission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
9.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
10 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.1 What we Have Covered in This Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
10.2 Looking Forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Authors’ Biographies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Preface
In this book, we provide a thorough overview of the state-of-the-art FPGA-based robotic com-
puting accelerator designs and summarize their adopted optimized techniques. The authors
have, combined, over 40 years of research experience in utilizing FPGAs in robotic applications,
both in academic research and commercial deployments. For instance, the authors have demon-
strated that, by co-designing both the software and hardware, FPGAs can achieve more than 10×
better performance and energy efficiency compared to the CPU and GPU implementations. The
authors have also pioneered the utilization of the partial reconfiguration methodology in FPGA
implementations to further improve the design flexibility and reduce the overhead. In addition,
the authors have successfully developed and shipped commercial robotic products powered by
FPGAs, demonstrating that FPGAs have excellent potential and are promising
candidates for robotic computing acceleration due to their high reliability, adaptability, and power
efficiency.
The authors believe that FPGAs are the best compute substrate for robotic applications
for several reasons. First, robotic algorithms are still evolving rapidly, and thus any ASIC-based
accelerators will be months or even years behind the state-of-the-art algorithms. On the other
hand, FPGAs can be dynamically updated as needed. Second, robotic workloads are highly di-
verse, thus it is difficult for any ASIC-based robotic computing accelerator to reach economies
of scale in the near future. On the other hand, FPGAs are a cost-effective and energy-effective
alternative before one type of accelerator reaches economies of scale. Third, compared to sys-
tems on a chip (SoCs) that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a
significant performance advantage. Fourth, partial reconfiguration allows multiple robotic work-
loads to time-share an FPGA, thus allowing one chip to serve multiple applications, leading to
overall cost and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with less
memory. They can perform massively parallel computations and make use of the properties
of perception (e.g., stereo matching), localization (e.g., simultaneous localization and
mapping (SLAM)), and planning (e.g., graph search) kernels to remove additional logic so
as to simplify the end-to-end system implementation. Taking hardware characteristics into
account, several algorithms have been proposed that can be run in a hardware-friendly way and
achieve performance similar to their software counterparts. Therefore, FPGAs make it possible to meet real-time
requirements while achieving high energy efficiency compared to central processing units (CPUs) and
graphics processing units (GPUs). In addition, unlike the application-specific integrated circuit
(ASIC) counterparts, FPGA technologies provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. Partial Reconfiguration (PR) takes this flexibility one step further, allowing the modification of an operating
FPGA design by loading a partial configuration file. Using PR, part of the FPGA can be recon-
figured at runtime without compromising the integrity of the applications running on those parts
of the device that are not being reconfigured. As a result, PR can allow different robotic applica-
tions to time-share part of an FPGA, leading to energy and performance efficiency, and making
FPGA a suitable computing platform for dynamic and complex robotic workloads. Due to the
advantages over other compute substrates, FPGAs have been successfully utilized in commercial
autonomous vehicles as well as in space robotic applications, for FPGAs offer unprecedented
flexibility and significantly reduce the design cycle and development cost.
This book consists of ten chapters, providing a thorough overview of how FPGAs have
been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks.
In addition to individual robotic tasks, we provide detailed descriptions of how FPGAs have
been used in robotic products, including commercial autonomous vehicles and space exploration
robots.
Shaoshan Liu
June 2021
CHAPTER 1
Introduction and Overview
The last decade has seen significant progress in the development of robotics, spanning algorithms,
mechanics, and hardware platforms. Various robotic systems, like manipulators, legged
robots, unmanned aerial vehicles, and self-driving cars, have been designed for search and res-
cue [1, 2], exploration [3, 4], package delivery [5], entertainment [6, 7], and many more applications
and scenarios. These robots are beginning to demonstrate their full potential. Take drones,
a type of aerial robot, as an example. The number of drones grew by 2.83x between 2015
and 2019 based on the U.S. Federal Aviation Administration (FAA) report [8]. The registered
number reached 1.32 million in 2019, and the FAA expects this number to grow to 1.59 million
by 2024.
However, robotic systems are very complex [9–12]. They tightly integrate many tech-
nologies and algorithms, including sensing, perception, mapping, localization, decision making,
control, etc. This complexity poses many challenges for the design of robotic edge computing
systems [13, 14]. On the one hand, robotic systems need to process an enormous amount of
data in real-time. The incoming data often comes from multiple sensors, which is highly het-
erogeneous and requires accurate spatial and temporal synchronization and pre-processing [15].
However, the robotic system usually has limited on-board resources, such as memory storage,
bandwidth, and compute capabilities, making it hard to meet the real-time requirements. On
the other hand, the current state-of-the-art robotic system usually has strict power constraints
on the edge that cannot support the amount of computation required for performing tasks, such
as 3D sensing, localization, navigation, and path planning. Therefore, the computation and stor-
age complexity, as well as real-time and power constraints of the robotic system, hinder its wide
application in latency-critical or power-limited scenarios [16].
Therefore, it is essential to choose a proper compute platform for robotic systems. CPUs
and GPUs are two widely used commercial compute platforms. CPUs are designed to handle
a wide range of tasks quickly and are often used to develop novel algorithms. A typical CPU
can achieve 10–100 GFLOPS with below 1 GOP/J power efficiency [17]. In contrast, GPUs
are designed with thousands of processor cores running simultaneously, which enable massive
parallelism. A typical GPU can deliver up to 10 TOPS of performance, making it a good can-
didate for high-performance scenarios. Recently, benefiting in part from the better accessibility
provided by CUDA/OpenCL, GPUs have been predominantly used in many robotic applica-
tions. However, conventional CPUs and GPUs usually consume 10–100 W of power, which
are orders of magnitude higher than what is available on the resource-limited robotic system.
Besides CPUs and GPUs, FPGAs are attracting attention and becoming compute sub-
strate candidates to achieve energy-efficient robotic task processing. FPGAs require low power
and are often built into small systems with less memory. They can perform massively parallel
computations and make use of the properties of perception (e.g., stereo match-
ing), localization (e.g., SLAM), and planning (e.g., graph search) kernels to remove additional
logic and simplify the implementation. Taking hardware characteristics into account, researchers
and engineers have proposed several algorithms that can be run in a hardware-friendly way and
achieve performance similar to their software counterparts. Therefore, FPGAs make it possible to meet real-time
requirements while achieving high energy efficiency compared to CPUs and GPUs.
Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and
re-programming without going through re-fabrication with a modified design. Partial Recon-
figuration (PR) takes this flexibility one step further, allowing the modification of an operating
FPGA design by loading a partial configuration file. By using PR, part of the FPGA can be
reconfigured at runtime without compromising the integrity of the applications running on the
parts of the device that are not being reconfigured. As a result, PR can allow different robotic
applications to time-share part of an FPGA, leading to energy and performance efficiency, and
making FPGA a suitable computing platform for dynamic and complex robotic workloads.
Note that robotics is not one technology but rather an integration of many technologies.
As shown in Fig. 1.1, the stack of the robotic system consists of three major components: ap-
plication workloads, including sensing, perception, localization, motion planning, and control;
a software edge subsystem, including operating system and runtime layer; and computing hard-
ware, including both micro-controllers and companion computers [16, 18, 19].
We focus on the robotic application workloads in this chapter. The application subsystem
contains multiple algorithms that are used by the robot to extract meaningful information from
raw sensor data to understand the environment and dynamically make decisions about its actions.
1.1 SENSING
The sensing stage is responsible for extracting meaningful information from the sensor raw data.
To enable intelligent actions and improve reliability, the robot platform usually supports a wide
range of sensors. The number and type of sensors are heavily dependent on the specifications of
the workload and the capability of the onboard compute platform. The sensors can include the
following:
Cameras. Cameras are usually used for object recognition and object tracking, such as
lane detection in autonomous vehicles and obstacle detection in drones. RGB-D cameras
can also be utilized to determine object distances and positions. Taking autonomous vehicles as
an example, current systems usually mount eight or more 1080p cameras around the vehicle
to detect, recognize, and track objects in different directions, which can greatly improve safety.
These cameras usually run at 60 Hz and, combined, generate multiple gigabytes of raw data
per second.
GNSS/IMU. The global navigation satellite system (GNSS) and inertial measurement
unit (IMU) system help the robot localize itself by reporting both inertial updates and an es-
timate of the global location at a high rate. Different robots have different requirements for
localization sensing. For instance, 10 Hz may be enough for a low-speed mobile robot, but
high-speed autonomous vehicles usually demand 30 Hz or higher for localization, and high-
speed drones may need 100 Hz or more for localization, thus we are facing a wide spectrum of
sensing speeds. Fortunately, different sensors have their advantages and drawbacks. GNSS can
enable fairly accurate localization, but it runs at only 10 Hz and thus cannot provide real-time
updates. By contrast, both the accelerometer and the gyroscope in an IMU can run at 100–200 Hz, which
satisfies the real-time requirement. However, the IMU suffers from bias wander over time and per-
turbation by thermo-mechanical noise, which may degrade the accuracy of the
position estimates. By combining GNSS and IMU, we can get accurate and real-time position updates
for robots.
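To make this trade-off concrete, the following sketch (a minimal, one-dimensional illustration in Python; the rates, blending gain, and variable names are made up for this example) blends high-rate IMU dead-reckoning with low-rate GNSS fixes using a simple complementary update. It is not the fusion scheme of any particular system discussed in this book; a practical system would use a proper filter such as the Kalman filter described in Section 1.3.

    # Minimal 1-D complementary fusion of a 100 Hz IMU with a 10 Hz GNSS fix.
    # All names and values are illustrative; real systems work in 3-D and
    # calibrate the blending gain carefully.

    IMU_DT = 1.0 / 100.0   # IMU sample period (s)
    GNSS_EVERY = 10        # one GNSS fix per 10 IMU samples (i.e., 10 Hz)
    ALPHA = 0.2            # how strongly a GNSS fix corrects the estimate

    def fuse(imu_accels, gnss_positions):
        """imu_accels: accelerations (m/s^2) at 100 Hz;
        gnss_positions: absolute positions (m) at 10 Hz."""
        pos, vel = 0.0, 0.0
        fused = []
        for i, acc in enumerate(imu_accels):
            # Dead-reckon with the IMU: fast updates, but drift accumulates.
            vel += acc * IMU_DT
            pos += vel * IMU_DT
            # When a GNSS fix is available, nudge the estimate toward it.
            if i % GNSS_EVERY == 0 and i // GNSS_EVERY < len(gnss_positions):
                pos = (1 - ALPHA) * pos + ALPHA * gnss_positions[i // GNSS_EVERY]
            fused.append(pos)
        return fused

    if __name__ == "__main__":
        accels = [0.1] * 200                                   # 2 s at constant 0.1 m/s^2
        fixes = [0.05 * (0.1 * k) ** 2 for k in range(20)]     # ideal 10 Hz positions
        print(round(fuse(accels, fixes)[-1], 3))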
LiDAR. Light detection and ranging (LiDAR) is used for evaluating distance by illu-
minating the obstacles with laser light and measuring the reflection time. These pulses, along
with other recorded data, can generate precise and three-dimensional information about the
surrounding characteristics. LiDAR plays an important role in localization, obstacle detection,
and avoidance. As indicated in [20], the choice of sensors dictates the algorithm and hardware
design. Taking autonomous driving as an instance, almost all autonomous vehicle companies
use LiDAR at the core of their technologies. Examples include Uber, Waymo, and Baidu. Per-
ceptIn and Tesla are among the very few that do not use LiDAR and, instead, rely on cameras
and vision-based systems. In particular, PerceptIn’s data demonstrated that for the low-speed
autonomous driving scenario, LiDAR processing is slower than camera-based vision processing,
while also increasing power consumption and cost.
Radar and Sonar. The Radio Detection and Ranging (Radar) and Sound Navigation and
Ranging (Sonar) system is used to determine the distance and speed to a certain object, which
usually serves as the last line of defense to avoid obstacles. In autonomous vehicles, for ex-
ample, when nearby obstacles are detected and a collision is imminent, the vehicle will
apply the brakes or turn to avoid them. Compared to LiDAR, the Radar and Sonar system is
cheaper and smaller, and its raw data is usually fed to the control processor directly with-
out going through the main compute pipeline, so it can be used to implement urgent
functions such as swerving or applying the brakes.
One key problem we have observed with commercial CPUs, GPUs, or mobile SoCs is
the lack of built-in multi-sensor processing supports, hence most of the multi-sensor processing
has to be done in software, which could lead to problems such as time synchronization. On
the other hand, FPGAs provide a rich sensor interface and enable most time-critical sensor
processing tasks to be done in hardware [21]. In Chapter 2, we introduce FPGA technologies,
especially how FPGAs provide rich I/O blocks, which can be configured for heterogeneous
sensor processing.
Figure 1.1: The stack of the robotic system.
1.2 PERCEPTION
The sensor data is then fed into the perception layer to sense the static and dynamic objects as
well as build a reliable and detailed representation of the robot’s environment by using computer
vision techniques (including deep learning).
The perception layer is responsible for object detection, segmentation, and tracking. There
are obstacles, lane dividers, and other objects to detect. Traditionally, a detection pipeline starts
with image pre-processing, followed by a region of interest detector, and finally a classifier that
outputs detected objects. In 2005, Dalal and Triggs [22] proposed an algorithm based on the
histogram of oriented gradients (HOG) and a support vector machine (SVM) to model both the ap-
pearance and shape of the object under various conditions. The goal of segmentation is to give
the robot a structured understanding of its environment. Semantic segmentation is usually for-
mulated as a graph labeling problem with vertices of the graph being pixels or super-pixels.
Inference algorithms on graphical models such as conditional random field (CRF) [23, 24] are
used. The goal of tracking is to estimate the trajectory of moving obstacles. Tracking can be
formulated as a sequential Bayesian filtering problem by recursively running the prediction step
and correction step. Tracking can also be formulated as tracking-by-detection handled with a
Markov decision process (MDP) [25], where an object detector is applied to consecutive
frames and detected objects are linked across frames.
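As a concrete illustration of this classical pipeline, the sketch below uses OpenCV's built-in HOG descriptor with its default pedestrian SVM, in the spirit of Dalal and Triggs [22]. The input file name is a placeholder, and the detection parameters are illustrative rather than tuned for any particular robot.

    # Classical HOG + SVM detection pipeline (in the spirit of Dalal & Triggs).
    # Requires opencv-python; "street.jpg" is a placeholder input image.
    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    image = cv2.imread("street.jpg")
    if image is None:
        raise FileNotFoundError("provide an input image, e.g., street.jpg")

    # Sliding-window detection over an image pyramid; winStride and scale
    # trade detection accuracy against compute time.
    boxes, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)

    for (x, y, w, h) in boxes:
        print(f"pedestrian candidate at x={x}, y={y}, w={w}, h={h}")
    print("confidence scores:", list(weights.ravel()))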
In recent years, deep neural networks (DNNs), also known as deep learning, have greatly
affected the field of computer vision and made significant progress in solving robot percep-
tion problems. Most state-of-the-art algorithms now apply some type of neural network based
on convolution operations. Fast R-CNN [26], Faster R-CNN [27], SSD [28], YOLO [29],
and YOLO9000 [30] have been used to achieve much better speed and accuracy in object detection.
Most CNN-based semantic segmentation work is based on Fully Convolutional Networks
(FCNs) [31], and there is recent work on spatial pyramid pooling networks [32] and the pyra-
mid scene parsing network (PSPNet) [33] that combines global image-level information with
locally extracted features. By using auxiliary natural images, a stacked autoencoder model can be
trained offline to learn generic image features and then applied for online object tracking [34].
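For comparison with the classical pipeline, the sketch below runs one of the CNN detectors mentioned above, Faster R-CNN [27], through its off-the-shelf torchvision implementation. The input file name is a placeholder, the exact weight-loading arguments vary across torchvision versions, and this is only meant to show the shape of a modern detection API, not a deployment-ready robotic perception stack.

    # Minimal Faster R-CNN inference with torchvision (illustrative only).
    import torch
    import torchvision
    from torchvision.transforms.functional import to_tensor
    from PIL import Image

    # Pre-trained COCO detector; downloads weights on first use.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    model.eval()

    image = Image.open("frame.jpg").convert("RGB")   # placeholder input frame

    with torch.no_grad():
        # The model takes a list of 3xHxW tensors and returns one dict per
        # image with 'boxes', 'labels', and 'scores'.
        outputs = model([to_tensor(image)])[0]

    for box, label, score in zip(outputs["boxes"], outputs["labels"], outputs["scores"]):
        if score > 0.8:
            print(int(label), [round(v, 1) for v in box.tolist()], round(float(score), 2))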
In Chapter 3, we review the state-of-the-art neural network accelerator designs and
demonstrate that with software-hardware co-design, FPGAs can achieve more than 10 times
better speed and energy efficiency than the state-of-the-art GPUs. This verifies that FPGAs are
a promising candidate for neural network acceleration. In Chapter 4, we review various stereo vi-
sion algorithms in the robotic perception and their FPGA accelerator designs. We demonstrate
that with careful algorithm-hardware co-design, FPGAs can achieve two orders of magnitude
higher energy efficiency and performance than state-of-the-art GPUs and CPUs.
1.3 LOCALIZATION
The localization layer is responsible for aggregating data from various sensors to locate the robot
in the environment model.
The GNSS/IMU system is used for localization. GNSS consists of several satellite systems,
such as GPS, Galileo, and BeiDou, which can provide accurate localization results but with a
slow update rate. In comparison, the IMU can provide fast updates with less accurate rotation and
acceleration results. A mathematical filter, such as a Kalman filter, can be used to combine the
advantages of the two and minimize the localization error and latency. However, this system alone
has some problems: the signal may bounce off obstacles, introducing more noise, and it fails
to work in closed environments.
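The sketch below illustrates the filtering idea in one dimension, assuming a simple position/velocity state: IMU accelerations drive the high-rate prediction step, and each GNSS position fix is folded in as a measurement update. The noise covariances and rates are made-up values for illustration and are not taken from any system described later in this book.

    # 1-D Kalman filter fusing IMU acceleration (prediction) with GNSS position
    # (correction). Noise magnitudes are illustrative only.
    import numpy as np

    dt = 0.01                             # 100 Hz IMU prediction step
    F = np.array([[1, dt], [0, 1]])       # state transition for [position, velocity]
    B = np.array([[0.5 * dt**2], [dt]])   # how acceleration enters the state
    H = np.array([[1.0, 0.0]])            # GNSS measures position only
    Q = np.eye(2) * 1e-4                  # process noise (IMU drift), assumed
    R = np.array([[4.0]])                 # GNSS position noise variance (m^2), assumed

    x = np.zeros((2, 1))                  # state estimate [position; velocity]
    P = np.eye(2)                         # estimate covariance

    def predict(accel):
        """Propagate the state with one IMU sample (runs at 100 Hz)."""
        global x, P
        x = F @ x + B * accel
        P = F @ P @ F.T + Q

    def correct(gnss_pos):
        """Fold in one GNSS position fix (runs at 10 Hz)."""
        global x, P
        y = np.array([[gnss_pos]]) - H @ x       # innovation
        S = H @ P @ H.T + R                      # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P

    for i in range(100):                         # 1 s of simulated data
        predict(accel=0.1)
        if i % 10 == 0:
            correct(gnss_pos=0.05 * (i * dt) ** 2)   # noiseless fix for the demo
    print(x.ravel())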
LiDAR and High-Definition (HD) maps are used for localization. LiDAR can generate
point clouds and provide a shape description of the environment, but it is hard to differentiate
individual points. An HD map has a higher resolution than ordinary digital maps and makes the route
familiar to the robot; the key is to fuse different sensor information to minimize the errors
in each grid cell. Once the HD map is built, a particle filter method can be applied to localize
the robot in real-time by correlating against LiDAR measurements. However, LiDAR performance
may be severely affected by weather conditions (e.g., rain, snow), introducing localization errors.
Cameras are used for localization as well. The pipeline of vision-based localization is sim-
plified as follows: (1) by triangulating stereo image pairs, a disparity map is obtained and used
to derive depth information for each point; (2) by matching salient features between successive
stereo image frames in order to establish correlations between feature points in different frames,
the motion between the past two frames is estimated; and (3) by comparing the salient features
against those in the known map, the current position of the robot is derived [35].
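Step (1) of this pipeline can be sketched with OpenCV's classical block-matching stereo: compute a disparity map from a rectified pair and convert disparity d to depth via Z = f·B/d, where f is the focal length in pixels and B the baseline. The file names and calibration values below are placeholders, not parameters of any system described in this book.

    # Step (1) of the vision-based localization pipeline: disparity -> depth.
    # Assumes a rectified grayscale stereo pair; file names and calibration
    # values (focal length, baseline) are placeholders.
    import cv2
    import numpy as np

    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Classic local block matching (see also Chapter 4 on stereo vision).
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

    FOCAL_PX = 700.0    # focal length in pixels (assumed)
    BASELINE_M = 0.12   # camera baseline in meters (assumed)

    # Depth Z = f * B / d; invalid where disparity is zero or negative.
    valid = disparity > 0
    depth = np.zeros_like(disparity)
    depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
    print("median depth of valid pixels (m):", float(np.median(depth[valid])))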
Apart from these techniques, a sensor fusion strategy is also often utilized to combine mul-
tiple sensors for localization, which can improve the reliability and robustness of the robot [36, 37].
In Chapter 5, we introduce a general-purpose localization framework that integrates key
primitives in existing algorithms along with its implementation in FPGA. The FPGA-based
localization framework retains high accuracy of individual algorithms, simplifies the software
stack, and provides a desirable acceleration target.
1.4 PLANNING AND CONTROL
The planning and control layer is responsible for generating trajectory plans and passing the
control commands based on the origin and destination of the robot. Broadly, prediction and
routing modules are also included here, where their outputs are fed into downstream planning
and control layers as input. The prediction module is responsible for predicting the future be-
havior of surrounding objects identified by the perception layer. The routing module can be a
lane-level routing based on lane segmentation of the HD maps for autonomous vehicles.
Planning and control layers usually include behavioral decision, motion planning, and
feedback control. The mission of the behavioral decision module is to make effective and safe
decisions by leveraging all various input data sources. Bayesian models are becoming more and
more popular and have been applied in recent works [38, 39]. Among the Bayesian mod-
els, the Markov Decision Process (MDP) and Partially Observable Markov Decision Process
(POMDP) are the widely applied methods in modeling robot behavior. The task of motion plan-
ning is to generate a trajectory and send it to the feedback control for execution. The planned
trajectory is usually specified and represented as a sequence of planned trajectory points, and
each of these points contains attributes like location, time, speed, etc. Low-dimensional mo-
tion planning problems can be solved with grid-based algorithms (such as Dijkstra [40] or
A* [41]) or geometric algorithms. High-dimensional motion planning problems can be dealt
with sampling-based algorithms, such as Rapidly exploring Random Tree (RRT) [42] and Prob-
abilistic Roadmap (PRM) [43], which can avoid the problem of local minima. Reward-based
algorithms, such as the Markov decision process (MDP), can also generate the optimal path by
maximizing cumulative future rewards. The goal of feedback control is to track the difference
between the actual pose and the pose on the predefined trajectory by continuous feedback. The
most typical and widely used algorithm in robot feedback control is PID.
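As a small, self-contained example of the grid-based planners mentioned above, the sketch below runs A* [41] on a toy occupancy grid with a Manhattan-distance heuristic. The grid and unit step costs are made up; real planners search far larger state spaces, which is what makes hardware acceleration attractive (see Chapter 6).

    # A* search on a small 2-D occupancy grid (1 = obstacle), Manhattan heuristic.
    import heapq

    GRID = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0],
        [0, 1, 1, 0],
    ]

    def astar(start, goal):
        h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
        frontier = [(h(start), 0, start, [start])]   # (f, g, cell, path)
        seen = set()
        while frontier:
            f, g, cell, path = heapq.heappop(frontier)
            if cell == goal:
                return path
            if cell in seen:
                continue
            seen.add(cell)
            r, c = cell
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] == 0:
                    heapq.heappush(frontier,
                                   (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
        return None   # no collision-free path found

    print(astar((0, 0), (3, 3)))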
While optimization-based approaches enjoy mainstream appeal in solving motion plan-
ning and control problems, learning-based approaches [44–48] are becoming increasingly pop-
ular with recent developments in artificial intelligence. Learning-based methods, such as rein-
forcement learning, can naturally make full use of historical data and iteratively interact with the
environment through actions to deal with complex scenarios. Some model the behavioral level
decisions via reinforcement learning [46, 48], while other approaches directly work on motion
planning trajectory output or even direct feedback control signals [45]. Q-learning [49], Actor-
Critic learning [50], and policy gradient [43] are some popular algorithms in reinforcement
learning.
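The core of several of the reinforcement learning methods cited above is a simple value-update rule. The sketch below shows the tabular Q-learning [49] update on a toy chain environment with made-up rewards, purely to make the mechanics concrete; practical robotic systems use far richer state representations and function approximation.

    # Tabular Q-learning on a toy 5-state chain: move left/right, reward at the end.
    import random

    N_STATES, ACTIONS = 5, (-1, +1)          # actions: step left or step right
    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for _ in range(2000):                    # training episodes
        s = 0
        while s != N_STATES - 1:
            # epsilon-greedy action selection
            a = random.choice(ACTIONS) if random.random() < EPSILON \
                else max(ACTIONS, key=lambda act: Q[(s, act)])
            s_next = min(max(s + a, 0), N_STATES - 1)
            r = 1.0 if s_next == N_STATES - 1 else 0.0
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next

    # Learned greedy action per non-terminal state (should all be +1, i.e., move right).
    print({s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)})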
In Chapter 6, we introduce the motion planning modules of the robotics system, and
compare several FPGA and ASIC accelerator designs in motion planning to analyze intrinsic
design trade-offs. We demonstrate that with careful algorithm-hardware co-design, FPGAs can
achieve three orders of magnitude higher performance than CPUs and two orders of magnitude higher than GPUs with
much lower power consumption. This demonstrates that FPGAs can be a promising candidate
for accelerating motion planning kernels.
1.5 FPGAS IN ROBOTIC APPLICATIONS
Besides accelerating the basic modules in the robotic computing stack, FPGAs have been uti-
lized in different robotic applications. In Chapter 7, we explore how FPGAs can be utilized
in multi-robot exploration tasks. Specifically, we present an FPGA-based interruptible CNN
accelerator and a deployment framework for multi-robot exploration.
In Chapter 8, we provide a retrospective summary of PerceptIn’s efforts on developing
on-vehicle computing systems for autonomous vehicles, especially how FPGAs are utilized to
accelerate critical tasks in a full autonomous driving stack. For instance, localization is acceler-
ated on an FPGA while depth estimation and object detection are accelerated by a GPU. This
case study has demonstrated that FPGAs are capable of playing a crucial role in autonomous
driving, and exploiting accelerator-level parallelism while taking into account constraints arising
in different contexts could significantly improve on-vehicle processing.
In Chapter 9, we explore how FPGAs have been utilized in space robotic applications in
the past two decades. The properties of FPGAs make them good onboard processors for space
missions, ones that have high reliability, adaptability, processing power, and energy efficiency.
FPGAs may help us close the two-decade performance gap between commercial processors and
space-grade ASICs when it comes to powering space exploration robots.
1.6 THE DEEP PROCESSING PIPELINE
Different from other computing workloads, autonomous machines have a very deep processing
pipeline with strong dependencies between different stages and a strict time-bound associated
with each stage [51]. For instance, Fig. 1.2 presents an overview of the processing pipeline of an
autonomous driving system. Starting from the left side, the system consumes raw sensing data
from mmWave radars, LiDARs, cameras, and GNSS/IMUs, and each sensor produces raw data
at a different frequency. The cameras capture images at 30 FPS and feed the raw data to the 2D
Perception module, the LiDARs capture point clouds at 10 FPS and feed the raw data to the
3D Perception module as well as the Localization module, the GNSS/IMUs generate positional
updates at 100 Hz and feed the raw data to the Localization module, the mmWave radars detect
obstacles at 10 FPS and feed the raw data to the Perception Fusion module.
Figure 1.2: The processing pipeline of autonomous vehicles.
Next, the results of 2D and 3D Perception Modules are fed into the Perception Fusion
module at 30 Hz and 10 Hz, respectively, to create a comprehensive perception list of all detected
objects. The perception list is then fed into the Tracking module at 10 Hz to create a tracking list
of all detected objects. The tracking list is then fed into the Prediction module at 10 Hz to create
a prediction list of all objects. After that, both the prediction results and the localization results
are fed into the Planning module at 10 Hz to generate a navigation plan. The navigation plan is
then fed into the Control module at 10 Hz to generate control commands, which are finally sent
to the autonomous vehicle for execution at 100 Hz.
Hence, every 10 ms the autonomous vehicle needs to generate a control command to
maneuver the vehicle. If any upstream module, such as the Perception module, misses the deadline
to generate an output, the Control module still has to generate a command before the deadline.
This could lead to disastrous results as the autonomous vehicle is essentially driving blindly
without the perception output.
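One common way to keep such a pipeline from stalling is to run each downstream stage on its own clock and let it consume the most recent upstream result, stale or not. The simplified, single-threaded sketch below (with made-up timings and a hypothetical perception_ready() stand-in for the real module) illustrates that pattern and why a missed perception deadline translates directly into the controller acting on old data.

    # Fixed-rate control loop that always uses the latest available perception
    # output. Timings and stage behavior are made up for illustration.
    CONTROL_PERIOD_MS = 10      # control command every 10 ms (100 Hz)
    PERCEPTION_PERIOD_MS = 100  # perception result every 100 ms (10 Hz), ideally

    latest_perception = None    # most recent perception output (may be stale)

    def perception_ready(t_ms):
        """Pretend perception finishes on schedule except for one missed deadline."""
        return t_ms % PERCEPTION_PERIOD_MS == 0 and t_ms != 300

    def control_step(t_ms, perception):
        age = "no data" if perception is None else f"{t_ms - perception} ms old"
        print(f"t={t_ms:4d} ms: control command issued using perception ({age})")

    for t_ms in range(0, 500, CONTROL_PERIOD_MS):
        if perception_ready(t_ms):
            latest_perception = t_ms          # record the timestamp of the result
        # The control deadline is hard: issue a command even if perception is stale.
        control_step(t_ms, latest_perception)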
The key challenge is to design a system to minimize the end-to-end latency of the deep
processing pipeline within energy and cost constraints, and with minimum latency variation.
In this book, we demonstrate that FPGAs can be utilized in different modules in this long
processing pipeline to minimize latency, reduce latency variation, and achieve energy efficiency.
1.7 SUMMARY
The authors believe that FPGAs are the indispensable compute substrate for robotic applications
for several reasons.
• First, robotic algorithms are still evolving rapidly. Thus, any ASIC-based accelerators
will be months or even years behind the state-of-the-art algorithms; on the other hand,
FPGAs can be dynamically updated as needed.
• Second, robotic workloads are highly diverse. Thus, it is difficult for any ASIC-based
robotic computing accelerator to reach economies of scale in the near future; on the
other hand, FPGAs are a cost-effective and energy-effective alternative before one type
of accelerator reaches economies of scale.
• Third, compared to SoCs that have reached economies of scale, e.g., mobile SoCs,
FPGAs deliver a significant performance advantage.
• Fourth, partial reconfiguration allows multiple robotic workloads to time-share an
FPGA, thus allowing one chip to serve multiple applications, leading to overall cost
and energy reduction.
Specifically, FPGAs require little power and are often built into small systems with less
memory. They can perform massively parallel computations and make use of the proper-
ties of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph
search) kernels to remove additional logic and simplify the implementation. Taking hardware
characteristics into account, several algorithms have been proposed that can be run in a hardware-
friendly way and achieve performance similar to their software counterparts. Therefore, FPGAs make it possible to meet
real-time requirements while achieving high energy efficiency compared to CPUs and GPUs.
Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-
programming without going through re-fabrication with a modified design. PR takes this flex-
ibility one step further, allowing the modification of an operating FPGA design by loading a
partial configuration file. Using PR, part of the FPGA can be reconfigured at runtime without
compromising the integrity of the applications running on those parts of the device that are
not being reconfigured. As a result, PR can allow different robotic applications to time-share
part of an FPGA, leading to energy and performance efficiency, and making FPGA a suitable
computing platform for dynamic and complex robotic workloads.
Due to the advantages over other compute substrates, FPGAs have been successfully uti-
lized in commercial autonomous vehicles. Particularly, over the past four years, PerceptIn has
built and commercialized autonomous vehicles for micromobility, and PerceptIn’s products have
been deployed in China, the U.S., Japan, and Switzerland. In this book, we provide a real-world
case study on how PerceptIn developed its computing system by relying heavily on FPGAs,
which perform not only heterogeneous sensor synchronizations but also the acceleration of soft-
ware components on the critical path. In addition, FPGAs are used heavily in space robotic
applications, for they offer unprecedented flexibility and significantly reduce the design
cycle and development cost.
CHAPTER 2
FPGA Technologies
Before we delve into utilizing FPGAs for accelerating robotic workloads, in this chapter we
first provide the background of FPGA technologies so that readers without prior knowledge
can gain a basic understanding of what an FPGA is and how an FPGA works. We also
introduce partial reconfiguration, a technique that exploits the flexibility of FPGAs and one
that is extremely useful for various robotic workloads to time-share an FPGA so as to minimize
energy consumption and resource utilization. In addition, we explore existing techniques that
enable the robot operating system (ROS), an essential infrastructure for robotic computing, to
run directly on FPGAs.
2.1 AN INTRODUCTION TO FPGA TECHNOLOGIES
In the 1980s, FPGAs emerged as a result of increasing integration in electronics. Before the use
of FPGAs, glue-logic designs were based on individual boards with fixed components intercon-
nected via a shared standard bus, which had various drawbacks, such as hindering high-volume
data processing and being more susceptible to radiation-induced errors, in addition to being inflexible.
In essence, FPGAs are semiconductor devices that are built around a matrix of config-
urable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be repro-
grammed to desired application or functionality requirements after manufacturing. This feature
distinguishes FPGAs from Application-Specific Integrated Circuits (ASICs), which are custom
manufactured for specific design tasks.
Note that ASICs and FPGAs have different value propositions, and they must be care-
fully evaluated before choosing any one over the other. While FPGAs used to be selected for
lower-speed/complexity/volume designs in the past, today’s FPGAs easily push the 500 MHz
performance barrier. With unprecedented logic density increases and a host of other features,
such as embedded processors, DSP blocks, clocking, and high-speed serial at ever lower price
points, FPGAs are a compelling proposition for almost any type of design.
Modern FPGAs come with massive reconfigurable logic and memory, which lets engineers
build dedicated hardware with superb power and performance efficiency. In particular, FPGAs are
attracting attention from the robotics community and becoming an energy-efficient platform for
robotic computing. Unlike ASIC counterparts, FPGA technology provides the flexibility of on-
site programming and re-programming without going through re-fabrication with a modified
design, due to its underlying reconfigurable fabrics.
2.1.1 TYPES OF FPGAS
FPGAs can be categorized by the type of their programmable interconnection switches: antifuse,
SRAM, and Flash. Each of the three technologies comes with trade-offs.
• Antifuse FPGAs are non-volatile and have a minimal delay due to routing, resulting
in a faster speed and lower power consumption. The drawback is evident as they have a
relatively more complicated fabrication process and are only one-time programmable.
• SRAM-based FPGAs are field reprogrammable and use the standard fabrication pro-
cess that foundries have put significant effort into optimizing, resulting in a faster rate of
performance increase. However, based on SRAM, these FPGAs are volatile and may
not hold configuration if a power glitch occurs. Also, they have more substantial rout-
ing delays, require more power, and have a higher susceptibility to bit errors. Note that
SRAM-based FPGAs are the most popular compute substrates in space applications.
• Flash-based FPGAs are non-volatile and reprogrammable, and also have low power
consumption and routing delay. The major drawback is that runtime reconfiguration is
not recommended for flash-based FPGAs due to the potentially destructive results if
radiation effects occur during the reconfiguration process [52]. Also, the stability of
the stored charge on the floating gate is a concern: it depends on factors such as the
operating temperature and electric fields that might disturb the charge. As a result,
flash-based FPGAs are not as frequently used in space missions [53].
2.1.2 FPGA ARCHITECTURE
In this subsection, we introduce the basic components in FPGA architecture in the hope of
providing basic background knowledge to readers with limited prior knowledge on FPGA tech-
nologies. For a detailed and thorough explanation, interested readers can refer to [54].
As shown in Fig. 2.1, a basic FPGA design usually contains the following components.
• Configurable Logic Blocks (CLBs) are the basic repeating logic resources on an
FPGA. When linked together by the programmable routing blocks, CLBs can exe-
cute complex logic functions, implement memory functions, and synchronize code on
the FPGA. CLBs contain smaller components, including flip-flops (FFs), look-up ta-
bles (LUTs), and multiplexers (MUX). An FF is the smallest storage resource on the
FPGA. Each FF in a CLB is a binary register used to save logic states between clock
cycles on an FPGA circuit. An LUT stores a predefined list of outputs for every com-
bination of inputs. LUTs provide a fast way to retrieve the output of a logic operation
because possible results are stored and then referenced rather than calculated. A MUX
is a circuit that selects between two or more inputs and then returns the selected input.
Any logic function can be implemented using a combination of FFs, LUTs, and MUXes;
the short sketch after this list illustrates how an LUT realizes a logic function.
Figure 2.1: Overview of FPGA architecture, showing configurable logic blocks (with 4-input look-up tables, flip-flops, and multiplexers), programmable routing blocks, interconnect wires, I/O blocks, and DSP blocks.
• Programmable Routing Blocks (PRBs) provide programmability for connectivity
among a pool of CLBs. The interconnection network contains configurable switch ma-
trices and connection blocks that can be programmed to form the demanded connec-
tion. PRBs can be divided into Connection Blocks (CBs) and a matrix of Switch Boxes
(SBs), namely, Switch Matrix (SM). CBs are responsible to provide a connection be-
tween CLBs input/output pins to the adjacent routing channels. SBs are placed at the
intersection points of vertical and horizontal routing channels. Routing a net from a
CLB source to a CLB target necessitates passing through multiple interconnect wires
and SBs, in which an entering signal from a certain side can connect to any of the other
three directions based on the SM topology.
• I/O Blocks (IOBs) are used to bridge signals onto the chip and send them back off
again. An IOB consists of an input buffer and an output buffer with three-state and
open-collector output controls. Typically, there are pull-up resistors on the outputs and
sometimes pull-down resistors that can be used to terminate signals and buses without
requiring discrete resistors external to the chip. The polarity of the output can usually
be programmed for active high or active low output. There are typical flip-flops on
outputs so that clocked signals can be output directly to the pins without encountering
significant delay, more easily meeting the setup time requirement for external devices.
Since there are many IOBs available on an FPGA and these IOBs are programmable,
we can easily design a compute system to connect to different types of sensors, which
is extremely useful for robotic workloads.
• Digital Signal Processors (DSPs) have been optimized to implement various com-
mon digital signal processing functions with maximum performance and minimum
logic resource utilization. In addition to multipliers, each DSP block has functions
that are frequently required in typical DSP algorithms. These functions usually in-
clude pre-adders, adders, subtractors, accumulators, coefficient register storage, and a
summation unit. With these rich features, the DSP blocks in the Stratix series FP-
GAs are ideal for applications with high-performance and computationally intensive
signal processing functions, such as finite impulse response (FIR) filtering, fast Fourier
transforms (FFTs), digital up/down conversion, high-definition (HD) video process-
ing, HD CODECs, etc. Besides the aforementioned traditional workloads, DSPs are
also extremely useful for robotic workloads, especially computer vision workloads, pro-
viding high-performance and low-power solutions for robotic vision front ends [55].
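To make the role of LUTs more concrete, the following minimal Python sketch is purely illustrative (it is not vendor tooling): it models a 4-input LUT as a 16-entry truth table, fills the table for a chosen Boolean function, and evaluates the "circuit" with a single table lookup indexed by the four input bits, which is essentially what the configured fabric does at runtime.

```python
# Conceptual model of how a 4-input LUT realizes an arbitrary Boolean function.
# A real LUT's 16 configuration bits are set by the bitstream; here we simply
# precompute them in software.

def program_lut(func, num_inputs=4):
    """Build the LUT contents (truth table) for an arbitrary Boolean function."""
    return [func(*[(i >> b) & 1 for b in range(num_inputs)])
            for i in range(2 ** num_inputs)]

def lut_eval(lut, inputs):
    """Evaluate the LUT: pack the input bits into an index and look it up."""
    index = sum(bit << pos for pos, bit in enumerate(inputs))
    return lut[index]

# Example function: out = (a AND b) XOR (c OR d)
lut = program_lut(lambda a, b, c, d: (a & b) ^ (c | d))

print(lut_eval(lut, [1, 1, 0, 0]))  # (1 AND 1) XOR (0 OR 0) = 1
print(lut_eval(lut, [1, 1, 1, 0]))  # (1 AND 1) XOR (1 OR 0) = 0
```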
2.1.3 COMMERCIAL APPLICATIONS OF FPGAS
Due to their programmable nature, FPGAs are an ideal fit for many different markets such as
the following.
• Aerospace & Defense – Radiation-tolerant FPGAs along with the intellectual
property for image processing, waveform generation, and partial reconfiguration for
Software-Defined Radios, especially for space and defense applications.
• ASIC Prototyping – ASIC prototyping with FPGAs enables fast and accurate SoC
system modeling and verification of embedded software.
• Automotive – FPGAs enable automotive silicon and IP solutions for gateway and
driver assistance systems, as well as comfort, convenience, and in-vehicle infotainment.
• Consumer Electronics – FPGAs provide cost-effective solutions enabling next-
generation, full-featured consumer applications, such as converged handsets, digital
flat panel displays, information appliances, home networking, and residential set top
boxes.
• Data Center – FPGAs have been utilized heavily for high-bandwidth, low-latency
servers, networking, and storage applications to bring higher value into cloud deploy-
ments.
• High-Performance Computing and Data Storage – FPGAs have been utilized
widely for Network Attached Storage (NAS), Storage Area Network (SAN), servers,
and storage appliances.
• Industrial – FPGAs have been utilized in targeted design platforms for Industrial, Sci-
entific, and Medical (ISM) applications to enable higher degrees of flexibility, faster
time-to-market, and lower overall non-recurring engineering (NRE) costs for a wide range of applica-
tions such as industrial imaging and surveillance, industrial automation, and medical
imaging equipment.
• Medical – For diagnostic, monitoring, and therapy applications, FPGAs have been
used to meet a range of processing, display, and I/O interface requirements.
• Security – FPGAs offer solutions that meet the evolving needs of security applications,
from access control to surveillance and safety systems.
• Video & Image Processing – FPGAs have been utilized in targeted design platforms
to enable higher degrees of flexibility, faster time-to-market, and lower overall non-
recurring engineering (NRE) costs for a wide range of video and imaging applications.
• Wired Communications – FPGAs have been utilized to develop end-to-end solutions
for the Reprogrammable Networking Linecard Packet Processing, Framer/MAC, se-
rial backplanes, and more.
• Wireless Communications – FPGAs have been utilized to develop RF, base band,
connectivity, transport, and networking solutions for wireless equipment, addressing
standards such as WCDMA, HSDPA, WiMAX, and others.
In the rest of this book, we explore robotic computing, an emerging and potentially killer
application for FPGAs. With FPGAs, we can develop low-power, high-performance, cost-
effective, and flexible compute systems for various robotic workloads. Due to the advantages
provided by FPGAs, we expect that robotic applications will be a major demand driver for
FPGAs in the near future.
2.2 PARTIAL RECONFIGURATION
Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-
programming without going through re-fabrication with a modified design. PR takes this flex-
ibility one step further, allowing the modification of an operating FPGA design by loading a
PR file. Using PR, part of the FPGA can be reconfigured at runtime without compromising the
integrity of the applications running on those parts of the device that are not being reconfigured.
As a result, PR can allow different robotic applications to time-share part of an FPGA, leading
to energy and performance efficiency, and making FPGAs suitable computing platforms for
dynamic and complex robotic workloads.
2.2.1 WHAT IS PARTIAL RECONFIGURATION?
The obvious benefit of using reconfigurable devices, such as FPGAs, is that the functionality that
a device has now can be changed and updated at some time in the future. As additional func-
tionality is available or design improvements are made available, the FPGA can be completely
reprogrammed with new logic. PR takes this capability one step further by allowing designers
to change the logic within a part of an FPGA without disrupting the entire system. This allows
designers to divide their system into modules, each comprising one block of logic; without
disrupting the whole system or stopping the flow of data, users can update the functionality
within one block.
Runtime partial reconfiguration (RPR) is a special feature offered by many FPGAs that
allows designers to reconfigure certain portions of the FPGA during runtime without influ-
encing other parts of the design. This feature allows the hardware to be adaptive to a changing
environment. First, it allows optimized hardware implementation to accelerate computation.
Second, it allows efficient use of chip area such that different hardware modules can be swapped
in/out of the chip at runtime. Last, it may allow leakage and clock distribution power saving by
unloading hardware modules that are not active.
RPR is extremely useful for robotic applications, as a mobile robot might encounter very
different environments as it navigates, and it might require different perception, localization,
or planning algorithms for these different environments. For instance, while a mobile robot is
in an indoor environment, it is likely to use an indoor map for localization, but when it travels
outdoor, it might choose to use GPS and visual-inertial odometry for localization. Keeping
multiple hardware accelerators for different tasks is not only costly but also energy inefficient.
RPR provides a perfect solution for this problem. As shown in Fig. 2.2, an FPGA is divided into
three partitions for the three basic functions, one for perception, one for localization, and one for
planning. Then for each function, there are three algorithms ready, one for each environment.
Each of these algorithms is converted to a bit file and ready for RPR when needed. For instance,
when a robot navigates to a new environment and decides that a new perception algorithm is
needed, it can load the target bit file and send it to the internal configuration access port (ICAP)
to reconfigure the perception partition.
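The sketch below outlines this usage pattern in Python. The ICAP access functions (load_bitstream_file and icap_write) are hypothetical placeholders rather than a real vendor API; only the structure of mapping the current environment to a pre-generated partial bitstream and loading it at runtime is the point.

```python
# Hedged sketch of environment-driven partial reconfiguration.
# The ICAP functions below are placeholders; on a real system they would be
# replaced by the platform's PR driver (e.g., streaming the partial bitstream
# to the ICAP through a DMA engine).

# Map each (function, environment) pair to a pre-generated partial bitstream.
PARTIAL_BITSTREAMS = {
    ("perception", "indoor"):    "perception_indoor.bit",
    ("perception", "outdoor"):   "perception_outdoor.bit",
    ("perception", "dark"):      "perception_lowlight.bit",
    ("localization", "indoor"):  "localization_map.bit",
    ("localization", "outdoor"): "localization_gps_vio.bit",
}

def load_bitstream_file(path):
    """Placeholder: read a partial bitstream from storage (e.g., SRAM or flash)."""
    with open(path, "rb") as f:
        return f.read()

def icap_write(partition, bitstream):
    """Placeholder: stream the partial bitstream into the ICAP for one partition."""
    print(f"reconfiguring {partition} with {len(bitstream)} bytes")

def reconfigure(function, environment):
    """Swap in the module matching the current environment, if one exists."""
    key = (function, environment)
    if key not in PARTIAL_BITSTREAMS:
        raise KeyError(f"no partial bitstream prepared for {key}")
    icap_write(function, load_bitstream_file(PARTIAL_BITSTREAMS[key]))

# Example: the robot detects it has moved outdoors.
# reconfigure("perception", "outdoor")
# reconfigure("localization", "outdoor")
```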
One major challenge of RPR for robotic computing is the configuration speed, as most
robotic tasks have strong real-time constraints, and to maintain the performance of the robot,
the reconfiguration process has to finish within a very tight time bound. In addition, the recon-
figuration process incurs performance and power overheads. By maximizing the configuration
speed, these overheads can be minimized as well.
2.2.2 HOW TO USE PARTIAL RECONFIGURATION?
PR allows the modification of an operating FPGA design by loading a PR file, or a bit file
through ICAP [56]. Using PR, after a full bit file configures the FPGA, partial bit files can
also be downloaded to modify reconfigurable regions in the FPGA without compromising the
Figure 2.2: An example of partial reconfiguration for robotic applications: the FPGA is divided into three partitions (perception, localization, and planning), and each partition can load one of three module bitstreams through the ICAP.
integrity of the applications running on those parts of the device that are not being reconfigured.
RPR allows a limited, predefined portion of an FPGA to be reconfigured while the rest of
the device continues to operate, and this feature is especially valuable where devices operate
in a mission-critical environment that cannot be disrupted while some subsystems are being
redefined.
In an SRAM-based FPGA, all user-programmable features are controlled by memory
cells that are volatile and must be configured on power-up. These memory cells are known as
the configuration memory, and they define the look-up table (LUT) equations, signal routing,
input/output block (IOB) voltage standards, and all other aspects of the design. In order to
program the configuration memory, instructions for the configuration control logic and data for
the configuration memory are provided in the form of a bitstream, which is delivered to the
device through the JTAG, SelectMAP, serial, or ICAP configuration interface. An FPGA can
be partially reconfigured using a partial bitstream. A designer can use such a partial bitstream
to change the structure of one part of an FPGA design as the rest of the device continues to
operate.
RPR is useful for systems with multiple functions that can time-share the same FPGA
device resources. In such systems, one section of the FPGA continues to operate, while other
sections of the FPGA are disabled and reconfigured to provide new functionality. This is anal-
ogous to the situation where a microprocessor manages context switching between software
processes. In the case of PR of an FPGA, however, it is the hardware instead of the software
that is being switched.
RPR provides an advantage over multiple full bitstreams in applications that require con-
tinuous operation, which would not be possible during full reconfiguration. One example is a
mobile robot that switches the perception module while keeping the localization module and
planning module intact when moving from a dark environment to a bright environment. With
RPR, the system can maintain the localization and planning modules while the perception mod-
ule within the FPGA is changed on the fly.
Figure 2.3: FPGA regular and partial reconfiguration design flow: source files are synthesized and laid out, full and partial bit files are generated and uploaded to the FPGA, with behavioral simulation, static analysis, functional verification, and in-circuit verification along the way.
Xilinx has provided the PR feature in their high-end FPGAs, the Virtex series, in limited
access beta since the late 1990s. It has been a production feature supported by their
tools across their devices since the release of ISE 12. The support for this feature continues
to improve in the more recent release of ISE 13. Altera has promised this feature for their new
high-end devices, but this has not yet materialized. PR of FPGAs is a compelling design concept
for general purpose reconfigurable systems for its flexibility and extensibility.
Using the Xilinx tool chain, designers can go through the regular synthesis flow to generate
a single bitstream for programming the FPGA. This considers the device as a single atomic
entity. As opposed to the general synthesis flow, the PR flow physically divides the FPGA device
into regions. One region is called the “static region,” which is the portion of the device that is
programmed at startup and never changes. Another region is the “PR region,” which is the
portion of the device that will be reconfigured dynamically, potentially multiple times and with
different designs. It is possible to have multiple PR regions, but we will consider only the simplest
case here. The PR flow generates at least two bitstreams, one for the static and one for the
PR region. Most likely, there will be multiple PR bitstreams, one for each design that can be
dynamically loaded.
As shown in Fig. 2.3, the first step in implementing a system using the PR design flow is
the same as the regular design, which is to synthesize the netlists from the HDL sources that
will be used in the implementation and layout process. Note that the process requires separate
netlists for the static (top-level) designs and the PR partitions. A netlist must be generated for
each implementation of the PR partition used in the design. If the system design has multiple
PR partitions, then it will require a netlist for each implementation of each PR partition, even
if the logic is the same in multiple locations. Then once a netlist is done, we need to work on the
layout for each design to make sure that the netlist fits into the dedicated partition, and we need
to make sure that there are enough resources available for the design in each partition. Once
the implementation is done, we can then generate the bit file for each partition. At runtime,
we can dynamically swap different designs to a partition for the robot to adapt to the changing
environment. For more details on how to use PR on FPGAs, please refer to [57].
2.2.3 ACHIEVING HIGH PERFORMANCE
A major performance bottleneck for PR is the configuration overhead, which determines the
usefulness of PR. If PR is done fast enough, we can use this feature to enable mobile robots to
swap hardware components at runtime. If PR cannot be done fast enough, we can only use this
feature to perform offline hardware updates.
To address this problem, in [58], the authors propose a combination of two techniques
to minimize the overhead. First, the authors design and implement fully streaming DMA en-
gines to saturate the configuration throughput. Second, the authors exploit a simple form of
data redundancy to compress the configuration bitstreams, and implement an intelligent in-
ternal configuration access port (ICAP) controller to perform decompression at runtime. This
design achieves an effective configuration data transfer throughput of up to 1.2 GB/s, which
well surpasses the theoretical upper bound of the data transfer throughput, 400 MB/s. Specifi-
cally, the proposed fully streaming DMA engines reduce the configuration time from the range
of seconds to the range of milliseconds, a more than 1000-fold improvement. In addition, the
proposed compression scheme achieves up to a 75% reduction in bitstream size and results in a
decompression circuit with negligible hardware overhead.
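As a quick sanity check on why configuration throughput matters, the short calculation below estimates the reconfiguration delay at the three throughput levels discussed above. The bitstream size is an assumed example value; the 318 KB/s, 400 MB/s, and 1.2 GB/s figures follow the text.

```python
# Back-of-the-envelope reconfiguration-time estimate.
# Assumed example: a 1 MB partial bitstream; throughput figures follow the text.

bitstream_bytes = 1 * 1024 * 1024          # assumed partial bitstream size
baseline_Bps    = 318 * 1024               # out-of-box, software-driven ICAP (318 KB/s)
ideal_Bps       = 400 * 1024 * 1024        # ICAP specification limit (400 MB/s)
effective_Bps   = 1.2 * 1024 ** 3          # streaming DMA + decompression (1.2 GB/s effective)

for name, bps in [("out-of-box", baseline_Bps),
                  ("ideal ICAP", ideal_Bps),
                  ("streaming + compression", effective_Bps)]:
    print(f"{name:25s}: {bitstream_bytes / bps * 1e3:10.2f} ms")

# out-of-box              : ~3220 ms  (seconds range)
# ideal ICAP              : ~2.5 ms
# streaming + compression : ~0.8 ms   (milliseconds range)
```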
Figure 2.4 shows the architecture of the fast PR engine, which consists of:
• a direct memory access (DMA) engine to establish a direct transfer link between the
external SRAM, where the configuration files are stored, and the ICAP;
• a streaming engine implemented with a FIFO queue to buffer data between the con-
sumer and the producer to eliminate the handshake between the producer and the
consumer for each data transfer; and
• a burst mode for the ICAP so that it can fetch four words instead of one word at a
time.
We will explain this design in greater detail in the following sections.
Problems with the Out-of-Box PR Engine Design
Without the fast PR engine, in the out-of-box design, the ICAP Controller contains only the
ICAP and the ICAP FSM, and the SRAM Controller only contains the SRAM Bridge and
the SRAM Interface.

Figure 2.4: Fast partial reconfiguration engine: primary and secondary DMA engines, connected by a FIFO, sit between the ICAP controller and the SRAM controller.

Hence, there is no direct memory access between SRAM and ICAP, and
all configuration data transfers are done in software. In this way, the pipeline issues one read
instruction to fetch a configuration word from SRAM, and then issues a write instruction to
send the word to ICAP; instructions are also fetched from SRAM, and this process repeats
until the transfer process completes. This scheme is highly inefficient because the transfer of one
word requires tens of cycles, and the ICAP transfer throughput of this design is only 318 KB/s,
whereas according to the product specification, the ideal ICAP throughput is 400 MB/s. Hence, the out-
of-box design throughput is 1000 times worse than the ideal design.
Configuration Time is a Pure Function of the Bitstream Size?
Theoretically, the ICAP throughput can reach 400 MB/s, but this is achievable only if the config-
uration time is a pure function of bitstream file size. In order to find out whether this theoretical
throughput is achievable, the authors of [58] performed experiments to configure different re-
gions of the FPGA chip, repeatedly write NOPs to the ICAP, and stress the configuration
circuit by repeatedly configuring one region. During all these tests, the authors found that the ICAP al-
ways ran at full speed such that it was able to consume four bytes of configuration data per cycle,
regardless of the semantics of the configuration data. This confirms that configuration time is a
pure function of the size of the bitstream file.
Adding the Primary-Secondary DMA Engines
To improve PR throughput, we can first implement a pair of primary-secondary DMA
engines. The primary DMA engine resides in the ICAP controller and interfaces with the ICAP
FSM, the ICAP, as well as the secondary DMA engine. The secondary DMA engine resides in
the SRAM Controller, and it interfaces with the SRAM Bridge and the primary DMA engine.
When a DMA operation starts, the primary DMA engine receives the starting address as well
as the size of the DMA operation. Then it starts sending control signals (read-enable, address,
etc.) to the secondary DMA engine, which then forwards the signals to the SRAM Bridge.
After the data is fetched, the secondary DMA engine sends the data back to the primary DMA
engine. Then, the primary DMA engine decrements the size counter, increments the address,
and repeats the process to fetch the next word. Compared to the out-of-box design, simply
adding the DMA engines avoids the involvement of the pipeline in the data transfer process
and it significantly increases the PR throughput to 50 MB/s, a 160-fold improvement.
Adding a FIFO between the DMA Engines
To further improve the PR throughput, we can modify the primary-secondary DMA engines
by adding a FIFO between the two DMA engines. In this version of the design, when DMA
operation starts, instead of sending control signals to the secondary DMA engine, the primary
DMA engine forwards the starting address and the size of the DMA operation to the secondary
DMA engine, then it waits for the data to become available in the FIFO. Once data becomes
available in the FIFO, the primary DMA engine reads the data and decrements its size counter.
When the counter hits zero, the DMA operation completes. On the other side, upon receiving
the starting address and size of the DMA operation, the secondary DMA engine starts sending
control signals to the SRAM Bridge to fetch data one word at a time. Then once the secondary
DMA engine receives data from the SRAM Bridge, it writes the word into the FIFO, decre-
ments its size counter, and increments its address register to fetch the next word. In this design,
only data is transferred between the primary and secondary DMA engines, and all control op-
erations to SRAM are handled in the secondary DMA. This greatly simplifies the handshaking
between the ICAP Controller and the SRAM Controller, and it leads to a 100 MB/s ICAP
throughput, an additional two-fold improvement.
Adding Burst Mode to Provide Fully Streaming
The SRAM on most FPGA boards usually provides burst read mode such that we can read
four words at a time instead of one. Burst mode reads are available on most DDR memories
as well. There is an ADVLD signal to the SRAM device. During a read, if this signal is set,
then a new address is loaded into the device. Otherwise, the device will output a burst of up to
four words, one word per cycle. Therefore, if we can set the ADVLD signal every four cycles,
each time we increment the address by four words, and given that the synchronization between
control signals and data fetches is correct, then we are able to stream data from SRAM to the
ICAP. We implement two independent state machines in the secondary DMA engine. One state
machine sends control signals as well as the addresses to the SRAM in a continuous manner,
such that in every four cycles, the address is incremented by four words (16 bytes) and sent to
the SRAM device. The other state machine simply waits for the data to become ready at the
beginning, and then in each cycle, it receives one word from the SRAM and streams the word
to the FIFO until the DMA operation completes. Similarly, the primary DMA engine waits for
data to become available in the FIFO, and then in each cycle, it reads one word from the FIFO
and streams the word to the ICAP until the DMA operation completes. This fully streaming
DMA design leads to an ICAP throughput that exceeds 395 MB/s, which is very close to the
ideal 400 MB/s throughputs.
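The following toy Python model is a conceptual simulation, not hardware code: one routine plays the role of the address state machine and issues a new SRAM burst address every four cycles, another pushes one returned word per cycle into the FIFO, and the primary engine drains one word per cycle into the ICAP. The burst length and SRAM latency are assumed values; the point is that, after the initial latency, the ICAP sees a word on every cycle.

```python
# Toy cycle-by-cycle model of the fully streaming DMA path: an address generator
# issues a burst address every 4 cycles, the SRAM returns a 4-word burst (one
# word per cycle), words are buffered in a FIFO, and the ICAP consumes one word
# per cycle. Purely illustrative.

from collections import deque

def stream_words(total_words, burst=4, sram_latency=2):
    fifo = deque()
    issued = consumed = 0
    in_flight = deque()          # (cycle when first word is ready, words remaining)
    cycle = 0
    while consumed < total_words:
        # Address state machine: issue a new burst address every `burst` cycles.
        if cycle % burst == 0 and issued < total_words:
            n = min(burst, total_words - issued)
            in_flight.append((cycle + sram_latency, n))
            issued += n
        # Data state machine: after the latency, one word per cycle enters the FIFO.
        if in_flight and cycle >= in_flight[0][0]:
            ready_cycle, remaining = in_flight[0]
            fifo.append("word")
            in_flight[0] = (ready_cycle, remaining - 1)
            if in_flight[0][1] == 0:
                in_flight.popleft()
        # Primary DMA engine: the ICAP drains one word per cycle when available.
        if fifo:
            fifo.popleft()
            consumed += 1
        cycle += 1
    return cycle

words = 1024
cycles = stream_words(words)
print(f"{words} words in {cycles} cycles -> {words / cycles:.2f} words/cycle")
# Approaches one word (four bytes) per cycle, i.e., the ICAP's full rate.
```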
Energy Efficiency
In [59], the authors indicate that the polarity of the FPGA hardware structures may significantly
impact leakage power consumption. Based on this observation, the authors of [60] tried to find
out whether FPGAs utilize this property such that when the blank bitstream is loaded to wipe
out an accelerator, the circuit is set to a state to minimize the leakage power consumption. In
order to achieve this, the authors implemented eight PR regions on an FPGA chip, with each
region occupying a configuration frame. These eight PR regions did not consume any dynamic
power, as the authors purposely gated off the clock to these regions. Then the authors used the
blank bitstream files to wipe out each of these regions and observed the chip power consumption
behavior. The results indicated that for every four configuration frames that we applied the blank
bitstream on, the chip power consumption dropped by a constant amount. This study confirms
that PR indeed leads to static power reduction and suggests that FPGAs may have utilized the
polarity property to minimize leakage power.
In addition, the authors of [60] studied whether PR can be used as an effective energy re-
duction technique in reconfigurable computing systems. To approach this problem, the authors
first identified the analytical models that capture the necessary conditions for energy reduc-
tion under different system configurations. The models show that increasing the configuration
throughput is a general and effective way to minimize the PR energy overhead. Therefore, the
authors designed and implemented a fully streaming DMA engine that nearly saturates the
configuration throughput.
The findings provide answers to three questions. First, although we pay extra power to
use an accelerator, depending on the accelerator's ability to speed up program execution, it
can result in a net energy reduction. The experimental results in [60] demonstrate that due to
its low power overhead and excellent ability of acceleration, having an acceleration extension can
lead to both program speedup and system energy reduction. Second, it is worthwhile to use PR
to reduce chip energy consumption if the energy reduction can make up for the energy overhead
incurred during the reconfiguration process; and the key to minimize the energy overhead during
the reconfiguration process is to maximize the configuration speed. The experimental results
in [60] confirm that enabling PR is a highly effective energy reduction technique. Finally, clock
gating is an effective technique in reducing energy consumption due to its negligible overhead;
however, it reduces only dynamic power whereas PR reduces both dynamic and static power.
Therefore, PR can lead to a larger energy reduction than clock gating, provided the extra energy
saving on static power elimination can make up for the energy overhead incurred during the
reconfiguration process.
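The break-even condition can be made concrete with a small worked example. All numbers below are illustrative assumptions, not measurements from [60]; the point is the structure of the trade-off: PR pays off once the accelerator's idle time exceeds the reconfiguration energy overhead divided by the static power it eliminates.

```python
# Illustrative break-even analysis for partial reconfiguration vs. leaving an
# idle accelerator configured. All numbers are assumptions for this example.

bitstream_bytes = 1 * 1024 * 1024      # assumed partial (blank) bitstream size: 1 MB
config_Bps      = 400 * 1024 * 1024    # configuration throughput with a fast PR engine
config_power_W  = 0.5                  # assumed extra power drawn while reconfiguring
static_saved_W  = 0.2                  # assumed static power eliminated by blanking the region

t_config    = bitstream_bytes / config_Bps    # time to load the blank/partial bitstream
e_config    = config_power_W * t_config       # energy overhead of one reconfiguration
t_breakeven = e_config / static_saved_W       # idle time needed to recoup the overhead

print(f"reconfiguration time : {t_config * 1e3:.2f} ms")
print(f"energy overhead      : {e_config * 1e3:.3f} mJ")
print(f"break-even idle time : {t_breakeven * 1e3:.2f} ms")
# With a fast PR engine, the break-even idle time lands in the millisecond range,
# consistent with the observation that PR can beat clock gating even for short idle periods.
```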
Although the conventional wisdom is that PR is only useful if the accelerator would not
be used for a very long period of time, the experimental results in [60] indicate that with the
high configuration throughput delivered by the fast PR engine, PR can outperform clock gating
in energy reduction even if the accelerator inactive time is in the millisecond range. In summary,
based on the results from [58] and [60], we can conclude that PR is an effective technique for
improving both performance and energy efficiency, and it is the key feature that makes FPGAs
a highly attractive choice for dynamic robotic computing workloads.
2.2.4 REAL-WORLD CASE STUDY
Following the design presented in [60], PerceptIn has demonstrated in their commercial prod-
uct that RPR is useful for robotic computing, especially computing for autonomous vehicles,
because many on-vehicle tasks usually have multiple versions where each is used in a particular
scenario [20]. For instance, in PerceptIn’s design, the localization algorithm relies on salient
features; features in keyframes are extracted by a feature extraction algorithm (based on ORB
features [61]), whereas features in non-key frames are tracked from previous frames (using op-
tical flow [62]); the latter executes in 10 ms, 50% faster than the former. Spatially sharing the
FPGA is not only area-inefficient but also power-inefficient as the unused portion of the FPGA
consumes non-trivial static power. In order to temporally share the FPGA and “hot-swap” dif-
ferent algorithms, PerceptIn developed a Partial Reconfiguration Engine (PRE) that dynami-
cally reconfigures part of the FPGA at runtime. The PRE achieves a 400 MB/s reconfiguration
throughput (i.e., bitstream programming rate). Both the feature extraction and tracking bit-
streams are less than 4 MB. Thus, the reconfiguration delay is less than 1 ms.
2.3 ROBOT OPERATING SYSTEM (ROS) ON FPGAS
As demonstrated in the previous chapter, autonomous vehicles and robots demand complex in-
formation processing such as SLAM (Simultaneous Localization and Mapping), deep learning,
and many other tasks. FPGAs are promising in accelerating these applications with high energy
efficiency. However, utilizing FPGAs for robotic workloads is challenging due to the high de-
velopment costs and the lack of engineers who understand both FPGAs and robotics. One way to
address this challenge is to directly support ROS on FPGAs as ROS already provides the basic
infrastructure for supporting efficient robotic computing. Hence, in this section we explore the
state-of-the-art supports for ROS to run on FPGAs.
2.3.1 ROBOT OPERATING SYSTEM (ROS)
Before delving into supports for running ROS on FPGAs, we first understand the importance
of ROS in robotic applications. ROS is an open-source, meta-operating system for autonomous
machines and robots. It provides the essential operating system services, including hardware
abstraction, low-level device control, implementation of commonly used functionality, message-
passing between processes, and package management. ROS also provides tools and libraries for
obtaining, building, writing, and running code across multiple computers. The primary goal
of ROS is to support code reuse in robotics research and development. In essence, ROS is a
distributed framework of processes that enables executables to be individually designed and
loosely coupled at runtime. These processes can be grouped into Packages and Stacks, which
can be easily shared and distributed. ROS also supports a federated system of code Repositories
that enable collaboration to be distributed as well. This design, from the file system level to the
community level, enables independent decisions about development and implementation, but
all can be brought together with ROS infrastructure tools [63].
The core objectives of the ROS framework include the following.
• Thin: ROS is designed to be as thin as possible so that code written for ROS can be
used with other robot software frameworks.
• ROS-agnostic libraries: the preferred development model is to write ROS-agnostic
libraries with clean functional interfaces.
• Language independence: the ROS framework is easy to implement in any modern
programming language. The ROS development team has already implemented it in
Python, C++, and Lisp, and there are experimental libraries in Java and Lua.
• Easy testing: ROS has a built-in unit/integration test framework called rostest that
makes it easy to bring up and tear down test fixtures.
• Scaling: ROS is appropriate for large runtime systems and large development processes.
The Computation Graph is the peer-to-peer network of ROS processes that are processing
data together. The basic Computation Graph concepts of ROS are nodes, Master, Parameter
Server, messages, services, topics, and bags, all of which provide data to the Graph in different
ways.
• Nodes: nodes are processes that perform computation. ROS is designed to be modular
at a fine-grained scale; a robot control system usually comprises many nodes. Take
autonomous vehicles as an example: one node controls a laser range-finder, one node
controls the wheel motors, one node performs localization, one node performs path
planning, one node provides a graphical view of the system, and so on. A ROS node
is written with the use of a ROS client library, such as roscpp or rospy.
• Master: the ROS Master provides name registration and lookup to the rest of the
Computation Graph. Without the Master, nodes would not be able to find each other,
exchange messages, or invoke services.
• Parameter Server: the parameter server allows data to be stored by key in a central
location. It is currently part of the Master.
• Messages: nodes communicate with each other by passing messages. A message is
simply a data structure, comprising typed fields. Standard primitive types (integer,
floating-point, boolean, etc.) are supported, as are arrays of primitive types. Messages
can include arbitrarily nested structures and arrays (much like C structs).
• Topics: messages are routed via a transport system with publish-subscribe semantics.
A node sends out a message by publishing it to a given topic. The topic is a name that
is used to identify the content of the message. A node that is interested in a certain
kind of data will subscribe to the appropriate topic. There may be multiple concurrent
publishers and subscribers for a single topic, and a single node may publish and sub-
scribe to multiple topics. In general, publishers and subscribers are not aware of each
other's existence (a minimal Python publisher/subscriber sketch follows this list). The
idea is to decouple the production of information from its con-
sumption. Logically, one can think of a topic as a strongly typed message bus. Each
bus has a name, and anyone can connect to the bus to send or receive messages as long
as they are the right type.
• Services: the publish-subscribe model is a very flexible communication paradigm, but
its many-to-many, one-way transport is not appropriate for request-reply interactions,
which are often required in a distributed system. Request-reply is done via services,
which are defined by a pair of message structures: one for the request and one for the
reply. A providing node offers a service under a name and a client uses the service
by sending the request message and awaiting the reply. ROS client libraries generally
present this interaction to the programmer as if it were a remote procedure call.
• Bags: bags are a format for saving and playing back ROS message data. Bags are an
important mechanism for storing data, such as sensor data, that can be difficult to
collect but is necessary for developing and testing algorithms.
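For readers unfamiliar with ROS, the minimal publisher/subscriber pair below uses the standard rospy client library for ROS 1; the topic name and message rate are arbitrary choices. It shows how nodes exchange typed messages over a named topic without knowing about each other directly.

```python
# Minimal ROS 1 publisher and subscriber using rospy (run each in its own node).
import rospy
from std_msgs.msg import String

def talker():
    rospy.init_node("talker")
    pub = rospy.Publisher("chatter", String, queue_size=10)
    rate = rospy.Rate(10)                      # publish at 10 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data="hello from the talker"))
        rate.sleep()

def listener():
    rospy.init_node("listener")
    # The callback fires for every message published on the "chatter" topic.
    rospy.Subscriber("chatter", String, lambda msg: rospy.loginfo(msg.data))
    rospy.spin()

if __name__ == "__main__":
    talker()   # or listener(), depending on which node this file should run
```

In a ROS-compliant FPGA component (Section 2.3.2), it is exactly this publish/subscribe interface that the software wrapper exposes on behalf of the hardware.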
The ROS Master acts as a name service in the ROS Computation Graph. It stores topics
and services registration information for ROS nodes. Nodes communicate with the Master to
report their registration information. As these nodes communicate with the Master, they can re-
ceive information about other registered nodes and make connections as appropriate. The Master
will also make callbacks to these nodes when this registration information changes, which allows
nodes to dynamically create connections as new nodes are run.

Figure 2.5: ROS-compliant FPGA component on an ARM-FPGA SoC: publish and subscribe interfaces in software on the ARM side exchange data with the application logic on the FPGA through an FPGA interface.
Nodes connect to other nodes directly; the Master only provides lookup information,
much like a domain name service (DNS) server. Nodes that subscribe to a topic will request
connections from nodes that publish that topic and will establish that connection over an agreed-
upon connection protocol. This architecture allows for decoupled operations, where the names
are the primary means by which larger and more complex systems can be built. Names have a
very important role in ROS: nodes, topics, services, and parameters all have names. Every ROS
client library supports command-line remapping of names, which means a compiled program
can be reconfigured at runtime to operate in a different Computation Graph topology.
2.3.2 ROS-COMPLIANT FPGAS
In order to integrate FPGAs into a ROS-based system, a ROS-compliant FPGA component
has been proposed [64, 65]. Integration of an FPGA into a robotic system requires equivalent
functionality to replace a software ROS component with a ROS-compliant FPGA component.
Therefore, each ROS message type and data format used in the ROS-compliant FPGA com-
ponent must be the same as that of the software ROS component. The ROS-compliant FPGA
component aims to improve its processing performance while satisfying the requirements.
Figure 2.5 shows the architecture of the ROS-compliant FPGA component model. Each
ROS-compliant FPGA component must implement the following four functions: Encapsula-
tion of FPGA circuits, Interface between ROS software and FPGA circuits, Subscribe interface
from a topic, and Publish interface to a topic. The ARM core is responsible for communicating
with and offloading workloads to the FPGA, whereas the FPGA part performs actual workload
acceleration. Note that software plays two roles in the component. First, an interface process
for input subscribes to a topic to receive input data. This software component, which
runs on the ARM core, is responsible for formatting the data so that it is suitable for FPGA
processing and for sending the formatted data to the FPGA. Second, an interface process for
output receives processing results from the FPGA. This software component, also running on
the ARM core, is responsible for reformatting the results for the ROS system and publishing
them to a topic. Such a structure realizes a robot system in which software and hardware cooperate.
Note that the difference between a ROS-compliant FPGA component and a ROS node written
in pure software is that part of its processing is performed in FPGA hardware. Integrating a
ROS-compliant FPGA component into a ROS system only requires connections to ROS nodes
through Publish/Subscribe messaging in the ordinary ROS development style. The ROS-compliant
FPGA component thus provides easy integration of an FPGA by wrapping it with software.
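A software wrapper for such a component might look roughly like the rospy sketch below. The FPGA access functions (fpga_write_frame and fpga_read_result) are hypothetical placeholders for whatever DMA or memory-mapped interface the ARM-FPGA SoC provides, not a real API; only the subscribe-format-offload-publish structure is the point.

```python
# Hedged sketch of the software wrapper of a ROS-compliant FPGA component.
# fpga_write_frame / fpga_read_result are hypothetical placeholders for the
# platform-specific ARM-to-FPGA interface (e.g., a DMA driver).
import numpy as np
import rospy
from sensor_msgs.msg import Image

def fpga_write_frame(frame):
    """Placeholder: push a formatted frame to the FPGA accelerator."""
    pass

def fpga_read_result():
    """Placeholder: block until the FPGA returns a processed frame."""
    return np.zeros((480, 640), dtype=np.uint8)

class RosCompliantFpgaComponent:
    def __init__(self):
        rospy.init_node("fpga_image_labeling")
        self.pub = rospy.Publisher("labels_out", Image, queue_size=1)
        rospy.Subscriber("image_in", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        # Input interface: reformat the ROS message for the FPGA and offload it.
        frame = np.frombuffer(msg.data, dtype=np.uint8).reshape(msg.height, msg.width, -1)
        fpga_write_frame(frame)
        # Output interface: reformat the FPGA result back into a ROS message.
        result = fpga_read_result()
        out = Image(height=result.shape[0], width=result.shape[1],
                    encoding="mono8", step=result.shape[1],
                    data=result.tobytes())
        out.header = msg.header
        self.pub.publish(out)

if __name__ == "__main__":
    RosCompliantFpgaComponent()
    rospy.spin()
```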
To evaluate this design, the authors of [65] have implemented a hardwired image labeling
application on a ROS-compliant FPGA component on a Xilinx Zynq-7020, verifying that
this design performs 26 times faster than software on the ARM processor, and even
2.3 times faster than an Intel PC. Moreover, the end-to-end latency of the component is
1.7 times lower than that of processing in pure software. Therefore, the authors verify that the
ROS-compliant FPGA component achieves remarkable performance improvement, maintain-
ing high development productivity by cooperative processing of hardware and software. How-
ever, this also comes with a problem, as the authors found that communication between ROS
nodes is a major bottleneck of the execution time in the ROS-compliant FPGA component.
2.3.3 OPTIMIZING COMMUNICATION LATENCY FOR THE
ROS-COMPLIANT FPGAS
As indicated in the previous subsection, large communication latency between ROS components
is a severe problem and has been the bottleneck of offloading computing to FPGAs. The authors
in [66] aim to reduce the latency by implementing Publish/Subscribe messaging of ROS as
hardware. Based on the results of network packets analysis in the ROS system, the authors
propose a method of implementing a hardware ROS-compliant FPGA Component, which is
done by separating the registration part (XMLRPC) and data communication part (TCPROS)
of the Publish/Subscribe messaging.
To study ROS performance, the authors have compared the communication latency of
(1) PC-PC and (2) PC-ARM SoC. Two computer nodes are connected with each other through
Gigabit Ethernet. The communication latency in the (2) PC-ARM SoC environment is about four
times larger than in (1) PC-PC. Therefore, the performance in embedded processor environments,
such as ARM processors, should be improved. Hence, the challenge for ROS-compliant FPGA
components is to reduce the large overhead in communication latency. If communication latency
is reduced, the ROS-compliant FPGA component can be used as an accelerator for processing
in robotic applications/systems.
In order to implement Publish/Subscribe messaging of ROS as hardware, the authors
analyzed network packets that flowed in Publish/Subscribe messaging in the ROS system of
ordinary software. The authors utilized Wireshark for network packet analysis [67] with the
basic ROS setup of one master, one publisher, and one subscriber node.
• STEP (1): the Publisher and Subscriber nodes register their nodes and topic informa-
tion to the Master node. The registration is done by calling methods like registerPub-
lisher, hasParam, and so on, using XMLRPC [68].
• STEP (2): the Master node notifies topic information to the Subscriber nodes by call-
ing publisherUpdate (XMLRPC).
• STEP (3): the Subscriber node sends a connection request to the Publisher node by
using requestTopic (XMLRPC).
• STEP (4): the Publisher node returns IP address and port number, TCP connection
information for data communication, as a response to the requestTopic (XMLRPC).
• STEP (5): the Subscriber node establishes a TCP connection by using the information
and sends a connection header over the TCP connection. The connection header contains
important metadata about the connection being established, including typing and routing
information, using TCPROS [69].
• STEP (6): if the connection is successful, the Publisher node sends its connection header
(TCPROS).
• STEP (7): data transmission repeats. The data is written in little-endian byte order, and
4 bytes of header information (the message length) are prepended to the data (TCPROS).
After this analysis, the authors found out that network packets that flowed in Pub-
lish/Subscribe messaging in the ROS system can be categorized into two parts, that is, the
registration part and the data transmission part. The registration part uses XMLRPC (STEPS
(1)–(4)), while the data transmission part uses TCPROS (STEPS (5)–(7)), which is almost
raw data of TCP communication with very small overhead. In addition, once data transmission
(STEP (7)) starts, only data transmission repeats without STEPS (1)–(6).
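The data transmission part is simple enough to show directly: each TCPROS message on the wire is the serialized message preceded by its length as a 4-byte little-endian integer. The snippet below is standalone Python, independent of any ROS library, and frames and unframes a payload in that format.

```python
# TCPROS-style framing: a 4-byte little-endian length header followed by the
# serialized message bytes. Standalone illustration, independent of ROS itself.
import struct

def frame_message(payload: bytes) -> bytes:
    """Prepend the 4-byte little-endian length header used by TCPROS."""
    return struct.pack("<I", len(payload)) + payload

def unframe_message(stream: bytes):
    """Split one framed message off the front of a byte stream."""
    (length,) = struct.unpack_from("<I", stream, 0)
    return stream[4:4 + length], stream[4 + length:]

wire = frame_message(b"hello") + frame_message(b"world")
msg1, rest = unframe_message(wire)
msg2, _ = unframe_message(rest)
print(msg1, msg2)   # b'hello' b'world'
```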
Based on the network packet analysis, the authors modified the server ports, such that
those used in XMLRPC and TCPROS are assigned differently. In addition, a client TCP/IP
connection of XMLRPC for the Master node is necessary for the Publisher node. For the
Subscriber node, two client TCP/IP connections of XMLRPC and one client connection of
TCPROS are necessary. Therefore, two or three TCP ports are necessary to implement Pub-
lish/Subscribe messaging, which makes it challenging to implement ROS nodes using a hardware
TCP/IP stack.
To optimize the communication performance on ROS-compliant FPGAs, the authors
proposed hardware publication and subscription services. Conventionally, publication or sub-
scription of topics was done by software in ROS. By implementing these nodes as hardwired
circuits, direct communication between the ROS nodes and the FPGA becomes not only pos-
sible but also highly efficient. In order to implement the hardware ROS nodes, the authors
designed the Subscriber hardware and the Publisher hardware separately: the Subscriber hard-
ware is responsible for subscribing to a topic of another ROS node and receiving ROS messages
from the topic, whereas the Publisher hardware is responsible for publishing ROS messages to a
topic of another ROS node. With this hardware-based design, the evaluation results indicate
that the latency of the hardware ROS-compliant FPGA component can be cut in half, from
1.0 ms to 0.5 ms, thus effectively improving the communication between the FPGA accelerator
and other software-based ROS nodes.
2.4 SUMMARY
In this chapter, we have provided a general introduction to FPGA technologies, especially run-
time partial reconfiguration, which allows multiple robotic workloads to time-share an FPGA
at runtime. We also have introduced existing research on enabling ROS on FPGAs, which pro-
vides infrastructure supports for various robotic workloads to run directly on FPGAs. However,
the ecosystem of robotic computing on FPGAs is still in its infancy. For instance, due to the
lack of high-level synthesis tools for robotic accelerator design, accelerating robotic workloads,
or part of a robotic workload, on FPGAs still requires extensive manual effort. To make
matters worse, most robotic engineers do not have sufficient FPGA background to develop an
FPGA-based accelerator, whereas few FPGA engineers possess sufficient robotic background
to fully understand a robotic system. Hence, to fully exploit the benefits of FPGAs, advanced
design automation tools are imperative to bridge this knowledge gap.
C H A P T E R 3
Perception on FPGAs – Deep
Learning
Cameras are widely used in intelligent robot systems because they are lightweight and provide
rich information for perception. Cameras can be used to complete a variety of basic tasks of intelligent
robots, such as visual odometry (VO), place recognition, object detection, and recognition. With
the development of convolutional neural networks (CNNs), we can reconstruct the depth and
pose with the absolute scale directly from a monocular camera, making monocular VO more
robust and efficient. And monocular VO methods, like Depth-VO-Feat [70], make robot sys-
tems much easier to deploy than stereo ones. Furthermore, although there have been previous
efforts to design accelerators for robotic applications, such as ESLAM [71], these accelerators can
only be used for specific applications and offer poor scalability.
In recent years, CNNs have made great improvements in place recognition for robotic
perception. The accuracy of place recognition with a CNN-based method, GeM [72], is about
20% better than that of the handcrafted method rootSIFT [73]. CNNs are a general framework
that can be applied to a variety of robotic applications. With the help of CNNs, robots can
also detect and distinguish objects from input images. In summary, CNNs greatly
enhance robots’ ability in localization, place recognition, and many other perception tasks.
CNNs have become the core component in various kinds of robots. However, since neural
networks (NNs) are computationally intensive, deep learning models are often the performance
bottleneck in robots. In this chapter, we delve into utilizing FPGAs to accelerate neural networks
in various robotic workloads.
Specifically, neural networks are widely adopted in domains like image, speech, and video
recognition. Moreover, deep learning has made significant progress in solving robotic per-
ception problems. But the high computation and storage complexity of neural network inference
poses great difficulty for its application. CPUs can hardly offer enough computational capacity.
GPUs are the first choice for neural network processing because of their high computational
capacity and easy-to-use development frameworks, but they suffer from energy inefficiency.
On the other hand, with specifically designed hardware, FPGAs are a potential candi-
date to surpass GPUs in performance and energy efficiency. Various FPGA-based accelerators
have been proposed with software and hardware optimization techniques to achieve high per-
formance and energy efficiency. In this chapter, we give an overview of previous work on neural
network inference accelerators based on FPGAs and summarize the main techniques used. An
investigation from software to hardware, from the circuit level to the system level, is carried out for
a complete analysis of FPGA-based deep learning accelerators and serves as a guide for future
work.

Table 3.1: Number of parameters, number of operations, and top-1 ImageNet accuracy of state-of-the-art CNN models

Model             Year    # Param    # Operation    Top-1 Acc.
AlexNet [74]      2012    60M        1.4G           61.0%
VGG19 [78]        2014    144M       39G            74.5%
ResNet152 [81]    2016    57M        22.6G          79.3%
MobileNet [79]    2017    4.2M       1.1G           70.6%
ShuffleNet [80]   2017    2.36M      0.27G          67.6%
3.1 WHY CHOOSE FPGAS FOR DEEP LEARNING?
Recent research works on neural networks demonstrate great improvements over traditional al-
gorithms in machine learning. Various network models, like CNNs and recurrent neural networks
(RNNs), have been proposed for image, video, and speech processing. CNNs [74] improve the
top-5 image classification accuracy on the ImageNet [75] dataset from 73.8% to 84.7% in 2012 and fur-
ther improve object detection [76] with their outstanding ability in feature extraction. RNNs [77]
achieve the state-of-the-art word error rate on speech recognition. In general, NNs feature a
high fitting ability to a wide range of pattern recognition problems. This ability makes NNs
promising candidates for many artificial intelligence applications.
But the computation and storage complexity of NN models are high. In Table 3.1, we list
the number of operations, number of parameters (add or multiplication), and top-1 accuracy on
ImageNet dataset [75] of state-of-the-art CNN models. Take CNNs as an example. The largest
CNN model for 224 × 224 image classification requires up to 39 billion floating-point opera-
tions (FLOP) and more than 500 MB model parameters [78]. As the computational complexity
is proportional to the input image size, processing images with higher resolutions may need more
than 100 billion operations. Latest works like MobileNet [79] and ShuffleNet [80] are trying
to reduce the network size with advanced network structures, but with obvious accuracy loss.
The balance between the size of NN models and accuracy is still an open question today. In
some cases, the large model size hinders the application of NNs, especially in power-limited or
latency-critical scenarios.
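To see where these operation counts come from, the short helper below computes the multiply-accumulate operations and parameters of a single convolutional layer from its shape. The example layer (a 3x3 convolution from 64 to 128 channels on a 224x224 feature map) is a hypothetical choice made only to illustrate how quickly the totals grow with input resolution.

```python
# Operation and parameter count of one convolutional layer.
# Each multiply-accumulate is counted as two operations (one multiply + one add),
# the convention behind figures such as "39 GFLOP" for VGG19.

def conv_layer_cost(h, w, c_in, c_out, k, stride=1):
    out_h, out_w = h // stride, w // stride
    macs   = out_h * out_w * c_in * c_out * k * k
    params = c_in * c_out * k * k + c_out          # weights + biases
    return 2 * macs, params

# Hypothetical example layer: 3x3 conv, 64 -> 128 channels, 224x224 input.
ops, params = conv_layer_cost(h=224, w=224, c_in=64, c_out=128, k=3)
print(f"ops: {ops / 1e9:.2f} GFLOP, params: {params / 1e6:.2f} M")

# Doubling the input resolution quadruples the operation count:
ops_hi, _ = conv_layer_cost(h=448, w=448, c_in=64, c_out=128, k=3)
print(f"ops at 448x448: {ops_hi / 1e9:.2f} GFLOP")
```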
Therefore, choosing a proper computation platform for neural-network-based applica-
tions is essential. A typical CPU can perform 10–100 GFLOP per second, and the power effi-
ciency is usually below 1 GOP/J. So CPUs can hardly meet either the high-performance requirements
of cloud applications or the low-power requirements of mobile applications. In contrast, GPUs
Discovering Diverse Content Through
Random Scribd Documents
  • 5. Series Editor: Natalie Enright Jerger, University of Toronto Robotic Computing on FPGAs Shaoshan Liu, PerceptIn Zishen Wan, Georgia Institute of Technology Bo Yu, PerceptIn Yu Wang, Tsinghua University This book provides a thorough overview of the state-of-the-art field-programmable gate array (FPGA)-based robotic computing accelerator designs and summarizes their adopted optimized techniques.This book consists of ten chapters, delving into the details of how FPGAs have been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In addition to individual robotic tasks, this book provides detailed descriptions of how FPGAs have been used in robotic products, including commercial autonomous vehicles and space exploration robots. store.morganclaypool.com About SYNTHESIS This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis books provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. LIU • ET AL ROBOTIC COMPUTING ON FPGAS MORGAN & CLAYPOOL Synthesis Lectures on Computer Architecture Synthesis Lectures on Computer Architecture Series ISSN: 1935-3235 Natalie Enright Jerger, Series Editor
Copyright © 2021 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations in printed reviews, without the prior permission of the publisher.

Robotic Computing on FPGAs
Shaoshan Liu, Zishen Wan, Bo Yu, and Yu Wang
www.morganclaypool.com

ISBN: 9781636391656 (paperback)
ISBN: 9781636391663 (ebook)
ISBN: 9781636391670 (hardcover)

DOI: 10.2200/S01101ED1V01Y202105CAC056

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE, Lecture #56
Series Editor: Natalie Enright Jerger, University of Toronto
Editor Emerita: Margaret Martonosi, Princeton University
Founding Editor Emeritus: Mark D. Hill, University of Wisconsin, Madison

Series ISSN: 1935-3235 (print), 1935-3243 (electronic)
Robotic Computing on FPGAs

Shaoshan Liu, PerceptIn
Zishen Wan, Georgia Institute of Technology
Bo Yu, PerceptIn
Yu Wang, Tsinghua University

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #56

Morgan & Claypool Publishers
  • 16. ABSTRACT This book provides a thorough overview of the state-of-the-art field-programmable gate array (FPGA)-based robotic computing accelerator designs and summarizes their adopted optimized techniques. This book consists of ten chapters, delving into the details of how FPGAs have been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In addition to individual robotic tasks, this book provides detailed descriptions of how FPGAs have been used in robotic products, including commercial autonomous vehicles and space exploration robots. KEYWORDS robotics, FPGAs, autonomous machines, perception, localization, planning, con- trol, space exploration, deep learning
Contents

Preface
1 Introduction and Overview
  1.1 Sensing
  1.2 Perception
  1.3 Localization
  1.4 Planning and Control
  1.5 FPGAs in Robotic Applications
  1.6 The Deep Processing Pipeline
  1.7 Summary
2 FPGA Technologies
  2.1 An Introduction to FPGA Technologies
    2.1.1 Types of FPGAs
    2.1.2 FPGA Architecture
    2.1.3 Commercial Applications of FPGAs
  2.2 Partial Reconfiguration
    2.2.1 What is Partial Reconfiguration?
    2.2.2 How to Use Partial Reconfiguration?
    2.2.3 Achieving High Performance
    2.2.4 Real-World Case Study
  2.3 Robot Operating System (ROS) on FPGAs
    2.3.1 Robot Operating System (ROS)
    2.3.2 ROS-Compliant FPGAs
    2.3.3 Optimizing Communication Latency for the ROS-Compliant FPGAs
  2.4 Summary
3 Perception on FPGAs – Deep Learning
  3.1 Why Choose FPGAs for Deep Learning?
  3.2 Preliminary: Deep Neural Network
  3.3 Design Methodology and Criteria
  3.4 Hardware-Oriented Model Compression
    3.4.1 Data Quantization
    3.4.2 Weight Reduction
  3.5 Hardware Design: Efficient Architecture
    3.5.1 Computation Unit Designs
    3.5.2 Loop Unrolling Strategies
    3.5.3 System Design
  3.6 Evaluation
  3.7 Summary
4 Perception on FPGAs – Stereo Vision
  4.1 Perception in Robotics
  4.2 Stereo Vision in Robotics
  4.3 Local Stereo Matching on FPGAs
    4.3.1 Algorithm Framework
    4.3.2 FPGA Designs
  4.4 Global Stereo Matching on FPGAs
    4.4.1 Algorithm Framework
    4.4.2 FPGA Designs
  4.5 Semi-Global Matching on FPGAs
    4.5.1 Algorithm Framework
    4.5.2 FPGA Designs
  4.6 Efficient Large-Scale Stereo Matching on FPGAs
    4.6.1 ELAS Algorithm Framework
    4.6.2 FPGA Designs
  4.7 Evaluation and Discussion
    4.7.1 Dataset and Accuracy
    4.7.2 Power and Performance
  4.8 Summary
5 Localization on FPGAs
  5.1 Preliminary
    5.1.1 Context
    5.1.2 Algorithm Overview
  5.2 Algorithm Framework
  5.3 Frontend FPGA Design
    5.3.1 Overview
    5.3.2 Exploiting Task-Level Parallelisms
  5.4 Backend FPGA Design
  5.5 Evaluation
    5.5.1 Experimental Setup
    5.5.2 Resource Consumption
    5.5.3 Performance
  5.6 Summary
6 Planning on FPGAs
  6.1 Motion Planning Context Overview
    6.1.1 Probabilistic Roadmap
    6.1.2 Rapidly Exploring Random Tree
  6.2 Collision Detection on FPGAs
    6.2.1 Motion Planning Compute Time Profiling
    6.2.2 General Purpose Processor-Based Solutions
    6.2.3 Specialized Hardware Accelerator-Based Solutions
    6.2.4 Evaluation and Discussion
  6.3 Graph Search on FPGAs
  6.4 Summary
7 Multi-Robot Collaboration on FPGAs
  7.1 Multi-Robot Exploration
  7.2 INCAME Framework for Multi-Task on FPGAs
    7.2.1 Hardware Resource Conflicts in ROS
    7.2.2 Interruptible Accelerator with ROS (INCAME)
  7.3 Virtual Instruction-Based Accelerator Interrupt
    7.3.1 Instruction Driven Accelerator
    7.3.2 How to Interrupt: Virtual Instruction
    7.3.3 Where to Interrupt: After SAVE/CALC_F
    7.3.4 Latency Analysis
    7.3.5 Virtual Instruction ISA (VI-ISA)
    7.3.6 Instruction Arrangement Unit (IAU)
    7.3.7 Example of Virtual Instruction
  7.4 Evaluation and Results
    7.4.1 Experiment Setup
    7.4.2 Virtual Instruction-Based Interrupts
    7.4.3 ROS-Based MR-Exploration
  7.5 Summary
8 Autonomous Vehicles Powered by FPGAs
  8.1 The PerceptIn Case Study
  8.2 Design Constraints
    8.2.1 Overview of the Vehicle
    8.2.2 Performance Requirements
    8.2.3 Energy and Cost Considerations
  8.3 Software Pipeline
  8.4 On Vehicle Processing System
    8.4.1 Hardware Design Space Exploration
    8.4.2 Hardware Architecture
    8.4.3 Sensor Synchronization
    8.4.4 Performance Characterizations
  8.5 Summary
9 Space Robots Powered by FPGAs
  9.1 Radiation Tolerance for Space Computing
  9.2 Space Robotic Algorithm Acceleration on FPGAs
    9.2.1 Feature Detection and Matching
    9.2.2 Stereo Vision
    9.2.3 Deep Learning
  9.3 Utilization of FPGAs in Space Robotic Missions
    9.3.1 Mars Exploration Rover Missions
    9.3.2 Mars Science Laboratory Mission
    9.3.3 Mars 2020 Mission
  9.4 Summary
10 Conclusion
  10.1 What we Have Covered in This Book
  10.2 Looking Forward
Bibliography
Authors' Biographies
Preface

In this book, we provide a thorough overview of the state-of-the-art FPGA-based robotic computing accelerator designs and summarize their adopted optimization techniques. The authors have over 40 years of combined research experience utilizing FPGAs in robotic applications, in both academic research and commercial deployments. For instance, the authors have demonstrated that, by co-designing the software and hardware, FPGAs can achieve more than 10x better performance and energy efficiency than CPU and GPU implementations. The authors have also pioneered the use of the partial reconfiguration methodology in FPGA implementations to further improve design flexibility and reduce overhead. In addition, the authors have successfully developed and shipped commercial robotic products powered by FPGAs, and they demonstrate that FPGAs have excellent potential and are promising candidates for robotic computing acceleration due to their high reliability, adaptability, and power efficiency.

The authors believe that FPGAs are the best compute substrate for robotic applications for several reasons. First, robotic algorithms are still evolving rapidly, and thus any ASIC-based accelerator will be months or even years behind the state-of-the-art algorithms; FPGAs, on the other hand, can be dynamically updated as needed. Second, robotic workloads are highly diverse, so it is difficult for any ASIC-based robotic computing accelerator to reach economies of scale in the near future; FPGAs, on the other hand, are a cost-effective and energy-efficient alternative before any one type of accelerator reaches economies of scale. Third, compared to systems on a chip (SoCs) that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a significant performance advantage. Fourth, partial reconfiguration allows multiple robotic workloads to time-share an FPGA, thus allowing one chip to serve multiple applications and leading to overall cost and energy reduction.

Specifically, FPGAs require little power and are often built into small systems with less memory. They support massively parallel computation and can exploit the properties of perception (e.g., stereo matching), localization (e.g., simultaneous localization and mapping (SLAM)), and planning (e.g., graph search) kernels to remove redundant logic and simplify the end-to-end system implementation. Taking hardware characteristics into account, several algorithms have been proposed that run in a hardware-friendly way while achieving performance comparable to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to central processing units (CPUs) and graphics processing units (GPUs). In addition, unlike their application-specific integrated circuit (ASIC) counterparts, FPGA technologies provide the flexibility of on-site programming and re-programming without going through re-fabrication with a modified design. Partial Reconfiguration (PR) takes this flexibility one step further, allowing the modification of an operating FPGA design by loading a partial configuration file. Using PR, part of the FPGA can be reconfigured at runtime without compromising the integrity of the applications running on those parts of the device that are not being reconfigured. As a result, PR allows different robotic applications to time-share part of an FPGA, leading to energy and performance efficiency and making FPGAs a suitable computing platform for dynamic and complex robotic workloads. Due to these advantages over other compute substrates, FPGAs have been successfully utilized in commercial autonomous vehicles as well as in space robotic applications, where FPGAs offer unprecedented flexibility and significantly reduce the design cycle and development cost.

This book consists of ten chapters, providing a thorough overview of how FPGAs have been utilized in robotic perception, localization, planning, and multi-robot collaboration tasks. In addition to individual robotic tasks, we provide detailed descriptions of how FPGAs have been used in robotic products, including commercial autonomous vehicles and space exploration robots.

Shaoshan Liu
June 2021
CHAPTER 1
Introduction and Overview

The last decade has seen significant progress in the development of robotics, spanning from algorithms and mechanics to hardware platforms. Various robotic systems, such as manipulators, legged robots, unmanned aerial vehicles, and self-driving cars, have been designed for search and rescue [1, 2], exploration [3, 4], package delivery [5], entertainment [6, 7], and many more applications and scenarios. These robots are beginning to demonstrate their full potential. Take drones, a type of aerial robot, as an example. According to a U.S. Federal Aviation Administration (FAA) report [8], the number of drones grew by 2.83x between 2015 and 2019. The number of registered drones reached 1.32 million in 2019, and the FAA expects this number to grow to 1.59 million by 2024.

However, robotic systems are very complex [9–12]. They tightly integrate many technologies and algorithms, including sensing, perception, mapping, localization, decision making, and control. This complexity poses many challenges for the design of robotic edge computing systems [13, 14]. On the one hand, robotic systems need to process an enormous amount of data in real time. The incoming data often come from multiple sensors, are highly heterogeneous, and require accurate spatial and temporal synchronization and pre-processing [15]. Yet robotic systems usually have limited on-board resources, such as memory, bandwidth, and compute capability, making it hard to meet real-time requirements. On the other hand, current state-of-the-art robotic systems operate under strict power constraints at the edge that cannot support the amount of computation required for tasks such as 3D sensing, localization, navigation, and path planning. The computation and storage complexity, together with the real-time and power constraints of robotic systems, therefore hinders their wide application in latency-critical or power-limited scenarios [16].

Therefore, it is essential to choose a proper compute platform for robotic systems. CPUs and GPUs are two widely used commercial compute platforms. CPUs are designed to handle a wide range of tasks quickly and are often used to develop novel algorithms. A typical CPU can achieve 10–100 GFLOPS with below 1 GOP/J power efficiency [17]. In contrast, GPUs are designed with thousands of processor cores running simultaneously, which enables massive parallelism. A typical GPU can deliver up to 10 TOPS of performance and is a good candidate for high-performance scenarios. Recently, benefiting in part from the better accessibility provided by CUDA/OpenCL, GPUs have been predominantly used in many robotic applications. However, conventional CPUs and GPUs usually consume 10–100 W of power, which is orders of magnitude higher than what is available on a resource-limited robotic system.
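To make the power argument above concrete, the following back-of-the-envelope sketch estimates the power needed to sustain a perception workload at camera rate on platforms with different energy efficiencies. The per-frame workload size (20 GOP) and the FPGA efficiency figure are assumed, illustrative numbers, not measurements from this book.

```python
# Back-of-the-envelope illustration of the power argument above. The per-frame
# workload (20 GOP) and the FPGA efficiency are assumed, illustrative numbers.

frame_rate_hz = 30           # camera rate from the text
workload_gop_per_frame = 20  # hypothetical perception workload per frame

throughput_gops = frame_rate_hz * workload_gop_per_frame  # required GOP/s

for platform, efficiency_gop_per_joule in [("CPU (~1 GOP/J)", 1.0),
                                           ("hypothetical FPGA (~50 GOP/J)", 50.0)]:
    power_watts = throughput_gops / efficiency_gop_per_joule
    print(f"{platform}: {power_watts:.0f} W to sustain {throughput_gops} GOP/s")
```

At roughly 1 GOP/J, even this modest workload would demand hundreds of watts, which is exactly the kind of budget a battery-powered robot cannot afford.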
Besides CPUs and GPUs, FPGAs are attracting attention and becoming compute substrate candidates for energy-efficient robotic task processing. FPGAs require low power and are often built into small systems with less memory. They can process massively parallel computations and exploit the properties of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph search) kernels to remove redundant logic and simplify the implementation. Taking hardware characteristics into account, researchers and engineers have proposed several algorithms that run in a hardware-friendly way and achieve performance similar to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to CPUs and GPUs.

Unlike their ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-programming without going through re-fabrication with a modified design. Partial Reconfiguration (PR) takes this flexibility one step further, allowing the modification of an operating FPGA design by loading a partial configuration file. By using PR, part of the FPGA can be reconfigured at runtime without compromising the integrity of the applications running on the parts of the device that are not being reconfigured. As a result, PR allows different robotic applications to time-share part of an FPGA, leading to energy and performance efficiency and making FPGAs a suitable computing platform for dynamic and complex robotic workloads.

Note that robotics is not one technology but rather an integration of many technologies. As shown in Fig. 1.1, the stack of a robotic system consists of three major components: application workloads, including sensing, perception, localization, motion planning, and control; a software edge subsystem, including the operating system and runtime layer; and computing hardware, including both micro-controllers and companion computers [16, 18, 19]. We focus on the robotic application workloads in this chapter. The application subsystem contains multiple algorithms that the robot uses to extract meaningful information from raw sensor data in order to understand the environment and dynamically make decisions about its actions.

1.1 SENSING

The sensing stage is responsible for extracting meaningful information from raw sensor data. To enable intelligent actions and improve reliability, robot platforms usually support a wide range of sensors. The number and type of sensors depend heavily on the specifications of the workload and the capability of the onboard compute platform. The sensors can include the following.

Cameras. Cameras are usually used for object recognition and object tracking, such as lane detection in autonomous vehicles and obstacle detection in drones. RGB-D cameras can also be utilized to determine object distances and positions. Taking autonomous vehicles as an example, current systems usually mount eight or more 1080p cameras around the vehicle to detect, recognize, and track objects in different directions, which can greatly improve safety.
Usually, these cameras run at 60 Hz and, combined, generate multiple gigabytes of raw data per second.

GNSS/IMU. The global navigation satellite system (GNSS) and inertial measurement unit (IMU) system helps the robot localize itself by reporting both inertial updates and an estimate of the global location at a high rate. Different robots have different requirements for localization sensing. For instance, 10 Hz may be enough for a low-speed mobile robot, but high-speed autonomous vehicles usually demand 30 Hz or higher, and high-speed drones may need 100 Hz or more; thus, we face a wide spectrum of sensing speeds. Fortunately, different sensors have complementary advantages and drawbacks. GNSS can provide fairly accurate localization, but it runs at only 10 Hz and is thus unable to provide real-time updates. By contrast, both the accelerometer and the gyroscope in an IMU can run at 100–200 Hz, which satisfies the real-time requirement. However, an IMU suffers from bias wander over time and perturbation by thermo-mechanical noise, which may degrade the accuracy of the position estimates. By combining GNSS and IMU, we can get accurate and real-time position updates for robots.

LiDAR. Light detection and ranging (LiDAR) is used for evaluating distance by illuminating obstacles with laser light and measuring the reflection time. These pulses, along with other recorded data, can generate precise, three-dimensional information about the surroundings. LiDAR plays an important role in localization, obstacle detection, and avoidance. As indicated in [20], the choice of sensors dictates the algorithm and hardware design. Taking autonomous driving as an instance, almost all autonomous vehicle companies use LiDAR at the core of their technologies; examples include Uber, Waymo, and Baidu. PerceptIn and Tesla are among the very few that do not use LiDAR and, instead, rely on cameras and vision-based systems. In particular, PerceptIn's data demonstrated that for the low-speed autonomous driving scenario, LiDAR processing is slower than camera-based vision processing while increasing power consumption and cost.

Radar and Sonar. Radio detection and ranging (Radar) and sound navigation and ranging (Sonar) systems are used to determine the distance and speed to a given object, and they usually serve as the last line of defense for obstacle avoidance. Taking autonomous vehicles as an example, when nearby obstacles are detected and a collision is imminent, the vehicle applies the brakes or turns to avoid them. Compared to LiDAR, Radar and Sonar systems are cheaper and smaller, and their raw data are usually fed to the control processor directly, without going through the main compute pipeline, to implement urgent maneuvers such as swerving or applying the brakes.

One key problem we have observed with commercial CPUs, GPUs, or mobile SoCs is the lack of built-in multi-sensor processing support; hence, most of the multi-sensor processing has to be done in software, which can lead to problems such as time synchronization. On the other hand, FPGAs provide a rich sensor interface and enable most time-critical sensor processing tasks to be done in hardware [21]. In Chapter 2, we introduce FPGA technologies, especially how FPGAs provide rich I/O blocks that can be configured for heterogeneous sensor processing.
Figure 1.1: The stack of the robotic system (application workloads for sensing, perception, localization, planning, and control, on top of the operating system and hardware platform).

1.2 PERCEPTION

The sensor data is then fed into the perception layer to sense static and dynamic objects and to build a reliable and detailed representation of the robot's environment using computer vision techniques (including deep learning).

The perception layer is responsible for object detection, segmentation, and tracking. There are obstacles, lane dividers, and other objects to detect. Traditionally, a detection pipeline starts with image pre-processing, followed by a region-of-interest detector, and finally a classifier that outputs detected objects. In 2005, Dalal and Triggs [22] proposed an algorithm based on the histogram of oriented gradients (HOG) and the support vector machine (SVM) to model both the appearance and shape of an object under various conditions. The goal of segmentation is to give the robot a structured understanding of its environment. Semantic segmentation is usually formulated as a graph labeling problem, with the vertices of the graph being pixels or super-pixels; inference algorithms on graphical models, such as conditional random fields (CRFs) [23, 24], are used. The goal of tracking is to estimate the trajectory of moving obstacles. Tracking can be formulated as a sequential Bayesian filtering problem by recursively running a prediction step and a correction step. Tracking can also be formulated as tracking-by-detection with a
Markov decision process (MDP) [25], where an object detector is applied to consecutive frames and detected objects are linked across frames.

In recent years, deep neural networks (DNNs), also known as deep learning, have greatly affected the field of computer vision and made significant progress in solving robot perception problems. Most state-of-the-art algorithms now apply some type of neural network based on the convolution operation. Fast R-CNN [26], Faster R-CNN [27], SSD [28], YOLO [29], and YOLO9000 [30] have been used to achieve much better speed and accuracy in object detection. Most CNN-based semantic segmentation work builds on fully convolutional networks (FCNs) [31], and recent work on the spatial pyramid pooling network [32] and the pyramid scene parsing network (PSPNet) [33] combines global image-level information with locally extracted features. By using auxiliary natural images, a stacked autoencoder model can be trained offline to learn generic image features and then applied to online object tracking [34].

In Chapter 3, we review state-of-the-art neural network accelerator designs and demonstrate that, with software-hardware co-design, FPGAs can achieve more than 10 times better speed and energy efficiency than state-of-the-art GPUs. This verifies that FPGAs are a promising candidate for neural network acceleration. In Chapter 4, we review various stereo vision algorithms for robotic perception and their FPGA accelerator designs. We demonstrate that, with careful algorithm-hardware co-design, FPGAs can achieve two orders of magnitude higher energy efficiency and performance than state-of-the-art GPUs and CPUs.

1.3 LOCALIZATION

The localization layer is responsible for aggregating data from various sensors to locate the robot in the environment model.

The GNSS/IMU system is used for localization. The GNSS consists of several satellite systems, such as GPS, Galileo, and BeiDou, which can provide accurate localization results but with a slow update rate. In comparison, the IMU can provide fast updates with less accurate rotation and acceleration results. A mathematical filter, such as the Kalman filter, can be used to combine the advantages of the two and minimize localization error and latency. However, this system alone has problems: the signal may bounce off obstacles and introduce noise, and it may fail to work in closed environments.

LiDAR and high-definition (HD) maps are used for localization. LiDAR can generate point clouds and provide a shape description of the environment, but it is hard to differentiate individual points. An HD map has a higher resolution than a digital map and makes the route familiar to the robot; the key is to fuse different sensor information to minimize the error in each grid cell. Once the HD map is built, a particle filter method can be applied to localize the robot in real time by correlating against the LiDAR measurements. However, LiDAR performance may be severely affected by weather conditions (e.g., rain or snow), introducing localization error.

Cameras are used for localization as well. The pipeline of vision-based localization, simplified, is as follows: (1) by triangulating stereo image pairs, a disparity map is obtained and used
to derive depth information for each point; (2) by matching salient features between successive stereo image frames to establish correlations between feature points in different frames, the motion between the past two frames is estimated; and (3) by comparing the salient features against those in the known map, the current position of the robot is derived [35].

Apart from these techniques, sensor fusion strategies are also often utilized to combine multiple sensors for localization, which can improve the reliability and robustness of the robot [36, 37]. In Chapter 5, we introduce a general-purpose localization framework that integrates key primitives of existing algorithms, along with its FPGA implementation. The FPGA-based localization framework retains the high accuracy of individual algorithms, simplifies the software stack, and provides a desirable acceleration target.

1.4 PLANNING AND CONTROL

The planning and control layer is responsible for generating trajectory plans and issuing control commands based on the origin and destination of the robot. Broadly, the prediction and routing modules are also included here, and their outputs are fed into the downstream planning and control layers as input. The prediction module is responsible for predicting the future behavior of surrounding objects identified by the perception layer. For autonomous vehicles, the routing module can perform lane-level routing based on the lane segmentation of the HD maps.

Planning and control layers usually include behavioral decision, motion planning, and feedback control. The mission of the behavioral decision module is to make effective and safe decisions by leveraging all the various input data sources. Bayesian models are becoming more and more popular and have been applied in recent works [38, 39]. Among the Bayesian models, the Markov decision process (MDP) and the partially observable Markov decision process (POMDP) are widely applied methods for modeling robot behavior. The task of motion planning is to generate a trajectory and send it to the feedback control for execution. The planned trajectory is usually specified and represented as a sequence of planned trajectory points, each of which contains attributes such as location, time, and speed. Low-dimensional motion planning problems can be solved with grid-based algorithms (such as Dijkstra [40] or A* [41]) or geometric algorithms. High-dimensional motion planning problems can be handled with sampling-based algorithms, such as the rapidly exploring random tree (RRT) [42] and the probabilistic roadmap (PRM) [43], which can avoid the problem of local minima. Reward-based algorithms, such as the Markov decision process (MDP), can also generate the optimal path by maximizing cumulative future rewards. The goal of feedback control is to track the difference between the actual pose and the pose on the predefined trajectory through continuous feedback. The most typical and widely used algorithm in robot feedback control is PID (a minimal sketch of such a controller appears at the end of this chapter).

While optimization-based approaches enjoy mainstream appeal in solving motion planning and control problems, learning-based approaches [44–48] are becoming increasingly popular with recent developments in artificial intelligence. Learning-based methods, such as reinforcement learning, can naturally make full use of historical data and iteratively interact with the
environment through actions to deal with complex scenarios. Some works model behavioral-level decisions via reinforcement learning [46, 48], while other approaches work directly on the motion planning trajectory output or even on direct feedback control signals [45]. Q-learning [49], Actor-Critic learning [50], and policy gradient [43] are popular algorithms in reinforcement learning.

In Chapter 6, we introduce the motion planning modules of robotic systems and compare several FPGA and ASIC accelerator designs for motion planning to analyze the intrinsic design trade-offs. We demonstrate that, with careful algorithm-hardware co-design, FPGAs can achieve three orders of magnitude better performance than CPUs and two orders of magnitude better performance than GPUs, with much lower power consumption. This shows that FPGAs can be a promising candidate for accelerating motion planning kernels.

1.5 FPGAS IN ROBOTIC APPLICATIONS

Besides accelerating the basic modules in the robotic computing stack, FPGAs have been utilized in different robotic applications. In Chapter 7, we explore how FPGAs can be utilized in multi-robot exploration tasks. Specifically, we present an FPGA-based interruptible CNN accelerator and a deployment framework for multi-robot exploration.

In Chapter 8, we provide a retrospective summary of PerceptIn's efforts on developing on-vehicle computing systems for autonomous vehicles, especially how FPGAs are utilized to accelerate critical tasks in a full autonomous driving stack. For instance, localization is accelerated on an FPGA, while depth estimation and object detection are accelerated on a GPU. This case study demonstrates that FPGAs are capable of playing a crucial role in autonomous driving, and that exploiting accelerator-level parallelism while taking into account constraints arising in different contexts can significantly improve on-vehicle processing.

In Chapter 9, we explore how FPGAs have been utilized in space robotic applications over the past two decades. The properties of FPGAs make them good onboard processors for space missions: they offer high reliability, adaptability, processing power, and energy efficiency. FPGAs may help us close the two-decade performance gap between commercial processors and space-grade ASICs when it comes to powering space exploration robots.

1.6 THE DEEP PROCESSING PIPELINE

Different from other computing workloads, autonomous machines have a very deep processing pipeline with strong dependencies between stages and a strict time bound associated with each stage [51]. For instance, Fig. 1.2 presents an overview of the processing pipeline of an autonomous driving system. Starting from the left side, the system consumes raw sensing data from mmWave radars, LiDARs, cameras, and GNSS/IMUs, and each sensor produces raw data at a different frequency. The cameras capture images at 30 FPS and feed the raw data to the 2D Perception module, the LiDARs capture point clouds at 10 FPS and feed the raw data to the
3D Perception module as well as the Localization module, the GNSS/IMUs generate positional updates at 100 Hz and feed the raw data to the Localization module, and the mmWave radars detect obstacles at 10 FPS and feed the raw data to the Perception Fusion module.

Figure 1.2: The processing pipeline of autonomous vehicles (from camera, LiDAR, mmWave radar, and GNSS/IMU inputs through 2D/3D perception, perception fusion, tracking, localization, prediction, planning, and control to the vehicle chassis).

Next, the results of the 2D and 3D Perception modules are fed into the Perception Fusion module at 30 Hz and 10 Hz, respectively, to create a comprehensive perception list of all detected objects. The perception list is then fed into the Tracking module at 10 Hz to create a tracking list of all detected objects. The tracking list is then fed into the Prediction module at 10 Hz to create a prediction list of all objects. After that, both the prediction results and the localization results are fed into the Planning module at 10 Hz to generate a navigation plan. The navigation plan is then fed into the Control module at 10 Hz to generate control commands, which are finally sent to the autonomous vehicle for execution at 100 Hz.

Hence, every 10 ms the autonomous vehicle needs to generate a control command to maneuver the vehicle. If any upstream module, such as the Perception module, misses its deadline, the Control module still has to generate a command before the deadline. This could lead to disastrous results, as the autonomous vehicle would essentially be driving blindly without the perception output. The key challenge is to design a system that minimizes the end-to-end latency of this deep processing pipeline, within energy and cost constraints and with minimum latency variation. In this book, we demonstrate that FPGAs can be utilized in different modules of this long processing pipeline to minimize latency, reduce latency variation, and achieve energy efficiency.
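To make the timing relationships in this pipeline concrete, the following sketch models each stage on the camera-to-control critical path as a periodic task and checks a simple latency budget. The stage rates come from the description above, but the per-stage latencies are made-up illustrative values, not measurements from this book.

```python
# Illustrative latency-budget check for the deep processing pipeline above.
# Rates follow the text; per-stage latencies are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    rate_hz: float      # how often the stage produces an output
    latency_ms: float   # assumed processing time per invocation

pipeline = [
    Stage("2D Perception",     30.0, 25.0),
    Stage("Perception Fusion", 30.0,  5.0),
    Stage("Tracking",          10.0,  8.0),
    Stage("Prediction",        10.0,  6.0),
    Stage("Planning",          10.0, 12.0),
    Stage("Control",          100.0,  1.0),
]

CONTROL_PERIOD_MS = 10.0  # a control command must go out every 10 ms

def check_budget(stages):
    for s in stages:
        period_ms = 1000.0 / s.rate_hz
        # Each stage must finish before its next input arrives.
        assert s.latency_ms <= period_ms, f"{s.name} overruns its {period_ms:.1f} ms period"
    total = sum(s.latency_ms for s in stages)  # best case, ignoring queuing delays
    print(f"End-to-end sensor-to-actuation latency (no queuing): {total:.1f} ms")
    print(f"Control command due every {CONTROL_PERIOD_MS:.1f} ms; "
          "a missed upstream deadline forces control to act on stale data")

check_budget(pipeline)
```

The point of the exercise is that the control deadline is fixed at 10 ms regardless of how long the upstream stages take, which is why reducing both latency and latency variation in every stage matters.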
1.7 SUMMARY

The authors believe that FPGAs are an indispensable compute substrate for robotic applications, for several reasons.

• First, robotic algorithms are still evolving rapidly. Thus, any ASIC-based accelerator will be months or even years behind the state-of-the-art algorithms; FPGAs, on the other hand, can be dynamically updated as needed.

• Second, robotic workloads are highly diverse. Thus, it is difficult for any ASIC-based robotic computing accelerator to reach economies of scale in the near future; FPGAs, on the other hand, are a cost-effective and energy-efficient alternative before any one type of accelerator reaches economies of scale.

• Third, compared to SoCs that have reached economies of scale, e.g., mobile SoCs, FPGAs deliver a significant performance advantage.

• Fourth, partial reconfiguration allows multiple robotic workloads to time-share an FPGA, thus allowing one chip to serve multiple applications and leading to overall cost and energy reduction.

Specifically, FPGAs require little power and are often built into small systems with less memory. They can perform massively parallel computation and exploit the properties of perception (e.g., stereo matching), localization (e.g., SLAM), and planning (e.g., graph search) kernels to remove redundant logic and simplify the implementation. Taking hardware characteristics into account, several algorithms have been proposed that run in a hardware-friendly way and achieve performance similar to their software counterparts. Therefore, FPGAs can meet real-time requirements while achieving high energy efficiency compared to CPUs and GPUs. Unlike their ASIC counterparts, FPGAs provide the flexibility of on-site programming and re-programming without going through re-fabrication with a modified design. PR takes this flexibility one step further, allowing the modification of an operating FPGA design by loading a partial configuration file. Using PR, part of the FPGA can be reconfigured at runtime without compromising the integrity of the applications running on those parts of the device that are not being reconfigured. As a result, PR allows different robotic applications to time-share part of an FPGA, leading to energy and performance efficiency and making FPGAs a suitable computing platform for dynamic and complex robotic workloads.

Due to these advantages over other compute substrates, FPGAs have been successfully utilized in commercial autonomous vehicles. In particular, over the past four years, PerceptIn has built and commercialized autonomous vehicles for micromobility, and PerceptIn's products have been deployed in China, the U.S., Japan, and Switzerland. In this book, we provide a real-world case study of how PerceptIn developed its computing system by relying heavily on FPGAs, which perform not only heterogeneous sensor synchronization but also the acceleration of software components on the critical path. In addition, FPGAs are used heavily in space robotic
applications, as they offer unprecedented flexibility and significantly reduce the design cycle and development cost.
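As promised in Section 1.4, the sketch below shows a minimal discrete-time PID controller of the kind used for trajectory and speed tracking. It is an illustration only: the gains, the 2.0 m/s setpoint, and the toy plant model are arbitrary values chosen for the example, not parameters from any system described in this book.

```python
# Minimal discrete-time PID controller sketch for feedback control (Section 1.4).
# Gains, setpoint, and the toy plant model are illustrative placeholders.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return a control command that drives the measurement toward the setpoint."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: track a target speed of 2.0 m/s with a crude first-order vehicle model,
# running the control loop at 100 Hz (one command every 10 ms, as in Section 1.6).
controller = PID(kp=0.8, ki=0.2, kd=0.05, dt=0.01)
speed = 0.0
for _ in range(200):
    throttle = controller.step(setpoint=2.0, measurement=speed)
    speed += 0.5 * throttle * controller.dt  # toy plant dynamics
print(f"speed after 2 s of simulated control: {speed:.2f} m/s")
```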
CHAPTER 2
FPGA Technologies

Before we delve into utilizing FPGAs for accelerating robotic workloads, in this chapter we first provide the background of FPGA technologies, so that readers without prior knowledge can grasp a basic understanding of what an FPGA is and how an FPGA works. We also introduce partial reconfiguration, a technique that exploits the flexibility of FPGAs and is extremely useful for letting various robotic workloads time-share an FPGA so as to minimize energy consumption and resource utilization. In addition, we explore existing techniques that enable the robot operating system (ROS), an essential infrastructure for robotic computing, to run directly on FPGAs.

2.1 AN INTRODUCTION TO FPGA TECHNOLOGIES

In the 1980s, FPGAs emerged as a result of increasing integration in electronics. Before the use of FPGAs, glue-logic designs were based on individual boards with fixed components interconnected via a shared standard bus, which had various drawbacks, such as hindering high-volume data processing and a higher susceptibility to radiation-induced errors, in addition to inflexibility.

In detail, FPGAs are semiconductor devices built around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. FPGAs can be reprogrammed to meet desired application or functionality requirements after manufacturing. This feature distinguishes FPGAs from application-specific integrated circuits (ASICs), which are custom manufactured for specific design tasks.

Note that ASICs and FPGAs have different value propositions, and they must be carefully evaluated before choosing one over the other. While FPGAs used to be selected only for lower-speed, lower-complexity, and lower-volume designs, today's FPGAs easily push past the 500 MHz performance barrier. With unprecedented increases in logic density and a host of other features, such as embedded processors, DSP blocks, clocking, and high-speed serial links at ever lower price points, FPGAs are a compelling proposition for almost any type of design.

Modern FPGAs provide massive reconfigurable logic and memory, which let engineers build dedicated hardware with superb power and performance efficiency. In particular, FPGAs are attracting attention from the robotics community and becoming an energy-efficient platform for robotic computing. Unlike their ASIC counterparts, FPGA technology provides the flexibility of on-site programming and re-programming without going through re-fabrication with a modified design, thanks to its underlying reconfigurable fabric.
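To make the idea of building logic out of configurable blocks concrete before the detailed architecture discussion in Section 2.1.2, the following short sketch emulates in software how a 4-input look-up table (LUT) can implement an arbitrary Boolean function simply by storing its truth table. This is an illustration of the concept, not vendor tooling or an HDL description.

```python
# Software model of a 4-input LUT: any Boolean function of four inputs is
# realized by storing its 16-entry truth table and indexing into it.

def make_lut4(func):
    """Build the 16-entry truth table for an arbitrary 4-input Boolean function."""
    table = [func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) for i in range(16)]
    def lut(a, b, c, d):
        return table[(a << 3) | (b << 2) | (c << 1) | d]  # pure table lookup
    return lut

# "Program" the LUT with the function f = (a AND b) XOR (c OR d).
lut = make_lut4(lambda a, b, c, d: (a & b) ^ (c | d))
print(lut(1, 1, 0, 0))  # 1, since (1 AND 1) XOR (0 OR 0) = 1
print(lut(1, 1, 1, 0))  # 0, since (1 AND 1) XOR (1 OR 0) = 0
```

On a real FPGA, the truth table is held in configuration memory and the "lookup" is done by multiplexer hardware, which is why reprogramming the device simply means rewriting those stored tables and the routing between them.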
2.1.1 TYPES OF FPGAS

FPGAs can be categorized by the type of their programmable interconnection switches: antifuse, SRAM, and Flash. Each of the three technologies comes with trade-offs.

• Antifuse FPGAs are non-volatile and have minimal routing delay, resulting in faster speed and lower power consumption. The drawback is evident: they have a relatively complicated fabrication process and are only one-time programmable.

• SRAM-based FPGAs are field reprogrammable and use the standard fabrication process that foundries have put significant effort into optimizing, resulting in a faster rate of performance increase. However, being based on SRAM, these FPGAs are volatile and may not hold their configuration if a power glitch occurs. They also have more substantial routing delays, require more power, and have a higher susceptibility to bit errors. Note that SRAM-based FPGAs are the most popular compute substrates in space applications.

• Flash-based FPGAs are non-volatile and reprogrammable, and they also have low power consumption and routing delay. The major drawback is that runtime reconfiguration is not recommended for flash-based FPGAs due to the potentially destructive results if radiation effects occur during the reconfiguration process [52]. Also, the stability of the stored charge on the floating gate is a concern: it depends on factors such as operating temperature and electric fields that might disturb the charge. As a result, flash-based FPGAs are not as frequently used in space missions [53].

2.1.2 FPGA ARCHITECTURE

In this subsection, we introduce the basic components of the FPGA architecture in the hope of providing basic background knowledge to readers with limited prior exposure to FPGA technologies. For a detailed and thorough explanation, interested readers can refer to [54]. As shown in Fig. 2.1, a basic FPGA design usually contains the following components.

• Configurable Logic Blocks (CLBs) are the basic repeating logic resources on an FPGA. When linked together by the programmable routing blocks, CLBs can execute complex logic functions, implement memory functions, and synchronize code on the FPGA. CLBs contain smaller components, including flip-flops (FFs), look-up tables (LUTs), and multiplexers (MUXes). An FF is the smallest storage resource on the FPGA; each FF in a CLB is a binary register used to save logic states between clock cycles. A LUT stores a predefined list of outputs for every combination of inputs; LUTs provide a fast way to retrieve the output of a logic operation because possible results are stored and then referenced rather than calculated. A MUX is a circuit that selects between two or more inputs and then returns the selected input. Any logic can be implemented using a combination of FFs, LUTs, and MUXes.
Figure 2.1: Overview of the FPGA architecture (configurable logic blocks containing 4-input LUTs and flip-flops set by the configuration bitstream, programmable routing blocks, interconnect wires, DSP blocks, and I/O blocks).

• Programmable Routing Blocks (PRBs) provide programmable connectivity among a pool of CLBs. The interconnection network contains configurable switch matrices and connection blocks that can be programmed to form the desired connections. PRBs can be divided into connection blocks (CBs) and a matrix of switch boxes (SBs), namely the switch matrix (SM). CBs are responsible for connecting CLB input/output pins to the adjacent routing channels. SBs are placed at the intersection points of vertical and horizontal routing channels. Routing a net from a CLB source to a CLB target requires passing through multiple interconnect wires and SBs, in which a signal entering from one side can connect to any of the other three directions, depending on the SM topology.

• I/O Blocks (IOBs) are used to bring signals onto the chip and send them back off again. An IOB consists of an input buffer and an output buffer with three-state and open-collector output controls. Typically, there are pull-up resistors on the outputs and sometimes pull-down resistors, which can be used to terminate signals and buses without requiring discrete resistors external to the chip. The polarity of the output can usually be programmed for active-high or active-low operation. There are typically flip-flops on the outputs so that clocked signals can be output directly to the pins without encountering significant delay, more easily meeting the setup time requirements of external devices.
  • 36. 14 2. FPGA TECHNOLOGIES Since there are many IOBs available on an FPGA and these IOBs are programmable, we can easily design a compute system to connect to different types of sensors, which are extremely useful in robotic workloads. • Digital Signal Processors (DSPs) have been optimized to implement various com- mon digital signal processing functions with maximum performance and minimum logic resource utilization. In addition to multipliers, each DSP block has functions that are frequently required in typical DSP algorithms. These functions usually in- clude pre-adders, adders, subtractors, accumulators, coefficient register storage, and a summation unit. With these rich features, the DSP blocks in the Stratix series FP- GAs are ideal for applications with high-performance and computationally intensive signal processing functions, such as finite impulse response (FIR) filtering, fast Fourier transforms (FFTs), digital up/down conversion, high-definition (HD) video process- ing, HD CODECs, etc. Besides the aforementioned traditional workloads, DSPs are also extremely useful for robotic workloads, especially computer vision workloads, pro- viding high-performance and low-power solutions for robotic vision front ends [55]. 2.1.3 COMMERCIAL APPLICATIONS OF FPGAS Due to their programmable nature, FPGAs are an ideal fit for many different markets such as the following. • Aerospace & Defense – Radiation-tolerant FPGAs along with the intellectual property for image processing, waveform generation, and partial reconfiguration for Software-Defined Radios, especially for space and defense applications. • ASIC Prototyping – ASIC prototyping with FPGAs enables fast and accurate SoC system modeling and verification of embedded software. • Automotive – FPGAs enable automotive silicon and IP solutions for gateway and driver assistance systems, as well as comfort, convenience, and in-vehicle infotainment. • Consumer Electronics – FPGAs provide cost-effective solutions enabling next- generation, full-featured consumer applications, such as converged handsets, digital flat panel displays, information appliances, home networking, and residential set top boxes. • Data Center – FPGAs have been utilized heavily for high-bandwidth, low-latency servers, networking, and storage applications to bring higher value into cloud deploy- ments. • High-Performance Computing and Data Storage – FPGAs have been utilized widely for Network Attached Storage (NAS), Storage Area Network (SAN), servers, and storage appliances.
  • 37. 2.2. PARTIAL RECONFIGURATION 15 • Industrial – FPGAs have been utilized in targeted design platforms for Industrial, Sci- entific, and Medical (ISM) enable higher degrees of flexibility, faster time-to-market, and lower overall non-recurring engineering costs (NRE) for a wide range of applica- tions such as industrial imaging and surveillance, industrial automation, and medical imaging equipment. • Medical – For diagnostic, monitoring, and therapy applications, FPGAs have been used to meet a range of processing, display, and I/O interface requirements. • Security – FPGAs offer solutions that meet the evolving needs of security applications, from access control to surveillance and safety systems. • Video & Image Processing – FPGAs have been utilized in targeted design platforms to enable higher degrees of flexibility, faster time-to-market, and lower overall non- recurring engineering costs (NRE) for a wide range of video and imaging applications. • WiredCommunications – FPGAs have been utilized to develop end-to-end solutions for the Reprogrammable Networking Linecard Packet Processing, Framer/MAC, se- rial backplanes, and more. • Wireless Communications – FPGAs have been utilized to develop RF, base band, connectivity, transport, and networking solutions for wireless equipment, addressing standards such as WCDMA, HSDPA, WiMAX, and others. In the rest of this book, we explore robotic computing, an emerging and potentially a killer application for FPGAs. With FPGAs, we can develop low-power, high-performance, cost- effective, and flexible compute systems for various robotic workloads. Due to the advantages provided by FPGAs, we expect that robotic applications will be a major demand driver for FPGAs in the near future. 2.2 PARTIAL RECONFIGURATION Unlike the ASIC counterparts, FPGAs provide the flexibility of on-site programming and re- programming without going through re-fabrication with a modified design. PR takes this flex- ibility one step further, allowing the modification of an operating FPGA design by loading a PR file. Using PR, part of the FPGA can be reconfigured at runtime without compromising the integrity of the applications running on those parts of the device that are not being reconfigured. As a result, PR can allow different robotic applications to time-share part of an FPGA, leading to energy and performance efficiency, and making FPGAs suitable computing platforms for dynamic and complex robotic workloads.
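To make the time-sharing idea concrete, consider the following sketch of how a robot's runtime might choose which accelerator should occupy a reconfigurable partition as its workload changes. It is a minimal illustration: the module names and the load_partial_bitstream() hook are hypothetical placeholders for whatever vendor-specific reconfiguration interface (discussed in the following sections) the platform provides.

```python
# Minimal sketch of time-sharing one reconfigurable partition among several
# robotic accelerators. Module names and load_partial_bitstream() are
# hypothetical placeholders, not a vendor API.

def load_partial_bitstream(path: str) -> None:
    print(f"reconfiguring partition with {path}")  # platform-specific in practice

class SharedPartition:
    """One PR region that different accelerators occupy at different times."""

    def __init__(self, bitstreams: dict):
        self.bitstreams = bitstreams   # task name -> partial bitstream file
        self.resident = None

    def require(self, task: str) -> None:
        """Ensure the accelerator for `task` is loaded before it is used."""
        if task == self.resident:
            return                      # already resident, no reconfiguration
        load_partial_bitstream(self.bitstreams[task])
        self.resident = task

if __name__ == "__main__":
    partition = SharedPartition({
        "feature_extraction": "fe_accel.bit",     # used for keyframes
        "feature_tracking":   "track_accel.bit",  # used for non-key frames
    })
    partition.require("feature_extraction")
    partition.require("feature_tracking")          # swap at runtime
    partition.require("feature_tracking")          # no-op, already loaded
```

The design choice here is simply to reconfigure lazily, only when the requested task differs from the one already resident, so that repeated use of the same accelerator pays the reconfiguration cost once.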
  • 38. 16 2. FPGA TECHNOLOGIES 2.2.1 WHAT IS PARTIAL RECONFIGURATION? The obvious benefit of using reconfigurable devices, such as FPGAs, is that the functionality that a device has now can be changed and updated at some time in the future. As additional func- tionality is available or design improvements are made available, the FPGA can be completely reprogrammed with new logic. PR takes this capability one step further by allowing designers to change the logic within a part of an FPGA without disrupting the entire system. This allows designers to divide their system into modules, each comprised of one block of logic and, without disrupting the whole system and stopping the flow of data, the users can update the functionality within one block. Runtime partial reconfiguration (RPR) is a special feature offered by many FPGAs that allows designers to reconfigure certain portions of the FPGA during runtime without influ- encing other parts of the design. This feature allows the hardware to be adaptive to a changing environment. First, it allows optimized hardware implementation to accelerate computation. Second, it allows efficient use of chip area such that different hardware modules can be swapped in/out of the chip at runtime. Last, it may allow leakage and clock distribution power saving by unloading hardware modules that are not active. RPR is extremely useful for robotic applications, as a mobile robot might encounter very different environments as it navigates, and it might require different perception, localization, or planning algorithms for these different environments. For instance, while a mobile robot is in an indoor environment, it is likely to use an indoor map for localization, but when it travels outdoor, it might choose to use GPS and visual-inertial odometry for localization. Keeping multiple hardware accelerators for different tasks is not only costly but also energy inefficient. RPR provides a perfect solution for this problem. As shown in Fig. 2.2, an FPGA is divided into three partitions for the three basic functions, one for perception, one for localization, and one for planning. Then for each function, there are three algorithms ready, one for each environment. Each of these algorithms is converted to a bit file and ready for RPR when needed. For instance, when a robot navigates to a new environment and decides that a new perception algorithm is needed, it can load the target bit file and sends it to the internal configuration access port (ICAP) to reconfigure the perception partition. One major challenge of RPR for robotic computing is the configuration speed, as most robotic tasks have strong real-time constraints, and to maintain the performance of the robot, the reconfiguration process has to finish within a very tight time bound. In addition, the recon- figuration process incurs performance and power overheads. By maximizing the configuration speed, these overheads can be minimized as well. 2.2.2 HOW TO USE PARTIAL RECONFIGURATION? PR allows the modification of an operating FPGA design by loading a PR file, or a bit file through ICAP [56]. Using PR, after a full bit file configures the FPGA, partial bit files can also be downloaded to modify reconfigurable regions in the FPGA without compromising the
  • 39. 2.2. PARTIAL RECONFIGURATION 17

integrity of the applications running on those parts of the device that are not being reconfigured. RPR allows a limited, predefined portion of an FPGA to be reconfigured while the rest of the device continues to operate, and this feature is especially valuable where devices operate in a mission-critical environment that cannot be disrupted while some subsystems are being redefined.

Figure 2.2: An example of partial reconfiguration for robotic applications. (The FPGA is divided into three partitions, one each for perception, localization, and planning, and each partition has three candidate modules that can be loaded through the ICAP.)

In an SRAM-based FPGA, all user-programmable features are controlled by memory cells that are volatile and must be configured on power-up. These memory cells are known as the configuration memory, and they define the look-up table (LUT) equations, signal routing, input/output block (IOB) voltage standards, and all other aspects of the design. In order to program the configuration memory, instructions for the configuration control logic and data for the configuration memory are provided in the form of a bitstream, which is delivered to the device through the JTAG, SelectMAP, serial, or ICAP configuration interface.

An FPGA can be partially reconfigured using a partial bitstream. A designer can use such a partial bitstream to change the structure of one part of an FPGA design as the rest of the device continues to operate. RPR is useful for systems with multiple functions that can time-share the same FPGA device resources. In such systems, one section of the FPGA continues to operate, while other sections of the FPGA are disabled and reconfigured to provide new functionality. This is analogous to the situation where a microprocessor manages context switching between software processes. In the case of PR of an FPGA, however, it is the hardware instead of the software that is being switched.

RPR provides an advantage over multiple full bitstreams in applications that require continuous operation, which would not be possible during full reconfiguration. One example is a mobile robot that switches the perception module while keeping the localization module and planning module intact when moving from a dark environment to a bright environment. With RPR, the system can maintain the localization and planning modules while the perception module within the FPGA is changed on the fly.
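On SoC platforms that run Linux on the processing system (for example, Zynq-class devices), delivering a partial bitstream at runtime often goes through the kernel rather than raw ICAP writes. The sketch below assumes the mainline Linux FPGA manager framework: the sysfs paths, the flag value used to request partial reconfiguration, and the convention of staging the file under /lib/firmware are all platform assumptions to verify, not guaranteed interfaces.

```python
# Hedged sketch: load a partial bitstream through Linux's FPGA manager.
# Paths, flag values, and file staging conventions vary by kernel and board.
import shutil
from pathlib import Path

FPGA_MGR = Path("/sys/class/fpga_manager/fpga0")
FIRMWARE_DIR = Path("/lib/firmware")

def load_partial(bitstream: Path) -> None:
    # The kernel looks the file up by name under /lib/firmware.
    shutil.copy(bitstream, FIRMWARE_DIR / bitstream.name)
    # Request a partial (not full) reconfiguration before handing over the file.
    (FPGA_MGR / "flags").write_text("1")
    (FPGA_MGR / "firmware").write_text(bitstream.name)

if __name__ == "__main__":
    load_partial(Path("/opt/robot/bitstreams/perception_lowlight.bin"))
```

At the configuration throughputs discussed in Section 2.2.3, a partial bitstream of a few megabytes streams into the device in milliseconds, which is what allows the perception module to be swapped on the fly as in the dark-to-bright example above.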
  • 40. 18 2. FPGA TECHNOLOGIES

Figure 2.3: FPGA regular and partial reconfiguration design flow. (The flow runs from source file through synthesis and layout to full and partial bit file generation and upload to the FPGA, with behavioral simulation, functional verification, static analysis, and in-circuit verification at the corresponding stages.)

Xilinx has provided the PR feature in their high-end FPGAs, the Virtex series, in limited-access beta since the late 1990s. More recently, it has become a production feature supported by their tools across their devices since the release of ISE 12, and the support for this feature continues to improve in the more recent release of ISE 13. Altera has promised this feature for their new high-end devices, but this has not yet materialized. PR of FPGAs is a compelling design concept for general-purpose reconfigurable systems because of its flexibility and extensibility.

Using the Xilinx tool chain, designers can go through the regular synthesis flow to generate a single bitstream for programming the FPGA. This considers the device as a single atomic entity. As opposed to the general synthesis flow, the PR flow physically divides the FPGA device into regions. One region is called the "static region," which is the portion of the device that is programmed at startup and never changes. Another region is the "PR region," which is the portion of the device that will be reconfigured dynamically, potentially multiple times and with different designs. It is possible to have multiple PR regions, but we will consider only the simplest case here. The PR flow generates at least two bitstreams, one for the static region and one for the PR region. Most likely, there will be multiple PR bitstreams, one for each design that can be dynamically loaded.

As shown in Fig. 2.3, the first step in implementing a system using the PR design flow is the same as in the regular design, which is to synthesize the netlists from the HDL sources that will be used in the implementation and layout process. Note that the process requires separate netlists for the static (top-level) designs and the PR partitions. A netlist must be generated for each implementation of the PR partition used in the design. If the system design has multiple
  • 41. 2.2. PARTIAL RECONFIGURATION 19 PR partitions, then it will require a netlist for each implementation of each PR partition, even if the logic is the same in multiple locations. Then once a netlist is done, we need to work on the layout for each design to make sure that the netlist fits into the dedicated partition, and we need to make sure that there are enough resources available for the design in each partition. Once the implementation is done, we can then generate the bit file for each partition. At runtime, we can dynamically swap different designs to a partition for the robot to adapt to the changing environment. For more details on how to use PR on FPGAs, please refer to [57]. 2.2.3 ACHIEVING HIGH PERFORMANCE A major performance bottleneck for PR is the configuration overhead, which determines the usefulness of PR. If PR is done fast enough, we can use this feature to enable mobile robots to swap hardware components at runtime. If PR cannot be done fast enough, we can only use this feature to perform offline hardware updates. To address this problem, in [58], the authors propose a combination of two techniques to minimize the overhead. First, the authors design and implement fully streaming DMA en- gines to saturate the configuration throughput. Second, the authors exploit a simple form of data redundancy to compress the configuration bitstreams, and implement an intelligent in- ternal configuration access port (ICAP) controller to perform decompression at runtime. This design achieves an effective configuration data transfer throughput of up to 1.2 GB/s, which well surpasses the theoretical upper bound of the data transfer throughput, 400 MB/s. Specifi- cally, the proposed fully streaming DMA engines reduce the configuration time from the range of seconds to the range of milliseconds, a more than 1000-fold improvement. In addition, the proposed compression scheme achieves up to a 75% reduction in bitstream size and results in a decompression circuit with negligible hardware overhead. Figure 2.4 shows the architecture of the fast PR engine, which consists of: • a direct memory access (DMA) engine to establish a direct transfer link between the external SRAM, where the configuration files are stored, and the ICAP; • a streaming engine implemented with a FIFO queue to buffer data between the con- sumer and the producer to eliminate the handshake between the producer and the consumer for each data transfer; and • turn on the burst mode for ICAP thus it can fetch four words instead of one word at a time. We will explain this design in greater details in the following sections. Problems with the Out-of-Box PR Engine Design Without the fast PR engine, in the out-of-box design, the ICAP Controller contains only the ICAP and the ICAP FSM, and the SRAM Controller only contains the SRAM Bridge and
  • 42. 20 2. FPGA TECHNOLOGIES

the SRAM Interface. Hence, there is no direct memory access between the SRAM and the ICAP, and all configuration data transfers are done in software. In this way, the pipeline issues one read instruction to fetch a configuration word from SRAM, and then issues a write instruction to send the word to the ICAP; instructions are also fetched from SRAM, and this process repeats until the transfer completes. This scheme is highly inefficient because the transfer of one word requires tens of cycles, and the ICAP transfer throughput of this design is only 318 KB/s, whereas, per the product specification, the ideal ICAP throughput is 400 MB/s. Hence, the out-of-box design throughput is roughly 1,000 times worse than the ideal design.

Figure 2.4: Fast partial reconfiguration engine. (The SRAM controller, containing the SRAM bridge, SRAM interface, and secondary DMA, connects through a FIFO to the ICAP controller, which contains the primary DMA, the ICAP FSM, and the ICAP; both controllers sit on the system bus.)

Configuration Time is a Pure Function of the Bitstream Size? Theoretically, the ICAP throughput can reach 400 MB/s, but this is achievable only if the configuration time is a pure function of the bitstream file size. In order to find out whether this theoretical throughput is achievable, the authors of [58] performed experiments to configure different regions of the FPGA chip, to repeatedly write NOPs to the ICAP, and to stress the configuration circuit by repeatedly configuring one region. During all these tests, they found that the ICAP always ran at full speed such that it was able to consume four bytes of configuration data per cycle,
  • 43. 2.2. PARTIAL RECONFIGURATION 21 regardless of the semantics of the configuration data. This confirms that configuration time is a pure function of the size of the bitstream file. Adding the Primary-Secondary DMA Engines To improve PR throughput, we first can simply implement a pair of primary-secondary DMA engines. The primary DMA engine resides in the ICAP controller and interfaces with the ICAP FSM, the ICAP, as well as the secondary DMA engine. The secondary DMA engine resides in the SRAM Controller, and it interfaces with the SRAM Bridge and the primary DMA engine. When a DMA operation starts, the primary DMA engine receives the starting address as well as the size of the DMA operation. Then it starts sending control signals (read-enable, address, etc.) to the secondary DMA engine, which then forwards the signals to the SRAM Bridge. After the data is fetched, the secondary DMA engine sends the data back to the primary DMA engine. Then, the primary DMA engine decrements the size counter, increments the address, and repeats the process to fetch the next word. Compared to the out-of-box design, simply adding the DMA engines avoids the involvement of the pipeline in the data transfer process and it significantly increases the PR throughput to 50 MB/s, a 160-fold improvement. Adding a FIFO between the DMA Engines To further improve the PR throughput, we can modify the primary-secondary DMA engines by adding a FIFO between the two DMA engines. In this version of the design, when DMA operation starts, instead of sending control signals to the secondary DMA engine, the primary DMA engine forwards the starting address and the size of the DMA operation to the secondary DMA engine, then it waits for the data to become available in the FIFO. Once data becomes available in the FIFO, the primary DMA engine reads the data and decrements its size counter. When the counter hits zero, the DMA operation completes. On the other side, upon receiving the starting address and size of the DMA operation, the secondary DMA engine starts sending control signals to the SRAM Bridge to fetch data one word at a time. Then once the secondary DMA engine receives data from the SRAM Bridge, it writes the word into the FIFO, decre- ments its size counter, and increments its address register to fetch the next word. In this design, only data is transferred between the primary and secondary DMA engines, and all control op- erations to SRAM are handled in the secondary DMA. This greatly simplifies the handshaking between the ICAP Controller and the SRAM Controller, and it leads to a 100 MB/s ICAP throughput, an additional two-fold improvement. Adding Burst Mode to Provide Fully Streaming The SRAM on most FPGA boards usually provides burst read mode such that we can read four words at a time instead of one. Burst mode reads are available on most DDR memories as well. There is an ADVLD signal to the SRAM device. During a read, if this signal is set, then a new address is loaded into the device. Otherwise, the device will output a burst of up to
  • 44. 22 2. FPGA TECHNOLOGIES four words, one word per cycle. Therefore, if we can set the ADVLD signal every four cycles, each time we increment the address by four words, and given that the synchronization between control signals and data fetches is correct, then we are able to stream data from SRAM to the ICAP. We implement two independent state machines in the secondary DMA engine. One state machine sends control signals as well as the addresses to the SRAM in a continuous manner, such that in every four cycles, the address is incremented by four words (16 bytes) and sent to the SRAM device. The other state machine simply waits for the data to become ready at the beginning, and then in each cycle, it receives one word from the SRAM and streams the word to the FIFO until the DMA operation completes. Similarly, the primary DMA engine waits for data to become available in the FIFO, and then in each cycle, it reads one word from the FIFO and streams the word to the ICAP until the DMA operation completes. This fully streaming DMA design leads to an ICAP throughput that exceeds 395 MB/s, which is very close to the ideal 400 MB/s throughputs. Energy Efficiency In [59], the authors indicate that the polarity of the FPGA hardware structures may significantly impact leakage power consumption. Based on this observation, the authors of [60] tried to find out whether FPGAs utilize this property such that when the blank bitstream is loaded to wipe out an accelerator, the circuit is set to a state to minimize the leakage power consumption. In order to achieve this, the authors implemented eight PR regions on an FPGA chip, with each region occupying a configuration frame. These eight PR regions did not consume any dynamic power, as the authors purposely gated off the clock to these regions. Then the authors used the blank bitstream files to wipe out each of these regions and observed the chip power consumption behavior. The results indicated that for every four configuration frames that we applied the blank bitstream on, the chip power consumption dropped by a constant amount. This study confirms that PR indeed leads to static power reduction and suggests that FPGAs may have utilized the polarity property to minimize leakage power. In addition, the authors of [60] studied whether PR can be used as an effective energy re- duction technique in reconfigurable computing systems. To approach this problem, the authors first identified the analytical models that capture the necessary conditions for energy reduc- tion under different system configurations. The models show that increasing the configuration throughput is a general and effective way to minimize the PR energy overhead. Therefore, the authors designed and implemented a fully streaming DMA engine that nearly saturates the configuration throughput. The findings provide answers to the three questions: first, although we pay extra power to use an accelerator, depending on the accelerator’s ability to accelerate the program execution, it will result in actual energy reduction. The experimental results in [60] demonstrate that due to its low power overhead and excellent ability of acceleration, having an acceleration extension can lead to both program speedup and system energy reduction. Second, it is worthwhile to use PR
  • 45. 2.3. ROBOT OPERATING SYSTEM (ROS) ON FPGAS 23 to reduce chip energy consumption if the energy reduction can make up for the energy overhead incurred during the reconfiguration process; and the key to minimize the energy overhead during the reconfiguration process is to maximize the configuration speed. The experimental results in [60] confirm that enabling PR is a highly effective energy reduction technique. Finally, clock gating is an effective technique in reducing energy consumption due to its negligible overhead; however, it reduces only dynamic power whereas PR reduces both dynamic and static power. Therefore, PR can lead to a larger energy reduction than clock gating, provided the extra energy saving on static power elimination can make up for the energy overhead incurred during the reconfiguration process. Although the conventional wisdom is that PR is only useful if the accelerator would not be used for a very long period of time, the experimental results in [60] indicate that with the high configuration throughput delivered by the fast PR engine, PR can outperform clock gating in energy reduction even if the accelerator inactive time is in the millisecond range. In summary, based on the results from [58] and [60], we can conclude that PR is an effective technique for improving both performance and energy efficiency, and it is the key feature that makes FPGAs a highly attractive choice for dynamic robotic computing workloads. 2.2.4 REAL-WORLD CASE STUDY Following the design presented in [60], PerceptIn has demonstrated in their commercial prod- uct that RPR is useful for robotic computing, especially computing for autonomous vehicles, because many on-vehicle tasks usually have multiple versions where each is used in a particular scenario [20]. For instance, in PerceptIn’s design, the localization algorithm relies on salient features; features in keyframes are extracted by a feature extraction algorithm (based on ORB features [61]), whereas features in non-key frames are tracked from previous frames (using op- tical flow [62]); the latter executes in 10 ms, 50% faster than the former. Spatially sharing the FPGA is not only area-inefficient but also power-inefficient as the unused portion of the FPGA consumes non-trivial static power. In order to temporally share the FPGA and “hot-swap” dif- ferent algorithms, PerceptIn developed a Partial Reconfiguration Engine (PRE) that dynami- cally reconfigures part of the FPGA at runtime. The PRE achieves a 400 MB/s reconfiguration throughput (i.e., bitstream programming rate). Both the feature extraction and tracking bit- streams are less than 4 MB. Thus, the reconfiguration delay is less than 1 ms. 2.3 ROBOT OPERATING SYSTEM (ROS) ON FPGAS As demonstrated in the previous chapter, autonomous vehicles and robots demand complex in- formation processing such as SLAM (Simultaneous Localization and Mapping), deep learning, and many other tasks. FPGAs are promising in accelerating these applications with high energy efficiency. However, utilizing FPGAs for robotic workloads is challenging due to the high de- velopment costs and lack of talents who can understand both FPGAs and robotics. One way to address this challenge is to directly support ROS on FPGAs as ROS already provides the basic
  • 46. 24 2. FPGA TECHNOLOGIES infrastructure for supporting efficient robotic computing. Hence, in this section we explore the state-of-the-art supports for ROS to run on FPGAs. 2.3.1 ROBOT OPERATING SYSTEM (ROS) Before delving into supports for running ROS on FPGAs, we first understand the importance of ROS in robotic applications. ROS is an open-source, meta-operating system for autonomous machines and robots. It provides the essential operating system services, including hardware abstraction, low-level device control, implementation of commonly used functionality, message- passing between processes, and package management. ROS also provides tools and libraries for obtaining, building, writing, and running code across multiple computers. The primary goal of ROS is to support code reuse in robotics research and development. In essence, ROS is a distributed framework of processes that enables executables to be individually designed and loosely coupled at runtime. These processes can be grouped into Packages and Stacks, which can be easily shared and distributed. ROS also supports a federated system of code Repositories that enable collaboration to be distributed as well. This design, from the file system level to the community level, enables independent decisions about development and implementation, but all can be brought together with ROS infrastructure tools [63]. The core objectives of the ROS framework include the following. • Thin: ROS is designed to be as thin as possible so that code written for ROS can be used with other robot software frameworks. • ROS-agnostic libraries: the preferred development model is to write ROS-agnostic libraries with clean functional interfaces. • Language independence: the ROS framework is easy to implement in any modern programming language. The ROS development team has already implemented it in Python, C++, and Lisp, and we have experimental libraries in Java and Lua. • Easy testing: ROS has a built-in unit/integration test framework called rostest that makes it easy to bring up and tear down test fixtures. • Scaling: ROS is appropriate for large runtime systems and large development processes. The Computation Graph is the peer-to-peer network of ROS processes that are processing data together. The basic Computation Graph concepts of ROS are nodes, Master, Parameter Server, messages, services, topics, and bags, all of which provide data to the Graph in different ways. • Nodes: nodes are processes that perform computation. ROS is designed to be modular at a fine-grained scale; a robot control system usually comprises many nodes. Take autonomous vehicles as an example, one node controls a laser range-finder, one node
  • 47. 2.3. ROBOT OPERATING SYSTEM (ROS) ON FPGAS 25 controls the wheel motors, one node performs localization, one node performs path planning, one node provides a graphical view of the system, and so on. A ROS node is written with the use of a ROS client library, such as roscpp or rospy. • Master: the ROS Master provides name registration and lookup to the rest of the Computation Graph. Without the Master, nodes would not be able to find each other, exchange messages, or invoke services. • Parameter Server: the parameter server allows data to be stored by key in a central location. It is currently part of the Master. • Messages: nodes communicate with each other by passing messages. A message is simply a data structure, comprising typed fields. Standard primitive types (integer, floating-point, boolean, etc.) are supported, as are arrays of primitive types. Messages can include arbitrarily nested structures and arrays (much like C structs). • Topics: messages are routed via a transport system with publish-subscribe semantics. A node sends out a message by publishing it to a given topic. The topic is a name that is used to identify the content of the message. A node that is interested in a certain kind of data will subscribe to the appropriate topic. There may be multiple concurrent publishers and subscribers for a single topic, and a single node may publish and sub- scribe to multiple topics. In general, publishers and subscribers are not aware of each others’ existence. The idea is to decouple the production of information from its con- sumption. Logically, one can think of a topic as a strongly typed message bus. Each bus has a name, and anyone can connect to the bus to send or receive messages as long as they are the right type. • Services: the publish-subscribe model is a very flexible communication paradigm, but its many-to-many, one-way transport is not appropriate for request-reply interactions, which are often required in a distributed system. Request-reply is done via services, which are defined by a pair of message structures: one for the request and one for the reply. A providing node offers a service under a name and a client uses the service by sending the request message and awaiting the reply. ROS client libraries generally present this interaction to the programmer as if it were a remote procedure call. • Bags: bags are a format for saving and playing back ROS message data. Bags are an important mechanism for storing data, such as sensor data, that can be difficult to collect but is necessary for developing and testing algorithms. The ROS Master acts as a name service in the ROS Computation Graph. It stores topics and services registration information for ROS nodes. Nodes communicate with the Master to report their registration information. As these nodes communicate with the Master, they can re- ceive information about other registered nodes and make connections as appropriate. The Master
  • 48. 26 2. FPGA TECHNOLOGIES

will also make callbacks to these nodes when this registration information changes, which allows nodes to dynamically create connections as new nodes are run. Nodes connect to other nodes directly; the Master only provides lookup information, much like a domain name service (DNS) server. Nodes that subscribe to a topic will request connections from nodes that publish that topic and will establish that connection over an agreed-upon connection protocol. This architecture allows for decoupled operations, where the names are the primary means by which larger and more complex systems can be built. Names have a very important role in ROS: nodes, topics, services, and parameters all have names. Every ROS client library supports command-line remapping of names, which means a compiled program can be reconfigured at runtime to operate in a different Computation Graph topology.

Figure 2.5: ROS-compliant FPGAs. (On an ARM-FPGA SoC, the ARM side hosts the component's input subscriber and output publisher, which exchange messages with other ROS nodes and connect through FPGA interfaces to the applications implemented in the FPGA fabric.)

2.3.2 ROS-COMPLIANT FPGAS
In order to integrate FPGAs into a ROS-based system, a ROS-compliant FPGA component has been proposed [64, 65]. Integration of an FPGA into a robotic system requires equivalent functionality to replace a software ROS component with a ROS-compliant FPGA component. Therefore, each ROS message type and data format used in the ROS-compliant FPGA component must be the same as that of the software ROS component. The ROS-compliant FPGA component aims to improve processing performance while satisfying these requirements.

Figure 2.5 shows the architecture of the ROS-compliant FPGA component model. Each ROS-compliant FPGA component must implement the following four functions: Encapsulation of FPGA circuits, Interface between ROS software and FPGA circuits, Subscribe interface from a topic, and Publish interface to a topic. The ARM core is responsible for communicating with and offloading workloads to the FPGA, whereas the FPGA part performs the actual workload acceleration. Note that there are two roles of software in the component. First, an interface pro-
  • 49. 2.3. ROBOT OPERATING SYSTEM (ROS) ON FPGAS 27 cess for input that subscribes to a topic to receive input data. The software component, which runs on the ARM core, is responsible for formatting the data suitable for the FPGA processing and sends the formatted data to the FPGA. Second, an interface process for output receives processing results from the FPGA. The software component, which runs on the ARM core, is responsible for reformatting the results suitable for the ROS system and publishes them to a topic. Such a structure can realize a robot system in which software and hardware cooperate. Note that the difference of ROS-compliant FPGA component from a ROS node written in pure software is that processing contains hardware processing of an FPGA. Integration of ROS-compliant FPGA component into a ROS system only requires connections to ROS nodes through Publish/Subscribe messaging in ordinary ROS development style. The ROS-compliant FPGA component provides easy integration of an FPGA by wrapping it with software. To evaluate this design, the authors of [65] have implemented a hardwired image labeling application on a ROS-compliant FPGA component on Xilinx Zynq-7020, and verifying that this design performs 26 times faster than that of software with the ARM processor, and even 2.3 times faster than that of an Intel PC. Moreover, the end-to-end latency of the component is 1.7 times faster than that of processing with pure software. Therefore, the authors verify that the ROS-compliant FPGA component achieves remarkable performance improvement, maintain- ing high development productivity by cooperative processing of hardware and software. How- ever, this also comes with a problem, as the authors found out that the communication of ROS nodes is a major bottleneck of the execution time in the ROS-compliant FPGA component. 2.3.3 OPTIMIZING COMMUNICATION LATENCY FOR THE ROS-COMPLIANT FPGAS As indicated in the previous subsection, large communication latency between ROS components is a severe problem and has been the bottleneck of offloading computing to FPGAs. The authors in [66] aim to reduce the latency by implementing Publish/Subscribe messaging of ROS as hardware. Based on the results of network packets analysis in the ROS system, the authors propose a method of implementing a hardware ROS-compliant FPGA Component, which is done by separating the registration part (XMLRPC) and data communication part (TCPROS) of the Publish/Subscribe messaging. To study ROS performance, the authors have compared the communication latency of (1) PC-PC and (2) PC-ARM SoC. Two computer nodes are connected with each other through a Gigabit Ethernet. The communication latency in (2) PC-ARM SoC environment is about four times larger than (1) PC-PC. Therefore, the performance in embedded processor environments, such as ARM processors, should be improved. Hence, the challenge for ROS-compliant FPGA components is to reduce the large overhead in communication latency. If communication latency is reduced, the ROS-compliant FPGA component can be used as an accelerator for processing in robotic applications/systems.
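To make the division of labor concrete, the software side of the ROS-compliant FPGA component described above can be approximated by an ordinary ROS 1 node that subscribes, offloads, and republishes. In the Python sketch below the topic names are arbitrary choices, and write_to_fpga()/read_from_fpga() are placeholders for the platform-specific AXI/DMA transfer rather than a real driver API.

```python
#!/usr/bin/env python
# Sketch of the ARM-side interface node of a ROS-compliant FPGA component
# (ROS 1, rospy). Topic names are arbitrary; write_to_fpga()/read_from_fpga()
# are placeholders for the platform-specific data transfer, not a driver API.
import rospy
from sensor_msgs.msg import Image

def write_to_fpga(raw_bytes):
    """Placeholder: format the data and hand it to the FPGA fabric."""
    pass

def read_from_fpga():
    """Placeholder: wait for the fabric to finish and return its result."""
    return b""

class FpgaComponent:
    def __init__(self):
        self.pub = rospy.Publisher("image_labeled", Image, queue_size=1)
        rospy.Subscriber("image_raw", Image, self.on_image, queue_size=1)

    def on_image(self, msg):
        write_to_fpga(msg.data)            # subscribe side: offload to the fabric
        result = read_from_fpga()          # collect the accelerator's output
        out = Image(header=msg.header, height=msg.height, width=msg.width,
                    encoding=msg.encoding, step=msg.step, data=result)
        self.pub.publish(out)              # publish side: back into the ROS graph

if __name__ == "__main__":
    rospy.init_node("ros_compliant_fpga_component")
    FpgaComponent()
    rospy.spin()
```

From the rest of the ROS graph such a node is indistinguishable from a pure-software implementation, which is the point of the ROS-compliant component: only the publish/subscribe contract is visible, while the acceleration happens behind the FPGA interface.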
  • 50. 28 2. FPGA TECHNOLOGIES In order to implement Publish/Subscribe messaging of ROS as hardware, the authors analyzed network packets that flowed in Publish/Subscribe messaging in the ROS system of ordinary software. The authors utilized WireShark for network packet analysis [67] with the basic ROS setup of one master, one publisher, and one subscriber node. • STEP (1): the Publisher and Subscriber nodes register their nodes and topic informa- tion to the Master node. The registration is done by calling methods like registerPub- lisher, hasParam, and so on, using XMLRPC [68]. • STEP (2): the Master node notifies topic information to the Subscriber nodes by call- ing publisherUpdate (XMLRPC). • STEP (3): the Subscriber node sends a connection request to the Publisher node by using requestTopic (XMLRPC). • STEP (4): the Publisher node returns IP address and port number, TCP connection information for data communication, as a response to the requestTopic (XMLRPC). • STEP (5): the Subscriber node establishes a TCP connection by using the information and sends connection header to the TCP connection. Connection header contains im- portant metadata about a connection being established, including typing and routing information, using TCPROS [69]. • STEP (6): if it is a successful connection, the Publisher node sends connection header (TCPROS). • STEP (7): data transmission repeats. This data is written with little endian and header information (4 bytes) is added to the data (TCPROS). After this analysis, the authors found out that network packets that flowed in Pub- lish/Subscribe messaging in the ROS system can be categorized into two parts, that is, the registration part and the data transmission part. The registration part uses XMLRPC (STEPS (1)–(4)), while the data transmission part uses TCPROS (STEPS (5)–(7)), which is almost raw data of TCP communication with very small overhead. In addition, once data transmission (STEP (7)) starts, only data transmission repeats without STEPS (1)–(6). Based on the network packet analysis, the authors modified the server ports, such that those used in XMLRPC and TCPROS are assigned differently. In addition, a client TCP/IP connection of XMLRPC for the Master node is necessary for the Publisher node. For the Subscriber node, two client TCP/IP connections of XMLRPC and one client connection of TCPROS are necessary. Therefore, two or three TCP ports are necessary to implement Pub- lish/Subscribe messaging. It is a problem to implement ROS nodes using the hardware TCP/IP stack.
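Since the registration part is ordinary XML-RPC, it can be reproduced with a generic client, which is essentially what a hardware or bare-metal node must do. The sketch below exercises STEPS (1) and (3)-(4) against a running ROS 1 master; the master URI, node name, topic, and caller API URI are assumptions about the running system.

```python
# The registration half of the protocol (STEPS (1)-(4)) is plain XML-RPC and can
# be reproduced with a generic client; the master URI, node name, topic, and the
# caller API URI below are assumptions about the running ROS 1 system.
import xmlrpc.client

master = xmlrpc.client.ServerProxy("http://localhost:11311")

# STEP (1): register as a subscriber and learn the current publishers.
code, status, publishers = master.registerSubscriber(
    "/listener", "/chatter", "std_msgs/String", "http://localhost:45100")

# STEPS (3)-(4): ask one publisher's XML-RPC server for TCPROS connection info.
if code == 1 and publishers:
    publisher = xmlrpc.client.ServerProxy(publishers[0])
    code, status, protocol = publisher.requestTopic(
        "/listener", "/chatter", [["TCPROS"]])
    transport, host, port = protocol   # e.g., ("TCPROS", "robot.local", 38000)
    # STEPS (5)-(7) would open a TCP socket to (host, port), exchange the
    # connection headers, and then receive length-prefixed message data (TCPROS).
```

The data transmission part (STEPS (5)-(7)) bypasses XML-RPC entirely, which is why separating it from the registration part is attractive when moving the messaging into hardware.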
  • 51. 2.4. SUMMARY 29 To optimize the communication performance on ROS-compliant FPGAs, the authors proposed hardware publication and subscription services. Conventionally, publication or sub- scription of topics was done by software in ROS. By implementing these nodes as hardwired circuits, direct communication between the ROS nodes and the FPGA becomes not only pos- sible but also highly efficient. In order to implement the hardware ROS nodes, the authors designed the Subscriber hardware and the Publisher hardware separately: the Subscriber hard- ware is responsible to subscribe to a topic of another ROS node and to receive ROS messages from the topic; whereas the Publisher hardware is responsible to publish ROS messages to a topic of another ROS node. With this hardware-based design, the evaluation results indicate that the latency of the Hardware ROS-compliant FPGA component can be cut to half, from 1.0 ms to 0.5 ms, thus effectively improving the communication between the FPGA accelerator and other software-based ROS nodes. 2.4 SUMMARY In this chapter, we have provided a general introduction to FPGA technologies, especially run- time partial reconfiguration, which allows multiple robotic workloads to time-share an FPGA at runtime. We also have introduced existing research on enabling ROS on FPGAs, which pro- vides infrastructure supports for various robotic workloads to run directly on FPGAs. However, the ecosystem of robotic computing on FPGAs is still in its infancy. For instance, due to the lack of high-level synthesis tools for robotic accelerator design, accelerating robotic workloads, or part of a robotic workload, on FPGAs still require extensive manual efforts. To make the matter worse, most robotic engineers do not have sufficient FPGA background to develop an FPGA-based accelerator, whereas few FPGA engineers possess sufficient robotic background to fully understand a robotic system. Hence, to fully exploit the benefits of FPGAs, advanced design automation tools are imperative to bridge this knowledge gap.
  • 53. 31 C H A P T E R 3 Perception on FPGAs – Deep Learning Cameras are widely used in intelligent robot systems because of their lightweight and rich in- formation for perception. Cameras can be used to complete a variety of basic tasks of intelligent robots, such as visual odometry (VO), place recognition, object detection, and recognition. With the development of convolutional neural networks (CNNs), we can reconstruct the depth and pose with the absolute scale directly from a monocular camera, making monocular VO more robust and efficient. And monocular VO methods, like Depth-VO-Feat [70], make robot sys- tems much easier to deploy than stereo ones. Furthermore, although there are previous works to design accelerators for robot applications, such as ESLAM [71], the accelerators can only be used for specific applications with poor scalability. In recent years, CNN has made great improvements on the place recognition for robotic perception. The accuracy of the place recognition code from another CNN-based method, GeM [72], is about 20% better than the handcrafted method, rootSIFT [73]. CNN is a general framework, which can be applied to a variety of robotic applications. With the help of CNN, the robots can also detect and distinguish objects from input images. In summary, CNNs greatly enhance robots’ ability in localization, place recognition, and many other perception tasks. CNNs have become the core component in various kinds of robots. However, since neural networks (NNs) are computationally intensive, deep learning models are often the performance bottleneck in robots. In this chapter, we delve into utilizing FPGAs to accelerate neural networks in various robotic workloads. Specifically, neural networks are widely adopted in regions like image, speech, and video recognition. What’s more, deep learning has made significant progress in solving robotic per- ception problems. But the high computation and storage complexity of neural network inference poses great difficulty in its application. CPUs are hard to offer enough computational capacity. GPUs are the first choice for the neural network process because of their high computational capacity and easy-to-use development frameworks but suffer from energy inefficiency. On the other hand, with specifically designed hardware, FPGAs are a potential candi- date to surpass GPUs in performance and energy efficiency. Various FPGA-based accelerators have been proposed with software and hardware optimization techniques to achieve high per- formance and energy efficiency. In this chapter, we give an overview of previous work on neural network inference accelerators based on FPGAs and summarize the main techniques used. An
  • 54. 32 3. PERCEPTION ON FPGAS – DEEP LEARNING

investigation from software to hardware, from circuit level to system level, is carried out for a complete analysis of FPGA-based deep learning accelerators and serves as a guide to future work.

Table 3.1: Performance and resource utilization of state-of-the-art neural network accelerator designs

                AlexNet [74]   VGG19 [78]   ResNet152 [81]   MobileNet [79]   ShuffleNet [80]
  Year          2012           2014         2016             2017             2017
  # Param       60M            144M         57M              4.2M             2.36M
  # Operation   1.4G           39G          22.6G            1.1G             0.27G
  Top-1 Acc.    61.0%          74.5%        79.3%            70.6%            67.6%

3.1 WHY CHOOSE FPGAS FOR DEEP LEARNING?
Recent research on neural networks demonstrates great improvements over traditional algorithms in machine learning. Various network models, such as CNNs and recurrent neural networks (RNNs), have been proposed for image, video, and speech processing. CNNs [74] raised the top-5 image classification accuracy on the ImageNet [75] dataset from 73.8% to 84.7% in 2012 and further improved object detection [76] with their outstanding ability in feature extraction. RNNs [77] achieve state-of-the-art word error rates on speech recognition. In general, NNs feature a high fitting ability to a wide range of pattern recognition problems. This ability makes NNs promising candidates for many artificial intelligence applications.

But the computation and storage complexity of NN models is high. In Table 3.1, we list the number of operations (add or multiplication), the number of parameters, and the top-1 accuracy on the ImageNet dataset [75] of state-of-the-art CNN models. Take CNNs as an example. The largest CNN model for 224 × 224 image classification requires up to 39 billion floating-point operations (FLOP) and more than 500 MB of model parameters [78]. As the computational complexity is proportional to the input image size, processing images with higher resolutions may need more than 100 billion operations. Recent works like MobileNet [79] and ShuffleNet [80] try to reduce the network size with advanced network structures, but with obvious accuracy loss. The balance between the size of NN models and accuracy is still an open question today. In some cases, the large model size hinders the application of NNs, especially in power-limited or latency-critical scenarios.

Therefore, choosing a proper computation platform for neural-network-based applications is essential. A typical CPU can perform 10–100 GFLOP per second, and its power efficiency is usually below 1 GOP/J. So CPUs can meet neither the high-performance requirements of cloud applications nor the low-power requirements of mobile applications. In contrast, GPUs
  • 62. and goodness. The rapid and habitual regard of certain facts or appearances in the visible world, as types of the attributes of God, can be nothing else but one great instance (or class of instances) of that law of association of ideas on which the second school of philosophy we have alluded to so largely insist. And thus, whether Mr Ruskin chooses to acquiesce in it or not, his Theoria resolves itself into a portion, or fragment, of that theory of association of ideas, to which he declares, and perhaps believes, himself to be violently opposed. In a very curious manner, therefore, has Mr Ruskin selected his materials from the two rival schools of metaphysics. His Æsthesis is an intuitive perception, but of a mere sensual or animal nature— sometimes almost confounded with the mere pleasure of sense, at other times advanced into considerable importance, as where he has to explain the fact that men of very little piety have a very acute perception of beauty. His Theoria is, and can be, nothing more than the results of human reason in its highest and noblest exercise, rapidly brought before the mind by a habitual association of ideas. For the lowest element of the beautiful he runs to the school of intuitions;—they will not thank him for the compliment;—for the higher to that analytic school, and that theory of association of ideas, to which throughout he is ostensibly opposed. This Theoria divides itself into two parts. We shall quote Mr Ruskin's own words and take care to quote from them passages where he seems most solicitous to be accurate and explanatory:— The first thing, then, we have to do, he says, is accurately to discriminate and define those appearances from which we are about to reason as belonging to beauty, properly so called, and to clear the ground of all the confused ideas and erroneous theories with which the misapprehension or metaphorical use of the term has encumbered it. By the term Beauty, then, properly are signified two things: first, that external quality of bodies, already so often spoken of,
  • 63. and which, whether it occur in a stone, flower, beast, or in man, is absolutely identical—which, as I have already asserted, may be shown to be in some sort typical of the Divine attributes, and which, therefore, I shall, for distinction's sake, call Typical Beauty; and, secondarily, the appearance of felicitous fulfilment of functions in living things, more especially of the joyful and right exertion of perfect life in man—and this kind of beauty I shall call Vital Beauty.—(P. 26.) The Vital Beauty, as well as the Typical, partakes essentially, as far as we can understand our author, of a religious character. On turning to that part of the volume where it is treated of at length, we find a universal sympathy and spirit of kindliness very properly insisted on, as one great element of the sentiment of beauty; but we are not permitted to dwell upon this element, or rest upon it a moment, without some reference to our relation to God. Even the animals themselves seem to be turned into types for us of our moral feelings or duties. We are expressly told that we cannot have this sympathy with life and enjoyment in other creatures, unless it takes the form of, or comes accompanied with, a sentiment of piety. In all cases where the beautiful is anything higher than a certain animal pleasantness, we are to understand that it has a religious character. In all cases, he says, summing up the functions of the Theoretic Faculty, it is something Divine; either the approving voice of God, the glorious symbol of Him, the evidence of His kind presence, or the obedience to His will by Him induced and supported,—(p. 126.) Now it is a delicate task, when a man errs by the exaggeration of a great truth or a noble sentiment, to combat his error; and yet as much mischief may ultimately arise from an error of this description as from any other. The thoughts and feelings which Mr Ruskin has described, form the noblest part of our sentiment of the beautiful, as they form the noblest phase of the human reason. But they are not the whole of it. The visible object, to adopt his phraseology, does become a type to the contemplative and pious mind of the attribute of God, and is thus exalted to our apprehension. But it is not
  • 64. beautiful solely or originally on this account. To assert this, is simply to falsify our human nature. Before, however, we enter into these types, or this typical beauty, it will be well to notice how Mr Ruskin deals with previous and opposing theories. It will be well also to remind our readers of the outline of that theory of association of ideas which is here presented to us in so very confused a manner. We shall then be better able to understand the very curious position our author has taken up in this domain of speculative philosophy. Mr Ruskin gives us the following summary of the errors which he thinks it necessary in the first place to clear from his path:— Those erring or inconsistent positions which I would at once dismiss are, the first, that the beautiful is the true; the second, that the beautiful is the useful; the third, that it is dependent on custom; and the fourth, that it is dependent on the association of ideas. The first of these theories, that the beautiful is the true, we leave entirely to the tender mercies of Mr Ruskin; we cannot gather from his refutation to what class of theorists he is alluding. The remaining three are, as we understand the matter, substantially one and the same theory. We believe that no one, in these days, would define beauty as solely resulting either from the apprehension of Utility, (that is, the adjustment of parts to a whole, or the application of the object to an ulterior purpose,) or from Familiarity and the affection which custom engenders; but they would regard both Utility and Familiarity as amongst the sources of those agreeable ideas or impressions, which, by the great law of association, became intimately connected with the visible object. We must listen, however, to Mr Ruskin's refutation of them:— That the beautiful is the useful is an assertion evidently based on that limited and false sense of the latter term which I have already deprecated. As it is the most degrading and
  • 65. dangerous supposition which can be advanced on the subject, so, fortunately, it is the most palpably absurd. It is to confound admiration with hunger, love with lust, and life with sensation; it is to assert that the human creature has no ideas and no feelings, except those ultimately referable to its brutal appetites. It has not a single fact, nor appearance of fact, to support it, and needs no combating—at least until its advocates have obtained the consent of the majority of mankind that the most beautiful productions of nature are seeds and roots; and of art, spades and millstones. Somewhat more rational grounds appear for the assertion that the sense of the beautiful arises from familiarity with the object, though even this could not long be maintained by a thinking person. For all that can be alleged in defence of such a supposition is, that familiarity deprives some objects which at first appeared ugly of much of their repulsiveness; whence it is as rational to conclude that familiarity is the cause of beauty, as it would be to argue that, because it is possible to acquire a taste for olives, therefore custom is the cause of lusciousness in grapes.... I pass to the last and most weighty theory, that the agreeableness in objects which we call beauty is the result of the association with them of agreeable or interesting ideas. Frequent has been the support and wide the acceptance of this supposition, and yet I suppose that no two consecutive sentences were ever written in defence of it, without involving either a contradiction or a confusion of terms. Thus Alison, 'There are scenes undoubtedly more beautiful than Runnymede, yet, to those who recollect the great event that passed there, there is no scene perhaps which so strongly seizes on the imagination,'—where we are wonder-struck at the bold obtuseness which would prove the power of imagination by its overcoming that very other power (of inherent beauty) whose existence the arguer desires; for the only logical conclusion
  • 66. which can possibly be drawn from the above sentence is, that imagination is not the source of beauty—for, although no scene seizes so strongly on the imagination, yet there are scenes 'more beautiful than Runnymede.' And though instances of self-contradiction as laconic and complete as this are rare, yet, if the arguments on the subject be fairly sifted from the mass of confused language with which they are always encumbered, they will be found invariably to fall into one of these two forms: either association gives pleasure, and beauty gives pleasure, therefore association is beauty; or the power of association is stronger than the power of beauty, therefore the power of association is the power of beauty. Now this last sentence is sheer nonsense, and only proves that the author had never given himself the trouble to understand the theory he so flippantly discards. No one ever said that association gives pleasure; but very many, and Mr Ruskin amongst the rest, have said that associated thought adds its pleasure to an object pleasing in itself, and thus increases the complex sentiment of beauty. That it is a complex sentiment in all its higher forms, Mr Ruskin himself will tell us. As to the manner in which he deals with Alison, it is in the worst possible spirit of controversy. Alison was an elegant, but not a very precise writer; it was the easiest thing in the world to select an unfortunate illustration, and to convict that of absurdity. Yet he might with equal ease have selected many other illustrations from Alison, which would have done justice to the theory he expounds. A hundred such will immediately occur to the reader. If, instead of a historical recollection of this kind, which could hardly make the stream itself of Runnymede look more beautiful, Alison had confined himself to those impressions which the generality of mankind receive from river scenery, he would have had no difficulty in showing (as we believe he has elsewhere done) how, in this case, ideas gathered from different sources flow into one harmonious and apparently simple feeling. That sentiment of beauty which arises as we look upon a river will be acknowledged by most persons to be composed of many associated thoughts, combining with the object before
  • 67. them. Its form and colour, its bright surface and its green banks, are all that the eye immediately gives us; but with these are combined the remembered coolness of the fluent stream, and of the breeze above it, and of the pleasant shade of its banks; and beside all this— as there are few persons who have not escaped with delight from town or village, to wander by the quiet banks of some neighbouring stream, so there are few persons who do not associate with river scenery ideas of peace and serenity. Now many of these thoughts or facts are such as the eye does not take cognisance of, yet they present themselves as instantaneously as the visible form, and so blended as to seem, for the moment, to belong to it. Why not have selected some such illustration as this, instead of the unfortunate Runnymede, from a work where so many abound as apt as they are elegantly expressed? As to Mr Ruskin's utilitarian philosopher, it is a fabulous creature—no such being exists. Nor need we detain ourselves with the quite departmental subject of Familiarity. But let us endeavour—without desiring to pledge ourselves or our readers to its final adoption—to relieve the theory of association of ideas from the obscurity our author has thrown around it. Our readers will not find that this is altogether a wasted labour. With Mr Ruskin we are of opinion that, in a discussion of this kind, the term Beauty ought to be limited to the impression derived, mediately or immediately, from the visible object. It would be useless affectation to attempt to restrict the use of the word, in general, to this application. We can have no objection to the term Beautiful being applied to a piece of music, or to an eloquent composition, prose or verse, or even to our moral feelings and heroic actions; the word has received this general application, and there is, at basis, a great deal in common between all these and the sentiment of beauty attendant on the visible object. For music, or sweet sounds, and poetry, and our moral feelings, have much to do (through the law of association) with our sentiment of the Beautiful. It is quite enough if, speaking of the subject of our analysis, we limit
  • 68. it to those impressions, however originated, which attend upon the visible object. One preliminary word on this association of ideas. It is from its very nature, and the nature of human life, of all degrees of intimacy —from the casual suggestion, or the case where the two ideas are at all times felt to be distinct, to those close combinations where the two ideas have apparently coalesced into one, or require an attentive analysis to separate them. You see a mass of iron; you may be said to see its weight, the impression of its weight is so intimately combined with its form. The light of the sun, and the heat of the sun are learnt from different senses, yet we never see the one without thinking of the other, and the reflection of the sunbeam seen upon a bank immediately suggests the idea of warmth. But it is not necessary that the combination should be always so perfect as in this instance, in order to produce the effect we speak of under the name of Association of Ideas. It is hardly possible for us to abstract the glow of the sunbeam from its light; but the fertility which follows upon the presence of the sun, though a suggestion which habitually occurs to reflective minds, is an association of a far less intimate nature. It is sufficiently intimate, however, to blend with that feeling of admiration we have when we speak of the beauty of the sun. There is the golden harvest in its summer beams. Again, the contemplative spirit in all ages has formed an association between the sun and the Deity, whether as the fittest symbol of God, or as being His greatest gift to man. Here we have an association still more refined, and of a somewhat less frequent character, but one which will be found to enter, in a very subtle manner, into that impression we receive from the great luminary. And thus it is that, in different minds, the same materials of thought may be combined in a closer or laxer relationship. This should be borne in mind by the candid inquirer. That in many instances ideas from different sources do coalesce, in the manner we have been describing, he cannot for an instant doubt. He seems to see the coolness of that river; he seems to see the warmth on that sunny bank. In many instances, however, he must make
  • 69. allowance for the different habitudes of life. The same illustration will not always have the same force to all men. Those who have cultivated their minds by different pursuits, or lived amongst scenery of a different character, cannot have formed exactly the same moral association with external nature. These preliminaries being adjusted, what, we ask, is that first original charm of the visible object which serves as the foundation for this wonderful superstructure of the Beautiful, to which almost every department of feeling and of thought will be found to bring its contribution? What is it so pleasurable that the eye at once receives from the external world, that round it should have gathered all these tributary pleasures? Light—colour—form; but, in reference to our discussion, pre-eminently the exquisite pleasure derived from the sense of light, pure or coloured. Colour, from infancy to old age, is one original, universal, perpetual source of delight, the first and constant element of the Beautiful. We are far from thinking that the eye does not at once take cognisance of form as well as colour. Some ingenious analysts have supposed that the sensation of colour is, in its origin, a mere mental affection, having no reference to space or external objects, and that it obtains this reference through the contemporaneous acquisition of the sense of touch. But there can be no more reason for supposing that the sense of touch informs us immediately of an external world than that the sense of colour does. If we do not allow to all the senses an intuitive reference to the external world, we shall get it from none of them. Dr Brown, who paid particular attention to this subject, and who was desirous to limit the first intimation of the sense of sight to an abstract sensation of unlocalised colour, failed entirely in his attempt to obtain from any other source the idea of space or outness; Kant would have given him certain subjective forms of the sensitive faculty, space and time. These he did not like: he saw that, if he denied to the eye an immediate perception of the external world, he must also deny it to the touch; he therefore prayed in aid certain muscular sensations from which the idea of resistance would be obtained. But it seems to us evident that not till
  • 70. after we have acquired a knowledge of the external world can we connect volition with muscular movement, and that, until that connection is made, the muscular sensations stand in the same predicament as other sensations, and could give him no aid in solving his problem. We cannot go further into this matter at present.[6] The mere flash of light which follows the touch upon the optic nerve represents itself as something without; nor was colour, we imagine, ever felt, but under some form more or less distinct; although in the human being the eye seems to depend on the touch far more than in other animals, for its further instruction. But although the eye is cognisant of form as well as colour, it is in the sensation of colour that we must seek the primitive pleasure derived from this organ. And probably the first reason why form pleases is this, that the boundaries of form are also the lines of contrast of colour. It is a general law of all sensation that, if it be continued, our susceptibility to it declines. It was necessary that the eye should be always open. Its susceptibility is sustained by the perpetual contrast of colours. Whether the contrast is sudden, or whether one hue shades gradually into another, we see here an original and primary source of pleasure. A constant variety, in some way produced, is essential to the maintenance of the pleasure derived from colour. It is not incumbent on us to inquire how far the beauty of form may be traceable to the sensation of touch;—a very small portion of it we suspect. In the human countenance, and in sculpture, the beauty of form is almost resolvable into expression; though possibly the soft and rounded outline may in some measure be associated with the sense of smoothness to the touch. All that we are concerned to show is, that there is here in colour, diffused as it is over the whole world, and perpetually varied, a beauty at once showered upon the visible object. We hear it said, if you resolve all into association, where will you begin? You have but a circle of feelings. If moral sentiment, for instance, be not itself the beautiful, why should it become so by association. There must be something else that is the beautiful, by association with which it passes for
  • 71. such. We answer, that we do not resolve all into association; that we have in this one gift of colour, shed so bountifully over the whole world, an original beauty, a delight which makes the external object pleasant and beloved; for how can we fail, in some sort, to love what produces so much pleasure? We are at a loss to understand how any one can speak with disparagement of colour as a source of the beautiful. The sculptor may, perhaps, by his peculiar education, grow comparatively indifferent to it: we know not how this may be; but let any man, of the most refined taste imaginable, think what he owes to this source, when he walks out at evening, and sees the sun set amongst the hills. The same concave sky, the same scene, so far as its form is concerned, was there a few hours before, and saddened him with its gloom; one leaden hue prevailed over all; and now in a clear sky the sun is setting, and the hills are purple, and the clouds are radiant with every colour that can be extracted from the sunbeam. He can hardly believe that it is the same scene, or he the same man. Here the grown-up man and the child stand always on the same level. As to the infant, note how its eye feeds upon a brilliant colour, or the living flame. If it had wings, it would assuredly do as the moth does. And take the most untutored rustic, let him be old, and dull, and stupid, yet, as long as the eye has vitality in it, will he look up with long untiring gaze at this blue vault of the sky, traversed by its glittering clouds, and pierced by the tall green trees around him. Is it any marvel now that round the visible object should associate tributary feelings of pleasure? How many pleasing and tender sentiments gather round the rose! Yet the rose is beautiful in itself. It was beautiful to the child by its colour, its texture, its softly-shaded leaf, and the contrast between the flower and the foliage. Love, and poetry, and the tender regrets of advanced life, have contributed a second dower of beauty. The rose is more to the youth and to the old man than it was to the child; but still to the last they both feel the pleasure of the child.
  • 72. The more commonplace the illustration, the more suited it is to our purpose. If any one will reflect on the many ideas that cluster round this beautiful flower, he will not fail to see how numerous and subtle may be the association formed with the visible object. Even an idea painful in itself may, by way of contrast, serve to heighten the pleasure of others with which it is associated. Here the thought of decay and fragility, like a discord amongst harmonies, increases our sentiment of tenderness. We express, we believe, the prevailing taste when we say that there is nothing, in the shape of art, so disagreeable and repulsive as artificial flowers. The waxen flower may be an admirable imitation, but it is a detestable thing. This partly results from the nature of the imitation; a vulgar deception is often practised upon us: what is not a flower is intended to pass for one. But it is owing still more, we think, to the contradiction that is immediately afterwards felt between this preserved and imperishable waxen flower, and the transitory and perishable rose. It is the nature of the rose to bud, and blossom, and decay; it gives its beauty to the breeze and to the shower; it is mortal; it is ours; it bears our hopes, our loves, our regrets. This waxen substitute, that cannot change or decay, is a contradiction and a disgust. Amongst objects of man's contrivance, the sail seen upon the calm waters of a lake or a river is universally felt to be beautiful. The form is graceful, and the movement gentle, and its colour contrasts well either with the shore or the water. But perhaps the chief element of our pleasure is all association with human life, with peaceful enjoyment—
  • 73. This quiet sail is as a noiseless wing, To waft me from distraction. Or take one of the noblest objects in nature—the mountain. There is no object except the sea and the sky that reflects to the sight colours so beautiful, and in such masses. But colour, and form, and magnitude, constitute but a part of the beauty or the sublimity of the mountain. Not only do the clouds encircle or rest upon it, but men have laid on it their grandest thoughts: we have associated with it our moral fortitude, and all we understand of greatness or elevation of mind; our phraseology seems half reflected from the mountain. Still more, we have made it holy ground. Has not God himself descended on the mountain? Are not the hills, once and for ever, the unwalled temples of our earth? And still there is another circumstance attendant upon mountain scenery, which adds a solemnity of its own, and is a condition of the enjoyment of other sources of the sublime—solitude. It seems to us that the feeling of solitude almost always associates itself with mountain scenery. Mrs Somerville, in the description which she gives or quotes, in her Physical Geography, of the Himalayas, says— The loftiest peaks being bare of snow gives great variety of colour and beauty to the scenery, which in these passes is at all times magnificent. During the day, the stupendous size of the mountains, their interminable extent, the variety and the sharpness of their forms, and, above all, the tender clearness of their distant outline melting into the pale blue sky, contrasted with the deep azure above, is described as a scene of wild and wonderful beauty. At midnight, when myriads of stars sparkle in the black sky, and the pure blue of the mountains looks deeper still below the pale white gleam of the earth and snow-light, the effect is of unparalleled sublimity, and no language can describe the splendour of the sunbeams at daybreak, streaming between the high peaks, and throwing their gigantic shadows on the mountains below. There, far above the habitation of man, no
  • 74. living thing exists, no sound is heard; the very echo of the traveller's footsteps startles him in the awful solitude and silence that reigns in those august dwellings of everlasting snow. No one can fail to recognise the effect of the last circumstance mentioned. Let those mountains be the scene of a gathering of any human multitude, and they would be more desecrated than if their peaks had been levelled to the ground. We have also quoted this description to show how large a share colour takes in beautifying such a scene. Colour, either in large fields of it, or in sharp contrasts, or in gradual shading—the play of light, in short, upon this world—is the first element of beauty. Here would be the place, were we writing a formal treatise upon this subject, after showing that there is in the sense of sight itself a sufficient elementary beauty, whereto other pleasurable reminiscences may attach themselves, to point out some of these tributaries. Each sense—the touch, the ear, the smell, the taste— blend their several remembered pleasures with the object of vision. Even taste, we say, although Mr Ruskin will scorn the gross alliance. And we would allude to the fact to show the extreme subtilty of these mental processes. The fruit which you think of eating has lost its beauty from that moment—it assumes to you a quite different relation; but the reminiscence that there is sweetness in the peach or the grape, whilst it remains quite subordinate to the pleasure derived from the sense of sight, mingles with and increases that pleasure. Whilst the cluster of ripe grapes is looked at only for its beauty, the idea that they are pleasant to the taste as well steals in unobserved, and adds to the complex sentiment. If this idea grow distinct and prominent, the beauty of the grape is gone—you eat it. Here, too, would be the place to take notice of such sources of pleasure as are derived from adaptation of parts, or the adaptation of the whole to ulterior purposes; but here especially should we insist on human affections, human loves, human sympathies. Here, in the heart of man, his hopes, his regrets, his affections, do we find the great source of the beautiful—tributaries which take their name
  • 75. from the stream they join, but which often form the main current. On that sympathy with which nature has so wonderfully endowed us, which makes the pain and pleasure of all other living things our own pain and pleasure, which binds us not only to our fellow-men, but to every moving creature on the face of the earth, we should have much to say. How much, for instance, does its life add to the beauty of the swan!—how much more its calm and placid life! Here, and on what would follow on the still more exalted mood of pious contemplation—when all nature seems as a hymn or song of praise to the Creator—we should be happy to borrow aid from Mr Ruskin; his essay supplying admirable materials for certain chapters in a treatise on the beautiful which should embrace the whole subject. No such treatise, however, is it our object to compose. We have said enough to show the true nature of that theory of association, as a branch of which alone is it possible to take any intelligible view of Mr Ruskin's Theoria, or Theoretic Faculty. His flagrant error is, that he will represent a part for the whole, and will distort and confuse everything for the sake of this representation. Viewed in their proper limitation, his remarks are often such as every wise and good man will approve of. Here and there too, there are shrewd intimations which the psychological student may profit by. He has pointed out several instances where the associations insisted upon by writers of the school of Alison have nothing whatever to do with the sentiment of beauty; and neither harmonise with, nor exalt it. Not all that may, in any way, interest us in an object, adds to its beauty. Thus, as Mr Ruskin we think very justly says, where we are told that the leaves of a plant are occupied in decomposing carbonic acid, and preparing oxygen for us, we begin to look upon it with some such indifference as upon a gasometer. It has become a machine; some of our sense of its happiness is gone; its emanation of inherent life is no longer pure. The knowledge of the anatomical structure of the limb is very interesting, but it adds nothing to the beauty of its outline. Scientific associations, however, of this kind, will have a different æsthetic effect, according to the degree or the enthusiasm with which the science has been studied.
  • 76. It is not our business to advocate this theory of association of ideas, but briefly to expound it. But we may remark that those who adopt (as Mr Ruskin has done in one branch of his subject—his Æsthesis) the rival theory of an intuitive perception of the beautiful, must find a difficulty where to insert this intuitive perception. The beauty of any one object is generally composed of several qualities and accessories—to which of these are we to connect this intuition? And if to the whole assemblage of them, then, as each of these qualities has been shown by its own virtue to administer to the general effect, we shall be explaining again by this new perception what has been already explained. Select any notorious instance of the beautiful—say the swan. How many qualities and accessories immediately occur to us as intimately blended in our minds with the form and white plumage of the bird! What were its arched neck and mantling wings if it were not living? And how the calm and inoffensive, and somewhat majestic life it leads, carries away our sympathies! Added to which, the snow-white form of the swan is imaged in clear waters, and is relieved by green foliage; and if the bird makes the river more beautiful, the river, in return, reflects its serenity and peacefulness upon the bird. Now all this we seem to see as we look upon the swan. To which of these facts separately will you attach this new intuition? And if you wait till all are assembled, the bird is already beautiful. We are all in the habit of reasoning on the beautiful, of defending our own tastes, and this just in proportion as the beauty in question is of a high order. And why do we do this? Because, just in proportion as the beauty is of an elevated character, does it depend on some moral association. Every argument of this kind will be found to consist of an analysis of the sentiment. Nor is there anything derogatory, as some have supposed, in this analysis of the sentiment; for we learn from it, at every step, that in the same degree as men become more refined, more humane, more kind, equitable, and pious, will the visible world become more richly clad with beauty. We see here an admirable arrangement, whereby the external world grows in beauty, as men grow in goodness.
  • 77. We must now follow Mr Ruskin a step farther into the development of his Theoria. All beauty, he tells us, is such, in its high and only true character, because it is a type of one or more of God's attributes. This, as we have shown, is to represent one class of associated thought as absorbing and displacing all the rest. We protest against this egregious exaggeration of a great and sacred source of our emotions. With Mr Ruskin's own piety we can have no quarrel; but we enter a firm and calm protest against a falsification of our human nature, in obedience to one sentiment, however sublime. No good can come of it—no good, we mean, to religion itself. It is substantially the same error, though assuming a very different garb, which the Puritans committed. They disgusted men with religion, by introducing it into every law and custom, and detail of human life. Mr Ruskin would commit the same error in the department of taste, over which he would rule so despotically: he is not content that the highest beauty shall be religious; he will permit nothing to be beautiful, except as it partakes of a religious character. But there is a vast region lying between the animal pleasantness of his Æsthesis and the pious contemplation of his Theoria. There is much between the human animal and the saint; there are the domestic affections and the love they spring from, and hopes, and regrets, and aspirations, and the hour of peace and the hour of repose—in short, there is human life. From all human life, as we have seen, come contributions to the sentiment of the beautiful, quite as distinctly traced as the peculiar class on which Mr Ruskin insists. If any one descanting upon music should affirm, that, in the first place, there was a certain animal pleasantness in harmony or melody, or both, but that the real essence of music, that by which it truly becomes music, was the perception in harmony or melody of types of the Divine attributes, he would reason exactly in the same manner on music as Mr Ruskin does on beauty. Nevertheless, although sacred music is the highest, it is very plain that there is other music than the sacred, and that all songs are not hymns.