Error Permissive Computing: a New Approach
for Post Moore’s Computer System Design
[Overview figure: research layers]
Applications (Deep Learning, Graph Processing, etc.)
③ System Software (BITFLEX, etc.)
② Emulation/Simulation (RAMinate, mesmeric, etc.)
① Error Modeling of Devices
Logic / Network / Memory
Abstract: We are exploring a new concept, error permissive computing, that improves
capability and capacity while drastically reducing power consumption. More specifically, we
allow hardware errors in a controlled manner and develop system software that assures acceptable
computational results. General-purpose error correction in hardware has a cost: for example, it can
increase latency and reduce capacity. By taking a holistic approach across the layers from hardware
to software, we perform lightweight, appropriate error correction at the software layer and
eliminate general-purpose error correction in the hardware layer.
Ryousei Takano, Takahiro Hirofuchi, Mohamed Wahib,
Truong Thao Nguyen, Hiroki Kanezashi, Akram Ben Ahmed
National Institute of Advanced Industrial Science and Technology
The 2nd R-CCS International Symposium, Kobe, February 2020
References
[1] R. Barton, et al. “BITFLEX: A Dynamic Runtime Library for Bit-Level Precision Manipulation and Approximate Computing,” HPC Asia 2020.
[2] T. Hirofuchi, et al. “Prototype of an FPGA-based Emulation Mechanism for Next-generation Memory” (in Japanese), IPSJ SIGHPC171, 2019.
[3] T. Nguyen, et al. “Topology-aware Sparse Allreduce for Large-scale Deep Learning,” IEEE IPCCC 2019.
BITFLEX Full Stack
(OpenMP Extension)
ADAPT Case Study: Pi Accumulator
• We require an attractive means of boosting performance and maintaining accuracy in non-deterministic applications.
• Solution: the BITFLEX framework, incorporated in the MCXX compiler.
• We propose an extension of OpenMP as follows:
#pragma omp nondeter <parameters>
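The poster does not list the parameters of the proposed pragma or the BITFLEX API, so the following is only a Python sketch of the general idea behind a pi accumulator under bit-level precision reduction. It assumes that precision is reduced by zeroing low mantissa bits of the running sum; the names `truncate_mantissa` and `pi_accumulate` are hypothetical and are not part of BITFLEX.

```python
import math
import struct


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Zero the low (52 - kept_bits) mantissa bits of an IEEE 754 double."""
    if not 0 <= kept_bits <= 52:
        raise ValueError("kept_bits must be between 0 and 52")
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - kept_bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF  # sign and exponent untouched
    (out,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
    return out


def pi_accumulate(steps: int, kept_bits: int = 52) -> float:
    """Midpoint-rule integration of 4 / (1 + x^2) on [0, 1].

    The running sum is truncated after every addition (an assumption made
    for illustration, standing in for a reduced-precision accumulator).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total = truncate_mantissa(total + 4.0 / (1.0 + x * x), kept_bits)
    return total * h


if __name__ == "__main__":
    for bits in (52, 23, 10):
        approx = pi_accumulate(10_000, bits)
        print(f"{bits:2d} mantissa bits: {approx:.10f}  "
              f"error {abs(approx - math.pi):.2e}")
```

Running the script shows how the error of the accumulated value grows as fewer mantissa bits are kept, which is the accuracy side of the performance and accuracy trade-off named in the bullets above.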
Analysis and modeling of bit-flip errors
in voltage-driven MRAM
• The write error ratio differs from cell to cell because of the variation of magnetic anisotropy (σ).
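A toy illustration of per-cell write error ratios follows. The poster does not give the device model, so the log-normal spread, the nominal ratio, and the function names `sample_cell_wer` and `write_word` are all assumptions chosen only to show what "a different write error ratio per cell" means for a stored word.

```python
import math
import random


def sample_cell_wer(n_cells, nominal_wer, sigma, rng):
    """Draw one write error ratio per cell.

    Assumption: a log-normal spread around nominal_wer, controlled by sigma.
    This is not the device model from the poster.
    """
    return [
        min(1.0, nominal_wer * math.exp(rng.gauss(0.0, sigma)))
        for _ in range(n_cells)
    ]


def write_word(value, cell_wer, rng):
    """Write an integer bit by bit; bit i flips with probability cell_wer[i]."""
    stored = value
    for bit, wer in enumerate(cell_wer):
        if rng.random() < wer:
            stored ^= 1 << bit
    return stored


if __name__ == "__main__":
    rng = random.Random(42)
    wer = sample_cell_wer(32, nominal_wer=1e-2, sigma=1.0, rng=rng)
    trials = 10_000
    bad = sum(write_word(0xDEADBEEF, wer, rng) != 0xDEADBEEF for _ in range(trials))
    print(f"weakest cell WER {max(wer):.3e}, strongest {min(wer):.3e}")
    print(f"{bad} of {trials} word writes had at least one flipped bit")
```

With `sigma` set to 0 every cell has the same ratio; increasing it concentrates errors in a few weak cells, which is the kind of non-uniformity that software-level mitigation could exploit.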
FPGA-based new memory device emulator [2]
• Emulate the behavior of new memory devices (latency, bandwidth, bit error ratio) with high accuracy.
• Enable detailed performance evaluation of new system software mechanisms.
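The three emulated properties named above can be pictured with a small software-only model. This is a sketch under stated assumptions, not the FPGA design from [2]: the class `EmulatedMemory`, its cost formula (fixed latency plus size divided by bandwidth), and the choice to inject bit errors on reads are all hypothetical.

```python
import random


class EmulatedMemory:
    """Toy model of a memory device with latency, bandwidth, and bit error ratio."""

    def __init__(self, size, latency_s, bandwidth_bytes_per_s, bit_error_ratio, seed=0):
        self.data = bytearray(size)
        self.latency_s = latency_s
        self.bandwidth_bytes_per_s = bandwidth_bytes_per_s
        self.bit_error_ratio = bit_error_ratio
        self.elapsed_s = 0.0  # accumulated simulated access time
        self._rng = random.Random(seed)

    def _access(self, offset, nbytes):
        if offset < 0 or offset + nbytes > len(self.data):
            raise IndexError("access outside emulated memory")
        # Assumed cost model: fixed latency plus transfer time.
        self.elapsed_s += self.latency_s + nbytes / self.bandwidth_bytes_per_s

    def write(self, offset, payload):
        self._access(offset, len(payload))
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset, nbytes):
        self._access(offset, nbytes)
        out = bytearray(self.data[offset:offset + nbytes])
        # Each returned bit flips independently with probability bit_error_ratio.
        for i in range(nbytes * 8):
            if self._rng.random() < self.bit_error_ratio:
                out[i // 8] ^= 1 << (i % 8)
        return bytes(out)


if __name__ == "__main__":
    mem = EmulatedMemory(1024, latency_s=300e-9,
                         bandwidth_bytes_per_s=2e9, bit_error_ratio=1e-3)
    mem.write(0, bytes(range(256)))
    got = mem.read(0, 256)
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(got, bytes(range(256))))
    print(f"{flipped} flipped bits, simulated time {mem.elapsed_s * 1e6:.3f} us")
```

System software under test would run against such a model with different parameter settings to see how its mitigation mechanisms respond to slower or less reliable devices.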
BITFLEX: A framework to enable
error permissive computing [1]
Sparse communication
✓ 100x-1000x compression
✓ Reduces communication time by about 40% more
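Sparse communication sends only a small fraction of the gradient entries. The poster does not say how entries are selected, so the sketch below assumes top-k selection by magnitude, a common choice; `sparsify_top_k` and `densify` are hypothetical names. The 0.78% figure quoted with the simulation result is close to 1/128 (0.78125%), which is the density used in the example.

```python
def sparsify_top_k(grad, density):
    """Keep the k largest-magnitude entries; return (indices, values).

    Assumption: top-k by absolute value. The selection rule used in [3]
    is not stated on the poster.
    """
    k = max(1, int(len(grad) * density))
    idx = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx.sort()  # ascending indices make the message easy to merge
    return idx, [grad[i] for i in idx]


def densify(indices, values, length):
    """Rebuild a dense vector, filling dropped entries with zero."""
    out = [0.0] * length
    for i, v in zip(indices, values):
        out[i] = v
    return out


if __name__ == "__main__":
    grad = [0.0] * 256
    grad[3], grad[7], grad[100] = -5.0, 0.1, 4.0
    idx, vals = sparsify_top_k(grad, density=1 / 128)
    print(idx, vals)                # the two largest-magnitude entries
    print(len(grad) / len(vals))    # ratio of dense to kept values
```

Keeping one value in 128 shrinks the value payload by about 128x before the cost of sending indices is counted, which falls inside the 100x-1000x range listed above.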
Topology-aware Allreduce
✓ Reduces communication time by up to 45%
✓ Reduces communication power consumption by up to 23%
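The plot legend names the topology-aware variant "ring-ring". The poster does not describe the algorithm, so the following is a simplified in-memory simulation of one plausible reading: a ring allreduce inside each group of ranks (for example, one group per node), followed by ring allreduces across groups. The function names and the grouping by consecutive rank are assumptions; a production version would typically exchange only shards between groups.

```python
def ring_allreduce(bufs):
    """In-place sum allreduce over a simulated ring; bufs[r] is rank r's vector."""
    p, n = len(bufs), len(bufs[0])
    if p == 1:
        return
    bounds = [(c * n // p, (c + 1) * n // p) for c in range(p)]

    def chunk(rank, c):
        lo, hi = bounds[c % p]
        return lo, hi, bufs[rank][lo:hi]  # the slice is a copy (the "message")

    # Reduce-scatter: after p - 1 steps rank r holds the full sum of chunk r + 1.
    for step in range(p - 1):
        msgs = [chunk(r, r - step) for r in range(p)]
        for r, (lo, hi, data) in enumerate(msgs):
            dst = bufs[(r + 1) % p]
            for i, v in zip(range(lo, hi), data):
                dst[i] += v
    # Allgather: circulate the finished chunks around the ring.
    for step in range(p - 1):
        msgs = [chunk(r, r + 1 - step) for r in range(p)]
        for r, (lo, hi, data) in enumerate(msgs):
            bufs[(r + 1) % p][lo:hi] = data


def ring_ring_allreduce(bufs, group_size):
    """Two-level allreduce: rings inside groups, then rings across groups."""
    p = len(bufs)
    if p % group_size:
        raise ValueError("number of ranks must be a multiple of group_size")
    groups = [list(range(g, g + group_size)) for g in range(0, p, group_size)]
    # Level 1: every rank ends up with its group's partial sum.
    for ranks in groups:
        ring_allreduce([bufs[r] for r in ranks])
    # Level 2: ranks with the same local index combine the group sums.
    for local in range(group_size):
        ring_allreduce([bufs[ranks[local]] for ranks in groups])


if __name__ == "__main__":
    bufs = [[r, 2 * r, 1] for r in range(8)]
    ring_ring_allreduce(bufs, group_size=4)
    print(bufs[0], bufs[7])
```

The point of the two levels is that most traffic stays inside a group, where links are usually faster, and only the second stage crosses the slower inter-group network.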
[Figure: communication time per iteration (s), from 0 to 0.06, versus number of processes (4, 8, 16, 32, 64) for Baseline (ring), Topology-aware (ring-ring), and Topology-aware + Sparse. Simulated result with the ABCI system, 32 MB message, 0.78% sparsification.]
Accelerating communication for large-scale deep learning [3]
[Figure labels: Reliable / Unreliable Memory; Operating System; Object Analysis and Tracking; Programming Runtime; …; Error Mitigation; Low ← Bit-flip tolerance → High; lower is better]
