Graph Analysis and Novel Architectures
Jason Riedy (all opinions my own, no plans)
Lucata Corporation / Emu Technology
Sparse Days, 24 November 2020
Monument aux Combattants de la Haute-Garonne
Graph Analysis v. Hardware Architecture
“We” want:
● Fine-grained memory access,
● fine-grained synchronization,
● sane floating-point (to be defined someday), and
● everything else that drives HW people nuts.
WHY NOT?
Graph Analysis v. Hardware Architecture
“It’s too hard.” Need wide memories, big cache lines, etc.
Nope.
Jeffrey Young, Eric Hein, Srinivas Eswar, Patrick Lavin, Jiajia Li, Jason Riedy, Richard Vuduc, and Thomas M. Conte. A Microbenchmark Characterization of the Emu Chick. Parallel Computing, September 2019. DOI 10.1016/j.parco.2019.04.012.
How? Being specific.
The Lucata / Emu architecture focuses on fine-grained memory access.
This really exists. And is PGAS. Because...
● No cache.
● The OS is handled by the “boring” part.
● Physically distributed memory.
● Many threads to tolerate…
● LOCAL LATENCIES.
○ Read remotely? MIGRATE.
○ Small context, one flit.
○ Plenty of references.
● Oh, and by the way…
○ Narrow channel DRAM: No wasting cache lines (so not using only ⅛ of the BW).
○ Memory-side processing.
○ Including floating-point accumulation.
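A rough way to see two of the points above, as a toy model rather than anything from the Lucata / Emu toolchain. The 8-byte word and 64-byte cache line sizes are assumptions chosen to reproduce the ⅛ figure, and the striped data layout, `owner`, and `count_migrations` are hypothetical names made up for illustration.

```python
"""Toy model only: all sizes, names, and the data layout are assumptions."""
from typing import Iterable


def useful_bandwidth_fraction(word_bytes: int = 8, line_bytes: int = 64) -> float:
    """Fraction of transferred bytes used when only one word per fetched line is needed."""
    return word_bytes / line_bytes


def owner(index: int, num_nodes: int) -> int:
    """Hypothetical striped layout: element i lives on memory node i % num_nodes."""
    return index % num_nodes


def count_migrations(accesses: Iterable[int], num_nodes: int, start_node: int = 0) -> int:
    """Count how often a thread moves so that every read it issues is local."""
    here = start_node
    migrations = 0
    for index in accesses:
        home = owner(index, num_nodes)
        if home != here:  # remote read: the thread context moves to the data
            migrations += 1
            here = home
    return migrations


if __name__ == "__main__":
    print(useful_bandwidth_fraction())                          # 0.125, i.e. 1/8
    print(count_migrations([0, 8, 16, 1, 2, 2], num_nodes=8))   # 2
```

In this model a scattered access pattern costs migrations instead of mostly unused cache-line transfers, which is why a small thread context matters.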
Not the only idea out there.
● MetaStrider
● Maybe embed sparse gathers in memory (CAMs)...
● 5.3x energy savings
● 11% performance boost
Sriseshan Srikanth, Anirudh Jain, Joseph M. Lennon, Thomas M. Conte, Erik Debenedictis, and Jeanine Cook. 2019. MetaStrider: Architectures for Scalable Memory-centric Reduction of Sparse Data Streams. ACM Trans. Archit. Code Optim. 16, 4, Article 35 (January 2020), 26 pages. DOI: https://doi.org/10.1145/3355396
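For readers unfamiliar with the operation named in the paper title, here is a minimal software sketch of reducing a sparse data stream: (index, value) pairs arrive in arbitrary order and values sharing an index are summed. This is not MetaStrider's hardware design; `reduce_sparse_stream` is a hypothetical name, and the dictionary merely stands in for whatever memory-side structure does the matching.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def reduce_sparse_stream(stream: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Sum the values of all stream entries that share the same index."""
    totals: Dict[int, float] = defaultdict(float)
    for index, value in stream:
        totals[index] += value  # lookup by key, then accumulate
    return dict(totals)


if __name__ == "__main__":
    print(reduce_sparse_stream([(3, 1.0), (1, 2.0), (3, 0.5)]))
```

Each update is a lookup by key followed by an accumulate, the kind of fine-grained, data-dependent access the earlier slides complain about.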
Totally nuts ideas...
What if...
● You could have a hardware dataflow architecture?
Borrowed from Cerebras Systems, Inc.
● You could have “infinite” storage with logic?
A Rogues Gallery photo!
● You could have programmable analog devices?
○ Neuromorphic? Waiting on the recount.
A Rogues Gallery photo!
The crazy thing is that all these exist.
So how are we taking advantage?
I apologize to the non-US folks. I only know our labs with testbeds:
● DoE: ORNL, LBNL, ANL, SNL (Sandia, not Saturday Night), …
● NSF: Georgia Tech’s Rogues Gallery, others…
● A64FX came from Japan / England.
● My preferred baseline: RISC-V
○ (because you can bolt anything alongside)
No, really, go out and play!
Those ideas from the 80s and before? YUP!
BTW, there are open foundries now…
No reason why algorithms folks should be quiet.
My photos are thanks to the Franco-Berkeley Fund.
