Outline
• Non-Uniform Cache Architecture (NUCA)
• Cache Coherence
• Implementation of directories in multicore architectures
Non-Uniform Cache Architecture [1]
• Uniform Cache Architecture
▫ Multi-level cache hierarchies
 Organized into a few discrete levels
 Each level reduces access to the lower level
 Inclusion overhead
 Internal wire delays
 Restricted number of ports
▫ Large on-chip cache
 Single and discrete hit latency
 Undesirable due to increasing wire delays
Non-Uniform Cache Architecture [1]
• Non-uniform cache architecture (NUCA)
▫ Exploit non-uniformity
 In a large cache, data residing closer to the processor is accessed
faster than data residing physically farther away
Level-2 cache architectures, 16 MB at 50 nm technology (taken from [1])
Non-Uniform Cache Architecture [1]
• Static NUCA
▫ Each bank can be accessed at different speeds
 Proportional to the distance from the controller
 Lower latency when closer to controller
▫ Mapping of data into banks based on block index
▫ Banks are independently addressable
▫ Access to banks may proceed in parallel
 Banks have private channels
▫ Requires a large number of wires
▫ Bank access time and routing delay increase as technology scales down
 Best organization at smaller technologies uses larger
banks
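
The static mapping and distance-dependent latency described above can be sketched as follows. This is a minimal illustration, not the design from [1]: the block size, bank count, and latency constants are assumed values, and the banks are assumed to be numbered by increasing distance from the controller.

```python
BLOCK_SIZE = 64      # bytes per cache block (assumed value)
NUM_BANKS = 16       # number of banks (assumed value)
BASE_LATENCY = 3     # cycles to access a bank itself (assumed value)
CYCLES_PER_STEP = 1  # extra wire delay per unit of distance (assumed value)


def snuca_bank(address: int) -> int:
    """Static mapping: the low-order bits of the block index select the bank."""
    block_index = address // BLOCK_SIZE
    return block_index % NUM_BANKS


def snuca_latency(bank: int) -> int:
    """Latency grows with the bank's distance from the controller.

    Bank 0 is assumed to be the closest bank, so the bank number
    doubles as the distance.
    """
    return BASE_LATENCY + CYCLES_PER_STEP * bank


bank = snuca_bank(0x12345)
print(bank, snuca_latency(bank))  # bank 13, 16 cycles under the assumed constants
```

Because the bank is a pure function of the address, a block can never move to a faster bank. Dynamic NUCA removes that restriction.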
Non-Uniform Cache Architecture [1]
Static NUCA design (taken from [1])
Non-Uniform Cache Architecture [1]
• Switched Static NUCA
▫ 2D Mesh, point-to-point links
▫ Removes most of the large number of wires
▫ Allows a large number of faster, smaller banks
• Dynamic NUCA
▫ Allows data to be mapped to many banks
▫ Allows data to migrate among the banks
▫ Frequently used data can be promoted to faster
banks
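
One possible migration scheme can be sketched as below, assuming a block is inserted in the slowest bank on a miss and swapped one bank closer on each hit. The class name BankSet and the one-block-per-bank simplification are illustrative assumptions, not details taken from the slides.

```python
class BankSet:
    """One set of banks, ordered from fastest (index 0) to slowest."""

    def __init__(self, num_banks: int):
        # One block tag per bank, to keep the sketch small.
        self.banks = [None] * num_banks

    def find(self, tag):
        """Return the index of the bank holding `tag`, or None on a miss."""
        return self.banks.index(tag) if tag in self.banks else None

    def access(self, tag) -> bool:
        """Look up `tag` and return True on a hit.

        Hit:  swap the block with the one in the next-faster bank.
        Miss: place the block in the slowest bank, evicting its content.
        """
        pos = self.find(tag)
        if pos is None:
            self.banks[-1] = tag
            return False
        if pos > 0:
            self.banks[pos - 1], self.banks[pos] = self.banks[pos], self.banks[pos - 1]
        return True


banks = BankSet(4)
banks.access("A")          # miss: A enters the slowest bank
for _ in range(3):
    banks.access("A")      # each hit moves A one bank closer
print(banks.find("A"))     # A now sits in the fastest bank (index 0)
```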
Non-Uniform Cache Architecture [1]
Switched NUCA design (taken from [1])
Non-Uniform Cache Architecture [2]
• Policies
▫ Bank placement policy
 Determines where data is placed in the NUCA cache memory
▫ Bank access policy
 Determines bank-searching algorithm
▫ Bank migration policy
 Determines if a data element is allowed to change its
placement from one bank to another
 Regulates migration of data
▫ Bank replacement policy
 Determines how the NUCA behaves when data is evicted from
one of the banks
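
To make the bank access policy concrete, the sketch below contrasts two search strategies: probing banks one at a time, nearest first, and probing all banks at once. The function names and the latency model are assumptions for illustration; real policies also weigh energy and network traffic, which this sketch ignores.

```python
def incremental_search(bank_tags, bank_latency, tag):
    """Probe banks one at a time, nearest first; latencies accumulate.

    Returns (hit, total_cycles).
    """
    elapsed = 0
    for tags, latency in zip(bank_tags, bank_latency):
        elapsed += latency
        if tag in tags:
            return True, elapsed
    return False, elapsed


def multicast_search(bank_tags, bank_latency, tag):
    """Probe all banks in parallel.

    A hit costs only the latency of the bank that hits; a miss is known
    once the slowest bank has answered. Returns (hit, total_cycles).
    """
    for tags, latency in zip(bank_tags, bank_latency):
        if tag in tags:
            return True, latency
    return False, max(bank_latency)


bank_tags = [{"A"}, {"B"}, {"C"}]   # contents of three banks, nearest first
bank_latency = [2, 4, 6]            # assumed cycles per bank
print(incremental_search(bank_tags, bank_latency, "C"))
print(multicast_search(bank_tags, bank_latency, "C"))
```

Under this model the parallel search finds a far block sooner, at the price of probing every bank on every access.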
Taken from [2]
Non-Uniform Cache Architecture [2]
Cache Coherence
• Cache-coherence problem
• Support for a large number of processors
▫ Need for high bandwidth
▫ Bus architecture insufficient
• Point-to-Point networks
▫ No broadcast mechanism
▫ Snooping protocol unusable
• Directory
▫ Solution for point-to-point networks
▫ Stores the location of the cached copies of each block of data
▫ Centralized or distributed
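
A minimal sketch of what a directory tracks is shown below, assuming a simple invalidate-on-write scheme. The class and method names are hypothetical, and the sketch omits the states, acknowledgements, and data transfers a real protocol needs.

```python
class Directory:
    """Tracks which cores hold a copy of each block."""

    def __init__(self):
        self.sharers = {}  # block address -> set of core ids holding a copy

    def read(self, core: int, block: int) -> None:
        """Record that `core` now holds a copy of `block`."""
        self.sharers.setdefault(block, set()).add(core)

    def write(self, core: int, block: int) -> list:
        """Make `core` the only holder of `block`.

        Returns the cores whose copies must be invalidated. The directory
        sends them point-to-point messages, so no broadcast is needed.
        """
        to_invalidate = sorted(self.sharers.get(block, set()) - {core})
        self.sharers[block] = {core}
        return to_invalidate


directory = Directory()
directory.read(0, 0x40)
directory.read(2, 0x40)
print(directory.write(1, 0x40))  # cores 0 and 2 must invalidate their copies
```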
Implementation of directories in
multicore architectures [3]
• DRAM (off-chip) directory
▫ Stores directory information in DRAM
 Ex: full-map protocol
▫ Does not exploit distance locality
▫ Treats each tile as a potential sharer of data
▫ Directory can be cached in on-chip SRAM
 Avoids accessing off-chip memory on every lookup
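
Because a full-map directory treats every tile as a potential sharer, each block carries one presence bit per tile. The sketch below estimates the resulting storage overhead. It counts only the sharer bits and ignores state bits, and the tile counts and block size in the usage lines are assumed example values.

```python
def full_map_overhead(num_tiles: int, block_bytes: int) -> float:
    """Sharer-vector bits per block divided by data bits per block."""
    sharer_bits = num_tiles          # one presence bit per tile
    data_bits = block_bytes * 8
    return sharer_bits / data_bits


print(full_map_overhead(16, 64))   # 16 tiles, 64-byte blocks
print(full_map_overhead(64, 64))   # 64 tiles, 64-byte blocks
```

The overhead grows linearly with the number of tiles, which is one reason the directory is kept in DRAM and only cached on chip.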
Implementation of directories in
multicore architectures [3]
Taken from [3]
Implementation of directories in
multicore architecture [4]
• DRAM (off-chip) directory with directory caches
▫ Private cache
▫ Directory is cached in each tile
 Avoids accessing off-chip memory on every lookup
 Non-coherent caches
 Home node for any given cache line
 Different range of memory address for each tile
▫ Directory controller in each tile
 Controls coherency between private caches
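
The idea that each tile is the home node for its own range of memory addresses can be sketched as below. The equal-sized contiguous ranges and the function name are assumptions for illustration; the scheme in [4] may partition memory differently.

```python
def home_tile_by_range(address: int, num_tiles: int, memory_bytes: int) -> int:
    """Return the tile whose address range contains `address`.

    Memory is assumed to be split into equal contiguous ranges, one per tile.
    """
    if not 0 <= address < memory_bytes:
        raise ValueError("address outside physical memory")
    range_size = memory_bytes // num_tiles
    # Clamp so that leftover bytes (when the split is uneven) go to the last tile.
    return min(address // range_size, num_tiles - 1)


MEMORY = 1 << 20  # 1 MiB of memory, assumed for the example
print(home_tile_by_range(0, 4, MEMORY))
print(home_tile_by_range(MEMORY - 1, 4, MEMORY))
```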
Implementation of directories in
multicore architecture [4]
Taken from [4]
Implementation of directories in
multicore architectures [3]
• Duplicate tag directory
▫ Directory centrally located in SRAM
▫ Connected to individual cores
▫ Exact duplicate tag store
 Directory state for a block is determined by examining the
copied tags of every cache that can hold the
block
 Keep copied tags up-to-date
▫ No longer needs to read directory state from DRAM
▫ Challenging as the number of cores increases
 64 cores × 16-way associative caches = aggregate
associativity of 1024 across all tiles
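
A duplicate tag lookup can be sketched as a search over the mirrored tags of every core for one set index. The data layout (dup_tags[core][set_index] as a list of tags, one per way) and the function names are assumptions for illustration. Hardware would compare all tags in parallel, which is what makes the aggregate associativity costly.

```python
def find_sharers(dup_tags, set_index: int, tag: int) -> list:
    """Return the cores whose mirrored tags for `set_index` contain `tag`."""
    sharers = []
    for core, core_sets in enumerate(dup_tags):
        if tag in core_sets[set_index]:
            sharers.append(core)
    return sharers


def tags_compared_per_lookup(num_cores: int, ways: int) -> int:
    """Number of tags examined per lookup, i.e. the aggregate associativity."""
    return num_cores * ways


# Three cores, two sets, two ways each; None marks an empty way.
dup_tags = [
    [[0x1A, 0x2B], [0x3C, None]],
    [[0x2B, None], [None, None]],
    [[None, None], [0x3C, 0x1A]],
]
print(find_sharers(dup_tags, 0, 0x2B))     # cores 0 and 1 hold the block
print(tags_compared_per_lookup(64, 16))    # 1024, matching the slide
```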
Implementation of directories in
multicore architectures [3]
Taken from [3]
Implementation of directories in
multicore architecture [5]
Directory memory, 4-way associative caches (taken from [5])
Implementation of directories in
multicore architectures [3]
• Static cache bank directory
▫ Distributed directory among the tiles
 Mapping block address to a tile (called the home tile)
 Home tiles selected by simple interleaving
 Location can be sub-optimal (see next slide)
 Tile’s cache extended to contain directory
information
 Integrates directory states with cache tags
 Avoids a separate SRAM or DRAM directory
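
Home-tile selection by simple interleaving, and why the resulting location can be sub-optimal, can be sketched as below. The modulo mapping, the row-major tile numbering, and the Manhattan-distance hop count on a 2D mesh are assumptions for illustration.

```python
def home_tile_interleaved(block_address: int, num_tiles: int) -> int:
    """Simple interleaving: consecutive blocks map to consecutive tiles."""
    return block_address % num_tiles


def mesh_hops(tile_a: int, tile_b: int, mesh_width: int) -> int:
    """Manhattan distance between two tiles numbered row by row."""
    ax, ay = tile_a % mesh_width, tile_a // mesh_width
    bx, by = tile_b % mesh_width, tile_b // mesh_width
    return abs(ax - bx) + abs(ay - by)


# A 4x4 mesh (16 tiles); the requester is tile 0 in one corner.
for block in (15, 16):
    home = home_tile_interleaved(block, 16)
    print(block, home, mesh_hops(0, home, 4))
```

Block 15 lands on the opposite corner of the mesh, 6 hops from the requester, while block 16 is local. The mapping takes no account of which tile actually uses the data.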
Implementation of directories in
multicore architectures [3,6]
Taken from [3]
Taken from [6]
Implementation of directories in
multicore architecture [7]
• SGI Origin2000 multiprocessor system
▫ Directory memory connected to on-chip memory
 Shared L2 cache
 Directory memory distributed over multiple tiles
 Cache coherence controller
 Home tile sends appropriate messages to cores
Implementation of directories in
multicore architecture [7]
SGI Origin2000 multiprocessor system (taken from [7])
Implementation of directories in
multicore architecture [8]
• Tilera Tile64 architecture
▫ 2D mesh network (8×8)
▫ Provides coherent shared-memory environment
▫ Uses neighborhood caching
 Provides on-chip distributed shared cache
▫ Coherency is maintained at the home tile
 Data is not cached at non-home tiles
▫ Communication over a Tile Dynamic Network
Implementation of directories in
multicore architecture [9]
Tilera Tile64 (taken from [9])
References
• [1] C. Kim, D. Burger, S.W. Keckler, “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches”, in Proc. 10th Int. Conf. ASPLOS, San Jose, CA, 2002, pp. 1-12
• [2] J. Lira, C. Molina, A. Gonzalez, “Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec Benchmark Suite”, MMCS’09, Mar. 2009, pp. 1-8
• [3] M.R. Marty, M.D. Hill, “Virtual Hierarchies to Support Server Consolidation”, ISCA’07, June 2007, pp. 1-11
• [4] J.A. Brown, R. Kumar, D. Tullsen, “Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures”, SPAA’07, June 2007, pp. 1-9
• [5] J. Chang, G.S. Sohi, “Cooperative Caching for Chip Multiprocessors”, ISCA’06, 33rd International Symposium on Computer Architecture, 2006, pp. 264-276
• [6] S. Cho, L. Jin, “Managing Distributed, Shared L2 Caches through OS-Level Page Allocation”, MICRO-39, 39th Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 2006, pp. 455-468
• [7] H. Lee, S. Cho, B.R. Childers, “PERFECTORY: A Fault-Tolerant Directory Memory Architecture”, IEEE Transactions on Computers, vol. 59, no. 5, May 2010, pp. 638-650
• [8] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.C. Miao, J.F. Brown, A. Agarwal, “On-Chip Interconnection Architecture of the Tile Processor”, IEEE Micro, vol. 27, no. 5, Sept.-Oct. 2007, pp. 15-31
• [9] Linux Devices, “4-way chip gains Linux IDE, dev cards, design wins” [online], Linux Devices, Apr. 2008 [cited Oct. 21, 2010], available from World Wide Web: <http://thing1.linuxdevices.com/news/NS4811855366.html>


Directory based cache coherence


Editor's Notes

  • #3: [1] ftp://ftp.cs.utexas.edu/pub/dburger/papers/ASPLOS02.pdf
  • #9: [2] http://www.cercs.gatech.edu/mmcs09/papers/lira.pdf
  • #12: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #13: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #14: http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #15: [4] http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #16: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #17: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #18: [5] http://pages.cs.wisc.edu/~mscalar/papers/2006/isca2006-coop-caching.pdf
  • #19: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #20: 1- http://www.cs.pitt.edu/cast/papers/cho-micro06.pdf 2- http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #21: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #22: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #23: [8] http://www.ieeexplore.ieee.org.proxy.bib.uottawa.ca/stamp/stamp.jsp?tp=&arnumber=4378780
  • #24: [9] http://www.linuxfordevices.com/files/misc/tilera_tile64_arch_diag2.gif