Outline
• Non-Uniform Cache Architecture (NUCA)
• Cache Coherence
• Implementation of directories in multicore
architecture
Non-Uniform Cache Architecture [1]
• Uniform Cache Architecture
▫ Multi-level cache hierarchies
 Organized into a few discrete levels
 Each level reduces access to the lower level
 Inclusion overhead
 Internal wire delays
 Restricted number of ports
▫ Large on-chip cache
 Single, discrete hit latency
 Undesirable because wire delays keep increasing
Non-Uniform Cache Architecture [1]
• Non-uniform cache architecture (NUCA)
▫ Exploit non-uniformity
 In a large cache, data residing closer to the processor is accessed
faster than data residing physically farther away
Level-2 cache architectures, 16 MB at 50 nm technology (taken from [1])
Non-Uniform Cache Architecture [1]
• Static NUCA
▫ Banks are accessed at different speeds
 Latency is proportional to the distance from the controller
 Lower latency for banks closer to the controller
▫ Mapping of data into banks based on block index
▫ Banks are independently addressable
▫ Access to banks may proceed in parallel
▫ Banks have private channels
▫ Large number of wires
▫ Bank access time and routing delay increase with each technology generation
 Best organization at smaller technologies uses larger
banks
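The static mapping above can be sketched in a few lines. This is only an illustration: the block size, bank count, bank layout, and latency constants are assumptions chosen for the example and do not come from [1].

```python
# Hypothetical parameters, chosen only for illustration (not taken from [1]).
BLOCK_BYTES = 64         # assumed cache block size
NUM_BANKS = 32           # assumed number of banks
BANKS_PER_ROW = 4        # assumed layout: rows of banks at growing distance
BANK_ACCESS_CYCLES = 3   # assumed fixed access time inside a bank
CYCLES_PER_ROW = 2       # assumed wire delay added per row of distance


def bank_of(address: int) -> int:
    """Static NUCA: the bank is fixed by the low-order bits of the block index."""
    block_index = address // BLOCK_BYTES
    return block_index % NUM_BANKS


def hit_latency(bank: int) -> int:
    """Latency grows with the bank's distance (in rows) from the controller."""
    row = bank // BANKS_PER_ROW
    return BANK_ACCESS_CYCLES + CYCLES_PER_ROW * row
```

Because the bank is a pure function of the address, a block can never move to a faster bank; that limitation is what Dynamic NUCA addresses.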
Non-Uniform Cache Architecture [1]
Static NUCA design (taken from [1])
Non-Uniform Cache Architecture [1]
• Switched Static NUCA
▫ 2D Mesh, point-to-point links
▫ Removes most of the large number of wires
▫ Allows a large number of faster, smaller banks
• Dynamic NUCA
▫ Allows data to be mapped to many banks
▫ Allows data to migrate among the banks
▫ Frequently used data can be promoted to faster
banks
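Migration in Dynamic NUCA can be illustrated with a toy model. The sketch below assumes one block per bank and a promotion rule that swaps a hit block with the block in the next-closer bank; the class name `BankSet` and the miss-placement choice (farthest bank) are assumptions for the example, not the design from [1].

```python
from typing import List, Optional


class BankSet:
    """One set spread over several banks; index 0 is the bank closest to the controller."""

    def __init__(self, num_banks: int) -> None:
        # Assumption: each bank holds a single tag for this set.
        self.banks: List[Optional[int]] = [None] * num_banks

    def access(self, tag: int) -> Optional[int]:
        """Return the bank index that hit (before promotion), or None on a miss."""
        for i, resident in enumerate(self.banks):
            if resident == tag:
                if i > 0:
                    # Promote: swap with the block in the next-closer bank.
                    self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
                return i
        # Miss: place the block in the farthest bank, evicting whatever was there.
        self.banks[-1] = tag
        return None
```

Repeated hits move a block one bank closer each time, so frequently used data drifts toward the fastest bank.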
Non-Uniform Cache Architecture [1]
Switched NUCA design (taken from [1])
Non-Uniform Cache Architecture [2]
• Policies
▫ Bank placement policy
 Where is data placed in the NUCA cache memory
▫ Bank access policy
 Determines bank-searching algorithm
▫ Bank migration policy
 Determines if a data element is allowed to change its
placement from one bank to another
 Regulates migration of data
▫ Bank replacement policy
 How NUCA behaves when there is a data eviction from
one of the banks
Non-Uniform Cache Architecture [2]
Taken from [2]
Cache Coherence
• Cache-coherence problem
• Support for large number of processors
▫ Need for high bandwidth
▫ Bus architecture insufficient
• Point-to-Point networks
▫ No broadcast mechanism
▫ Snooping protocol unusable
• Directory
▫ Solution for point-to-point networks
▫ Stores the locations of the cached copies of each data block
▫ Centralized or distributed
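What a directory tracks can be made concrete with a minimal sketch. It assumes a simple sharer-set entry per block and two request types (read and write); the class and method names are hypothetical and the protocol is deliberately simplified (no transient states, no acknowledgements).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirEntry:
    sharers: Set[int] = field(default_factory=set)  # cores holding a read-only copy
    owner: Optional[int] = None                     # core holding a writable copy, if any


class Directory:
    """Toy directory: returns the cores that must receive a point-to-point message."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirEntry] = {}

    def read(self, block: int, core: int) -> Set[int]:
        e = self.entries.setdefault(block, DirEntry())
        if e.owner == core:
            return set()              # requester already holds a writable copy
        targets: Set[int] = set()
        if e.owner is not None:
            targets.add(e.owner)      # owner must downgrade to a sharer
            e.sharers.add(e.owner)
            e.owner = None
        e.sharers.add(core)
        return targets

    def write(self, block: int, core: int) -> Set[int]:
        e = self.entries.setdefault(block, DirEntry())
        targets = set(e.sharers)
        if e.owner is not None:
            targets.add(e.owner)
        targets.discard(core)         # never message the requester itself
        e.sharers.clear()
        e.owner = core
        return targets                # these copies must be invalidated
```

Only the cores named by the directory are contacted, which is why no broadcast mechanism is needed.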
Implementation of directories in
multicore architectures [3]
• DRAM (off-chip) directory
▫ Stores directory information in DRAM
 Ex: full-map protocol
▫ Does not exploit distance locality
▫ Treats each tile as a potential sharer of data
▫ Directory can be cached in on-chip SRAM
 Avoids an off-chip memory access on every lookup
Implementation of directories in
multicore architectures [3]
Taken from [3]
Implementation of directories in
multicore architecture [4]
• DRAM (off-chip) directory with directory caches
▫ Private cache
▫ Directory is cached in each tile
 Avoids an off-chip memory access on every lookup
 Non-coherent caches
 Each cache line has a home node
 Each tile is home for a different range of memory addresses
▫ Directory controller in each tile
 Controls coherency between private caches
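The home-node rule above (a different address range per tile) could look like the following sketch. It assumes equal, contiguous slices of physical memory per tile; the function name `range_home_tile` and its parameters are hypothetical and not taken from [4].

```python
def range_home_tile(address: int, num_tiles: int, memory_bytes: int) -> int:
    """Return the tile whose directory cache is home for this address.

    Assumption: physical memory is split into equal contiguous slices,
    one per tile.
    """
    if not 0 <= address < memory_bytes:
        raise ValueError("address outside physical memory")
    slice_bytes = memory_bytes // num_tiles
    # min() guards the last tile when memory_bytes is not an exact multiple.
    return min(address // slice_bytes, num_tiles - 1)
```

Since every cache line has exactly one home, the directory caches in different tiles never hold the same entry and do not need to be kept coherent with one another.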
Implementation of directories in
multicore architecture [4]
Taken from [4]
Implementation of directories in
multicore architectures [3]
• Duplicate tag directory
▫ Directory centrally located in SRAM
▫ Connected to individual cores
▫ Exact duplicate tag store
 Directory state for a block is determined by examining
copy of tags of every possible cache that can hold the
block
 Keep copied tags up-to-date
▫ No longer needs to read directory state from DRAM
▫ Challenging as the number of cores increases
 64 cores × 16-way associative caches = 1,024-way aggregate
associativity across all tiles
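The scaling problem follows directly from how a lookup works. The sketch below assumes the duplicate tags are stored as a nested list indexed by core and set; the data layout and function names are illustrative assumptions, not the structure described in [3].

```python
from typing import List, Set


def tags_examined_per_lookup(num_cores: int, ways: int) -> int:
    """Aggregate associativity: every way of every core's set is compared."""
    return num_cores * ways


def sharers(dup_tags: List[List[List[int]]], set_index: int, tag: int) -> Set[int]:
    """Return the cores whose copied tags show the block as present.

    dup_tags[core][set_index] is the copied list of tags for that core's cache set.
    """
    return {core for core, cache in enumerate(dup_tags) if tag in cache[set_index]}
```

With 64 cores and 16 ways, `tags_examined_per_lookup(64, 16)` gives 1024 tag comparisons for a single directory lookup.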
Implementation of directories in
multicore architectures [3]
Taken from [3]
Implementation of directories in
multicore architecture [5]
Directory memory, 4-way associative caches (taken from [5])
Implementation of directories in
multicore architectures [3]
• Static cache bank directory
▫ Distributed directory among the tiles
 Mapping block address to a tile (called the home tile)
 Home tiles selected by simple interleaving
 Location can be sub-optimal (see next slide)
 Tile’s cache extended to contain directory
information
 Integrates directory states with cache tags
 Avoids a separate SRAM or DRAM directory
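Simple interleaving, and why the resulting home tile can be far from the requester, can be sketched as follows. The 4×4 mesh, the 64-byte block size, and the row-major tile numbering are assumptions made for the example only.

```python
# Hypothetical configuration, for illustration only.
MESH_WIDTH = 4      # assumed 4x4 mesh of tiles, numbered row by row
NUM_TILES = MESH_WIDTH * MESH_WIDTH
BLOCK_BYTES = 64    # assumed cache block size


def interleaved_home_tile(address: int) -> int:
    """Home tile chosen by interleaving consecutive blocks across tiles."""
    return (address // BLOCK_BYTES) % NUM_TILES


def hops(tile_a: int, tile_b: int) -> int:
    """Manhattan distance between two tiles on the mesh."""
    ax, ay = tile_a % MESH_WIDTH, tile_a // MESH_WIDTH
    bx, by = tile_b % MESH_WIDTH, tile_b // MESH_WIDTH
    return abs(ax - bx) + abs(ay - by)
```

Under these assumptions, a block used only by tile 0 may still have tile 15 as its home, so every directory access crosses `hops(0, 15)` = 6 links even though no other tile shares the data.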
Implementation of directories in
multicore architectures [3,6]
Taken from [3]
Taken from [6]
Implementation of directories in
multicore architecture [7]
• SGI Origin2000 multiprocessor system
▫ Directory memory connected to on-chip memory
 Shared L2 cache
 Directory memory distributed over multiple tiles
 Cache coherence controller
 Home tile sends appropriate messages to cores
Implementation of directories in
multicore architecture [7]
SGI Origin2000 multiprocessor system (taken from [7])
Implementation of directories in
multicore architecture [8]
• Tilera Tile64 architecture
▫ 2D mesh network (8×8)
▫ Provides coherent shared-memory environment
▫ Uses neighborhood caching
 Provides on-chip distributed shared cache
▫ Coherency is maintained at the home tile
 Data is not cached at non-home tiles
▫ Communication over a Tile Dynamic Network
Implementation of directories in
multicore architecture [9]
Tilera Tile64 (taken from [9])
References
• [1] C. Kim, D. Burger, S.W. Keckler, “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches”, in Proc. 10th Int. Conf. ASPLOS, San Jose, CA, 2002, pp. 1-12
• [2] J. Lira, C. Molina, A. Gonzalez, “Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec Benchmark Suite”, MMCS’09, Mar. 2009, pp. 1-8
• [3] M.R. Marty, M.D. Hill, “Virtual Hierarchies to Support Server Consolidation”, ISCA’07, June 2007, pp. 1-11
• [4] J.A. Brown, R. Kumar, D. Tullsen, “Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures”, SPAA’07, June 2007, pp. 1-9
• [5] J. Chang, G.S. Sohi, “Cooperative Caching for Chip Multiprocessors”, in Proc. 33rd International Symposium on Computer Architecture (ISCA ’06), 2006, pp. 264-276
• [6] S. Cho, L. Jin, “Managing Distributed, Shared L2 Caches through OS-Level Page Allocation”, in Proc. 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39), Dec. 2006, pp. 455-468
• [7] H. Lee, S. Cho, B.R. Childers, “PERFECTORY: A Fault-Tolerant Directory Memory Architecture”, IEEE Transactions on Computers, vol. 59, no. 5, May 2010, pp. 638-650
• [8] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.C. Miao, J.F. Brown, A. Agarwal, “On-Chip Interconnection Architecture of the Tile Processor”, IEEE Micro, vol. 27, no. 5, Sept.-Oct. 2007, pp. 15-31
• [9] Linux Devices, “4-way chip gains Linux IDE, dev cards, design wins” [online], Linux Devices, Apr. 2008 [cited Oct. 21 2010], available from World Wide Web: <http://thing1.linuxdevices.com/news/NS4811855366.html>

Directory based cache coherence


Editor's Notes

  • #3: [1] ftp://ftp.cs.utexas.edu/pub/dburger/papers/ASPLOS02.pdf
  • #9: [2] http://www.cercs.gatech.edu/mmcs09/papers/lira.pdf
  • #12: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #13: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #14: http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #15: [4] http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #16: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #17: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #18: [5] http://pages.cs.wisc.edu/~mscalar/papers/2006/isca2006-coop-caching.pdf
  • #19: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #20: 1- http://www.cs.pitt.edu/cast/papers/cho-micro06.pdf 2- http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #21: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #22: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #23: [8] http://www.ieeexplore.ieee.org.proxy.bib.uottawa.ca/stamp/stamp.jsp?tp=&arnumber=4378780
  • #24: [9] http://www.linuxfordevices.com/files/misc/tilera_tile64_arch_diag2.gif