Outline
• Non-Uniform Cache Architecture (NUCA)
• Cache Coherence
• Implementation of directories in multicore architectures
Non-Uniform Cache Architecture [1]
• Uniform Cache Architecture
▫ Multi-level cache hierarchies
▪ Organized into a few discrete levels
▪ Each level reduces accesses to the next lower level
▪ Inclusion overhead
▪ Internal wire delays
▪ Restricted number of ports
▫ Large on-chip cache
▪ Single, discrete hit latency, set by the slowest (farthest) subarray
▪ Undesirable as wire delays increase
Non-Uniform Cache Architecture [1]
• Non-uniform cache architecture (NUCA)
▫ Exploits non-uniform access latency
▪ Data residing in the part of a large cache closer to the processor is accessed faster than data residing physically farther away
Level-2 cache architectures, 16 MB in 50 nm technology (taken from [1])
Non-Uniform Cache Architecture [1]
• Static NUCA
▫ Each bank can be accessed at a different speed
▪ Latency proportional to the bank's distance from the cache controller
▪ Lower latency for banks closer to the controller
▫ Data mapped to banks based on the block index
▫ Banks are independently addressable
▫ Accesses to different banks may proceed in parallel
▪ Banks have private channels
▫ Large number of wires
▫ Access time and routing delay (in cycles) grow with each technology generation
▪ Best organization at smaller technologies uses larger banks
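The static mapping and distance-dependent latency described above can be sketched as follows. This is a minimal model under assumed parameters: the 4x4 bank grid, 64-byte blocks, the controller sitting next to bank 0, and all cycle counts are hypothetical and not taken from [1]; `bank_of` and `hit_latency` are illustrative names.

```python
# Hypothetical S-NUCA geometry; none of these numbers come from [1].
NUM_BANKS = 16       # banks arranged as a 4x4 grid
GRID_WIDTH = 4
BLOCK_BYTES = 64
BANK_CYCLES = 3      # assumed access time of a single bank
HOP_CYCLES = 1       # assumed wire delay per grid step, each direction


def bank_of(address: int) -> int:
    """Static mapping: the low-order bits of the block index select the bank."""
    block_index = address // BLOCK_BYTES
    return block_index % NUM_BANKS


def hit_latency(bank: int, controller: tuple = (0, 0)) -> int:
    """Hit latency grows with the bank's Manhattan distance from the controller."""
    row, col = divmod(bank, GRID_WIDTH)
    distance = abs(row - controller[0]) + abs(col - controller[1])
    return BANK_CYCLES + 2 * distance * HOP_CYCLES  # request out + data back


for addr in (0x0000, 0x0040, 0x03C0):
    bank = bank_of(addr)
    print(f"address {addr:#06x} -> bank {bank:2d}, hit latency {hit_latency(bank)} cycles")
```

Because the bank is fixed by the address, a frequently used block that happens to map to a far bank always pays the far-bank latency; Dynamic NUCA relaxes exactly this restriction.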
Non-Uniform Cache Architecture [1]
Static NUCA design (taken from [1])
Non-Uniform Cache Architecture [1]
• Switched Static NUCA
▫ 2D mesh of point-to-point links
▫ Removes most of the wires needed by private channels
▫ Allows a large number of smaller, faster banks
• Dynamic NUCA
▫ Allows a block to be mapped to any of several banks
▫ Allows data to migrate among the banks
▫ Frequently used data can be promoted to faster (closer) banks
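A minimal sketch of migration within one Dynamic NUCA bank set, assuming one-step ("gradual") promotion on every hit and insertion of new blocks into the farthest bank, policy choices of the kind evaluated in [1]. It is simplified so that each bank holds a single block of the set; `BankSet` is a hypothetical name.

```python
from typing import List, Optional


class BankSet:
    """One D-NUCA bank set; index 0 is the bank closest to the processor.

    Simplifying assumption: each bank holds a single block of this set.
    """

    def __init__(self, num_banks: int) -> None:
        self.slots: List[Optional[int]] = [None] * num_banks

    def access(self, tag: int) -> Optional[int]:
        """Return the bank that hit (before promotion), or None on a miss."""
        if tag in self.slots:
            bank = self.slots.index(tag)
            if bank > 0:
                # Gradual promotion: swap with the block one bank closer.
                self.slots[bank - 1], self.slots[bank] = self.slots[bank], self.slots[bank - 1]
            return bank
        # Miss: fill into the farthest bank, replacing whatever was there.
        self.slots[-1] = tag
        return None


bank_set = BankSet(4)
print([bank_set.access(0xA) for _ in range(6)])  # [None, 3, 2, 1, 0, 0]
```

Each repeated hit moves the block one bank closer, so a hot block reaches the fastest bank after a few accesses while a block touched once stays far away.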
Non-Uniform Cache Architecture [1]
Switched NUCA design (taken from [1])
Non-Uniform Cache Architecture [2]
• Policies
▫ Bank placement policy
▪ Determines where data is placed in the NUCA cache
▫ Bank access policy
▪ Determines the bank-searching algorithm
▫ Bank migration policy
▪ Determines whether a data element may move from one bank to another
▪ Regulates data migration
▫ Bank replacement policy
▪ Determines how the NUCA cache behaves when a block is evicted from one of the banks
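The bank access policy decides how a bank set is searched. The sketch below contrasts two strategies, incremental search (probe banks one at a time, closest first) and multicast search (probe all banks of the set at once). The per-bank latencies are hypothetical, network contention is ignored, and the number of bank accesses stands in for energy.

```python
from typing import List, Optional, Tuple


def incremental_search(latencies: List[int], hit_bank: Optional[int]) -> Tuple[int, int]:
    """Probe the banks of a bank set one at a time, closest first.

    Returns (cycles until the block is found or the miss is known, banks probed).
    """
    probed = len(latencies) if hit_bank is None else hit_bank + 1
    return sum(latencies[:probed]), probed


def multicast_search(latencies: List[int], hit_bank: Optional[int]) -> Tuple[int, int]:
    """Send the request to every bank of the set in parallel."""
    cycles = max(latencies) if hit_bank is None else latencies[hit_bank]
    return cycles, len(latencies)


bank_latencies = [4, 6, 8, 10]  # hypothetical, closest bank first
for hit in (0, 2, None):
    print(hit, incremental_search(bank_latencies, hit), multicast_search(bank_latencies, hit))
```

Incremental search saves bank accesses when hits land close to the processor; multicast search finds distant blocks (and detects misses) sooner at the cost of probing every bank on every request.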
Non-Uniform Cache Architecture [2]
Taken from [2]
Cache Coherence
• Cache-coherence problem: keeping multiple cached copies of the same block consistent
• Support for a large number of processors
▫ Requires high bandwidth
▫ A shared bus is insufficient
• Point-to-point networks
▫ No broadcast mechanism
▫ Snooping protocols, which rely on broadcast, cannot be used
• Directory
▫ Solution for point-to-point networks
▫ Stores the locations of the cached copies of each block of data
▫ Centralized or distributed
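To make the directory idea concrete, here is a minimal full-map directory entry: a bit vector with one presence bit per core plus a block state. The three states (Uncached, Shared, Modified) and the method names are assumptions for illustration; the slide does not specify a protocol. The returned lists are the cores the directory would contact with point-to-point messages instead of a broadcast.

```python
from enum import Enum
from typing import List


class State(Enum):
    UNCACHED = "U"
    SHARED = "S"
    MODIFIED = "M"


class DirectoryEntry:
    """Full-map entry: one presence bit per core plus a block state."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        self.state = State.UNCACHED
        self.sharers = 0  # bit i set => core i holds a copy

    def holders(self) -> List[int]:
        return [c for c in range(self.num_cores) if (self.sharers >> c) & 1]

    def read_miss(self, core: int) -> List[int]:
        """Add a reader; return the owner that must supply/write back dirty data."""
        owner = self.holders() if self.state is State.MODIFIED else []
        self.sharers |= 1 << core
        self.state = State.SHARED
        return owner

    def write_miss(self, core: int) -> List[int]:
        """Make `core` the only holder; return the cores that must be invalidated."""
        victims = [c for c in self.holders() if c != core]
        self.sharers = 1 << core
        self.state = State.MODIFIED
        return victims


entry = DirectoryEntry(num_cores=4)
entry.read_miss(0)
entry.read_miss(2)
print(entry.write_miss(1))  # [0, 2]: invalidations go only to actual sharers
print(entry.read_miss(3))   # [1]: the owner is asked for the dirty block
```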
Implementation of directories in multicore architectures [3]
• DRAM (off-chip) directory
▫ Stores directory information in DRAM
▪ E.g., full-map protocol
▫ Does not exploit distance locality
▫ Treats every tile as a potential sharer of the data
▫ Directory can be cached in on-chip SRAM
▪ Avoids accessing off-chip memory on every request
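Because a full-map directory treats every tile as a potential sharer, each memory block needs one presence bit per tile. The quick calculation below, assuming 64-byte blocks and 2 state bits per entry, shows how the DRAM storage overhead grows linearly with the core count.

```python
def full_map_overhead(num_cores: int, block_bytes: int = 64, state_bits: int = 2) -> float:
    """Directory bits per memory block as a fraction of the block's data bits."""
    directory_bits = num_cores + state_bits  # one presence bit per core + state
    return directory_bits / (block_bytes * 8)


for cores in (16, 64, 256):
    print(f"{cores:3d} cores: directory = {full_map_overhead(cores):.1%} of DRAM data")
# prints 3.5%, 12.9% and 50.4% for 16, 64 and 256 cores
```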
Implementation of directories in multicore architectures [3]
Taken from [3]
Implementation of directories in multicore architectures [4]
• DRAM (off-chip) directory with directory caches
▫ Private caches
▫ Directory is cached in each tile
▪ Avoids accessing off-chip memory on every request
▪ Non-coherent caches
▪ Home node for any given cache line
▪ Different range of memory addresses for each tile
▫ Directory controller in each tile
▪ Controls coherence between the private caches
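A rough sketch of the per-tile arrangement: each tile is home for its own address range and keeps an on-chip cache of the directory entries for that range, going off-chip only on a directory-cache miss. The range-based `home_tile` mapping, the LRU replacement, the capacity, and the class name `DirectoryCache` are all assumptions, not details from [4].

```python
from collections import OrderedDict
from typing import Dict


def home_tile(block: int, num_tiles: int, blocks_per_tile: int) -> int:
    """Each tile is home for its own contiguous range of block addresses
    (the ranges repeat every num_tiles * blocks_per_tile blocks)."""
    return (block // blocks_per_tile) % num_tiles


class DirectoryCache:
    """On-chip cache of the off-chip directory entries this tile is home for."""

    def __init__(self, capacity: int, dram_directory: Dict[int, int]) -> None:
        self.capacity = capacity
        self.dram = dram_directory           # block -> sharer bit vector (off-chip)
        self.entries: "OrderedDict[int, int]" = OrderedDict()
        self.dram_reads = 0

    def lookup(self, block: int) -> int:
        if block in self.entries:
            self.entries.move_to_end(block)  # hit: no off-chip access, update LRU order
            return self.entries[block]
        self.dram_reads += 1                 # miss: fetch the entry from DRAM
        sharers = self.dram.get(block, 0)
        self.entries[block] = sharers
        if len(self.entries) > self.capacity:
            victim, victim_sharers = self.entries.popitem(last=False)  # evict LRU entry
            self.dram[victim] = victim_sharers                         # write it back
        return sharers


NUM_TILES, BLOCKS_PER_TILE = 16, 1024
dram = {5000: 0b0110}
tile = home_tile(5000, NUM_TILES, BLOCKS_PER_TILE)  # 5000 // 1024 = 4 -> tile 4
cache = DirectoryCache(capacity=2, dram_directory=dram)
cache.lookup(5000)
cache.lookup(5000)
print(tile, cache.dram_reads)  # 4 1
```

Since each block has exactly one home tile, its directory entry is cached in only one place, which is why the directory caches do not have to be kept coherent with one another in this sketch.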
Implementation of directories in multicore architectures [4]
Taken from [4]
Implementation of directories in multicore architectures [3]
• Duplicate tag directory
▫ Directory centrally located in SRAM
▫ Connected to the individual cores
▫ Exact duplicate tag store
▪ Directory state for a block is determined by examining a copy of the tags of every cache that can hold the block
▪ Copied tags must be kept up to date
▫ No need to read directory states from DRAM
▫ Challenging as the number of cores increases
▪ 64 cores with 16-way associative caches = 1024-way aggregate associativity across all tiles
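The sketch below models an exact duplicate tag store: the directory mirrors each core's tags, and a lookup examines the same set in every core's copy. Every fill and eviction must be reported so the copy stays exact. The geometry (64 cores, 256 sets, 16 ways) and the method names are hypothetical; the last value printed reproduces the 64 x 16 = 1024 aggregate associativity from the slide.

```python
from typing import List, Set, Tuple


class DuplicateTagDirectory:
    """Central copy of every core's cache tags (hypothetical geometry)."""

    def __init__(self, num_cores: int, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # tags[core][set] mirrors the tags currently held in that set of that core's cache
        self.tags: List[List[Set[int]]] = [
            [set() for _ in range(num_sets)] for _ in range(num_cores)
        ]

    def _split(self, block: int) -> Tuple[int, int]:
        return block % self.num_sets, block // self.num_sets  # (set index, tag)

    def record_fill(self, core: int, block: int) -> None:
        index, tag = self._split(block)
        mirrored = self.tags[core][index]
        assert len(mirrored) < self.ways, "core must report its eviction first"
        mirrored.add(tag)

    def record_evict(self, core: int, block: int) -> None:
        index, tag = self._split(block)
        self.tags[core][index].discard(tag)

    def sharers(self, block: int) -> List[int]:
        """Directory state = search the same set in every core's duplicate tags."""
        index, tag = self._split(block)
        return [core for core, sets in enumerate(self.tags) if tag in sets[index]]

    def tags_compared_per_lookup(self) -> int:
        # In hardware every way of the set in every core is compared.
        return len(self.tags) * self.ways  # aggregate associativity


directory = DuplicateTagDirectory(num_cores=64, num_sets=256, ways=16)
directory.record_fill(3, 0x1234)
directory.record_fill(10, 0x1234)
print(directory.sharers(0x1234), directory.tags_compared_per_lookup())  # [3, 10] 1024
```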
Implementation of directories in multicore architectures [3]
Taken from [3]
Implementation of directories in multicore architectures [5]
Directory memory, 4-way associative caches (taken from [5])
Implementation of directories in multicore architectures [3]
• Static cache bank directory
▫ Directory distributed among the tiles
▪ Each block address is mapped to a tile (called the home tile)
▪ Home tiles selected by simple interleaving
▪ Location can be sub-optimal (see next slide)
▪ Each tile's cache is extended to hold directory information
▪ Integrates directory states with cache tags
▪ Avoids a separate SRAM or DRAM directory
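The sub-optimal placement caused by simple interleaving is easy to see numerically. Assuming a hypothetical 4x4 mesh of tiles and 64-byte blocks, consecutive blocks are homed on consecutive tiles, so data used only by tile 0 is homed on average 3 hops away and up to 6 hops away.

```python
BLOCK_BYTES = 64   # assumed block size
MESH_WIDTH = 4     # assumed 4x4 tiled CMP
NUM_TILES = MESH_WIDTH * MESH_WIDTH


def home_tile(address: int) -> int:
    """Simple interleaving: consecutive blocks map to consecutive tiles."""
    return (address // BLOCK_BYTES) % NUM_TILES


def hops(src: int, dst: int) -> int:
    """Manhattan distance between two tiles in the mesh."""
    src_row, src_col = divmod(src, MESH_WIDTH)
    dst_row, dst_col = divmod(dst, MESH_WIDTH)
    return abs(src_row - dst_row) + abs(src_col - dst_col)


# A thread on tile 0 streams through 16 consecutive blocks: interleaving spreads
# their home tiles over the whole chip, regardless of where the requester sits.
requester = 0
distances = [hops(requester, home_tile(b * BLOCK_BYTES)) for b in range(NUM_TILES)]
print(max(distances), sum(distances) / len(distances))  # 6 3.0
```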
Implementation of directories in multicore architectures [3,6]
Taken from [3]
Taken from [6]
Implementation of directories in multicore architectures [7]
• SGI Origin2000 multiprocessor system
▫ Directory memory connected to on-chip memory
▪ Shared L2 cache
▪ Directory memory distributed over multiple tiles
▪ Cache coherence controller
▪ Home tile sends the appropriate coherence messages to the cores
Implementation of directories in multicore architectures [7]
SGI Origin2000 multiprocessor system (taken from [7])
Implementation of directories in multicore architectures [8]
• Tilera Tile64 architecture
▫ 2D mesh network (8×8)
▫ Provides a coherent shared-memory environment
▫ Uses neighborhood caching
▪ Provides an on-chip distributed shared cache
▫ Coherence is maintained at the home tile
▪ Data is not cached at non-home tiles
▫ Communication over the Tile Dynamic Network
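A sketch of what "data is not cached at non-home tiles" means for a load: a request from any other tile travels over the mesh to the home tile. The model assumes X-then-Y dimension-ordered routing and tiles numbered row by row; `load` and `xy_route` are illustrative names, not part of any Tilera API, and the home tile is simply passed in rather than derived from an address.

```python
from typing import List, Tuple

MESH = 8  # Tile64: 8x8 grid of tiles, numbered row by row (assumed numbering)


def xy_route(src: int, dst: int) -> List[int]:
    """Dimension-ordered route: travel along X first, then along Y."""
    x, y = src % MESH, src // MESH
    dst_x, dst_y = dst % MESH, dst // MESH
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * MESH + x)
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * MESH + x)
    return path


def load(requester: int, home: int) -> Tuple[str, List[int]]:
    """Data lives only in its home tile's cache, so non-home tiles go to the home."""
    if requester == home:
        return "served from local cache", [requester]
    return "request sent to home tile", xy_route(requester, home)


print(load(9, 9))
print(load(0, 10))  # ('request sent to home tile', [0, 1, 2, 10])
```

Keeping a single copy at the home tile removes the need to track sharers, at the cost of a mesh traversal for every access from a non-home tile.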
Implementation of directories in multicore architectures [9]
Tilera Tile64 (taken from)
References
• [1] C. Kim, D. Burger, S.W. Keckler, “An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches”, in Proc. 10th Int. Conf. ASPLOS, San Jose, CA, 2002, pp. 1-12
• [2] J. Lira, C. Molina, A. Gonzalez, “Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec Benchmark Suite”, MMCS’09, Mar. 2009, pp. 1-8
• [3] M.R. Marty, M.D. Hill, “Virtual Hierarchies to Support Server Consolidation”, ISCA’07, June 2007, pp. 1-11
• [4] J.A. Brown, R. Kumar, D. Tullsen, “Proximity-Aware Directory-based Coherence for Multi-core Processor Architectures”, SPAA’07, June 2007, pp. 1-9
• [5] J. Chang, G.S. Sohi, “Cooperative Caching for Chip Multiprocessors”, in Proc. 33rd Int. Symp. Computer Architecture (ISCA ’06), 2006, pp. 264-276
• [6] S. Cho, L. Jin, “Managing Distributed, Shared L2 Caches through OS-Level Page Allocation”, in Proc. 39th Annual IEEE/ACM Int. Symp. Microarchitecture (MICRO-39), Dec. 2006, pp. 455-468
• [7] H. Lee, S. Cho, B.R. Childers, “PERFECTORY: A Fault-Tolerant Directory Memory Architecture”, IEEE Transactions on Computers, vol. 59, no. 5, May 2010, pp. 638-650
• [8] D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C.C. Miao, J.F. Brown, A. Agarwal, “On-Chip Interconnection Architecture of the Tile Processor”, IEEE Micro, vol. 27, no. 5, Sept.-Oct. 2007, pp. 15-31
• [9] Linux Devices, “4-way chip gains Linux IDE, dev cards, design wins” [online], Apr. 2008 [cited Oct. 21, 2010], available from http://thing1.linuxdevices.com/news/NS4811855366.html

Editor's Notes

  • #3: [1] ftp://ftp.cs.utexas.edu/pub/dburger/papers/ASPLOS02.pdf
  • #9: [2] http://www.cercs.gatech.edu/mmcs09/papers/lira.pdf
  • #12: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #13: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #14: http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #15: [4] http://cseweb.ucsd.edu/users/tullsen/spaa07.pdf
  • #16: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #17: http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #18: [5] http://pages.cs.wisc.edu/~mscalar/papers/2006/isca2006-coop-caching.pdf
  • #19: [3] http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #20: 1- http://www.cs.pitt.edu/cast/papers/cho-micro06.pdf 2- http://www.cs.wisc.edu/multifacet/papers/isca07_virtual_hierarchy.pdf
  • #21: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #22: http://www.cs.pitt.edu/cast/papers/lee-tc10.pdf
  • #23: [8] http://www.ieeexplore.ieee.org.proxy.bib.uottawa.ca/stamp/stamp.jsp?tp=&arnumber=4378780
  • #24: [9] http://www.linuxfordevices.com/files/misc/tilera_tile64_arch_diag2.gif