CPU Performance Enhancements
CS2052 Computer Architecture
Computer Science & Engineering
University of Moratuwa
Dilum Bandara
Dilum.Bandara@uom.lk
Pipelining – It’s Natural!
• Laundry example
• Amal, Bimal, Chamal, & Dinal each have one load of clothes to wash, dry, & fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• Folder takes 20 minutes
Sequential Laundry
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
[Figure: timeline from 6 PM to midnight, task order A to D on the vertical axis; each load takes 30 + 40 + 20 minutes and starts only after the previous load finishes]
Pipelined Laundry – Start Work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM, task order A to D on the vertical axis; loads overlap, with successive intervals of 30, 40, 40, 40, 40, 20 minutes]
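The 6-hour and 3.5-hour figures can be checked with a small simulation. This is only an illustrative sketch: the names `sequential_time` and `pipelined_time` are made up for this example, and it assumes each machine handles one load at a time and a finished load may wait until the next machine is free.

```python
# Stage times in minutes, taken from the laundry example: wash, dry, fold.
STAGE_MINUTES = [30, 40, 20]


def sequential_time(num_loads, stages=STAGE_MINUTES):
    """Each load runs through every stage before the next load starts."""
    return num_loads * sum(stages)


def pipelined_time(num_loads, stages=STAGE_MINUTES):
    """A load enters a stage as soon as both the load and the stage are free."""
    stage_free_at = [0] * len(stages)  # minute at which each machine becomes idle
    finish = 0
    for _ in range(num_loads):
        ready = 0  # minute at which this load has left the previous stage
        for s, minutes in enumerate(stages):
            start = max(ready, stage_free_at[s])
            ready = start + minutes
            stage_free_at[s] = ready
        finish = ready
    return finish


if __name__ == "__main__":
    print(sequential_time(4) / 60)  # 6.0 hours
    print(pipelined_time(4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: after the first wash, the 40-minute dryer sets the pace.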
Pipelining Lessons
• Pipelining doesn’t reduce latency of a single task
• It improves throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Time to fill the pipeline & time to drain/flush it reduce speedup
[Figure: pipelined laundry timeline from 6 PM to 9 PM, task order A to D, with intervals of 30, 40, 40, 40, 40, 20 minutes]
Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
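These lessons can be put into numbers with a short sketch. The function name `pipeline_speedup` is hypothetical, and the model assumes identical tasks, one task per stage at a time, and no stalls: the first task pays the full latency, and every later task completes one slowest-stage interval after the previous one.

```python
def pipeline_speedup(stage_times, num_tasks):
    """Speedup of pipelined over sequential execution of identical tasks."""
    sequential = num_tasks * sum(stage_times)
    # Fill time (full latency of the first task) plus one slowest-stage
    # interval for each remaining task.
    pipelined = sum(stage_times) + (num_tasks - 1) * max(stage_times)
    return sequential / pipelined


if __name__ == "__main__":
    # Laundry example: unbalanced stages, only 4 tasks.
    print(round(pipeline_speedup([30, 40, 20], 4), 2))
    # Balanced 3-stage pipeline: fill/drain time still hurts for few tasks...
    print(round(pipeline_speedup([30, 30, 30], 4), 2))
    # ...but the speedup approaches the number of stages for many tasks.
    print(round(pipeline_speedup([30, 30, 30], 1000), 2))
    # Unbalanced stages cap the speedup below the stage count.
    print(round(pipeline_speedup([30, 40, 20], 1000), 2))
```

With the laundry stage times the speedup for 4 loads is 360/210, well under the 3 stages, and even with many loads it stays near 90/40 = 2.25 because the dryer is the bottleneck.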
Instruction Level Parallelism (ILP)
CPU Pipelines
[Figure: 5-stage MIPS pipeline]
Source: http://en.wikipedia.org/wiki/Classic_RISC_pipeline
Pipeline With a Branch Penalty Due to a Taken Branch
Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
Superscalar Architectures
• Executes more than 1 instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units
Source: http://mail.humber.ca/~paul.michaud/Pipeline.htm
Intel Hyper-Threading (HT)
• Introduced with Intel Pentium 4
• Allows 2 different resources of the CPU to be used at the same time
  - While an instruction from the 1st thread is working with integers (integer unit), an instruction from the 2nd thread can work on floating-point numbers (floating-point unit)
• OS sees 2 logical CPUs
• Achieved through a mix of shared, replicated, & partitioned chip resources such as:
  - Registers
  - Arithmetic units
  - Cache memory
Amdahl’s Law
• What’s the maximum expected improvement to an overall system when only part of it is improved?
• Amdahl said this relationship is not linear
Amdahl’s Law (Cont.)
Best you could ever hope to do:
$$\text{Speedup}_{\text{maximum}} = \frac{1}{1 - \text{Fraction}_{\text{enhanced}}}$$
Amdahl’s Law – Example
• Floating point instructions improved to run 2X; but only 10% of actual instructions are FP
$$\text{ExTime}_{\text{new}} = \text{ExTime}_{\text{old}} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{\text{old}}$$
$$\text{Speedup}_{\text{overall}} = \frac{1}{0.95} = 1.053$$
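The arithmetic of this example generalizes to Speedup_overall = 1 / ((1 − F) + F / S), where F is the fraction of execution time that is enhanced and S is the speedup of that fraction. The sketch below uses hypothetical function names and, like the slide, treats the 10% of FP instructions as 10% of execution time.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is improved."""
    new_time = (1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1 / new_time


def amdahl_limit(fraction_enhanced):
    """Upper bound reached as the enhanced part becomes infinitely fast."""
    return 1 / (1 - fraction_enhanced)


if __name__ == "__main__":
    print(round(amdahl_speedup(0.1, 2), 3))  # 1.053, the slide's result
    print(round(amdahl_limit(0.1), 3))       # 1.111, best you could ever hope for
```

Even an infinitely fast FP unit would give only about 1.11× overall, since 90% of the time is untouched.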
Moore’s Law – Today’s Status
• Moore’s Law – number of transistors on a chip tends to double about every 2 years
[Figure: CPU scaling trends; transistor count still rising, clock speed flattening sharply]
Source: www.extremetech.com/wp-content/uploads/2012/02/CPU-Scaling.jpg
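Doubling about every 2 years means exponential growth, count(t) = count(0) × 2^(t/2). A minimal sketch, with a made-up starting count and a hypothetical function name:

```python
def projected_transistors(initial_count, years, doubling_period_years=2.0):
    """Transistor count after `years`, doubling every `doubling_period_years`."""
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical chip with 1 million transistors: 5 doublings in 10 years.
    print(projected_transistors(1_000_000, 10))  # 32000000.0
```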
Dual Core
• Introduced with the IBM POWER4
• However, AMD brought it to the consumer market
• Combines 2 independent CPUs & their respective caches onto a single silicon chip
• Provides better performance improvement than HT
• True parallelism
Multi-Core
Source: www.anandtech.com/show/5174/why-ivy-bridge-is-still-quad-core
Multi-Core (Cont.)
Source: www.legitreviews.com/intel-core-i7-4770k-haswell-3-5ghz-quad-core-cpu-review_2203
Multi-Core (Cont.)
Source: www.hardwarecanucks.com/news/cpu/intel-launch-8-core-xeon-nehalemex/
Multi-Cores + Hyper-Threading
Source: www.notebookcheck.net/Intel-Core-i7-Notebook-Processor-Clarksfield.21025.0.html
Many-Cores
• GPUs
  - Graphics Processing Unit
  - NVIDIA & ATI
  - SIMD – Single Instruction Multiple Data
• Intel Xeon Phi
  - General purpose
[Figures: NVIDIA Tesla 2070; Intel Xeon Phi]
Example Specifications
                                     | GTX 480       | Tesla 2070     | Tesla K80
Peak double precision FP performance | 650 Gigaflops | 515 Gigaflops  | 2.91 Teraflops
Peak single precision FP performance | 1.3 Teraflops | 1.03 Teraflops | 8.74 Teraflops
CUDA cores                           | 480           | 448            | 4992
Frequency of CUDA cores              | 1.40 GHz      | 1.15 GHz       | 560/875 MHz
Memory size (GDDR5)                  | 1536 MB       | 6 GB           | 24 GB
Memory bandwidth                     | 177.4 GB/sec  | 150 GB/sec     | 480 GB/sec
ECC memory                           | No            | Yes            | Yes
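The single-precision peaks in the table follow from the core counts and clock rates. The sketch below assumes each CUDA core completes one fused multiply-add (counted as 2 floating-point operations) per cycle, and it uses the higher 875 MHz clock for the Tesla K80; the function name is hypothetical.

```python
def peak_single_precision_gflops(cuda_cores, clock_ghz, flops_per_core_per_cycle=2):
    """Peak GFLOPS = cores x clock (GHz) x FLOPs per core per cycle."""
    return cuda_cores * clock_ghz * flops_per_core_per_cycle


if __name__ == "__main__":
    # (CUDA cores, clock in GHz) taken from the table above.
    cards = {
        "GTX 480": (480, 1.40),
        "Tesla 2070": (448, 1.15),
        "Tesla K80": (4992, 0.875),
    }
    for name, (cores, ghz) in cards.items():
        print(name, round(peak_single_precision_gflops(cores, ghz), 1), "GFLOPS")
```

The results, about 1344, 1030 and 8736 GFLOPS, agree with the 1.3, 1.03 and 8.74 Teraflops listed.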
CPU vs. GPU Architecture
• A GPU devotes more transistors to computation
Multithreaded SIMD Processor
Source: Computer Architecture by John L. Hennessy and David A. Patterson
NVIDIA CUDA Architecture
Intel Xeon Phi
Source: www.pcgameshardware.de/Xeon-Phi-Hardware-256199/News/Intel-Xeon-Phi-Hardware-Informationen-1040924/
Intel Xeon Phi (Cont.)
Source: www.altera.com/technology/system-design/articles/2012/multicore-many-core.html
Power Consumption
• Dynamic energy
  - Transistor switches from 0 → 1 or 1 → 0
  - ½ × Capacitive load × Voltage²
• Dynamic power
  - ½ × Capacitive load × Voltage² × Frequency switched
• Static power consumption
  - Static current × Voltage
  - Scales with number of transistors
• Reducing voltage reduces energy
• Reducing clock rate reduces power, not energy
• Power gating rather than only taking out the clock signal
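The difference between reducing power and reducing energy can be seen by running a fixed amount of work at different settings. This is a sketch under stated assumptions: the capacitance, voltage, frequency and cycle count are made-up values, the function names are hypothetical, and static power is ignored.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy of one switching transition: 1/2 x C x V^2 (joules)."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency):
    """Dynamic power: 1/2 x C x V^2 x switching frequency (watts)."""
    return dynamic_energy(capacitive_load, voltage) * frequency


def task_energy(capacitive_load, voltage, frequency, cycles):
    """Energy to run a fixed number of cycles: power x time (joules)."""
    seconds = cycles / frequency
    return dynamic_power(capacitive_load, voltage, frequency) * seconds


if __name__ == "__main__":
    C, V, F, CYCLES = 1e-9, 1.0, 2e9, 1e9  # hypothetical values

    # Halving the clock halves power, but the task takes twice as long,
    # so the energy for the task is unchanged.
    print(dynamic_power(C, V, F), dynamic_power(C, V, F / 2))
    print(task_energy(C, V, F, CYCLES), task_energy(C, V, F / 2, CYCLES))

    # Lowering the voltage by 10% cuts the energy to 0.9^2 = 81%.
    print(task_energy(C, 0.9 * V, F, CYCLES) / task_energy(C, V, F, CYCLES))
```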
