Parallel Computing
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
CSCI-UA.0480-003
Lecture 2: Parallel Hardware: Basics
Computer History
Eckert and Mauchly
• 1st working electronic
computer (1946)
• To reprogram it you
need to re-arrange the
cords
• 18,000 Vacuum tubes
• 1,800 instructions/sec
• 3,000 ft³
Computer History
Programming the ENIAC!
Computer History
• Von Neumann
presented his idea of
stored program
concept.
• Maurice Wilkes built it.
1st stored program
computer
650 instructions/sec
1,400 ft³
http://www.cl.cam.ac.uk/UoCCL/misc/EDSAC99/
EDSAC 1 (1949)
Computer History
• After the vacuum tubes, transistors
were invented (1947) → 2nd generation
of computers
• UNIVAC (UNIversal Automatic Computer)
• Introduced in the 50s
Computer History
• From transistors to integrated
circuits (IC) → 3rd generation of
computers
• One IC can host hundreds (then
thousands, then millions) of transistors
→ computers are getting smaller
Intel 4004 Die Photo
• Introduced in 1971
– First
microprocessor
• 2,250 transistors
• 12 mm²
• 108 kHz
Intel 8086 Die Scan
• 29,000 transistors
• 33 mm²
• 5 MHz
• Introduced in 1978
– Basic architecture
of the IA32 PC
Intel 80486 Die Scan
• 1,200,000
transistors
• 81 mm²
• 25 MHz
• Introduced in 1989
– 1st pipelined
implementation of
IA32
– 1st processor with
on-chip cache
Pentium Die Photo
• 3,100,000
transistors
• 296 mm²
• 60 MHz
• Introduced in 1993
– 1st superscalar
implementation of
IA32
Pentium 4
• 55,000,000
transistors
• 146 mm²
• 3 GHz
• Introduced in 2000
http://www.chip-architect.com
Pentium 4
IBM Power 9 (24 cores)
Core 2 Duo (2 cores)
AMD Ryzen (8 to 32 cores)
Intel Core i9 (22 cores)
How did the hardware evolve like
that?
Let’s look at different waves (generations of architectures)
First Generation (1970s)
Single Cycle Implementation
Copyright © 2010, Elsevier Inc. All rights
Reserved
The Von Neumann Architecture
Second Generation (1980s)
[Figure: pipeline diagram with stages F, D, I, E, C]
• Pipelining:
– the hardware is divided into stages
– temporal parallelism
– the number of stages increases with each generation
• Best-case CPI (Cycles Per Instruction) = 1
– in practice CPI is higher due to dependencies
(i.e. an instruction must wait
for the result of another instruction)
Stages: Fetch, Decode, Issue, Execute, Commit
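To make the pipelining bullets concrete, here is a minimal sketch that counts cycles for an idealized pipeline. The function names, the one-cycle-per-stage model, and the lump-sum stall parameter are assumptions made for this illustration; they are not taken from the slides.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int, stall_cycles: int = 0) -> int:
    """Idealized pipeline: fill it once, then finish one instruction per cycle.

    stall_cycles is the total number of bubbles caused by dependencies
    (a hypothetical lump-sum model, used only for illustration).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def cpi(cycles: int, n_instructions: int) -> float:
    """Cycles per instruction."""
    return cycles / n_instructions


if __name__ == "__main__":
    n, k = 1000, 5  # 1000 instructions on a 5-stage pipeline
    print("unpipelined CPI:", cpi(unpipelined_cycles(n, k), n))
    print("pipelined CPI, no stalls:", cpi(pipelined_cycles(n, k), n))
    print("pipelined CPI, 200 stall cycles:", cpi(pipelined_cycles(n, k, 200), n))
```

Under this model, CPI approaches 1 as the instruction count grows, and every stall cycle caused by a dependency pushes it above 1.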
Some Enhancements
• Cache Memory
• Virtual Memory
• TLB
• Multi-level caches
Third Generation (1990s)
[Figure: pipeline with stages F, D, I, C and three parallel E units]
• ILP (Instruction Level Parallelism)
• Spatial parallelism
• Executing several instructions at the same
time is called superscalar capability.
• Performance = instructions per cycle (IPC)
• Speculative execution (prediction of branch direction) is
introduced to make the best use of superscalar capability →
this can make some instructions execute out of order!
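The effect of superscalar width on IPC can be sketched with a toy scheduler. Everything here is an assumption for illustration: each instruction takes one cycle, up to `width` ready instructions execute per cycle (oldest first), and an instruction is ready once all of its producers finished in an earlier cycle. Branch prediction and speculation are not modeled.

```python
from __future__ import annotations


def schedule(deps: list[list[int]], width: int) -> list[int]:
    """Return the cycle (counting from 1) in which each instruction executes.

    deps[i] lists the earlier instructions whose results instruction i needs.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    n = len(deps)
    cycle_of = [0] * n  # 0 means "not executed yet"
    done = 0
    cycle = 0
    while done < n:
        cycle += 1
        issued = 0
        for i in range(n):
            if issued == width:
                break
            if cycle_of[i] != 0:
                continue
            # Ready only if every producer finished in an earlier cycle.
            if all(0 < cycle_of[d] < cycle for d in deps[i]):
                cycle_of[i] = cycle
                issued += 1
        done += issued
    return cycle_of


def ipc(cycle_of: list[int]) -> float:
    """Instructions per cycle for a finished schedule."""
    return len(cycle_of) / max(cycle_of)


if __name__ == "__main__":
    # Hypothetical six-instruction program:
    # 0: a = load x    1: b = load y    2: c = a + b
    # 3: d = load z    4: e = c * d     5: f = load w
    program = [[], [], [0, 1], [], [2, 3], []]
    for w in (1, 3):
        cycles = schedule(program, w)
        print(f"width={w} cycles={cycles} IPC={ipc(cycles)}")
```

With width 3, instructions 3 and 5 run ahead of older instructions that are still waiting on operands, which is the out-of-order effect the last bullet mentions; the dependency chain 0/1 → 2 → 4 still caps the speedup.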
Fourth Generation (2000s)
[Figure: two pipelines, each with stages F, D, I, C and three parallel E units]
Simultaneous Multithreading (SMT)
(aka Intel Hyper-Threading Technology)
Some definitions before we proceed
An operating system “process”
• An instance of a computer program that
is being executed.
• Components of a process:
– The executable machine language program
– A block of memory
– Descriptors of resources the OS has
allocated to the process
– Security information
– Information about the state of the process
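The components listed above can be pictured as fields of a single record that the OS keeps per process. The sketch below is a simplified model only: the class name, field names, and the three-state enum are hypothetical and do not correspond to any real operating system's data structures.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"


@dataclass
class ProcessRecord:
    pid: int
    program: str                 # the executable machine language program (here: a path)
    memory_bytes: int            # size of the block of memory given to the process
    resources: list = field(default_factory=list)  # descriptors of allocated resources
    owner: str = "nobody"        # security information
    state: State = State.READY   # information about the state of the process
    program_counter: int = 0     # more state: where execution will resume


if __name__ == "__main__":
    p = ProcessRecord(pid=1, program="/bin/demo", memory_bytes=4096)
    p.resources.append("stdin")
    p.state = State.RUNNING
    print(p)
```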
Multitasking
• Gives the illusion that a single processor
system is running multiple programs
simultaneously.
• Each process takes turns running for one time
slice.
• After its time is up, it waits until it has
a turn again.
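The turn-taking described above can be sketched as a round-robin simulation on one processor. This is a minimal model under stated assumptions: the function name, the process names, and the abstract "time units" are hypothetical, and context-switch overhead is ignored.

```python
from __future__ import annotations

from collections import deque


def round_robin(work: dict[str, int], time_slice: int) -> list[tuple[str, int]]:
    """Return the sequence of (process, units run) on a single processor.

    work maps each process name to the total time units it needs.
    """
    if time_slice < 1:
        raise ValueError("time_slice must be at least 1")
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Time is up: the process goes to the back and waits for its next turn.
            queue.append((name, remaining - run))
    return timeline


if __name__ == "__main__":
    for name, run in round_robin({"A": 5, "B": 2, "C": 4}, time_slice=2):
        print(f"{name} runs for {run}")
```

For the three hypothetical processes in the usage example, the processor alternates A, B, C, A, C, A; only one process runs at any instant, yet all three make progress.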
Threading
• Threads are contained within processes.
• They allow programmers to divide their
programs into (more or less) independent
tasks.
• The hope is that when one thread blocks
because it is waiting on a resource,
another thread will have work to do and
can run.
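A small sketch of that last bullet, using Python's standard `threading` module: one thread blocks waiting on a resource while another thread in the same process does useful work. The function names and the use of an `Event` as a stand-in for the awaited resource are assumptions for illustration.

```python
import threading


def run_demo():
    log = []
    log_lock = threading.Lock()
    resource_ready = threading.Event()  # stand-in for the awaited resource
    results = {}

    def record(message):
        with log_lock:
            log.append(message)

    def io_thread():
        record("io: waiting for resource")
        resource_ready.wait()            # this thread blocks here
        record("io: resource arrived")

    def compute_thread():
        # Runs while the other thread is blocked.
        results["total"] = sum(range(100_000))
        record("compute: done")
        resource_ready.set()             # the resource becomes available

    threads = [
        threading.Thread(target=io_thread),
        threading.Thread(target=compute_thread),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, results


if __name__ == "__main__":
    log, results = run_demo()
    print(log)
    print(results)
```

The compute thread always finishes its work before the blocked thread resumes, because the blocked thread cannot continue until the event is set.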
As you can see …
We can have several processes, executed
in a multitasking fashion, and each
process can consist of several threads.
The Status-Quo
• We moved from single core to multicore to
manycore:
– for technological reasons, as we saw last lecture.
• Free lunch is over for software folks
– The software will not become faster with every
new generation of processors
• Not enough experience in parallel programming
– Parallel programs of old days were restricted to
some elite applications -> very few programmers
– Now we need parallel programs for many different
applications
How Did These Advances Happen?
[Figure: interaction among the Software Community, Computer Architecture, and Process Technology; arrow labels: Wishes, Performance, Restrictions, Capabilities, Design]
Conclusions
• The hardware evolution, driven by
Moore’s law, was geared toward two
things:
– Exploiting parallelism
– Dealing with memory (latency, capacity)
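As a rough check on the Moore's law claim, the transistor counts quoted earlier in the deck can be turned into an average doubling time. This is a back-of-the-envelope sketch: the function name is hypothetical, the counts are the ones on the slides, and the 4004's introduction year is taken as 1971 (an assumption stated in the code).

```python
import math


def doubling_time_years(count_start: float, year_start: int,
                        count_end: float, year_end: int) -> float:
    """Average number of years per doubling between two data points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # Transistor counts as quoted on the slides.
    # Assumption: the Intel 4004 is dated 1971 and the Pentium 4 is dated 2000.
    years = doubling_time_years(2_250, 1971, 55_000_000, 2000)
    print(f"average doubling time: {years:.2f} years")
```

Under these assumptions the result comes out at roughly two years per doubling.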