Big Data and
High Performance Computing
Dr. Abzetdin ADAMOV
Center for Data Analytics Research (CeDAR)
School of IT & Engineering
ADA University
aadamov@ada.edu.az
Speech Content
• Where Big Data Comes From
• Opportunities derived from Big Data
• Understanding Big Data Problem
• Why Now?
• Hadoop Ecosystem
• Big Data Computing Solution
• Massive Parallelization
• Q&A
WHERE BIG DATA COMES FROM
Where were we?
AAdamov, CeDAR, ADA University
Pope Benedict inauguration in 2005
Where are we?
Pope Francis inauguration in 2013
Digital Universe Volume
• 2003 – 5 exabytes from beginning of civilization
• 2005 – 130 exabytes
• 2008 – 480,000 petabytes (PB)
• 2009 – 800,000 PB
• 2010 – 1,200,000 PB or 1.2 zettabytes (ZB)
• 2012 – 2.7 ZB
• 2014 ~ 6.2 ZB
• 2015 ~ 10 ZB
• 2017 ~ 16 ZB
• 2019 ~ 30 ZB
• 2020 estimated 44 ZB
Every day now we create as much information as we
did from the dawn of civilization up until 2003
CeDAWI Research Center
Where Data Comes From
Data is produced by:
• People
• Social Media, Public Web, Smartphones, …
• Organizations (Employer)
• OLTP, OLAP, BI, …
• Machines
• IoT, Satellites, Vehicles, Science, …
Modern Data Sources
→ User Generated Content (Web & Mobile)
• Twitter, Facebook, Snapchat, YouTube
• Clickstream, Ads, User Engagement
• Payments: Paypal, Venmo
→ Internet of Anything (IoAT)
• Wind Turbines, Oil Rigs, Cars
• Weather Stations, Smart Grids
• RFID Tags, Beacons, Wearables
Data Variety – Multiple Formats
• Structured: 5-10% of the Digital Universe
• SQL databases
• Semi-Structured: 5-10%
• CSV, XML, JSON, email structure
• Unstructured: 80-90%
• books, journals, documents, metadata, log files, health records, audio, video, images, files, email messages, Web pages, social media, word-processor documents, ...
OPPORTUNITIES FROM BIG DATA
Data Analytics is needed everywhere
• Recommendation engines
• Smart meter monitoring
• Equipment monitoring
• Advertising analysis
• Life sciences research
• Fraud detection
• Healthcare outcomes
• Weather forecasting for business planning
• Oil & Gas exploration
• Social network analysis
• Churn analysis
• Traffic flow optimization
• IT infrastructure & Web App optimization
• Legal discovery and document archiving
• Intelligence Gathering
• Location-based tracking & services
• Pricing Analysis
• Personalized Insurance
DATA is the NEW OIL!
But do you have the capacity to refine it?
UNDERSTANDING BIG DATA PROBLEM
The “Big Data” Problem
Problem
→ A single machine cannot process or even store all the data!
Solution
→ Distribute data over large clusters
Difficulty
→ How to split work across machines?
→ Moving data over the network is expensive
→ Must consider data & network locality
→ How to deal with failures?
→ How to deal with slow nodes?
Traditional Approach in Data Management
[Diagram: a 3 TB file is split into three 1 TB pieces of data because each hard drive holds 1 TB; STORAGE (DATA) sends Raw Data to PROCESSING (Processor), which returns Processed Data]
Addressing Data
• Standard hard drive data transfer speed: 60 – 100 MB/sec
• Solid-state drive (SSD): 250 – 500 MB/sec
• Hard drive capacity is growing RAPIDLY (4 – 60 TB)
• Online data keeps growing (doubling every 18 months)
• Processing speed grows at a comparable pace (Moore's law)
• Hard drive transfer speed is relatively FLAT
Moving Data IN and OUT of disk is the Bottleneck
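A little arithmetic makes the bottleneck concrete. The sketch below takes the 3 TB file from the previous slide and the upper hard-drive rate of 100 MB/sec quoted above. The function name `read_time_hours` and the 100-disk cluster are illustrative assumptions, not figures from the talk, and the model ignores seek time, network cost and replication.

```python
def read_time_hours(size_tb: float, rate_mb_per_s: float, disks: int = 1) -> float:
    """Hours needed to scan `size_tb` terabytes when the data is split
    evenly over `disks` drives that are all read in parallel."""
    size_mb = size_tb * 1_000_000              # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / (rate_mb_per_s * disks)
    return seconds / 3600

single = read_time_hours(3, 100)               # one hard drive
cluster = read_time_hours(3, 100, disks=100)   # hypothetical 100-disk cluster

print(f"1 disk:    {single:.2f} h")
print(f"100 disks: {cluster * 60:.1f} min")
```

Under these assumptions a single drive needs more than eight hours just to read the file once, while the same scan spread over 100 drives takes about five minutes. This is the motivation for distributing data over a cluster.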
WHY NOW?
Addressing Data – Digital Universe
[Chart: Digital Universe Growth over time. x-axis: years 1995 to 2018; y-axis: data growth in zettabytes, 0 to 35]
Addressing Data – Hard Disk Capacity
[Chart: Hard Drive Capacity Growth over time. x-axis: years 1991 to 2017; y-axis: capacity in gigabytes, 0 to 70,000]
Addressing Data – Storage Cost
[Chart: Data Storage Cost per Gigabyte. x-axis: years 1980 to 2020; y-axis: price in $ per GB, 0 to 1,400,000. Data labels, in order: 1,200,000; 100,000; 10,000; 800; 10; 1; 0.1; 0.003; 0.002]
Computation Power CPU and GPU
[Chart: Computation Power CPU and GPU. x-axis: years 2001 to 2018; y-axis: GFLOPS, 0 to 4,500; series: GPU, CPU]
HADOOP ECOSYSTEM
Hadoop Ecosystem – Big Data Tech Stack
STORAGE
DATA MANAGEMENT
PROCESSING
INTELLIGENCE / VISUALIZATION
Hadoop Core = Storage + Compute
• Compute (CPU, RAM): Yet Another Resource Negotiator (YARN)
• Storage: Hadoop Distributed File System (HDFS)
BIG DATA COMPUTING SOLUTION
Timeline of Computing Architecture
• Traditional Architecture (until 2000): Apps run on one Operating System on one HARDWARE box
• Virtualized Architecture (2000+): Apps run on several OS instances that share one HARDWARE box through a HYPERVISOR
• Distributed Architecture (2010+): Apps run on HADOOP (HDFS + YARN), which spans many HARDWARE boxes, each with its own OS
Distributed vs Traditional Computing
• Traditional Computing: one Function (RDBMS) works on all DATA held in shared storage (SAN / NAS)
• Distributed Computing: every node pairs its own Function with its local DATA
Distributed Architecture of HDFS
• Rack 1 (Switch): DN1, DN2, DN3, DN4
• Rack 2 (Switch): DN11, DN12, DN13, DN14
• Rack 3 (Switch): DN21, DN22, DN23, DN24
• Rack 4 (Switch): DN31, DN32, DN33, DN34
Where to write file ADA.txt (blocks A, B, C, D) in HDFS?
The CLIENT asks the NAMENODE, which answers with three DataNodes per block:
• A – DN32, DN11, DN14
• B – DN01, DN22, DN23
• C – DN12, DN02, DN04
• D – DN34, DN12, DN14
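The replica lists on this slide share a pattern: each block has one replica on one rack and two replicas on a second rack. That matches the rack-aware placement usually described as the HDFS default for a replication factor of 3. The sketch below is a toy model of that pattern only, not the NameNode's actual logic, which also weighs where the writer runs, node load and free space. The helper names `place_block` and `rack_of` are hypothetical.

```python
import random

# Rack layout copied from the slide (DN = DataNode).
RACKS = {
    "Rack 1": ["DN1", "DN2", "DN3", "DN4"],
    "Rack 2": ["DN11", "DN12", "DN13", "DN14"],
    "Rack 3": ["DN21", "DN22", "DN23", "DN24"],
    "Rack 4": ["DN31", "DN32", "DN33", "DN34"],
}

def place_block(rng: random.Random) -> list[str]:
    """Pick three DataNodes: one on a first rack, two on a second rack."""
    first_rack, second_rack = rng.sample(list(RACKS), 2)
    first = rng.choice(RACKS[first_rack])
    second, third = rng.sample(RACKS[second_rack], 2)
    return [first, second, third]

def rack_of(node: str) -> str:
    """Return the name of the rack that hosts `node`."""
    return next(rack for rack, nodes in RACKS.items() if node in nodes)

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
plan = {block: place_block(rng) for block in "ABCD"}
for block, nodes in plan.items():
    print(block, "-", ", ".join(nodes))
```

With this layout the loss of a single node or of a whole rack still leaves at least one replica of every block reachable, while two of the three copies travel over only one rack switch.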
MapReduce Architecture
INPUT DATA → Split → Map() ×4 → [k1, v1] → Sort by k → Merge → [k1, [v1, v2, …, vN]] → Reduce() ×2 → OUTPUT DATA
MapReduce Job – Logical View
MAP → SHUFFLE → REDUCE
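The three logical phases can be sketched on a single machine with a word count, the usual introductory MapReduce example. This is a plain-Python illustration of the data flow, not Hadoop code; the function names and the three sample input splits are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (k1, v1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: sort by key and merge values into [k1, [v1, v2, ..., vN]]."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return dict(grouped)

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the value list of one key into a single result."""
    return key, sum(values)

splits = ["big data big compute", "data moves slowly", "compute moves to data"]
mapped = chain.from_iterable(map_phase(s) for s in splits)  # one Map() per split
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)
```

In a real cluster each Map() call runs on the node that already stores its split, and the shuffle is the step that moves data across the network.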
MASSIVE PARALLELIZATION
Big Data and High Performance Computing
How do CPU and GPU differ?
• CPU: several cores; runs the application's general logic and serial functions
• GPU: hundreds or thousands of cores; runs the application's parallel and compute-intensive functions
Low Latency vs. High Throughput
CPU (a few ALUs, CONTROL, L2 cache, DRAM)
• Optimized for low-latency access to cached datasets
• Control logic for out-of-order and speculative execution
GPU (hundreds of ALUs, L2 cache, DRAM)
• Optimized for data-parallel throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation
Interaction between CPU and GPU
1. Copy input data from CPU memory to GPU memory;
2. Load dedicated functions onto the GPU and execute them;
3. Copy computing results from GPU memory to CPU memory.
What is CUDA?
• Parallel Programming Model
• Includes Memory Model
• Can utilize hundreds of CUDA cores and thousands of parallel threads
• Developers can focus on parallel programming, abstracting away many low-level operations
• Supports heterogeneous systems using CPU+GPU
• Implemented in C++
• There are extensions and bindings for C, C++, C#, Fortran, Java, Python, Ruby, etc.
CUDA Kernel
• A kernel is a set of computing instructions (a small program) designed to run on the GPU;
• The GPU runs kernels in thousands of parallel threads;
• CUDA Threads:
• Lightweight
• Fast Switching
• Massively Parallel
Logical Architecture vs. Physical Architecture
• Threads: executed by a CUDA Core
• Thread Block: executed by a Streaming Multiprocessor (SM)
• Grid: executed by the GPU Unit
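The grid, thread block and thread hierarchy is easiest to see through the index each thread computes for itself. The sketch below emulates a one-dimensional kernel launch sequentially in plain Python, so it shows only the indexing and no parallelism. `launch` is a hypothetical stand-in for a CUDA kernel launch, and the index expression mirrors the common CUDA idiom `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch(kernel, grid_dim: int, block_dim: int, *args) -> None:
    """Sequentially emulate a 1-D launch: every (block, thread) pair runs
    the same kernel; a real GPU would run these threads in parallel."""
    for block_idx in range(grid_dim):           # grid -> thread blocks
        for thread_idx in range(block_dim):     # thread block -> threads
            kernel(block_idx, block_dim, thread_idx, *args)

def vector_add(block_idx, block_dim, thread_idx, a, b, out):
    """Kernel body: each thread handles at most one element."""
    i = block_idx * block_dim + thread_idx      # global thread index
    if i < len(out):                            # guard threads past the end
        out[i] = a[i] + b[i]

n = 10
a = list(range(n))
b = [10 * x for x in a]
out = [0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim     # ceil(n / block_dim) blocks
launch(vector_add, grid_dim, block_dim, a, b, out)
print(out)
```

Three blocks of four threads cover the ten elements here; the bounds check lets the two surplus threads in the last block do nothing, which is why kernels written in this style usually carry such a guard.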
BIG DOES NOT MEAN SLOW
SMALL DOES NOT MEAN WEAK
Information is the oil of the 21st century,
and Analytics is the Combustion Engine
Q & A ?
Dr. Abzetdin Adamov,
Email me at: aadamov@ada.edu.az
Follow me at: @
Link to me at: www.linkedin.com/in/adamov
Visit my blog at: aadamov.wordpress.com
