A Gentle Introduction to Big Data
Presenter: Mehmet Ali Akyol
April 03, 2018
How much data is big data?
Big data happens when the data you have to process is bigger than what you can process in the given time
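Read as arithmetic, this definition says data is "big" when the time needed to process it (volume divided by throughput) exceeds the time available. A minimal sketch of that check follows; the function name and the example figures (10 TB, 100 MB/s, one hour) are assumptions made for illustration, not numbers from the presentation.

```python
def is_big_data(data_bytes: float, throughput_bytes_per_s: float, deadline_s: float) -> bool:
    """Return True when the data cannot be processed within the deadline."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    required_s = data_bytes / throughput_bytes_per_s  # time needed at this rate
    return required_s > deadline_s


TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte

# Hypothetical workload: 10 TB at 100 MB/s needs 100,000 s, far more than 1 hour.
print(is_big_data(10 * TB, 100 * MB, deadline_s=3600))

# Hypothetical workload: 1 GB at the same rate needs 10 s, well within 1 hour.
print(is_big_data(10**9, 100 * MB, deadline_s=3600))
```

Under this reading the same dataset can be "big" on one system and ordinary on another, since both throughput and the deadline enter the comparison.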
What is Big Data?
Field dedicated to the analysis, processing, and storage of large collections of data that frequently originate from disparate sources
Required when traditional data analysis, processing and storage technologies and techniques are insufficient
Addresses distinct requirements, such as the combining of multiple unrelated datasets, processing of large amounts of unstructured data and harvesting of hidden information in a time-sensitive manner
Characteristics of Big Data
● Volume
● Variety
● Velocity
● Veracity
● Value
Volume
About the scale of data
Terabytes, petabytes, exabytes, zettabytes…
An Airbus aircraft generates 640 TB of data in a flight
Self-driving cars will generate 2 PB of data every year
Variety
Various data formats and types
Structured, unstructured, and semi-structured
Text files, media files such as sound and video
Velocity
The speed at which data is produced
The delay before the data must be consumed
Streaming data: social media posts (tweets, Facebook posts), IoT (Internet of Things)
Veracity
Quality of data
Meaningful results
Understandability
Importance of data source
Value
Usefulness of data
Amount of knowledge that can be extracted from data
Making informed decisions
Types of Data
Structured Data
Unstructured Data
Semi-structured Data
Metadata
Structured Data
Unstructured Data
Semi-structured Data
Metadata
Data about data
Details of the dataset such as source, date, and type
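To make the four data types concrete, here is a minimal sketch using only the Python standard library. All sample records, field names, and the `crm_export` source are made up for illustration.

```python
import csv
import io
import json

# Structured: a fixed schema of rows and columns (here, CSV text).
structured_text = "id,name,age\n1,Ada,36\n2,Alan,41\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing records whose fields may differ (here, JSON).
semi_structured = json.loads(
    '[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "email": "x@example.com"}]'
)

# Unstructured: no predefined schema (here, free text; media files also qualify).
unstructured = "Customer called to say the delivery arrived late but intact."

# Metadata: data about the dataset itself, such as source, date, and type.
metadata = {"source": "crm_export", "date": "2018-04-03", "type": "text/csv"}

print(rows[0]["name"])             # every CSV row has the same columns
print(sorted(semi_structured[1]))  # the second JSON record has its own fields
print(len(unstructured.split()))   # free text needs parsing before analysis
print(metadata["source"])
```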
Data Processing in Big Data
Batch Processing
- Stored data is processed at certain time intervals
- Hadoop
Stream Processing
- Processing continuous streams of data in real time
- Apache Storm
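The difference between the two modes can be sketched in plain Python, without Hadoop or Storm. The function names and the sample events are hypothetical; the batch function sees the whole stored dataset at once, while the stream function updates its result as each event arrives.

```python
from typing import Iterable, Iterator, List


def batch_total(stored_events: List[int]) -> int:
    """Batch: the complete stored dataset is processed in one run."""
    return sum(stored_events)


def stream_totals(events: Iterable[int]) -> Iterator[int]:
    """Stream: emit an updated result as soon as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total


events = [3, 1, 4, 1, 5]

print(batch_total(events))                # one answer, after all data is stored
print(list(stream_totals(iter(events))))  # a running answer after every event
```

The batch run gives a single result once all data is available; the stream run gives a result that is current after every event, at the cost of keeping state between events.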
Big Data Processing Architectures
Processing architectures that take advantage of both batch and stream processing
- Lambda Architecture
- Apache Spark
- Kappa Architecture
- Apache Flink
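As a rough illustration of the Lambda idea, the sketch below combines a batch view, recomputed from stored events, with a speed view, updated per event, and merges the two at query time. It is plain Python rather than Spark or Flink, and the event format, class, and function names are assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, page) - hypothetical event shape


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute page counts from all events before the cutoff."""
    return Counter(page for ts, page in master if ts < cutoff)


class SpeedLayer:
    """Speed layer: incrementally count events at or after the cutoff."""

    def __init__(self, cutoff: int) -> None:
        self.cutoff = cutoff
        self.view: Counter = Counter()

    def on_event(self, event: Event) -> None:
        ts, page = event
        if ts >= self.cutoff:
            self.view[page] += 1


def query(page: str, batch: Counter, speed: SpeedLayer) -> int:
    """Serving step: merge the batch view and the real-time view."""
    return batch[page] + speed.view[page]


events = [(1, "home"), (2, "docs"), (3, "home"), (4, "home"), (5, "docs")]
cutoff = 4  # the batch job has covered everything before timestamp 4

batch = batch_view(events, cutoff)
speed = SpeedLayer(cutoff)
for event in events:
    speed.on_event(event)

print(query("home", batch, speed))
print(query("docs", batch, speed))
```

In a Kappa-style design there would be no separate batch layer: the same streaming code would handle both recent and historical events, with history reprocessed by replaying the event log.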
Big Data Tools
Hadoop: A framework that allows for the distributed storage & processing of large datasets across clusters of computers
Hive: A data warehouse infrastructure that provides data summarization and querying
Flink: Stream processing framework for distributed, high-performing, always-available, and accurate data streaming applications
Spark: A fast and general engine for large-scale data processing for both batch and streaming data
Big Data Tools
Storm: A distributed realtime computation system for unbounded data processing
Beam: A framework for batch and streaming data processing jobs that run on any execution engine such as Flink and Spark
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics, and data visualization
Kafka: A distributed message queue for building real-time data pipelines and streaming apps
Takeaways
Big Data is a research field that deals with processing large amounts of data that traditional data processing techniques cannot handle in a timely and efficient manner
5 Vs of Big Data: Volume, Velocity, Variety, Veracity, and Value
Data Types: Structured, Unstructured, Semi-structured, and Metadata
Big Data Processing Methods: Batch and Stream Processing
There are dozens of open-source big data tools available
Thanks for listening!
