Big Data
Valeri Kopaleishvili
Outline
◦ What is Big Data?
◦ Where does this Big Data come from?
◦ 4 Vs analysis
◦ When is dealing with Big Data hard?
◦ Example: Google
What is big data?
“Every day, we create 2.5 quintillion bytes of data — so
much that 90% of the data in the world today has been
created in the last two years alone. This data comes from
everywhere: sensors used to gather climate information,
posts to social media sites, digital pictures and videos,
purchase transaction records, and cell phone GPS signals to
name a few.
This data is big data.”
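To put the quoted figure in familiar units, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units and the short-scale meaning of "quintillion" (10^18); the `UNITS` table and `convert` helper are illustrative names, not part of the original slides.

```python
# Decimal (SI) storage units, in bytes.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

# "2.5 quintillion bytes" on the short scale is 2.5 * 10**18 bytes.
BYTES_PER_DAY = 25 * 10**17


def convert(num_bytes: int, unit: str) -> float:
    """Express a byte count in the given SI unit."""
    return num_bytes / UNITS[unit]


daily_exabytes = convert(BYTES_PER_DAY, "exabyte")
yearly_zettabytes = convert(BYTES_PER_DAY * 365, "zettabyte")
print(f"{daily_exabytes} EB per day, about {yearly_zettabytes:.2f} ZB per year")
```

In other words, the quoted daily volume is 2.5 exabytes, or a little under one zettabyte per year.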
Where Is This “Big Data” Coming From?
◦ 12+ TB of tweet data every day
◦ 25+ TB of log data every day
◦ ? TB of data every day
◦ 2+ billion people on the Web by the end of 2011
◦ 30 billion RFID tags today (1.3 billion in 2005)
◦ 4.6 billion camera phones worldwide
◦ Hundreds of millions of GPS-enabled devices sold annually
◦ 76 million smart meters in 2009, 200 million by 2014
With Big Data, We’ve Moved to 4 Vs Analytics
◦ Volume: 12+ terabytes of tweets created daily
◦ Variety: 100s of different types of data
◦ Velocity: 5+ million trade events per second
◦ Value
Volume (Scale)
Data Volume
◦ 44x increase from 2009 to 2020
◦ From 0.8 zettabytes to 35 zettabytes
Data volume is increasing exponentially
Volume refers to the vast amounts of data generated every second. We are no longer
talking about terabytes but about petabytes and beyond. If we take all the data
generated in the world between the beginning of time and 2008, the same amount of
data will soon be generated every minute. This makes most data sets too large to
store and analyze using traditional database technology. New big data tools use
distributed systems so that we can store and analyze data across databases located
anywhere in the world.
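The growth figures on this slide (0.8 zettabytes in 2009, 35 zettabytes in 2020) can be turned into an implied yearly rate. The sketch below assumes constant compound growth across the whole period, which is a simplification introduced here, not a claim from the slides.

```python
import math

START_YEAR, END_YEAR = 2009, 2020
START_ZB, END_ZB = 0.8, 35.0  # figures from the slide, in zettabytes

years = END_YEAR - START_YEAR
growth_factor = END_ZB / START_ZB
# Constant yearly multiplier that compounds to the overall growth factor.
annual_multiplier = growth_factor ** (1 / years)
# Years needed for the volume to double at that rate.
doubling_years = math.log(2) / math.log(annual_multiplier)

print(f"overall growth: {growth_factor:.2f}x")
print(f"implied annual growth: {(annual_multiplier - 1) * 100:.0f}%")
print(f"doubling time: {doubling_years:.1f} years")
```

Under that assumption the "44x" figure corresponds to roughly 41% growth per year, with the total volume doubling about every two years.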
Variety (Complexity)
Variety refers to the different types of data we can now use. In the past we
focused only on structured data that fit neatly into tables or relational
databases, such as financial data.
In fact, 80% of the world’s data is unstructured (text, images, video,
voice, etc.). With big data technology we can now analyze and bring
together data of different types, such as messages, social media
conversations, photos, sensor data, and video or voice recordings.
To extract knowledge, all these types of data need to be linked together.
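As a toy illustration of linking structured and unstructured data, the sketch below joins purchase rows with free-text posts on a shared customer ID. All records, field names, and the keyword list are hypothetical, and real systems would use far richer text analysis than a keyword match.

```python
import re
from collections import defaultdict

# Structured data: rows with a fixed schema (hypothetical purchases).
purchases = [
    {"customer_id": 1, "item": "phone", "amount": 499.0},
    {"customer_id": 2, "item": "laptop", "amount": 1200.0},
    {"customer_id": 1, "item": "case", "amount": 19.0},
]

# Unstructured data: free text with no schema (hypothetical posts).
posts = [
    {"customer_id": 1, "text": "Love my new phone, the camera is great!"},
    {"customer_id": 2, "text": "Laptop arrived late and the box was damaged."},
]

NEGATIVE_WORDS = {"late", "damaged", "broken"}  # toy keyword list


def link(purchases, posts):
    """Build one profile per customer from both kinds of data."""
    profiles = defaultdict(lambda: {"total_spent": 0.0, "complaint": False})
    for row in purchases:
        profiles[row["customer_id"]]["total_spent"] += row["amount"]
    for post in posts:
        words = set(re.findall(r"[a-z]+", post["text"].lower()))
        if words & NEGATIVE_WORDS:
            profiles[post["customer_id"]]["complaint"] = True
    return dict(profiles)


profiles = link(purchases, posts)
for customer_id, profile in sorted(profiles.items()):
    print(customer_id, profile)
```

The point is only that neither source alone shows which high-spending customers are unhappy; the combined profile does.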
Velocity (Speed)
Velocity refers to the speed at which new data is generated and the speed at
which data moves around. Just think of social media messages going viral in
seconds. Technology now allows us to analyze data while it is being
generated (sometimes referred to as in-memory analytics), without ever
putting it into databases.
Examples
◦ E-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
◦ Healthcare monitoring: sensors monitor your activities and body → any abnormal measurements require immediate reaction
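The healthcare monitoring example can be sketched as a stream processor that inspects each reading as it arrives and keeps only a small window in memory. The function name, window size, tolerance, and sample readings below are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


def abnormal_readings(stream, window_size=5, tolerance=20):
    """Yield (index, value) for readings far from the recent average.

    Each reading is handled as it arrives; only the last `window_size`
    normal values are kept in memory, and nothing is written to a database.
    """
    window = deque(maxlen=window_size)
    for index, value in enumerate(stream):
        if len(window) == window_size and abs(value - mean(window)) > tolerance:
            yield index, value
        else:
            window.append(value)  # only normal readings update the baseline


# Hypothetical heart-rate readings in beats per minute.
heart_rate = [72, 74, 71, 73, 75, 74, 130, 72, 73, 40, 74]
alerts = list(abnormal_readings(heart_rate))
print(alerts)
```

Because the function is a generator, an alert is available the moment the abnormal reading is seen, rather than after the whole data set has been collected.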
Real-time/Fast Data
Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
◦ Social media and networks (all of us are generating data)
◦ Scientific instruments (collecting all sorts of data)
◦ Mobile devices (tracking all objects all the time)
◦ Sensor technology and networks (measuring all kinds of data)
Value
There is another V to take into account when looking at Big Data: value.
Having access to big data is no good unless we can turn it into value.
Companies are starting to generate amazing value from their big data.
We are currently seeing only the beginnings of a transformation into a big data
economy. Any business that doesn’t seriously consider the implications
of Big Data runs the risk of being left behind.
Big Data Exploration: Value & Diagram
[Diagram: data sources (file systems, relational data, content management, email, CRM, supply chain, ERP, RSS feeds, cloud, custom sources), DataExplorer, application/users]
Find, visualize, and understand all big data to improve business knowledge:
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market presence and revenue
Applications for Big Data Analytics
◦ Homeland Security
◦ Finance
◦ Smarter Healthcare
◦ Telecom
◦ Manufacturing
◦ Traffic Control
◦ Trading Analytics
◦ Log Analysis
◦ Search Quality
When Dealing with Big Data Is Hard
When the operations on data are complex:
◦ Simple counting, for example, is not a complex problem.
◦ Modeling and reasoning with data of different kinds can get extremely complex.
Good news with big data:
◦ Often, because of the vast amount of data, modeling techniques can get simpler (e.g., smart counting can replace complex model-based analytics)…
◦ …as long as we deal with the scale.
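One way to picture "smart counting" replacing a complex model is next-word suggestion driven purely by how often word pairs occur. This is an illustrative sketch, not the technique the slides had in mind; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_counts(sentences):
    """Count how often each word follows each other word."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest_next(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny hypothetical corpus; the approach only pays off with far more text.
corpus = [
    "big data is hard",
    "big data is everywhere",
    "big data is hard to store",
    "big clusters store data",
]
model = train_bigram_counts(corpus)
print(suggest_next(model, "big"))
print(suggest_next(model, "is"))
```

No grammar or language model is involved: the suggestions come only from counts, which is why the quality depends on having a very large corpus and on handling that scale.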
Hadoop is an open-source software framework for storing and processing big data in a
distributed fashion on large clusters of commodity hardware.
It is suitable for extremely large databases (billions of rows, millions of columns)
distributed across thousands of nodes.
Hadoop Distributed File System (HDFS) is a Java-based file system that provides
scalable and reliable data storage and is designed to span large clusters of
commodity servers.
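To make "storage that spans a cluster" concrete, the toy simulation below splits a file into fixed-size blocks and assigns several copies of each block to different servers. The 128 MB block size and replication factor of 3 are values commonly seen as HDFS defaults, used here as assumptions; the round-robin placement and node names are invented for illustration and do not reflect the real placement policy.

```python
import math

BLOCK_SIZE_MB = 128   # assumed block size for this sketch
REPLICATION = 3       # assumed number of copies of each block


def place_blocks(file_size_mb, nodes):
    """Split a file into blocks and assign each block's replicas to nodes.

    Round-robin placement is a simplification for illustration; it is
    not the placement policy a real HDFS NameNode applies.
    """
    if REPLICATION > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    block_count = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {}
    for block in range(block_count):
        placement[block] = [
            nodes[(block + copy) % len(nodes)] for copy in range(REPLICATION)
        ]
    return placement


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
layout = place_blocks(500, nodes)
for block, replicas in layout.items():
    print(f"block {block}: {replicas}")
```

Since every block lives on several servers, losing one commodity machine does not lose data, which is where the reliability comes from.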
MapReduce is a programming model and an associated implementation for processing and generating
large data sets with a parallel, distributed algorithm on a cluster.
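The programming model can be sketched in a single process with the classic word-count example. This runs the map, shuffle, and reduce steps sequentially in plain Python; it does not use Hadoop's API, and the function names and sample documents are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


documents = ["big data big clusters", "data moves fast", "big data"]
# On a cluster each document (or split) would be mapped on a different node.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(word_counts)
```

Because each map call looks at one document and each reduce call looks at one key, both steps can be spread across many machines without coordination beyond the shuffle.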
We first wrote the data into HDFS, then created a table and loaded the data from
the HDFS files into a Hive table.
Thanks!

