Big Data
Big Data
• What is Big Data?
• Analog storage vs. digital
• The FOUR V’s of Big Data.
• Who’s Generating Big Data
• The importance of Big Data.
• Optimization
• HDFS
Definition
Big data is the term for a collection
of data sets so large and complex
that it becomes difficult to
process using on-hand database
management tools or traditional
data processing applications. The
challenges include capture,
curation, storage, search,
sharing, transfer, analysis, and
visualization.
The FOUR V’s of Big Data
From traffic patterns and music downloads to web
history and medical records, data is recorded,
stored, and analyzed to enable the technology
and services that the world relies on every day.
But what exactly is big data, and how can it be
used? According to IBM scientists, big data can be
broken down into four dimensions: Volume, Velocity,
Variety and Veracity.
The FOUR V’s of Big Data
Volume. Many factors contribute to the increase in
data volume: transaction-based data stored through
the years, unstructured data streaming in from
social media, and increasing amounts of sensor and
machine-to-machine data being collected. In the
past, excessive data volume was a storage issue.
But with decreasing storage costs, other issues
emerge, including how to determine relevance within
large data volumes and how to use analytics to
create value from relevant data.
The FOUR V’s of Big Data
Variety. Data today comes in all types of formats:
structured, numeric data in traditional databases;
information created by line-of-business
applications; and unstructured text documents,
email, video, audio, stock ticker data and
financial transactions. Managing, merging and
governing different varieties of data is something
many organizations still grapple with.
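
As a rough illustration of the variety challenge, the sketch below (pure Python, with made-up
field names such as "id", "amount" and "note") normalizes records arriving as CSV and as JSON
lines into one common shape so that they can be merged and governed together.

import csv
import io
import json

def from_csv(text):
    # Parse CSV rows with an "id,amount,note" header into plain dicts.
    for row in csv.DictReader(io.StringIO(text)):
        yield {"id": row["id"], "amount": float(row["amount"]), "note": row["note"]}

def from_json_lines(text):
    # Parse one JSON document per line into the same dict shape.
    for line in text.splitlines():
        doc = json.loads(line)
        yield {"id": str(doc["id"]), "amount": float(doc["amount"]), "note": doc.get("note", "")}

csv_data = "id,amount,note\n1,9.99,invoice\n2,15.50,refund"
json_data = '{"id": 3, "amount": 42.0, "note": "wire transfer"}'

records = list(from_csv(csv_data)) + list(from_json_lines(json_data))
print(records)  # three records of mixed origin, now in one uniform structure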
The FOUR V’s of Big Data
Velocity. Data is streaming in at unprecedented
speed and must be dealt with in a timely manner.
RFID tags, sensors and smart metering are driving
the need to deal with torrents of data in near-real
time. Reacting quickly enough to deal with
data velocity is a challenge for most
organizations.
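
To make the velocity point concrete, here is a minimal sketch (not from the slides) of processing
readings as they arrive instead of after the fact: a rolling five-second window over a simulated
sensor stream. The stream, window length and value range are illustrative assumptions.

import random
import time
from collections import deque

WINDOW_SECONDS = 5   # keep only readings from the last five seconds
window = deque()     # (timestamp, value) pairs

def ingest(value):
    # Add a reading, drop expired ones, and return the rolling average.
    now = time.time()
    window.append((now, value))
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()
    return sum(v for _, v in window) / len(window)

for _ in range(10):                       # simulate a short burst of readings
    reading = random.uniform(20.0, 30.0)  # e.g. a temperature sensor
    print("rolling average: %.2f" % ingest(reading))
    time.sleep(0.2)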
The FOUR V’s of Big Data
Veracity. Big data veracity refers to the biases,
noise and abnormality in data. Is the data that is
being stored and mined meaningful to the problem
being analyzed? Inderpal feels veracity in data
analysis is the biggest challenge when compared to
things like volume and velocity. In scoping out
your big data strategy, you need to have your team
and partners work to keep your data clean, with
processes that keep ‘dirty data’ from accumulating
in your systems.
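
A minimal sketch of one way to keep such ‘dirty data’ out at ingest time: each incoming record is
checked against a few plausibility rules before it is stored. The field names and thresholds below
are assumptions made purely for illustration.

def is_clean(record):
    # Reject records with missing fields, bad types, or implausible values.
    try:
        return (record["sensor_id"].strip() != ""
                and -50.0 <= float(record["temperature_c"]) <= 60.0)
    except (KeyError, TypeError, ValueError, AttributeError):
        return False

incoming = [
    {"sensor_id": "s-01", "temperature_c": 21.5},   # plausible
    {"sensor_id": "",     "temperature_c": 22.0},   # missing id -> dirty
    {"sensor_id": "s-02", "temperature_c": 999.0},  # abnormal value -> dirty
    {"sensor_id": "s-03"},                          # missing field -> dirty
]

clean = [r for r in incoming if is_clean(r)]
print("kept %d of %d records" % (len(clean), len(incoming)))  # kept 1 of 4 records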
Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion
The importance of Big Data
The real issue is not that you are acquiring large
amounts of data. It's what you do with the data that
counts. The hopeful vision is that organizations will
be able to take data from any source, harness
relevant data and analyze it to find answers that
enable:
• Cost reductions
• Time reductions
• New product development and optimized offerings
• Smarter business decision making
The importance of Big Data
For instance, by combining big data and high-powered analytics, it is possible
to:
• Determine root causes of failures, issues and defects in near-real time,
potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while
they are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and
clear inventory.
• Generate retail coupons at the point of sale based on the customer's
current and past purchases.
• Send tailored recommendations to mobile devices while customers are in
the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior (a
small sketch follows below).
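
As a toy illustration of the last bullet, the sketch below flags clickstream sessions whose click
rate is far above the norm. The data, the two-standard-deviation threshold and the notion of
"suspicious" are illustrative assumptions, not a real fraud model.

from statistics import mean, pstdev

# (session_id, clicks_per_minute) -- toy clickstream summaries
sessions = [("a", 4), ("b", 6), ("c", 5), ("d", 7), ("e", 95), ("f", 5)]

rates = [rate for _, rate in sessions]
mu, sigma = mean(rates), pstdev(rates)

# Flag sessions more than two standard deviations above the mean click rate.
suspicious = [sid for sid, rate in sessions if sigma > 0 and (rate - mu) / sigma > 2]
print("flag for review:", suspicious)  # -> ['e']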
HDFS / Hadoop
Data in an HDFS cluster is broken down into
smaller pieces (called blocks) and
distributed throughout the cluster. In this
way, the map and reduce functions can
be executed on smaller subsets of your
larger data sets, and this provides the
scalability that is needed for big data
processing. The goal of Hadoop is to use
commonly available servers in a very
large cluster, where each server has a set
of inexpensive internal disk drives.
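
The word-count sketch below mirrors that description: a map function runs independently over the
lines of a block, and a reduce function aggregates the sorted intermediate pairs. It is simulated
locally in plain Python; on a real cluster the same two functions would run as separate tasks over
HDFS blocks (for example via Hadoop Streaming).

from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Map: emit (word, 1) for every word in one line of an input block.
    for word in line.lower().split():
        yield word, 1

def reducer(pairs):
    # Reduce: sum the counts for each word (input must be sorted by key).
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

lines = ["big data needs big storage", "hadoop splits data into blocks"]

mapped = [pair for line in lines for pair in mapper(line)]
mapped.sort(key=itemgetter(0))            # stands in for the shuffle/sort phase
for word, total in reducer(mapped):
    print(word, total)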
PROS OF HDFS
• Scalable – New nodes can be added as needed,
and added without needing to change data
formats, how data is loaded, how jobs are
written, or the applications on top.
• Cost effective – Hadoop brings massively parallel
computing to commodity servers. The result is a
sizeable decrease in the cost per terabyte of
storage, which in turn makes it affordable to
model all your data.
• Flexible – Hadoop is schema-less, and can absorb
any type of data, structured or not, from any
number of sources (a brief upload sketch follows
below).
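
To illustrate the schema-less point, here is a hedged sketch of landing mixed-format files in HDFS
exactly as they are, using the standard hdfs dfs shell commands from Python. The file names and the
target directory are hypothetical, and a configured HDFS client is assumed to be on the PATH.

import subprocess

files = ["clicks.json", "transactions.csv", "server.log"]  # mixed, undeclared formats
target_dir = "/data/landing/raw"

subprocess.run(["hdfs", "dfs", "-mkdir", "-p", target_dir], check=True)
for path in files:
    # Each file lands unchanged; structure is imposed later, at read time.
    subprocess.run(["hdfs", "dfs", "-put", path, target_dir], check=True)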
Sources
• McKinsey Global Institute
• Cisco
• Gartner
• EMC, SAS
• IBM
• MEPTEC
Thank you for your
attention.
Authors: Tomasz Wis
Krzysztof Rudnicki