Big Data: Its Characteristics And Architecture Capabilities

By Ashraf Uddin
South Asian University
(http://ashrafsau.blogspot.in/)
What is Big Data?
Big data refers to large datasets that are challenging to store, search, share, visualize, and analyze.
“Big Data” is data whose scale, diversity,
and complexity require new architecture,
techniques, algorithms, and analytics to
manage it and extract value and hidden
knowledge from it…
The Model of Generating/Consuming Data Has Changed
Old Model: Few companies are generating data, all others are
consuming data

New Model: all of us are generating data, and all of us are
consuming data
Do we really need Big Data?
For consumers:
• Better understanding of own behavior
• Integration of activities
• Influence: involvement and recognition

For companies:
• Real behavior: what do people do, and what do they value?
• Faster interaction
• Better targeted offers
• Customer understanding
Characteristics of Big Data

1. Volume (Scale)
2. Velocity (Speed)
3. Variety (Complexity)
Volume
Velocity
• Data is generated fast and needs to be processed fast
• Online data analytics
• Late decisions lead to missed opportunities
Variety
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
• Static data vs. streaming data
• A single application can generate or collect many types of data
• To extract knowledge, all these types of data need to be linked together
Generation of Big Data

• Scientific instruments (collecting all sorts of data)
• Social media and networks (all of us are generating data)
• Sensor technology and networks (measuring all kinds of data)
Why Is Big Data Different?
For example, an airline jet collects 10 terabytes of sensor data for every 30 minutes of flying time. Compare that with conventional high-performance computing, where the New York Stock Exchange collects 1 terabyte of structured trading data per day.
Conventional corporate structured data is sized in terabytes and petabytes. Big Data is sized in petabytes, exabytes, and soon, perhaps, zettabytes.
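To make the comparison concrete, the two figures quoted on this slide can be put on the same time base. The sketch below is only arithmetic on those figures; the 6-hour flight length and the choice to spread the exchange's daily volume over 24 hours are assumptions made for illustration.

```python
# Figures quoted on the slide, in terabytes.
JET_TB_PER_HALF_HOUR = 10
NYSE_TB_PER_DAY = 1

# Normalize both sources to terabytes per hour.
jet_tb_per_hour = JET_TB_PER_HALF_HOUR * 2
nyse_tb_per_hour = NYSE_TB_PER_DAY / 24  # assumption: daily volume averaged over 24 hours

# Assumption for illustration only: a single 6-hour flight.
FLIGHT_HOURS = 6
jet_tb_per_flight = jet_tb_per_hour * FLIGHT_HOURS

# Ratio of the two hourly rates, computed without dividing by a rounded value.
rate_ratio = jet_tb_per_hour * 24 / NYSE_TB_PER_DAY

print(f"Jet:  {jet_tb_per_hour} TB/hour, {jet_tb_per_flight} TB per {FLIGHT_HOURS}-hour flight")
print(f"NYSE: {nyse_tb_per_hour:.3f} TB/hour")
print(f"Hourly rate ratio (jet / NYSE): {rate_ratio:.0f}")
```

Under these assumptions a single flight produces more raw data than the exchange collects in several months, which is the scale gap the slide is pointing at.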
Why Is Big Data Different?
The unique characteristic of Big Data is the manner in which value is discovered.
In conventional BI, the simple summing of a known value reveals a result.
In Big Data, the value is discovered through a refining modeling process:
• make a hypothesis
• create statistical, visual, or semantic models
• validate, then make a new hypothesis.
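The contrast between summing a known value and the hypothesize, model, validate loop can be sketched in a few lines. Everything here is hypothetical: the data points, the two candidate hypotheses, and the use of mean absolute error on held-out rows as the validation step are illustrative choices, not something the slides prescribe.

```python
from statistics import mean

# Hypothetical observations: (miles driven per week, claims cost).
data = [(50, 110), (100, 205), (150, 310), (200, 395), (250, 505), (300, 600)]
train, held_out = data[::2], data[1::2]


def bi_total(rows):
    """Conventional BI: sum a known value."""
    return sum(y for _, y in rows)


def fit_constant(rows):
    """Hypothesis 1: cost does not depend on miles."""
    avg = mean(y for _, y in rows)
    return lambda x: avg


def fit_linear(rows):
    """Hypothesis 2: cost grows linearly with miles (least squares)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def validation_error(model, rows):
    """Mean absolute error on rows the model was not fitted on."""
    return mean(abs(model(x) - y) for x, y in rows)


# Hypothesize, build a model, validate; then move on to the next hypothesis.
results = {}
for name, fit in [("constant", fit_constant), ("linear", fit_linear)]:
    results[name] = validation_error(fit(train), held_out)

best = min(results, key=results.get)
print("BI total:", bi_total(data))
print("Validation error per hypothesis:", results)
print("Best hypothesis so far:", best)
```

In practice the loop would continue with further hypotheses until the validation error is acceptable; the sketch stops after two to stay short.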
Use cases for Big Data Analytics
A Big Data Use Case: Personalized Insurance Premium

An insurance company wants to offer personalized premiums to those who are unlikely to make a claim, thereby optimizing its profits.
One way to approach this problem is to collect more detailed data about an individual's driving habits and then assess their risk.
The company collects data on driving habits by using sensors in its customers' cars to capture driving data, such as routes driven, miles driven, time of day, and braking abruptness.
A Big Data Use Case: Personalized Insurance Premium

This data is used to assess driver risk: the company compares individual driving patterns with other statistical information, such as average miles driven in the same state and peak hours of drivers on the road.
Driver risk plus actuarial information is then correlated with policy and profile information to offer a rate that is competitive and more profitable for the company.
The result: a personalized insurance plan.
These unique capabilities, delivered by big data analytics, are revolutionizing the insurance industry.
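A minimal sketch of the comparison described above, assuming a driver's sensor data has already been reduced to three summary figures. The field names, the state-wide averages, the equal weighting of the three ratios, and the 0.5x to 2.0x premium band are all hypothetical; a real insurer would fold in actuarial and policy data as the slide notes.

```python
from dataclasses import dataclass


@dataclass
class DrivingSummary:
    miles_per_week: float
    peak_hour_share: float            # fraction of driving done in peak hours, 0..1
    hard_brakes_per_100_miles: float


# Hypothetical state-wide averages used as the comparison baseline.
STATE_AVERAGE = DrivingSummary(250.0, 0.40, 3.0)


def risk_score(driver, baseline=STATE_AVERAGE):
    """Average of the driver's ratios to the baseline; 1.0 means average risk."""
    ratios = [
        driver.miles_per_week / baseline.miles_per_week,
        driver.peak_hour_share / baseline.peak_hour_share,
        driver.hard_brakes_per_100_miles / baseline.hard_brakes_per_100_miles,
    ]
    return sum(ratios) / len(ratios)


def personalized_premium(base_premium, driver):
    """Scale a base premium by the risk score, clamped to an assumed 0.5x-2.0x band."""
    factor = min(2.0, max(0.5, risk_score(driver)))
    return round(base_premium * factor, 2)


careful = DrivingSummary(125.0, 0.20, 1.5)
risky = DrivingSummary(500.0, 0.60, 6.0)
print("careful driver:", personalized_premium(1000.0, careful))
print("risky driver:  ", personalized_premium(1000.0, risky))
```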
A Big Data Use Case: Personalized Insurance Premium

To accomplish this task:
• A great amount of continuous data must be collected, stored, and correlated.
• Hadoop is an excellent choice for acquisition and reduction of the automobile sensor data.
• Master data and certain reference data, including customer profile information, are likely to be stored in the existing DBMS.
• A NoSQL database can be used to capture and store reference data that is more dynamic, diverse in format, and changes frequently.
Data Realm Characteristics
Big Data Architecture Capabilities
Storage and Management Capability
Database Capability
Processing Capability
Data Integration Capability
Statistical Analysis Capability
Storage and Management Capability
Hadoop Distributed File System (HDFS)
• Highly scalable storage and automatic data replication across three nodes for fault tolerance

Cloudera Manager
• Gives a cluster-wide, real-time view of nodes and services running; provides a single, central place to enact configuration changes across the cluster
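The idea of replicating each block across three nodes can be illustrated with a toy placement function. This is a sketch only: it uses plain round-robin placement, not HDFS's actual placement policy, and the node names and helper functions are hypothetical.

```python
REPLICATION_FACTOR = 3  # the "three nodes" mentioned on the slide


def place_blocks(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return [b for b, holders in placement.items() if set(holders) - failed]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(4, nodes)
print(placement[0])
print(readable_blocks(placement, failed_nodes=["node-a", "node-b"]))
```

With three replicas per block, any two node failures leave every block readable in this toy layout; a block is lost only when all three of its holders fail together.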
Database Capability
Oracle NoSQL
• Dynamic and flexible schema design
• High-performance key-value pair database

Apache HBase
• Strictly consistent reads and writes
• Allows random, real-time read/write access

Apache Cassandra
• Fault tolerance capability is designed for every node
• Data model offers column indexes with the performance of log-structured updates, materialized views, and built-in caching

Apache Hive
• Tools to enable easy data extract/transform/load (ETL)
• Query execution via MapReduce
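What "key-value pairs with a flexible schema" and "random read/write access" mean in practice can be shown with a toy in-memory store. This is not the API of Oracle NoSQL, HBase, or Cassandra; the class, its methods, and the vehicle records are hypothetical and only illustrate the data model.

```python
class KeyValueStore:
    """Toy in-memory key-value store; values are free-form dicts."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Copy the value so later changes by the caller do not leak in.
        self._data[key] = dict(value)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def update(self, key, **fields):
        """Random read-modify-write on a single key."""
        record = self._data.setdefault(key, {})
        record.update(fields)
        return record


store = KeyValueStore()
# Two records under the same key pattern with different fields: no fixed schema.
store.put("vehicle:1001", {"make": "Acme", "sensor_firmware": "2.4"})
store.put("vehicle:1002", {"make": "Acme", "trailer_hitch": True, "tyres": ["winter"]})
store.update("vehicle:1001", sensor_firmware="2.5")
print(store.get("vehicle:1001"))
```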
Processing Capability
MapReduce
• Breaks a problem up into smaller sub-problems
• Able to distribute data workloads across thousands of nodes

Apache Hadoop
• Leading MapReduce implementation
• Highly scalable parallel batch processing
• Writes multiple copies across the cluster for fault tolerance
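The MapReduce pattern can be sketched on a single machine: split the input, map each chunk to key-value pairs, group by key, then reduce each group. The sketch below runs the phases sequentially in plain Python rather than on Hadoop, and the sensor readings and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sensor readings: (driver_id, miles_driven).
readings = [("d1", 12.5), ("d2", 3.0), ("d1", 7.5), ("d3", 20.0), ("d2", 4.0)]


def split(records, num_chunks):
    """Break the input into smaller sub-problems."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def map_phase(chunk):
    """Emit (key, value) pairs for one chunk."""
    return [(driver, miles) for driver, miles in chunk]


def shuffle(mapped_chunks):
    """Group all emitted values by key across chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


chunks = split(readings, num_chunks=2)
totals = reduce_phase(shuffle(map_phase(c) for c in chunks))
print(totals)
```

On a real cluster each chunk's map step would run on a different node; because the map calls are independent of one another, the same structure scales out without changing the logic.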
Data Integration Capability
• Exports MapReduce results to RDBMS, Hadoop, and other targets
• Connects Hadoop to relational databases for SQL processing
• Optimized processing with parallel data import/export
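Exporting MapReduce results into a relational database so they can be queried with SQL can be sketched as follows. An in-memory SQLite database stands in for the RDBMS, and the result set, table name, and column names are hypothetical; the slides do not name a specific connector, so none is shown here.

```python
import sqlite3

# Hypothetical output of a MapReduce job: total miles per driver.
mapreduce_results = {"d1": 20.0, "d2": 7.0, "d3": 20.0}

conn = sqlite3.connect(":memory:")  # stand-in for the existing RDBMS
conn.execute(
    "CREATE TABLE driver_miles (driver_id TEXT PRIMARY KEY, total_miles REAL)"
)
# Bulk-load the reduced results in one batch.
conn.executemany(
    "INSERT INTO driver_miles (driver_id, total_miles) VALUES (?, ?)",
    mapreduce_results.items(),
)
conn.commit()

# Once loaded, the results are available to ordinary SQL processing.
rows = conn.execute(
    "SELECT driver_id, total_miles FROM driver_miles "
    "WHERE total_miles > ? ORDER BY driver_id",
    (10,),
).fetchall()
print(rows)
```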
Statistical Analysis Capability
• Programming language for statistical analysis
• Oracle R Enterprise allows reuse of pre-existing R scripts with no modification
Big Data Architecture

Traditional Information Architecture Capability

Big Data Information Architecture Capability
Conclusion
• Today’s economic environment demands that business be driven by useful, accurate, and timely information.
• The world of Big Data is a solution to the problem.
• There are always business and IT tradeoffs in getting to data and information in the most cost-effective way.

