Big Data
Trịnh Phong Nhã
Võ Hoàng Trôvi
Võ Đình Chinh
Instructor: Dr. Nguyễn Đức Thái
Memory storage…
Computer Memory: 640K Ought to
be Enough for Anyone
How much data?
7 billion people
Google processes 100 PB/day; 3 million servers
Facebook has 300 PB + 500 TB/day; 35% of
world’s photos
YouTube 1000 PB video storage; 4 billion
views/day
Twitter processes 124 billion tweets/year
SMS messages – 6.1T per year
US Cell Calls – 2.2T minutes per year
US Credit cards - 1.4B Cards; 20B
transactions/year
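To put the daily figures above on a per-second scale, here is a rough sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and traffic spread evenly over a 24-hour day; neither assumption comes from the slide, and the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

TB = 10**12  # decimal terabyte (assumption; the slide does not say which unit)
PB = 10**15  # decimal petabyte (assumption)


def bytes_per_second(bytes_per_day: float) -> float:
    """Average rate, assuming the volume is spread evenly over the day."""
    return bytes_per_day / SECONDS_PER_DAY


facebook_rate = bytes_per_second(500 * TB)  # 500 TB/day, from the slide
google_rate = bytes_per_second(100 * PB)    # 100 PB/day, from the slide

print(f"Facebook ingest: {facebook_rate / 10**9:.1f} GB/s")
print(f"Google processing: {google_rate / TB:.2f} TB/s")
```

Under these assumptions the averages come out to roughly 5.8 GB/s and 1.16 TB/s; real traffic is bursty, so peaks would be higher.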
Contents
1. Big Data Overview
2. Big Data Technology Today
3. SQL vs NoSQL
4. Big Data Security
5. Big data trends
6. Demo with MongoDB & Ref docs
1. Big Data Overview (cont.)
“Big data is not a single technology
but a combination of old and new
technologies that helps companies
gain actionable insight.”
(Big Data For Dummies, John Wiley & Sons, Inc.)
1. Big Data Overview (cont.)
Characteristics of Big Data
Sources of Big Data
ERP
RFID
Website
Network Switches
Social Media
Examining Big Data Types
Structured Data
Structured Data(…)
Computer- or machine-generated:
Machine-generated data generally
refers to data that is created by a
machine without human intervention.
(Sensor data, Web log data, Point-of-
sale data, Financial data…)
Human-generated: This is data that
humans, in interaction with
computers, supply (Input data, Click-
stream data, Gaming-related data…)
Examining Big Data Types
Unstructured Data
Unstructured Data(…)
Unstructured data is everywhere
Machine-generated unstructured
data: Satellite images, Scientific
data, Photographs and video, Radar
or sonar data…
Human-generated unstructured
data: Text internal to your company,
Social media data, Mobile data…
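The practical difference between the two data types can be sketched as follows. The record layout, field names, and sample sentence are hypothetical illustrations: a structured point-of-sale record exposes its fields directly, while the same facts in free text have to be extracted by parsing.

```python
import re
from dataclasses import dataclass


@dataclass
class PointOfSaleRecord:
    """Structured data: every record has the same named, typed fields."""
    store_id: int
    sku: str
    quantity: int
    unit_price: float


# Structured: fields can be queried and computed on directly.
structured = PointOfSaleRecord(store_id=12, sku="A-100", quantity=3, unit_price=4.50)
total = structured.quantity * structured.unit_price

# Unstructured: a hypothetical social media post carrying similar facts.
unstructured = "Bought 3 of the A-100 at store 12 today, great price!"

# The facts must be pulled out with a pattern that only fits this phrasing.
match = re.search(r"Bought (\d+) of the (\S+)", unstructured)
quantity_from_text = int(match.group(1)) if match else None
sku_from_text = match.group(2) if match else None

print(total, quantity_from_text, sku_from_text)
```

The regular expression works only for this one sentence shape, which is the point: unstructured data needs text analytics before it can be treated like a table.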
Managing different data types
Integrating data types into a big data
environment requires:
Connectors: enable you to pull data
in from various big data sources
Metadata: the definitions,
mappings, and other characteristics
used to describe how to find, access,
and use a company’s data (and
software) components
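How connectors and metadata work together can be sketched as below. This is a minimal illustration, not a real integration tool: the dataset names, the inline sample sources, and the `load` helper are all hypothetical. Each connector pulls records from one source format, and the metadata describes which fields to expect and how to interpret them.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List

# Hypothetical raw sources, inlined so the sketch is self-contained.
CSV_SOURCE = "id,amount\n1,9.99\n2,25.00\n"
JSON_SOURCE = '[{"user": "an", "text": "hello"}]'


def csv_connector(raw: str) -> Iterator[dict]:
    """Connector: pull records out of CSV text."""
    yield from csv.DictReader(io.StringIO(raw))


def json_connector(raw: str) -> Iterator[dict]:
    """Connector: pull records out of a JSON array."""
    yield from json.loads(raw)


# Metadata: how to find and interpret each dataset (format and field types).
METADATA = {
    "sales": {"format": "csv", "fields": {"id": "int", "amount": "float"}},
    "posts": {"format": "json", "fields": {"user": "str", "text": "str"}},
}

CONNECTORS: Dict[str, Callable[[str], Iterator[dict]]] = {
    "csv": csv_connector,
    "json": json_connector,
}
CASTS = {"int": int, "float": float, "str": str}


def load(dataset: str, raw: str) -> List[dict]:
    """Use the metadata to pick a connector and cast each declared field."""
    meta = METADATA[dataset]
    connector = CONNECTORS[meta["format"]]
    return [
        {name: CASTS[kind](record[name]) for name, kind in meta["fields"].items()}
        for record in connector(raw)
    ]


sales = load("sales", CSV_SOURCE)
posts = load("posts", JSON_SOURCE)
print(sales, posts)
```

Adding a new source then means registering one connector and one metadata entry, without touching the loading logic.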
What will we do with Big Data?
Analysis & Processing
Analysis
• Querying
• Statistics
• Modeling
• Data Mining
• Text analytics
Processing
• Data storage
• Data transfer
• Data monitoring
Quiz….?
How to store and
handle Big Data?
2. Big Data Technology Today
Storage…NoSQL Database
2. Big Data Technology Today (cont.)
Processing
2. Big Data Technology Today (cont.)
The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters of
computers using simple programming models.
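One such programming model is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) for a word count in a single Python process; it does not use Hadoop itself, and the function names and sample documents are illustrative. On a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data big insight", "data grid"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, the work can be split across machines without the functions knowing about each other.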
2. Big Data Technology Today (cont.)
Instead of treating
memory as a cache,
why not treat it as a
primary data store?
• Facebook keeps 80% of its
data in memory (Stanford
research)
• RAM is 100-1000x faster
than disk (random seek)
• Disk: 5-10 ms
• RAM: ~0.001 ms
[Diagram: Events, Facebook, Memory Grid, Data Grid]
2. Big Data Technology Today (cont.)
Transfer data:
2. Big Data Technology Today (cont.)
Open-source software framework from
Apache Hadoop
• Google MapReduce
• GFS (Google File System)
• HDFS
• Map/Reduce
3. SQL vs NoSQL
Data
storage
File
SQL
DBMS
NoSQL
3. SQL vs NoSQL (…)
A relational database is a set of tables
containing data fitted into predefined
categories.
Each table contains one or more data
categories in columns.
Each row contains a unique instance of
data for the categories defined by the
columns.
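The table, column, and row ideas above can be tried with Python's built-in `sqlite3` module. This is a minimal sketch; the `customer` table and its contents are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The columns are the predefined categories every row must fit.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Each row is one unique instance of data for those categories.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "An", "Hanoi"), (2, "Binh", "Hue")],
)

rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY id", ("Hue",)
).fetchall()
conn.close()

print(rows)
```

Inserting a row that lacks a `city` value would fail, which is the schema enforcing the predefined categories.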
3. SQL vs NoSQL (…)
Key-value stores. As the name implies, a
key-value store is a system that stores
values indexed for retrieval by keys.
Some of the market
leaders:
Riak
Amazon Dynamo
Voldemort
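The key-value model can be sketched with a plain in-memory dictionary. This toy class is not the API of any of the products listed; the class name, method names, and sample key are illustrative. The store treats values as opaque bytes and can only look them up by key.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: values are opaque, lookup is by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Store or overwrite the value for a key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for a key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; report whether it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42", b'{"name": "An"}')
found = store.get("user:42")
missing = store.get("user:99")
print(found, missing)
```

There is no way to ask "which users are named An" without scanning every value, which is the trade-off this model makes for simple, fast lookups.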
3. SQL vs NoSQL (…)
Column-oriented databases. Column-oriented
databases contain one extendable
column of closely related data.
Some of the market
leaders:
HBase
Cassandra
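A rough way to picture this model is a row key that maps to column families, each holding an open-ended set of columns. The sketch below is a toy in-memory structure loosely inspired by that layout; it is not the HBase or Cassandra API, and the class, family, and column names are hypothetical.

```python
from typing import Any, Dict, Iterable


class ColumnFamilyTable:
    """Toy wide-column table: row key -> column family -> column -> value."""

    def __init__(self, families: Iterable[str]) -> None:
        self.families = set(families)  # families are fixed up front
        self._rows: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def put(self, row_key: str, family: str, column: str, value: Any) -> None:
        """Write one cell; new columns can be added to a family at any time."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get_family(self, row_key: str, family: str) -> Dict[str, Any]:
        """Read all closely related columns of one family for one row."""
        return dict(self._rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyTable(families=["profile", "activity"])
table.put("user:1", "profile", "name", "An")
table.put("user:1", "profile", "city", "Hanoi")
table.put("user:2", "profile", "name", "Binh")  # this row has no "city" column

user1_profile = table.get_family("user:1", "profile")
user2_profile = table.get_family("user:2", "profile")
print(user1_profile, user2_profile)
```

Rows in the same table can carry different columns, and reading one family touches only the related data stored together.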
3. SQL vs NoSQL (…)
Document-based stores. These databases
store and organize data as collections of
documents, rather than as structured tables
with uniform-sized fields for each record.
Some of the
market
leaders:
MongoDB
CouchDB
SimpleDB
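The document model can be illustrated with plain Python dictionaries standing in for documents. This sketch does not use MongoDB, CouchDB, or SimpleDB; the collection contents and the `find` helper are hypothetical. Note that the two documents do not share the same fields.

```python
from typing import Any, Dict, List

# A "collection" is a list of documents; each document has its own shape.
collection: List[Dict[str, Any]] = [
    {"_id": 1, "name": "An", "email": "an@example.com"},
    {"_id": 2, "name": "Binh", "tags": ["nosql", "hadoop"], "address": {"city": "Hue"}},
]


def find(docs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


by_name = find(collection, name="Binh")
by_email = find(collection, email="an@example.com")
print(by_name, by_email)
```

A document without the queried field simply does not match, so no schema change is needed when new fields appear.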
3. SQL vs NoSQL (…)
SQL 2008 Data
storage capacity
3. SQL vs NoSQL (…)
GridFS stores files in two
collections:
• chunks stores the binary chunks. For
details, see The chunks Collection.
• files stores the file’s metadata. For
details, see The files Collection.
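The split into two collections can be sketched without a database. The code below only imitates the shape of the documents using plain dictionaries and does not call MongoDB or any driver; the field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `filename`) follow the GridFS document layout, while the tiny chunk size and the helper names are assumptions chosen to keep the example readable.

```python
from typing import Any, Dict, List, Tuple

CHUNK_SIZE = 4  # hypothetical tiny value; real deployments use far larger chunks


def split_into_gridfs_docs(
    file_id: int, filename: str, payload: bytes, chunk_size: int = CHUNK_SIZE
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build one 'files' document and the ordered 'chunks' documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(payload),   # total size in bytes
        "chunkSize": chunk_size,  # size of every chunk except possibly the last
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": payload[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(payload), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs: List[Dict[str, Any]]) -> bytes:
    """Rebuild the file by concatenating chunks in sequence order."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


files_doc, chunk_docs = split_into_gridfs_docs(1, "hello.txt", b"hello world")
restored = reassemble(chunk_docs)
print(files_doc, len(chunk_docs), restored)
```

Each chunk points back to its file through `files_id` and carries its position in `n`, so a reader can fetch only the chunks it needs.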
3. SQL vs NoSQL (…)
BSON Types
The chunks Collection
The files Collection
3. SQL vs NoSQL (…)
4. Big Data Security
• Secure computations in distributed
programming frameworks
• Security best practices for non-relational
data stores
• Secure data storage and transactions logs
• Cryptographically enforced access control
and secure communication
• Granular access control
• Real-time security/compliance monitoring
4. Big Data Security (…)
Technical Recommendations for
security
• Use Kerberos for node authentication
• Use file layer encryption
• Data anonymization
• Use key management
• Deployment validation
• Use secure communication
• Tokenization
• Cloud database controls
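Two of the recommendations above, tokenization and data anonymization, can be sketched with the Python standard library. This is an illustration under stated assumptions, not a production design: the `TokenVault` class, the `pseudonymize` helper, the demo key, and the sample values are all hypothetical, and a real vault would live in a hardened, access-controlled service.

```python
import hashlib
import hmac
import secrets
from typing import Dict


class TokenVault:
    """Tokenization: swap a sensitive value for a random token kept in a vault."""

    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}
        self._by_value: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        """Return the token for a value, creating a random one on first use."""
        if value not in self._by_value:
            token = secrets.token_hex(8)  # random, so it reveals nothing about value
            self._by_value[value] = token
            self._by_token[token] = value
        return self._by_value[value]

    def detokenize(self, token: str) -> str:
        """Recover the original value; only the vault can do this."""
        return self._by_token[token]


def pseudonymize(value: str, key: bytes) -> str:
    """Anonymization by keyed hash: stable alias, not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


vault = TokenVault()
card = "4111111111111111"  # dummy test number, not a real card
token = vault.tokenize(card)

key = b"demo-key-not-for-production"  # assumption: real keys come from key management
alias = pseudonymize("an@example.com", key)
print(token, alias)
```

Tokens can be reversed by whoever holds the vault, while the keyed hash gives a consistent alias for joins and analytics without exposing the original value.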
5. Big data trends
• Big data – of the people, by the
people, for the people
• Big data and social computing
• Cloud computing
• In-memory computing
• Mobile Applications and HTML5
• Internet and big data
6. Demo with MongoDB & Ref docs
Ref docs:
• Judith Hurwitz, Alan Nugent, Dr. Fern Halper,
and Marcia Kaufman: Big Data For Dummies.
John Wiley & Sons, Inc. 2013.
• “Technology Trends for 2013” prepared by
Kaushal Amin, Chief Technology Officer, KMS
Technology – Atlanta, GA, USA
• Website: http://hadoop.apache.org/
Demo with MongoDB