Big Data – Srinath & Arjun
Agenda
• The BIG-DATA
• Hadoop
• Hadoop Components
• Hadoop Ecosystem
The BIG-DATA
The Context
• Man on the moon with 32KB (1969); my laptop had 2GB RAM (2009)
• Google collects 270PB data in a month (2007), 20000PB a day (2008)
• 2010 census data is expected to be a huge gold mine of information
• Data mining huge amounts of data collected in a wide range of domains
from astronomy to healthcare has become essential for planning and
performance.
The Context
• We are in a knowledge economy.
– Data is an important asset to any organization
– Discovery of knowledge; enabling discovery; annotation of data
• We are looking at newer
– programming models, and
– supporting algorithms and data structures.
The Myths about Big Data
• Big Data is new
• Big Data is only about massive data volume
• Big Data means Hadoop
• Big Data needs a data warehouse
• Big Data means unstructured data
• Big Data is for social media and data mining analyses
Big Data is…
It is all about better analytics on a broader spectrum of data, and
therefore represents an opportunity to create even more differentiation
among industries.
Where is the data coming from?
• 12+ TB of tweet data every day
• 25+ TB of log data every day
• ? TB of data every day
• 2+ billion people on the Web by end of 2011
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones worldwide
• 100s of millions of GPS-enabled devices sold annually
• 76 million smart meters in 2009, 200M by 2014
Example of Big Data Generation
Facebook
• 4.5 billion Facebook likes every day
• 350 million photos uploaded daily
• 250 billion photos stored by Facebook
• 10 billion messages sent every day
• 1 trillion posts in Facebook’s graph search database
• 500 TB of data processed daily
• 100 PB of data stored in Facebook’s Hadoop disk cluster (1 PB = 1,000 TB = 1,000,000 GB)
Example of Big Data Generation
Flights
• One Boeing plane engine generates 20 TB of data for every hour of flying
• How much data do all the flights in the world generate every year if
there are 100,000 two-engine flights daily?
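The flight question can be worked through as a rough calculation. The slide gives no flight duration, so the sketch below treats the average hours per flight as an explicit assumption (the parameter `avg_flight_hours` is hypothetical). It uses decimal units, as on the Facebook slide (1 PB = 1,000 TB).

```python
# Rough estimate for the flight-data question on the slide.
TB_PER_ENGINE_HOUR = 20      # from the slide
ENGINES_PER_FLIGHT = 2       # from the slide
FLIGHTS_PER_DAY = 100_000    # from the slide
DAYS_PER_YEAR = 365


def yearly_flight_data_tb(avg_flight_hours):
    """Return terabytes generated per year by all flights.

    avg_flight_hours is an assumption: the slide does not state it.
    """
    per_flight_tb = TB_PER_ENGINE_HOUR * ENGINES_PER_FLIGHT * avg_flight_hours
    return per_flight_tb * FLIGHTS_PER_DAY * DAYS_PER_YEAR


def tb_to_zb(tb):
    """Convert TB to ZB using decimal units (1 ZB = 10**9 TB)."""
    return tb / 1e9


if __name__ == "__main__":
    for hours in (1, 2, 5):
        total_tb = yearly_flight_data_tb(hours)
        print(f"{hours} h/flight: {total_tb:,.0f} TB/year = {tb_to_zb(total_tb):.2f} ZB/year")
```

Under the one-hour assumption this comes to 1.46 billion TB, or about 1.46 ZB per year. The total scales linearly with the assumed flight duration.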
What comes under Big Data?
• Black Box Data
• Social Media Data
• Stock Exchange Data
• Power Grid Data
• Transport Data
• Search Engine Data
Big Data Challenges
• Capturing Data
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation
Characteristics of Big Data
• Volume: 12+ terabytes of Tweets created daily.
• Variety: 100s of different types of data.
• Veracity: only 1 in 3 decision makers trust their information.
• Velocity: 5+ million trade events per second.
Types of Data
• Structured data: Relational Data
• Semi-structured data: XML data
• Unstructured data: Word, PDF, Text, Media Logs
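To make the three categories concrete, here is a minimal sketch that handles the same fact in each form. The table, XML tags, and sentence are invented for illustration. `sqlite3` stands in for any relational store.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured: rows with a fixed schema in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Asha')")
structured_name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
conn.close()

# Semi-structured: self-describing tags, but no fixed table schema.
xml_doc = "<user id='1'><name>Asha</name><tag>pilot</tag></user>"
semi_structured_name = ET.fromstring(xml_doc).findtext("name")

# Unstructured: free text; any structure has to be inferred by the reader.
text = "Asha logged in at noon and uploaded three photos."
unstructured_words = text.split()

print(structured_name, semi_structured_name, len(unstructured_words))
```

The structured and semi-structured forms can be queried by field name. The free text can only be split into words unless further analysis is applied.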
The Data Explosion
• 2.5 quintillion bytes of data created each day
• 90 % of data in the world was created in the last two years
Hadoop
Hadoop
• Open Source Software Framework
• Inspired by Google’s MapReduce programming model and the Google File System (GFS)
• Originally written for the Nutch search engine project
• Written in Java
• Efficiently processes large volumes of data
• Breaks up Big Data into multiple parts
• Two key parts
– HDFS
– MapReduce
History of Hadoop
Hadoop Architecture
Hadoop Components
HDFS – Hadoop Distributed File System
• A file system designed for storing very large files, running on a cluster of
commodity hardware
• Highly fault-tolerant, distributed, reliable, scalable file system for data
storage
• Splits files into blocks (default 64 MB) and stores multiple copies of each block on different nodes (default replication factor 3)
• Typically has a single NameNode and a number of DataNodes that form the HDFS
cluster
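The storage idea can be sketched in a few lines: a file is cut into fixed-size blocks and each block is copied to several DataNodes. This is a toy model, assuming the 64 MB block size and a replication factor of 3. The round-robin placement and the node names are hypothetical and do not reproduce HDFS's real rack-aware policy.

```python
BLOCK_SIZE_MB = 64   # default block size mentioned on the slide
REPLICATION = 3      # assumption: HDFS's usual default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Illustrative only: real HDFS placement takes racks and load into account.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(200)  # a 200 MB file
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    print(blocks)
    print(place_replicas(len(blocks), nodes))
```

In this model the NameNode's role corresponds to holding the `placement` mapping, while the DataNodes hold the block contents.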
HDFS Architecture
• Two types of nodes
– Master or NameNode
– Slave or DataNode
HDFS Architecture
Read a File
Write a File
Hadoop Cluster Modes
• Standalone Mode
• Pseudo-Distributed Mode
• Fully-Distributed Mode
MapReduce
Programming Model designed for processing large volumes of data in
parallel by dividing the work into a set of independent tasks
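The model can be illustrated with the classic word count, written here as plain Python rather than against the Hadoop API. The function names are chosen for this sketch. In a real job the map and reduce calls would run as separate tasks on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped independently, so these calls could run in parallel.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

Because no map call depends on another, and no reduce call depends on another key's values, the work divides into independent tasks as described above.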
Terminology
• Job
• Task
• Task Attempt
• NameNode
• MasterNode
• SlaveNode
• Clusters
• Commodity Hardware
Components
• Master Nodes
• Slave Nodes
Workflow
Example
Closer Look
Input Formats
• Text input format (TextInputFormat)
• Sequence file input format (SequenceFileInputFormat)
• Key-value text input format (KeyValueTextInputFormat)
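The two text-based formats differ in what they hand to the mapper as key and value. The sketch below imitates that behaviour in plain Python; it does not call Hadoop. It assumes the usual defaults: the text format keys each line by its byte offset, and the key-value format splits each line at the first tab. Sequence files are binary and are not covered.

```python
def text_input_records(data):
    """Mimic the text input format: key = byte offset of the line, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_text_records(data, sep="\t"):
    """Mimic the key-value text format: split each line at the first separator."""
    for _, line in text_input_records(data):
        key, _, value = line.partition(sep)
        yield key, value


if __name__ == "__main__":
    sample = b"a\t1\nbb\t2\n"   # hypothetical input split
    print(list(text_input_records(sample)))
    print(list(key_value_text_records(sample)))
```

A line with no separator yields the whole line as the key and an empty value in this sketch.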
NoSQL
• NoSQL means “not only SQL”
• This includes key-value stores, document-oriented databases, graph
databases, Bigtable-style structures, and caching data stores
• E.g. MongoDB, Cassandra
Hadoop Ecosystem
What is Hive?
• Data warehousing infrastructure
• Data summarization, ad-hoc querying and analysis of large volumes of data
HiveQL
• HiveQL is the Hive query language.
• Early versions of Hive don’t support transactions; limited ACID support arrived in later releases.
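HiveQL is SQL-like, so a summarization query can be sketched without a Hive installation. The example below runs a plain `GROUP BY` through Python's built-in `sqlite3` as a stand-in engine. The table `access_log` and its columns are hypothetical. The statement is ordinary SQL of a kind that HiveQL is also expected to accept, but it is not run against Hive here.

```python
import sqlite3

# Stand-in engine: sqlite3 from the standard library, not Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (url TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [("/home", 200), ("/home", 200), ("/cart", 500), ("/cart", 200)],
)

# A summarization query of the kind HiveQL is used for (hypothetical table and columns).
query = """
    SELECT url, COUNT(*) AS hits
    FROM access_log
    GROUP BY url
    ORDER BY url
"""
summary = conn.execute(query).fetchall()
conn.close()
print(summary)
```

In Hive, a query like this would be compiled into jobs that scan files in HDFS rather than rows in a local database.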
Hive Applications
• Log processing
• Text mining
• Document indexing
• Customer-facing business intelligence (e.g. Google Analytics)
• Predictive modelling, hypothesis testing
Thank You….

Big data, Hadoop and Hive
