Introduction of Big Data
Submitted by: Harshit
BIG DATA
 Big data is ordinary data, but in such huge amounts that it gets its own name.
 Big data is a term for data sets that are so large or complex that
traditional data processing application software is inadequate to deal with
them.
 Challenges include capture, storage, analysis, search, sharing, transfer,
visualization, querying, updating, and information privacy.
Volume of Data
Big Data: Outcomes and Data Sources
Need for Big Data
 Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
 A typical large stock exchange captures more than 1 TB of data every day.
 There are around 5 billion mobile phones in the world, including 1.75 billion
smartphones.
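The unit conversion behind the first figure can be checked with a short sketch. It assumes decimal (SI) units, where 1 GB is 10^9 bytes and 1 EB is 10^18 bytes; the function names are illustrative and do not come from the slides.

```python
# Decimal (SI) storage units, assumed here: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_EB = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def exabytes_to_gigabytes(exabytes: float) -> float:
    """Convert a size in exabytes to gigabytes."""
    return exabytes * BYTES_PER_EB / BYTES_PER_GB


def daily_exabytes_to_tb_per_second(exabytes_per_day: float) -> float:
    """Average rate in terabytes per second for a given daily volume."""
    return exabytes_per_day * BYTES_PER_EB / BYTES_PER_TB / SECONDS_PER_DAY


if __name__ == "__main__":
    print(exabytes_to_gigabytes(2.5))            # gigabytes generated per day
    print(daily_exabytes_to_tb_per_second(2.5))  # average terabytes per second
```

Under these assumptions, 2.5 exabytes per day works out to 2.5 billion gigabytes, or roughly 29 TB every second on average.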
The 4 V's by IBM
 Volume: the size of the data.
 Velocity: the rate at which data is generated and analyzed.
 Variety: the types of data, such as .jpg, .mp4, .txt, and .xml files.
 Veracity: how precise the data is, and how much uncertainty it carries.
Types of Big data
 Structured Data
 Unstructured Data
 Semi-structured Data
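The three types can be illustrated with a small sketch. The sample records and function names below are hypothetical and only show how each kind of data is typically read: structured data has a fixed set of columns, semi-structured data describes its own fields, and unstructured data has no predefined fields at all.

```python
import csv
import io
import json


def parse_structured(text: str) -> list[dict]:
    """Structured data: every row follows the same fixed columns (CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_semi_structured(text: str) -> dict:
    """Semi-structured data: self-describing fields that may vary (JSON here)."""
    return json.loads(text)


def tokenize_unstructured(text: str) -> list[str]:
    """Unstructured data: free text with no schema; split into words."""
    return text.lower().split()


if __name__ == "__main__":
    # Hypothetical sample data, made up for illustration.
    table = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
    document = '{"id": 3, "name": "Meera", "tags": ["new", "mobile"]}'
    note = "Customer says the app crashed after the update"

    print(parse_structured(table))
    print(parse_semi_structured(document))
    print(tokenize_unstructured(note))
```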
Solution
Hadoop
 Hadoop is an open-source, Java-based programming framework that supports
the storage and processing of extremely large data sets in a distributed
computing environment. It is an Apache project sponsored by the Apache
Software Foundation.
 The core of Apache Hadoop consists of a storage part, the Hadoop
Distributed File System (HDFS), and a processing part, the MapReduce
programming model.
Comparison of RDBMS and HDFS
Who uses Hadoop
HIVE
 Hive is a data warehouse infrastructure tool for processing structured and
semi-structured data in Hadoop. It sits on top of Hadoop to summarize big
data, and makes querying and analysis easy.
 Hive was initially developed by Facebook. The Apache Software Foundation
later took it up and developed it further as open source under the name
Apache Hive. It is used by several companies, for example Amazon.
 Hive is not a relational database.
Hive Architecture and Its Components
MapReduce
 MapReduce is a programming model suited to processing huge amounts of data.
Hadoop can run MapReduce programs written in various languages, including
Java, Ruby, Python, and C++. MapReduce programs are parallel by nature, which
makes them well suited to large-scale data analysis across multiple machines
in a cluster.
 MapReduce works in two phases:
1. Map phase
2. Reduce phase
Working of MapReduce
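The two phases can be sketched with a word count written in plain Python. This is a single-process illustration of the programming model, not Hadoop code: the function names are made up for this example, and the grouping step between the phases is done here with a dictionary, whereas Hadoop performs it across the cluster.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    """Run map, then grouping, then reduce over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is growing"]))
```

Each call to `map_phase` depends only on its own line, and each call to `reduce_phase` depends only on its own key, which is why the work can be spread over many machines.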
Conclusion
 Data is growing day by day, and big data technology is the way to manage
such a huge amount of data.
 Big data software:
 Apache Hadoop
 Hive
 MapReduce
 Scala
 Spark, etc.
Thank you
