INTRODUCTION OF BIG DATA

Submitted by: Harshit

BIG DATA
• Big data is ordinary data, but in very large amounts.
• Big data is a term for data sets that are so large or complex that
traditional data-processing software is inadequate to deal with them.
• Challenges include capture, storage, analysis, search, sharing, transfer,
visualization, querying, updating, and information privacy.

Big Data: Outcomes and Data Sources

Need for Big Data
• Over 2.5 exabytes (2.5 billion gigabytes) of data are generated every day.
• A typical large stock exchange captures more than 1 TB of data every day.
• There are around 5 billion mobile phones (including 1.75 billion
smartphones) in the world.
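
As a quick check of the first figure above, the conversion from exabytes to
gigabytes can be sketched as follows. This assumes decimal (SI) units, where
1 EB = 10^18 bytes and 1 GB = 10^9 bytes; the variable names are illustrative
only.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 EB = 10**18 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

daily_exabytes = 2.5  # figure quoted in the slide

# Convert exabytes per day to gigabytes per day.
daily_gigabytes = daily_exabytes * BYTES_PER_EB / BYTES_PER_GB

print(f"{daily_gigabytes:,.0f} GB per day")
```

Under these assumptions the result is 2.5 billion gigabytes, which matches the
figure in parentheses.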

The 4 V's by IBM
• Volume: the size of the data.
• Velocity: the rate at which data is generated and analyzed.
• Variety: the types of data, such as .jpg, .mp4, .txt, and .xml.
• Veracity: how precise the data is and how much uncertainty it carries.

Types of Big Data
• Structured Data
• Unstructured Data
• Semi-structured Data
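
The three types can be illustrated with a small sketch. The sample records
below are made up for illustration and are not from the slides; CSV and JSON
are used here only as common examples of structured and semi-structured
formats.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema (CSV here).
structured = io.StringIO("id,name,amount\n1,alice,250\n2,bob,120\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, but fields may vary per record (JSON).
semi_structured = json.loads('{"id": 3, "name": "carol", "tags": ["new", "vip"]}')

# Unstructured: free text with no predefined fields.
unstructured = "Carol called to say the delivery arrived late but intact."

print(rows[0]["name"])          # field looked up by column name
print(semi_structured["tags"])  # nested list inside the record
print(unstructured.split()[:3]) # only raw tokens are available
```

Structured data can be queried by column, semi-structured data by key, while
unstructured data usually needs extra processing before it can be analyzed.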

Hadoop
• Hadoop is an open-source, Java-based programming framework that supports
the processing and storage of extremely large data sets in a distributed
computing environment. It is part of the Apache project sponsored by the
Apache Software Foundation.
• The core of Apache Hadoop consists of a storage part, known as the Hadoop
Distributed File System (HDFS), and a processing part, which is the MapReduce
programming model.

Hive
• Hive is a data warehouse infrastructure tool for processing structured and
semi-structured data in Hadoop. It resides on top of Hadoop to summarize Big
Data and makes querying and analyzing easy.
• Hive was initially developed by Facebook; the Apache Software Foundation
later took it up and developed it further as open source under the name
Apache Hive. It is used by different companies, Amazon among them.
• Hive is not a relational database.

Hive Architecture and Its Components

MapReduce
• MapReduce is a programming model suited to processing huge amounts of data.
Hadoop is capable of running MapReduce programs written in various
languages: Java, Ruby, Python, and C++. MapReduce programs are parallel in
nature, and are therefore very useful for performing large-scale data
analysis using multiple machines in the cluster.
• MapReduce works in two phases:
1. Map phase
2. Reduce phase
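
The two phases can be sketched with a word count, a common teaching example.
This is a minimal single-process illustration in plain Python, not Hadoop
code: the function names (map_phase, shuffle, reduce_phase, word_count) are
hypothetical, and the grouping step in between stands in for the work a real
framework does when it moves data between machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line could be mapped on a different machine; here it runs in a loop.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["big data is big", "data is growing"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to
reduce_phase only on its own key, both phases can be spread across many
machines in a cluster.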

Conclusion
• Data is growing day by day, and the way to manage such a huge amount of
data is with Big Data technologies.
• Big data software:
  • Apache Hadoop
  • Hive
  • MapReduce
  • Scala
  • Spark, etc.
