Apache Hadoop
By Ashwin Kumar R
What is Big Data?
Big Data is a collection of data that is huge in volume and grows
exponentially with time. Its size and complexity are so great that no
traditional data management tool can store or process it efficiently.
Why do we need Big Data analytics?
Big Data analytics helps organizations harness their data and use it
to identify new opportunities. That, in turn, leads to smarter business
moves, more efficient operations, higher profits, and happier
customers.
What is Hadoop?
Hadoop is an open-source software framework for storing data and running
applications on clusters of commodity hardware. It provides massive storage
for any kind of data, enormous processing power, and the ability to handle
virtually limitless concurrent tasks or jobs.
What is Hadoop used for?
Apache Hadoop is an open-source framework used to efficiently store
and process large datasets, ranging in size from gigabytes to petabytes.
Instead of using one large computer to store and process the data,
Hadoop clusters multiple computers so they can analyze massive datasets
in parallel, more quickly.
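The parallel model Hadoop popularized for this is MapReduce: each machine maps over its own portion of the data independently, and the partial results are then combined in a reduce step. A toy word-count sketch in plain Python can show the idea (the block contents and function names here are illustrative, not Hadoop's actual API):

```python
from collections import defaultdict

# Hypothetical text blocks standing in for file chunks spread across a cluster.
blocks = [
    "big data needs big storage",
    "hadoop stores big data",
]

def map_phase(block):
    """Emit (word, 1) pairs for one block, as a mapper would."""
    return [(word, 1) for word in block.split()]

def reduce_phase(pairs):
    """Sum the counts per word, as reducers would after the shuffle."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Each block is mapped independently; on a real cluster these calls
# run in parallel on the machines that already hold the data.
all_pairs = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(all_pairs)
print(counts["big"])  # 3
```

Because each mapper only ever sees its own block, adding machines adds capacity without the blocks needing to coordinate with one another.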
What is a DFS?
A Distributed File System (DFS), as the name suggests, is a file system
whose data is spread across multiple file servers or locations.
It lets programs access and store remote files as if they were local,
so users can reach their files from any computer on the network.
What is HDFS?
HDFS is designed to reliably store very large files across machines in a large
cluster. It stores each file as a sequence of blocks; all blocks in a file except
the last block are the same size.
The blocks of a file are replicated for fault tolerance. The block size and
replication factor are configurable per file.
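The splitting and replication described above can be sketched in a few lines of Python. This is only an illustration of the scheme, not Hadoop's API; the tiny block size, node names, and round-robin placement policy are made up for the example (real HDFS defaults to 128 MB blocks and a replication factor of 3, and places replicas rack-aware):

```python
BLOCK_SIZE = 4    # toy value; real HDFS defaults to 128 MB
REPLICATION = 3   # default HDFS replication factor

data = b"0123456789abcdef!"  # 17 bytes -> 5 blocks, the last one smaller

# Split the file into fixed-size blocks; only the last may be short.
blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

nodes = ["node1", "node2", "node3", "node4", "node5"]

def place(block_index):
    """Pick REPLICATION distinct nodes for a block (toy round-robin policy)."""
    return [nodes[(block_index + r) % len(nodes)] for r in range(REPLICATION)]

placement = {i: place(i) for i in range(len(blocks))}
print(len(blocks))      # 5
print(len(blocks[-1]))  # 1
```

Because every block lives on several nodes, losing one machine never loses data: the remaining replicas serve reads while the system re-replicates onto a healthy node.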
Advantages
● Scalable. Hadoop is a highly scalable storage platform: it can store
and distribute very large data sets across hundreds of inexpensive servers
that operate in parallel.
● Cost-effective. Hadoop offers an affordable storage solution for
businesses' exploding data sets, built on commodity hardware.
● Flexible. It can store structured, semi-structured, and unstructured data.
● Fast. Processing runs on the nodes where the data already resides,
minimizing data movement.
● Resilient to failure. Data is replicated across nodes, so processing
continues even when individual machines fail.
Disadvantages
● Security concerns. Security features are disabled by default and are
complex to configure.
● Vulnerable by nature. The framework is written largely in Java, which is
a frequent target of attacks.
● Not fit for small data. HDFS is inefficient for many small files, since
each file's metadata must be held in NameNode memory and its design favors
large sequential reads.
● Potential stability issues. As actively developed open-source software,
some releases have had stability problems.
● General limitations. The batch-oriented MapReduce model is a poor fit
for low-latency or interactive workloads.
Thank you !
