Hadoop Training in Bangalore
What is Hadoop?
• The Apache Hadoop software library is a
framework that allows for the distributed
processing of large data sets across clusters
of computers using simple programming
models.
• It is developed by the Apache Software Foundation;
version 1.0 was released in 2011.
• It is written in Java.
Hadoop is open-source software. It provides:
• A framework
• Massive storage
• Processing power
Big Data
• Big data is a term used to describe the very large amounts of unstructured and
semi-structured data that a company creates.
• The term is typically used when talking about petabytes and exabytes of data.
• Loading that much data into a relational database for analysis would take too
much time and cost too much.
• Facebook has almost 10 billion photos, taking up roughly 1 petabyte of storage.
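As a rough sanity check on the last figure, dividing the quoted storage by the quoted photo count gives the average size of one photo. This assumes decimal units (1 PB = 10^15 bytes) and treats both quoted numbers as exact:

```latex
\frac{1\ \text{PB}}{10 \times 10^{9}\ \text{photos}}
  = \frac{10^{15}\ \text{bytes}}{10^{10}\ \text{photos}}
  = 10^{5}\ \text{bytes per photo}
  \approx 100\ \text{KB per photo}
```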
So what is the problem?
1. Processing that much data in a relational database is very difficult.
2. It would take too much time and cost too much to process the data.
We can solve this problem with distributed
computing.
But distributed computing has its own problems:
Hardware failure
With many machines, there is always a chance that some hardware will fail.
Combining the data after analysis
The results from all the disks have to be combined, which is messy.
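The hardware-failure point can be quantified with a simple model. Assume, purely for illustration, that each machine fails independently on a given day with probability p. Then the chance that at least one of n machines fails is 1 - (1 - p)^n, which grows quickly with cluster size. The value p = 0.001 below is a hypothetical number, not a measured failure rate:

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent machines fails,
    given a per-machine failure probability p (an assumed, illustrative value)."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.001  # hypothetical: 0.1% chance that a given machine fails on a given day
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} machines: {prob_any_failure(p, n):.3f}")
```

Under these assumptions, a single machine almost never fails on a given day. With 1,000 machines, though, the chance that at least one fails is roughly 63%. This is why a cluster framework has to treat failure as a normal event.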
Hadoop was created to solve these problems.
It has two main parts:
1. Hadoop Distributed File System (HDFS)
2. Data Processing Framework (MapReduce)
1. Hadoop Distributed File System
It ties many small, reasonably priced machines together into a single cost-effective computer
cluster.
Data and application processing are protected against hardware failure.
If a node goes down, jobs are automatically redirected to other nodes to make sure the distributed
computation does not fail.
It automatically stores multiple copies of all data (by default, three copies of each block).
It provides a simplified programming model that allows users to quickly read from and write to the
distributed file system.
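The replication idea can be sketched as a toy model in plain Python. This is not the HDFS API. The node names, the tiny block size, and the round-robin placement are all hypothetical; real HDFS defaults to 128 MB blocks and uses rack-aware placement. The sketch splits a file into fixed-size blocks and stores each block on three different nodes. It then shows that the file can still be read after a node goes down:

```python
from typing import AbstractSet, Dict, List

BLOCK_SIZE = 4   # bytes per block; tiny on purpose (HDFS defaults to 128 MB)
REPLICATION = 3  # copies of each block, mirroring the HDFS default


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: List[bytes], nodes: List[str],
                 replication: int = REPLICATION) -> Dict[str, Dict[int, bytes]]:
    """Store each block on `replication` distinct nodes (simple round-robin)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage: Dict[str, Dict[int, bytes]] = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replication):
            node = nodes[(block_id + r) % len(nodes)]
            storage[node][block_id] = block
    return storage


def read_file(storage: Dict[str, Dict[int, bytes]], num_blocks: int,
              dead_nodes: AbstractSet[str] = frozenset()) -> bytes:
    """Reassemble the file, reading each block from any live node that holds it."""
    parts = []
    for block_id in range(num_blocks):
        replica = next((stored[block_id] for node, stored in storage.items()
                        if node not in dead_nodes and block_id in stored), None)
        if replica is None:
            raise OSError(f"block {block_id} lost: all replicas are on dead nodes")
        parts.append(replica)
    return b"".join(parts)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
    data = b"hello distributed world"
    blocks = split_into_blocks(data)
    storage = place_blocks(blocks, nodes)
    # With 3 replicas per block, losing any single node loses no data.
    print(read_file(storage, len(blocks), dead_nodes={"node2"}) == data)
```

Three replicas on four nodes means every block survives the loss of any two nodes. Data is lost only if all three nodes holding a block go down together.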
2. MapReduce
MapReduce is a programming model for processing and generating large data sets with a
parallel, distributed algorithm on a cluster.
The term also refers to the associated implementation for processing and generating large data sets.
The MAP function processes a key/value pair to generate a set of intermediate key/value pairs.
The REDUCE function merges all intermediate values associated with the same intermediate
key.
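The MAP and REDUCE functions can be illustrated with the classic word-count example. This is a single-process Python sketch of the programming model only: map, then group values by key (the "shuffle"), then reduce. It does not use Hadoop's Java API. The function names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """MAP: take one (key, value) pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """REDUCE: merge all intermediate values that share the same key."""
    return word, sum(counts)


def run_mapreduce(documents: Dict[str, str]) -> Dict[str, int]:
    # Map phase: on a real cluster, different input splits could be mapped on different nodes.
    intermediate: List[Tuple[str, int]] = []
    for doc_id, text in documents.items():
        intermediate.extend(map_fn(doc_id, text))

    # Shuffle phase: group all intermediate values by their key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    docs = {"d1": "big data needs big clusters", "d2": "hadoop processes big data"}
    print(run_mapreduce(docs))
```

Here "big" appears twice in the first document and once in the second. The map phase emits three ("big", 1) pairs, the shuffle groups them under one key, and the reduce phase sums them to 3.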
Pros of Hadoop
1. Computing power
2. Flexibility
3. Fault Tolerance
4. Low Cost
5. Scalability
Cons of Hadoop
1. Integration with existing systems
Hadoop is not optimised for ease of use. Installing it and integrating it with existing
databases can prove difficult, especially since no software support is
provided.
2. Administration and ease of use
Hadoop requires knowledge of MapReduce, while most data practitioners use SQL. This
means significant training may be required to administer Hadoop clusters.
3. Security
Hadoop lacks the level of security functionality needed for safe enterprise deployment,
especially if it concerns sensitive data.
https://www.traininginbangalore.com/hadoop-training-in-bangalore/