Why do I need Hadoop?
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune


Business analytics
Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
Problem: Too much data
Big Data!!
Velocity
• How fast data is being produced and how fast it must be processed to meet demand.
• Have a look through the analytics lens!
Variability
• Data flows can be highly inconsistent, with periodic peaks.
• Is something big trending on social media?
• Difference between Variety and Variability
Megabytes, Gigabytes…


• Terabyte: To put it in perspective, a terabyte could hold about 300 hours of good-quality video, or 1,000 copies of the Encyclopedia Britannica.
• Petabyte: A petabyte could hold 500 billion pages of standard printed text.
• Exabyte: It has been said that 5 exabytes would be equal to all of the words ever spoken by mankind.
Human-Generated Data and Machine-Generated Data

Challenges of Big Data
• Sheer size of Big Data
• Big Data is unstructured or semi-structured.
• There is no point in just storing big data if we can't process it.
Hadoop enables a computing solution that is:
• Scalable: New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written, or the applications on top.
• Cost effective: Hadoop brings massively parallel computing to commodity servers.
• Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources.
• Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.

Power of MapReduce


Course Content

Introduction
Hadoop: Basic Concepts
What is Hadoop?
The Hadoop Distributed File System
How Hadoop MapReduce Works
Anatomy of a Hadoop Cluster

Hadoop daemons
Master Daemons
Name node
Job Tracker
Secondary name node
Slave Daemons
Data node
Task tracker


HDFS ( Hadoop Distributed File System )



Blocks and Splits
Input Splits
HDFS Splits
Data Replication
Hadoop Rack Aware
Data high availability
Data Integrity
Cluster architecture and block placement
Accessing HDFS
JAVA Approach
CLI Approach
Programming Practices
Developing MapReduce Programs in
Local Mode
Running without HDFS and MapReduce
Pseudo-distributed Mode
Running all daemons in a single node
Fully distributed mode
Running daemons on dedicated nodes
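
To make the "Blocks and Splits" and "Data Replication" topics concrete, here is a minimal arithmetic sketch of how a file maps onto HDFS blocks. The function name is hypothetical. The 128 MB block size and replication factor of 3 are assumed defaults of recent Hadoop releases (older 1.x clusters commonly used 64 MB blocks), and both are configurable per cluster.

```python
def hdfs_block_usage(file_size_bytes, block_size_bytes=128 * 1024**2, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Assumed defaults: 128 MB blocks and 3 replicas; both are configurable.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Integer ceiling division: a partial final block still counts as a block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # The last block only occupies as much space as it holds,
    # so raw usage is the file size times the replication factor.
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes


blocks, raw = hdfs_block_usage(1024**3)  # a 1 GiB file
print(blocks, raw)
```

Under these assumptions a 1 GiB file is split into 8 blocks and consumes 3 GiB of raw disk across the cluster.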


Writing a MapReduce Program
Examining a Sample MapReduce Program
With several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
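
The driver, mapper and reducer roles listed above can be illustrated with a word count that runs in a single Python process. This is only a sketch of the data flow, not Hadoop's Java API or the Streaming API itself; the function names `mapper`, `reducer` and `run_job` are hypothetical, and the in-memory grouping stands in for the shuffle that Hadoop performs between the map and reduce phases.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce phase: sum all the counts that were grouped under one word."""
    return word, sum(counts)


def run_job(lines):
    """Driver: wire mapper and reducer together, simulating the shuffle."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    # Keys reach the reducers in sorted order, as they do in MapReduce.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


print(run_job(["big data", "Big Hadoop"]))
```

On a real cluster the mapper and reducer run on different nodes and the framework handles grouping, sorting and retries; only the two functions' logic carries over.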



Common MapReduce Algorithms
Sorting and Searching
Indexing
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Hands-On Exercise: Creating an Inverted Index
Identity Mapper
Identity Reducer
Exploring well known problems using MapReduce applications
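
As a rough preview of the inverted index exercise, the sketch below builds an index in plain Python using the same map, group and reduce steps. It is not the course's actual exercise code: the function names and the sample documents are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map phase: emit (word, doc_id) once per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_postings(word, doc_ids):
    """Reduce phase: collapse the document ids for one word into a sorted list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Simulate the shuffle in memory, then reduce each word's postings."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for word, emitted_id in map_document(doc_id, text):
            groups[word].append(emitted_id)
    return dict(reduce_postings(word, ids) for word, ids in sorted(groups.items()))


# Hypothetical sample documents.
index = build_inverted_index({"d1": "hadoop stores data", "d2": "hive queries data"})
print(index["data"])
```

Each word maps to the list of documents containing it, which is the structure a search engine consults to answer keyword queries.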


Debugging MapReduce Programs
Testing with MRUnit
Logging
Other Debugging Strategies



Advanced MapReduce Programming
A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats
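
The idea behind the secondary sort can be sketched in plain Python. MapReduce sorts keys but not the values within a key, so the usual approach is to move the value into a composite key, sort on the composite key, and group on the natural key only. The code below simulates that in one process; the function name and the sample records are hypothetical, and in Hadoop the same effect is achieved with a custom partitioner, sort comparator and grouping comparator.

```python
from itertools import groupby


def secondary_sort(records):
    """Group (natural_key, value) records by key with values in sorted order."""
    # Map phase: promote the value into a composite key (natural_key, value).
    composite = [((key, value), value) for key, value in records]
    # Shuffle phase: the framework sorts by the full composite key.
    composite.sort(key=lambda pair: pair[0])
    result = {}
    # Grouping step: compare on the natural key only, so each reduce call
    # sees every value for that key, already ordered.
    for natural_key, group in groupby(composite, key=lambda pair: pair[0][0]):
        result[natural_key] = [value for _, value in group]
    return result


# Hypothetical (year, temperature) readings.
readings = [("2012", 30), ("2011", 25), ("2012", 10), ("2011", 40)]
print(secondary_sort(readings))
```

The reducer never has to buffer and sort values itself, which matters when a single key has more values than fit in memory.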


Hadoop Ecosystem

HBase
HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi-node cluster
Create a database; develop and run sample applications
Access data stored in HBase using clients like Java, Python and Perl
HBase and Hive Integration
HBase admin tasks
Defining Schema and basic operation


Hive
Hive concepts
Hive architecture
Install and configure Hive on a cluster
Create a database and access it from a Java client
Buckets
Partitions
Joins in Hive
Inner joins
Outer joins
Hive UDF
Hive UDAF
Hive UDTF
Develop and run sample applications in Java/Python to access Hive


Pig
Pig basics
Install and configure Pig on a cluster
Pig vs. MapReduce and SQL
Pig vs. Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in Grunt shell
Programming in Eclipse
Running as Java program
Pig UDFs
Pig Macros



Flume
Flume concepts
Install and configure Flume on a cluster
Create a sample application to capture logs from Apache using Flume


Sqoop
Getting Sqoop
A Sample Import
Database Imports
Controlling the import
Imports and consistency
Direct-mode imports
Performing an Export
Contact Us
Address
MindScripts Technologies,
2nd Floor, Siddharth Hall,
Near Ranka Jewellers,
Behind HP Petrol Pump,
Karve Rd,
Pune 411004
Call
9595957557
8805674210
9764560238
9767427924
9881371828

Address
MindScripts Technologies,
C8, 2nd Floor, Sant Tukaram Complex ,
Pradhikaran, Above Savali Hotel,
Opp Nigdi Bus Stand,
Nigdi,
Pune - 411044

www.mindscripts.com
info@mindscripts.com

