Data analytics

Introduction to Hadoop:-
1. Hadoop is an open source, Java-based programming framework
that supports the processing and storage of extremely large data
sets in a distributed computing environment.
2. Hadoop makes it possible to run applications on systems with
thousands of commodity hardware nodes, and to handle thousands
of terabytes of data.
3. Its distributed file system facilitates rapid data transfer rates among
nodes and allows the system to continue operating in case of a node
failure.
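
A minimal sketch of why a distributed file system can keep operating after a node failure, assuming each data block is copied to several nodes. The function names, the node names, the round-robin placement and the replication factor of 3 are illustrative assumptions for this example, not Hadoop's actual placement policy or API.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes (wrapping around) hold the copies of this block.
        placement[block] = [nodes[(block + i) % len(nodes)]
                            for i in range(replication)]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one copy on a live node."""
    return [block for block, replicas in placement.items()
            if any(node not in failed for node in replicas)]


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(6, nodes)

# With one node down, every block is still readable from another copy.
survivors = readable_blocks(placement, failed={"node2"})
print(survivors)
```

Under these assumptions a block is lost only when every node holding one of its copies fails at the same time.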

Hadoop was created by computer scientists Doug Cutting
and Mike Cafarella in 2006 to support distribution for the
Nutch search engine.
It was inspired by Google's MapReduce, a software
framework in which an application is broken down into
numerous small parts that can be processed in parallel.
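
The idea of breaking a job into small parts can be sketched with a word count, the usual introductory MapReduce example. This is a single-process illustration in plain Python, not Hadoop's API; the function names and the sample input are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(chunks):
    """Run map on each chunk independently, then shuffle and reduce."""
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk could go to a different node
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


chunks = ["big data big cluster", "data node data"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` depends only on its own chunk, the chunks can be handled on separate machines and the results merged afterwards.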

After years of development within the open source community,
Hadoop 1.0 became publicly available in December 2011 as part of
the Apache project sponsored by the Apache Software Foundation.
By one widely cited estimate, 90% of the world’s data was generated in the last few years.

Apache Nutch:-
Apache Nutch is a highly extensible and scalable open
source web crawler software project.
Nutch is coded entirely in the Java programming language, but
data is written in language-independent formats. It has a
highly modular architecture, allowing developers to create
plug-ins for media-type parsing, data retrieval, querying and
clustering.

Big data is a collection of both structured and
unstructured data that is too large, too fast-moving and
too varied to be managed by traditional database
management tools or traditional data processing
applications.

Some Examples:-
1. Data managed by eBay for search requests,
consumer/customer recommendations,
current trends and merchandising.

Platform For Managing Big Data:-
1. Hadoop uses a simple programming model.
2. Hadoop can scale from single servers to
thousands of machines, each offering local
computation and storage.

Three Main Applications of Hadoop:-
• Advertisement (mining user behavior to generate
recommendations).
• Searches (grouping related documents).
• Security (searching for uncommon patterns).

Hadoop in the Wild:-
• Hadoop is in use at many organizations that handle big data:
o Yahoo!
o Facebook
o Amazon
o Netflix
o Etc…
• Some examples of scale:
o Yahoo!’s Search Webmap runs on a 10,000-core Linux cluster and powers Yahoo! Web
search
o Facebook’s Hadoop cluster hosted 100+ PB of data (July 2012) and was growing at ½ PB/day (Nov 2012)
