Hadoop Futures
What to watch

Tom White, Cloudera
Hadoop User Group UK, Bristol
10 August 2009
About me
▪   Apache Hadoop Committer, PMC Member, Apache Member
▪   Employed by Cloudera
▪   Author of “Hadoop: The Definitive Guide”
    ▪   http://hadoopbook.com
Goals
▪   Modular
    ▪   E.g. pluggable block placement algorithm
▪   Multiple languages
    ▪   E.g. not just Java for MapReduce
▪   Integration with other systems
    ▪   E.g. JMX monitoring hooks
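The "pluggable" goal can be pictured with a small sketch. Everything named below (BlockPlacementPolicy, RoundRobinPlacement, load) is hypothetical and is not Hadoop's actual API. It only illustrates the general pattern of picking an implementation by class name, as a configuration value might, so that the algorithm can be swapped without touching the calling code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PluggablePlacementSketch {

    /** Hypothetical plug-in point: picks the nodes that should hold a block's replicas. */
    public interface BlockPlacementPolicy {
        List<String> chooseTargets(List<String> liveNodes, int replicas);
    }

    /** One hypothetical implementation: hand out nodes in rotation. */
    public static class RoundRobinPlacement implements BlockPlacementPolicy {
        private int next = 0;

        @Override
        public List<String> chooseTargets(List<String> liveNodes, int replicas) {
            List<String> targets = new ArrayList<>();
            int count = Math.min(replicas, liveNodes.size());
            for (int i = 0; i < count; i++) {
                targets.add(liveNodes.get((next + i) % liveNodes.size()));
            }
            if (!liveNodes.isEmpty()) {
                next = (next + count) % liveNodes.size();
            }
            return targets;
        }
    }

    /** Instantiates the policy whose class name is given, e.g. read from configuration. */
    public static BlockPlacementPolicy load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (BlockPlacementPolicy) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load policy " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name stands in for a value that would come from a config file.
        BlockPlacementPolicy policy = load(RoundRobinPlacement.class.getName());
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        System.out.println(policy.chooseTargets(nodes, 3));
        System.out.println(policy.chooseTargets(nodes, 3));
    }
}
```

A different policy would be selected by passing a different class name to load, which is the point of making the algorithm pluggable.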
The Project Split
▪   Core -> Common, HDFS, MapReduce
▪   New repositories
▪   New mailing lists
    ▪   {common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org
▪   New directory layouts
▪   New configuration
    ▪   hadoop-site.xml -> {core,hdfs,mapred}-site.xml
▪   More information at
    ▪   http://www.cloudera.com/blog/2009/07/17/the-project-split/
    ▪   general@hadoop.apache.org
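The brace shorthand used for the mailing lists stands for every combination of the listed alternatives, which gives nine addresses. A minimal sketch of that expansion follows. The class and method names are made up for illustration, and the expander handles only flat (non-nested) brace groups.

```java
import java.util.ArrayList;
import java.util.List;

public class BraceExpansionSketch {

    /** Expands flat {a,b,c} groups in a pattern into every combination, left to right. */
    public static List<String> expand(String pattern) {
        List<String> out = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : pattern.indexOf('}', open);
        if (open < 0 || close < 0) {
            // No complete group left: the pattern is a literal name.
            out.add(pattern);
            return out;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        for (String option : pattern.substring(open + 1, close).split(",")) {
            out.addAll(expand(prefix + option + suffix));
        }
        return out;
    }

    public static void main(String[] args) {
        String lists = "{common,hdfs,mapreduce}-{user,dev,issues}@hadoop.apache.org";
        for (String address : expand(lists)) {
            System.out.println(address);
        }
    }
}
```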
Releases
▪   0.18.3 - 29 Jan 2009
    ▪   Official “stable” release
    ▪   Probably the most commonly used
    ▪   Basis for first Cloudera distribution
▪   0.19.2 - 23 July 2009
    ▪   0.19 series is not widely used
▪   0.20.0 - 22 April 2009
    ▪   Expect large adoption with 0.20.1 release in coming weeks
    ▪   Basis for second Cloudera distribution, first Yahoo! distribution
▪   0.21 series - feature freeze end of August 2009
Hadoop 1.0
▪   After 0.21 release
▪   Need to establish rules about version evolution
    ▪   Hadoop 1.0 Interface Classification - HADOOP-5073
    ▪   API, Data, wire protocol compatibility - HADOOP-5071
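One way to picture an interface classification is as labels attached to each type, stating who may use it and how much it may change between releases. The sketch below is an assumption about what such labels could look like. The Classification annotation, the Audience and Stability enums, and the safeToDependOn check are all hypothetical and do not reproduce what the JIRA proposes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class InterfaceClassificationSketch {

    /** Hypothetical label: who is allowed to use the type. */
    public enum Audience { PUBLIC, PRIVATE }

    /** Hypothetical label: how freely the type may change between releases. */
    public enum Stability { STABLE, EVOLVING, UNSTABLE }

    /** Hypothetical annotation combining both labels; kept at runtime so tools can read it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Classification {
        Audience audience();
        Stability stability();
    }

    @Classification(audience = Audience.PUBLIC, stability = Stability.STABLE)
    public static class UserFacingApi { }

    @Classification(audience = Audience.PRIVATE, stability = Stability.UNSTABLE)
    public static class InternalHelper { }

    /** True only for types labelled both public and stable; unlabelled types are rejected. */
    public static boolean safeToDependOn(Class<?> type) {
        Classification c = type.getAnnotation(Classification.class);
        return c != null
                && c.audience() == Audience.PUBLIC
                && c.stability() == Stability.STABLE;
    }

    public static void main(String[] args) {
        System.out.println(safeToDependOn(UserFacingApi.class));
        System.out.println(safeToDependOn(InternalHelper.class));
        System.out.println(safeToDependOn(String.class));
    }
}
```

With labels like these, compatibility rules can be stated per label rather than per class, which is what makes a 1.0 promise checkable.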
Interesting Projects/JIRAs
▪   Common
    ▪   Avro for Hadoop RPC - HADOOP-6170
    ▪   Service lifecycle - HDFS-326
    ▪   Distributed configuration - HADOOP-5670
    ▪   10 minute patch builds - HADOOP-5628, HDFS-458, MAPREDUCE-670
    ▪   Ivy/Maven integration - HADOOP-5107
    ▪   Eclipse plugin
Interesting Projects/JIRAs (continued)
▪   MapReduce
    ▪   Metadata in Serialization - HADOOP-6165
    ▪   Compute splits on the cluster - MAPREDUCE-207
    ▪   Context Objects - ongoing migration of libraries/examples
    ▪   Security - HADOOP-4487
    ▪   Schedulers
        ▪   Fair share scheduler - global scheduling, FIFO - MAPREDUCE-548, MAPREDUCE-706
        ▪   Capacity - high RAM jobs - HADOOP-5884
    ▪   Speed: new shuffle
        ▪   See http://sortbenchmark.org/Yahoo2009.pdf
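"Context Objects" refers to the newer MapReduce API style, in which a map or reduce function receives a single context parameter rather than a separate output collector and reporter. A rough sketch of the idea follows, using stand-in types rather than Hadoop's own classes. The Context interface, RecordingContext, and the map signature here are simplified assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextObjectSketch {

    /** Stand-in for a context object: one parameter carrying both output and status reporting. */
    public interface Context {
        void write(String key, int value);
        void setStatus(String status);
    }

    /** Word-count style map function written against the single context parameter. */
    public static void map(long offset, String line, Context context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
        context.setStatus("processed line at offset " + offset);
    }

    /** In-memory context that records what the map function emits, in place of a real framework. */
    public static class RecordingContext implements Context {
        public final List<String> records = new ArrayList<>();
        public String status = "";

        @Override
        public void write(String key, int value) {
            records.add(key + "=" + value);
        }

        @Override
        public void setStatus(String status) {
            this.status = status;
        }
    }

    public static void main(String[] args) {
        RecordingContext context = new RecordingContext();
        map(0L, "hadoop futures hadoop", context);
        System.out.println(context.records);
        System.out.println(context.status);
    }
}
```

Because everything goes through one object, the framework can add capabilities to the context later without changing the signature that user code implements, which is one reason libraries and examples are being migrated.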
Popular JIRAs
▪   http://community.cloudera.com/
Questions?
▪   tom@cloudera.com


▪   Cloudera’s Distribution for Hadoop
    ▪   http://www.cloudera.com/hadoop
(c) 2009 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc. All rights reserved.
