Hadoop Operations, First Edition
Eric Sammer
Preface
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements
such as variable or function names, databases, data types, environment variables,
statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values deter-
mined by context.
This icon signifies a tip, suggestion, or general note.
This icon indicates a warning or caution.
Using Code Examples
This book is here to help you get your job done. In general, you may use the code in
this book in your programs and documentation. You do not need to contact us for
permission unless you’re reproducing a significant portion of the code. For example,
writing a program that uses several chunks of code from this book does not require
permission. Selling or distributing a CD-ROM of examples from O’Reilly books does
require permission. Answering a question by citing this book and quoting example
code does not require permission. Incorporating a significant amount of example code
from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title,
author, publisher, and ISBN. For example: “Hadoop Operations by Eric Sammer
(O’Reilly). Copyright 2012 Eric Sammer, 978-1-449-32705-7.”
If you feel your use of code examples falls outside fair use or the permission given above,
feel free to contact us at permissions@oreilly.com.
Safari® Books Online
Safari Books Online (www.safaribooksonline.com) is an on-demand digital
library that delivers expert content in both book and video form from the
world’s leading authors in technology and business.
Technology professionals, software developers, web designers, and business and cre-
ative professionals use Safari Books Online as their primary resource for research,
problem solving, learning, and certification training.
Safari Books Online offers a range of product mixes and pricing programs for organi-
zations, government agencies, and individuals. Subscribers have access to thousands
of books, training videos, and prepublication manuscripts in one fully searchable da-
tabase from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley
Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John
Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT
Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Tech-
nology, and dozens more. For more information about Safari Books Online, please visit
us online.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://oreil.ly/hadoop_operations.
To comment or ask technical questions about this book, send email to
bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website
at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
I want to thank Aida Escriva-Sammer, my wife, best friend, and favorite sysadmin, for
putting up with me while I wrote this.
None of this was possible without the support and hard work of the larger Apache
Hadoop community and ecosystem projects. I want to encourage all readers to get
involved in the community and open source in general.
Matt Massie gave me the opportunity to do this, along with O’Reilly, and then cheered
me on the whole way. Both Matt and Tom White coached me through the proposal
process. Mike Olson, Omer Trajman, Amr Awadallah, Peter Cooper-Ellis, Angus Klein,
and the rest of the Cloudera management team made sure I had the time, resources,
and encouragement to get this done. Aparna Ramani, Rob Weltman, Jolly Chen, and
Helen Friedland were instrumental throughout this process and forgiving of my con-
stant interruptions of their teams. Special thanks to Christophe Bisciglia for giving me
an opportunity at Cloudera and for the advice along the way.
Many people provided valuable feedback and input throughout the entire process, but
especially Aida Escriva-Sammer, Tom White, Alejandro Abdelnur, Amina Abdulla,
Patrick Angeles, Paul Battaglia, Will Chase, Yanpei Chen, Eli Collins, Joe Crobak, Doug
Cutting, Joey Echeverria, Sameer Farooqui, Andrew Ferguson, Brad Hedlund, Linden
Hillenbrand, Patrick Hunt, Matt Jacobs, Amandeep Khurana, Aaron Kimball, Hal Lee,
Justin Lintz, Todd Lipcon, Cameron Martin, Chad Metcalf, Meg McRoberts, Aaron T.
Myers, Kay Ousterhout, Greg Rahn, Henry Robinson, Mark Roddy, Jonathan Seidman,
Ed Sexton, Loren Siebert, Sunil Sitaula, Ben Spivey, Dan Spiewak, Omer Trajman,
Kathleen Ting, Erik-Jan van Baaren, Vinithra Varadharajan, Patrick Wendell, Tom
Wheeler, Ian Wrigley, Nezih Yigitbasi, and Philip Zeyliger. To those whom I may have
omitted from this list, please forgive me.
The folks at O’Reilly have been amazing, especially Courtney Nash, Mike Loukides,
Maria Stallone, Arlette Labat, and Meghan Blanchette.
Jaime Caban, Victor Nee, Travis Melo, Andrew Bayer, Liz Pennell, and Michael De-
metria provided additional administrative, technical, and contract support.
Finally, a special thank you to Kathy Sammer for her unwavering support, and for
teaching me to do exactly what others say you cannot.
Portions of this book have been reproduced or derived from software and documen-
tation available under the Apache Software License, version 2.
CHAPTER 1
Introduction
Over the past few years, there has been a fundamental shift in data storage, manage-
ment, and processing. Companies are storing more data from more sources in more
formats than ever before. This isn’t just about being a “data packrat” but rather building
products, features, and intelligence predicated on knowing more about the world
(where the world can be users, searches, machine logs, or whatever is relevant to an
organization). Organizations are finding new ways to use data that was previously be-
lieved to be of little value, or far too expensive to retain, to better serve their constitu-
ents. Sourcing and storing data is one half of the equation. Processing that data to
produce information is fundamental to the daily operations of every modern business.
Data storage and processing isn’t a new problem, though. Fraud detection in commerce
and finance, anomaly detection in operational systems, demographic analysis in ad-
vertising, and many other applications have had to deal with these issues for decades.
What has happened is that the volume, velocity, and variety of this data has changed,
and in some cases, rather dramatically. This makes sense, as many algorithms benefit
from access to more data. Take, for instance, the problem of recommending products
to a visitor of an ecommerce website. You could simply show each visitor a rotating list
of products they could buy, hoping that one would appeal to them. It’s not exactly an
informed decision, but it’s a start. The question is what do you need to improve the
chance of showing the right person the right product? Maybe it makes sense to show
them what you think they like, based on what they’ve previously looked at. For some
products, it’s useful to know what they already own. Customers who already bought
a specific brand of laptop computer from you may be interested in compatible acces-
sories and upgrades.1 One of the most common techniques is to cluster users by similar
behavior (such as purchase patterns) and recommend products purchased by “similar”
users. No matter the solution, all of the algorithms behind these options require data
and generally improve in quality with more of it. Knowing more about a problem space
generally leads to better decisions (or algorithm efficacy), which in turn leads to happier
users, more money, reduced fraud, healthier people, safer conditions, or whatever the
desired result might be.
1. I once worked on a data-driven marketing project for a company that sold beauty products. Using
purchase transactions of all customers over a long period of time, the company was able to predict when
a customer would run out of a given product after purchasing it. As it turned out, simply offering them
the same thing about a week before they ran out resulted in a (very) noticeable lift in sales.
Apache Hadoop is a platform that provides pragmatic, cost-effective, scalable infra-
structure for building many of the types of applications described earlier. Made up of
a distributed filesystem called the Hadoop Distributed Filesystem (HDFS) and a com-
putation layer that implements a processing paradigm called MapReduce, Hadoop is
an open source, batch data processing system for enormous amounts of data. We live
in a flawed world, and Hadoop is designed to survive in it by not only tolerating hard-
ware and software failures, but also treating them as first-class conditions that happen
regularly. Hadoop uses a cluster of plain old commodity servers with no specialized
hardware or network infrastructure to form a single, logical, storage and compute plat-
form, or cluster, that can be shared by multiple individuals or groups. Computation in
Hadoop MapReduce is performed in parallel, automatically, with a simple abstraction
for developers that obviates complex synchronization and network programming. Un-
like many other distributed data processing systems, Hadoop runs the user-provided
processing logic on the machine where the data lives rather than dragging the data
across the network; a huge win for performance.
For those interested in the history, Hadoop was modeled after two papers produced
by Google, one of the many companies to have these kinds of data-intensive processing
problems. The first, presented in 2003, describes a pragmatic, scalable, distributed
filesystem optimized for storing enormous datasets, called the Google Filesystem, or
GFS. In addition to simple storage, GFS was built to support large-scale, data-intensive,
distributed processing applications. The following year, another paper, titled "Map-
Reduce: Simplified Data Processing on Large Clusters," was presented, defining a pro-
gramming model and accompanying framework that provided automatic paralleliza-
tion, fault tolerance, and the scale to process hundreds of terabytes of data in a single
job over thousands of machines. When paired, these two systems could be used to build
large data processing clusters on relatively inexpensive, commodity machines. These
papers directly inspired the development of HDFS and Hadoop MapReduce, respec-
tively.
Interest and investment in Hadoop has led to an entire ecosystem of related software
both open source and commercial. Within the Apache Software Foundation alone,
projects that explicitly make use of, or integrate with, Hadoop are springing up regu-
larly. Some of these projects make authoring MapReduce jobs easier and more acces-
sible, while others focus on getting data in and out of HDFS, simplifying operations,
enabling deployment in cloud environments, and so on. Here is a sampling of the more popular
projects with which you should familiarize yourself:
Apache Hive
Hive creates a relational database−style abstraction that allows developers to write
a dialect of SQL, which in turn is executed as one or more MapReduce jobs on the
cluster. Developers, analysts, and existing third-party packages already know and
speak SQL (Hive’s dialect of SQL is called HiveQL and implements only a subset
of any of the common standards). Hive takes advantage of this and provides a quick
way to reduce the learning curve to adopting Hadoop and writing MapReduce jobs.
For this reason, Hive is by far one of the most popular Hadoop ecosystem projects.
Hive works by defining a table-like schema over an existing set of files in HDFS
and handling the gory details of extracting records from those files when a query
is run. The data on disk is never actually changed, just parsed at query time. HiveQL
statements are interpreted and an execution plan of prebuilt map and reduce
classes is assembled to perform the MapReduce equivalent of the SQL statement.
Apache Pig
Like Hive, Apache Pig was created to simplify the authoring of MapReduce jobs,
obviating the need to write Java code. Instead, users write data processing jobs in
a high-level scripting language from which Pig builds an execution plan and exe-
cutes a series of MapReduce jobs to do the heavy lifting. In cases where Pig doesn’t
support a necessary function, developers can extend its set of built-in operations
by writing user-defined functions in Java (Hive supports similar functionality as
well). If you know Perl, Python, Ruby, JavaScript, or even shell script, you can learn
Pig’s syntax in the morning and be running MapReduce jobs by lunchtime.
Apache Sqoop
Not only does Hadoop not want to replace your database, it wants to be friends
with it. Exchanging data with relational databases is one of the most popular in-
tegration points with Apache Hadoop. Sqoop, short for “SQL to Hadoop,” per-
forms bidirectional data transfer between Hadoop and almost any database with
a JDBC driver. Using MapReduce, Sqoop performs these operations in parallel
with no need to write code.
For even greater performance, Sqoop supports database-specific plug-ins that use
native features of the RDBMS rather than incurring the overhead of JDBC. Many
of these connectors are open source, while others are free or available from com-
mercial vendors at a cost. Today, Sqoop includes native connectors (called direct
support) for MySQL and PostgreSQL. Free connectors exist for Teradata, Netezza,
SQL Server, and Oracle (from Quest Software), and are available for download
from their respective company websites.
Apache Flume
Apache Flume is a streaming data collection and aggregation system designed to
transport massive volumes of data into systems such as Hadoop. It provides native
connectivity and support for writing directly to HDFS, and simplifies reliable,
streaming data delivery from a variety of sources including RPC services, log4j
appenders, syslog, and even the output from OS commands. Data can be routed,
load-balanced, replicated to multiple destinations, and aggregated from thousands
of hosts by a tier of agents.
Apache Oozie
It’s not uncommon for large production clusters to run many coordinated Map-
Reduce jobs in a workflow. Apache Oozie is a workflow engine and scheduler built
specifically for large-scale job orchestration on a Hadoop cluster. Workflows can
be triggered by time or events such as data arriving in a directory, and job failure
handling logic can be implemented so that policies are adhered to. Oozie presents
a REST service for programmatic management of workflows and status retrieval.
Apache Whirr
Apache Whirr was developed to simplify the creation and deployment of ephem-
eral clusters in cloud environments such as Amazon’s AWS. Run as a command-
line tool either locally or within the cloud, Whirr can spin up instances, deploy
Hadoop, configure the software, and tear it down on demand. Under the hood,
Whirr uses the powerful jclouds library so that it is cloud provider−neutral. The
developers have put in the work to make Whirr support both Amazon EC2 and
Rackspace Cloud. In addition to Hadoop, Whirr understands how to provision
Apache Cassandra, Apache ZooKeeper, Apache HBase, ElasticSearch, Voldemort,
and Apache Hama.
Apache HBase
Apache HBase is a low-latency, distributed (nonrelational) database built on top
of HDFS. Modeled after Google’s Bigtable, HBase presents a flexible data model
with scale-out properties and a very simple API. Data in HBase is stored in a semi-
columnar format partitioned by rows into regions. It’s not uncommon for a single
table in HBase to be well into the hundreds of terabytes or in some cases petabytes.
Over the past few years, HBase has gained a massive following based on some very
public deployments such as Facebook’s Messages platform. Today, HBase is used
to serve huge amounts of data to real-time systems in major production deploy-
ments.
Apache ZooKeeper
A true workhorse, Apache ZooKeeper is a distributed, consensus-based coordina-
tion system used to support distributed applications. Distributed applications that
require leader election, locking, group membership, service location, and config-
uration services can use ZooKeeper rather than reimplement the complex coordi-
nation and error handling that comes with these functions. In fact, many projects
within the Hadoop ecosystem use ZooKeeper for exactly this purpose (most no-
tably, HBase).
Apache HCatalog
A relatively new entry, Apache HCatalog is a service that provides shared schema
and data access abstraction services to applications within the ecosystem. The
long-term goal of HCatalog is to enable interoperability between tools such as
Apache Hive and Pig so that they can share dataset metadata information.
The Hadoop ecosystem is exploding into the commercial world as well. Vendors such
as Oracle, SAS, MicroStrategy, Tableau, Informatica, Microsoft, Pentaho, Talend, HP,
Dell, and dozens of others have all developed integration or support for Hadoop within
one or more of their products. Hadoop is fast becoming (or, as a growing group
would argue, already has become) the de facto standard for truly large-scale
data processing in the data center.
If you’re reading this book, you may be a developer with some exposure to Hadoop
looking to learn more about managing the system in a production environment. Alter-
natively, it could be that you’re an application or system administrator tasked with
owning the current or planned production cluster. Those in the latter camp may be
rolling their eyes at the prospect of dealing with yet another system. That’s fair, and we
won’t spend a ton of time talking about writing applications, APIs, and other pesky
code problems. There are other fantastic books on those topics, especially Hadoop: The
Definitive Guide by Tom White (O’Reilly). Administrators do, however, play an abso-
lutely critical role in planning, installing, configuring, maintaining, and monitoring
Hadoop clusters. Hadoop is a comparatively low-level system, leaning heavily on the
host operating system for many features, and it works best when developers and ad-
ministrators collaborate regularly. What you do impacts how things work.
It’s an extremely exciting time to get into Apache Hadoop. The so-called big data space
is all the rage, sure, but more importantly, Hadoop is growing and changing at a stag-
gering rate. Each new version—and there have been a few big ones in the past year or
two—brings another truckload of features for developers and administrators
alike. You could say that Hadoop is experiencing software puberty; thanks to its rapid
growth and adoption, it’s also a little awkward at times. You’ll find, throughout this
book, that there are significant changes between even minor versions. It’s a lot to keep
up with, admittedly, but don’t let it overwhelm you. Where necessary, the differences
are called out, and a section in Chapter 4 is devoted to walking you through the most
commonly encountered versions.
This book is intended to be a pragmatic guide to running Hadoop in production. Those
who have some familiarity with Hadoop may already know alternative methods for
installation or have differing thoughts on how to properly tune the number of map slots
based on CPU utilization.2 That’s expected and more than fine. The goal is not to
enumerate all possible scenarios, but rather to call out what works, as demonstrated
in critical deployments.
Chapters 2 and 3 provide the necessary background, describing what HDFS and Map-
Reduce are, why they exist, and at a high level, how they work. Chapter 4 walks you
through the process of planning for a Hadoop deployment, including hardware selec-
tion, basic resource planning, operating system selection and configuration, Hadoop
distribution and version selection, and network concerns for Hadoop clusters. If you
are looking for the meat and potatoes, Chapter 5 is where it’s at, with configuration
and setup information, including a listing of the most critical properties, organized by
topic. Those who have strong security requirements or want to understand identity,
access, and authorization within Hadoop will want to pay particular attention to
Chapter 6. Chapter 7 explains the nuts and bolts of sharing a single large cluster across
multiple groups and why this is beneficial while still adhering to service-level agree-
ments by managing and allocating resources accordingly. Once everything is up and
running, Chapter 8 acts as a run book for the most common operations and tasks.
Chapter 9 is the rainy day chapter, covering the theory and practice of troubleshooting
complex distributed systems such as Hadoop, including some real-world war stories.
In an attempt to minimize those rainy days, Chapter 10 is all about how to effectively
monitor your Hadoop cluster. Finally, Chapter 11 provides some basic tools and tech-
niques for backing up Hadoop and dealing with catastrophic failure.
2. We also briefly cover the flux capacitor and discuss the burn rate of energon cubes during combat.
CHAPTER 2
HDFS
Goals and Motivation
The first half of Apache Hadoop is a filesystem called the Hadoop Distributed Filesys-
tem or simply HDFS. HDFS was built to support high throughput, streaming reads and
writes of extremely large files. Traditional large storage area networks (SANs) and
network attached storage (NAS) offer centralized, low-latency access to either a block
device or a filesystem on the order of terabytes in size. These systems are fantastic as
the backing store for relational databases, content delivery systems, and similar types
of data storage needs because they can support full-featured POSIX semantics, scale to
meet the size requirements of these systems, and offer low-latency access to data.
Imagine for a second, though, hundreds or thousands of machines all waking up at the
same time and pulling hundreds of terabytes of data from a centralized storage system
at once. This is where traditional storage doesn’t necessarily scale.
By creating a system composed of independent machines, each with its own I/O sub-
system, disks, RAM, network interfaces, and CPUs, and relaxing (and sometimes re-
moving) some of the POSIX requirements, it is possible to build a system optimized,
in both performance and cost, for the specific type of workload we’re interested in.
There are a number of specific goals for HDFS:
• Store millions of large files, each greater than tens of gigabytes, and filesystem sizes
reaching tens of petabytes.
• Use a scale-out model based on inexpensive commodity servers with internal JBOD
(“Just a bunch of disks”) rather than RAID to achieve large-scale storage. Accom-
plish availability and high throughput through application-level replication of data.
• Optimize for large, streaming reads and writes rather than low-latency access to
many small files. Batch performance is more important than interactive response
times.
• Gracefully deal with component failures of machines and disks.
• Support the functionality and scale requirements of MapReduce processing. See
Chapter 3 for details.
While it is true that HDFS can be used independently of MapReduce to store large
datasets, it truly shines when they’re used together. MapReduce, for instance, takes
advantage of how the data in HDFS is split on ingestion into blocks and pushes com-
putation to the machine where blocks can be read locally.
Design
HDFS, in many ways, follows traditional filesystem design. Files are stored as opaque
blocks and metadata exists that keeps track of the filename to block mapping, directory
tree structure, permissions, and so forth. This is similar to common Linux filesystems
such as ext3. So what makes HDFS different?
Traditional filesystems are implemented as kernel modules (in Linux, at least) and
together with userland tools, can be mounted and made available to end users. HDFS
is what’s called a userspace filesystem. This is a fancy way of saying that the filesystem
code runs outside the kernel as OS processes and by extension, is not registered with
or exposed via the Linux VFS layer. While this is much simpler, more flexible, and
arguably safer to implement, it means that you don't mount HDFS as you would ext3,
for instance, and that it requires applications to be explicitly built for it.
In addition to being a userspace filesystem, HDFS is a distributed filesystem. Dis-
tributed filesystems are used to overcome the limits of what an individual disk or ma-
chine is capable of supporting. Each machine in a cluster stores a subset of the data
that makes up the complete filesystem with the idea being that, as we need to store
more block data, we simply add more machines, each with multiple disks. Filesystem
metadata is stored on a centralized server, acting as a directory of block data and pro-
viding a global picture of the filesystem’s state.
Another major difference between HDFS and other filesystems is its block size. It is
common that general purpose filesystems use a 4 KB or 8 KB block size for data. Ha-
doop, on the other hand, uses the significantly larger block size of 64 MB by default.
In fact, cluster administrators usually raise this to 128 MB, 256 MB, or even as high as
1 GB. Increasing the block size means data will be written in larger contiguous chunks
on disk, which in turn means data can be written and read in larger sequential opera-
tions. This minimizes drive seek operations—one of the slowest operations a mechan-
ical disk can perform—and results in better performance when doing large streaming
I/O operations.
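To make the effect of this setting concrete, here is a minimal client sketch (my own illustration, not from the book) that requests a nondefault block size for a single file through Hadoop’s FileSystem API. The path and sizes are hypothetical, and the cluster configuration is assumed to be on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    // Reads core-site.xml and friends from the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Create one file with a 128 MB block size, overriding the cluster-wide
    // default. The other arguments are overwrite, the client-side buffer
    // size, and the per-file replication factor (discussed next).
    long blockSize = 128L * 1024 * 1024;
    FSDataOutputStream out = fs.create(
        new Path("/logs/2012/01/25/events.log"), // hypothetical path
        true, 4096, (short) 3, blockSize);
    out.writeBytes("hello, hdfs\n");
    out.close();
  }
}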
Rather than rely on specialized storage subsystem data protection, HDFS replicates
each block to multiple machines in the cluster. By default, each block in a file is repli-
cated three times. Because files in HDFS are write once, once a replica is written, it is
not possible for it to change. This obviates the need for complex reasoning about the
consistency between replicas and as a result, applications can read any of the available
replicas when accessing a file. Having multiple replicas means multiple machine failures
are easily tolerated, but there are also more opportunities to read data from a machine
closest to an application on the network. HDFS actively tracks and manages the number
of available replicas of a block as well. Should the number of copies of a block drop
below the configured replication factor, the filesystem automatically makes a new copy
from one of the remaining replicas. Throughout this book, we’ll frequently use the term
replica to mean a copy of an HDFS block.
Applications, of course, don’t want to worry about blocks, metadata, disks, sectors,
and other low-level details. Instead, developers want to perform I/O operations using
higher level abstractions such as files and streams. HDFS presents the filesystem to
developers as a high-level, POSIX-like API with familiar operations and concepts.
Daemons
There are three daemons that make up a standard HDFS cluster, each of which serves
a distinct role, shown in Table 2-1.
Table 2-1. HDFS daemons

Daemon               # per cluster   Purpose
Namenode             1               Stores filesystem metadata, stores the file to block map,
                                     and provides a global picture of the filesystem
Secondary namenode   1               Performs internal namenode transaction log checkpointing
Datanode             Many            Stores block data (file contents)
Blocks are nothing more than chunks of a file, binary blobs of data. In HDFS, the
daemon responsible for storing and retrieving block data is called the datanode (DN).
The datanode has direct local access to one or more disks—commonly called data disks
—in a server on which it’s permitted to store block data. In production systems, these
disks are usually reserved exclusively for Hadoop. Storage can be added to a cluster by
adding more datanodes with additional disk capacity, or even adding disks to existing
datanodes.
One of the most striking aspects of HDFS is that it is designed in such a way that it
doesn’t require RAID storage for its block data. This keeps with the commodity hard-
ware design goal and reduces cost as clusters grow in size. Rather than rely on a RAID
controller for data safety, block data is simply written to multiple machines. This fulfills
the safety concern at the cost of raw storage consumed; however, there’s a performance
aspect to this as well. Having multiple copies of each block on separate machines means
that not only are we protected against data loss if a machine disappears, but during
processing, any copy of this data can be used. By having more than one option, the
scheduler that decides where to perform processing has a better chance of being able
to find a machine with available compute resources and a copy of the data. This is
covered in greater detail in Chapter 3.
The lack of RAID can be controversial. In fact, many believe RAID simply makes disks
faster, akin to a magic go-fast turbo button. This, however, is not always the case. A
very large number of independently spinning disks performing huge sequential I/O
operations with independent I/O queues can actually outperform RAID in the specific
use case of Hadoop workloads. Typically, datanodes have a large number of independent
disks, each of which stores full blocks. For an expanded discussion of this and related
topics, see “Blades, SANs, and Virtualization” on page 52.
While datanodes are responsible for storing block data, the namenode (NN) is the
daemon that stores the filesystem metadata and maintains a complete picture of the
filesystem. Clients connect to the namenode to perform filesystem operations; al-
though, as we’ll see later, block data is streamed to and from datanodes directly, so
bandwidth is not limited by a single node. Datanodes regularly report their status to
the namenode in a heartbeat. This means that, at any given time, the namenode has a
complete view of all datanodes in the cluster, their current health, and what blocks they
have available. See Figure 2-1 for an example of HDFS architecture.
Figure 2-1. HDFS architecture overview
When a datanode initially starts up, as well as every hour thereafter, it sends what’s
called a block report to the namenode. The block report is simply a list of all blocks the
datanode currently has on its disks and allows the namenode to keep track of any
changes. This is also necessary because, while the file to block mapping on the name-
node is stored on disk, the locations of the blocks are not written to disk. This may
seem counterintuitive at first, but it means a change in IP address or hostname of any
of the datanodes does not impact the underlying storage of the filesystem metadata.
Another nice side effect of this is that, should a datanode experience failure of a moth-
erboard, administrators can simply remove its hard drives, place them into a new chas-
sis, and start up the new machine. As far as the namenode is concerned, the blocks
have simply moved to a new datanode. The downside is that, when initially starting a
cluster (or restarting it, for that matter), the namenode must wait to receive block re-
ports from all datanodes to know all blocks are present.
The namenode filesystem metadata is served entirely from RAM for fast lookup and
retrieval, and thus places a cap on how much metadata the namenode can handle. A
rough estimate is that the metadata for 1 million blocks occupies roughly 1 GB of heap
(more on this in “Hardware Selection” on page 45). We’ll see later how you can
overcome this limitation, even if it is encountered only at a very high scale (thousands
of nodes).
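As a rough worked example of that estimate (my own arithmetic, not the book’s): a filesystem holding 1 PB of file data with a 128 MB block size contains

1 PB / 128 MB per block = 2^50 / 2^27 = 8,388,608 blocks

or roughly 8.4 million unique blocks, suggesting on the order of 8 to 9 GB of namenode heap for block metadata alone, before counting file and directory objects.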
Finally, the third HDFS process is called the secondary namenode and performs some
internal housekeeping for the namenode. Despite its name, the secondary namenode
is not a backup for the namenode and performs a completely different function.
The secondary namenode may have the worst name for a process in the
history of computing. It has tricked many new to Hadoop into believing
that, should the evil robot apocalypse occur, their cluster will continue
to function when their namenode becomes sentient and walks out of
the data center. Sadly, this isn’t true. We’ll explore the true function of
the secondary namenode in just a bit, but for now, remember what it is
not; that’s just as important as what it is.
Reading and Writing Data
Clients can read and write to HDFS using various tools and APIs (see “Access and
Integration” on page 20), but all of them follow the same process. The client always,
at some level, uses a Hadoop library that is aware of HDFS and its semantics. This
library encapsulates most of the gory details related to communicating with the name-
node and datanodes when necessary, as well as dealing with the numerous failure cases
that can occur when working with a distributed filesystem.
The Read Path
First, let’s walk through the logic of performing an HDFS read operation. For this, we’ll
assume there’s a file /user/esammer/foo.txt already in HDFS. In addition to using Ha-
doop’s client library—usually a Java JAR file—each client must also have a copy of the
cluster configuration data that specifies the location of the namenode (see Chapter 5).
As shown in Figure 2-2, the client begins by contacting the namenode, indicating which
file it would like to read. The client identity is first validated—either by trusting the
client and allowing it to specify a username or by using a strong authentication mech-
anism such as Kerberos (see Chapter 6)—and then checked against the owner and
permissions of the file. If the file exists and the user has access to it, the namenode
responds to the client with the first block ID and the list of datanodes on which a copy
of the block can be found, sorted by their distance to the client. Distance to the client
is measured according to Hadoop’s rack topology—configuration data that indicates
which hosts are located in which racks. (More on rack topology configuration is avail-
able in “Rack Topology” on page 130.)
If the namenode is unavailable for some reason—because of a problem
with either the namenode itself or the network, for example—clients
will receive timeouts or exceptions (as appropriate) and will be unable
to proceed.
With the block IDs and datanode hostnames, the client can now contact the most
appropriate datanode directly and read the block data it needs. This process repeats
until all blocks in the file have been read or the client closes the file stream.
Figure 2-2. The HDFS read path
It is also possible that while reading from a datanode, the process, or the host on which
it runs, dies. Rather than give up, the library will automatically attempt to read another
replica of the data from another datanode. If all replicas are unavailable, the read op-
eration fails and the client receives an exception. Another corner case that can occur is
that the information returned by the namenode about block locations can be outdated
by the time the client attempts to contact a datanode, in which case either a retry will
occur if there are other replicas or the read will fail. While rare, these kinds of corner
cases make troubleshooting a large distributed system such as Hadoop so complex. See
Chapter 9 for a tour of what can go wrong and how to diagnose the problem.
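To ground the read path in code, here is a minimal client sketch (assuming the cluster configuration is on the classpath and that the /user/esammer/foo.txt file from above exists). The namenode lookup, datanode selection, and failover between replicas described in this section all happen inside the library, behind the open() call and the returned stream.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // open() contacts the namenode for block locations; reads then go
    // directly to the datanodes, retrying other replicas on failure.
    FSDataInputStream in = fs.open(new Path("/user/esammer/foo.txt"));
    try {
      IOUtils.copyBytes(in, System.out, 4096, false); // copy file to stdout
    } finally {
      IOUtils.closeStream(in);
    }
  }
}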
The Write Path
Writing files to HDFS is a bit more complicated than performing reads. We’ll consider
the simplest case where a client is creating a new file. Remember that clients need not
actually implement this logic; this is simply an overview of how data is written to the
cluster by the underlying Hadoop library. Application developers use (mostly) familiar
APIs to open files, write to a stream, and close them similarly to how they would with
traditional local files.
Initially, a client makes a request to open a named file for write using the Hadoop
FileSystem APIs. A request is sent to the namenode to create the file metadata if the
user has the necessary permissions to do so. The metadata entry for the new file is made;
however, it initially has no associated blocks. A response to the client indicates the open
request was successful and that it may now begin writing data. At the API level, a
standard Java stream object is returned, although the implementation is HDFS-specific.
As the client writes data to the stream it is split into packets (not to be confused with
TCP packets or HDFS blocks), which are queued in memory. A separate thread in the
client consumes packets from this queue and, as necessary, contacts the namenode
requesting a set of datanodes to which replicas of the next block should be written. The
client then makes a direct connection to the first datanode in the list, which makes a
connection to the second, which connects to the third. This forms the replication pipe-
line to be used for this block of data, as shown in Figure 2-3. Data packets are then
streamed to the first datanode, which writes the data to disk, and to the next datanode
in the pipeline, which writes to its disk, and so on. Each datanode in the replication
pipeline acknowledges each packet as it’s successfully written. The client application
maintains a list of packets for which acknowledgments have not yet been received and
when it receives a response, it knows the data has been written to all nodes in the
pipeline. This process of writing packets to the pipeline continues until the block size
is reached, at which point the client goes back to the namenode for the next set of
datanodes to write to. Ultimately, the client indicates it’s finished sending data by clos-
ing the stream, which flushes any remaining packets out to disk and updates the name-
node to indicate the file is now complete.
Of course, things are not always this simple, and failures can occur. The most common
type of failure is that a datanode in the replication pipeline fails to write data for one
reason or another—a disk dies or a datanode fails completely, for instance. When this
happens, the pipeline is immediately closed and all packets that had been sent since
the last acknowledgment are pushed back into the queue to be written so that any
datanodes past the failed node in the pipeline will receive the data. The current block
is given a new ID on the remaining healthy datanodes. This is done so that, should the
failed datanode return, the abandoned block will appear to not belong to any file and
be discarded automatically. A new replication pipeline containing the remaining da-
tanodes is opened and the write resumes. At this point, things are mostly back to normal
and the write operation continues until the file is closed. The namenode will notice that
one of the blocks in the file is under-replicated and will arrange for a new replica to be
created asynchronously. A client can recover from multiple failed datanodes provided
at least a minimum number of replicas are written (by default, this is one).
Figure 2-3. The HDFS write path
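A corresponding write sketch looks like the following (again my own illustration; the path is hypothetical). The packet queue, replication pipeline, and acknowledgment handling described above are all internal to the stream returned by create().

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // create() asks the namenode to create the file metadata; no blocks
    // are allocated until data is actually written.
    FSDataOutputStream out = fs.create(new Path("/user/esammer/bar.txt"));
    out.writeBytes("some data\n");
    // close() flushes remaining packets through the pipeline and tells
    // the namenode the file is complete.
    out.close();
  }
}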
Managing Filesystem Metadata
The namenode stores its filesystem metadata on local filesystem disks in a few different
files, the two most important of which are fsimage and edits. Just like a database would,
fsimage contains a complete snapshot of the filesystem metadata whereas edits contains
only incremental modifications made to the metadata. A common practice for high-
throughput data stores, use of a write ahead log (WAL) such as the edits file reduces I/
O operations to sequential, append-only operations (in the context of the namenode,
since it serves directly from RAM), which avoids costly seek operations and yields better
overall performance. Upon namenode startup, the fsimage file is loaded into RAM and
any changes in the edits file are replayed, bringing the in-memory view of the filesystem
up to date.
In more recent versions of Hadoop (specifically, Apache Hadoop 2.0 and CDH4; more
on the different versions of Hadoop in “Picking a Distribution and Version of Ha-
doop” on page 41), the underlying metadata storage was updated to be more resilient
to corruption and to support namenode high availability. Conceptually, metadata stor-
age is similar, although transactions are no longer stored in a single edits file. Instead,
the namenode periodically rolls the edits file (closes one file and opens a new file),
numbering them by transaction ID. It’s also possible for the namenode to now retain
old copies of both fsimage and edits to better support the ability to roll back in time.
Most of these changes won’t impact you, although it helps to understand the purpose
of the files on disk. That being said, you should never make direct changes to these files
unless you really know what you are doing. The rest of this book will simply refer to
these files using their base names, fsimage and edits, to refer generally to their function.
Recall from earlier that the namenode writes changes only to its write ahead log,
edits. Over time, the edits file grows and grows and as with any log-based system such
as this, would take a long time to replay in the event of server failure. Similar to a
relational database, the edits file needs to be periodically applied to the fsimage file. The
problem is that the namenode may not have the available resources—CPU or RAM—
to do this while continuing to provide service to the cluster. This is where the secondary
namenode comes in.
The exact interaction that occurs between the namenode and the secondary namenode
(shown in Figure 2-4) is as follows:1
1. The secondary namenode instructs the namenode to roll its edits file and begin
writing to edits.new.
2. The secondary namenode copies the namenode’s fsimage and edits files to its local
checkpoint directory.
3. The secondary namenode loads fsimage, replays edits on top of it, and writes a new,
compacted fsimage file to disk.
4. The secondary namenode sends the new fsimage file to the namenode, which
adopts it.
5. The namenode renames edits.new to edits.
1. This process is slightly different for Apache Hadoop 2.0 and CDH4, but it is conceptually equivalent.
Figure 2-4. Metadata checkpoint process
This process occurs every hour (by default) or whenever the namenode’s edits file rea-
ches 64 MB (also the default). There isn’t usually a good reason to modify this, although
we’ll explore that later. Newer versions of Hadoop use a defined number of transactions
rather than file size to determine when to perform a checkpoint.
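For reference, here is a sketch of how those two defaults surface through the Configuration API (the property names are those used in the Hadoop 1.x line; they are normally set in core-site.xml rather than in code, and newer releases rename them under dfs.namenode.checkpoint.* and use transaction counts instead of file size):

import org.apache.hadoop.conf.Configuration;

public class CheckpointSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Checkpoint at least once an hour...
    long periodSecs = conf.getLong("fs.checkpoint.period", 3600);
    // ...or whenever edits reaches 64 MB, whichever comes first.
    long sizeBytes = conf.getLong("fs.checkpoint.size", 64L * 1024 * 1024);
    System.out.printf("checkpoint every %d seconds or %d bytes of edits%n",
        periodSecs, sizeBytes);
  }
}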
Namenode High Availability
As administrators responsible for the health and service of large-scale systems, the no-
tion of a single point of failure should make us a bit uneasy (or worse). Unfortunately,
for a long time the HDFS namenode was exactly that: a single point of failure. Recently,
the Hadoop community as a whole has invested heavily in making the namenode highly
available, opening Hadoop to additional mission-critical deployments.
Namenode high availability (or HA) is deployed as an active/passive pair of namenodes.
The edits write ahead log needs to be available to both namenodes, and therefore is
stored on a shared storage device. Currently, an NFS filer is required as the shared
storage, although there are plans to remove this dependency.2 As the active namenode
writes to the edits log, the standby namenode is constantly replaying transactions to
ensure it is up to date and ready to take over in the case of failure. Datanodes are also
aware of both namenodes in an HA configuration and send block reports to both
servers.
2. See Apache JIRA HDFS-3077.
A high-availability pair of namenodes can be configured for manual or automatic fail-
over. In the default manual failover mode, a command must be sent to effect a state
transition from one namenode to the other. When configured for automatic failover,
each namenode runs an additional process called a failover controller that monitors the
health of the process and coordinates state transitions. Just as in other HA systems,
there are two primary types of failover: graceful failover, initiated by an administrator,
and nongraceful failover, which is the result of a detected fault in the active process. In
either case, it’s impossible to truly know if a namenode has relinquished active status
or if it’s simply inaccessible from the standby. If both processes were allowed to con-
tinue running, they could both write to the shared state and corrupt the filesystem
metadata. This is commonly called a split brain scenario. For this reason, the system
can use a series of increasingly drastic techniques to ensure the failed node (which could
still think it’s active) is actually stopped. This can start with something as simple as
asking it to stop via RPC, but can end with the mother of all fencing techniques:
STONITH, or “shoot the other node in the head.” STONITH can be implemented by
issuing a reboot via IPMI, or even by programmatically cutting power to a machine for
a short period of time if data center power distribution units (PDUs) support such
functionality. Most administrators who want high availability will also want to con-
figure automatic failover as well. See Figure 2-5 for an example of automatic failover.
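As an illustrative sketch of that escalation in configuration terms (the property name and values below follow the Hadoop 2 HA fencing mechanism; the shell script path is hypothetical), fencing methods are listed one per line and tried in order until one succeeds:

import org.apache.hadoop.conf.Configuration;

public class FencingConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Try a polite ssh-based kill of the old active namenode first; if
    // that fails, fall back to a site-specific script that cuts power
    // to the machine via the PDU (STONITH).
    conf.set("dfs.ha.fencing.methods",
        "sshfence\nshell(/usr/local/bin/pdu-poweroff.sh)");
  }
}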
When running with high availability, the standby namenode takes over the role of the
secondary namenode, described earlier. In other words, there is no separate secondary
namenode process in an HA cluster, only a pair of namenode processes. Those that
already run Hadoop clusters that have a dedicated machine on which they run the
secondary namenode process can repurpose that machine to be a second namenode in
most cases. The various configuration options for high availability are covered, in detail,
in “Namenode High Availability” on page 100.
Figure 2-5. A highly available namenode pair with automatic failover
At the time of this writing, namenode high availability (sometimes abbreviated NN
HA) is available in Apache Hadoop 2.0.0 and CDH4.
Why Not Use an XYZ HA Package?
Users familiar with packages such as the Linux-HA project sometimes ask why they
can't simply write some scripts and manage the HDFS namenode HA issue that way.
These tools, after all, support health checks, in and out of band communication, and
fencing plug-ins already. Unfortunately, HA is a tougher nut to crack than simply killing
a process and starting a new one elsewhere.
The real challenge with implementing a highly available namenode stems from the fact
that datanode block reports are not written to disk. In other words, even if one were
to set up a namenode with one of these systems, write the proper health checks, detect
a failure, initiate a state transition (failover), and activate the standby, it still wouldn’t
know where to find any of the blocks and wouldn’t be able to service HDFS clients.
Additionally, the datanodes—probably now viewing the namenode via a virtual IP
(VIP)—would not realize a transition had occurred and wouldn’t know to send a new
block report to bring the new namenode up to speed on the state of the cluster. As we
saw earlier, receiving and processing block reports from hundreds or thousands of
machines is actually the part of cluster startup that takes time; on the order of tens of
minutes or more. This type of interruption is still far outside of the acceptable service-
level agreement for many mission-critical systems.
Systems such as Linux-HA work well for stateless services such as static content serving,
but for a stateful system such as the namenode, they’re insufficient.
Namenode Federation
Large-scale users of Hadoop have had another obstacle with which to contend: the
limit of how much metadata the namenode can store in memory. In order to scale the
namenode beyond the amount of physical memory that could be stuffed into a single
server, there needed to be a way to move from a scale-up to a scale-out approach. Just
like we’ve seen with block storage in HDFS, it’s possible to spread the filesystem met-
adata over multiple machines. This technique is called namespace federation and refers
to assembling one logical namespace from a number of autonomous systems. An ex-
ample of a federated namespace is the Linux filesystem: many devices can be mounted
at various points to form a single namespace that clients can address without concern
for which underlying device actually contains the data.
Namenode federation (Figure 2-6) works around the memory limitation of the name-
node by allowing the filesystem namespace to be broken up into slices and spread across
multiple namenodes. Just as it sounds, this is really just like running a number of sep-
arate namenodes, each of which is responsible for a different part of the directory
structure. The one major way in which namenode federation is different from running
several discrete clusters is that each datanode stores blocks for multiple namenodes.
More precisely, each datanode has a block pool for each namespace. While blocks from
different pools are stored on the same disks (there is no physical separation), they are
logically exclusive. Each datanode sends heartbeats and block reports to each name-
node.
Clients often do not want to have to worry about multiple namenodes, so a special
client API implementation called ViewFS can be used that maps slices of the filesystem
to the proper namenode. This is, conceptually, almost identical to the Linux /etc/
fstab file, except that rather than mapping paths to physical devices, ViewFS maps paths
to HDFS namenodes. For instance, we can configure ViewFS to look at namenode1 for
path /logs and namenode2 for path /hbase. Federation also allows us to use namespace
partitioning to control the availability and fault tolerance of different slices of the file-
system. In our previous example, /hbase could be on a namenode that requires ex-
tremely high uptime while maybe /logs is used only by batch operations in MapReduce.
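As a sketch of that /logs and /hbase example in client configuration (the property pattern follows the Hadoop 2 ViewFS mount table convention; the namenode hostnames are hypothetical):

import org.apache.hadoop.conf.Configuration;

public class ViewFsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Make the federated view the client's default filesystem.
    conf.set("fs.default.name", "viewfs:///");
    // Map each slice of the namespace to the namenode that serves it,
    // much as /etc/fstab maps mount points to devices.
    conf.set("fs.viewfs.mounttable.default.link./logs",
        "hdfs://namenode1.mycompany.com:8020/logs");
    conf.set("fs.viewfs.mounttable.default.link./hbase",
        "hdfs://namenode2.mycompany.com:8020/hbase");
  }
}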
Figure 2-6. Namenode federation overview
Lastly, it’s important to note that HA and federation are orthogonal features. That is,
it is possible to enable them independently of each other, as they speak to two different
problems. This means a namespace can be partitioned and some of those partitions (or
all) may be served by an HA pair of namenodes.
Access and Integration
The sole native method of access to HDFS is its Java API. All other access methods are
built on top of this API and by definition, can expose only as much functionality as it
permits. In an effort to ease adoption and development of applications, the HDFS API
is simple and familiar to developers, piggybacking on concepts such as Java’s I/O
streams. The API does differ where necessary in order to provide the features and guar-
antees it advertises, but most of these are obvious or documented.
In order to access HDFS, clients—applications that are written against the API—must
have a copy of configuration data that tells them where the namenode is running. This
is analogous to an Oracle client application requiring the tnsnames.ora file. Each ap-
plication must also have access to the Hadoop library JAR file. Again, this is the equiv-
alent of a database client application’s dependence on a JDBC driver JAR. Clients can
be on the same physical machines as any of the Hadoop daemons, or they can be
separate from the cluster proper. MapReduce tasks and HBase Region Servers, for ex-
ample, access HDFS as any other normal client would. They just happen to be running
on the same physical machines where HDFS stores its block data.
It’s important to realize that, as a consequence of the direct client to datanode com-
munication, network access between clients and all cluster nodes’ relevant ports must
be unfettered. This has implications on network design, security, and bandwidth that
are covered in “Network Design” on page 66.
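Putting those two requirements together, a minimal client bootstrap might look like the following sketch (the namenode hostname is hypothetical, and in practice fs.default.name usually comes from a core-site.xml on the client’s classpath rather than being set in code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientBootstrap {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The client's equivalent of tnsnames.ora: where the namenode lives.
    conf.set("fs.default.name", "hdfs://mynamenode.mycompany.com:8020");

    FileSystem fs = FileSystem.get(conf);
    System.out.println("home directory: " + fs.getHomeDirectory());
    System.out.println("/logs exists? " + fs.exists(new Path("/logs")));
  }
}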
Command-Line Tools
Hadoop comes with a number of command-line tools that enable basic filesystem op-
erations. Like all Hadoop tools, HDFS commands are subcommands of the hadoop
command-line utility. Running hadoop fs will display basic usage information, as
shown in Example 2-1.
Example 2-1. hadoop fs help information
[esammer@hadoop01 ~]$ hadoop fs
Usage: java FsShell
[-ls <path>]
[-lsr <path>]
[-df [<path>]]
[-du <path>]
[-dus <path>]
[-count[-q] <path>]
[-mv <src> <dst>]
[-cp <src> <dst>]
[-rm [-skipTrash] <path>]
[-rmr [-skipTrash] <path>]
[-expunge]
[-put <localsrc> ... <dst>]
[-copyFromLocal <localsrc> ... <dst>]
[-moveFromLocal <localsrc> ... <dst>]
[-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst> [addnl]]
[-cat <src>]
[-text <src>]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
[-moveToLocal [-crc] <src> <localdst>]
[-mkdir <path>]
[-setrep [-R] [-w] <rep> <path/file>]
[-touchz <path>]
[-test -[ezd] <path>]
[-stat [format] <path>]
[-tail [-f] <file>]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-chgrp [-R] GROUP PATH...]
[-help [cmd]]
Most of these commands will be immediately obvious to an administrator with basic
shell experience. The major difference is that, because HDFS is a user space filesystem,
there’s no concept of a current working directory. All paths are either absolute (rec-
ommended) or relative to the user’s home directory within HDFS.3 An absolute path
can be of the form /logs/2012/01/25/, or it can include the full URL to specify the loca-
tion of the namenode, such as hdfs://mynamenode.mycompany.com:8020/logs/2012/01/
25/. If the full URL syntax is not used, the value is taken from the fs.default.name
parameter in the core-site.xml configuration file (see Example 2-2).
Example 2-2. Listing files and directories in HDFS
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer
Found 4 items
drwx------ - esammer supergroup 0 2012-01-11 15:06 /user/esammer/.staging
-rw-r--r-- 3 esammer supergroup 27888890 2012-01-10 13:41 /user/esammer/data.txt
drwxr-xr-x - esammer supergroup 0 2012-01-11 13:08 /user/esammer/teragen
drwxr-xr-x - esammer supergroup 0 2012-01-11 15:06 /user/esammer/terasort
To prove to ourselves that the HDFS namespace is entirely separate from the host OS,
we can attempt to list the same path using the standard ls command (see Example 2-3).
Example 2-3. Attempting to list an HDFS path on the OS
[esammer@hadoop01 ~]$ ls /user/esammer
ls: /user/esammer: No such file or directory
3. User home directories in HDFS are located in /user/<username> by default.
In many ways, HDFS is more like a remote filesystem than a local OS filesystem. The
act of copying files to or from HDFS is more like SCP or FTP than working with an
NFS mounted filesystem, for example. Files are uploaded using either -put or the
synonym -copyFromLocal and are downloaded with -get or -copyToLocal. As a conve-
nience, the -moveFromLocal and -moveToLocal commands will copy a file to or from
HDFS, respectively, and then remove the source file (see Example 2-4).
Example 2-4. Copying files to and from HDFS
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 2 items
drwx------ - esammer supergroup 0 2012-01-11 15:06 /user/esammer/.staging
-rw-r--r-- 3 esammer supergroup 27888890 2012-01-10 13:41 /user/esammer/data.txt
[esammer@hadoop01 ~]$ hadoop fs -put /etc/passwd /user/esammer/
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/
Found 3 items
drwx------ - esammer supergroup 0 2012-01-11 15:06 /user/esammer/.staging
-rw-r--r-- 3 esammer supergroup 27888890 2012-01-10 13:41 /user/esammer/data.txt
-rw-r--r-- 3 esammer supergroup 2216 2012-01-25 21:07 /user/esammer/passwd
[esammer@hadoop01 ~]$ ls -al passwd
ls: passwd: No such file or directory
[esammer@hadoop01 ~]$ hadoop fs -get /user/esammer/passwd ./
[esammer@hadoop01 ~]$ ls -al passwd
-rw-rw-r--+ 1 esammer esammer 2216 Jan 25 21:17 passwd
[esammer@hadoop01 ~]$ hadoop fs -rm /user/esammer/passwd
Deleted hdfs://hadoop01.sf.cloudera.com/user/esammer/passwd
Also unique to HDFS is the ability to set the replication factor of a file. This can be
done by using the -setrep command, which takes a replication factor and an optional
flag (-R) to indicate it should operate recursively (see Example 2-5).
Example 2-5. Changing the replication factor on files in HDFS
[esammer@hadoop01 ~]$ hadoop fs -setrep 5 -R /user/esammer/tmp/
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/a
Replication 5 set: hdfs://hadoop01.sf.cloudera.com/user/esammer/tmp/b
[esammer@hadoop01 ~]$ hadoop fsck /user/esammer/tmp -files -blocks -locations
FSCK started by esammer from /10.1.1.160 for path /user/esammer/tmp at
Wed Jan 25 21:57:39 PST 2012
/user/esammer/tmp <dir>
/user/esammer/tmp/a 27888890 bytes, 1 block(s): OK
0. blk_2989979708465864046_2985473 len=27888890 repl=5 [10.1.1.162:50010,
10.1.1.161:50010, 10.1.1.163:50010, 10.1.1.165:50010, 10.1.1.164:50010]
/user/esammer/tmp/b 27888890 bytes, 1 block(s): OK
0. blk_-771344189932970151_2985474 len=27888890 repl=5 [10.1.1.164:50010,
10.1.1.163:50010, 10.1.1.161:50010, 10.1.1.162:50010, 10.1.1.165:50010]
In Example 2-5, we’ve changed the replication factor of files a and b in the tmp directory
to 5. Next, the fsck, which is covered in “Checking Filesystem Integrity with
fsck” on page 198, is used to inspect file health but has the nice side effect of displaying
block location information for each file. Here, the five replicas of each block are spread
over five different datanodes in the cluster, as expected. You may notice that only files
have a block list. Directories in HDFS are purely metadata entries and have no block
data.
FUSE
Filesystem In Userspace, or FUSE, is a system that allows developers to implement
mountable filesystems in user space. That is, development of a kernel module is not
required. This is not only simpler to work with because developers can use standard
libraries in a familiar environment, but it is also safer because developer bugs can’t
necessarily cause kernel panics.
Both Apache Hadoop and CDH come with support for FUSE HDFS which, as you may
have guessed, allows you to mount the Hadoop distributed filesystem as you would
any other device. This allows legacy applications and systems to continue to read and
write files to a regular directory on a Linux server that is backed by HDFS. While this
is useful, it’s not a panacea. All of the properties of HDFS are still present: no in-place
modification of files, comparatively high latency, poor random access performance,
optimization for large streaming operations, and huge scale. To be absolutely clear,
FUSE does not make HDFS a POSIX-compliant filesystem. It is only a compatibility
layer that can expose HDFS to applications that perform only basic file operations.
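To make this concrete, mounting HDFS over FUSE on a CDH system looks roughly
like the following sketch; the binary name, package, and namenode address vary by
distribution and version, so treat it as illustrative rather than definitive:
[root@hadoop01 ~]# mkdir -p /mnt/hdfs
[root@hadoop01 ~]# hadoop-fuse-dfs dfs://mynamenode.mycompany.com:8020 /mnt/hdfs
[root@hadoop01 ~]# ls /mnt/hdfs/user/esammer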
REST Support
Over the past few years, Representational State Transfer (REST) has become an in-
creasingly popular way to interact with services in a language-agnostic way. Hadoop’s
native APIs are all Java-based, which presents a problem for non-Java clients. Appli-
cations have the option of shelling out and using the hadoop fs command, but that’s
inefficient and error-prone (not to mention aesthetically displeasing). Starting with
Apache Hadoop 1.0.0 and CDH4, WebHDFS, a RESTful API to HDFS, is now a stan-
dard part of the software. WebHDFS makes use of the already embedded web server
in each Hadoop HDFS daemon to run a set of REST APIs that mimic that of the Java
FileSystem API, including read and write methods. Full authentication, including Ker-
beros SPNEGO, is supported by WebHDFS. See Example 2-6 for a sample invocation
of the WebHDFS equivalent of the hadoop fs -ls /hbase command.
Example 2-6. Using a WebHDFS REST call to list a directory
[esammer@hadoop01 ~]$ curl http://hadoop01:50070/webhdfs/v1/hbase/?op=liststatus
{"FileStatuses":{"FileStatus":[
{"accessTime":0,"blockSize":0,"group":"hbase","length":0,"modificationTime":
1342560095961,"owner":"hbase","pathSuffix":"-ROOT-","permission":"755",
"replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"hbase","length":0,"modificationTime":
1342560094415,"owner":"hbase","pathSuffix":".META.","permission":"755",
"replication":0,"type":"DIRECTORY"},
{"accessTime":0,"blockSize":0,"group":"hbase","length":0,"modificationTime":
CHAPTER 3
MapReduce
MapReduce refers to two distinct things: the programming model (covered here) and
the specific implementation of the framework (covered later in “Introducing Hadoop
MapReduce” on page 33). Designed to simplify the development of large-scale, dis-
tributed, fault-tolerant data processing applications, MapReduce is foremost a way of
writing applications. In MapReduce, developers write jobs that consist primarily of a
map function and a reduce function, and the framework handles the gory details of
parallelizing the work, scheduling parts of the job on worker machines, monitoring for
and recovering from failures, and so forth. Developers are shielded from having to
implement complex and repetitious code and instead, focus on algorithms and business
logic. User-provided code is invoked by the framework rather than the other way
around. This is much like Java application servers that invoke servlets upon receiving
an HTTP request; the container is responsible for setup and teardown as well as pro-
viding a runtime environment for user-supplied code. Similarly, as servlet authors need
not implement the low-level details of socket I/O, event handling loops, and complex
thread coordination, MapReduce developers program to a well-defined, simple inter-
face and the “container” does the heavy lifting.
The idea of MapReduce was defined in a paper written by two Google engineers in
2004, titled "MapReduce: Simplified Data Processing on Large Clusters" (J. Dean, S.
Ghemawat). The paper describes both the programming model and (parts of) Google’s
specific implementation of the framework. Hadoop MapReduce is an open source im-
plementation of the model described in this paper and tracks the implementation
closely.
Specifically developed to deal with large-scale workloads, MapReduce provides the
following features:
Simplicity of development
MapReduce is dead simple for developers: no socket programming, no threading
or fancy synchronization logic, no management of retries, no special techniques to
deal with enormous amounts of data. Developers use functional programming
concepts to build data processing applications that operate on one record at a time.
Map functions operate on these records and produce intermediate key-value pairs.
The reduce function then operates on the intermediate key-value pairs, processing
all values that have the same key together and outputting the result. These primi-
tives can be used to implement filtering, projection, grouping, aggregation, and
other common data processing functions.
Scale
Since tasks do not communicate with one another explicitly and do not share state,
they can execute in parallel and on separate machines. Additional machines can
be added to the cluster and applications immediately take advantage of the addi-
tional hardware with no change at all. MapReduce is designed to be a share noth-
ing system.
Automatic parallelization and distribution of work
Developers focus on the map and reduce functions that process individual records
(where “record” is an abstract concept—it could be a line of a file or a row from a
relational database) in a dataset. The storage of the dataset is not prescribed by
MapReduce, although it is extremely common, as we’ll see later, that files on a
distributed filesystem are an excellent pairing. The framework is responsible for
splitting a MapReduce job into tasks. Tasks are then executed on worker nodes or
(less pleasantly) slaves.
Fault tolerance
Failure is not an exception; it’s the norm. MapReduce treats failure as a first-class
citizen and supports reexecution of failed tasks on healthy worker nodes in the
cluster. Should a worker node fail, all tasks are assumed to be lost, in which case
they are simply rescheduled elsewhere. The unit of work is always the task, and it
either completes successfully or it fails completely.
In MapReduce, users write a client application that submits one or more jobs that con-
tain user-supplied map and reduce code and a job configuration file to a cluster of
machines. The job contains a map function and a reduce function, along with job con-
figuration information that controls various aspects of its execution. The framework
handles breaking the job into tasks, scheduling tasks to run on machines, monitoring
each task’s health, and performing any necessary retries of failed tasks. A job processes
an input dataset specified by the user and usually outputs one as well. Commonly, the
input and output datasets are one or more files on a distributed filesystem. This is one
of the ways in which Hadoop MapReduce and HDFS work together, but we’ll get into
that later.
The Stages of MapReduce
A MapReduce job is made up of four distinct stages, executed in order: client job sub-
mission, map task execution, shuffle and sort, and reduce task execution. Client ap-
plications can really be any type of application the developer desires, from command-
line tools to services. The MapReduce framework provides a set of APIs for submitting
jobs and interacting with the cluster. The job itself is made up of code written by a
developer against the MapReduce APIs and the configuration which specifies things
such as the input and output datasets.
As described earlier, the client application submits a job to the cluster using the frame-
work APIs. A master process, called the jobtracker in Hadoop MapReduce, is respon-
sible for accepting these submissions (more on the role of the jobtracker later). Job
submission occurs over the network, so clients may be running on one of the cluster
nodes or not; it doesn’t matter. The framework gets to decide how to split the input
dataset into chunks, or input splits, of data that can be processed in parallel. In Hadoop
MapReduce, the component that does this is called an input format, and Hadoop comes
with a small library of them for common file formats. We’re not going to get too deep
into the APIs of input formats or even MapReduce in this book. For that, check out
Hadoop: The Definitive Guide by Tom White (O’Reilly).
In order to better illustrate how MapReduce works, we’ll use a simple application log
processing example where we count all events of each severity within a window of time.
If you’re allergic to writing or reading code, don’t worry. We’ll use just enough pseu-
docode for you to get the idea. Let’s assume we have 100 GB of logs in a directory in
HDFS. A sample of log records might look something like this:
2012-02-13 00:23:54-0800 [INFO - com.company.app1.Main] Application started!
2012-02-13 00:32:02-0800 [WARN - com.company.app1.Main] Something hinky↵
is going down...
2012-02-13 00:32:19-0800 [INFO - com.company.app1.Main] False alarm. No worries.
...
2012-02-13 09:00:00-0800 [DEBUG - com.company.app1.Main] coffee units remaining: zero↵
- triggering coffee time.
2012-02-13 09:00:00-0800 [INFO - com.company.app1.Main] Good morning. It's↵
coffee time.
For each input split, a map task is created that runs the user-supplied map function on
each record in the split. Map tasks are executed in parallel. This means each chunk of
the input dataset is being processed at the same time by various machines that make
up the cluster. It’s fine if there are more map tasks to execute than the cluster can handle.
They’re simply queued and executed in whatever order the framework deems best. The
map function takes a key-value pair as input and produces zero or more intermediate
key-value pairs.
The input format is responsible for turning each record into its key-value pair repre-
sentation. For now, trust that one of the built-in input formats will turn each line of
the file into a value with the byte offset into the file provided as the key. Getting back
to our example, we want to write a map function that will filter records for those within
a specific timeframe, and then count all events of each severity. The map phase is where
we’ll perform the filtering. We’ll output the severity and the number 1 for each record
that we see with that severity.
function map(key, value) {
    // Example key: 12345 - the byte offset in the file (not really interesting).
    // Example value: 2012-02-13 00:23:54-0800 [INFO - com.company.app1.Main]↵
    //                Application started!

    // Do the nasty record parsing to get dateTime, severity,
    // className, and message.
    (dateTime, severity, className, message) = parseRecord(value);

    // If the date is today...
    if (dateTime.date() == '2012-02-13') {
        // Emit the severity and the number 1 to say we saw one of these records.
        emit(severity, 1);
    }
}
Notice how we used an if statement to filter the data by date so that we got only the
records we wanted. It’s just as easy to output multiple records in a loop. A map function
can do just about whatever it wants with each record. Reducers, as we’ll see later,
operate on the intermediate key-value data we output from the mapper.
Given the sample records earlier, our intermediate data would look as follows:
DEBUG, 1
INFO, 1
INFO, 1
INFO, 1
WARN, 1
A few interesting things are happening here. First, we see that the key INFO repeats,
which makes sense because our sample contained three INFO records that would have
matched the date 2012-02-13. It’s perfectly legal to output the same key or value mul-
tiple times. The other notable effect is that the output records are not in the order we
would expect. In the original data, the first record was an INFO record, followed by
WARN, but that’s clearly not the case here. This is because the framework sorts the output
of each map task by its key. Just like outputting the value 1 for each record, the rationale
behind sorting the data will become clear in a moment.
Further, each key is assigned to a partition using a component called the partitioner. In
Hadoop MapReduce, the default partitioner implementation is a hash partitioner that
takes a hash of the key, modulo the number of configured reducers in the job, to get a
partition number. Because the hash implementation used by Hadoop ensures the hash
of the key INFO is always the same on all machines, all INFO records are guaranteed to
be placed in the same partition. The intermediate data isn’t physically partitioned, only
logically so. For all intents and purposes, you can picture a partition number next to
each record; it would be the same for all records with the same key. See Figure 3-1 for
a high-level overview of the execution of the map phase.
Figure 3-1. Map execution phase
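To make the partitioning step concrete, the following pseudocode (in the same style
as our map function) sketches what the default hash partitioner does; hash() stands in
for Hadoop’s actual key hashing:
function partition(key, numReducers) {
    // The same key hashes to the same value on every machine, so all
    // records that share a key land in the same partition.
    return hash(key) mod numReducers;
}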
Ultimately, we want to run the user’s reduce function on the intermediate output data.
A number of guarantees, however, are made to the developer with respect to the re-
ducers that need to be fulfilled.
• If a reducer sees a key, it will see all values for that key. For example, if a reducer
receives the INFO key, it will always receive the three number 1 values.
• A key will be processed by exactly one reducer. This makes sense given the pre-
ceding requirement.
• Each reducer will see keys in sorted order.
The next phase of processing, called the shuffle and sort, is responsible for enforcing
these guarantees. The shuffle and sort phase is actually performed by the reduce tasks
before they run the user’s reduce function. When started, each reducer is assigned one
of the partitions on which it should work. First, they copy the intermediate key-value
data from each worker for their assigned partition. It’s possible that tens of thousands
of map tasks have run on various machines throughout the cluster, each having output
key-value pairs for each partition. The reducer assigned partition 1, for example, would
need to fetch each piece of its partition data from potentially every other worker in the
cluster. A logical view of the intermediate data across all machines in the cluster might
look like this:
worker 1, partition 2, DEBUG, 1
worker 1, partition 1, INFO, 1
worker 2, partition 1, INFO, 1
worker 2, partition 1, INFO, 1
worker 3, partition 2, WARN, 1
Copying the intermediate data across the network can take a fair amount of time, de-
pending on how much data there is. To minimize the total runtime of the job, the
framework is permitted to begin copying intermediate data from completed map tasks
as soon as they are finished. Remember that the shuffle and sort is being performed by
the reduce tasks, each of which takes up resources in the cluster. We want to start the
copy phase soon enough that most of the intermediate data is copied before the final
map task completes, but not so soon that the reducers finish copying early and sit
idle, taking up resources that could be used by other reduce tasks. Knowing when to
start the copy process can be tricky, and it’s largely based on the available bandwidth
of the network. See mapred.reduce.slowstart.completed.maps on page 129 for infor-
mation about how to configure when the copy is started.
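As a sketch, the relevant entry in mapred-site.xml looks like the following; the value
is the fraction of map tasks that must complete before reducers begin copying, and
0.80 is purely illustrative:
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.80</value>
</property>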
Once the reducer has received its data, it is left with many small bits of its partition,
each of which is sorted by key. What we want is a single list of key-value pairs, still
sorted by key, so we have all values for each key together. The easiest way to accomplish
this is by performing a merge sort of the data. A merge sort takes a number of sorted
items and merges them together to form a fully sorted list using a minimal amount of
memory. With the partition data now combined into a complete sorted list, the user’s
reducer code can now be executed:
# Logical data input to the reducer assigned partition 1:
INFO, [ 1, 1, 1 ]
# Logical data input to the reducer assigned partition 2:
DEBUG, [ 1 ]
WARN, [ 1 ]
The reducer code in our example is hopefully clear at this point:
function reduce(key, iterator<values>) {
    // Initialize a total event count.
    totalEvents = 0;

    // For each value (a number one)...
    foreach (value in values) {
        // Add the number one to the total.
        totalEvents += value;
    }

    // Emit the severity (the key) and the total events we saw.
    // Example key: INFO
    // Example value: 3
    emit(key, totalEvents);
}
Each reducer produces a separate output file, usually in HDFS (see Figure 3-2). Separate
files are written so that reducers do not have to coordinate access to a shared file. This
greatly reduces complexity and lets each reducer run at whatever speed it can. The
format of the file depends on the output format specified by the author of the MapRe-
duce job in the job configuration. Unless the job does something special (and most
don’t), each reducer output file is named part-<XXXXX>, where <XXXXX> is the number
of the reduce task within the job, starting from zero. Sample reducer output for our
example job would look as follows:
# Reducer for partition 1:
INFO, 3
# Reducer for partition 2:
DEBUG, 1
WARN, 1
Figure 3-2. Shuffle and sort, and reduce phases
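A hypothetical listing of such a job’s output directory (names and sizes are invented
for illustration) shows the one-file-per-reducer layout:
[esammer@hadoop01 ~]$ hadoop fs -ls /user/esammer/output
Found 2 items
-rw-r--r-- 3 esammer supergroup 8 2012-02-13 10:02 /user/esammer/output/part-00000
-rw-r--r-- 3 esammer supergroup 16 2012-02-13 10:02 /user/esammer/output/part-00001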
For those that are familiar with SQL and relational databases, we could view the logs
as a table with the schema:
CREATE TABLE logs (
EVENT_DATE DATE,
SEVERITY VARCHAR(8),
SOURCE VARCHAR(128),
MESSAGE VARCHAR(1024)
)
We would, of course, have to parse the data to get it into a table with this schema, but
that’s beside the point. (In fact, the ability to deal with semi-structured data as well as
act as a data processing engine are two of Hadoop’s biggest benefits.) To produce the
same output, we would use the following SQL statement. In the interest of readability,
we’re ignoring the fact that this doesn’t yield identically formatted output; the data is
the same.
SELECT SEVERITY, COUNT(*)
FROM logs
WHERE EVENT_DATE = '2012-02-13'
GROUP BY SEVERITY
ORDER BY SEVERITY
As exciting as all of this is, MapReduce is not a silver bullet. It is just as important to
know how MapReduce works and what it’s good for, as it is to understand why Map-
Reduce is not going to end world hunger or serve you breakfast in bed.
MapReduce is a batch data processing system
The design of MapReduce assumes that jobs will run on the order of minutes, if
not hours. It is optimized for full table scan style operations. Consequently, it un-
derwhelms when attempting to mimic low-latency, random access patterns found
in traditional online transaction processing (OLTP) systems. MapReduce is not a
relational database killer, nor does it purport to be.
MapReduce is overly simplistic
One of its greatest features is also one of its biggest drawbacks: MapReduce is
simple. In cases where a developer knows something special about the data and
wants to make certain optimizations, he may find the model limiting. This usually
manifests as complaints that, while the job is faster in terms of wall clock time, it’s
far less efficient in MapReduce than in other systems. This can be very true. Some
have said MapReduce is like a sledgehammer driving a nail; in some cases, it’s more
like a wrecking ball.
MapReduce is too low-level
Compared to higher-level data processing languages (notably SQL), MapReduce
seems extremely low-level. Certainly for basic query-like functionality, no one
wants to write map and reduce functions. Higher-level languages built atop Map-
Reduce exist to simplify life, and unless you truly need the ability to touch terabytes
(or more) of raw data, it can be overkill.
Not all algorithms can be parallelized
There are entire classes of problems that cannot easily be parallelized. The act of
training a model in machine learning, for instance, cannot be parallelized for many
types of models. This is true for many algorithms where there is shared state or
dependent variables that must be maintained and updated centrally. Sometimes
it’s possible to structure problems that are traditionally solved using shared state
differently such that they can be fit into the MapReduce model, but at the cost of
efficiency (shortest path-finding algorithms in graph processing are excellent ex-
amples of this). Other times, while this is possible, it may not be ideal for a host of
reasons. Knowing how to identify these kinds of problems and create alternative
solutions is far beyond the scope of this book and an art in its own right. This is
the same problem as the “mythical man month,” but is most succinctly expressed
by stating, “If one woman can have a baby in nine months, nine women should be
able to have a baby in one month,” which, in case it wasn’t clear, is decidedly false.
Introducing Hadoop MapReduce
Hadoop MapReduce is a specific implementation of the MapReduce programming
model, and the computation component of the Apache Hadoop project. The combi-
nation of HDFS and MapReduce is incredibly powerful, in much the same way that
Google’s GFS and MapReduce complement each other. Hadoop MapReduce is inher-
ently aware of HDFS and can use the namenode during the scheduling of tasks to decide
the best placement of map tasks with respect to machines where there is a local copy
of the data. This avoids a significant amount of network overhead during processing,
as workers do not need to copy data over the network to access it, and it removes one
of the primary bottlenecks when processing huge amounts of data.
Hadoop MapReduce is similar to traditional distributed computing systems in that
there is a framework and there is the user’s application or job. A master node coordi-
nates cluster resources while workers simply do what they’re told, which in this case
is to run a map or reduce task on behalf of a user. Client applications written against
the Hadoop APIs can submit jobs either synchronously and block for the result, or
asynchronously and poll the master for job status. Cluster daemons are long-lived while
user tasks are executed in ephemeral child processes. Although executing a separate
process incurs the overhead of launching a separate JVM, it isolates the framework
from untrusted user code that could—and in many cases does—fail in destructive ways.
Since MapReduce is specifically targeting batch processing tasks, the additional over-
head, while undesirable, is not necessarily a showstopper.
One of the ingredients in the secret sauce of MapReduce is the notion of data locality,
by which we mean the ability to execute computation on the same machine where the
data being processed is stored. Many traditional high-performance computing (HPC)
systems have a similar master/worker model, but computation is generally distinct from
data storage. In the classic HPC model, data is usually stored on a large shared cen-
tralized storage system such as a SAN or NAS. When a job executes, workers fetch the
data from the central storage system, process it, and write the result back to the storage
device. The problem is that this can lead to a storm effect when there are a large number
of workers attempting to fetch the same data at the same time and, for large datasets,
quickly causes bandwidth contention. MapReduce flips this model on its head. Instead
of using a central storage system, a distributed filesystem is used where each worker is
usually1 both a storage node as well as a compute node. Blocks that make up files are
distributed to nodes when they are initially written, and when computation is
performed, the user-supplied code is pushed to the machine where the block is stored
locally. Remember that HDFS stores
multiple replicas of each block. This is not just for data availability in the face of failures,
but also to increase the chance that a machine with a copy of the data has available
capacity to run a task.
Daemons
There are two major daemons in Hadoop MapReduce: the jobtracker and the
tasktracker.
Jobtracker
The jobtracker is the master process, responsible for accepting job submissions from
clients, scheduling tasks to run on worker nodes, and providing administrative func-
tions such as worker health and task progress monitoring to the cluster. There is one
jobtracker per MapReduce cluster and it usually runs on reliable hardware since a
failure of the master will result in the failure of all running jobs. Clients and tasktrackers
(see “Tasktracker” on page 35) communicate with the jobtracker by way of remote
procedure calls (RPC).
Just like the relationship between datanodes and the namenode in HDFS, tasktrackers
inform the jobtracker as to their current health and status by way of regular heartbeats.
Each heartbeat contains the total number of map and reduce task slots available (see
“Tasktracker” on page 35), the number occupied, and detailed information about
any currently executing tasks. After a configurable period of no heartbeats, a tasktracker
is assumed dead. The jobtracker uses a thread pool to process heartbeats and client
requests in parallel.
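The timeout itself is configurable. As a sketch, the Hadoop 1.x parameter, shown here
at its default of 10 minutes (in milliseconds), goes in mapred-site.xml:
<property>
  <name>mapred.tasktracker.expiry.interval</name>
  <value>600000</value>
</property>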
When a job is submitted, information about each task that makes up the job is stored
in memory. This task information updates with each tasktracker heartbeat while the
tasks are running, providing a near real-time view of task progress and health. After the
job completes, this information is retained for a configurable window of time or until
a specified number of jobs have been executed. On an active cluster where many jobs,
each with many tasks, are running, this information can consume a considerable
amount of RAM. It’s difficult to estimate memory consumption without knowing how
big each job will be (measured by the number of tasks it contains) or how many jobs
will run within a given timeframe. For this reason, monitoring jobtracker memory
utilization is absolutely critical.
1. While it’s possible to separate them, this rarely makes sense because you lose the data locality features
of Hadoop MapReduce. Those that wish to run only Apache HBase, on the other hand, very commonly
run just the HDFS daemons along with their HBase counterparts.
The jobtracker provides an administrative web interface that, while a charming flash-
back to web (anti-)design circa 1995, is incredibly information-rich and useful. As
tasktrackers all must report in to the jobtracker, a complete view of the available cluster
resources is available via the administrative interface. Each job that is submitted has a
job-level view that offers links to the job’s configuration, as well as data about progress,
the number of tasks, various metrics, and task-level logs. If you are to be responsible
for a production Hadoop cluster, you will find yourself checking this interface con-
stantly throughout the day.
The act of deciding which tasks of a job should be executed on which worker nodes is
referred to as task scheduling. This is not scheduling in the way that the cron daemon
executes jobs at given times, but instead is more like the way the OS kernel schedules
process CPU time. Much like CPU time sharing, tasks in a MapReduce cluster share
worker node resources, or space, but instead of context switching—that is, pausing the
execution of a task to give another task time to run—when a task executes, it executes
completely. Understanding task scheduling—and by extension, resource allocation
and sharing—is so important that an entire chapter (Chapter 7) is dedicated to the
subject.
Tasktracker
The second daemon, the tasktracker, accepts task assignments from the jobtracker,
instantiates the user code, executes those tasks locally, and reports progress back to
the jobtracker periodically. There is always a single tasktracker on each worker node.
Both tasktrackers and datanodes run on the same machines, which makes each node
both a compute node and a storage node. Each tasktracker is configured
with a specific number of map and reduce task slots that indicate how many of each
type of task it is capable of executing in parallel. A task slot is exactly what it sounds
like; it is an allocation of available resources on a worker node to which a task may be
assigned, in which case it is executed. A tasktracker executes some number of map
tasks and reduce tasks in parallel, so there is concurrency both within a worker where
many tasks run, and at the cluster level where many workers exist. Map and reduce
slots are configured separately because they consume resources differently. It is com-
mon that tasktrackers allow more map tasks than reduce tasks to execute in parallel
for reasons described in “MapReduce” on page 120. You may have picked up on the
idea that deciding the number of map and reduce task slots is extremely important to
making full use of the worker node hardware, and you would be correct.
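As a sketch, slot counts are set per tasktracker in mapred-site.xml with the following
Hadoop 1.x parameters; the values shown are illustrative, not recommendations:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>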
Upon receiving a task assignment from the jobtracker, the tasktracker executes an
attempt of the task in a separate process. The distinction between a task and a task
attempt is important: a task is the logical unit of work, while a task attempt is a specific,
physical instance of that task being executed. Since an attempt may fail, it is possible
that a task has multiple attempts, although it’s common for tasks to succeed on their
first attempt when everything is in proper working order. As this implies, each task in
a job will always have at least one attempt, assuming the job wasn’t administratively
killed. Communication between the task attempt (usually called the child, or child pro-
cess) and the tasktracker is maintained via an RPC connection over the loopback in-
terface called the umbilical protocol. The task attempt itself is a small application that
acts as the container in which the user’s map or reduce code executes. As soon as the
task completes, the child exits and the slot becomes available for assignment.
The tasktracker uses a list of user-specified directories (each of which is assumed to be
on a separate physical device) to hold the intermediate map output and reducer input
during job execution. This is required because this data is usually too large to fit ex-
clusively in memory for large jobs or when many jobs are running in parallel.
Tasktrackers, like the jobtracker, also have an embedded web server and user interface.
It’s rare, however, that administrators access this interface directly since it’s unusual
to know the machine you need to look at without first referencing the jobtracker in-
terface, which already provides links to the tasktracker interface for the necessary
information.
When It All Goes Wrong
Rather than panic when things go wrong, MapReduce is designed to treat failures as
common and has very well-defined semantics for dealing with the inevitable. With tens,
hundreds, or even thousands of machines making up a Hadoop cluster, machines—
and especially hard disks—fail at a significant rate. It’s not uncommon to find that
approximately 2% to 5% of the nodes in a large Hadoop cluster have some kind of
fault, meaning they are operating either suboptimally or simply not at all. In addition
to faulty servers, there can sometimes be errant user MapReduce jobs, network failures,
and even errors in the data.
Child task failures
It’s common for child tasks to fail for a variety of reasons: incorrect or poorly imple-
mented user code, unexpected data problems, temporary machine failures, and ad-
ministrative intervention are a few of the more common causes. A child task is con-
sidered to be failed when one of three things happens:
• It throws an uncaught exception.
• It exits with a nonzero exit code.
• It fails to report progress to the tasktracker for a configurable amount of time.
When a failure is detected by the tasktracker, it is reported to the jobtracker in the next
heartbeat. The jobtracker, in turn, notes the failure and if additional attempts are per-
mitted (the default limit is four attempts), reschedules the task to run. The task may
be run either on the same machine or on another machine in the cluster, depending on
available capacity. Should multiple tasks from the same job fail on the same tasktracker
36 | Chapter 3: MapReduce
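The attempt limits are themselves configurable. A sketch of the relevant Hadoop 1.x
parameters, shown at their defaults of four attempts each:
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.max.attempts</name>
  <value>4</value>
</property>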
56. she most probably would, he would have to send her away. He
became as watchful of himself as he had been when his life
depended on every word he said; but he could not help his eyes.
When other people were there he did not look at Stella at all.
It was the first day Stella had been late for her work, and Julian had
prepared to be extremely angry until he saw her face. She came
slowly toward the open window out of the garden, looking oddly
drawn and white. The pain in her eyes hurt Julian intolerably.
"Hullo!" he said quickly, "what's wrong?"
She did not answer at once; her hands trembled. She was holding a
letter, face downward, as if she hated holding it.
"Your mother asked me to tell you myself," she began. "I am afraid
to tell you; but she seemed to think you would rather—"
"Yes," said Julian, quickly. "Are you going away?"
"Oh, no," whispered Stella. "If it was only that!"
Julian said, "Ah!" It was an exclamation that sounded like relief. He
leaned back in his chair, and did nothing further to help her.
Stella moved restlessly about the room. She had curious graceful
movements like a wild creature; she became awkward only when
she knew she was expected to behave properly. Finally she paused,
facing a bookcase, with her back to Julian.
"Well?" asked Julian, encouragingly. "Better get it over, hadn't we?
World come to pieces worse than usual this morning?"
"I don't know how to tell you," she said wretchedly. "For you
perhaps it has—I have heard from Marian."
Julian picked up his pipe, which he had allowed to go out when
Stella came in, relit it, and smiled at the back of her head. He looked
extraordinarily amused and cheerful.
57. "She hadn't written to me," Stella went on without turning round,
"for ages and ages,—you remember I told you?—and now she has."
"She was always an uncertain correspondent," said Julian, smoothly.
"Am I to see this letter? Message for me, perhaps? Or doesn't she
know you're here?"
"Oh, no!" cried Stella, quickly. "I mean there's nothing in it you
couldn't see, of course. There is a kind of message; still, she didn't
mean you actually to see it. She heard somehow that I was here,
and she wanted me to tell you—" Stella's voice broke, but she picked
herself up and went on, jerking out the cruel words that shook her
to the heart,— "she wanted me to tell you that she's—she's going to
be married."
Stella heard a curious sound from Julian incredibly like a chuckle.
She flinched, and held herself away from him. He would not want
her to see how he suffered. There was a long silence.
"Stella," said Julian at last in that singular, soft, new voice of his that
he occasionally used when they were alone together, "the ravages of
pain are now hidden. You can turn round."
She came back to him uncertainly, and sat down by the window at
his feet. He had a tender teasing look that she could not quite
understand. His eyes themselves never wavered as they met hers,
but the eagerness in them wavered; his tenderness seemed to hold
it back.
She thought that Julian's eyes had grown curiously friendly lately.
Despite his pain, they were very friendly now.
"Any details?" Julian asked. "Don't be afraid to tell me. I'm not—I
mean I'm quite prepared for it."
"It's to be next month," she said hurriedly. "She didn't want you to
see it first in the papers."
58. "Awfully considerate of her, wasn't it?" interrupted Julian. "By the by,
tell her when you write that she couldn't have chosen anybody
better to break it to me than you."
"O Julian," Stella pleaded, "please don't laugh at me! Do if it makes
you any easier, of course; only I—I mind so horribly!"
"Do you?" asked Julian, carefully. "I think I'm rather glad you mind,
but you mustn't mind horribly; only as much as a friend should mind
for another friend."
"That is the way I mind," said Stella.
She had a large interpretation of friendship.
"Oh, all right," said Julian, rather crossly. "Go on!"
"She says it's a Captain Edmund Stanley, and he's a D.S.O. They're
to be married very quietly while he's on leave."
"Lucky man!" said Julian. "Any money?"
"Oh, I think so," murmured Stella, anxiously skipping the letter in
her lap. "She says he's fairly well off."
"I think," observed Julian, "that we may take it that if Marian says
Captain Stanley is fairly well off, his means need give us no anxiety.
What?"
"Julian, must you talk like that?" Stella pleaded. "You'll make it so
hard for yourself if you're bitter."
"On the whole, I think I must," replied Julian, reflectively. "If I talked
differently, you mightn't like it; and, anyhow, I daren't run the risk. I
might break down, you know, and you wouldn't like that, would you?
Shall we get to work?"
"Oh, not this morning!" Stella cried. "I'm going out; I knew you
wouldn't want me."
59. "Did you though?" asked Julian. "But I happen to want you most
particularly. What are you going to do about it?"
She looked at him in surprise. He had a peculiarly teasing expression
which did not seem appropriate to extreme grief.
"I'll stay, of course, if you want me," she said quietly.
"You're a very kind little elf," said Julian, "but I don't think you must
make a precedent of my wanting you, or else—look here, d' you
mind telling me a few things about your—your friendship with
Marian?"
Stella's face cleared. She saw now why he wanted her to stay. She
turned her eyes back to the garden.
"I'll tell you anything you like to know," she answered.
"You liked her?" asked Julian.
"She was so different from everybody else in my world," Stella
explained. "I don't think I judged her; I just admired her. She was
awfully good to me. I didn't see her very often, but it was all the
brightness of my life."
"Stella, you've never told me about your life," Julian said irrelevantly.
"Will you some day? I want to know about the town hall and that
town clerk fellow."
"There isn't anything to tell you," said Stella. "I mean about that,
and Marian was never in my life. She couldn't have been, you know;
but she was my special dream. I used to love to hear about all her
experiences and her friends; and then—do you remember the night
of Chaliapine's opera? It was the only opera I ever went to, so of
course I remember; but perhaps you don't. You were there with
Marian. I think I knew then—"
"Knew what?" asked Julian, leaning forward a little. "You seem
awfully interested in that gravel path, Stella?"
60. "Knew," she said, without turning her head, "what you meant to
her."
"Where were you?" Julian inquired. "Looking down from the ceiling
or up from a hole in the ground, where the good people come from?
I never saw you."
"Ah, you wouldn't," said Stella. "I was in the gallery. Do you
remember the music?"
"Russian stuff," Julian said. "Pack of people going into a fire, yes.
Funnily enough, I've thought of it since, more than once, too; but I
didn't know you were there."
"And then when you were hurt," Stella went on in a low voice,
"Marian told me. Julian, she did mind frightfully. I always wanted
you to know that she did mind."
"It altered her plans, didn't it," said Julian, "quite considerably?"
"You've no business to talk like that!" said Stella, angrily. "It's not fair
—or kind."
"And does it matter to you whether I'm fair or kind?" Julian asked,
with deadly coolness.
"I beg your pardon," said Stella, quickly. "Of course it has nothing to
do with me. I have no right to—to mind what you say."
"I'm glad you recognize that," said Julian, quietly. "It facilitates our
future intercourse. And you agreed with Marian that she only did her
duty in painstakingly adhering to her given word? Perhaps you
encouraged her to do it? The inspiration sounds quite like yours."
She looked at him now.
"Julian," she said, "am I all wrong? Would you rather that we
weren't friends at all? You are speaking as if you hated me."
61. "No, I'm not," he said quickly, "you little goose! How could I keep
you here if I hated you? Have a little sense. No, don't put your hand
there, because, if you do, I shall take it, and I'm rather anxious just
now not to. You shall go directly you've answered me this. Did you
agree with Marian's point of view about me? You know what it was,
don't you? She didn't love me any more; she wished I had been
killed, and she decided to stick to me. She thought I'd be grateful.
Do you think I ought to have been grateful?"
"You know I don't! You know I don't!" cried Stella. "But why do you
make me say it? I simply hated it—hated her not seeing, not caring
enough to see, not caring enough to make you see. There! Is that all
you wanted me to say?"
"Practically," said Julian, "but I don't see why you should fly into a
rage over it. In your case, then, if it had been your case, you would
simply have broken off the engagement at once, like a sensible girl?"
"I can't imagine myself in such a situation," said Stella, getting up
indignantly.
"Naturally," interposed Julian smoothly. "But, still, if you had
happened, by some dreadful mischance, to find yourself engaged to
me—"
"I should have broken it off directly," said Stella, turning to go
—"directly I found out—"
"Found out what?" asked Julian.
"That you were nothing but a cold-blooded tease!" cried Stella over
her shoulder.
"You perfect darling!" said Julian under his breath. "By Jove! that
was a narrow squeak!"
62. CHAPTER XXIII
It puzzled Stella extremely that she found herself unable to say,
"What is it that you want, Julian?" She knew that there was
something that he wanted, and there was nothing that she would
dream of denying him. What, therefore, could be simpler than asking
him? And yet she did not want to ask him.
She began by trying hard to understand what it was that he had told
her above the bluebell wood, because she thought if she discovered
what he wanted then, the rest would follow. He had wanted a
particular kind of help from her; that was plain. It had something to
do with her being a woman; that was plainer. But was it to his
advantage or to his disadvantage that she was a woman? Ought she
to suppress the fact or build on it? And how could she build on it or
suppress it when she never felt in the least like anything else but a
woman?
Cicely used to say that the only safe way with men was never to be
nice to them; but Stella had always thought any risk was better than
such a surly plan. Besides, Julian couldn't mean that. He liked her to
be nice to him. She saw quite plainly that he liked her to be nice to
him.
Unfortunately, Julian had taken for granted in Stella a certain
experience of life, and Stella had never had any such experience.
She had never once recognized fancy in the eyes of any man. As for
love, it belonged solely to her dreams; and the dreams of a woman
of twenty-eight, unharassed by fact, are singularly unreliable. She
thought of Mr. Travers, but he did not count. She had never been
able to realize what he had felt for her. Her relation to him was as
formal, despite his one singular lapse, as that of a passenger to a
ticket-collector. She had nothing to go on but her dreams.
63. In her very early youth she had selected for heroes two or three
characters from real life. They were Cardinal Newman, Shelley, and
General Gordon. Later, on account of a difference in her religious
opinions, she had replaced the Cardinal by Charles Lamb. None of
these characters was in the least like Julian.
One had apparently no experience of women, the other two had
sisters, and Shelley's expression of love was vague and might be
said to be misleading.
She met me, robed in such exceeding glory,
That I beheld her not.
Life had unfortunately refused to meet Shelley on the same terms,
and difficulties had ensued, but it was this impracticable side of him
that Stella had accepted. She had skipped Harriet, and landed on
"Epipsychidion." Love was to her "a green and golden immortality."
She was not disturbed by it, because the deepest experiences of life
do not disturb us. What disturbs us is that which calls us away from
them.
It made it easier to wait to find out what Julian wanted that he was
happier with her. He was hardly ever impersonal or cold now, and he
sometimes made reasons to be with her that had nothing to do with
their work.
It was June, and the daffodils had gone, but there were harebells
and blue butterflies upon the downs, and in the hedges wild roses
and Star of Bethlehem. Lady Verny spent all her time in the garden.
She said the slugs alone took hours. They were supposed by the
uninitiated to be slow, but express trains could hardly do more
damage in less time. So Stella and Ostrog took their walks alone,
and were frequently intercepted by Julian on their return.
Julian, who ought to have known better, thought that the situation
might go on indefinitely, and Stella did not know that there was any
situation; she knew only that she was in a new world. There was
64. sorrow outside it, there was sorrow even in her heart for those
outside it; but through all sorrow was this unswerving, direct
experience of joy. She would have liked to share it with Julian, but
she thought it was all her own, and that what he liked about her—
since he liked something—was her ability to live beyond the margin
of her personal delight. The color of it was in her eyes, and the
strength of it at her heart; but she never let it interfere with Julian.
She was simply a companion with a hidden treasure. She sometimes
thought that having it made her a better companion; but even of
this she was not sure.
It made her a little nervous taking Ostrog out alone, but she always
took the lead with him, and slipped it on him if a living creature
appeared on the horizon. There were some living creatures he didn't
mind, but you couldn't be sure which.
One evening she was tired and forgot him. There was a wonderful
sunset. She stood to watch it in a hollow of the downs where she
was waiting for Julian. The soft, gray lines rose up on each side of
her, immemorial, inalterable lines of gentle land. The air was as
transparently clear as water, and hushed with evening. Far below
her, where the small church steeple sprang, she saw the swallows
cutting V-shaped figures to and fro above the shining elms.
For a long time she heard no sound, and then, out of the stillness,
came a faint and hollow boom. Far away across the placid shapes of
little hills, over the threatened seas, the guns sounded from France
—the dim, intolerable ghosts of war.
Ostrog, impatient of her stillness, bounded to the edge of the hollow
and challenged the strange murmur to the echo. He was answered
immediately. A sheep-dog shot up over the curve of the down.
Ostrog was at his throat in an instant.
There was a momentary recoil for a fresh onslaught, and then the
shrieks of the preliminary tussle changed into the full-throated growl
65. of combat. There was every prospect that one or other of them
would be dead before their jaws unlocked.
Stella hovered above them in frantic uncertainty. She was helpless
till she saw that there was no other help. The sheep-dog had had
enough; a sudden scream of pain stung her into action. She seized
Ostrog's hind leg and twisted it sharply from under him.
At the moment she did so she heard Julian's voice:
"Wait! For God's sake, let go!"
But she could not wait; the sheep-dog was having the life squeezed
out of him. She tugged and twisted again. Ostrog's grip slackened,
he flung a snap at her across his shoulder, and then, losing his
balance, turned on her in a flash. She guarded her head, but his
teeth struck at her shoulder. She felt herself thrust back by his
weight, saw his red jaws open for a fresh spring, and then Julian's
crutch descended sharply on Ostrog's head. Ostrog dropped like a
stone, the bob-tailed sheep-dog crawled safely away, and Stella
found herself in Julian's arms.
66. She tugged and twisted again
"Dearest, sure you're not hurt? Sure?" he implored breathlessly, and
then she knew what his eyes asked her, they were so near her own
and so intent; and while her lips said, "Sure, Julian," she knew her
own eyes answered them.
He drew her close to his heart and kissed her again and again.
The idea of making any resistance to him never occurred to Stella.
Nothing that Julian asked of her could seem strange. She only
67. wondered, if that was what he wanted, why he had not done it
before.
He put her away from him almost roughly.
"There," he said, "I swore I'd never touch you! And I have! I'm a
brute and a blackguard. Try and believe I'll never do it again.
Promise you won't leave me? Promise you'll forgive me? I was
scared out of my wits, and that's a fact. D' you think you can forgive
me, Stella?"
"But what have I to forgive?" Stella asked. "I let you kiss me."
"By Jove!" exclaimed Julian, half laughing, "you are an honest
woman! Well, if you did, you mustn't 'let me' again, that's all.
Ostrog, you wretch, lie down! You ought to have a sound thrashing.
I'd have shot you if you'd hurt her; but as I've rather scored over the
transaction, I'll let you off."
Stella looked at Julian thoughtfully.
"Why mustn't I let you again?" she inquired, "if that is what you
want?"
Julian, still laughing, but half vexed, looked at her.
"Look here," he said, "didn't I tell you you'd got to help me? I can't
very well keep you here and behave to you like that, can I?"
Stella considered for a moment, then she said quietly, "Were you
flirting with me, Julian?"
"I wish to God I was!" said Julian, savagely. "If I could get out of it
as easily as that, d'you suppose I should have been such a fool as
not to have tried?"
"I don't think you would have liked me to despise you," said Stella,
gently. "You see, if you had given me nothing when I was giving you
all I had, I should have despised you."
68. Julian stared at her. She was obviously speaking the truth, but in his
heart he knew that if she had loved him and he had flirted with her,
he would have expected her to be the one to be despised.
He put out his hand to her and then drew it back sharply.
"No, I'm hanged if I'll touch you," he said under his breath. "I love
you all right,—you needn't despise me for that,—but telling you of
it's different. I was deadly afraid you'd see; any other woman would
have seen. I've held on to myself for all I was worth, but it hasn't
been the least good, really. I suppose I've got to be honest about it:
I can't keep you with me, darling; you'll have to go. It makes it a
million times worse your caring, but it makes it better, too."
"I don't see why it should be worse at all," said Stella, calmly. "If we
both care, and care really, I don't see that anything can be even
bad."
Julian pulled up pieces of the turf with his hand. He frowned at her
sternly.
"You mustn't tempt me," he said; "I told you once I can't marry."
"You told me once, when you didn't know I cared," agreed Stella. "I
understand your feeling that about a woman who didn't care or who
only cared a little, but not about a woman who really cares."
"But, my dear child," said Julian, "that's what just makes it utterly
impossible. I can't understand how I ever was such a selfish brute as
to dream of taking Marian. I was ill at the time, and hadn't sized it
up; but if you think I'm going to let you make such a sacrifice, you're
mistaken. I'd see you dead before I married you!"
Stella's eyebrows lifted, but she did not seem impressed.
"I think," she said gently, "you talk far too much as if it had only got
to do with you. Suppose I don't wish to see myself dead?"
69. "Well, you must try to see the sense of it," Julian urged. "You're
young and strong; you ought to have a life. I'm sure you love
children. You like to be with me, and all that; you're the dearest
companion a man ever had. It isn't easy, Stella, to say I won't keep
you; don't make it any harder for me. I've looked at this thing
steadily for months. I don't mind owning that I thought you might
get to care if I tried hard enough to make you; but, darling, I
honestly didn't try. You can't say I wasn't awfully disagreeable and
cross. I knew I was done for long ago, but I thought you were all
right. You weren't like a girl in love, you were so quiet and—and
sisterly and all that. If I'd once felt you were beginning to care in
that way, I'd have made some excuse; I wouldn't have let it come to
this. I'd rather die than hurt you."
"Well, but you needn't hurt me," said Stella, "and neither of us need
die. It's not your love that wants to get rid of me, Julian; it's your
pride. But I haven't any pride in that sense, and I'm not going to let
you do it."
"By Jove! you won't!" cried Julian. His eyes shot a gleam of
amusement at her. It struck him that the still little figure by his side
was extraordinarily formidable. He had never thought her formidable
before. He had thought her brilliant, intelligent, and enchanting, not
formidable; but he had no intention of giving way to her. Formidable
or not, he felt quite sure of himself. He couldn't let her down.
"The sacrifice is all the other way," Stella went on. "You would be
sacrificing me hopelessly to your pride if you refused to marry me
simply because some one of all the things you want to give me you
can't give me. Do you suppose I don't mind,—mind for you, I mean,
hideously,—mind so much that if I were sure marrying you would
make you feel the loss more, I'd go away from you this minute and
never come near you again? But I do not think it will make it worse
for you. You will have me; you will have my love and companionship,
and they are—valuable to you, aren't they, Julian?"
Julian's eyes softened and filled.
70. "Yes," he muttered, turning his head away from her; "they're
valuable."
"Then," she said, "if you are like that to me, if I want you always,
and never anybody else, have you a right to rob me of yourself,
Julian?"
"If I could believe," he said, his voice shaking, "that you'd never be
sorry, never say to yourself, 'Why did I do it?' But, oh, my dear, you
know so little about the ordinary kind of love! You don't realize a bit,
and I do. It must make it all so confoundedly hard for you, and I'm
such an impatient chap. I mightn't be able to help you. And you're
right: I'm proud. If I once thought you cared less or regretted
marrying me, it would clean put the finish on it. But you're not right
that it's only pride, Stella; my love is worse than my pride: loving you
makes it impossible. I can't take the risk for you. I'll do any other
mortal thing you want, but not that!"
"Julian," asked Stella in a low voice, "do you think I am a human
being?"
"Well, no!" said Julian. "Since you ask me, more like a fairy or an elf
or something. Why?"
"Because you're not treating me as if I were," said Stella, steadily.
"Human beings have a right to their own risks. They know their own
minds, they share the dangers of love."
"Then one of 'em mustn't take them all," said Julian, quickly.
"How could one take them all?" said Stella. "I have to risk your
pride, and you have to risk my regret. As a matter of fact, your pride
is more of a certainty than a risk, and my regret is a wholly
imaginary idea, founded upon your ignorance of my character. Still,
I'm willing to put it like that to please you. You have every right to
sacrifice yourself to your own theories, but what about sacrificing
me? I give you no such right."
For the first time Julian saw what loving Stella would be like; he
would never be able to get to the end of it. Marriage would be only
the beginning. She had given him her heart without an effort, and
he found that she was as inaccessible as ever. His soul leaped
toward this new, unconquerable citadel. He held himself in hand
with a great effort.
"What you don't realize," he said, "is that our knowledge of life is not
equal. If I take you at your word, you will make discoveries which it
will be too late for you to act upon. You cannot wish me to do what
is not fair to you."
"I want my life to be with you," said Stella. "Whatever discoveries I
make, I shall not want them to be anywhere else. You do not
understand, but if you send me away, you will take from me the
future which we might have used together. You will not be giving me
anything in its place but disappointment and utter uselessness. You'll
make me—morally—a cripple. Do you still wish me to go away from
you?"
Julian winced as if she had struck him.
"No, I'll marry you," he said; "but you've made me furiously angry.
Please go home by yourself. I wonder you dare use such an
illustration to me."
Stella slipped over the verge of the hollow. She, too, wondered how
she had dared; but she knew quite well that if she hadn't dared,
Julian would have sent her away.
CHAPTER XXIV
Stella was afraid that when she went down to dinner it would be like
slipping into another life—a life to which she was attached by her
love for Julian, but to which she did not belong. It did not seem
possible to her that Lady Verny would be able to bear her as a
daughter-in-law. As a secretary it had not mattered in the least that
she was shabby and socially ineffective. And she couldn't be
different; they'd have to take her like that if they took her at all. She
ranged them together in her fear of their stateliness; she almost
wished that they wouldn't take her at all, but let her slink back to
Redcliffe Square and bury herself in her own insignificance.
But when she went down-stairs she found herself caught in a swift
embrace by Lady Verny, and meeting without any barrier the
adoration of Julian's eyes.
"My dear, my dear," said Lady Verny, "I always felt that you belonged
to me."
"But are you pleased?" whispered Stella in astonishment.
"Pleased!" cried Lady Verny, with a little shaken laugh. "I'm satisfied;
a thing that at my age I hardly had the right to expect."
"Mother thinks it's all her doing," Julian explained. "It's her theory
that we've shown no more initiative than a couple of guaranteed
Dutch bulbs. Shall I tell you what she was saying before you came
down-stairs?"
"Dear Julian," said Lady Verny, blushing like a girl, "you're so
dreadfully modern, you will frighten Stella if you say things to her so
quickly before she has got used to the idea of you."
73. "She's perfectly used to the idea of me," laughed Julian, "and I've
tried frightening her already without the slightest success. Besides,
there's nothing modern about a madonna lily, which is what we were
discussing. My mother said, Stella, that she didn't care very much for
madonna lilies in the garden. They're too ecclesiastical for the other
flowers, but very suitable in church for weddings. And out in ten
days' time, didn't you say, Mother? I hope they haven't any of
Stella's procrastinating habits."
"You mustn't mind his teasing, dear," Lady Verny said, smiling. "We
will go in to dinner now. You're a little late, but no wonder. I am
delighted to feel that now I have a right to scold you."
"The thing that pleases me most," said Julian, "is that I shall be able
to remove Stella's apples and pears forcibly from her plate and peel
them myself. I forget how long she has been here, but the anguish I
have suffered meal by meal as I saw her plod her unreflecting way
over their delicate surfaces, beginning at the stalk and slashing
upward without consideration for any of the laws of nature, nothing
but the self-control of a host could have compelled me to endure. I
offered to peel them for her once, but she said she liked peeling
them; and I was far too polite to say, 'Darling, you've got to hand
them over to me.' I'm going to say it now, though, every time."
"Hush, dear," said Lady Verny, nervously. "Thompson has barely shut
the door. I really don't know what has happened to your behavior."
"I haven't any," said Julian. "I'm like the old lady in the earthquake
who found herself in the street with no clothes on. She bowed
gravely to a gentleman she had met the day before and said, 'I
should be happy to give you my card, Mr. Jones, but I have lost the
receptacle.' Things like that happen in earthquakes. I have lost my
receptacle." He met Stella's eyes and took the consent of her
laughter. He was as happy with her as a boy set loose from school.
Lady Verny, watching him, was almost frightened at his lack of self-
restraint. "He has never trusted any one like this before," she
thought. "He is keeping nothing back." It was like seeing the
released waters of a frozen stream.
While they sat in the hall before Julian rejoined them, Lady Verny
showed Stella all the photographs of Julian taken since he was a
baby.
There was a singularly truculent one of him, at three years old, with
a menacingly poised cricket-bat, which Stella liked best of all. Lady
Verny had no copy of it, but she pressed Stella to take it.
"Julian will give you so many things," she said; "but I want to give
you something that you will value, and which is quite my own." So
Stella took the truculent baby, which was Lady Verny's own.
"You look very comfortable sitting there together; I won't disturb you
for chess," Julian observed when he came in shortly afterward. "I
was wondering if you would like to hear what I did in Germany. It's a
year old now and as safe with you as with me, but it mustn't go any
further."
Julian told his story very quietly, leaning back against the cushions of
a couch by the open window. Above his head, Stella could see the
dark shapes of the black yew hedges and the wheeling of the bats
as they scurried to and fro upon their secret errands.
Neither Lady Verny nor Stella moved until Julian had finished
speaking. It was the most thrilling of detective stories; but it is not
often that the roots of our being are involved in detective stories.
They could not believe that he lay there before them, tranquilly
smoking a cigarette and breathed on by the soft June air. As they
watched his face, comfort and security vanished. They were in a
ruthless world where a false step meant death. Julian had been in
danger, but it was never the danger which he had been in that he
described; it was the work he had set out to do and the way he had
done it. He noticed danger only when it obstructed him. Then he put
his wits to meet it. They were, as Stella realized, very exceptional
wits for meeting things. Julian combined imagination with strict
adherence to fact. He had the courage which never broods over an
essential risk and the caution which avoids all unnecessary ones.
"Of course," he broke off for a moment, "you felt all the time rather
like a flea under a microscope. Don't underrate the Germans. As a
microscope there's nothing to beat them; where the microscope
leaves off is where their miscalculations begin. A microscope can tell
everything about a flea except where it is going to hop.
"I had a lively time over my hopping; but the odd part of it was the
sense of security I often had, as if some one back of me was giving
me a straight tip. I don't understand concentration. You'd say it is
your own doing, of course, and yet behind your power of holding on
to things, it seems as if Something Else was holding on much harder.
It's as if you set a ball rolling, and some one else kicked it in the
right direction.
"After I'd been in Germany for a month I began to believe in an
Invisible Kicker-Off. It was company for me, for I was lonely. I had to
calculate every word I said, and there's no sense of companionship
where one has to calculate. The feeling that there was something
back of me was quite a help. I'd get to the end of my job, and then
something fresh would be pushed toward me.
"For instance, I met a couple of naval officers by chance,—I wasn't
out for anything naval,—and they poured submarine facts into me as
you pour milk into a jug—facts that we needed more than the points
I'd come to find out.
"I'm not at all sure," Julian finished reflectively, "that if you grip hard
enough under pressure, you don't tap facts.
"Have you ever watched a crane work? You shift a lever, and it
comes down as easily as a parrot picks up a pencil; it'll lift a weight
that a hundred men can't move an inch, and swing it up as if it were
packing feathers. Funny idea, if there's a law that works like that.
76. "I came back through Alsace and Lorraine, meaning to slip through
the French lines. A sentry winged me in the woods. Pure funk on his
part; he never even came to hunt up what he'd let fly at. But it
finished my job."
Lady Verny folded up her embroidery.
"It was worth the finish, Julian," she said quickly. "I am glad you told
me, because I had not thought so before." Then she left them.
"It isn't finished, Julian," murmured Stella in a low voice. "It never
can be when it's you."
"Well," said Julian, "it's all I've got to give you; so I'm rather glad
you like it, Stella."
They talked till half the long summer night was gone. She sat near
him, and sometimes Julian let his hand touch her shoulder or her
hair while he unpacked his heart to her. The bitterness of his reserve
was gone.
"I think perhaps I could have stood it decently if it hadn't been for
Marian," he explained. "I was damned weak about her, and that's a
fact. You see, I thought she had the kind of feeling for me that
women sometimes have and which some men deserve; but I'm
bound to admit I wasn't one of them. When I saw that Marian took
things rather the way I should have taken them myself, I went down
under it. I said, 'That's the end of love.' It was the end of the kind I
was fit for, the kind that has an end.
"Now I'm going to tell you something. I never shall again, so you
must make the most of it, and keep it to hold on to when I behave
badly. You've put the fear of God into me, Stella. Nothing else would
have made me give in to you; and you know I have given in to you,
don't you?"
"You've given me everything in the world I want," said Stella, gently,
"if that's what you call giving in to me."
77. "I've done more than that," said Julian, quietly. "I've let you take my
will and turn it with that steady little hand of yours; and it's the first
time—and I don't say it won't be the last—that I've let any man or
woman change my will for me.
"Now I'm going to send you to bed. I oughtn't to have you kept you
up like this; but if I've got to let you go back to your people to-
morrow, we had to know each other a little better first, hadn't we?
I've been trying not to know you all these months.
"Before you go, would you mind telling me about Mr. Travers and the
cat?"
"No," said Stella, with a startled look; "anything else in the world,
Julian, but not Mr. Travers and the cat."
"Ostrog and I are frightfully jealous by nature," Julian pleaded. "He
wouldn't be at all nice to that cat if he met it without knowing its
history."
"He can't be unkind to the poor cat," said Stella; "it's dead."
"And is Mr. Travers dead, too?" asked Julian.
"I should think," said Stella, "that he was about as dead as the red-
haired girl in the library."
"What red-haired girl?" cried Julian, sharply. "Who's been telling you
—I mean what made you think I knew her? It's a remarkably fine bit
of painting."
"But you did know her," said Stella; "only don't tell me anything
about her unless you want to."
"I won't refuse to answer any questions you ask," said Julian after a
pause, "but I'd much rather wait until we're married. I am a little
afraid of hurting you; you wouldn't be hurt, you see, if you were
used to me and knew more about men. You're an awfully clever
woman, Stella, but the silliest little girl I ever knew."
78. "I'll give up the red-haired girl if you'll give up Mr. Travers," said
Stella. She rose, and stood by his side, looking out of the window.
"Do you want to say good night, or would you rather go to bed
without?" he asked her.
"Of course I'll say good night," said Stella. "But, Julian, there are
some things I so awfully hate your doing. Saying good night doesn't
happen to be one of them. It's lighting my candle unless I'm sure
you want to. I want to be quite certain you don't mind me in little
things like that."
Julian put his arms round her and kissed her as gently as he would
have kissed a child. "Of course you shall light your candle," he said
tenderly, "just to show I don't mind you. But it isn't my pride now. I
don't a bit object to your seeing I can't. I'm quite sure of you, you
see; unless you meant to hurt me, you simply couldn't do it. And if
you meant to hurt me, it would be because you wanted to stop me
hurting myself, like this afternoon, wouldn't it?"
Stella nodded. She wanted to tell him that she had always loved
him, long before he remembered that she existed. All the while he
had felt himself alone, she was as near him as the air that touched
his cheek. But she could not find words in which to tell him of her
secret companionship. The instinct that would have saved them only
brushed her heart in passing.
Julian was alarmed at her continued silence.
"You're not frightened or worried or anything, are you?" he asked
anxiously. "Sure you didn't mind saying good night? It's not
compulsory, you know, even if we are engaged. I'd hate to bother
you."
"I'm not bothered," Stella whispered; "I—only love you. I was saying
it to you in my own way."
"I'll wait three days for you," said Julian, firmly. "Not an hour more.
You quite understand, don't you, that I'm coming up at the end of
three days to bring you home for good?"
Stella shivered as she thought of Redcliffe Square. Julian wouldn't
like Redcliffe Square, and she wouldn't be able to make him like it;
and yet she wouldn't be able not to mind his not liking it.
Julian knew nothing about Redcliffe Square, but he noticed that
Stella shivered when he told her that he was going to bring her
home for good.
CHAPTER XXV
It would be too strong an expression to say that after Stella's
departure Julian suffered from reaction. He himself couldn't have
defined what he suffered from, but he was uneasy.
He had given himself away to Stella as he had never in his wildest
dreams supposed that one could give oneself away to a woman. But
he wasn't worrying about that; he hadn't minded giving himself
away to Stella.
Samson was the character in the Old Testament whom Julian most
despised, because he had let Delilah get things out of him. What
Samson had got back hadn't been worth it, and could probably have
been acquired without the sacrifice of his hair. He had simply given
in to Delilah because he had a soft spot for her; and Delilah quite
blamelessly (from Julian's point of view) had retaliated by crying out,
"The Philistines be upon thee, Samson!"
Julian had always felt perfectly safe with women of this type; they
couldn't have entrapped him. But there wasn't an inch of Delilah in
Stella. She had no Philistines up her sleeve for any of the
contingencies of life, and she had not tried to get anything out of
Julian.
That was where his uneasiness began. He understood her
sufficiently to trust her, but he was aware that beyond his confidence
she was a mapless country; he did not even know which was water
and which was land. His uncertainty had made him shrink from
telling Stella about Eugénie Matisse.
If Marian had been sharp enough—she probably wouldn't have been
—to guess that Julian knew the girl in the picture, she would have
known, too, precisely what kind of girl she was, and she would have
thought none the worse of Julian.
But he didn't know what Stella expected. He wasn't afraid that she
would cast him off for that or any other of his experiences; then he
would have told her. She would have forgiven him as naturally as
she loved him; but what if her forgiveness had involved her pain?
He had spoken the truth when he told Stella that she had "put the
fear of God into him." Julian had not known much about God before
or anything about fear; but he was convinced now that the fear of
God was not that God might let you down, but that you might let
down God. He wanted to be as careful of Stella as if she had been a
government secret.
Did she know in the least what she was in for? Or was she like an
unconscious Iphigenia vowed off to mortal peril by an inadvertent
parent?
He had done his best to make her realize the future, but there are
certain situations in life when doing one's best to make a person
aware of a fact is equivalent to throwing dust in his eyes. And Stella
herself might, by a species of divine fooling, have outwitted both
him and herself. She might be marrying Julian for pity under the
mask of love.
Her pity was divine, and he could stand it for himself perfectly; but
he couldn't stand it for her. Why had she shivered when he had said
he was going to bring her home? He cursed his helplessness. If he
had not been crippled he would have taken her by surprise, and let
his instincts judge for him; but he had had to lie there like a log,
knowing that if he asked her to come to him, she would have
blinded him by her swift, prepared responsiveness.
The moment on the downs hardly counted. She had been so
frightened that it had been like taking advantage of her to take her
in his arms.
The one comfort he clung to was her fierce thrust at his pride. He
repeated it over and over to himself for reassurance. She had said, if
he wouldn't marry her, he would make her morally a cripple. That
really sounded like love, for only love dares to strike direct at the
heart. If he could see her, he knew it would be all right; if even she
had written (she had written, of course, but had missed the midnight
post), he would have been swept back into the safety of their shared
companionship. But in his sudden loneliness he mistrusted fortune.
When a man has had the conceit knocked out of him, he is not
immediately the stronger for it; and he is the more vulnerable to
doubt not only of himself, but of others. The saddest part of self-
distrust is that it breeds suspicion.
It would be useless to speak to his mother about it, for, though a
just woman, she was predominantly his mother; she wanted Stella
too much for Julian to admit a doubt of Stella's wanting him for
herself. She would have tried to close all his questions with facts.
This method of discussion appealed to Julian as a rule, but he had
begun to discover that there are deeper things than facts.
Lady Verny was in London at a flower show, and Julian was sitting in
the summer-house, which he was planning to turn into a room for
Stella. His misgivings had not yet begun to interfere with his plans.
He had just decided to have one of the walls above the water
meadows replaced by glass when his attention was attracted by the
most extraordinary figure he had ever seen.
[Illustration: The most extraordinary figure he had ever seen]
She was advancing rapidly down a grass path, between Lady Verny's
favorite herbaceous borders, pursued by the butler. At times
Thompson, stout and breathless, succeeded in reaching her side,
evidently for the purpose of expostulation, only to be swept
backward by the impetuosity of her speed. Eurydice was upon a
secret mission. She had borrowed a pound from Stella with which to
carry it out; and she was not going to be impeded by a butler.