006 performance tuningandclusteradmin

PERFORMANCE TUNING & CLUSTER
ADMINISTRATION
2012/8/2
Scott Miao

AGENDA
 Course Credit
 Performance Tuning
 More…
 Cluster Administration
 More…

COURSE CREDIT
 Show up: 30 points
 Ask a question: 5 points per question
 Hands-on: 40 points
 70 points are required to pass this course
 Course credit is calculated once for each
completed course
 The course credit will be sent to you and your
supervisor by email

PERFORMANCE TUNING
 Garbage Collection Tuning
 MSLAB
 Compression
 Optimizing Splits and Compactions
 Load Balancing
 Merging Regions
 Client API: Best Practices
 Configuration
 Load Tests

GARBAGE COLLECTION TUNING
 The process of rewriting the heap generation in
question is called garbage collection (GC)
 GC parameters only need to be added to the region
servers
 The JRE comes with basic assumptions
 About what your programs are doing, how they
create objects, how they allocate the heap to handle
data, and so on
 These assumptions work well in a lot of cases
 But NOT for HBase…
 Especially write-heavy use cases
 HBase cannot safely rely on the JRE's assumptions alone

https://service.ithome.com.tw/20120720Java/index3.html#3

GARBAGE COLLECTION TUNING –
WRITE-HEAVY USE CASES (1/2)
 The memstore flushes its data to disk once it reaches the
configured flush size, hbase.hregion.memstore.flush.size
 Each flush leaves holes of different sizes in the heap
 Data resides in different locations in the generational
architecture of the Java heap
 Depending on how long the data was in memory
 Young generation (new generation)
 The space can be reclaimed quickly and no harm is done
 Old generation (tenured generation)
 Data is promoted to this location if it stays in memory for a longer
period of time

WRITE-HEAVY USE CASES (2/2)
 The JRE reuses the holes created by data that has been written
to disk
 When a heap allocation request does not fit into any of
those holes
 The JRE needs to compact the fragmented heap
 Young to Old
 The promotion of longer-living objects from the young to the old
generation
 Old to Stop-The-World
 There is no longer enough space for a young allocation because of
the fragmentation
 The JRE falls back to the stop-the-world garbage collector
 It rewrites the entire heap space and compacts it to the remaining
active objects
 If this fails, you will see a promotion failure in your
garbage collection logs

What does the heap look like?

SPECIFY THE YOUNG GENERATION SIZE
 Young generation
 Usually between 128 MB and 512 MB
 Old generation
 Holds the remaining available heap, which is usually
many gigabytes of memory
 Using 128 MB is a good starting point
 Then observe the JVM metrics and adjust the size
if needed
 Specify the young generation size like so
 -XX:MaxNewSize=128m -XX:NewSize=128m
 Or use the equivalent shorthand option
 -Xmn128m

GC OPTIONS SETTING
 GC options for HBase
 Add them to the hbase-env.sh configuration file
 HBASE_OPTS variable for all HBase processes
 HBASE_REGIONSERVER_OPTS variable for the region
servers only
 Enable the JRE's log output for garbage collection
details
 Monitor it for occurrences of
"concurrent mode failure" or "promotion
failed" messages
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log

GC STRATEGY FOR YOUNG GENERATION
 Recommended option for the young generation
 -XX:+UseParNewGC
 Uses the Parallel New Collector
 It stops the entire Java process to clean up the young
generation heap
 Since the young generation is small in comparison, the pause
usually lasts less than a few hundred milliseconds

GC STRATEGY FOR OLD GENERATION
 Recommended option for the old generation
 -XX:+UseConcMarkSweepGC
 Uses the Concurrent Mark-Sweep (CMS) collector
 It tries to do as much work concurrently as
possible, without stopping the Java process
 This takes extra effort and increases the CPU load
 It avoids the stops otherwise required to rewrite a fragmented old
generation heap
 If you hit the promotion error
 It falls back to stop-the-world again

GC STRATEGY FOR OLD GENERATION
 A switch for CMS
 -XX:CMSInitiatingOccupancyFraction=70
 The old generation occupancy, as a percentage, at which the
background collection process starts
 Avoids the concurrent mode failure
 This happens when the background process that marks and sweeps the heap
is still running when the heap runs out of usable
space
 The JVM then falls back to stop-the-world again
 Why set the initiating occupancy fraction to 70%?
 By default, 20% block cache + 40% memstore limits = 60% of the heap
 70% starts the background process at an appropriate time
 Early enough, but not too early
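The 70% reasoning above is simple arithmetic, sketched below for a given heap. For simplicity the sketch treats the 70% as a fraction of the whole heap; strictly it applies to the old generation, which with -Xmn128m is nearly the whole heap. The class and method names are illustrative only.

```java
/**
 * Sketch of the heap budget behind CMSInitiatingOccupancyFraction=70.
 * Percentages are the defaults quoted on the slides; the heap size is a
 * parameter (main() uses the -Xmx8g value from the GC summary).
 * Simplification: the CMS fraction is applied to the whole heap here,
 * although the JVM applies it to the old generation.
 */
final class CmsHeadroom {
    static final long MB = 1024L * 1024L;

    /** Bytes corresponding to `percent` of the heap (integer arithmetic). */
    static long share(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    /**
     * Bytes left between the occupancy at which CMS starts and the combined
     * block cache + memstore limits. A negative value means the caches alone
     * can push the heap past the CMS starting point.
     */
    static long headroom(long heapBytes, int blockCachePct, int memstorePct, int cmsPct) {
        return share(heapBytes, cmsPct)
                - share(heapBytes, blockCachePct)
                - share(heapBytes, memstorePct);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * MB; // -Xmx8g
        System.out.println("block cache (20%): " + share(heap, 20) / MB + " MB");
        System.out.println("memstores   (40%): " + share(heap, 40) / MB + " MB");
        System.out.println("CMS starts  (70%): " + share(heap, 70) / MB + " MB");
        System.out.println("headroom:          " + headroom(heap, 20, 40, 70) / MB + " MB");
    }
}
```

If you raise the memstore or block cache limits, rerun the numbers: once their sum approaches the CMS fraction, the background collection has little room left before the heap fills up.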

GARBAGE COLLECTION TUNING - SUMMARY
 Recommended GC options
export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC \
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
 Alex Su's GC options (ERB template; <%= hbase_log_path %> is a placeholder)
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:<%= hbase_log_path %>/hbase-regionserver-gc-`date +%F-%H-%M-%S`.log
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70 -XX:PrintFLSStatistics=1
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=<%= hbase_log_path %>/hbase-regionserver.hprof
 GC options reference
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
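To confirm that a JVM actually accepts these options and uses the intended collectors, you can ask it over JMX. The sketch below only uses the standard java.lang.management API; run it with the same options you put in HBASE_REGIONSERVER_OPTS. On older HotSpot JVMs that still support these flags, ParNew and CMS typically report as "ParNew" and "ConcurrentMarkSweep", but the names depend on the JVM version, and newer Java releases have removed these collectors.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors the running JVM reports over JMX. */
final class GcCheck {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Cumulative count and time (ms) since JVM start; -1 if undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```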

MSLAB - QUESTION
 Goal: solve the stop-the-world issue
 Stop-the-world
 The key to reducing these compacting collections is to
reduce fragmentation
 Ideally, only objects of exactly the same size would be
allocated from the heap
 Subsequent allocations of new objects of the exact same size
would always reuse these holes
 No promotion error, and therefore no stop-the-world
compacting collection, would be required

MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (1/3)
 MSLABs are buffers of fixed size containing KeyValue
instances of varying sizes
1. When a buffer cannot completely fit a newly added
KeyValue, it is considered full
2. A new buffer is then created, once again of the given
fixed size
 Enabled by default in HBase 0.92
 Disabled by default in HBase 0.90
 hbase.hregion.memstore.mslab.enabled property
 It is recommended that you test your setup with this
feature

MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (2/3)
 The size of each allocated, fixed-size buffer
 hbase.hregion.memstore.mslab.chunksize property
 Default is 2 MB
 Depending on your KeyValue instances, you may have to adjust
this value
 E.g., if your cells are 100 KB in size, you need to increase the MSLAB size to fit
more than just a few cells
 An upper boundary on what is stored in the buffers
 hbase.hregion.memstore.mslab.max.allocation property
 Default is 256 KB
 Any cell (KeyValue) that is larger is allocated directly in
the Java heap
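The buffer behavior described on the two MSLAB slides can be modeled in a few lines. This is a hypothetical, simplified sketch, not HBase's implementation: cells are copied into fixed-size chunks with a bump pointer, a new chunk starts when the current one cannot fit the cell, and cells above the maximum allocation bypass the chunks. The class name and fields are illustrative; the two constructor parameters play the roles of the chunksize and max.allocation properties.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified MSLAB model (not HBase's code).
 * chunkSize plays the role of hbase.hregion.memstore.mslab.chunksize,
 * maxAllocation the role of hbase.hregion.memstore.mslab.max.allocation.
 */
final class MslabSketch {
    private final int chunkSize;
    private final int maxAllocation;
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current;         // chunk currently being filled
    private int offset;             // next free byte in `current`
    private long usedBytes;         // bytes occupied by cells inside chunks
    private int directAllocations;  // cells too large for the MSLAB

    MslabSketch(int chunkSize, int maxAllocation) {
        if (maxAllocation > chunkSize) {
            throw new IllegalArgumentException("maxAllocation must not exceed chunkSize");
        }
        this.chunkSize = chunkSize;
        this.maxAllocation = maxAllocation;
    }

    /** Copies a cell; returns true if it went into a chunk, false if it bypassed the MSLAB. */
    boolean copyCell(byte[] cell) {
        if (cell.length > maxAllocation) {
            directAllocations++;            // left as a regular heap object
            return false;
        }
        if (current == null || offset + cell.length > chunkSize) {
            current = new byte[chunkSize];  // current chunk is "full": start a new one
            chunks.add(current);
            offset = 0;
        }
        System.arraycopy(cell, 0, current, offset, cell.length); // the extra copy
        offset += cell.length;
        usedBytes += cell.length;
        return true;
    }

    int chunkCount() { return chunks.size(); }

    int directAllocations() { return directAllocations; }

    /** Bytes reserved by chunks but not occupied by cells (the heap cost). */
    long wastedBytes() { return (long) chunks.size() * chunkSize - usedBytes; }

    public static void main(String[] args) {
        MslabSketch mslab = new MslabSketch(2 * 1024 * 1024, 256 * 1024); // 2 MB chunks, 256 KB cap
        for (int i = 0; i < 100; i++) {
            mslab.copyCell(new byte[100 * 1024]);                          // 100 KB cells
        }
        mslab.copyCell(new byte[300 * 1024]);                              // over the cap
        System.out.println("chunks=" + mslab.chunkCount()
                + " wastedBytes=" + mslab.wastedBytes()
                + " direct=" + mslab.directAllocations());
    }
}
```

With 100 KB cells, only 20 fit into each 2 MB chunk, which illustrates both the "increase the chunk size" advice and the unused tail of each buffer discussed next.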

MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (3/3)
 MSLABs do not come without a cost
 They are more wasteful in regard to heap usage
 You will most likely not fill every buffer to the last byte
 A tradeoff
 Use MSLABs and benefit from better garbage collection, but
incur the extra space that is required
 Do NOT use MSLABs and benefit from better memory
efficiency, but deal with the problem caused by garbage
collection pauses
 You could plan to restart the servers every few days, or weeks, before
the pause happens
 The buffers require an additional byte array copy
operation, and are therefore slightly slower
 Measure the impact on your workload

COMPRESSION
 A number of compression algorithms can be
enabled at the column family level
 Recommendation
 Enable compression unless you have a reason not to do
so
 For example, when storing already compressed content, such
as JPEG images
 Compression usually yields better overall
performance
 The CPU overhead of compression
and decompression is less than the cost of reading
more data from disk

COMPRESSION – AVAILABLE CODECS
 Recommended codecs
 Snappy/Zippy (in Bigtable)
 Released by Google under the BSD License
 HBase 0.92 ships with the required JNI libraries to use it
 You must install the native binary library on all region servers
 LZO (Lempel-Ziv-Oberhumer)
 A lossless data compression algorithm that is focused on
decompression speed, and written in ANSI C
 HBase cannot ship with LZO because of licensing issues
 LZO's GNU General Public License (GPL) is incompatible with HBase's Apache License
 LZO installation needs to be performed separately, after HBase has
been installed
http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437

COMPRESSION –
COMPRESSION TEST TOOL
 Usage
 hbase org.apache.hadoop.hbase.util.CompressionTest
<path> <none|gz|lzo|snappy>
 Example
 ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest
/user/larsgeorge/test.gz gz
 It reports the result of the test
 On success
…
SUCCESS
 On failure, e.g., when the LZO codec is missing
Exception in thread "main" java.lang.RuntimeException:
java.lang.ClassNotFoundException:
com.hadoop.compression.lzo.LzoCodec
…

COMPRESSION – STARTUP CHECK
 A fail-fast setup notices missing libraries at startup
 Instead of running into issues later
 For example, check for the Snappy and LZO
compression libraries with the hbase.regionserver.codecs property
 If a codec is missing, the server aborts at startup with an IOException
stating
 "Compression codec <codec-name> not
supported, aborting RS construction"
 Copy the changed configuration file to all region
servers and restart them afterward
<property>
<name>hbase.regionserver.codecs</name>
<value>snappy,lzo</value>
</property>

COMPRESSION – ENABLING COMPRESSION
 Install the JNI libraries
 Install the native compression libraries
 Specify the chosen algorithm in the column family schema
 In the HBase shell
 create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
 In the API
 HColumnDescriptor.setCompressionType(…)
 Refer to ppt#003, p#11

OPTIMIZING SPLITS AND COMPACTIONS
- SPLIT/COMPACTION STORMS
 If your regions grow at roughly the same rate
 Eventually they all need to be split at about the
same time
 This causes a large spike in disk I/O because of the
compactions required to rewrite the split regions
 Refer to ppt#004, p#13

OPTIMIZING SPLITS AND COMPACTIONS –
MANAGED SPLITTING (1/2)
 Instead, you can effectively turn automatic splitting off and manually invoke the split and
major_compact commands
 Set the region maximum file size
 hbase.hregion.max.filesize property for the entire cluster
 At the table level via the API
 HTableDescriptor.setMaxFileSize(…)
 To a very high number
 Better to set this value to a reasonable upper boundary
 Such as 100 GB
 Long.MAX_VALUE is not recommended, in case the manual
splits fail to run
 Then you can time-control the splits
 Run them staggered across all regions
 This spreads the I/O load as much as possible, avoiding any
split/compaction storm
 Use the HBase shell + cron
 Or write your own code using the HBase Admin API
 Refer to ppt#003, p#21

MANAGED SPLITTING (2/2)
 RegionSplitter class (added in version 0.90.2)
 Another way to split existing regions
 Rolling split feature
 Splits the existing regions while waiting long enough for the
involved compactions to complete
 See the API docs
 An additional advantage of managed splitting
 Better control over which regions are available at
any time
 In rare cases, you need to do very low-level debugging
 With automated splits, this is hard!!
 The region you are debugging may have been split into two daughter regions

REGION HOTSPOTTING
 You may be dealing with a write pattern that is causing a
specific region to run hot
 Use the region server metrics to observe this
 Key design approaches
 Salted keys, random keys, etc.
 Otherwise, the only way to alleviate this situation
 Manually split a hot region into one or more new regions, at
exact boundaries
 You can specify any row key within the region as the split point
 This can generate halves that are completely different in size
 Refer to ppt#003, p#21
 This cannot deal with completely sequential key ranges
 Those are always going to hit one region for a considerable amount
of time

PRESPLITTING REGIONS (1/3)
 Managing splits manually is useful
 Therefore start with a larger number of regions right from
table creation
 That is, create a table with the required number of
regions
 Three ways…
 HBase shell
 create, refer to ppt#003, p#37
 API
 HBaseAdmin.createTable(…), refer to ppt#003, p#16
 RegionSplitter class
 By default, uses the MD5StringSplit class to partition the row keys into
ranges
 Use -D split.algorithm=<your-algorithm-class> for another
implementation
/bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
usage: RegionSplitter <TABLE>

 RegionSplitter with MD5StringSplit sample
testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093.
testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a.
testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92.
testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258.
testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9.
testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
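The boundaries in the sample follow a simple pattern: the range 00000000 to 7fffffff is cut into equal steps, and each boundary is written as 8 hex digits. The sketch below reproduces them. It is inferred from the sample rather than copied from MD5StringSplit's source, and it assumes ten regions, which the step of 0ccccccc between boundaries suggests. Such a layout only spreads the load if the row keys themselves are hex strings (e.g., MD5-prefixed), as the class name implies.

```java
import java.util.Locale;

/** Sketch reproducing the sample's split keys; inferred from the sample, not MD5StringSplit's code. */
final class Md5SplitPoints {
    static final long MAX = 0x7FFFFFFFL; // top of the hex key space seen in the sample

    /** Returns numRegions - 1 split keys for numRegions regions. */
    static String[] splitPoints(int numRegions) {
        long step = MAX / numRegions;     // 0x0ccccccc for 10 regions
        String[] points = new String[numRegions - 1];
        for (int i = 1; i < numRegions; i++) {
            points[i - 1] = String.format(Locale.ROOT, "%08x", step * i);
        }
        return points;
    }

    public static void main(String[] args) {
        for (String p : splitPoints(10)) {
            System.out.println(p);
        }
    }
}
```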

 How many presplit regions?
 Start low with 10 presplit regions per server and watch as data
grows over time
 It is better to err on the side of too few regions and use a
rolling split later
 If the presplit regions are too thin
 Increase the major compaction interval (hbase.hregion.majorcompaction property)
 Refer to ppt#004, p#19
 If the data size grows too large
 Use the RegionSplitter utility to perform a rolling split of all
regions
 The main objective is to avoid split/compaction storms

LOAD BALANCING – BALANCER (1/3)
 The master has a built-in feature
 Called the balancer
 By default, runs every five minutes
 hbase.balancer.period property
 Attempts to even out the number of assigned
regions per region server
 To within one region of the average number per server
 Determines a new assignment plan
 Describing which regions should be moved where, then starts
the process of moving the regions by calling the
unassign() method
 Refer to ppt#003, p#22
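"Within one region of the average" translates into a target range of floor(average) to ceil(average) regions per server. The sketch below computes that range and a lower bound on the number of region moves needed to reach it. It is not HBase's balancer code, which also has to pick which regions go where.

```java
/** Sketch of the "within one region of the average" target; not HBase's balancer code. */
final class BalancerSketch {
    /** {floor(avg), ceil(avg)}: every server should end up in this range. */
    static int[] targetRange(int totalRegions, int servers) {
        int floor = totalRegions / servers;
        int ceil = (totalRegions + servers - 1) / servers;
        return new int[] {floor, ceil};
    }

    /** Lower bound on moves: regions above ceil must leave, regions below floor must arrive. */
    static int minMoves(int[] regionsPerServer) {
        int total = 0;
        for (int r : regionsPerServer) {
            total += r;
        }
        int[] range = targetRange(total, regionsPerServer.length);
        int excess = 0;
        int deficit = 0;
        for (int r : regionsPerServer) {
            if (r > range[1]) excess += r - range[1];
            if (r < range[0]) deficit += range[0] - r;
        }
        return Math.max(excess, deficit); // one move fixes at most one unit of each
    }

    public static void main(String[] args) {
        int[] cluster = {10, 0, 5};
        int[] range = targetRange(15, 3);
        System.out.println("target " + range[0] + ".." + range[1]
                + " regions/server, at least " + minMoves(cluster) + " moves");
    }
}
```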

LOAD BALANCING - BALANCER (2/3)
 The balancer has an upper limit on how long it is allowed to
run
 hbase.balancer.max.balancing property
 Defaults to half of the balancer period value
 I.e., 2.5 minutes
 The balancer switch
 Toggles the balancer status between enabled and disabled
 HBase shell
 balance_switch command, refer to ppt#003, p#39
 balanceSwitch() API method, refer to ppt#003, p#22

LOAD BALANCING - BALANCER (3/3)
 The balancer can also be started explicitly
 HBase shell
 balancer command, refer to ppt#003, p#39
 balancer() API method, refer to ppt#003, p#22
 Returns true
 If any work has been done
 Returns false
 If the balancer was switched off
 If there was no work to be done
 If the balancer was not able to run
 E.g., when a region is currently in transition, the balancer run is
skipped

LOAD BALANCING - MOVE
 You can also use move
 To assign regions to other servers
 HBase shell
 move command, refer to ppt#003, p#39
 move() API method, refer to ppt#003, p#22

MERGING REGIONS
 Sometimes you may need to merge regions
 For example, after you have removed a large amount of
data and you want to reduce the number of regions
hosted by each server
 HBase allows you to merge two adjacent regions
 The HBase cluster must be offline, but HDFS must be running
/bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>

CLIENT API: BEST PRACTICES (1/3)
 Disable auto-flush
 When performing a lot of put operations, call HTable.setAutoFlush(false)
 Use scanner caching
 Set Scan.setCaching() to something greater than the
default of 1 if needed
 Limit the scan scope
 If only a small number of the available columns are to be
processed, only those should be specified in the input scan
 For example, use the Scan.addFamily() method
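The first two practices are about round trips. The back-of-the-envelope sketch below counts RPCs under a stated simplification: every batch is one round trip, ignoring scanner open/close calls and the fact that a flushed write buffer is sent to each region server involved. The 2 MB write buffer in main() is an assumed value for the client write buffer size; check your client configuration.

```java
/** Back-of-the-envelope RPC counts for scanner caching and auto-flush (simplified). */
final class RpcEstimate {
    /** Round trips to scan `rows` rows when each next() RPC fetches `caching` rows. */
    static long scanRpcs(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    /**
     * Round trips for `puts` puts of `putBytes` each. With auto-flush on, every
     * put is sent immediately; with it off, puts collect in a client-side write
     * buffer of `writeBufferBytes` and are sent when it fills (plus a final flush).
     */
    static long putRpcs(long puts, long putBytes, long writeBufferBytes, boolean autoFlush) {
        if (autoFlush) {
            return puts;
        }
        long putsPerFlush = Math.max(1, writeBufferBytes / putBytes);
        return (puts + putsPerFlush - 1) / putsPerFlush;
    }

    public static void main(String[] args) {
        System.out.println("scan 10000 rows, caching 1:   " + scanRpcs(10_000, 1));
        System.out.println("scan 10000 rows, caching 100: " + scanRpcs(10_000, 100));
        long buffer = 2L * 1024 * 1024; // assumed 2 MB client write buffer
        System.out.println("10000 x 1 KB puts, auto-flush on:  " + putRpcs(10_000, 1024, buffer, true));
        System.out.println("10000 x 1 KB puts, auto-flush off: " + putRpcs(10_000, 1024, buffer, false));
    }
}
```

Higher scanner caching also means more memory per next() call on both client and server, so very large values are not free.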

CLIENT API: BEST PRACTICES (2/3)
 Close ResultScanners
 To avoid performance problems
 Unclosed scanners may cause problems on the region servers
 Block cache usage
 Scan instances can be set to use the block cache in the
region server via the setCacheBlocks() method
 true by default, in which case the default settings of the table and family
are used
 API docs
 Server-side block cache settings

CLIENT API: BEST PRACTICES (3/3)
 Optimal loading of row keys
 When performing a table scan where only the row keys
are needed
 Use a FilterList with a MUST_PASS_ALL operator +
FirstKeyOnlyFilter + KeyOnlyFilter
 Refer to ppt#002, p#43 & 46
 Turn off the WAL on Puts
 One way to increase throughput on Puts is to call
writeToWAL(false), but data might be lost if a region server fails
 Consider using the bulk loading techniques instead

CONFIGURATION (1/6)
 Advanced options you can consider adjusting
based on your use case
 Most properties are configured in hbase-site.xml
 Others are in hbase-env.sh
 Decrease the ZooKeeper timeout
 The default timeout between a region server and the
ZooKeeper quorum is three minutes
 Tune the timeout down to a minute, or even less, so the
master notices failures sooner
 zookeeper.session.timeout property
 Beware of the "Juliet Pause"
 With a low timeout, a long garbage collection pause can get the region server declared dead

CONFIGURATION (2/6)
 Increase handlers
 The number of threads that are kept open to answer
incoming requests to user tables
 Default is 10
 hbase.regionserver.handler.count property
 Keep this number low when the payload per request
approaches megabytes
 And high when the payload is small
 Increase heap settings
 HBASE_HEAPSIZE setting in the hbase-env.sh file
 Consider using HBASE_REGIONSERVER_OPTS
instead of changing the global HBASE_HEAPSIZE
 Region servers may need more memory than the master

CONFIGURATION (3/6)
 Enable data compression
 You should enable compression for the storage files
 In most cases, it boosts performance
 Increase region size
 Consider going to larger regions to cut down on the total
number of regions on your cluster
 Fewer regions to manage makes for a smoother-running
cluster

CONFIGURATION (4/6)
 Adjust block cache size
 The amount of heap used for the block cache is specified as a
percentage
 Defaults to 20%
 hfile.block.cache.size property
 Increasing it helps if you have mainly read workloads
 Adjust memstore limits
 Memstore heap usage
 hbase.regionserver.global.memstore.upperLimit property
 Defaults to 40%
 hbase.regionserver.global.memstore.lowerLimit property
 Defaults to 35%
 They control the amount of flushing that takes place once the server is
required to free heap space
 Mainly read-oriented workloads
 Consider reducing both limits to make more room for the block cache
 Handling many writes
 Increase the memstore limits to reduce the excessive amount of I/O
caused by frequent flushes
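A sketch of how the two memstore limits interact, assuming the behavior described above: once the combined memstore size reaches the upper limit, the server forces flushes until usage drops to the lower limit. Integer percentages of the region server heap are used; the 40%/35% defaults come from the slide, while the names and the exact comparison are illustrative.

```java
/** Sketch of the global memstore upper/lower limits; names and comparisons are illustrative. */
final class MemstoreLimits {
    static long limitBytes(long heapBytes, int percent) {
        return heapBytes * percent / 100;
    }

    /** True once the combined memstore size reaches the upper limit. */
    static boolean forcedFlushNeeded(long memstoreBytes, long heapBytes, int upperPct) {
        return memstoreBytes >= limitBytes(heapBytes, upperPct);
    }

    /** How much must be flushed to get back down to the lower limit. */
    static long bytesToFlush(long memstoreBytes, long heapBytes, int lowerPct) {
        return Math.max(0, memstoreBytes - limitBytes(heapBytes, lowerPct));
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;   // 8 GB heap, as in -Xmx8g
        long memstores = limitBytes(heap, 40); // just reached the upper limit
        System.out.println("forced flush: " + forcedFlushNeeded(memstores, heap, 40));
        System.out.println("flush at least "
                + bytesToFlush(memstores, heap, 35) / (1024 * 1024) + " MB");
    }
}
```

The gap between the two limits sets how much is flushed in one go: a narrow gap means frequent small forced flushes, a wide one fewer but larger flushes.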

CONFIGURATION (5/6)
 Increase blocking store files
 When a store has too many files, the region servers block further updates from clients to
give compactions time to reduce the number of files
 Default is seven files
 hbase.hstore.blockingStoreFiles property
 Increase block multiplier
 A safety latch that blocks any further updates from clients
when a region's memstore exceeds multiplier × flush size
 hbase.hregion.memstore.block.multiplier property
 Defaults to 2
 If you have enough memory, you can increase this value to
handle spikes more gracefully
 Refer to ppt#003, p#8
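The block multiplier latch is a single comparison, sketched below. The 64 MB flush size in main() is an assumed value for hbase.hregion.memstore.flush.size; check your own configuration. The names are illustrative, not HBase's.

```java
/** Sketch of the block multiplier latch; the flush size in main() is an assumption. */
final class UpdateBlocking {
    static final long MB = 1024L * 1024L;

    static long blockingThreshold(long flushSizeBytes, int multiplier) {
        return flushSizeBytes * multiplier;
    }

    /** True when a region's memstore exceeds multiplier x flush size. */
    static boolean updatesBlocked(long regionMemstoreBytes, long flushSizeBytes, int multiplier) {
        return regionMemstoreBytes > blockingThreshold(flushSizeBytes, multiplier);
    }

    public static void main(String[] args) {
        long flushSize = 64 * MB; // assumed hbase.hregion.memstore.flush.size
        for (int multiplier : new int[] {2, 4}) {
            System.out.println("multiplier " + multiplier + ": block above "
                    + blockingThreshold(flushSize, multiplier) / MB + " MB; 200 MB memstore blocked="
                    + updatesBlocked(200 * MB, flushSize, multiplier));
        }
    }
}
```

Raising the multiplier lets a region absorb a write spike of that size before blocking, so the extra heap must actually be available.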

CONFIGURATION (6/6)
 Decrease maximum logfiles
 Controls how often flushes occur, based on the number of WAL
files on disk
 Default is 32
 hbase.regionserver.maxlogs property
 Can be high in a write-heavy use case
 Lower it to force the servers to flush data to disk more often

LOAD TESTS
 It is advisable to run performance tests to verify the
functionality of your cluster
 These tests give you a baseline which you can refer
to
 After making changes to the configuration of the cluster
 Or to the schemas of your tables
 Doing a burn-in of your cluster
 Shows you how much you can gain from it
 But this does not replace a test with the load
expected from your use case

LOAD TESTS –
PERFORMANCE EVALUATION (1/2)
 HBase ships with its own tool to execute a
performance evaluation
 Performance Evaluation (PE)
 Wiki
 http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation
[--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>

LOAD TESTS –
PERFORMANCE EVALUATION (2/2)
 Example
/bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
11/07/03 13:18:34 INFO hbase.PerformanceEvaluation: Start class
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at
offset 0 for 1048576 rows
...
11/07/03 13:18:41 INFO hbase.PerformanceEvaluation: 0/104857/1048576
...
...
11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest
in 89062ms at offset 0 for 1048576 rows

LOAD TESTS – YCSB (1/2)
 Yahoo! Cloud Serving Benchmark (YCSB)
 A suite of tools that can be used to run comparable
workloads against different storage systems
 Also a reasonable tool for performing an HBase cluster burn-in
or performance test
 YCSB is preferred over the HBase-supplied
Performance Evaluation
 Offers more options
 Can combine read and write workloads
 Home page
 http://research.yahoo.com/Web_Information_Management/YCSB

LOAD TESTS – YCSB (2/2)
 In the HBase shell
 create 'usertable', 'family'
 git pull
 cd ${GIT_HOME}/hbase-training/006/ycsb
 Run the command below
 Then you can see the performance metrics in the ycsb-load.log file
java -cp "${HBASE_CONF_DIR}:core-0.1.4.jar:hbase-binding-0.1.4.jar" \
  com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient \
  -P workloads/workloada -p columnfamily=family -p recordcount=1000 \
  -s > ycsb-load.log

CLUSTER ADMINISTRATION
 Operational Tasks
 Node Decommission
 Rolling Restarts
 Adding Backup Master
 Adding a Region Server
 Data Task
 Export
 Import
 CopyTable Tool
 Bulk Import
 Troubleshooting
 HBase Fsck
 Analyzing the Logs

OPERATIONAL TASKS – NODE DECOMMISSION (1/2)
 Use the following script
 In the normal HBase distribution
${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
 In the tm distribution
${TM_PUPPET_HOME}/bin/services/shutdown-regionservers.sh [<host> ...]
 Disable the load balancer before
decommissioning a node
 In the hbase shell
 balance_switch false
 Drawback: regions could be offline for a good period of time
 If there are many regions on the server
 All regions close first
 Then the master notices that the region server's ZooKeeper
znode has been removed

OPERATIONAL TASKS – NODE DECOMMISSION (2/2)
 Stop a region server gradually
 Lets a node gradually shed its load and then shut itself
down
 Available since HBase 0.90.2
 ${HBASE_HOME}/bin/graceful_stop.sh
 Example
${HBASE_HOME}/bin/graceful_stop.sh HOSTNAME
 Use the HOSTNAME shown in your HBase master UI
 IP addresses are NOT supported at present

OPERATIONAL TASKS – ROLLING RESTARTS
 Also uses graceful_stop.sh
 Steps as follows
1. Ensure the cluster is consistent
 Fix it if inconsistent
hbase hbck
hbase hbck -fix
2. Restart the master
${HBASE_HOME}/bin/hbase-daemon.sh stop master;
${HBASE_HOME}/bin/hbase-daemon.sh start master
3. Disable the region balancer
echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell
4. Run the graceful_stop.sh script per region server
for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh \
--restart --reload --debug $i; done &> /tmp/log.txt &
5. Restart the master again
 This clears out the dead servers list and re-enables the balancer
6. Run hbck to ensure the cluster is consistent

OPERATIONAL TASKS –
ADDING BACKUP MASTER (1/2)
 Prevents a single point of failure
 If the machine currently hosting the active master
fails, the system can fall back to a backup master
 Underlying operations
1. A dedicated ZooKeeper znode, /hbase/master
2. All master processes race to create it, and the first
one to create it wins (becomes the active master)
 This happens at startup
3. All other master processes simply loop around the
znode check and wait for it to disappear
 Which triggers the race again
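The three steps above describe a "first to create wins" election. The in-process simulation below illustrates it; it does not use the ZooKeeper API. An AtomicReference stands in for the /hbase/master znode, and compareAndSet(null, name) plays the role of creating the znode, so only one contender can succeed. In real HBase the znode is ephemeral, which is why it disappears when the active master's session ends.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** In-process simulation of the master race; no ZooKeeper API is used. */
final class MasterElection {
    private final AtomicReference<String> masterZnode = new AtomicReference<>(); // stands in for /hbase/master

    /** Step 2: try to "create" the znode; true means this process is now the active master. */
    boolean tryBecomeMaster(String serverName) {
        return masterZnode.compareAndSet(null, serverName);
    }

    /** The znode "disappears", e.g. when the active master's session ends. */
    void activeMasterGone() {
        masterZnode.set(null);
    }

    String activeMaster() {
        return masterZnode.get();
    }

    /** Starts all contenders at the same moment and returns the winner. */
    static String race(MasterElection election, List<String> contenders) {
        ExecutorService pool = Executors.newFixedThreadPool(contenders.size());
        CountDownLatch startSignal = new CountDownLatch(1);
        for (String name : contenders) {
            pool.submit(() -> {
                startSignal.await();            // line everyone up
                election.tryBecomeMaster(name); // losers simply get false back
                return null;
            });
        }
        startSignal.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return election.activeMaster();
    }

    public static void main(String[] args) {
        MasterElection election = new MasterElection();
        List<String> masters = Arrays.asList("master1", "master2", "master3");
        String first = race(election, masters);
        System.out.println("active: " + first);
        election.activeMasterGone();            // step 3: backups see the znode vanish
        List<String> backups = new ArrayList<>(masters);
        backups.remove(first);
        System.out.println("new active: " + race(election, backups));
    }
}
```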

ADDING BACKUP MASTER (2/2)
 How to start multiple backup master processes
 Use the normal way to start a master process
${HBASE_HOME}/bin/hbase-daemon.sh start master
 In the tm distribution
${TM_PUPPET_HOME}/bin/services/startup-hmaster.sh [<host> ...]
 Or explicitly start a backup master process
${HBASE_HOME}/bin/hbase-daemon.sh start master --backup

ADDING A REGION SERVER
 In the normal HBase distribution
 Edit ${HBASE_HOME}/conf/regionservers
 Add the new region server's host name
 Two scripts can be used…
 ${HBASE_HOME}/bin/start-hbase.sh
 It skips the existing region servers, and starts
the newly added region server listed in the regionservers file
 ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver
 Must be executed on the newly added region server
 In the tm distribution
 New feature, not covered here

DATA TASK
 You may be required to move the data as a whole
or in parts
 To archive data for backup purposes
 To bootstrap another cluster
hadoop jar ${HBASE_HOME}/hbase-0.91.0-SNAPSHOT.jar
An example program must be given as the first argument.
Valid program names are:
…
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
export: Write table data to HDFS.
import: Import data written by Export.
importtsv: Import data in TSV format.
…
http://hbase.apache.org/book/ops_mgt.html

DATA TASK – EXPORT (1/3)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
Usage: Export [-D <property=value>]* <tablename> <outputdir>
[<versions> [<starttime> [<endtime>]]

DATA TASK - EXPORT (2/3)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
testtable /user/larsgeorge/backup-testtable
11/06/25 15:58:29 INFO mapred.JobClient: Running job: job_201106251558_0001
11/06/25 15:58:30 INFO mapred.JobClient: map 0% reduce 0%
…
11/06/25 15:59:42 INFO mapred.JobClient: Job complete: job_201106251558_0001
11/06/25 15:59:42 INFO mapred.JobClient: Counters: 6
11/06/25 15:59:42 INFO mapred.JobClient: Job Counters
11/06/25 15:59:42 INFO mapred.JobClient: Rack-local map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: Launched map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: FileSystemCounters
11/06/25 15:59:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3648
11/06/25 15:59:42 INFO mapred.JobClient: Map-Reduce Framework
11/06/25 15:59:42 INFO mapred.JobClient: Map input records=0
11/06/25 15:59:42 INFO mapred.JobClient: Spilled Records=0
11/06/25 15:59:42 INFO mapred.JobClient: Map output records=0

DATA TASK - EXPORT (3/3)
 Each part-m-nnnnn file contains a piece of the
exported data
 Together they form the full backup of the table
 Use the hadoop distcp command to move the
directory from one cluster to another, and perform
the import there
hadoop dfs -lsr /user/larsgeorge/backup-testtable
drwxr-xr-x - ... 0 2011-06-25 15:58 _logs
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00000
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00001
…
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00030
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00031

DATA TASK – IMPORT (1/2)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
Usage: Import <tablename> <inputdir>
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
testtable /user/larsgeorge/backup-testtable
11/06/25 17:09:48 INFO mapreduce.TableOutputFormat: Created table instance
for testtable
11/06/25 17:09:48 INFO input.FileInputFormat: Total input paths to process : 32
…
11/06/25 17:10:51 INFO mapred.JobClient: Data-local map tasks=32
11/06/25 17:10:51 INFO mapred.JobClient: FileSystemCounters
11/06/25 17:10:51 INFO mapred.JobClient: HDFS_BYTES_READ=3648

DATA TASK - IMPORT (2/2)
 Use the Import job to store the data in a different
table
 With the same schema
 Both the export and import commands work per table only
 Using the hadoop distcp command to copy the entire
/hbase directory in HDFS
 Is not recommended
 It may copy store files that are halfway through a
memstore flush operation

DATA TASK – COPYTABLE TOOL (1/2)
 Designed to bootstrap cluster replication
 Makes a copy of an existing table from the master
cluster to the slave cluster
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
Usage: CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X]
[--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>

DATA TASK – COPYTABLE TOOL (2/2)
 The copy of the table can also be stored on the same cluster, under a new name
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
--new.name=testtable3 testtable
11/06/26 15:20:07 INFO mapreduce.TableOutputFormat:
Created table instance for testtable3
…
11/06/26 15:21:06 INFO mapred.JobClient: Job complete: job_201106261454_0003
11/06/26 15:21:06 INFO mapred.JobClient: Counters: 5
11/06/26 15:21:06 INFO mapred.JobClient: Data-local map tasks=32

DATA TASK – BULK IMPORT (1/2)
 importtsv tool
 Takes files containing data in tab-separated value (TSV)
format
 By default, it uses the HBase put() API to insert data
into HBase one row at a time
 By setting the importtsv.bulk.output option, it instead generates files
using HFileOutputFormat
 These can subsequently be bulk-loaded into HBase by the
completebulkload tool
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
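How one TSV line maps onto a row, given a column list like the -Dimporttsv.columns value, can be sketched as follows. This assumes the ImportTsv convention that one entry is the keyword HBASE_ROW_KEY and the others are family:qualifier names; please verify that against the tool's usage text. It is not ImportTsv's parser, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the TSV-to-row mapping; not ImportTsv's parser. */
final class TsvRowMapper {
    static final String ROW_KEY_COLUMN = "HBASE_ROW_KEY"; // assumed row key marker

    /** Returns the row key of the line. */
    static String rowKey(String line, String[] columns) {
        String[] fields = split(line, columns);
        for (int i = 0; i < columns.length; i++) {
            if (columns[i].equals(ROW_KEY_COLUMN)) {
                return fields[i];
            }
        }
        throw new IllegalArgumentException("no " + ROW_KEY_COLUMN + " in column list");
    }

    /** Returns family:qualifier -> value for every non-row-key column. */
    static Map<String, String> cells(String line, String[] columns) {
        String[] fields = split(line, columns);
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            if (!columns[i].equals(ROW_KEY_COLUMN)) {
                result.put(columns[i], fields[i]);
            }
        }
        return result;
    }

    private static String[] split(String line, String[] columns) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                    "expected " + columns.length + " fields, got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] columns = "HBASE_ROW_KEY,d:c1,d:c2".split(",");
        String line = "row1\tvalue1\tvalue2";
        System.out.println(rowKey(line, columns) + " -> " + cells(line, columns));
    }
}
```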

DATA TASK – BULK IMPORT (2/2)
 completebulkload tool
 Is used to import the data into the running cluster
 After a data import has been prepared
 Either by using the importtsv tool with the importtsv.bulk.output
option
 Or by some other MapReduce job using
HFileOutputFormat
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar completebulkload
-conf ~/my-hbase-site.xml /user/larsgeorge/myoutput mytable

TROUBLESHOOTING – HBASE FSCK (1/4)
 Shell command
 ${HBASE_HOME}/bin/hbase hbck
 Once started, it
 Scans the .META. table to gather all the pertinent information
it holds
 Scans the HDFS root directory HBase is configured to use
 Compares the collected details to report on inconsistencies
and integrity issues
 Consistency check
 Whether the region is listed in .META. and exists in HDFS
 And is assigned to exactly one region server
 Integrity check
 Compares the regions with the table details to find missing
regions
 And those that have holes or overlaps in their row key ranges
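The integrity check idea can be sketched as a walk over a table's regions sorted by start key, reporting holes and overlaps in the row key chain. This is an illustration, not hbck's code. Each region is a {startKey, endKey} pair where an empty start key means the table start and an empty end key means the table end; keys compare as Java strings here, whereas HBase compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of hole/overlap detection over region key ranges; not hbck's code. */
final class RegionChainCheck {
    static String[] region(String startKey, String endKey) {
        return new String[] {startKey, endKey};
    }

    static List<String> check(List<String[]> regions) {
        List<String[]> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> a[0].compareTo(b[0]));
        List<String> problems = new ArrayList<>();
        String expected = ""; // next start key needed; null once the table end is covered
        for (String[] r : sorted) {
            String start = r[0];
            String end = r[1];
            if (expected == null || start.compareTo(expected) < 0) {
                problems.add("overlap: region starting at '" + start + "'");
            } else if (start.compareTo(expected) > 0) {
                problems.add("hole: ['" + expected + "', '" + start + "')");
            }
            if (end.isEmpty()) {
                expected = null;                   // this region runs to the table end
            } else if (expected != null && end.compareTo(expected) > 0) {
                expected = end;                    // chain now covers up to `end`
            }
        }
        if (expected != null) {
            problems.add("hole: ['" + expected + "', table end)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String[]> table = Arrays.asList(
                region("", "b"), region("c", "f"), region("e", ""));
        for (String problem : check(table)) {
            System.out.println(problem);
        }
    }
}
```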

${HBASE_HOME}/bin/hbase hbck -h
Usage: fsck [opts]
where [opts] are:
-details Display full report of all regions.
-timelag {timeInSeconds} Process only regions that have not experienced
any metadata updates in the last {{timeInSeconds} seconds.
-fix Try to fix some of the errors.
-sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking
if the fix worked if run with -fix
-summary Print only summary of the tables and status.

 Running it without any options produces the normal output detail
${HBASE_HOME}/bin/hbase hbck
Number of Tables: 40
Number of live region servers: 19
Number of dead region servers: 0
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
Summary:
...
testtable2 is okay.
Number of regions: 1
Deployed on: host11.foo.com:60020
0 inconsistencies detected.
Status: OK

 ${HBASE_HOME}/bin/hbase hbck -fix
 Repairs the following issues
 Assigns .META. to a single new server if it is unassigned
 Reassigns .META. to a single new server if it is assigned to
multiple servers
 Assigns a user table region to a new server if it is unassigned
 Reassigns a user table region to a single new server if it is
assigned to multiple servers
 Reassigns a user table region to a new server if the current
server does not match what the .META. table refers to
 hbck may report inconsistencies that are only temporary or
transitional
 Rerun the tool a few times to confirm a permanent problem

TROUBLESHOOTING – ANALYZING THE LOGS (1/2)
 HBase Master
 Default: $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log
 tm settings: /var/log/hbase/hbase-<user>-master-<hostname>.log
 HBase RegionServer
 Default: $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log
 tm settings: /var/log/hbase/hbase-<user>-regionserver-<hostname>.log
 ZooKeeper
 Default: console log output only
 tm settings: /var/log/hbase/hbase-<user>-zookeeper-<hostname>.log
 NameNode
 Default: $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log
 tm settings: /var/log/hadoop/hadoop-<user>-namenode-<hostname>.log
 DataNode
 Default: $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log
 tm settings: /var/log/hadoop/hadoop-<user>-datanode-<hostname>.log
 JobTracker
 Default: $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log
 tm settings: /var/log/hadoop/hadoop-<user>-jobtracker-<hostname>.log
 TaskTracker
 Default: $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log
 tm settings: /var/log/hadoop/hadoop-<user>-tasktracker-<hostname>.log

TROUBLESHOOTING – ANALYZING THE LOGS (2/2)
 It is useful to begin with the master logfile
 It acts as the coordinator service of the entire cluster
 Find where the processes began logging ERROR level
messages
 This lets you identify the root cause
 Many subsequent messages are often side effects of the
original problem
 Use the error log event metric under the
System Event Metrics group
 It gives you a graph showing where the server(s)
started logging an increasing number of error messages
in the logfiles
 If you find an error message
 Google it!!
 Use online resources to search for the message in
the public mailing lists
 E.g., Search Hadoop
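Finding where a logfile first reports ERROR level messages is easy to script. The sketch below assumes the log level appears as its own whitespace-delimited token, as in typical log4j layouts, so adjust the check to your actual layout. A message that merely contains the word ERROR as a separate token would also match. The sample lines in main() are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: locate the first ERROR line; assumes the level is a separate token. */
final class FirstError {
    static boolean isError(String line) {
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("ERROR")) {
                return true;
            }
        }
        return false;
    }

    /** Index of the first ERROR line, or -1 if there is none. */
    static int firstErrorLine(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (isError(lines.get(i))) {
                return i;
            }
        }
        return -1;
    }

    static long errorCount(List<String> lines) {
        return lines.stream().filter(FirstError::isError).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not real HBase output
        List<String> log = Arrays.asList(
                "2012-08-02 10:00:00,000 INFO example: started",
                "2012-08-02 10:00:02,000 ERROR example: first failure",
                "2012-08-02 10:00:03,000 ERROR example: follow-up failure");
        System.out.println("first ERROR at line " + firstErrorLine(log)
                + ", total " + errorCount(log));
    }
}
```

The first hit is usually the one to search for online; the follow-up errors are often side effects.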

HANDS-ON – USE YCSB
 New VM list
 Because VMs are not affordable at present :p
 YOUR_HOME=${GIT_HOME}/hbase-training/006/hands-on/${YOUR_NAME}
 mkdir ${YOUR_HOME}
 cd ${YOUR_HOME}; cp -rf ../../ycsb/* .
 In the HBase shell
 create '<YOUR_NAMED_TABLE>', 'family'
 Run YCSB with a record count of 5000
 And output the ycsb-load.log file
 Hands-on result
 Put the ycsb-load.log file under ${YOUR_HOME}
