PERFORMANCE TUNING & CLUSTER ADMINISTRATION
2012/8/2
Scott Miao
AGENDA
 Course Credit
 Performance Tuning
 More…
 Cluster Administration
 More…
COURSE CREDIT
 Showing up: 30 points
 Asking questions: 5 points per question
 Hands-on: 40 points
 70 points are required to pass this course
 Course credit is calculated once for each finished course
 The course credit will be sent to you and your supervisor by email
PERFORMANCE TUNING
 Garbage Collection Tuning
 MSLAB
 Compression
 Optimizing Splits and Compactions
 Load Balancing
 Merging Regions
 Client API: Best Practices
 Configuration
 Load Tests
GARBAGE COLLECTION TUNING
 Garbage collection (GC) is the process of rewriting the heap generation in question
 GC parameters only need to be added to the region servers
 The JRE comes with basic assumptions
 About what your programs are doing, how they create objects, how they allocate the heap to handle data, and so on
 These assumptions work well in a lot of cases
 But they do NOT work well for HBase
 Especially for write-heavy use cases
 HBase cannot safely rely on the JRE's assumptions alone
https://service.ithome.com.tw/20120720Java/index3.html#3
GARBAGE COLLECTION TUNING –
WRITE-HEAVY USE CASES (1/2)
 The memstore flushes its data once it reaches the configured flush size, hbase.hregion.memstore.flush.size
 Each flush leaves holes of different sizes in the heap
 Data resides in different locations in the generational architecture of the Java heap
 Depending on how long the data was in memory
 Young generation (new generation)
 The space can be reclaimed quickly and no harm is done
 Old generation (tenured generation)
 Data is promoted to this location if it stays in memory for a longer period of time
GARBAGE COLLECTION TUNING –
WRITE-HEAVY USE CASES (2/2)
 The JRE tries to reuse the holes created by data that has been written to disk
 If it requests a size of heap that does not fit into one of those holes
 It needs to compact the fragmented heap
 Young to Old
 The promotion of longer-living objects from the young to the old generation
 Old to Stop-The-World
 There is no longer enough space for a young allocation, caused by the fragmentation
 The JRE falls back to the stop-the-world garbage collector
 It rewrites the entire heap space and compacts it to the remaining active objects
 If this fails, you will see a promotion failure in your garbage collection logs
What does the heap look like?
GARBAGE COLLECTION TUNING –
SPECIFY THE YOUNG GENERATION SIZE
 Young generation
 Typically between 128 MB and 512 MB
 Old generation
 Holds the remaining available heap, which is usually many gigabytes of memory
 Using 128 MB for the young generation is a good starting point
 Tune it further based on observed JVM metrics
 Specify the young generation size like so
 -XX:MaxNewSize=128m -XX:NewSize=128m
 Or use the convenient shorthand option
 -Xmn128m
GARBAGE COLLECTION TUNING –
GC OPTIONS SETTING
 GC options setting for HBase
 Add them to the hbase-env.sh configuration file
 HBASE_OPTS variable for all HBase processes
 HBASE_REGIONSERVER_OPTS variable for the region servers only
 Enable the JRE's log output for garbage collection details
 Monitor it for occurrences of "concurrent mode failure" or "promotion failed" messages
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log
GARBAGE COLLECTION TUNING –
GC STRATEGY FOR YOUNG GENERATION
 Recommended value for the young generation
 -XX:+UseParNewGC
 Uses the Parallel New Collector
 It stops the entire Java process to clean up the young generation heap
 Since the young generation is small in comparison, the pause usually lasts less than a few hundred milliseconds
GARBAGE COLLECTION TUNING –
GC STRATEGY FOR OLD GENERATION
 Recommended value for the old generation
 -XX:+UseConcMarkSweepGC
 Uses the Concurrent Mark-Sweep Collector (CMS)
 It tries to do as much work concurrently as possible, without stopping the Java process
 This takes extra effort and increases the CPU load
 It avoids the stops required to rewrite a fragmented old generation heap
 If you hit a promotion failure
 It falls back to stop-the-world again
GARBAGE COLLECTION TUNING –
GC STRATEGY FOR OLD GENERATION
 A switch for CMS
 -XX:CMSInitiatingOccupancyFraction=70
 A percentage that specifies when the background process starts
 Avoids the concurrent mode failure
 This happens when the background process that marks and sweeps the heap is still running when the heap runs out of usable space
 CMS then falls back to stop-the-world again
 Set the initiating occupancy fraction to 70%
 By default, 20% block cache + 40% memstore limits = 60% of the heap
 This starts the background process at the appropriate time
 Early enough, but not too early
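The reasoning behind 70% can be checked with some arithmetic. The following is a hypothetical helper, not part of HBase. It assumes an 8 GB heap (as in -Xmx8g) and the default 20% block cache and 40% memstore limits. With those values, CMS would start at about 5.6 GB, leaving roughly 800 MB above the 60% that the block cache and memstores may fill. As a simplification, it treats the old generation as the whole heap. That is a fair approximation with a 128 MB young generation, although CMS actually measures old generation occupancy.

```java
// Hypothetical helper (not part of HBase): checks the arithmetic behind
// -XX:CMSInitiatingOccupancyFraction=70 for an assumed 8 GB heap (-Xmx8g).
// Simplification: the old generation is treated as the whole heap.
final class CmsOccupancy {
    private CmsOccupancy() {}

    /** Heap usage (MB) at which CMS starts its concurrent cycle. */
    static long cmsStartMb(long heapMb, int occupancyPercent) {
        return heapMb * occupancyPercent / 100;
    }

    /** Heap (MB) that the block cache plus the memstores may occupy. */
    static long cacheAndMemstoreMb(long heapMb, int blockCachePercent, int memstorePercent) {
        return heapMb * (blockCachePercent + memstorePercent) / 100;
    }

    public static void main(String[] args) {
        long heapMb = 8L * 1024;                              // -Xmx8g
        long start = cmsStartMb(heapMb, 70);                  // 5734 MB
        long longLived = cacheAndMemstoreMb(heapMb, 20, 40);  // 4915 MB
        System.out.println("CMS starts at:          " + start + " MB");
        System.out.println("Block cache + memstore: " + longLived + " MB");
        System.out.println("Headroom:               " + (start - longLived) + " MB");
    }
}
```

Moving the fraction much closer to 60% would start CMS cycles almost constantly. Moving it much higher leaves less room for the cycle to finish before the heap fills up, which is the concurrent mode failure described above.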
GARBAGE COLLECTION TUNING - SUMMARY
 Recommended GC options
 Alex Su’s GC options
 GC Options Reference
export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:<%= hbase_log_path %>/hbase-regionserver-gc-`date +%F-%H-%M-%S`.log 
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled 
-XX:CMSInitiatingOccupancyFraction=70 -XX:PrintFLSStatistics=1 
-XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=<%= hbase_log_path %>/hbase-regionserver.hprof
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
MSLAB - QUESTION
 How can the stop-the-world issue be solved?
 Stop-the-world collections are needed to compact a fragmented heap
 The key to reducing these compacting collections is to reduce fragmentation
 Suppose only objects of exactly the same size were allocated from the heap
 Subsequent allocations of new objects of the exact same size would always reuse the holes left behind
 No promotion error would occur, and therefore no stop-the-world compacting collection would be required
MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (1/3)
 MSLABs are buffers of fixed size containing KeyValue instances of varying sizes
1. When a buffer cannot completely fit a newly added KeyValue, it is considered full
2. A new buffer is then created, once again of the given fixed size
 Enabled by default in version 0.92
 Disabled by default in version 0.90 of HBase
 hbase.hregion.memstore.mslab.enabled property
 It is recommended that you test your setup with this feature
MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (2/3)
 The size of each allocated, fixed-size buffer
 hbase.hregion.memstore.mslab.chunksize property
 Default is 2 MB
 Based on your KeyValue instances, you may have to adjust this value
 E.g., if your cells are 100 KB in size, you need to increase the chunk size to fit more than just a few cells
 An upper boundary of what is stored in the buffers
 hbase.hregion.memstore.mslab.max.allocation property
 Default is 256 KB
 Any cell (KeyValue) that is larger will be allocated directly in the Java heap
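The following simplified model makes the chunk behavior concrete. It is a hypothetical sketch, not HBase's MemStoreLAB implementation. The class and method names are invented, and only the two properties above are modeled: chunk size and maximum allocation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of MSLAB allocation (not HBase code).
// Cells are copied into fixed-size chunks. When a cell does not fit into the
// remaining space of the current chunk, that chunk is considered full and a new
// one is started. Cells larger than maxAllocation bypass the chunks entirely.
final class MslabSketch {
    private final int chunkSize;
    private final int maxAllocation;
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current;
    private int offset;
    private int directAllocations;

    MslabSketch(int chunkSize, int maxAllocation) {
        if (maxAllocation > chunkSize) {
            throw new IllegalArgumentException("maxAllocation must not exceed chunkSize");
        }
        this.chunkSize = chunkSize;
        this.maxAllocation = maxAllocation;
    }

    /** Copies the cell; returns the chunk index used, or -1 for a direct heap allocation. */
    int allocate(byte[] cell) {
        if (cell.length > maxAllocation) {
            directAllocations++; // too large: left to the regular Java heap
            return -1;
        }
        if (current == null || offset + cell.length > chunkSize) {
            current = new byte[chunkSize]; // the previous chunk (if any) is considered full
            chunks.add(current);
            offset = 0;
        }
        System.arraycopy(cell, 0, current, offset, cell.length); // the extra copy MSLAB costs
        offset += cell.length;
        return chunks.size() - 1;
    }

    int chunkCount() { return chunks.size(); }

    int directAllocations() { return directAllocations; }
}
```

With the default 2 MB chunk and 100 KB cells, only 20 cells fit into a chunk, and about 48 KB of every chunk stays unused. This shows both why the chunk size may need adjusting and where the extra heap usage comes from.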
MSLAB –
MEMSTORE-LOCAL ALLOCATION BUFFER (3/3)
 MSLABs do not come without a cost
 They are more wasteful in regard to heap usage
 You will most likely not fill every buffer to the last byte
 A trade-off
 Use MSLABs and benefit from better garbage collection, but incur the extra space that is required
 Do NOT use MSLABs and benefit from better memory efficiency, but deal with the problems caused by garbage collection pauses
 You could plan to restart the servers every few days, or weeks, before the pause happens
 The buffers require an additional byte array copy operation, and are therefore slightly slower
 Measure the impact on your workload
COMPRESSION
 HBase supports a number of compression algorithms that can be enabled at the column family level
 It is recommended
 Enable compression unless you have a reason not to do so
 For example, when storing already compressed content, such as JPEG images
 Compression usually yields better overall performance
 The CPU overhead of compression and decompression is less than what is required to read more data from disk
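The JDK's own DEFLATE implementation (the algorithm behind GZ) is enough to show why already compressed content is the exception. This hypothetical sketch does not use HBase's codec classes. It compares repetitive, cell-like text with random bytes, which stand in for already compressed data such as JPEG images.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

// Illustration only: java.util.zip.Deflater, not an HBase codec.
final class CompressionDemo {
    private CompressionDemo() {}

    /** Returns the DEFLATE-compressed size of the input in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer); // only the size matters, so the buffer is reused
            }
            return total;
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Text that resembles repetitive row/column data. */
    static byte[] repetitiveRows(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("row-").append(i).append("/colfam1:qual1/value-").append(i % 10).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Random bytes, standing in for already compressed content. */
    static byte[] randomBytes(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] text = repetitiveRows(10_000);
        byte[] noise = randomBytes(text.length, 42L);
        System.out.printf("repetitive: %d -> %d bytes%n", text.length, compressedSize(text));
        System.out.printf("random:     %d -> %d bytes%n", noise.length, compressedSize(noise));
    }
}
```

The repetitive text shrinks to a fraction of its size. The random bytes come out about the same size or slightly larger, so the CPU time spent on them buys nothing.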
COMPRESSION – AVAILABLE CODECS
 Recommended codecs
 Snappy (called Zippy in Bigtable)
 Released by Google under the BSD License
 HBase 0.92 ships with the required JNI libraries to be able to use it
 You must install the native binary library on all region servers
 LZO (Lempel-Ziv-Oberhumer)
 A lossless data compression algorithm that is focused on decompression speed, and written in ANSI C
 HBase cannot ship with LZO because of licensing issues
 LZO is licensed under the GNU General Public License (GPL), which is incompatible with HBase's Apache License
 LZO installation needs to be performed separately, after HBase has been installed
http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437
COMPRESSION –
COMPRESSION TEST TOOL
 Usage
 hbase org.apache.hadoop.hbase.util.CompressionTest <path> <none|gz|lzo|snappy>
 Example
 ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest /user/larsgeorge/test.gz gz
 It reports the result of the test
 If it succeeds:
 If it fails:
…
SUCCESS
Exception in thread "main" java.lang.RuntimeException: 
java.lang.ClassNotFoundException:
com.hadoop.compression.lzo.LzoCodec
…
COMPRESSION – STARTUP CHECK
 A fail-fast setup notices missing libraries at startup
 Instead of running into issues later
 For example, check the Snappy and LZO compression libraries
 If a codec is missing, the server aborts at startup with an IOException stating
"Compression codec <codec-name> not supported, aborting RS construction"
 Copy the changed configuration file to all region servers and restart them afterward
<property>
<name>hbase.regionserver.codecs</name>
<value>snappy,lzo</value>
</property>
COMPRESSION – ENABLING COMPRESSION
 Install the JNI libraries
 Install native compression libraries
 Specify the chosen algorithm in the column family schema
 In the HBase shell
 create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }
 In the API
 HColumnDescriptor.setCompressionType(…)
 Refer to ppt#003, p#11
OPTIMIZING SPLITS AND COMPACTIONS
- SPLIT/COMPACTION STORMS
 If your regions grow at roughly the same rate
 Eventually they all need to be split at about the same time
 This causes a large spike in disk I/O because of the compactions required to rewrite the split regions
 Refer to ppt#004, p#13
OPTIMIZING SPLITS AND COMPACTIONS –
MANAGED SPLITTING (1/2)
 Instead of relying on automatic splitting, you can effectively turn it off and manually invoke the split and major_compact commands
 Set the region maximum file size
 hbase.hregion.max.filesize property for the entire cluster
 At the table level via the API
 HTableDescriptor.setMaxFileSize(…)
 Refer to ppt#003, p#7
 Set it to a very high number
 It is better to set this value to a reasonable upper boundary
 Such as 100 GB
 Long.MAX_VALUE is not recommended, in case the manual splits fail to run
 Then you can control when splits and compactions run
 Run them staggered across all regions
 This spreads the I/O load as much as possible, avoiding any split/compaction storm
 Use the HBase shell + cron
 Or write your own code using the HBaseAdmin API
 Refer to ppt#003, p#21
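Staggering can be as simple as spreading start times evenly over a maintenance window. The following hypothetical helper is not an HBase tool. It turns a list of region names into timed HBase shell commands that a cron-driven script could run one by one. The window length and region names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an HBase tool): spreads one shell command per region
// evenly over a maintenance window so that regions are not rewritten all at once.
final class StaggeredSchedule {
    private StaggeredSchedule() {}

    /** Each entry reads "+<offset>min <command> '<region>'". */
    static List<String> schedule(List<String> regions, int windowMinutes, String command) {
        List<String> plan = new ArrayList<>();
        int n = regions.size();
        for (int i = 0; i < n; i++) {
            long offset = (long) i * windowMinutes / n; // evenly spaced start offsets
            plan.add("+" + offset + "min " + command + " '" + regions.get(i) + "'");
        }
        return plan;
    }

    public static void main(String[] args) {
        // Hypothetical region names; substitute the full region names of your table.
        List<String> regions = List.of("region-a", "region-b", "region-c", "region-d");
        schedule(regions, 120, "major_compact").forEach(System.out::println);
    }
}
```

Fixed offsets are only a starting point. A compaction can outlast its slot, so a script that waits for the previous command to finish before starting the next one is safer.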
OPTIMIZING SPLITS AND COMPACTIONS –
MANAGED SPLITTING (2/2)
 RegionSplitter class (added in version 0.90.2)
 Another way to split existing regions
 Rolling split feature
 Splits the existing regions while waiting long enough for the involved compactions to complete
 API docs
 An additional advantage of managed splitting
 You have better control over which regions are available at any time
 Useful in the rare case that you need to do very low-level debugging
 With automated splits, debugging is hard
 The region you are looking at may have been split into two daughter regions
OPTIMIZING SPLITS AND COMPACTIONS –
REGION HOTSPOTTING
 You may be dealing with a write pattern that is causing a specific region to run hot
 Use the region server metrics to observe it
 Refer to ppt#005, p#12
 Key design approaches
 Salted keys, random keys, etc.
 Refer to ppt#004, p#52
 The only other way to alleviate this situation
 Manually split a hot region into one or more new regions, at exact boundaries
 You can specify any row key within the region as the split point
 This lets you generate halves that are completely different in size
 Refer to ppt#003, p#21
 This cannot deal with completely sequential key ranges
 Those are always going to hit one region for a considerable amount of time
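Salting, listed above as a key design approach, can be sketched as follows. The bucket count (8) and the two-digit prefix format are assumptions for illustration. The prefix is derived from the key's hash, so the same key always maps to the same bucket, while sequential keys spread across several key ranges.

```java
import java.util.Set;
import java.util.TreeSet;

// Minimal, hypothetical sketch of key salting (not an HBase API).
final class SaltedKeys {
    private SaltedKeys() {}

    /** Prefixes the key with a stable bucket number, e.g. "03-row-0042". */
    static String salt(String key, int buckets) {
        int bucket = Math.floorMod(key.hashCode(), buckets); // non-negative and deterministic
        return String.format("%02d-%s", bucket, key);
    }

    public static void main(String[] args) {
        Set<String> used = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            String salted = salt(String.format("row-%04d", i), 8);
            used.add(salted.substring(0, 2)); // the bucket prefix
        }
        System.out.println("buckets used by 100 sequential keys: " + used);
    }
}
```

The cost is on the read side. A scan over a key range must now be issued once per bucket, and the results merged on the client.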
OPTIMIZING SPLITS AND COMPACTIONS –
PRESPLITTING REGIONS (1/3)
 Managing splits manually is useful
 Therefore, start with a larger number of regions right from table creation
 This means creating a table with the required number of regions
 Three ways…
 HBase shell
 create, refer to ppt#003, p#37
 API
 HBaseAdmin.createTable(…), refer to ppt#003, p#16
 RegionSplitter class
 By default, it uses the MD5StringSplit class to partition the row keys into ranges
 Use -D split.algorithm=<your-algorithm-class> for other implementations
./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter
usage: RegionSplitter <TABLE>
OPTIMIZING SPLITS AND COMPACTIONS –
PRESPLITTING REGIONS (2/3)
 RegionSplitter with MD5StringSplit sample
testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093.
testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a.
testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92.
testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258.
testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9.
testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
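The split keys in the sample above follow a simple pattern. This sketch reproduces them by dividing the range 00000000 to 7fffffff into equal intervals (10 regions here) and printing each boundary as eight hex digits. It is modeled on the sample output, not copied from the MD5StringSplit source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch modeled on the RegionSplitter sample output (not HBase code).
final class HexSplitKeys {
    private HexSplitKeys() {}

    static final long MAX_KEY = 0x7FFFFFFFL; // upper end of the key space seen in the sample

    /** Returns the (regions - 1) boundary keys; the first region starts with the empty key. */
    static List<String> splitKeys(int regions) {
        if (regions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long interval = MAX_KEY / regions;
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            keys.add(String.format("%08x", i * interval));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(splitKeys(10)); // [0ccccccc, 19999998, 26666664, 33333330, 3ffffffc, ...]
    }
}
```

Data only spreads evenly across such regions if the row keys themselves start with evenly distributed hex strings, for example an MD5-based prefix.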
OPTIMIZING SPLITS AND COMPACTIONS –
PRESPLITTING REGIONS (3/3)
 How many presplit regions?
 Start low with 10 presplit regions per server and watch as data grows over time
 It is better to err on the side of too few regions and use a rolling split later
 If the presplit regions are too thin
 Increase the major compaction interval via the hbase.hregion.majorcompaction property
 Refer to ppt#004, p#19
 If the data size grows too large
 Use the RegionSplitter utility to perform a rolling split of all regions
 The main objective is to avoid split/compaction storms
LOAD BALANCING – BALANCER (1/3)
 The master has a built-in feature
 Called the balancer
 By default, it runs every five minutes
 hbase.balancer.period property
 It attempts to equal out the number of assigned regions per region server
 To within one region of the average number per server
 It determines a new assignment plan
 The plan describes which regions should be moved where; the balancer then starts moving the regions by calling the unassign() method
 Refer to ppt#003, p#22
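A greedy plan illustrates the "within one region of the average" goal. This is a simplified, hypothetical sketch, not the algorithm in HBase's balancer. It repeatedly moves one region from the most loaded server to the least loaded one until the two differ by at most one. At that point every server holds either the floor or the ceiling of the average.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified balancing plan (not HBase's balancer implementation).
final class BalancePlan {
    private BalancePlan() {}

    /** Returns moves as {fromServer, toServer} index pairs; the input array is not modified. */
    static List<int[]> plan(int[] regionsPerServer) {
        int[] load = Arrays.copyOf(regionsPerServer, regionsPerServer.length);
        List<int[]> moves = new ArrayList<>();
        if (load.length < 2) {
            return moves;
        }
        while (true) {
            int max = 0;
            int min = 0;
            for (int i = 1; i < load.length; i++) {
                if (load[i] > load[max]) max = i;
                if (load[i] < load[min]) min = i;
            }
            if (load[max] - load[min] <= 1) {
                return moves; // every server is within one region of the average
            }
            load[max]--; // unassign a region from the busiest server...
            load[min]++; // ...and assign it to the least loaded one
            moves.add(new int[] {max, min});
        }
    }

    public static void main(String[] args) {
        for (int[] m : plan(new int[] {10, 2, 3})) {
            System.out.println("move one region: server " + m[0] + " -> server " + m[1]);
        }
    }
}
```

This sketch only decides how many regions move between which servers. A real balancer also has to pick which regions to move.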
LOAD BALANCING - BALANCER (2/3)
 The balancer has an upper limit on how long it is allowed to run
 hbase.balancer.max.balancing property
 Defaults to half of the balancer period value
 2.5 minutes
 The balancer switch
 Toggles the balancer status between enabled and disabled
 HBase shell
 balance_switch command, refer to ppt#003, p#39
 balanceSwitch() API method, refer to ppt#003, p#22
LOAD BALANCING - BALANCER (3/3)
 Can be explicitly started
 HBase shell
 balancer command, refer to ppt#003, p#39
 balancer() API method, refer to ppt#003, p#22
 Returns true
 If any work has been done
 Returns false
 If the balancer was switched off
 If there was no work to be done
 If the balancer was not able to run
 E.g., a region is currently in transition, so the balancer run is skipped
LOAD BALANCING - MOVE
 You can also use move
 To assign regions to other servers
 HBase shell
 move command, refer to ppt#003, p#39
 move() API method, refer to ppt#003, p#22
MERGING REGIONS
 Sometimes you may need to merge regions
 For example, after you have removed a large amount of
data and you want to reduce the number of regions
hosted by each server
 HBase allows you to merge two adjacent regions
 The HBase cluster must be offline, but HDFS must be running
./bin/hbase org.apache.hadoop.hbase.util.Merge
Usage: bin/hbase merge <table-name> <region-1> <region-2>
CLIENT API: BEST PRACTICES (1/3)
 Disable auto-flush
 When performing a lot of put operations
 Refer to ppt#002, p#9
 Use scanner caching
 Set Scan.setCaching() to something greater than the default of 1, if needed
 Refer to ppt#002, p#26
 Limit scan scope
 If only a small number of the available columns are to be processed, only those should be specified in the input scan
 For example, use the Scan.addFamily() method
 Refer to ppt#002, p#24
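Simple arithmetic estimates why caching matters. The helper below is hypothetical and deliberately rough. With a caching value of c, each round trip returns up to c rows, so scanning r rows takes about ceil(r / c) calls. This count ignores the calls that open and close the scanner.

```java
// Hypothetical, rough estimate of scanner round trips (not an HBase API).
final class ScanRpcEstimate {
    private ScanRpcEstimate() {}

    /** About ceil(rows / caching) next() round trips, ignoring open/close calls. */
    static long roundTrips(long rows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        for (int caching : new int[] {1, 100, 1000}) {
            System.out.println("caching=" + caching + " -> " + roundTrips(10_000, caching) + " round trips");
        }
    }
}
```

Larger values reduce round trips but need more memory per call on both the client and the region server, so raise the value gradually.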
CLIENT API: BEST PRACTICES (2/3)
 Close ResultScanners
 To avoid performance problems
 Scanners that are left open may cause problems on the region servers
 Refer to ppt#002, p#25
 Block cache usage
 Scan instances can be set to use the block cache in the region server via the setCacheBlocks() method
 True by default, in which case the default settings of the table and family are used
 API docs
 Server-side block cache settings
 Refer to ppt#003, p#12
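To close a scanner reliably, close it in a finally block, so that an exception while processing rows cannot leak it. To run without HBase, the sketch uses FakeScanner, a hypothetical stand-in for ResultScanner. In real code, the scanner would come from HTable.getScanner(scan), and close() would release the scanner on the region server.

```java
import java.io.Closeable;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for ResultScanner; records whether close() was called. */
final class FakeScanner implements Closeable, Iterable<String> {
    private final List<String> rows;
    private boolean closed;

    FakeScanner(List<String> rows) { this.rows = rows; }

    @Override public Iterator<String> iterator() { return rows.iterator(); }

    @Override public void close() { closed = true; } // a real scanner frees server-side resources here

    boolean isClosed() { return closed; }
}

final class CloseScanners {
    private CloseScanners() {}

    /** Processes every row and always closes the scanner, even if processing throws. */
    static int process(FakeScanner scanner, Consumer<String> handler) {
        int count = 0;
        try {
            for (String row : scanner) {
                handler.accept(row);
                count++;
            }
        } finally {
            scanner.close();
        }
        return count;
    }
}
```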
CLIENT API: BEST PRACTICES (3/3)
 Optimal loading of row keys
 When performing a table scan where only the row keys are needed
 Use a FilterList with a MUST_PASS_ALL operator, combining FirstKeyOnlyFilter and KeyOnlyFilter
 Refer to ppt#002, p#43 & 46
 Turn off the WAL on Puts
 One way to increase throughput on Puts is to call Put.setWriteToWAL(false), but data may be lost if a region server fails
 Consider using bulk loading techniques instead
CONFIGURATION (1/6)
 Advanced options you can consider adjusting based on your use case
 Most properties are configured in hbase-site.xml
 Others are in hbase-env.sh
 Decrease the ZooKeeper timeout
 The default timeout between a region server and the ZooKeeper quorum is three minutes
 Tune the timeout down to a minute, or even less, so the master notices failures sooner
 zookeeper.session.timeout property
 Be careful of the "Juliet pause"
 If a GC pause on a region server lasts longer than this timeout, the master considers the server dead and reassigns its regions, even though the server was only paused
CONFIGURATION (2/6)
 Increase handlers
 The number of threads that are kept open to answer incoming requests to user tables
 Defaults to 10
 hbase.regionserver.handler.count property
 Keep this number low when the payload per request approaches megabytes
 And high when the payload is small
 Increase heap settings
 HBASE_HEAPSIZE setting in the hbase-env.sh file
 Consider using HBASE_REGIONSERVER_OPTS instead of changing the global HBASE_HEAPSIZE
 Region servers may need more memory than the master
CONFIGURATION (3/6)
 Enable data compression
 You should enable compression for the storage files
 In most cases, it boosts performance
 Increase region size
 Consider going to larger regions to cut down on the total number of regions on your cluster
 Fewer regions to manage makes for a smoother-running cluster
CONFIGURATION (4/6)
 Adjust block cache size
 The amount of heap used for the block cache is specified as a percentage of the heap (e.g., 0.2 for 20%)
 Defaults to 20%
 hfile.block.cache.size property
 Increasing it helps if you have mainly read-oriented workloads
 Adjust memstore limits
 Memstore heap usage
 hbase.regionserver.global.memstore.upperLimit property
 Defaults to 40%
 hbase.regionserver.global.memstore.lowerLimit property
 Defaults to 35%
 They control the amount of flushing that will take place once the server is required to free heap space
 Mainly read-oriented workloads
 Consider reducing both limits to make more room for the block cache
 Handling many writes
 Increase the memstore limits to reduce the excessive amount of I/O caused by frequent flushes
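These percentages are easier to judge in absolute terms. The hypothetical helper below converts them to megabytes for an assumed 8 GB region server heap with the default settings.

```java
// Hypothetical helper (not part of HBase): converts heap fractions into megabytes.
final class HeapBudget {
    private HeapBudget() {}

    /** Converts a heap fraction (e.g. 0.4 for 40%) into megabytes, rounded. */
    static long toMb(long heapMb, double fraction) {
        return Math.round(heapMb * fraction);
    }

    public static void main(String[] args) {
        long heapMb = 8L * 1024; // assumed -Xmx8g
        System.out.println("block cache (0.2):          " + toMb(heapMb, 0.2) + " MB");
        System.out.println("memstore upperLimit (0.4):  " + toMb(heapMb, 0.4) + " MB");
        System.out.println("memstore lowerLimit (0.35): " + toMb(heapMb, 0.35) + " MB");
    }
}
```

When all memstores together reach the upper limit (about 3277 MB here), the region server blocks updates and forces flushes until usage drops below the lower limit (about 2867 MB). Each such round therefore flushes at least the roughly 410 MB between the two limits.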
CONFIGURATION (5/6)
 Increase blocking store files
 Once a store reaches this number of files, the region servers block further updates from clients to give compactions time to reduce the number of files
 Default is seven files
 hbase.hstore.blockingStoreFiles property
 Increase block multiplier
 A safety latch that blocks any further updates from clients when the memstores exceed the multiplier * flush size limit
 hbase.hregion.memstore.block.multiplier property
 Defaults to 2
 If you have enough memory, you can increase this value to handle spikes more gracefully
 Refer to ppt#003, p#8
CONFIGURATION (6/6)
 Decrease maximum logfiles
 Controls how often flushes occur, based on the number of WAL files on disk
 Default is 32
 hbase.regionserver.maxlogs property
 This can be high in a write-heavy use case
 Lower it to force the servers to flush data to disk more often
LOAD TESTS
 It is advisable to run performance tests to verify the functionality of your cluster
 These tests give you a baseline which you can refer to
 After making changes to the configuration of the cluster
 Or to the schemas of your tables
 Doing a burn-in of your cluster
 Shows you how much you can gain from it
 But this does not replace a test with the load expected from your use case
LOAD TESTS –
PERFORMANCE EVALUATION (1/2)
 HBase ships with its own tool to execute a performance evaluation
 Performance Evaluation (PE)
 Wiki
 http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation 
[--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>
LOAD TESTS –
PERFORMANCE EVALUATION (2/2)
 Example
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
11/07/03 13:18:34 INFO hbase.PerformanceEvaluation: Start class 
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at 
offset 0 for 1048576 rows
...
11/07/03 13:18:41 INFO hbase.PerformanceEvaluation: 0/104857/1048576
...
11/07/03 13:18:45 INFO hbase.PerformanceEvaluation: 0/209714/1048576
...
11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: 0/1048570/1048576
11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class 
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 
in 89062ms at offset 0 for 1048576 rows
LOAD TESTS – YCSB (1/2)
 Yahoo! Cloud Serving Benchmark (YCSB)
 A suite of tools that can be used to run comparable workloads against different storage systems
 Also a reasonable tool for performing an HBase cluster burn-in or performance test
 Using YCSB is preferred over the HBase-supplied Performance Evaluation
 Offers more options
 Can combine read and write workloads
 Home page
 http://research.yahoo.com/Web_Information_Management/YCSB
LOAD TESTS – YCSB (2/2)
 Use the HBase shell
 create 'usertable', 'family'
 git pull
 cd ${GIT_HOME}/hbase-training/006/ycsb
 Run the command
 Then you can see the performance metrics in the ycsb-load.log file
java -cp "${HBASE_CONF_DIR}:core-0.1.4.jar:hbase-binding-0.1.4.jar" \
  com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient \
  -P workloads/workloada -p columnfamily=family -p recordcount=1000 \
  -s > ycsb-load.log
CLUSTER ADMINISTRATION
 Operational Tasks
 Node Decommission
 Rolling Restarts
 Adding Backup Master
 Adding a Region Server
 Data Task
 Export
 Import
 CopyTable Tool
 Bulk Import
 Troubleshooting
 HBase Fsck
 Analyzing the Logs
OPERATIONAL TASKS – NODE DECOMMISSION (1/2)
 Use the following scripts
 In the normal HBase distribution
 In the tm distribution
 Disable the load balancer before decommissioning a node
 In the HBase shell
 balance_switch false
 Regions could be offline for a good period of time
 If there are many regions on the server
 All regions close first
 Only then does the master notice the region server's ZooKeeper znode being removed
${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
${TM_PUPPET_HOME}/bin/services/shutdown-regionservers.sh [<host> ...]
OPERATIONAL TASKS – NODE DECOMMISSION (2/2)
 Stop a region server gradually
 Lets a node gradually shed its load and then shut itself down
 Available since HBase 0.90.2
 ${HBASE_HOME}/bin/graceful_stop.sh
 Example
 Check the HOSTNAME on your HBase master UI
 Refer to ppt#003, p#41
 IP addresses are NOT supported at present
${HBASE_HOME}/bin/graceful_stop.sh HOSTNAME
OPERATIONAL TASKS – ROLLING RESTARTS
 Also uses graceful_stop.sh
 Steps as follows
1. Ensure the cluster is consistent
 Fix it if it is inconsistent
2. Restart the master
3. Disable the region balancer
4. Run the graceful_stop.sh script per region server
5. Restart the master again
 This clears out the dead servers list and re-enables the balancer
6. Run hbck to ensure the cluster is consistent
hbase hbck
hbase hbck -fix
${HBASE_HOME}/bin/hbase-daemon.sh stop master; 
${HBASE_HOME}/bin/hbase-daemon.sh start master
echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell
for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh \
  --restart --reload --debug $i; done &> /tmp/log.txt &
OPERATIONAL TASKS –
ADDING BACKUP MASTER (1/2)
 Prevents the master from being a single point of failure
 If the machine currently hosting the active master fails, the system can fall back to a backup master
 Underlying operations
1. A dedicated ZooKeeper znode, /hbase/master
2. All master processes race to create it, and the first one to create it wins (becomes the active master)
 This happens at startup
3. All other master processes simply loop around the znode check and wait for it to disappear
 Triggering the race again
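The race for the master znode can be simulated in plain Java. In this hypothetical sketch, a ConcurrentHashMap stands in for ZooKeeper, and putIfAbsent() stands in for creating the /hbase/master znode. It does not use the ZooKeeper or HBase API. It shows why exactly one candidate wins each round, and how the race reruns once the active master's znode disappears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

// Simulation only: a ConcurrentMap plays ZooKeeper, putIfAbsent plays "create znode".
final class MasterElection {
    private MasterElection() {}

    static final String MASTER_ZNODE = "/hbase/master";

    /** Lets all candidates race to create the znode; returns whoever holds it afterwards. */
    static String race(ConcurrentMap<String, String> zk, List<String> candidates) {
        CountDownLatch go = new CountDownLatch(1);
        List<Thread> threads = new ArrayList<>();
        for (String name : candidates) {
            Thread t = new Thread(() -> {
                try {
                    go.await(); // start all candidates at the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                zk.putIfAbsent(MASTER_ZNODE, name); // only the first create succeeds
            });
            threads.add(t);
            t.start();
        }
        go.countDown();
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for candidates", e);
        }
        return zk.get(MASTER_ZNODE);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> zk = new ConcurrentHashMap<>();
        List<String> masters = new ArrayList<>(List.of("master1", "master2", "master3"));
        String active = race(zk, masters);
        System.out.println("active master: " + active);

        // The active master fails: its (ephemeral) znode disappears and the backups race again.
        zk.remove(MASTER_ZNODE);
        masters.remove(active);
        System.out.println("new active master: " + race(zk, masters));
    }
}
```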
OPERATIONAL TASKS –
ADDING BACKUP MASTER (2/2)
 How to start multiple backup master processes
 Use the usual way to start a master process
 In the tm distribution
 Or explicitly start a backup master process
${HBASE_HOME}/bin/hbase-daemon.sh start master
${TM_PUPPET_HOME}/bin/services/startup-hmaster.sh [<host> ...]
${HBASE_HOME}/bin/hbase-daemon.sh start master --backup
OPERATIONAL TASKS –
ADDING A REGION SERVER
 In the normal HBase distribution
 Edit ${HBASE_HOME}/conf/regionservers
 Add the new region server's host name
 Two scripts can be used…
 ${HBASE_HOME}/bin/start-hbase.sh
 It skips the region servers that are already running, and starts the newly added region server listed in the regionservers file
 ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver
 Must be executed on the newly added region server
 In the tm distribution
 New feature, not covered here
DATA TASK
 You may be required to move the data as a whole or in parts
 To archive data for backup purposes
 To bootstrap another cluster
hadoop jar ${HBASE_HOME}/hbase-0.91.0-SNAPSHOT.jar
An example program must be given as the first argument.
Valid program names are:
…
completebulkload: Complete a bulk data load.
copytable: Export a table from local cluster to peer cluster
export: Write table data to HDFS.
import: Import data written by Export.
importtsv: Import data in TSV format.
…
http://hbase.apache.org/book/ops_mgt.html
DATA TASK – EXPORT (1/3)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
Usage: Export [-D <property=value>]* <tablename> <outputdir> 
[<versions> [<starttime> [<endtime>]]
DATA TASK - EXPORT (2/3)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export 
testtable /user/larsgeorge/backup-testtable
11/06/25 15:58:29 INFO mapred.JobClient: Running job: job_201106251558_0001
11/06/25 15:58:30 INFO mapred.JobClient: map 0% reduce 0%
…
11/06/25 15:59:40 INFO mapred.JobClient: map 100% reduce 0%
11/06/25 15:59:42 INFO mapred.JobClient: Job complete: job_201106251558_0001
11/06/25 15:59:42 INFO mapred.JobClient: Counters: 6
11/06/25 15:59:42 INFO mapred.JobClient: Job Counters
11/06/25 15:59:42 INFO mapred.JobClient: Rack-local map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: Launched map tasks=32
11/06/25 15:59:42 INFO mapred.JobClient: FileSystemCounters
11/06/25 15:59:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3648
11/06/25 15:59:42 INFO mapred.JobClient: Map-Reduce Framework
11/06/25 15:59:42 INFO mapred.JobClient: Map input records=0
11/06/25 15:59:42 INFO mapred.JobClient: Spilled Records=0
11/06/25 15:59:42 INFO mapred.JobClient: Map output records=0
DATA TASK - EXPORT (3/3)
 Each part-m-nnnnn file contains a piece of the exported data
 Together they form the full backup of the table
 Use the hadoop distcp command to move the directory from one cluster to another, and perform the import there
hadoop dfs -lsr /user/larsgeorge/backup-testtable
drwxr-xr-x - ... 0 2011-06-25 15:58 _logs
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00000
-rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00001
…
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00030
-rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00031
DATA TASK – IMPORT (1/2)
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
Usage: Import <tablename> <inputdir>
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import 
testtable /user/larsgeorge/backup-testtable
11/06/25 17:09:48 INFO mapreduce.TableOutputFormat: Created table instance 
for testtable
11/06/25 17:09:48 INFO input.FileInputFormat: Total input paths to process : 32
11/06/25 17:09:49 INFO mapred.JobClient: Running job: job_201106251558_0003
11/06/25 17:09:50 INFO mapred.JobClient: map 0% reduce 0%
11/06/25 17:10:04 INFO mapred.JobClient: map 6% reduce 0%
…
11/06/25 17:10:51 INFO mapred.JobClient: Job Counters
11/06/25 17:10:51 INFO mapred.JobClient: Launched map tasks=32
11/06/25 17:10:51 INFO mapred.JobClient: Data-local map tasks=32
11/06/25 17:10:51 INFO mapred.JobClient: FileSystemCounters
11/06/25 17:10:51 INFO mapred.JobClient: HDFS_BYTES_READ=3648
11/06/25 17:10:51 INFO mapred.JobClient: Map-Reduce Framework
11/06/25 17:10:51 INFO mapred.JobClient: Map input records=0
11/06/25 17:10:51 INFO mapred.JobClient: Spilled Records=0
11/06/25 17:10:51 INFO mapred.JobClient: Map output records=0
DATA TASK - IMPORT (2/2)
 You can use the Import job to store the data in a different table
 As long as it has the same schema
 Both the export and import commands are per-table only
 Using the hadoop distcp command to copy the entire /hbase directory in HDFS
 Not recommended
 It may copy store files that are halfway through a memstore flush operation
DATA TASK – COPYTABLE TOOL (1/2)
 Designed to bootstrap cluster replication
 Makes a copy of an existing table from the master cluster to the slave cluster
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
Usage: CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X]
[--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
DATA TASK – COPYTABLE TOOL (2/2)
 In this example, using --new.name, the copy of the table is stored on the same cluster
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable 
--new.name=testtable3 testtable
11/06/26 15:20:07 INFO mapreduce.TableOutputFormat: 
Created table instance for testtable3
11/06/26 15:20:07 INFO mapred.JobClient: Running job: job_201106261454_0003
11/06/26 15:20:08 INFO mapred.JobClient: map 0% reduce 0%
11/06/26 15:20:19 INFO mapred.JobClient: map 6% reduce 0%
…
11/06/26 15:21:04 INFO mapred.JobClient: map 100% reduce 0%
11/06/26 15:21:06 INFO mapred.JobClient: Job complete: job_201106261454_0003
11/06/26 15:21:06 INFO mapred.JobClient: Counters: 5
11/06/26 15:21:06 INFO mapred.JobClient: Job Counters
11/06/26 15:21:06 INFO mapred.JobClient: Launched map tasks=32
11/06/26 15:21:06 INFO mapred.JobClient: Data-local map tasks=32
11/06/26 15:21:06 INFO mapred.JobClient: Map-Reduce Framework
11/06/26 15:21:06 INFO mapred.JobClient: Map input records=0
11/06/26 15:21:06 INFO mapred.JobClient: Spilled Records=0
11/06/26 15:21:06 INFO mapred.JobClient: Map output records=0
DATA TASK – BULK IMPORT (1/2)
 Importtsv tool
 Takes files containing data in tab-separated value (TSV) format
 By default, it uses the HBase put() API to insert data into HBase one row at a time
 By setting the importtsv.bulk.output option, it generates files using HFileOutputFormat instead
 These can subsequently be bulk-loaded into HBase with the completebulkload tool
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
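For each input line, importtsv maps the tab-separated fields, by position, to the columns listed in importtsv.columns. The sketch assumes ImportTsv's convention that the special name HBASE_ROW_KEY marks the row key field. The class itself is hypothetical and leaves out everything else the real tool does, such as writing Puts or HFiles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the per-line column mapping (not the ImportTsv source).
final class TsvLineMapper {
    private TsvLineMapper() {}

    static final String ROW_KEY = "HBASE_ROW_KEY";

    /** Maps each tab-separated field to the column at the same position in columnsSpec. */
    static Map<String, String> map(String columnsSpec, String line) {
        String[] columns = columnsSpec.split(",");
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException(
                "expected " + columns.length + " fields, got " + fields.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            result.put(columns[i], fields[i]);
        }
        if (!result.containsKey(ROW_KEY)) {
            throw new IllegalArgumentException("importtsv.columns must contain " + ROW_KEY);
        }
        return result;
    }
}
```

With -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2, a line such as row1<TAB>v1<TAB>v2 becomes row row1 with the cells d:c1=v1 and d:c2=v2.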
DATA TASK – BULK IMPORT (2/2)
 completebulkload Tool
 Is used to import the data into the running cluster
 After a data import has been prepared
 By using the importtsv tool with the importtsv.bulk.output option
 Or by some other MapReduce job using HFileOutputFormat
hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar completebulkload 
-conf ~/my-hbase-site.xml /user/larsgeorge/myoutput mytable
TROUBLESHOOTING – HBASE FSCK (1/4)
 Shell command
 ${HBASE_HOME}/bin/hbase hbck
 Once started, it
 Scans the .META. table to gather all the pertinent information it holds
 Scans the HDFS root directory HBase is configured to use
 Compares the collected details to report on inconsistencies and integrity issues
 Consistency check
 Whether each region is listed in .META. and exists in HDFS
 And whether it is assigned to exactly one region server
 Integrity check
 Compares the regions with the table details to find missing regions
 And regions that have holes or overlaps in their row key ranges
69
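To make the integrity check concrete, here is a simplified Java sketch of how holes and overlaps can be found by walking one table's regions in start-key order. It is not hbck's actual implementation.

Assumptions:
- Keys are ASCII strings (HBase really compares raw bytes).
- The empty string stands for the open start and end of the table.
- RegionChainCheck and Region are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified hbck-style integrity check for one table. */
final class RegionChainCheck {

    /** A region's row key range: start inclusive, end exclusive; "" is the open table boundary. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /** Describes every hole and overlap found; an empty list means the key range is fully covered. */
    static List<String> check(List<Region> regions) {
        List<String> problems = new ArrayList<>();
        if (regions.isEmpty()) {
            problems.add("hole: table has no regions");
            return problems;
        }
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));

        // The first region must start at the beginning of the table.
        if (!sorted.get(0).startKey.isEmpty()) {
            problems.add("hole: [start of table, " + sorted.get(0).startKey + ")");
        }
        // Each region must start exactly where the previous one ends.
        for (int i = 1; i < sorted.size(); i++) {
            String prevEnd = sorted.get(i - 1).endKey;
            String start = sorted.get(i).startKey;
            if (prevEnd.isEmpty()) {
                problems.add("overlap: region starting at " + start + " follows an open-ended region");
            } else if (start.compareTo(prevEnd) > 0) {
                problems.add("hole: [" + prevEnd + ", " + start + ")");
            } else if (start.compareTo(prevEnd) < 0) {
                problems.add("overlap: region starting at " + start + " begins before " + prevEnd);
            }
        }
        // The last region must run to the end of the table.
        String lastEnd = sorted.get(sorted.size() - 1).endKey;
        if (!lastEnd.isEmpty()) {
            problems.add("hole: [" + lastEnd + ", end of table)");
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("", "b"), new Region("c", "e"), new Region("d", ""));
        check(regions).forEach(System.out::println);
    }
}
```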
TROUBLESHOOTING – HBASE FSCK (2/4)
70
${HBASE_HOME}/bin/hbase hbck -h
Usage: fsck [opts]
where [opts] are:
-details Display full report of all regions.
-timelag {timeInSeconds} Process only regions that have not experienced
any metadata updates in the last {{timeInSeconds} seconds.
-fix Try to fix some of the errors.
-sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking
if the fix worked if run with -fix
-summary Print only summary of the tables and status.
TROUBLESHOOTING – HBASE FSCK (3/4)
 Running it without any option prints the normal level of output detail
71
${HBASE_HOME}/bin/hbase hbck
Number of Tables: 40
Number of live region servers: 19
Number of dead region servers: 0
Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
Summary:
...
testtable2 is okay.
Number of regions: 1
Deployed on: host11.foo.com:60020
0 inconsistencies detected.
Status: OK
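When hbck runs from a script or cron job, its summary lines can be checked automatically. The hypothetical helper below assumes the output format of the sample above ("N inconsistencies detected." and "Status: OK"). Other HBase versions may word these lines differently, so treat the patterns as a starting point.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that extracts the verdict from captured hbck output. */
final class HbckSummary {
    // Matches lines such as "0 inconsistencies detected." at the start of a line.
    private static final Pattern INCONSISTENCIES =
        Pattern.compile("^(\\d+) inconsistenc", Pattern.MULTILINE);
    // Matches the final "Status: OK" line and captures the status word.
    private static final Pattern STATUS =
        Pattern.compile("^Status: (\\S+)", Pattern.MULTILINE);

    /** Returns the reported inconsistency count, or -1 if no summary line was found. */
    static int inconsistencies(String output) {
        Matcher m = INCONSISTENCIES.matcher(output);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True only if hbck printed "Status: OK" and reported zero inconsistencies. */
    static boolean isHealthy(String output) {
        Matcher m = STATUS.matcher(output);
        return m.find() && m.group(1).equals("OK") && inconsistencies(output) == 0;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "Number of Tables: 40",
            "Summary:",
            "testtable2 is okay.",
            "0 inconsistencies detected.",
            "Status: OK");
        System.out.println(inconsistencies(sample) + " " + isHealthy(sample)); // prints "0 true"
    }
}
```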
TROUBLESHOOTING – HBASE FSCK (4/4)
 ${HBASE_HOME}/bin/hbase hbck -fix
 Repairs the following issues
 Assigns .META. to a single new server if it is unassigned
 Reassigns .META. to a single new server if it is assigned to multiple servers
 Assigns a user table region to a new server if it is unassigned
 Reassigns a user table region to a single new server if it is assigned to multiple servers
 Reassigns a user table region to a new server if the current server does not match what the .META. table refers to
 hbck may report inconsistencies that are only temporary or transitional
 Rerun the tool a few times to confirm a permanent problem
72
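The advice to rerun hbck can be automated: treat a finding as permanent only if every one of several runs reports inconsistencies. In this hypothetical sketch, the IntSupplier stands for one hbck run that returns its inconsistency count (for example, parsed from its output). How it actually invokes hbck is left to the caller. hbck's own -sleepBeforeRerun option covers the related case of rechecking after -fix.

```java
import java.util.function.IntSupplier;

/** Hypothetical helper: distinguishes transitional hbck findings from permanent ones. */
final class HbckRerun {

    /**
     * Runs the check up to {@code runs} times, pausing between runs.
     * Returns false as soon as one run is clean, true if every run found problems.
     */
    static boolean isPermanentProblem(IntSupplier runHbck, int runs, long pauseMillis) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        for (int i = 0; i < runs; i++) {
            if (runHbck.getAsInt() == 0) {
                return false; // a clean run: the earlier report was only transitional
            }
            if (i < runs - 1) {
                sleep(pauseMillis); // give regions in transition time to settle
            }
        }
        return true; // every run reported inconsistencies
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to rerun hbck", e);
        }
    }

    public static void main(String[] args) {
        // Fake results standing in for three real hbck runs.
        int[] reported = {2, 0, 1};
        int[] next = {0};
        boolean permanent = isPermanentProblem(() -> reported[next[0]++], 3, 10);
        System.out.println(permanent ? "permanent problem" : "transitional, no action needed");
    }
}
```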
TROUBLESHOOTING – ANALYZING THE LOGS (1/2)
Server type | Default logfile | Logfile with tm settings
HBase Master | $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log | /var/log/hbase/hbase-<user>-master-<hostname>.log
HBase RegionServer | $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log | /var/log/hbase/hbase-<user>-regionserver-<hostname>.log
ZooKeeper | Console log output only | /var/log/hbase/hbase-<user>-zookeeper-<hostname>.log
NameNode | $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log | /var/log/hadoop/hadoop-<user>-namenode-<hostname>.log
DataNode | $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log | /var/log/hadoop/hadoop-<user>-datanode-<hostname>.log
JobTracker | $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log | /var/log/hadoop/hadoop-<user>-jobtracker-<hostname>.log
TaskTracker | $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log | /var/log/hadoop/hadoop-<user>-tasktracker-<hostname>.log
73
TROUBLESHOOTING – ANALYZING THE LOGS (2/2)
 It is useful to begin with the master logfile
 The master acts as the coordinator service of the entire cluster
 Find where the processes began logging ERROR level messages
 This lets you identify the root cause
 Many subsequent messages are often side effects of the original problem
 The error log event metric under the System Event Metrics group is recommended
 It gives you a graph showing where the server(s) started logging an increasing number of error messages in the logfiles
 If you find an error message
 Google it!
 Use online resources to search for the message in the public mailing lists
 E.g., Search Hadoop
74
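As a first step before reaching for graphs, the earliest ERROR line in the master logfile can be located with a few lines of Java.

Assumptions:
- Each log line starts with a date, a time, and the log level as separate whitespace-separated tokens (e.g. "2012-08-02 10:00:05,120 ERROR ..."). This matches common log4j layouts, but check it against your log4j.properties.
- LogErrorScan is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper for finding where a logfile started reporting errors. */
final class LogErrorScan {

    /** Assumes "<date> <time> <LEVEL> ..." lines; the level is the third token. */
    static boolean isError(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length >= 3 && tokens[2].equals("ERROR");
    }

    /** The earliest ERROR line is the best candidate for the root cause. */
    static Optional<String> firstError(List<String> lines) {
        return lines.stream().filter(LogErrorScan::isError).findFirst();
    }

    /** Total ERROR lines; later ones are often side effects of the first. */
    static long countErrors(List<String> lines) {
        return lines.stream().filter(LogErrorScan::isError).count();
    }

    public static void main(String[] args) throws IOException {
        // Pass a logfile path (e.g. the master log) or fall back to sample lines.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : List.of(
                "2012-08-02 10:00:01,001 INFO org.apache.hadoop.hbase.master.HMaster: started",
                "2012-08-02 10:00:05,120 ERROR org.apache.hadoop.hbase.master.HMaster: first failure",
                "2012-08-02 10:00:06,300 ERROR org.apache.hadoop.hbase.master.HMaster: follow-up failure");
        System.out.println(firstError(lines).orElse("no ERROR lines"));
        System.out.println(countErrors(lines) + " ERROR lines");
    }
}
```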
HANDS-ON – USE YCSB
 New VM list
 Because VMs are not affordable at present :p
 YOUR_HOME=${GIT_HOME}/hbase-training/006/hands-on/${YOUR_NAME}
 mkdir ${YOUR_HOME}
 cd ${YOUR_HOME}; cp -rf ../../ycsb/* .
 Use the HBase shell
 create '<YOUR_NAMED_TABLE>', 'family'
 Run YCSB with a record count of 5000 (-p recordcount=5000)
 And output the ycsb-load.log file
 Hands-on result
 Put the ycsb-load.log file under ${YOUR_HOME}
75

006 performance tuningandclusteradmin

  • 1. PERFORMANCE TUNING & CLUSTER ADMINISTRATION 2012/8/2 Scott Miao
  • 2. AGENDA  Course Credit  Performance Tuning  More…  Cluster Administration  More… 2
  • 3. COURSE CREDIT  Show up, 30 scores  Ask question, each question earns 5 scores  Hands-on, 40 scores  70 scores will pass this course  Each course credit will be calculated once for each course finished  The course credit will be sent to you and your supervisor by mail 3
  • 4. PERFORMANCE TUNING  Garbage Collection Tuning  MSLAB  Compression  Optimizing Splits and Compactions  Load Balancing  Merging Regions  Client API: Best Practices  Configuration  Load Tests 4
  • 5. GARBAGE COLLECTION TUNING  The process to rewrite the heap generation in question is called a garbage collection (GC)  GC parameters only need to be added to the region servers  JRE comes with basic assumptions  Regarding what your programs are doing, how they create objects, how they allocate the heap to handle data, and so on  These assumptions work well in a lot of cases  But NOT work well for HBase…  Especially write-heavy ones  It cannot safely rely on the JRE assumption alone 5
  • 7. 7
  • 8. GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (1/2)  Memstore flushes the data by the configured minimum flush size, hbase.hregion.memstore.flush.size  It leaves different size of holes in the heap  Data resided in different locations in the generational architecture of the Java heap  Depending on how long the data was in memory  Young generation (new generation)  The space can be reclaimed quickly and no harm is done  Old generation (tenured generation)  Data promoted to this location if it stays in memory for a longer period of time 8
  • 9. GARBAGE COLLECTION TUNING – WRITE-HEAVY USE CASES (2/2)  Reuse the holes created by data that has been written to disk  Requests a size of heap that does not fit into one of those holes  Needs to compact the fragmented heap  Young to Old  The promotion of longer-living objects from the young to the old generation  Old to Stop-The-World  There is no longer enough space for a young allocation caused by the fragmentation  Falls back to the stop-the-world garbage collector  Rewrites the entire heap space and compacts it to the remaining active objects  If this fails, you will see a promotion failure in your garbage collection logs 9
  • 10. 10 What is the Heap looks like ?
  • 11. GARBAGE COLLECTION TUNING – SPECIFY THE YOUNG GENERATION SIZE  Young generation  is between 128 MB and 512 MB  Old generation  holds the remaining available heap, which is usually many gigabytes of memory  Using 128 MB is a good starting point  Further observation of the JVM metrics should be conducted  Specify the young generation size like so  -XX:MaxNewSize=128m -XX:NewSize=128m  One convenient option  -Xmn128m 11
  • 12. GARBAGE COLLECTION TUNING – GC OPTIONS SETTING  GC Options setting for HBase  Adding them in the hbase-env.sh configuration file  HBASE_OPTS variable for all HBase  HBASE_REGIONSERVER_OPTS variable for all region servers  Enable the JRE’s log output for garbage collection details  Monitor it for occurrences of  "concurrent mode failure" or "promotion failed" messages 12 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log"
  • 13. GARBAGE COLLECTION TUNING – GC STRATEGY FOR YOUNG GENERATION  Recommended value for young generation  -XX:+UseParNewGC  Use the Parallel New Collector  It stops the entire Java process to clean up the young generation heap  Since Young generation’s size is small in comparison  Usually less than a few hundred milliseconds 13
  • 14. GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION  Recommended value for old generation  -XX:+UseConcMarkSweepGC  Use the Concurrent Mark-Sweep Collector (CMS)  It tries to do as much work concurrently as possible, without stopping the Java process  It takes extra effort and an increased CPU load  Avoids the required stops to rewrite a fragmented old generation heap  If you hit the promotion error  It falls back to stop-the-world again 14
  • 15. GARBAGE COLLECTION TUNING – GC STRATEGY FOR OLD GENERATION  A switch for CMS  -XX:CMSInitiatingOccupancyFraction=70  A percentage that specifies when the background process starts  Avoids the concurrent mode failure  The background process to mark and sweep the heap for collection is still running when the heap runs out of usable space  Falls back to stop-the-world again  Initiating occupancy fraction to 70%  20% block cache + 40% memstore limits = 60%, by default  Starts the background process at appropriate time  Early enough, and not too early 15
  • 16. GARBAGE COLLECTION TUNING - SUMMARY  Recommended GC options  The Alex Su’s GC options  GC Options Reference 16 export HBASE_REGIONSERVER_OPTS= "-Xmx8g -Xms8g -Xmn128m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$HBASE_HOME/logs/gc-$(hostname)-hbase.log" -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<%= hbase_log_path %>/hbase-regionserver-gc-`date +%F-%H-%M-%S`.log -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:PrintFLSStatistics=1 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<%= hbase_log_path %>/hbase-regionserver.hprof http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp- 140102.html
  • 17. MSLAB - QUESTION  For solving the stop-the-world issue  Stop-the-world  The key to reducing these compacting collections is to reduce fragmentation  Only objects of exactly the same size should be allocated from the heap  Subsequent allocations of new objects of the exact same size will always reuse these holes  No promotion error, and therefore no stop-the-world compacting collection is required 17
  • 18. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (1/3)  Are buffers of fixed sizes containing KeyValue instances of varying sizes 1. A buffer cannot completely fit a newly added KeyValue, it is considered full 2. And a new buffer is created, once again of the given fixed size  Enabled by default in version 0.92  Disabled in version 0.90 of HBase  hbase.hregion.memstore.mslab.enabled property  It is recommended that test your setup with this feature 18
  • 19. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (2/3)  The size of each allocated, fixed-sized buffer  hbase.hregion.memstore.mslab.chunksize property  Default is 2 MB  Based on your KeyValue instances, you may have to adjust this value  E.g., 100 KB in size, you need to increase the MSLAB size to fit more than just a few cells  An upper boundary of what is stored in the buffers  hbase.hregion.memstore.mslab.max.allocation property  Default 256 KB  Any cell (KeyValue) that is larger will be directly allocated in the Java heap 19
  • 20. MSLAB – MEMSTORE-LOCAL ALLOCATION BUFFER (3/3)  MSLAB do not come without a cost  More wasteful in regard to heap usage  Most likely not fill every buffer to the last byte  A Tradeoff  Use MSLABs and benefit from better garbage collection but incur the extra space that is required  NOT use MSLABs and benefit from better memory efficiency but deal with the problem caused by garbage collection pauses  Could plan to restart the servers every few days, or weeks, before the pause happens  The buffers require an additional byte array copy operation, therefore slightly slower  Measure the impact on your workload 20
  • 21. COMPRESSION  A number of compression algorithms that can be enabled at the column family level  It is recommended  Enable compression unless you have a reason not to do so  For example, when using already compressed content, such as JPEG images  Compression usually will yield overall better performance  The overhead of the CPU performing the compression /de-compression is less than what is required to read more data from disk 21
  • 22. COMPRESSION – AVAILABLE CODECS  It is recommended  Snappy/Zippy (in Bigtable)  Released by Google under the BSD License  Ships with the required JNI libraries to be able to use it in HBase-0.92  Must install the native binary library on all region servers  LZO (Lempel-Ziv-Oberhumer)  A lossless data compression algorithm that is focused on decompression speed, and written in ANSI C  HBase cannot ship with LZO because of licensing issues  incompatible GNU General Public License (GPL)  LZO installation needs to be performed separately, after HBase has been installed 22 http://norfolk.cs.washington.edu/htbin-post/unrestricted/colloq/details.cgi?id=437
  • 23. COMPRESSION – COMPRESSION TEST TOOL  Use command  hbase org.apache.hadoop.hbase.util.CompressionTest <path> <none|gz|lzo|snappy>  Example  ./bin/hbase org.apache.hadoop.hbase.util.CompressionTest /user/larsgeorge/test.gz gz  It will return result based on the test  If success  If failed 23 … SUCCESS Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec …
  • 24. COMPRESSION – STARTUP CHECK  A fast failing setup notices the missing libraries  Instead of running into issues later  For example, check the Snappy and LZO compression libraries  The server will abort at startup with an IOException stating  "Compression codec <codec-name> not supported, aborting RS construction"  Copy the changed configuration file to all region servers and to restart them afterward 24 <property> <name>hbase.regionserver.codecs</name> <value>snappy,lzo</value> </property>
  • 25. COMPRESSION – ENABLING COMPRESSION  Install the JNI libraries  Install native compression libraries  Specifying the chosen algorithm in the column family schema  In HBase shell  create 'testtable', { NAME => 'colfam1', COMPRESSION => 'GZ' }  In API  HColumnDescriptor.setCompressionType(…)  Refer to ppt#003, p#11 25
  • 26. OPTIMIZING SPLITS AND COMPACTIONS - SPLIT/COMPACTION STORMS  Grow your regions roughly at the same rate  Eventually they all need to be split at about the same time  A large spike in disk I/O because of the required compactions to rewrite the split region  Refer to ppt#004, p#13 26
  • 27. OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (1/2)  you can turn it off and manually invoke the split and major_compact commands  Setting Region Maximum File Size  hbase.hregion.max.filesize property for the entire cluster  table level by API  HTableDescriptor.setMaxFileSize(…)  Refer to ppt#003, p#7  To a very high number  Better to set this value to a reasonable upper boundary  Such as 100GB  Long.MAX_VALUE is not recommended in case the manual splits fail to run  Then you can time-control them  Running them staggered across all regions  Spreads the I/O load as much as possible, avoiding any split/compaction storm  Use HBase shell + cron  Or write your own codes with HBase Admin API supports  Refer to #003, p#21 27
  • 28. OPTIMIZING SPLITS AND COMPACTIONS – MANAGED SPLITTING (2/2)  RegionSplitter Class (added in version 0.90.2)  Another way to split existing regions  Rolling split feature  Split the existing regions while waiting long enough for the involved compactions to complete  API docs  An additional advantage  Have better control over which regions are available at any time  In rare case, you need to do very low-level debugging  With automated splits, it is hard to debug !!  Due to this region is split to two daughter regions 28
  • 29. OPTIMIZING SPLITS AND COMPACTIONS – REGION HOTSPOTTING  You may be dealing with a write pattern that is causing a specific region to run hot  Use Region Server Metrics to observe  Refer to ppt#005, p#12  Key design approaches  Salt keys, random keys, etc  Refer to ppt#004, p#52  Other only way to alleviate this situation  Manually split a hot region into one or more new regions, at exact boundaries  You can specify any row key within specific region  Be able to generate halves that are completely different in size  Refer ppt#003, p#21  This can not dealing with completely sequential key ranges  Those are always going to hit one region for a considerable amount of time 29
  • 30. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (1/3)  Manage splits manually is useful  Therefore start with a larger number of regions right from the table creation  Means to create a table with the required number of regions  Three ways…  HBase shell  create, refer to ppt#003, p#37  API  HBaseAdmin.createTable(…), refer to ppt#003, p#16  RegionSplitter Class  By default, MD5StringSplit class to partition the row keys into ranges  Use -D split.algorithm=<your-algorithm-class> for other implementation 30 /bin/hbase org.apache.hadoop.hbase.util.RegionSplitter usage: RegionSplitter <TABLE>
  • 31. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (2/3)  RegionSplitter with MD5StringSplit sample 31 testtable,,1309766006467.c0937d09f1da31f2a6c2950537a61093. testtable,0ccccccc,1309766006467.83a0a6a949a6150c5680f39695450d8a. testtable,19999998,1309766006467.1eba79c27eb9d5c2f89c3571f0d87a92. testtable,26666664,1309766006467.7882cd50eb22652849491c08a6180258. testtable,33333330,1309766006467.cef2853e36bd250c1b9324bac03e4bc9. testtable,3ffffffc,1309766006467.00365940761359fee14d41db6a73ffc5.
  • 32. OPTIMIZING SPLITS AND COMPACTIONS – PRESPLITTING REGIONS (3/3)  How many presplit regions ?  Start low with 10 presplit regions per server and watch as data grows over time  It is better to err on the side of too few regions and using a rolling split later  If Presplit regions to thin  Increase hbase.hregion.majorcompaction property  Refet to ppt#004, p# 19  If data size grows too large  Use the RegionSplitter utility to perform a rolling split of all regions  The main objective is to avoid split/compaction storm 32
  • 33. LOAD BALANCING – BALANCER (1/3)  The master has a built-in feature  Called the balancer  By default, runs every five minutes  hbase.balancer.period property  Attempts to equal out the number of assigned regions per region server  Within one region of the average number per server  Determines a new assignment plan  Describes which regions should be moved where starts the process of moving the regions by calling the unassign() method  Refer to ppt#003, p#22 33
  • 34. LOAD BALANCING - BALANCER (2/3)  balancer has an upper limit on how long it is allowed to run  hbase.balancer.max.balancing property  defaults to half of the balancer period value  2.5 mins  The balancer switch  Toggle the balancer status between enabled and disabled  HBase shell  balance_switch command, refer to ppt#003, p#39  balanceSwitch() API method, refer to ppt#003, p#22 34
  • 35. LOAD BALANCING - BALANCER (3/3)  Can be explicitly started  HBase shell  balancer command, refer to ppt#003, p#39  balancer() API method, refer to ppt#003, p#22  Return true  Any work has be done  Return false  balancer was switched off  No work to be done  balancer was not able to run the balancer  There is a region currently in transition, the balancer will be skipped 35
  • 36. LOAD BALANCING - MOVE  Can also use the move  To assign regions to other servers  HBase shell  move command, refer to ppt#003, p#39  move() API method, refer to ppt#003, p#22 36
  • 37. MERGING REGIONS  Sometimes you may need to merge regions  For example, after you have removed a large amount of data and you want to reduce the number of regions hosted by each server  HBase allows you to merge two adjacent regions  The HBase cluster must be offline, but HDFS 37 /bin/hbase org.apache.hadoop.hbase.util.Merge Usage: bin/hbase merge <table-name> <region-1> <region-2>
  • 38. CLIENT API: BEST PRACTICES (1/3)  Disable auto-flush  When performing a lot of put operations  Refer to ppt#002, p#9  Use scanner-caching  Set Scan.setCaching() method to something greater than the default of 1 if needed  Refer to ppt#002, p#26  Limit scan scope  If only a small number of the available columns are to be processed, only those should be specified in the input scan  For example, use Scan.addFamily() method  Refer to ppt#002, p#24 38
  • 39. CLIENT API: BEST PRACTICES (2/3)  Close ResultScanners  Avoiding performance problems  This may cause problems on the region servers  Refer to ppt#002, p#25  Block cache usage  Scan instances can be set to use the block cache in the region server via the setCacheBlocks() method  true by default, default settings of the table and family are used  API docs  Server side block cache settings  Refer to ppt#003, p#12 39
  • 40. CLIENT API: BEST PRACTICES (3/3)  Optimal loading of row keys  When performing a table scan where only the row keys are needed  a FilterList with a MUST_PASS_ALL operator + FirstKeyOnlyFilter + KeyOnlyFilter  Refer to ppt#002, p#43 & 46  Turn off WAL on Puts  Increasing throughput on Puts is to call writeToWAL(false), there might be data loss  Consider to use the bulk loading techniques instead 40
  • 41. CONFIGURATION (1/6)  Advanced options you can consider adjusting based on your use case  Most properties are configured in hbase-site.xml  Others are in hbase-env.sh  Decrease ZooKeeper timeout  The default timeout between a region server and the ZooKeeper quorum is three minutes  Tune the timeout down to a minute, or even less, so the master notices failures sooner  zookeeper.session.timeout property  Be careful of ―Juliet Pause‖ 41
  • 42. CONFIGURATION (2/6)  Increase handlers  The number of threads that are kept open to answer incoming requests to user tables  By default is 10  hbase.regionserver.handler.count property  Keep this number low when the payload per request approaches megabytes  And high when the payload is small  Increase heap settings  HBASE_HEAPSIZE setting in hbase-env.sh file  Consider using HBASE_REGIONSERVER_OPTS instead of changing the global HBASE_HEAP SIZE  Region servers may need more memory than Master 42
  • 43. CONFIGURATION (3/6)  Enable data compression  Should enable compression for the storage files  In most cases, boosts performance  Increase region size  Consider going to larger regions to cut down on the total number of regions on your cluster  Fewer regions to manage makes for a smoother-running cluster 43
  • 44. CONFIGURATION (4/6)  Adjust block cache size  The amount of heap used for the block cache is specified as a percentage  Defaults to 20%  perf.hfile.block.cache.size property  It is good if you have mainly reading workloads  Adjust memstore limits  Memstore heap usage  hbase.regionserver.global.memstore.upperLimit property  Defaults to 40%  hbase.regionserver.global.memstore.lowerLimit property  Defaults to 35%  Control the amount of flushing that will take place once the server is required to free heap space  Mainly read-oriented workloads  Consider reducing both limits to make more room for the block cache  Handling many writes  Increase the memstore limits to reduce the excessive amount of I/O this causes 44
  • 45. CONFIGURATION (5/6)  Increase blocking store files  The region servers block further updates from clients to give compactions time to reduce the number of files  Default is seven files  hbase.hstore.blockingStoreFiles property  Increase block multiplier  A safety latch that blocks any further updates from clients when the memstores exceed the multiplier * flush size limit  hbase.hregion.memstore.block.multiplier property  Default to 2  If you have enough memory, can increase this value to handle spikes more gracefully  Refer to ppt#003, p#8 45
  • 46. CONFIGURATION (6/6)  Decrease maximum logfiles  How often flushes occur based on the number of WAL files on disk  Default is 32  hbase.regionserver.maxlogs property  Can be high in a write-heavy use case  Lower it to force the servers to flush data more often to disk 46
  • 47. LOAD TESTS  It is advisable to run performance tests to verify functionality of your cluster  These tests give you a baseline which you can refer to  After making changes to the configuration of the cluster  Or the schemas of your tables  Doing a burn-in of your cluster  Show you how much you can gain from it  But this does not replace a test with the load as expected from your use case 47
  • 48. LOAD TESTS – PERFORMANCE EVALUATION (1/2)  HBase ships with its own tool to execute a performance evaluation  Performance Evaluation (PE)  Wiki  http://wiki.apache.org/hadoop/Hbase/PerformanceEvalu ation 48 /bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation Usage: java org.apache.hadoop.hbase.PerformanceEvaluation [--miniCluster] [--nomapred] [--rows=ROWS] <command> <nclients>
  • 49. LOAD TESTS – PERFORMANCE EVALUATION (2/2)  Example 49 /bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1 11/07/03 13:18:34 INFO hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest at offset 0 for 1048576 rows ... 11/07/03 13:18:41 INFO hbase.PerformanceEvaluation: 0/104857/1048576 ... 11/07/03 13:18:45 INFO hbase.PerformanceEvaluation: 0/209714/1048576 ... 11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: 0/1048570/1048576 11/07/03 13:20:03 INFO hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest in 89062ms at offset 0 for 1048576 rows
  • 50. LOAD TESTS – YCSB (1/2)
    - Yahoo! Cloud Serving Benchmark* (YCSB)
      - A suite of tools that can be used to run comparable workloads against different storage systems
      - Also a reasonable tool for performing an HBase cluster burn-in or performance test
    - YCSB is preferred over the HBase-supplied Performance Evaluation because it
      - Offers more options
      - Can combine read and write workloads
    - Home page: http://research.yahoo.com/Web_Information_Management/YCSB
  • 51. LOAD TESTS – YCSB (2/2)
    - In the HBase shell, create the table
      - create 'usertable', 'family'
    - git pull
    - cd ${GIT_HOME}/hbase-training/006/ycsb
    - Run the command
        java -cp "${HBASE_CONF_DIR}:core-0.1.4.jar:hbase-binding-0.1.4.jar" com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p columnfamily=family -p recordcount=1000 -s > ycsb-load.log
    - Then you can see the performance metrics in the ycsb-load.log file
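The metrics in ycsb-load.log can be pulled out for comparison between runs. This sketch assumes YCSB's plain-text result format, with lines such as `[OVERALL], Throughput(ops/sec), 810.37`; the helper name is hypothetical.

```shell
# Hypothetical helper: print one metric from the "[OVERALL]" section of a YCSB
# result file. Assumes comma-plus-space separated lines like
#   [OVERALL], RunTime(ms), 1234.0
# $1 = result file (e.g. ycsb-load.log), $2 = metric name (e.g. "Throughput(ops/sec)")
ycsb_overall() {
  awk -F', ' -v metric="$2" '$1 == "[OVERALL]" && $2 == metric { print $3 }' "$1"
}
```

`ycsb_overall ycsb-load.log 'Throughput(ops/sec)'` prints the overall throughput, and `ycsb_overall ycsb-load.log 'RunTime(ms)'` the total run time.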
  • 52. CLUSTER ADMINISTRATION
    - Operational Tasks
      - Node Decommission
      - Rolling Restarts
      - Adding Backup Master
      - Adding a Region Server
    - Data Task
      - Export
      - Import
      - CopyTable Tool
      - Bulk Import
    - Troubleshooting
      - HBase Fsck
      - Analyzing the Logs
  • 53. OPERATIONAL TASKS – NODE DECOMMISSION (1/2)
    - Use the following script
      - In the normal HBase distribution
          ${HBASE_HOME}/bin/hbase-daemon.sh stop regionserver
      - In the tm distribution
          ${TM_PUPPET_HOME}/bin/services/shutdown-regionservers.sh [<host> ...]
    - Disable the load balancer before decommissioning a node
      - In the HBase shell: balance_switch false
    - Regions could be offline for a long period of time, especially when the server hosts many regions
      - All regions are closed first
      - They are reassigned only after the master notices that the region server's ZooKeeper znode has been removed
  • 54. OPERATIONAL TASKS – NODE DECOMMISSION (2/2)
    - Stop a region server gradually
      - The node gradually sheds its load and then shuts itself down
      - Available since HBase 0.90.2
      - ${HBASE_HOME}/bin/graceful_stop.sh
    - Example
        ${HBASE_HOME}/bin/graceful_stop.sh HOSTNAME
      - Check the HOSTNAME on your HBase master UI
        - Refer to ppt#003, p#41
      - IP addresses are NOT supported at present
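The two decommission steps, disabling the balancer and then running graceful_stop.sh with a host name, can be wrapped in a small plan printer. This hypothetical POSIX shell helper only prints the commands so they can be reviewed first; it rejects bare IPv4 addresses because IP addresses are not supported by graceful_stop.sh.

```shell
# Hypothetical helper: print (not run) the commands that decommission one region
# server. $1 = host name as shown in the HBase master UI.
decommission_plan() {
  host=$1
  if [ -z "$host" ]; then
    echo "usage: decommission_plan HOSTNAME" >&2
    return 1
  fi
  case "$host" in
    *[!0-9.]*) ;;  # contains something other than digits and dots: a host name
    *)
      echo "graceful_stop.sh needs a host name, not an IP address: $host" >&2
      return 1
      ;;
  esac
  # Single quotes / escaped $ keep ${HBASE_HOME} literal in the printed plan.
  echo 'echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell'
  echo "\${HBASE_HOME}/bin/graceful_stop.sh $host"
}
```

Review the output of `decommission_plan host11.foo.com`, then run it with `decommission_plan host11.foo.com | sh` on a machine where HBASE_HOME is set.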
  • 55. OPERATIONAL TASKS – ROLLING RESTARTS
    - Also uses graceful_stop.sh
    - Steps as follows
      1. Ensure the cluster is consistent; fix it if it is inconsistent
           hbase hbck
           hbase hbck -fix
      2. Restart the master
           ${HBASE_HOME}/bin/hbase-daemon.sh stop master; ${HBASE_HOME}/bin/hbase-daemon.sh start master
      3. Disable the region balancer
           echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell
      4. Run the graceful_stop.sh script per region server (from ${HBASE_HOME})
           for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh --restart --reload --debug $i; done &> /tmp/log.txt &
      5. Restart the master again
         - This clears out the dead servers list and re-enables the balancer
      6. Run hbck to ensure the cluster is consistent
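The rolling-restart steps can also be generated for every host in conf/regionservers. The sketch below is a hypothetical helper that prints, rather than executes, the sequence; it skips blank and comment lines and keeps the sorted host order used by the loop in step 4.

```shell
# Hypothetical helper: print (not run) the rolling-restart sequence.
# $1 = regionservers file (defaults to conf/regionservers, relative to HBASE_HOME)
rolling_restart_plan() {
  rs_file=${1:-conf/regionservers}
  echo 'hbase hbck'
  echo '# if inconsistent: hbase hbck -fix'
  echo '${HBASE_HOME}/bin/hbase-daemon.sh stop master; ${HBASE_HOME}/bin/hbase-daemon.sh start master'
  echo 'echo "balance_switch false" | ${HBASE_HOME}/bin/hbase shell'
  # One graceful restart per host, in sorted order, ignoring blanks and comments.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$rs_file" | sort |
    while read -r host; do
      echo "\${HBASE_HOME}/bin/graceful_stop.sh --restart --reload --debug $host"
    done
  echo '${HBASE_HOME}/bin/hbase-daemon.sh stop master; ${HBASE_HOME}/bin/hbase-daemon.sh start master'
  echo 'hbase hbck'
}
```

Running it against the regionservers file lets you check the host order before executing anything; the `-fix` step is printed as a comment because it should only be run when hbck reports problems.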
  • 56. OPERATIONAL TASKS – ADDING BACKUP MASTER (1/2)
    - Prevents the master from being a single point of failure
      - If the machine hosting the active master fails, the system can fall back to a backup master
    - Underlying operations
      1. A dedicated ZooKeeper znode, /hbase/master
      2. All master processes race to create it, and the first one to create it wins (becomes the active master)
         - This happens at startup
      3. All other master processes simply loop around the znode check and wait for it to disappear
         - Its disappearance triggers the race again
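The race in steps 2 and 3 can be illustrated locally with mkdir, which atomically fails if the directory already exists. This is only an analogy: a directory stands in for the /hbase/master znode and the function names are hypothetical; real HBase relies on a ZooKeeper ephemeral znode that disappears when the active master's session ends.

```shell
# Illustration only: "first to create wins" election using mkdir as the atomic
# create. $1 = lock directory standing in for /hbase/master, $2 = candidate name.
try_become_master() {
  if mkdir "$1" 2>/dev/null; then
    echo "$2" > "$1/owner"
    echo "$2: active master"
  else
    echo "$2: backup master (znode already taken by $(cat "$1/owner" 2>/dev/null))"
    return 1
  fi
}

# Stand-in for the znode disappearing when the active master goes away.
release_master() {
  rm -rf "$1"
}
```

Calling `try_become_master /tmp/master m1` and then `try_become_master /tmp/master m2` makes m1 active and m2 a backup; after `release_master /tmp/master`, m2 wins the rerun, mirroring how the race is triggered again in step 3.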
  • 57. OPERATIONAL TASKS – ADDING BACKUP MASTER (2/2)
    - How to start multiple backup master processes
      - Start a master process the usual way
          ${HBASE_HOME}/bin/hbase-daemon.sh start master
      - In the tm distribution
          ${TM_PUPPET_HOME}/bin/services/startup-hmaster.sh [<host> ...]
      - Explicitly start a backup master process
          ${HBASE_HOME}/bin/hbase-daemon.sh start master --backup
  • 58. OPERATIONAL TASKS – ADDING A REGION SERVER
    - In the normal HBase distribution
      - Edit ${HBASE_HOME}/conf/regionservers
        - Add the new region server's host name
      - Either of two scripts can be used
        - ${HBASE_HOME}/bin/start-hbase.sh
          - Skips the region servers that are already running and starts the newly added one listed in the regionservers file
        - ${HBASE_HOME}/bin/hbase-daemon.sh start regionserver
          - Must be executed on the newly added region server
    - In the tm distribution
      - New feature; not covered here
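Editing conf/regionservers by hand is easy to get wrong when the step is scripted. The hypothetical helper below appends the new host name only if it is not already listed, and makes sure it lands on its own line even if the file lacks a trailing newline.

```shell
# Hypothetical helper: idempotently add a host to a regionservers file.
# $1 = path to the regionservers file, $2 = host name of the new region server
add_regionserver_host() {
  if [ -z "$2" ]; then
    echo "usage: add_regionserver_host FILE HOST" >&2
    return 1
  fi
  touch "$1"
  if grep -qxF "$2" "$1"; then
    echo "$2 is already listed in $1"
    return 0
  fi
  # If the last byte is not a newline, add one so the host gets its own line.
  if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
    echo >> "$1"
  fi
  printf '%s\n' "$2" >> "$1"
  echo "added $2 to $1"
}
```

After `add_regionserver_host ${HBASE_HOME}/conf/regionservers newhost.foo.com`, start the server with either of the two scripts on the slide.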
  • 59. DATA TASK
    - You may be required to move the data as a whole or in parts
      - To archive data for backup purposes
      - To bootstrap another cluster

    hadoop jar ${HBASE_HOME}/hbase-0.91.0-SNAPSHOT.jar
    An example program must be given as the first argument.
    Valid program names are:
      …
      completebulkload: Complete a bulk data load.
      copytable: Export a table from local cluster to peer cluster
      export: Write table data to HDFS.
      import: Import data written by Export.
      importtsv: Import data in TSV format.
      …

    http://hbase.apache.org/book/ops_mgt.html
  • 60. DATA TASK – EXPORT (1/3)

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
    Usage: Export [-D <property=value>]* <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]
  • 61. DATA TASK – EXPORT (2/3)

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export testtable /user/larsgeorge/backup-testtable
    11/06/25 15:58:29 INFO mapred.JobClient: Running job: job_201106251558_0001
    11/06/25 15:58:30 INFO mapred.JobClient: map 0% reduce 0%
    …
    11/06/25 15:59:40 INFO mapred.JobClient: map 100% reduce 0%
    11/06/25 15:59:42 INFO mapred.JobClient: Job complete: job_201106251558_0001
    11/06/25 15:59:42 INFO mapred.JobClient: Counters: 6
    11/06/25 15:59:42 INFO mapred.JobClient: Job Counters
    11/06/25 15:59:42 INFO mapred.JobClient: Rack-local map tasks=32
    11/06/25 15:59:42 INFO mapred.JobClient: Launched map tasks=32
    11/06/25 15:59:42 INFO mapred.JobClient: FileSystemCounters
    11/06/25 15:59:42 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=3648
    11/06/25 15:59:42 INFO mapred.JobClient: Map-Reduce Framework
    11/06/25 15:59:42 INFO mapred.JobClient: Map input records=0
    11/06/25 15:59:42 INFO mapred.JobClient: Spilled Records=0
    11/06/25 15:59:42 INFO mapred.JobClient: Map output records=0
  • 62. DATA TASK – EXPORT (3/3)
    - Each part-m-nnnnn file contains a piece of the exported data
      - Together they form the full backup of the table
    - Use the hadoop distcp command to move the directory from one cluster to another, and perform the import there

    hadoop dfs -lsr /user/larsgeorge/backup-testtable
    drwxr-xr-x - ... 0 2011-06-25 15:58 _logs
    -rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00000
    -rw-r--r-- 1 ... 114 2011-06-25 15:58 part-m-00001
    …
    -rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00030
    -rw-r--r-- 1 ... 114 2011-06-25 15:59 part-m-00031
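Export's optional <starttime> and <endtime> arguments make incremental backups possible. The sketch below is a hypothetical helper that prints such an Export command; it takes epoch-second inputs and multiplies them by 1000 because HBase cell timestamps are in milliseconds, and it reuses the jar name from these slides, which you would adjust to your installation. Per the usage line, <versions> has to be given before a time range can be.

```shell
# Hypothetical helper: print an Export command restricted to a time window.
# $1 = table, $2 = output dir, $3 = versions,
# $4 = start (epoch seconds), $5 = end (epoch seconds); start must be < end.
export_window_cmd() {
  if [ "$#" -ne 5 ] || [ "$4" -ge "$5" ]; then
    echo "usage: export_window_cmd TABLE OUTDIR VERSIONS START_SEC END_SEC (START < END)" >&2
    return 1
  fi
  # Seconds -> milliseconds to match HBase timestamps.
  echo "hadoop jar \$HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export $1 $2 $3 $(( $4 * 1000 )) $(( $5 * 1000 ))"
}
```

`export_window_cmd testtable /user/larsgeorge/backup-testtable 1 1309000000 1309086400` prints an export command covering one day of data with one version per cell.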
  • 63. DATA TASK – IMPORT (1/2)

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
    Usage: Import <tablename> <inputdir>

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import testtable /user/larsgeorge/backup-testtable
    11/06/25 17:09:48 INFO mapreduce.TableOutputFormat: Created table instance for testtable
    11/06/25 17:09:48 INFO input.FileInputFormat: Total input paths to process : 32
    11/06/25 17:09:49 INFO mapred.JobClient: Running job: job_201106251558_0003
    11/06/25 17:09:50 INFO mapred.JobClient: map 0% reduce 0%
    11/06/25 17:10:04 INFO mapred.JobClient: map 6% reduce 0%
    …
    11/06/25 17:10:51 INFO mapred.JobClient: Job Counters
    11/06/25 17:10:51 INFO mapred.JobClient: Launched map tasks=32
    11/06/25 17:10:51 INFO mapred.JobClient: Data-local map tasks=32
    11/06/25 17:10:51 INFO mapred.JobClient: FileSystemCounters
    11/06/25 17:10:51 INFO mapred.JobClient: HDFS_BYTES_READ=3648
    11/06/25 17:10:51 INFO mapred.JobClient: Map-Reduce Framework
    11/06/25 17:10:51 INFO mapred.JobClient: Map input records=0
    11/06/25 17:10:51 INFO mapred.JobClient: Spilled Records=0
    11/06/25 17:10:51 INFO mapred.JobClient: Map output records=0
  • 64. DATA TASK – IMPORT (2/2)
    - The Import job can store the data in a different table
      - As long as it has the same schema
    - Both the export and import commands work on a single table only
    - Using the hadoop distcp command to copy the entire /hbase directory in HDFS
      - Is not recommended
      - It may copy store files that are halfway through a memstore flush operation
  • 65. DATA TASK – COPYTABLE TOOL (1/2)
    - Designed to bootstrap cluster replication
      - Makes a copy of an existing table from the master cluster to the slave cluster

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
    Usage: CopyTable [--rs.class=CLASS] [--rs.impl=IMPL] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
  • 66. DATA TASK – COPYTABLE TOOL (2/2)
    - Without --peer.adr, as in this example, the copy of the table is stored on the same cluster

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable --new.name=testtable3 testtable
    11/06/26 15:20:07 INFO mapreduce.TableOutputFormat: Created table instance for testtable3
    11/06/26 15:20:07 INFO mapred.JobClient: Running job: job_201106261454_0003
    11/06/26 15:20:08 INFO mapred.JobClient: map 0% reduce 0%
    11/06/26 15:20:19 INFO mapred.JobClient: map 6% reduce 0%
    …
    11/06/26 15:21:04 INFO mapred.JobClient: map 100% reduce 0%
    11/06/26 15:21:06 INFO mapred.JobClient: Job complete: job_201106261454_0003
    11/06/26 15:21:06 INFO mapred.JobClient: Counters: 5
    11/06/26 15:21:06 INFO mapred.JobClient: Job Counters
    11/06/26 15:21:06 INFO mapred.JobClient: Launched map tasks=32
    11/06/26 15:21:06 INFO mapred.JobClient: Data-local map tasks=32
    11/06/26 15:21:06 INFO mapred.JobClient: Map-Reduce Framework
    11/06/26 15:21:06 INFO mapred.JobClient: Map input records=0
    11/06/26 15:21:06 INFO mapred.JobClient: Spilled Records=0
    11/06/26 15:21:06 INFO mapred.JobClient: Map output records=0
  • 67. DATA TASK – BULK IMPORT (1/2)
    - importtsv tool
      - Takes files containing data in tab-separated value (TSV) format
      - By default, it uses the HBase put() API to insert data into HBase one row at a time
      - By setting the importtsv.bulk.output option, it instead generates files using HFileOutputFormat
        - These can subsequently be bulk-loaded into HBase by the completebulkload tool

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar importtsv
    Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
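Before running importtsv, it can help to check that every input line has as many tab-separated fields as importtsv.columns lists. This hypothetical POSIX shell check does that; it assumes the conventional column list in which HBASE_ROW_KEY marks the row key field, as in `HBASE_ROW_KEY,d:c1,d:c2`.

```shell
# Hypothetical helper: compare each TSV line's field count with the number of
# comma-separated entries in the importtsv.columns value; print offending lines.
# $1 = importtsv.columns value, $2 = TSV file. Returns 1 if any line mismatches.
check_tsv_columns() {
  expected=$(printf '%s\n' "$1" | awk -F',' '{ print NF }')
  awk -F'\t' -v n="$expected" '
    NF != n { print "line " NR ": " NF " fields, expected " n; bad = 1 }
    END { exit bad }' "$2"
}
```

`check_tsv_columns HBASE_ROW_KEY,d:c1,d:c2 input.tsv` prints nothing and returns 0 when the file is consistent, and lists the offending line numbers otherwise.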
  • 68. DATA TASK – BULK IMPORT (2/2)
    - completebulkload tool
      - Imports the data into the running cluster
      - After a data import has been prepared
        - By using the importtsv tool with the importtsv.bulk.output option
        - Or by some other MapReduce job using HFileOutputFormat

    hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar completebulkload -conf ~/my-hbase-site.xml /user/larsgeorge/myoutput mytable
  • 69. TROUBLESHOOTING – HBASE FSCK (1/4)
    - Shell command
      - ${HBASE_HOME}/bin/hbase hbck
    - Once started, it
      - Scans the .META. table to gather all the pertinent information it holds
      - Scans the HDFS root directory HBase is configured to use
      - Compares the collected details to report on inconsistencies and integrity issues
    - Consistency check
      - Whether the region is listed in .META. and exists in HDFS
      - And is assigned to exactly one region server
    - Integrity check
      - Compares the regions with the table details to find missing regions
      - And those that have holes or overlaps in their row key ranges
  • 70. TROUBLESHOOTING – HBASE FSCK (2/4)

    ${HBASE_HOME}/bin/hbase hbck -h
    Usage: fsck [opts]
     where [opts] are:
       -details Display full report of all regions.
       -timelag {timeInSeconds} Process only regions that have not experienced any metadata updates in the last {{timeInSeconds} seconds.
       -fix Try to fix some of the errors.
       -sleepBeforeRerun {timeInSeconds} Sleep this many seconds before checking if the fix worked if run with -fix
       -summary Print only summary of the tables and status.
  • 71. TROUBLESHOOTING – HBASE FSCK (3/4)
    - Running it with no options produces the normal level of output detail

    ${HBASE_HOME}/bin/hbase hbck
    Number of Tables: 40
    Number of live region servers: 19
    Number of dead region servers: 0
    Number of empty REGIONINFO_QUALIFIER rows in .META.: 0
    Summary:
    ...
      testtable2 is okay.
        Number of regions: 1
        Deployed on: host11.foo.com:60020
    0 inconsistencies detected.
    Status: OK
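To script around hbck, the count on its "N inconsistencies detected." line is a convenient signal. This hypothetical helper reads hbck output on standard input and prints that number, assuming the line format shown in the sample output.

```shell
# Hypothetical helper: extract the count from hbck's "N inconsistencies detected."
# line on stdin. Returns 1 if no such line is found.
hbck_inconsistencies() {
  n=$(sed -n 's/^\([0-9][0-9]*\) inconsistencies detected.*/\1/p' | tail -n 1)
  [ -n "$n" ] || return 1
  echo "$n"
}
```

`${HBASE_HOME}/bin/hbase hbck | hbck_inconsistencies` prints 0 for the healthy cluster in the sample output.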
  • 72. TROUBLESHOOTING – HBASE FSCK (4/4)
    - ${HBASE_HOME}/bin/hbase hbck -fix
    - Repairs the following issues
      - Assigns .META. to a single new server if it is unassigned
      - Reassigns .META. to a single new server if it is assigned to multiple servers
      - Assigns a user table region to a new server if it is unassigned
      - Reassigns a user table region to a single new server if it is assigned to multiple servers
      - Reassigns a user table region to a new server if the current server does not match what the .META. table refers to
    - hbck may report inconsistencies that are only temporary or transitional
      - Rerun the tool a few times to confirm a permanent problem
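The advice to rerun hbck before treating an inconsistency as permanent can be expressed as a small retry loop. The hypothetical helper below runs any check command several times with a pause in between and returns success only if every run failed; the hbck_ok wrapper in the usage note assumes that a clean hbck run prints "Status: OK", as in the sample output on slide 71.

```shell
# Hypothetical helper: returns 0 ("problem confirmed") only if the check command
# fails on every run; returns 1 as soon as one run passes.
# $1 = number of runs, $2 = seconds to sleep between runs, remaining args = command
confirm_persistent_failure() {
  runs=$1
  pause=$2
  shift 2
  i=1
  while [ "$i" -le "$runs" ]; do
    if "$@"; then
      echo "run $i passed: treating earlier failures as transitional"
      return 1
    fi
    # No pause after the last run.
    if [ "$i" -lt "$runs" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  echo "all $runs runs failed: likely a permanent problem"
  return 0
}
```

For example, define `hbck_ok() { "${HBASE_HOME}/bin/hbase" hbck 2>/dev/null | grep -q 'Status: OK'; }` and run `confirm_persistent_failure 3 60 hbck_ok`.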
  • 73. TROUBLESHOOTING – ANALYZING THE LOGS (1/2)

    Server type | Default logfile | tm settings
    HBase Master | $HBASE_HOME/logs/hbase-<user>-master-<hostname>.log | /var/log/hbase/hbase-<user>-master-<hostname>.log
    HBase RegionServer | $HBASE_HOME/logs/hbase-<user>-regionserver-<hostname>.log | /var/log/hbase/hbase-<user>-regionserver-<hostname>.log
    ZooKeeper | Console log output only | /var/log/hbase/hbase-<user>-zookeeper-<hostname>.log
    NameNode | $HADOOP_HOME/logs/hadoop-<user>-namenode-<hostname>.log | /var/log/hadoop/hadoop-<user>-namenode-<hostname>.log
    DataNode | $HADOOP_HOME/logs/hadoop-<user>-datanode-<hostname>.log | /var/log/hadoop/hadoop-<user>-datanode-<hostname>.log
    JobTracker | $HADOOP_HOME/logs/hadoop-<user>-jobtracker-<hostname>.log | /var/log/hadoop/hadoop-<user>-jobtracker-<hostname>.log
    TaskTracker | $HADOOP_HOME/logs/hadoop-<user>-tasktracker-<hostname>.log | /var/log/hadoop/hadoop-<user>-tasktracker-<hostname>.log
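The naming pattern in the logfile table is regular enough to compute. The hypothetical helper below prints the expected logfile path for a daemon, user, host name, and layout; it encodes the table's rows and uses the tasktracker file name for the TaskTracker.

```shell
# Hypothetical helper: print the expected logfile path.
# $1 = daemon (master, regionserver, zookeeper, namenode, datanode, jobtracker, tasktracker)
# $2 = user, $3 = hostname, $4 = layout ("default" or "tm")
daemon_log_path() {
  case "$1" in
    master|regionserver|zookeeper) product=hbase ;;
    namenode|datanode|jobtracker|tasktracker) product=hadoop ;;
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
  case "$4" in
    tm)
      dir=/var/log/$product
      ;;
    default)
      if [ "$1" = zookeeper ]; then
        echo "ZooKeeper logs to the console only in the default layout" >&2
        return 1
      fi
      if [ "$product" = hbase ]; then dir=$HBASE_HOME/logs; else dir=$HADOOP_HOME/logs; fi
      ;;
    *)
      echo "unknown layout: $4 (expected default or tm)" >&2
      return 1
      ;;
  esac
  echo "$dir/$product-$2-$1-$3.log"
}
```

For example, `daemon_log_path regionserver hbase host11.foo.com tm` prints the tm-layout region server log path for that host.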
  • 74. TROUBLESHOOTING – ANALYZING THE LOGS (2/2)
    - It is useful to begin with the master logfile
      - The master acts as the coordinator service of the entire cluster
    - Find where the processes began logging ERROR-level messages
      - This lets you identify the root cause
      - Many subsequent messages are often side effects of the original problem
    - We recommend using the error log event metric under the System Event Metrics group
      - It gives you a graph showing where the server(s) started logging an increasing number of error messages in the logfiles
    - If you find an error message
      - Google it!
      - Use online resources to search for the message in the public mailing lists
        - Search Hadoop
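Finding where ERROR-level messages start can be scripted. The two hypothetical helpers below print the first ERROR line of a logfile with its line number, and a per-minute count of ERROR lines as a rough text stand-in for the error log event graph. They assume the log4j layout with a leading `yyyy-MM-dd HH:mm:ss,SSS` timestamp and the level as a separate word.

```shell
# Hypothetical helper: print "<line number>: <line>" for the first ERROR line.
# Returns 1 if the file contains no ERROR line. $1 = logfile
first_error() {
  awk '/ ERROR / { print NR ": " $0; found = 1; exit } END { exit !found }' "$1"
}

# Hypothetical helper: count ERROR lines per minute, using the first 16
# characters of each line ("yyyy-MM-dd HH:mm") as the minute key. $1 = logfile
errors_per_minute() {
  awk '/ ERROR / { count[substr($0, 1, 16)]++ } END { for (m in count) print m, count[m] }' "$1" | sort
}
```

Run `first_error` on the master logfile first, then `errors_per_minute` to see when the error rate started to climb.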
  • 75. HANDS-ON – USE YCSB
    - New VM list
      - Because VMs are not affordable at present :p
    - ${YOUR_HOME}=${GIT_HOME}/hbase-training/006/hands-on/${YOUR_NAME}
      - mkdir ${YOUR_HOME}
      - cd ${YOUR_HOME}; cp -rf ../../ycsb/* .
    - In the HBase shell
      - create '<YOUR_NAMED_TABLE>', 'family'
    - Run YCSB with a record count of 5000
      - And output the ycsb-load.log file
    - Hands-on result
      - Put the ycsb-load.log file under ${YOUR_HOME}