HBase from	the	Trenches
Avinash	Ramineni
Email: avinash@clairvoyantsoft.com
LinkedIn: https://www.linkedin.com/in/avinashramineni
Agenda
• Intro	to	HBase
– Overview
– Data	Model
– Architecture
• Common	Problems
• Best	Practices
• Tools	and	Utilities
Intro	to	HBase
• Non-relational distributed column-oriented database
– Modeled	after	Google’s	BigTable
• Billions	of	Rows	and	Millions	of	Columns
• Sparse,	consistent,	distributed	sorted	map
• Built	on	top	of	HDFS
• Tight	integration	with	MapReduce
• Supports	Random	CRUD	Operations
Intro	to	HBase
• Fault	Tolerant
• Horizontally	Scalable
• Real-time random read-write access to data stored in HDFS
• Millions	of	queries	/	second
• Support	for	transactions	at	a	single	row	level
• Bloom	filters
• Automatic	Sharding
• Implemented	in	Java
Data	Model
• Data	is	stored	in	Tables
• Tables	contain	rows
– Rows are referenced by a unique key, the rowkey
• Rows are made of columns, which are grouped into column families
• Rows	are	sorted
• Everything	is	stored	as	a	sequence	of	bytes
• All	entries	are	versioned	and	timestamped
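The data model above can be pictured as a sorted, versioned map of byte strings. The following is only an in-memory sketch of that idea, not HBase client code; the class ToyTable and its method names are hypothetical and chosen for illustration.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table: a sorted map of byte strings (hypothetical)."""

    def __init__(self):
        self._rowkeys = []  # rowkeys kept in byte order, like rows in a table
        self._rows = {}     # rowkey -> {(family, qualifier): {timestamp: value}}

    def put(self, rowkey, family, qualifier, value, ts):
        """Store one timestamped version of a cell."""
        if rowkey not in self._rows:
            bisect.insort(self._rowkeys, rowkey)
            self._rows[rowkey] = {}
        versions = self._rows[rowkey].setdefault((family, qualifier), {})
        versions[ts] = value

    def get(self, rowkey, family, qualifier):
        """Return the newest version of a cell, or None if it does not exist."""
        versions = self._rows.get(rowkey, {}).get((family, qualifier))
        if not versions:
            return None
        return versions[max(versions)]

    def scan(self, start, stop):
        """Return the rowkeys in [start, stop) in byte order."""
        lo = bisect.bisect_left(self._rowkeys, start)
        hi = bisect.bisect_left(self._rowkeys, stop)
        return self._rowkeys[lo:hi]


table = ToyTable()
table.put(b"user#002", b"info", b"name", b"Bob", ts=1)
table.put(b"user#001", b"info", b"name", b"Al", ts=1)
table.put(b"user#001", b"info", b"name", b"Alice", ts=2)
latest = table.get(b"user#001", b"info", b"name")
in_range = table.scan(b"user#001", b"user#003")
```

Because rows are kept sorted by rowkey, a range scan is a contiguous slice, which is why rowkey design matters so much later in the deck.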
Data Representation
HBase Cluster
• HBase Master
• Zookeeper
• Region	Servers
• HDFS	- Data	Nodes
Component	View
Logical	View
HBase API
• API	is	simple
• Operations
– Get, Put, Delete, Scan, MapReduce
• Connection
– Create this instance only once per application and share it during its runtime
• HTable
– ZooKeeper
– hbase:meta
Column	Families
• All columns that are accessed together should be grouped into one column family
• No need to access or load data that is not used
• Settings defined at the column family level include
– compression, version retention policy, cache priority
– Understand the data and access patterns before grouping columns into families
• Column family names and column qualifiers are stored as bytes with every cell
– Avoid being verbose
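To see why verbose names cost space, here is a rough estimate of the per-cell size. It assumes the classic uncompressed KeyValue layout without tags (length fields, row, family, qualifier, timestamp, type, value); the function name and the sample names are hypothetical, and data block encoding or compression would shrink the real on-disk cost.

```python
def keyvalue_size(row, family, qualifier, value):
    """Approximate serialized size in bytes of one cell (no tags, no compression)."""
    # Fixed fields: key length (4), value length (4), row length (2),
    # family length (1), timestamp (8), key type (1).
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + len(value)


row = b"user#0001"
value = b"42"
verbose = keyvalue_size(row, b"customer_profile", b"number_of_logins", value)
terse = keyvalue_size(row, b"p", b"nl", value)
saved_per_cell = verbose - terse
```

With these sample names the terse form saves 29 bytes on every cell, which adds up quickly over billions of rows.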
HBase Write	Path
HBase Compactions
• HDFS	does	not	support	updates	
– HFiles	are	immutable
– New	HFiles	are	created
• Minor	Compactions
– Small HFiles are merged into larger HFiles
– Deletes are not applied
• Major Compactions
– HFiles within a column family are merged into a single HFile
– Deletes	are	applied
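A toy model of the two compaction types, under simplifying assumptions: each HFile is a dict of key -> (timestamp, value), only the newest version per key is kept, and the minor compaction simply merges the last few files in the list (real file selection is size-based). TOMBSTONE and all function names are hypothetical.

```python
TOMBSTONE = object()  # hypothetical stand-in for a delete marker


def merge(hfiles, drop_deletes):
    """Merge HFiles (dicts of key -> (timestamp, value)) into one new HFile."""
    merged = {}
    for hfile in hfiles:
        for key, (ts, value) in hfile.items():
            # Keep only the newest version of each key.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    if drop_deletes:
        merged = {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
    return dict(sorted(merged.items()))


def minor_compact(hfiles, count):
    """Merge the last `count` files; delete markers are carried along."""
    return hfiles[:-count] + [merge(hfiles[-count:], drop_deletes=False)]


def major_compact(hfiles):
    """Merge every file into one and apply the deletes."""
    return [merge(hfiles, drop_deletes=True)]


hfiles = [
    {b"a": (1, b"v1"), b"b": (1, b"v1")},
    {b"b": (2, TOMBSTONE)},
    {b"c": (3, b"v3")},
]
after_minor = minor_compact(hfiles, 2)
after_major = major_compact(hfiles)
```

The minor compaction has to keep the delete marker for b"b" because the older file it did not touch still holds that cell; only a merge over all files can safely drop both.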
Rowkey
• Immutable
• Get it right the first time, before a lot of data is loaded
• What if we got it wrong?
– Create a new table and reload the data
– If a TTL is set, let the old data expire
Secondary	Indexes
• Querying / accessing records by something other than the rowkey
• Map	Reduce	jobs	to	populate	index	table
– Periodic	update
• Build	a	secondary	index	with	dual	writes
• Co-processors
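The dual-write approach can be sketched with two plain dicts standing in for the data table and the index table. This is an illustration only: the table names, the index key layout (value, a zero byte, then the rowkey) and the helper names are assumptions, and values are assumed not to contain a zero byte.

```python
data_table = {}   # rowkey -> {column: value}
index_table = {}  # index rowkey (value + 0x00 + data rowkey) -> empty value


def index_key(value, rowkey):
    """Build the index rowkey so that all rows with the same value sort together."""
    return value + b"\x00" + rowkey


def put_with_index(rowkey, column, value):
    """Write the data row and keep the index table in step (two separate writes)."""
    old = data_table.get(rowkey, {}).get(column)
    if old is not None:
        index_table.pop(index_key(old, rowkey), None)  # drop the stale index entry
    data_table.setdefault(rowkey, {})[column] = value
    index_table[index_key(value, rowkey)] = b""


def find_by_value(value):
    """Prefix scan over the index table; returns the matching data rowkeys."""
    prefix = value + b"\x00"
    return sorted(k[len(prefix):] for k in index_table if k.startswith(prefix))


put_with_index(b"u1", b"city", b"Phoenix")
put_with_index(b"u2", b"city", b"Phoenix")
put_with_index(b"u1", b"city", b"Tempe")
phoenix_rows = find_by_value(b"Phoenix")
tempe_rows = find_by_value(b"Tempe")
```

Since HBase transactions cover a single row only, the data write and the index write are not atomic; a client crash between them can leave the index stale, which is one reason periodic MapReduce rebuilds or co-processors are listed as alternatives.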
Region	Hotspotting
• Client traffic is not equally distributed across the region servers
– Performance degradation
– Region unavailability
• Poor rowkey design
– Monotonically increasing rowkey
• Time series or sequence
– Salting
• Reads vs. writes
• Get?
– Hashing
• Salt with a one-way hash of the rowkey
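A minimal sketch of a hash-derived salt, assuming a fixed bucket count and a two-digit text prefix; NUM_BUCKETS, the "|" separator and the function names are illustrative choices, not something the slides prescribe.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical; often chosen close to the number of region servers


def salted_key(rowkey: bytes) -> bytes:
    """Prefix the rowkey with a bucket derived from a one-way hash of the key."""
    digest = hashlib.md5(rowkey).digest()
    bucket = digest[0] % NUM_BUCKETS
    return b"%02d|" % bucket + rowkey


def scan_prefixes():
    """A time-range scan has to fan out: one scan per salt bucket."""
    return [b"%02d|" % b for b in range(NUM_BUCKETS)]


# Monotonically increasing keys, e.g. a date plus a sequence number.
keys = [b"20150101-%06d" % i for i in range(1000)]
salted = [salted_key(k) for k in keys]
buckets_used = {s[:2] for s in salted}
```

Because the salt is computed from the rowkey itself, a Get can recompute the same prefix and go straight to the row. A random salt would spread writes just as well but would force every read to try all buckets.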
Short	Circuit	Reads
• Region Servers are co-located with DataNodes
• HMaster assigns regions taking data locality into consideration (mostly)
• dfs.client.read.shortcircuit
– Region Servers read block files directly from local disk rather than going through the DataNode
• Locality loss
Pre-Splitting
• Region splitting
– A region grows until it needs to be split
– A region is served by only one Region Server at a time
• Pre-split a table into regions at table creation time
– Uniformly distribute write load across region servers
– Understand the keyspace
• Risk of uneven load distribution
• Auto splitting
– ConstantSizeRegionSplitPolicy
– IncreasingToUpperBoundRegionSplitPolicy
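Split points for a pre-split table can be computed up front when the keyspace is understood. The sketch below assumes rowkeys start with a uniformly distributed byte (for example a hash prefix) and at most 256 regions; both function names are hypothetical.

```python
import bisect


def uniform_split_points(num_regions):
    """Split the one-byte prefix space 0x00..0xFF into num_regions equal ranges."""
    return [bytes([(256 * i) // num_regions]) for i in range(1, num_regions)]


def region_for(rowkey, split_points):
    """Index of the region whose [start, end) key range contains the rowkey."""
    return bisect.bisect_right(split_points, rowkey)


splits = uniform_split_points(4)
first = region_for(b"\x3fzzz", splits)
second = region_for(b"\x40", splits)
last = region_for(b"\xff\xff", splits)
```

If the real keys do not cover the byte range evenly (ASCII digits, for instance, all fall between 0x30 and 0x39), these split points would send nearly all traffic to one region, which is the uneven-load risk noted above.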
Bulk	Loading
• Native API
– Disable WAL
• MapReduce job to generate HFiles
– Load using the completebulkload / ImportTsv tools
• Loads into the relevant region
– Faster than going through the normal write path
• No writes to WAL and MemStore
• No flushing and compacting
Troubleshooting
• ulimit -n
– Limits on the number of open files and processes
• HBase is a database and needs to open a large number of files
• dfs.datanode.max.transfer.threads
• Network
• OS parameters
YouAreDeadException
• Region Servers going down
– ZooKeeper
• Distributed coordination service
– HBase Master asks the region server to shut down
– Garbage collection
– ZooKeeper session timeout
Performance	Tuning
• Compression
– Reduces	data	stored	on	disk	and	transferred
– Favor compression speed over compression ratio
• Load	Balancing	- Balancer
• Merging	Regions
• Batch	Writes
– Client	Write	Buffer
– AutoFlush
• MemStore-local	allocation	buffers
– Garbage	Collection	Issues
Tuning
• Heavy Writes
– Flushes, compactions, and splits increase IO and degrade cluster performance
• Keep region sizes larger
• Keep HFile size large
• Heavy Sequential Reads
– Higher block size
– Avoid caching on the table
• Heavy Random Reads
– Larger block cache
– Lower MemStore limit
– Smaller block size
Apache Phoenix
• SQL over HBase
– Compiles into HBase scans
– Orchestrates parallel execution
– Aggregate queries
• JDBC API over the native HBase API
• Salting buckets, pre-splitting
• Trafodion
– Transactional SQL on HBase
Hannibal
• Monitor	and	maintain	HBase Clusters
• How well regions are balanced over the cluster
• How	well	regions	are	split	for	each	table
• How	regions	evolve	over	time
• How	long	compactions	take
• Integration	with	HUE
Operational	Aspects
• Metrics
– Master
• Cluster requests, split time, split size
– RegionServer
• Block cache, MemStore, compaction, store, IO
• HTrace
– Tracing tool for parallel distributed systems
• Monitoring
– Nagios
– Hannibal
– Ganglia
– Graphite
– OpenTSDB
• Backup
– Export, CopyTable, Snapshot
Questions?
avinash@clairvoyantsoft.com
HBase from the Trenches - Phoenix Data Conference 2015
