Omid: A Transactional Framework for HBase

Francisco Perez-Sorrosal
Ohad Shacham
Hadoop Summit SJ
June 29th, 2016

Outline
▪ Background
▪ Basic Concepts
▪ Use cases
▪ Architecture
▪ Transaction Management
▪ High Availability
▪ Performance
▪ Summary

Background
▪ New big data apps → new requirements:
● Low-latency
● Incremental data processing
● e.g. Percolator
▪ Multiple clients updating same data concurrently
● Problem: Conflicts/Inconsistencies may arise
● Solution: Transactional Access to Data

Background
▪ Transaction → abstract unit of work (UoW) to manage data with certain guarantees
● ACID
● Relational databases
▪ Big data → NoSQL datastores → Transactions in NoSQL
● Relaxed guarantees:
○ e.g. Atomicity, Consistency
● Hard to scale
○ Data partition
○ Data replication

Omid is a…
▪ Flexible
▪ Reliable
▪ Highly performant
▪ Scalable
…OLTP framework that allows big data apps to execute ACID transactions on top of HBase
HBase + Omid = Consistency in big data apps

Why use Omid?
▪ Simplifies development of apps requiring consistency
● Multi-row/multi-table transactions on HBase
● Simple & well-known interface
▪ Good performance & reliability
▪ Lock-free
▪ Snapshot Isolation
▪ HBase is treated as a black box
● No HBase code modification
● No changes to table schemas
▪ Used successfully at Yahoo

Snapshot Isolation
▪ Transaction T2 overlaps in time with T1 & T3, but spatially:
● T1 ∩ T2 = ∅
● T2 ∩ T3 = { R4 } → transactions T2 and T3 conflict
▪ Transaction T4 does not have conflicts
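[Figure: transactions T1 to T4 (TxId) shown by Time Overlap and by Spatial Overlap (WriteSet) over rows R1, R2, R3, R4. Write sets recovered from the figure: {R3, R4}, {R2, R4}, {R1, R3}]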
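The conflict rule on this slide can be sketched in a few lines of Java. This is only an illustration, not Omid code: the class and method names are hypothetical, and the write sets are assumed values chosen to reproduce the two intersections stated above (T1 ∩ T2 = ∅, T2 ∩ T3 = { R4 }).

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Omid: detects write-write conflicts
// between two transactions under Snapshot Isolation.
class WriteSetConflictSketch {

    // Rows written by both transactions (the "spatial overlap").
    static Set<String> spatialOverlap(Set<String> writeSetA, Set<String> writeSetB) {
        Set<String> common = new HashSet<>(writeSetA);
        common.retainAll(writeSetB);
        return common;
    }

    // Two transactions conflict only if they overlap in time AND
    // their write sets intersect.
    static boolean conflict(boolean overlapInTime, Set<String> a, Set<String> b) {
        return overlapInTime && !spatialOverlap(a, b).isEmpty();
    }

    public static void main(String[] args) {
        // Assumed write sets, picked only to match the slide's intersections.
        Set<String> t1 = Set.of("R1", "R2");
        Set<String> t2 = Set.of("R3", "R4");
        Set<String> t3 = Set.of("R2", "R4");

        System.out.println("T1 vs T2 conflict: " + conflict(true, t1, t2));
        System.out.println("T2 vs T3 conflict: " + conflict(true, t2, t3));
        System.out.println("Rows shared by T2 and T3: " + spatialOverlap(t2, t3));
    }
}
```

Transactions that do not overlap in time never conflict under this rule, whatever rows they write.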

Use Cases: Sieve @ Yahoo
[Diagram: Sieve running on HBase with Omid. Labels: Internet, Crawler, Doc Proc, Aggregation, Feeder, Real-Time Index, Notifications, Transactional Data Flow]

Use Cases: Hive Metastore
[Diagram: Hive Metastore Thrift Server with two storage back ends: HBaseStore (on Omid and HBase) and ObjectStore (on a Relational Database)]

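Architectural Components
[Diagram: components and interactions]
▪ Transactional App → uses the Omid Client
▪ Omid Client
● Start/Commit TXs through the Transaction Status Oracle (TSO)
● R/W data in the HBase App Tables (with Shadow Cells)
▪ Transaction Status Oracle (TSO)
● Keep track & Validate TXs → Guarantee SI
● Get Start/Commit Timestamps from the Timestamp Oracle
● Commit data to the Commit Table
▪ HBase: App Tables + Shadow Cells, Commit Table, Compactor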

Client APIs
▪ Transaction Manager → Create Transactional contexts
Transaction begin();
void commit(Transaction tx);
void rollback(Transaction tx);
▪ Transactional Tables (TTable) → Data access
Result get(Transaction tx, Get g);
void put(Transaction tx, Put p);
ResultScanner getScanner(Transaction tx, Scan s);
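A minimal sketch of the usual call pattern for these APIs: begin, run the operations in the transactional context, commit, and roll back on failure. To stay runnable on its own, it uses simplified stand-in interfaces instead of the real Omid and HBase types: `TxTable.put` takes plain strings rather than a `Put`, and `RecordingManager` is a hypothetical test double, so exception types and error handling in the real client may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ClientApiSketch {

    // Simplified stand-ins mirroring the slide's signatures (assumptions).
    interface Transaction {}

    interface TransactionManager {
        Transaction begin();
        void commit(Transaction tx);
        void rollback(Transaction tx);
    }

    interface TxTable {
        void put(Transaction tx, String row, String value);
    }

    // Typical pattern: all writes happen inside one TX context, so either
    // both rows are updated or neither is.
    static boolean writeAtomically(TransactionManager tm, TxTable table,
                                   String row1, String row2, String value) {
        Transaction tx = tm.begin();
        try {
            table.put(tx, row1, value);
            table.put(tx, row2, value);
            tm.commit(tx);
            return true;
        } catch (RuntimeException e) {
            tm.rollback(tx);
            return false;
        }
    }

    // Hypothetical manager that only records calls, to exercise the pattern.
    static class RecordingManager implements TransactionManager {
        final List<String> calls = new ArrayList<>();
        public Transaction begin() { calls.add("begin"); return new Transaction() {}; }
        public void commit(Transaction tx) { calls.add("commit"); }
        public void rollback(Transaction tx) { calls.add("rollback"); }
    }

    public static void main(String[] args) {
        RecordingManager tm = new RecordingManager();
        TxTable okTable = (tx, row, value) -> { };
        TxTable failingTable = (tx, row, value) -> {
            throw new IllegalStateException("simulated write failure");
        };
        System.out.println(writeAtomically(tm, okTable, "r1", "r2", "v"));
        System.out.println(writeAtomically(tm, failingTable, "r1", "r2", "v"));
        System.out.println(tm.calls);
    }
}
```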

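TX Management (Begin TX Phase)
Participants: App, Omid Client, TSO, TO, Table/SC, CommitTable
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO: Begin TX; TSO → TO: Get ST → ST=1
▪ TSO → Omid Client: TX(ST=1); App gets the TX Context
▪ App → Omid Client: R/W Ops (within TX context)
▪ Omid Client → Table/SC: R/W Ops for TX (ST=1) → R/W Results for TX with ST=1
● Read Ops: Get right results for TX's Snapshot
● Write Ops: Build Writeset for TX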
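How a read picks "the right results for the TX's snapshot" can be sketched as follows. This is a simplified model under stated assumptions, not Omid's implementation: shadow cells and the Commit Table are both modeled as in-memory maps from a writer's start timestamp (ST) to its commit timestamp (CT), and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class SnapshotReadSketch {

    // One stored version of a cell: its value and the start timestamp (ST)
    // of the transaction that wrote it.
    static final class Version {
        final String value;
        final long writerStartTs;
        Version(String value, long writerStartTs) {
            this.value = value;
            this.writerStartTs = writerStartTs;
        }
    }

    // Looks up the writer's commit timestamp (CT): shadow cell first,
    // Commit Table second. Returns null if the writer has not committed.
    static Long commitTsOf(long writerStartTs, Map<Long, Long> shadowCells,
                           Map<Long, Long> commitTable) {
        Long ct = shadowCells.get(writerStartTs);
        return ct != null ? ct : commitTable.get(writerStartTs);
    }

    // Returns the newest version whose writer committed before the reader
    // started. 'versions' must be ordered newest first.
    static Optional<String> read(List<Version> versions, long readerStartTs,
                                 Map<Long, Long> shadowCells,
                                 Map<Long, Long> commitTable) {
        for (Version v : versions) {
            Long ct = commitTsOf(v.writerStartTs, shadowCells, commitTable);
            if (ct != null && ct < readerStartTs) {
                return Optional.of(v.value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Version> k1 = List.of(new Version("v1-new", 5), new Version("v1", 1));
        Map<Long, Long> shadowCells = new HashMap<>();
        Map<Long, Long> commitTable = new HashMap<>();
        commitTable.put(1L, 2L); // TX with ST=1 committed at CT=2

        // A reader with ST=3 skips the uncommitted write made by ST=5.
        System.out.println(read(k1, 3, shadowCells, commitTable));
    }
}
```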

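TX Management (Commit TX Phase)
▪ App → Omid Client: Commit TX
▪ Omid Client → TSO: Commit TX (Writeset)
▪ TSO: Check Conflicts of TX Writeset in Conflict Map
▪ TSO → TO: Get CT → CT=2
▪ TSO → CommitTable: Persist commit details (ST/CT) for TX
▪ TSO → Omid Client: TX(CT=2)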
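The commit steps above (conflict check, commit timestamp, persisted ST/CT pair) can be sketched with a toy, single-threaded TSO. The class is hypothetical: a counter stands in for the Timestamp Oracle, and the Conflict Map and Commit Table are plain in-memory maps, whereas the real system persists commit details in HBase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Toy, single-threaded model of the begin/commit flow (not Omid code).
class ToyTsoSketch {
    private long clock = 0;                                         // stands in for the TO
    private final Map<String, Long> conflictMap = new HashMap<>();  // row -> CT of last writer
    final Map<Long, Long> commitTable = new HashMap<>();            // ST -> CT

    // Begin TX: hand out a start timestamp (ST).
    long begin() {
        return ++clock;
    }

    // Commit TX: returns the commit timestamp (CT), or empty on conflict.
    OptionalLong commit(long startTs, Set<String> writeSet) {
        // Conflict if any row was committed by another TX after we started.
        for (String row : writeSet) {
            Long lastCommit = conflictMap.get(row);
            if (lastCommit != null && lastCommit > startTs) {
                return OptionalLong.empty();
            }
        }
        long commitTs = ++clock;
        for (String row : writeSet) {
            conflictMap.put(row, commitTs);
        }
        commitTable.put(startTs, commitTs); // persist commit details (ST/CT)
        return OptionalLong.of(commitTs);
    }

    public static void main(String[] args) {
        ToyTsoSketch tso = new ToyTsoSketch();
        long st = tso.begin();                                  // ST=1
        OptionalLong ct = tso.commit(st, Set.of("k1", "k2"));   // CT=2
        System.out.println("ST=" + st + " CT=" + ct.getAsLong());
    }
}
```

With a fresh instance, the first transaction gets ST=1 and CT=2, matching the timestamps used on the slides.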

TX Management (Complete TX Phase)
▪ Omid Client → Table/SC: Update SC for TX (ST=1/CT=2)
▪ Omid Client → CommitTable: Complete commit (Cleanup entry for TX with ST=1)
▪ Omid Client → App: Result

High Availability
[Diagram: Transactional App, Omid Client, Timestamp Oracle (Get Start/Commit Timestamps, Start/Commit TXs, Guarantee SI), and HBase with Commit Table, Compactor, App Table + Shadow Cells]
▪ Single point of failure: the TSO / Timestamp Oracle

High Availability
[Diagram: the architecture with two TSO / Timestamp Oracle instances (Primary / Backup) and a Recovery State, next to the Omid Client and HBase with Commit Table, Compactor, App Table + Shadow Cells]

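High Availability – Failing Scenario
Participants: App, Omid Client, TSO P (primary), TSO B (backup), TO, Table/SC (Data Store), CommitTable
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO P: Begin TX; TSO P → TO: Get ST → ST=1
▪ TSO P → Omid Client: TX(ST=1); App gets TX 1
▪ App → Omid Client: TX 1 Write(k1, v1)
▪ Omid Client → Table/SC: Write(k1, v1) (ST=1)
● Data Store: (k1, v1, 1)
● Commit Table: empty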

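High Availability – Failing Scenario (cont.)
▪ App → Omid Client: TX 1 Write(k2, v2)
● Data Store: (k1, v1, 1), (k2, v2, 1)
▪ App → Omid Client: Commit TX 1
▪ Omid Client → TSO P: Commit TX 1 {k1, k2}
▪ TSO P → TO: Get CT → CT=2
▪ TSO P → CommitTable: Persist commit details for TX 1 (the write is delayed and has not yet reached the Commit Table)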

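High Availability – Failing Scenario (cont.)
Participants: App, Omid Client, TSO B, TO, Table/SC, CommitTable (TSO B is now the active TSO)
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO B: Begin TX; TSO B → TO: Get ST → ST=3
▪ TSO B → Omid Client: TX(ST=3); App gets TX 3
▪ App → Omid Client: TX 3 Read(k1)
▪ Omid Client → Table/SC: Read(k1) (ST=3) → (k1, v1, 1)
● Data Store: (k1, v1, 1), (k2, v2, 1)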

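High Availability – Failing Scenario (cont.)
▪ Omid Client → CommitTable: Return TX 1 CT → ! exist
● (k1, v1, 1) has no shadow cell and no Commit Table entry, so TX 3 skips it
▪ App → Omid Client: TX 3 Read(k2)
▪ Omid Client → Table/SC: Read(k2) (ST=3) → (k2, v2, 1)
▪ The delayed write from TSO P reaches the Commit Table: (1, 2)
▪ Omid Client → CommitTable: Return TX 1 CT → CT = 2
▪ Omid Client → App: v2
● TX 3 misses v1 but sees v2: a partial view of TX 1, which violates SI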

High Availability
[Diagram: the HA architecture again: Transactional App, Omid Client, two TSO / Timestamp Oracle instances, Recovery State, and HBase with Commit Table, Compactor, App Table + Shadow Cells]

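High Availability – Solution
Participants: App, Omid Client, TSO P (primary), TSO B (backup), TO, Table/SC, CommitTable
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO P: Begin TX; TSO P → TO: Get ST → ST=1
▪ TSO P → Omid Client: TX(ST=1, E=1); App gets TX 1, 1
▪ App → Omid Client: TX 1 Write(k1, v1)
● Data Store: (k1, v1, 1)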

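High Availability – Solution (cont.)
▪ App → Omid Client: TX 1 Write(k2, v2)
● Data Store: (k1, v1, 1), (k2, v2, 1)
▪ App → Omid Client: Commit TX 1
▪ Omid Client → TSO P: Commit TX 1 {k1, k2}
▪ TSO P → TO: Get CT → CT=2
▪ TSO P → CommitTable: Persist commit details for TX 1 (the write is delayed and has not yet reached the Commit Table)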

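High Availability – Solution (cont.)
Participants: App, Omid Client, TSO B, TO, Table/SC, CommitTable (TSO B is now the active TSO)
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO B: Begin TX; TSO B → TO: Get ST → ST=3
▪ TSO B → Omid Client: TX(ST=3, E=3); App gets TX 3, 3
▪ App → Omid Client: TX 3 Read(k1)
▪ Omid Client → Table/SC: Read(k1) (ST=3) → (k1, v1, 1)
● Data Store: (k1, v1, 1), (k2, v2, 1)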

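High Availability – Solution (cont.)
▪ Omid Client → CommitTable: Return TX 1 CT → ! exist
▪ Omid Client → CommitTable: Try invalidate → (1, -, invalid)
● TX 1 is now Invalid, so (k1, v1, 1) is skipped
▪ App → Omid Client: TX 3 Read(k2)
▪ Omid Client → Table/SC: Read(k2) (ST=3) → (k2, v2, 1)
● Data Store: (k1, v1, 1), (k2, v2, 1)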

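High Availability – Solution (cont.)
▪ Omid Client → CommitTable: Return TX 1 CT
● Commit Table: (1, 2, invalid); the late commit write from TSO P does not clear the invalid mark
● (k2, v2, 1) is skipped as well, so TX 3 sees none of TX 1's writes
● Data Store: (k1, v1, 1), (k2, v2, 1)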
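The invalidation step can be sketched as an atomic "mark invalid unless a commit is already recorded" operation. This is an assumption-laden model rather than Omid's code: the Commit Table is a `ConcurrentHashMap`, each row holds a commit timestamp column and an invalid marker, and the method names are hypothetical. The slides do not show which HBase primitive the real system uses for the atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;

class InvalidationSketch {

    // One Commit Table row: commit timestamp column plus invalid marker.
    static final class Entry {
        final Long commitTs;     // null corresponds to "-" on the slide
        final boolean invalid;
        Entry(Long commitTs, boolean invalid) {
            this.commitTs = commitTs;
            this.invalid = invalid;
        }
    }

    final ConcurrentHashMap<Long, Entry> commitTable = new ConcurrentHashMap<>();

    // Reader side: atomically mark the TX invalid unless a row already
    // exists. Returns true if the TX is invalid afterwards.
    boolean tryInvalidate(long startTs) {
        Entry e = commitTable.compute(startTs,
                (st, old) -> old == null ? new Entry(null, true) : old);
        return e.invalid;
    }

    // TSO side: a regular write of the commit timestamp. It fills in the
    // CT column but leaves an existing invalid marker in place.
    void persistCommit(long startTs, long commitTs) {
        commitTable.compute(startTs,
                (st, old) -> new Entry(commitTs, old != null && old.invalid));
    }

    // A TX counts as committed only if it has a CT and was never invalidated.
    boolean isCommitted(long startTs) {
        Entry e = commitTable.get(startTs);
        return e != null && e.commitTs != null && !e.invalid;
    }

    public static void main(String[] args) {
        InvalidationSketch table = new InvalidationSketch();
        System.out.println(table.tryInvalidate(1));  // reader wins: (1, -, invalid)
        table.persistCommit(1, 2);                   // late write:  (1, 2, invalid)
        System.out.println(table.isCommitted(1));    // TX 1 stays aborted
    }
}
```

If the commit write lands first, `tryInvalidate` finds the existing row, leaves it untouched, and the reader treats the transaction as committed.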

High Availability
▪ No runtime overhead in normal (failure-free) execution
• Minor overhead after failover
▪ TSO uses regular writes
▪ Leases for leader election
• Lease status check before/after writing to Commit Table
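The lease check around Commit Table writes can be sketched as below. It is a minimal model with hypothetical names: the lease is a fixed expiry time compared against an injected clock, and the Commit Table is an in-memory map. How the real TSO acquires and renews its lease is not shown on the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class LeaseGuardSketch {
    private final LongSupplier clock;        // injected time source (assumption)
    private final long leaseExpiresAt;       // lease validity limit
    final Map<Long, Long> commitTable = new HashMap<>();  // ST -> CT

    LeaseGuardSketch(LongSupplier clock, long leaseExpiresAt) {
        this.clock = clock;
        this.leaseExpiresAt = leaseExpiresAt;
    }

    boolean holdsLease() {
        return clock.getAsLong() < leaseExpiresAt;
    }

    // Returns true only if the lease was valid both before and after the
    // write. On false, the commit must not be acknowledged to the client.
    boolean commitUnderLease(long startTs, long commitTs) {
        if (!holdsLease()) {
            return false;                     // lost leadership: do not write
        }
        commitTable.put(startTs, commitTs);   // regular write
        return holdsLease();                  // lease may have expired meanwhile
    }

    public static void main(String[] args) {
        long[] now = {0};
        LeaseGuardSketch tso = new LeaseGuardSketch(() -> now[0], 100);
        System.out.println(tso.commitUnderLease(1, 2)); // lease still valid
        now[0] = 150;
        System.out.println(tso.commitUnderLease(3, 4)); // lease expired
    }
}
```

When the lease expires between the two checks, the entry is written but never acknowledged, which is the case where a reader may already have marked the transaction invalid.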

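Perf. Improvements: Read-Only Txs
Participants: App, Omid Client, TSO/TO, Table/SC
▪ App → Omid Client: Begin TX
▪ Omid Client → TSO/TO: Begin TX → TX(ST=1); App gets the TX Context
▪ App → Omid Client: Read Ops (in TX context)
▪ Omid Client → Table/SC: Read Ops for TX (ST=1) → Read Results in Snapshot
▪ App → Omid Client: Commit TX
● Writeset is ∅, so no need to contact the TSO
▪ Omid Client → App: Success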
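The read-only shortcut amounts to a client-side check before the commit request. A minimal sketch, assuming a hypothetical `Tso` interface whose `commit` returns the commit timestamp or -1 on abort:

```java
import java.util.Set;

class ReadOnlyCommitSketch {

    // Hypothetical TSO client: returns the commit timestamp, or -1 on abort.
    interface Tso {
        long commit(long startTs, Set<String> writeSet);
    }

    // Client-side commit: a TX with an empty write set cannot conflict and
    // has nothing to persist, so it succeeds without a TSO round trip.
    static boolean commit(Tso tso, long startTs, Set<String> writeSet) {
        if (writeSet.isEmpty()) {
            return true;
        }
        return tso.commit(startTs, writeSet) >= 0;
    }

    public static void main(String[] args) {
        int[] tsoCalls = {0};
        Tso tso = (st, ws) -> { tsoCalls[0]++; return st + 1; };

        System.out.println(commit(tso, 1, Set.of()));      // read-only TX
        System.out.println("TSO calls: " + tsoCalls[0]);
        System.out.println(commit(tso, 1, Set.of("k1")));  // writing TX
        System.out.println("TSO calls: " + tsoCalls[0]);
    }
}
```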

Perf. Improvements: Commit Table Writes
[Two diagrams showing the flow of Commit Data among the Omid Client, the TSO, and the Commit Table in HBase]

Omid Throughput with Improvements
[Chart: throughput (Tps × 10^3, y-axis from 0 to 400) vs. Commit Table: # Region servers (1, 2, 4, 6)]

Summary
▪ Transactions in NoSQL
• Use cases in incremental big data processing
• Snapshot Isolation: Scalable consistency model
▪ Omid
• Web-scale TPS for HBase
• Reliable and performant
• Battle-tested
• http://omid.incubator.apache.org/

Questions?
