Non-blocking IO to tame distributed systems
ー How and why ChatWork uses asynchbase
安田裕介/Yusuke Yasuda (@TanUkkii007)
Agenda
● How we used a native HBase client
● Problems we faced with a native HBase client
● Migration to asynchbase
● Blocking IO vs Non-blocking IO: performance test results
About me
● Yusuke Yasuda / 安田裕介
● @TanUkkii007
● Working for Chatwork for 2 years
● Scala developer
About ChatWork
How we used a native HBase client
Messaging system architecture overview
You can find more information about our architecture at Kafka summit 2017.
Today’s topic
HBase
● Key-value storage to enable random access on HDFS
● HBase is used as a query-side storage in our system
○ version: 1.2.0
● Provides streaming API called “Scan” to query a sequence of
rows iteratively
● Scan is the most used HBase API in ChatWork
Synchronous scan with native HBase client
A bad example
import scala.annotation.tailrec
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Result, ResultScanner, Scan, Table}

def scanHBase(connection: Connection, tableName: TableName, scan: Scan): Vector[Result] = {
  val table: Table = connection.getTable(tableName)
  val scanner: ResultScanner = table.getScanner(scan)
  @tailrec
  def loop(results: Vector[Result]): Vector[Result] = {
    val result = scanner.next()
    if (result == null) results
    else loop(results :+ result)
  }
  try {
    loop(Vector.empty)
  } finally {
    scanner.close()
    table.close()
  }
}
Cons:
● a thread is not released until the whole scan is finished
● throughput is bounded by the number of threads in the pool
● long-running blocking calls cause serious performance problems in event-loop-style applications like Akka HTTP
Gist
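The thread-pool exhaustion described above can be reproduced in plain Scala. This is a toy model under stated assumptions, not ChatWork's code: a fixed pool of two handler threads where each handler blocks for the whole "scan", so a third, cheap request cannot even start until a blocked thread is freed.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// a small pool of request-handler threads
val pool = Executors.newFixedThreadPool(2)
val started = new AtomicInteger(0)
val latch = new CountDownLatch(1)

// two long blocking "scans" occupy both threads (like scanner.next() waiting on IO)
for (_ <- 1 to 2) pool.submit(new Runnable {
  def run(): Unit = { started.incrementAndGet(); latch.await() }
})
// a short request queues up behind them and cannot start
pool.submit(new Runnable { def run(): Unit = { started.incrementAndGet(); () } })

Thread.sleep(200)
val startedWhileBlocked = started.get() // only the two blocking tasks are running
latch.countDown()                       // the blocking calls finally return ("timeout")
pool.shutdown()
pool.awaitTermination(1, TimeUnit.SECONDS)
val startedAfterRelease = started.get() // now the queued request has also run
```

With more request types than pool threads, every slow dependency call can starve unrelated work in the same way.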
Throughput and Latency trade-off
in asynchronous and synchronous settings
asynchronous : throughput=8, latency=2
synchronous: throughput=4, latency=1
Asynchronous setting is more flexible and fair!
synchronous vs asynchronous
● Optimized for: latency (synchronous) / throughput (asynchronous)
● Under high workload: throughput is bounded (synchronous) / throughput increases while sacrificing latency (asynchronous)
● Under low workload: both have equal latency and throughput
● Requests for many rows: are executed exclusively (synchronous) / are evenly scheduled as small requests (asynchronous)
Asynchronous streaming of Scan operation
with Akka Stream
import akka.stream.{Attributes, Outlet, SourceShape}
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Result, ResultScanner, Scan, Table}

class HBaseScanStage(connection: Connection, tableName: TableName, scan: Scan)
  extends GraphStage[SourceShape[Result]] {

  val out: Outlet[Result] = Outlet("HBaseScanSource")
  override def shape: SourceShape[Result] = SourceShape(out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      var table: Table = _
      var scanner: ResultScanner = _

      override def preStart(): Unit = {
        table = connection.getTable(tableName)
        scanner = table.getScanner(scan)
      }

      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          val next = scanner.next()
          if (next == null) complete(out)
          else push(out, next)
        }
      })

      override def postStop(): Unit = {
        if (scanner != null) scanner.close()
        if (table != null) table.close()
        super.postStop()
      }
    }
}
● ResultScanner#next() is passively called inside a callback in a thread-safe way
● the thread is released immediately after a single ResultScanner#next() call
● Results are pushed downstream asynchronously
● when and how many times next() is called is determined by downstream
Gist
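The pull-driven, buffered element delivery that this pattern relies on can be sketched without Akka. The names below (`PagedPuller`, `fetchPage`) are hypothetical, not part of any library: pages of rows are fetched lazily, and each "pull" hands out one element, buffering the rest of the page.

```scala
// Plain-Scala sketch of pull-based paging: fetchPage returns the next page,
// or None when the scan is exhausted.
final class PagedPuller[A](fetchPage: () => Option[Vector[A]]) {
  private var buffer: List[A] = Nil
  private var done = false
  // one "onPull": take from the buffer, fetching a new page only when it is empty
  def pull(): Option[A] =
    if (done) None
    else if (buffer.nonEmpty) { val h = buffer.head; buffer = buffer.tail; Some(h) }
    else fetchPage() match {
      case Some(page) if page.nonEmpty => buffer = page.toList; pull()
      case _                           => done = true; None
    }
}

// usage: three pages arrive, the caller sees a flat element stream
var pages = List(Vector(1, 2), Vector(3, 4), Vector(5))
val puller = new PagedPuller[Int](() => pages match {
  case p :: rest => pages = rest; Some(p)
  case Nil       => None
})
val drained = Iterator.continually(puller.pull()).takeWhile(_.isDefined).flatten.toList
```

The GraphStage version adds what this sketch omits: asynchronous fetches and thread-safe handoff back into the stage via callbacks.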
Problems we faced
caused by a native HBase client
Just a single unresponsive HBase region server caused whole-system degradation
The call queue size of the hslave-5 region server spiked.
All Message Read API servers suffered latency increases and throughput falls.
Distributed systems are supposed to fail partially ー so why did the whole system fail?
● The native HBase client uses blocking IO
● Requests to the unresponsive HBase region server block a thread until timeout
● All threads in the thread pool are consumed, so the Message Read API servers were not able to respond
(Charts: HBase IPC queue size; thread pool status in the Read API servers ー #active threads vs. the upper limit of the pool size)
What we learned:
Asynchronous streaming is not enough.
Non-blocking IO matters.
Migration to asynchbase
asynchbase
Non-blocking HBase client based on Netty
● https://github.com/OpenTSDB/asynchbase
● Netty 3.9
● Supports reverse scan since 1.8
● Asynchronous interface by Deferred
○ https://github.com/OpenTSDB/async
○ Observer pattern that provides callback interfaces
● Thread safety provided by Deferred
○ Event loop executes volatile checks at each step
○ Safe to mutate states inside callbacks
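To make the Deferred idea concrete, here is a minimal observer-pattern sketch with hypothetical names ー it is not the real `com.stumbleupon.async.Deferred` API, and it omits the thread-safety machinery (volatile checks in the event loop) that the real class provides.

```scala
// A result holder that either fires registered callbacks on completion,
// or fires a late-registered callback immediately if already completed.
final class MiniDeferred[A] {
  private var result: Option[Either[Throwable, A]] = None
  private var callbacks: List[Either[Throwable, A] => Unit] = Nil

  def addCallbacks(onResult: A => Unit, onError: Throwable => Unit): Unit = {
    val cb: Either[Throwable, A] => Unit = {
      case Right(a) => onResult(a)
      case Left(e)  => onError(e)
    }
    result match {
      case Some(r) => cb(r)            // already completed: fire immediately
      case None    => callbacks ::= cb // otherwise register for later
    }
  }

  def callback(a: A): Unit = {
    result = Some(Right(a))
    callbacks.foreach(_(Right(a)))
  }
}

// usage: register first, complete later
var seen: List[Int] = Nil
val d = new MiniDeferred[Int]
d.addCallbacks(a => seen ::= a, _ => ())
d.callback(42)
```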
Introducing a streaming interface to asynchbase with Akka Stream
import java.util

import scala.collection.JavaConverters._

import akka.stream.{Attributes, Outlet, SourceShape}
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
import org.hbase.async.{KeyValue, Scanner}

class HBaseAsyncScanStage(scanner: Scanner)
  extends GraphStage[SourceShape[util.ArrayList[KeyValue]]] with HBaseCallbackConversion {

  val out: Outlet[util.ArrayList[KeyValue]] = Outlet("HBaseAsyncScanStage")
  override def shape: SourceShape[util.ArrayList[KeyValue]] = SourceShape(out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      var buffer: List[util.ArrayList[KeyValue]] = List.empty

      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          if (buffer.isEmpty) {
            val deferred = scanner.nextRows()
            deferred.addCallbacks(
              (results: util.ArrayList[util.ArrayList[KeyValue]]) => callback.invoke(Option(results)),
              (e: Throwable) => errorback.invoke(e)
            )
          } else {
            val (element, tailBuffer) = (buffer.head, buffer.tail)
            buffer = tailBuffer
            push(out, element)
          }
        }
      })

      override def postStop(): Unit = {
        scanner.close()
        super.postStop()
      }

      private val callback = getAsyncCallback[Option[util.ArrayList[util.ArrayList[KeyValue]]]] {
        case Some(results) if !results.isEmpty =>
          val element = results.remove(0)
          buffer = results.asScala.toList
          push(out, element)
        case Some(results) if results.isEmpty => complete(out)
        case None                             => complete(out)
      }

      private val errorback = getAsyncCallback[Throwable] { error => fail(out, error) }
    }
}
※ This code contains a serious issue: downstream cancellation must be handled properly.
Otherwise a Close request may be fired while a NextRows request is still running,
which violates the HBase protocol.
See how to solve this problem on the Gist.
Gist
Customizing Scan behavior with
downstream pipelines
HBaseAsyncScanSource(scanner).take(1000)

HBaseAsyncScanSource(scanner)
  .throttle(elements = 100, per = 1.second, maximumBurst = 100, ThrottleMode.Shaping)

HBaseAsyncScanSource(scanner).completionTimeout(5.seconds)

HBaseAsyncScanSource(scanner).recoverWithRetries(10, {
  case _: NotServingRegionException => HBaseAsyncScanSource(scanner)
})
● early termination of the scan when the row count limit is reached
● rate limiting of the scan iteration
● early termination of the scan by timeout
● retrying when a region server is not serving
Gist
Switching from synchronous API to
asynchronous API
● Switching from a synchronous API to an asynchronous API usually requires rewriting whole APIs
● Abstracting database drivers is difficult
● Starting with an asynchronous interface like Future[T] is a good practice
● Another option for an abstract interface is streams
● Streams can behave like collections such as Future, Option, List, and Try, but do not require monad transformers to integrate with each other
● A stream interface specification like reactive-streams (JEP 266) gives a way to connect various asynchronous libraries
● Akka Stream is one of the implementations of reactive-streams
Database access abstraction with streams
Transport Interface Layer
interface: Directive[T], Future[T]
engine: Akka HTTP
Stream Adaptor
interface: Source[Out, M], Flow[In, Out, M], Sink[In, M]
engine: Akka Stream
Database Interface Layer
interface: implementation specific
engine: database driver
● native HBase client
● asynchbase
● HBaseScanStage
● HBaseAsyncScanStage
● ReadMessageDAS
UseCase Layer
interface: Source[Out, M], Flow[In, Out, M], Sink[In, M]
engine: Akka Stream
Domain Layer
interface: Scala collections and case classes
engine: Scala standard library
● The stream abstraction mitigates the impact of changes to the underlying implementations
● The database access implementation can be switched by factory functions
● No change was required inside the UseCase and Domain layers
Database access abstraction with streams
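The factory-switching idea above can be sketched in plain Scala. All names here (`MessageScanner`, `scannerFor`, the two objects) are hypothetical, and `Iterator` stands in for Akka Stream's `Source`: the UseCase layer depends only on the abstract interface, while one factory function decides which driver backs it.

```scala
// The abstraction the UseCase layer programs against.
trait MessageScanner { def scan(roomId: String): Iterator[String] }

// Two interchangeable Database Interface Layer implementations.
object NativeScanner extends MessageScanner {        // e.g. backed by the native HBase client
  def scan(roomId: String): Iterator[String] = Iterator(s"$roomId:blocking")
}
object AsyncScanner extends MessageScanner {         // e.g. backed by asynchbase
  def scan(roomId: String): Iterator[String] = Iterator(s"$roomId:non-blocking")
}

// Factory function: the only place that knows which implementation is used.
def scannerFor(useAsync: Boolean): MessageScanner =
  if (useAsync) AsyncScanner else NativeScanner

// Switching the flag changes the driver without touching UseCase/Domain code.
val rows = scannerFor(useAsync = true).scan("room1").toList
```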
Blocking IO vs Non-blocking IO
performance test results
Fortunately, we have not faced HBase issues since the asynchbase migration in production.
The following slides show performance test results that were collected before the asynchbase deployment.
Blocking IO vs Non-blocking IO
performance test settings
● Single Message Read API server
○ JVM heap size=4GiB
○ CPU request=3.5
○ CPU limit=4
● Production workload pattern simulated with the Gatling stress tool
● 1340 requests/second
● mainly invokes HBase Scan, but there are Get and batch Get requests as well
Both implementations, with asynchbase and with the native HBase client, are tested under the same conditions.
Blocking IO vs Non-blocking IO
throughput
Message Read API server
with native HBase client
Message Read API server
with asynchbase
throughput: 1000 → 1300
Blocking IO vs Non-blocking IO
latency
Message Read API server with
native HBase client
Message Read API server
with asynchbase
※ Note that the scales of y-axis are different.
99pt.: 2000ms → 300ms
95pt.: 1000ms → 200ms
Blocking IO vs Non-blocking IO
Thread pool usage
Message Read API server with
native HBase client
Message Read API server
with asynchbase
Note that hbase-dispatcher is an application thread pool, not a Netty IO worker thread pool.
pool size: 600 → 8
active threads: 80 → 2
Blocking IO vs Non-blocking IO
JVM heap usage
Message Read API server with
native HBase client
Message Read API server with
asynchbase
heap usage: 2.6GiB → 1.8GiB
Blocking IO vs Non-blocking IO
HBase scan metrics
Message Read API server with
native HBase client
Message Read API server with
asynchbase
average of the sum of milliseconds between next() calls (same metric, both panels)
HBase scan metrics may come to asynchbase
https://github.com/OpenTSDB/asynchbase/pull/184
Room for improvement
Timeouts and Rate limiting
● Proper timeouts and rate limiting are necessary for asynchronous, non-blocking systems
○ Without such reins, an asynchronous system increases its throughput until it consumes all resources
● Timeouts
○ completionTimeout: timeout based on total processing time
■ Not ideal for Scan, which has a broad distribution of processing times
○ idleTimeout: timeout based on the processing time between two elements
■ A single iteration of Scan has a sharp distribution of processing times, so this is probably a better strategy
● Rate limiting
○ Under high workload, the first bottleneck is the throughput of HBase's storage
■ How to implement storage-aware rate limiting?
■ Tuning application resources may be necessary
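The difference between the two timeout strategies can be made concrete with a small sketch. `checkIdleGaps` is a hypothetical helper, not an Akka Stream API: it fails a scan only when the gap between two consecutive elements exceeds a limit, regardless of the total duration.

```scala
// idleTimeout semantics: every gap between consecutive element timestamps
// must stay under the limit; total duration does not matter.
def checkIdleGaps(timestampsMs: Vector[Long], idleTimeoutMs: Long): Boolean =
  timestampsMs.zip(timestampsMs.drop(1)).forall { case (a, b) => b - a <= idleTimeoutMs }

// A long but steadily progressing scan: 5 elements over 4 seconds.
val ts = Vector(0L, 1000L, 2000L, 3000L, 4000L)

val idleOk = checkIdleGaps(ts, 1500L)            // each 1s gap is under 1.5s
val totalWithinLimit = (ts.last - ts.head) <= 3000L // a 3s completionTimeout would fail it
```

This is why a per-element idle timeout suits Scan better: it tolerates long scans while still catching a stalled iteration quickly.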
Conclusion
● Blocking IO spoils the benefits of distributed databases
○ a partial failure of the database exhausts application threads and makes the application unresponsive
● Non-blocking IO is resilient to partial failure
● Asynchronous streams are great as a flexible execution model and abstract interface
● An asynchronous stream with non-blocking IO outperforms a blocking one
● Our journey toward a resilient system continues
