HBaseCon 2013: A Developer’s Guide to Coprocessors

A Developer’s Guide to Coprocessors
HBaseCon 2013
John Weatherford
https://github.com/jweatherford

Who Is Telescope?
Telescope is the leading provider of interactive television, audience participation, and customer engagement solutions.
Clients include TV networks, producers, digital platforms, studios, and sponsors seeking to reach, engage, and retain mass audiences and consumers in real time.

What Is a Coprocessor?
Arbitrary code that can run on each server
Extend the functionality of HBase
Avoid bothering the core committers

Two Types of Coprocessors
Endpoints
Call a function explicitly
Execute code on all regions
[Diagram: Client -> Endpoint on Region 1, Region 2, and Region 3]
Observers
React to an event
Run code before or after
[Diagram: Client -> Pre-Action -> Action -> Post-Action]

What Can I Do With Coprocessors?
Ideas for what can be done:
Access Control
Secondary Indexes
Optimized Search
Data Aggregation
Control compaction times
Real Time Analytics
Reduce result sets
Cache Request
Email split alerts

Getting Started With Code
preGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
postGet(ObserverContext<RegionCoprocessorEnvironment> c, Get get, List<KeyValue> result)
prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
postPut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit, boolean writeToWAL)
preDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)
postDelete(ObserverContext<RegionCoprocessorEnvironment> c, Delete delete, WALEdit edit, boolean writeToWAL)

Our First Observer
Intercept and modify the action
Consider all circumstances that will trigger the observer
Compile your jar for the same version of Java that your HBase region servers run
Look for output from the coprocessor

Our First Observer
Motivation: Apache Flume only writes one column per put

JSON
{twitter:
  { name: "loljk4u",
    message: "<3",
    length: 2,
    registered: true
  },
 favorite:
  { name: "Taylor"
  ...

Single Row Put
key: id-1332343
family: twitter
qualifier: json_raw
value: "{twitter:
  {name: "loljk4u",
   message: "<3",
   length: 2,
   registered: true
   ...

Put after prePut()
key: id-1332343
twitter:name: "loljk4u"
twitter:message: "<3"
twitter:length: 0x2
twitter:registered: 0xFF
favorite:name: "Taylor"
favorite:song: "I knew you were trouble"

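JsonColumnExpander
public class JsonColumnExpander extends BaseRegionObserver {
  private static final byte[] FAMILY = Bytes.toBytes("twitter");
  private static final byte[] JSON_COLUMN = Bytes.toBytes("json_raw");
  private String[] families;

  //get the arguments on the coprocessor
  @Override
  public void start(CoprocessorEnvironment env) throws IOException {
    Configuration c = env.getConfiguration();
    families = c.get("families", "").split(":");
  }

  @Override
  public void prePut(ObserverContext<…> e, Put put, WALEdit edit, boolean waL) throws IOException {
    if (!put.has(FAMILY, JSON_COLUMN)) { return; } //check for the json_raw column
    String json = Bytes.toString(put.get(FAMILY, JSON_COLUMN).get(0).getValue());
    //parseJson is a placeholder for your JSON library; assumed to return family -> (column -> value)
    Map<String, Map<String, Object>> parsed = parseJson(json);
    for (String family : families) { //loop through the configured families
      Map<String, Object> columns = parsed.get(family);
      if (columns == null) { continue; }
      for (Entry<String, Object> column : columns.entrySet()) { //loop through the json
        String value = String.valueOf(column.getValue()); //numbers and booleans are not Strings
        put.add(Bytes.toBytes(family), Bytes.toBytes(column.getKey()), Bytes.toBytes(value));
      }
    }
    //mark the original json column in the put as removed
    put.add(FAMILY, JSON_COLUMN, Bytes.toBytes("--removed--"));
  }
}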
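
The expansion step inside prePut can be tried outside HBase. The following is a minimal sketch in plain Java, assuming the JSON has already been parsed into a nested map of family -> (column -> value); the class and method names (JsonExpansionSketch, expand) are hypothetical and not part of the talk's code. It produces the "family:qualifier" cells that the observer would add to the put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: mirrors the loop in prePut without any HBase types.
final class JsonExpansionSketch {

    // Flattens family -> (column -> value) into "family:column" -> string value,
    // visiting only the families named in the coprocessor arguments.
    static Map<String, String> expand(Map<String, Map<String, Object>> parsed, String[] families) {
        Map<String, String> cells = new LinkedHashMap<String, String>();
        for (String family : families) {
            Map<String, Object> columns = parsed.get(family);
            if (columns == null) {
                continue; // family configured but absent from this document
            }
            for (Map.Entry<String, Object> column : columns.entrySet()) {
                // String.valueOf handles numbers and booleans as well as strings
                cells.put(family + ":" + column.getKey(), String.valueOf(column.getValue()));
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Object> twitter = new LinkedHashMap<String, Object>();
        twitter.put("name", "loljk4u");
        twitter.put("message", "<3");
        twitter.put("length", 2);
        twitter.put("registered", true);

        Map<String, Map<String, Object>> parsed = new LinkedHashMap<String, Map<String, Object>>();
        parsed.put("twitter", twitter);

        // Same colon-separated format that start() reads from the "families" setting
        String[] families = "twitter:favorite".split(":");
        System.out.println(expand(parsed, families));
    }
}
```

Storing every value as a string is a simplification chosen for this sketch; the row shown on the slide stores length and registered as binary values.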

Loading the Coprocessor
Push the jar to where your cluster can find it
$> hadoop fs -put JsonColumnExpander.jar /
Alter the table to enable the coprocessor
hbase> alter 'test', METHOD => 'table_att', 'coprocessor'=>'hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2'
Verify the load by checking the master web UI.
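
The coprocessor attribute packs four pipe-separated fields: jar path, class name, priority, and comma-separated key=value arguments. The sketch below, in plain Java, shows how that string breaks apart. It is an illustration of the format only, not HBase's own parser; the class CoprocessorSpec is hypothetical, and it assumes all four fields are present.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; HBase parses this attribute internally.
final class CoprocessorSpec {
    final String jarPath;
    final String className;
    final int priority;
    final Map<String, String> args;

    private CoprocessorSpec(String jarPath, String className, int priority, Map<String, String> args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    // Splits "path|class|priority|k1=v1,k2=v2" into its four fields.
    static CoprocessorSpec parse(String spec) {
        String[] parts = spec.split("\\|", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 pipe-separated fields: " + spec);
        }
        Map<String, String> args = new LinkedHashMap<String, String>();
        for (String pair : parts[3].split(",")) {
            if (pair.trim().isEmpty()) {
                continue; // tolerate an empty argument list
            }
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("expected key=value: " + pair);
            }
            args.put(kv[0].trim(), kv[1].trim());
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(),
                Integer.parseInt(parts[2].trim()), args);
    }

    public static void main(String[] unused) {
        CoprocessorSpec s = parse(
            "hdfs:///JsonColumnExpander.jar|telescope.hbase.JsonColumnExpander|1001|arg1=1,arg2=2");
        System.out.println(s.className + " priority=" + s.priority + " args=" + s.args);
    }
}
```

The key=value arguments are what the observer's start() method reads back through env.getConfiguration(), which is how a setting such as "families" reaches the coprocessor.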

Running The Code
Trigger the coprocessor with a put on the table
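Put put = new Put(Bytes.toBytes("rowkey"));
put.add(Bytes.toBytes("twitter"), Bytes.toBytes("json_raw"), json_data); //json_data is the raw JSON as byte[]
htable.put(put); //htable is assumed to be an HTable opened on the table
Check each server’s local logs
http://regionnode:60030/logs/
hbase-hbase-regionserver-node2.dev-hadoop.telescope.tv.out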

Creating Your First Endpoint
Define the available methods in a protocol
Implement the protocol
Extend BaseEndpointCoprocessor
Load the endpoint on the table

Endpoint Example
public interface TrendsProtocol extends CoprocessorProtocol {
  HashMap<String, Long> getData() throws IOException;
}
//The endpoint class implements the protocol we wrote above
public class TrendsEndpoint extends BaseEndpointCoprocessor implements TrendsProtocol {
  @Override
  public HashMap<String, Long> getData() throws IOException {
    HashMap<String, Long> trends = new HashMap<String, Long>();
    RegionCoprocessorEnvironment environment = (RegionCoprocessorEnvironment) getEnvironment();
    //scan everything stored on this region
    InternalScanner scanner = environment.getRegion().getScanner(new Scan());
    try {
      List<KeyValue> curVals = new ArrayList<KeyValue>();
      boolean hasMore;
      do {
        curVals.clear();
        hasMore = scanner.next(curVals); //fills curVals with the next row
        for (KeyValue pair : curVals) {
          //loop through values on the region and process
        }
      } while (hasMore);
    } finally {
      scanner.close();
    }
    return trends;
  }
}

Endpoint Returned Results
htable = HBaseDB.getTable(connection, "hbase_demo");
Map<byte[], HashMap<String, Long>> results = null;
results = htable.coprocessorExec( //declared to throw Throwable
  TrendsProtocol.class,
  null, //start row
  null, //end row
  new Batch.Call<TrendsProtocol, HashMap<String, Long>>() {
    @Override
    public HashMap<String, Long> call(TrendsProtocol trends) throws IOException {
      return trends.getData();
    }
  }
);
for (Map.Entry<byte[], HashMap<String, Long>> entry : results.entrySet()) {
  //process results from each region
}
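
Each region returns its own map, so the client usually has to combine them. A minimal sketch of that step in plain Java follows, assuming the per-region values are counts that should be summed per key; the class TrendMerger is hypothetical and the talk does not show how its results are combined.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical client-side helper: sums the per-region maps returned by the endpoint.
final class TrendMerger {

    // Adds up the counts for each key across all regions.
    static Map<String, Long> merge(Collection<? extends Map<String, Long>> perRegion) {
        Map<String, Long> totals = new TreeMap<String, Long>();
        for (Map<String, Long> region : perRegion) {
            for (Map.Entry<String, Long> e : region.entrySet()) {
                Long current = totals.get(e.getKey());
                totals.put(e.getKey(), current == null ? e.getValue() : current + e.getValue());
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample counts standing in for two regions' results
        HashMap<String, Long> region1 = new HashMap<String, Long>();
        region1.put("taylor", 3L);
        region1.put("lol", 1L);
        HashMap<String, Long> region2 = new HashMap<String, Long>();
        region2.put("taylor", 2L);

        Map<byte[], HashMap<String, Long>> results = new HashMap<byte[], HashMap<String, Long>>();
        results.put("region-1".getBytes(), region1);
        results.put("region-2".getBytes(), region2);

        // Only the values matter here; the byte[] keys identify the regions
        System.out.println(merge(results.values()));
    }
}
```

With the client code above, the call would be merge(results.values()).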

Addendum to Endpoints
HBase 0.96 is changing Endpoints to use protobuf services
public static abstract class RowCountService
implements com.google.protobuf.Service {
...
public interface Interface {
public abstract void getRowCount(
com.google.protobuf.RpcController controller,
CountRequest request,
com.google.protobuf.RpcCallback done);
public abstract void getKeyValueCount(
com.google.protobuf.RpcController controller,
CountRequest request,
com.google.protobuf.RpcCallback done);
}
}

Telescope’s Coprocessors
Observers collect real-time analytics data for our moderation platform and create aggregate tables for the streaming data.
Endpoints optimize searches and transmit only the necessary data. They also run simple reporting queries that don’t need the full power of MapReduce.

Questions?
Already using coprocessors? I would love to hear about it.
Curious to know more about a specific part?
All code samples and table definitions can be found at
https://github.com/jweatherford
