HPTS talk on micro-sharding with Katta

Text Retrieval Task

• Text is viewed as a sequence of terms in fields

• Document and position for each term are indexed

• Query is a sequence of terms (typically many more than
user actually types)

Text Retrieval

• Scores computed by merging occurrences of terms in
query

• Only top scoring documents are kept

• Deletions and document edits are done by adding new
documents and keeping a deletion list

• (this is all standard ... Lucene is the best known example)
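
As a rough illustration of the retrieval model above (not Lucene's actual scoring), the following Python sketch builds a positional index, scores documents by summing occurrences of the query terms, keeps only the top k, and filters through a deletion list. All names here are hypothetical.

```python
import heapq
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def search(index, query_terms, deleted, k=2):
    """Score = total occurrences of query terms; keep only the top k live docs."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, positions in index.get(term, {}).items():
            if doc_id not in deleted:
                scores[doc_id] += len(positions)
    return heapq.nlargest(k, scores.items(), key=lambda item: (item[1], item[0]))


docs = {
    "a": "shard the index shard the load",
    "b": "index the documents",
    "c": "shard shard shard",
}
index = build_index(docs)
deleted = {"c"}  # "c" stays in the index but is filtered out at query time
top = search(index, ["shard", "index"], deleted)
```

The deletion list is applied at query time, so the index itself never has to be modified to remove a document.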

Traditional Scaling

[Diagram: Sharding. Documents 1...n are split by range into shards 1-4 (1...n/4, n/4+1...n/2, n/2+1...3n/4, 3n/4+1...n); the range for a fifth shard is marked "?". Replication: shards 1-5 are repeated in replica rows.]

Consistent Hashing

[Diagram: consistent hashing]

Problems

• Presumes objects can be moved individually

• Has very high insertion/deletion rate

• Has disordered access patterns

• Often exhibits content/placement correlations

Micro Sharding

[Diagram: map, reduce, hdfs]

map:
    for (t in types)
        yield [key:(t, h(key)%shardCnt), value:doc]

reduce:
    Retrieval Indexer #1, Retrieval Indexer #2, ... Retrieval Indexer #n
    Content Indexer #1, Content Indexer #2, ... Content Indexer #m

n,m >> number of search nodes
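
The map step above can be sketched in Python. This is a minimal illustration, not Katta or Hadoop code: `shard_key`, `map_document`, `SHARD_COUNT`, and the use of MD5 as the hash `h` are assumptions made for the example.

```python
import hashlib

SHARD_COUNT = 64  # assumed value; meant to be much larger than the number of search nodes
INDEX_TYPES = ("retrieval", "content")  # the two index types shown on the slide


def shard_key(doc_key: str, index_type: str, shard_count: int = SHARD_COUNT):
    """Return (index type, micro-shard number) for a document key."""
    digest = hashlib.md5(doc_key.encode("utf-8")).digest()
    h = int.from_bytes(digest[:8], "big")  # stable across processes, unlike hash()
    return (index_type, h % shard_count)


def map_document(doc_key: str, doc: str):
    """Emit one (key, value) pair per index type, as in the map pseudocode."""
    for t in INDEX_TYPES:
        yield shard_key(doc_key, t), doc


pairs = list(map_document("doc-1", "some text"))
```

Each reducer then receives all documents for one (type, shard) key and can build that micro-shard's index in isolation.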

Search Architecture

[Diagram: presentation layer, two federators, Retrieval Engines #1 ... #n, Content Engines #1 ... #m]

Control Architecture

[Diagram: federator, Retrieval Engine #2, zookeeper, katta master, HDFS, indexer]

Scenario: Node Start
● Node starts, tells ZK it exists and has no shards
● Master notified by ZK, looks at shard placement
● Imbalance exists so Master assigns shards to new node
● Node notified by ZK, downloads shard, tells ZK
● Master notified by ZK, looks at shard placement, unassigns shard somewhere

Scenario: Node Crash
● ZK detects node connection loss and session expiration
● Master is notified by ZK that node ephemeral file has vanished, looks at shard placements
● If under-replication exists, Master assigns shards to other nodes
● Nodes are notified by ZK, download shards, tell ZK
● Master is notified by ZK, no action needed

Summary of Master
● Master is notified of node set or shard set change
● Master examines current state of cluster
● If shards are under-replicated, add assignments
● If shards are over-replicated, delete assignments
● If cluster is imbalanced, add assignments
● Rinse, repeat
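
The loop above can be sketched as a pure function over cluster state. This is a simplified model written for illustration, not Katta's master code: the `rebalance` function, its arguments, and the least-loaded placement rule are assumptions. It also collapses the "add, wait for the node to report, then delete the extra copy" sequence from the scenarios into a single move.

```python
def rebalance(nodes, shards, assignments, replication=2):
    """Return a new shard -> set-of-nodes plan that is replicated and balanced.

    nodes: live node names; shards: all shard names;
    assignments: current shard -> set of nodes (may mention dead nodes).
    """
    live = set(nodes)
    # Drop assignments that point at nodes which are gone.
    plan = {s: set(assignments.get(s, ())) & live for s in shards}
    load = {n: 0 for n in live}
    for holders in plan.values():
        for n in holders:
            load[n] += 1

    target = min(replication, len(live))
    for s in sorted(plan):
        # Under-replicated: add assignments on the least loaded nodes.
        while len(plan[s]) < target:
            n = min(sorted(live - plan[s]), key=lambda x: load[x])
            plan[s].add(n)
            load[n] += 1
        # Over-replicated: delete assignments from the most loaded nodes.
        while len(plan[s]) > target:
            n = max(sorted(plan[s]), key=lambda x: load[x])
            plan[s].remove(n)
            load[n] -= 1

    # Imbalanced: shift one replica at a time from the fullest to the emptiest node.
    while live:
        hi = max(sorted(live), key=lambda x: load[x])
        lo = min(sorted(live), key=lambda x: load[x])
        if load[hi] - load[lo] <= 1:
            break
        movable = [s for s in sorted(plan) if hi in plan[s] and lo not in plan[s]]
        if not movable:
            break
        plan[movable[0]].discard(hi)
        plan[movable[0]].add(lo)
        load[hi] -= 1
        load[lo] += 1
    return plan


shards = [f"s{i}" for i in range(6)]
current = {s: {"n1", "n2"} for s in shards}

# Node start: n3 joins with no shards.
after_start = rebalance(["n1", "n2", "n3"], shards, current)

# Node crash: n3 disappears again.
after_crash = rebalance(["n1", "n2"], shards, after_start)
```

Because the function looks only at the current state, the same code handles node start, node crash, and shard set changes.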

Quick Results
• No deletion/insertion in indexes at runtime

• Reloading micro-shards allows large sequential transfers

• Multiple shards allow very simple threading of search

• Random placement guided by balancing policy gives near
optimal motion

• Node addition and failure are simple, reliable

• Random sharding also near optimal
    local = global statistics, 2x query time improvement
    load balancing
    uniform management
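
The point about threading can be illustrated with a short Python sketch: each micro-shard is searched independently in a thread pool and the per-shard top hits are merged. The shard layout and the `search_shard` scoring function are hypothetical stand-ins for a real index.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term, k):
    """Return the top-k (score, doc_id) pairs from one shard (toy scoring: term count)."""
    hits = ((text.split().count(term), doc_id) for doc_id, text in shard.items())
    return heapq.nlargest(k, (h for h in hits if h[0] > 0))


def search_all(shards, term, k=3, workers=4):
    """Search every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda s: search_shard(s, term, k), shards)
        return heapq.nlargest(k, (hit for part in partials for hit in part))


shards = [
    {"d1": "katta katta shard", "d2": "lucene index"},
    {"d3": "katta node", "d4": "katta katta katta"},
    {"d5": "zookeeper"},
]
top = search_all(shards, "katta", k=2)
```

No locking is needed because shards are read-only at query time and each task touches exactly one shard.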

Building Blocks
• EC2 - elastic compute

• Zookeeper - reliable coordination

• Katta - shard and query management

• Hadoop - map-reduce, RPC for Katta

• Lucene - candidate set retrieval, index file storage

• Deepdyve search algorithms - segment scoring

Zookeeper
• Replicated key-value in-memory store

• Minimal semantics
    create, read, replace specified version
    sequential and ephemeral files
    notifications

• Very strict correctness guarantees
    strict ordering
    quorum writes
    no blocking operations
    no race conditions

• High speed
    50,000 updates per second, 200,000 reads per second
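
The "replace specified version" operation is what lets clients update shared state without blocking locks. The toy in-memory class below only models that semantics; it is not the ZooKeeper client API, and `VersionedStore` and its method names are invented for illustration.

```python
class VersionMismatch(Exception):
    """Raised when a replace names a version that is no longer current."""


class VersionedStore:
    def __init__(self):
        self._data = {}  # path -> (value, version)

    def create(self, path, value):
        if path in self._data:
            raise KeyError(f"{path} already exists")
        self._data[path] = (value, 0)

    def read(self, path):
        return self._data[path]  # (value, version)

    def replace(self, path, value, expected_version):
        _, version = self._data[path]
        if version != expected_version:
            raise VersionMismatch(path)
        self._data[path] = (value, version + 1)


store = VersionedStore()
store.create("/shards/s1", "node-a")
_, v = store.read("/shards/s1")
store.replace("/shards/s1", "node-b", v)      # succeeds; version becomes 1
try:
    store.replace("/shards/s1", "node-c", v)  # stale version: rejected
    stale_rejected = False
except VersionMismatch:
    stale_rejected = True
```

A writer whose replace is rejected re-reads the value and retries, so two concurrent updates cannot silently overwrite each other.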

Katta Interface
• Simple Interface
    Client - horizontal broadcast for query, vertical broadcast for update
    InodeManaged - add/removeShard

• Pluggable Application Interface

• Pluggable Return Policy
    Given current return state
    return < 0 => done
    return 0 => return result, allow updates
    return n => wait at most n milliseconds

• Comprehensive Results
    Results, exceptions, arrival times
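
One way to read the return-policy contract above is as a function that is called whenever the state of a broadcast changes. The sketch below is a hypothetical policy in Python, not Katta's Java interface; the parameter names, the 90% coverage threshold, and the 500 ms deadline are assumptions.

```python
def return_policy(received, expected, elapsed_ms, deadline_ms=500, coverage=0.9):
    """Decide what a waiting caller should do next.

    < 0   => done: the result is final
      0   => return the result now, but allow late updates
    n > 0 => wait at most n more milliseconds
    """
    if received >= expected or elapsed_ms >= deadline_ms:
        return -1
    if received >= coverage * expected:
        return 0
    return deadline_ms - elapsed_ms
```

For example, with 20 shards queried, `return_policy(19, 20, 100)` gives 0 (good enough to return, late results still accepted), while `return_policy(2, 20, 100)` gives 400 (keep waiting, but for no more than 400 ms).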

Horizontal/Vertical Broadcast

[Diagram: documents 1...n are split by range into shards 1-4 (1...n/4, n/4+1...n/2, n/2+1...3n/4, 3n/4+1...n). Replication: shards 1-4 are repeated in replica rows.]

Operations

[Diagram: federator, Retrieval Engine #2, zookeeper, katta master, HDFS, indexer]

Impact of Cloud Approach
• Scale-free programming

• Deployed in EC2 (test) or in private farm (production)

• No single point of failure

• Real-time scale up/down

• Extensible to real-time index updates

Lessons

• Random is good

no correlations in document to shard assignments

⇒ strong bounds on node variations in search time
⇒ local statistics are as good as global statistics

no structure in shard to node assignments

⇒ node failure is not correlated to documents
⇒ load balancing and rebalancing is trivial
⇒ threaded search is trivial
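
A small simulation suggests why random assignment makes local statistics track global ones. The numbers below (20,000 documents, 16 shards, a term present in every tenth document) are arbitrary choices for the example, and the seeded generator stands in for a hash function.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
DOCS, SHARDS = 20_000, 16

shard_size = [0] * SHARDS
shard_hits = [0] * SHARDS
for doc_id in range(DOCS):
    s = rng.randrange(SHARDS)      # random document-to-shard assignment
    shard_size[s] += 1
    if doc_id % 10 == 0:           # the term occurs in 10% of all documents
        shard_hits[s] += 1

global_df = sum(shard_hits) / DOCS
local_df = [h / n for h, n in zip(shard_hits, shard_size)]
worst_gap = max(abs(df - global_df) for df in local_df)
```

With random placement, each shard's document frequency for the term should land close to the global 10%, so a shard can score with its own statistics. Range-based or otherwise correlated placement offers no such guarantee.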

More Lessons

• Randomized clustering requires good coordination
Zookeeper makes that easy

• Good coordination means not having to say you’re sorry
Masters coordinate but don’t participate

Resources
● My blog
– http://tdunning.blogspot.com/

● The web-site
– www.deepdyve.com

● Source code
– Katta (sourceforge)
– Hadoop (Apache)
– Lucene (Apache)
