October 13-16, 2016 • Austin, TX
Lessons from Sharding Solr at Etsy
Gregg Donovan
@greggdonovan
Senior Software Engineer, etsy.com
• 5.5 Years Solr & Lucene at Etsy.com
• 3 Years Solr & Lucene at TheLadders.com
• Speaker at LuceneRevolution 2011 & 2013
Jeff Dean, Challenges in Building Large-Scale Information Retrieval Systems
1.5 Million Active Shops
32 Million Items Listed
21.7 Million Active Buyers
Agenda
• Sharding Solr at Etsy V0 — No sharding
• Sharding Solr at Etsy V1 — Local sharding
• Sharding Solr at Etsy V2 (*) — Distributed sharding
• Questions
* — What we’re about to launch.
Sharding V0 — Not Sharding
• Why do we shard?
• Data size grows beyond RAM on a single box
• Lucene can handle this, but there’s a performance cost
• Data size grows beyond local disk
• Latency requirements
• Not sharding allowed us to avoid many problems we’ll discuss later.
Sharding V0 — Not Sharding
• How to keep data size small enough for one host?
• Don’t store anything other than IDs
• fl=pk_id,fk_id,score
• Keep materialized objects in memcached (fetch pattern sketched below)
• Only index fields needed
• Prune index after experiments add fields
• Get more RAM
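A minimal sketch of the pattern above — ask Solr only for IDs plus score, then materialize the full objects from memcached — assuming SolrJ and a spymemcached client; the field names, cache-key scheme, and class name are illustrative, not Etsy’s actual code:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import net.spy.memcached.MemcachedClient;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;

public class IdOnlySearch {
  /** Ask Solr only for IDs and score, then materialize full listings from memcached. */
  public Map<String, Object> search(SolrClient solr, MemcachedClient memcached, String q)
      throws Exception {
    SolrQuery query = new SolrQuery(q);
    query.setFields("pk_id", "fk_id", "score"); // nothing stored in the index beyond IDs
    query.setRows(100);

    List<String> keys = new ArrayList<>();
    for (SolrDocument doc : solr.query(query).getResults()) {
      keys.add("listing:" + doc.getFieldValue("pk_id")); // hypothetical cache-key scheme
    }
    return memcached.getBulk(keys); // one multiget for the whole page of results
  }
}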
Sharding V0 — Not Sharding
• How does it fail?
• GC
• Solution
• “Banner” protocol
• Client-side load balancer
• Client connects and waits for a 4-byte banner — 0xC0DEA5CF — from the server within 1-10ms before sending the query; otherwise, it tries another server (see the sketch below).
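A rough sketch of the banner check from the client’s side, assuming a plain TCP connection; the class and host-list handling are hypothetical, not Etsy’s actual load balancer:

import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

public class BannerBalancer {
  private static final int BANNER = 0xC0DEA5CF;    // 4-byte "I'm healthy, send your query" banner
  private static final int BANNER_TIMEOUT_MS = 10; // wait at most ~10ms for connect + banner

  /** Returns a connected socket to the first server that greets us in time. */
  public Socket connect(List<InetSocketAddress> servers) throws IOException {
    for (InetSocketAddress server : servers) {
      Socket socket = new Socket();
      try {
        socket.connect(server, BANNER_TIMEOUT_MS);
        socket.setSoTimeout(BANNER_TIMEOUT_MS);
        int banner = new DataInputStream(socket.getInputStream()).readInt();
        if (banner == BANNER) {
          return socket; // healthy: safe to send the query on this connection
        }
        socket.close();  // unexpected bytes: try the next host
      } catch (IOException e) {
        socket.close();  // slow (likely in GC) or unreachable: try the next host
      }
    }
    throw new IOException("No server sent the banner in time");
  }
}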
Sharding V1 — Local Sharding
• Motivations
• Better latency
• Smaller JVMs
• Tough to open a 31GB heap dump on your laptop
• Working set still fit in RAM on one box.
• What’s the simplest system we can build?
Sharding V1 — Local Sharding
• Lucene parallelism
• Shikhar Bhushan at Etsy experimented with segment level parallelism
• See Search-time Parallelism at Lucene Revolution 2014
• Made its way into LUCENE-6294 (Generalize how IndexSearcher parallelizes collection
execution). Committed in Lucene 5.1.
• Ended up with eight Solr shards per host, each in its own small JVM
• Moved query generation and re-ranking to separate process: the “mixer”
Sharding V1 — Local Sharding
• Based on Solr distributed search
• By default, Solr does two-pass distributed search
• First pass gets top IDs
• Second pass fetches stored fields for each top document
• Implemented distrib.singlePass mode (SOLR-5768)
• Does not make sense if individual documents are expensive to fetch
• Basic request tracing via HTTP headers (SOLR-5969)
Sharding V1 — Local Sharding
• Required us to fetch 1000+ results from each shard for reranking layer
• How to efficiently fetch 1000 documents per shard?
• Use Solr’s field syntax to fetch data from FieldCache
• e.g. fl=pk_id:field(pk_id),fk_id:field(fk_id),score
• When all fields are “pseudo” fields, there’s no need to fetch stored fields per document (see the SolrJ sketch below).
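A hedged SolrJ sketch combining distrib.singlePass with function (“pseudo”) fields so shards never read stored fields; the shard list, field names, and row counts are illustrative:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SinglePassFanout {
  /** Fetch a large candidate set per shard in one pass, reading values from the FieldCache. */
  public QueryResponse query(SolrClient solr, String q, String shards) throws Exception {
    SolrQuery query = new SolrQuery(q);
    query.set("shards", shards);            // e.g. "host1:8983/solr/core,host2:8983/solr/core"
    query.set("distrib.singlePass", true);  // SOLR-5768: skip the second, stored-field pass
    query.set("shards.rows", 1000);         // 1000+ candidates per shard for the re-ranker
    // Every requested field is a pseudo field, so no stored fields are fetched per document.
    query.setFields("pk_id:field(pk_id)", "fk_id:field(fk_id)", "score");
    query.setRows(1000);
    return solr.query(query);
  }
}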
Sharding V1 — Local Sharding
• Result
• Very large latency win
• Easy system to manage
• Well understood failure and recovery
• Avoided solving many distributed systems issues
Sharding V2 — Distributed Sharding
• Motivation
• Further latency improvements
• Prepare for data to exceed a single node’s capacity
• Significant latency improvements require finer sharding, more CPUs per request
• Requires a real distributed system and sophisticated RPC
• Before proceeding, stop what you’re doing and read everything by Google’s Jeff Dean and
Twitter’s Marius Eriksen
Sharding V2 — Distributed Sharding
• New problems
• Partial failures
• Lagging shards
• Synchronizing cluster state and configuration
• Network partitions
• Jepsen
• Distributed IDF issues exacerbated
Solving Distributed IDF
• Inverse Document Frequency (IDF) now varies across shards, biasing ranking
• Calculate IDF offline in Hadoop
• IDFReplacedSimilarityFactory
• Offline data populates cache of Map<BytesRef,Float> (term → score)
• Override SimilarityFactory#idfExplain
• Cache misses fall back to a constant “rare term” IDF (see the sketch below)
• Can be extended to solve i18n IDF issues
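A rough sketch of the idea behind IDFReplacedSimilarityFactory, assuming Lucene 5.3+/Solr 5.x-era Similarity APIs; the class name, loading of the offline table, and the rare-term constant are hypothetical:

import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.ClassicSimilarity;
import org.apache.lucene.search.similarities.Similarity;
import org.apache.lucene.util.BytesRef;
import org.apache.solr.schema.SimilarityFactory;

public class OfflineIdfSimilarityFactory extends SimilarityFactory {
  // term -> globally computed IDF; in practice populated from the offline Hadoop job's output
  private final Map<BytesRef, Float> globalIdf = new HashMap<>();
  private static final float RARE_TERM_IDF = 10.0f; // constant for terms missing from the table

  @Override
  public Similarity getSimilarity() {
    return new ClassicSimilarity() {
      @Override
      public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) {
        Float idf = globalIdf.get(termStats.term());
        float value = (idf != null) ? idf : RARE_TERM_IDF;
        return Explanation.match(value, "global idf, computed offline");
      }
    };
  }
}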
Sharding V2 — Distributed Sharding
• ShardHandler
• Solr’s abstraction for fanning out queries to shards
• Ships with default implementation (HttpShardHandler) based on HTTP 1.1
• Does fanout (distrib=true) and processes requests coming from other Solr nodes
(distrib=false).
• Reads shards.rows and shards.start parameters
ShardHandler API
Solr’s SearchHandler calls submit for each shard and then either takeCompletedIncludingErrors
or takeCompletedOrError depending on partial results tolerance.
public abstract class ShardHandler {
  public abstract void checkDistributed(ResponseBuilder rb);
  public abstract void submit(ShardRequest sreq, String shard, ModifiableSolrParams params);
  public abstract ShardResponse takeCompletedIncludingErrors();
  public abstract ShardResponse takeCompletedOrError();
  public abstract void cancelAll();
  public abstract ShardHandlerFactory getShardHandlerFactory();
}
Sharding V2 — Distributed Sharding
Distributed query requirements
• Distributed tracing
• E.g.: Google’s Dapper, Twitter’s Zipkin, Etsy’s CrossStitch
• Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
• Handle node failures, slowness
Better Know Your Switches
Have a clear understanding of your networking requirements and whether your hardware meets
them.
• Prefer line-rate switches
• Prefer cut-through to store-and-forward
• No buffering — the switch reads just the packet header and moves the packet straight to its destination port
• Track and graph switch statistics in the same dashboard you display your search latency stats
• errors, retransmits, etc.
Sharding V2 — Distributed Sharding
First experiment, Twitter’s Finagle
• Built on Netty
• Mux RPC multiplexing protocol
• See Your Server as a Function by Marius Eriksen
• Built-in support for Zipkin distributed tracing
• Served as inspiration for Facebook’s futures-based RPC Wangle
• Implemented a FinagleShardHandler
Sharding V2 — Distributed Sharding
Second experiment, custom Thrift-based protocol
• Blocking I/O easier to integrate with SolrJ API
• Able to integrate our own distributed tracing
• LZ4 compression via a custom Thrift TTransport (framing sketched below)
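A minimal sketch of the framing such a transport might perform, assuming the lz4-java library; a custom TTransport would call frame() from flush() and unframe() when reading a message off the wire (sketch only, not Etsy’s implementation):

import java.nio.ByteBuffer;

import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4FastDecompressor;

public final class Lz4Frames {
  private static final LZ4Factory FACTORY = LZ4Factory.fastestInstance();
  private static final LZ4Compressor COMPRESSOR = FACTORY.fastCompressor();
  private static final LZ4FastDecompressor DECOMPRESSOR = FACTORY.fastDecompressor();

  /** Wire format: [4-byte uncompressed length][LZ4 block]. */
  public static byte[] frame(byte[] payload) {
    byte[] compressed = COMPRESSOR.compress(payload);
    return ByteBuffer.allocate(4 + compressed.length)
        .putInt(payload.length)
        .put(compressed)
        .array();
  }

  public static byte[] unframe(byte[] frame) {
    ByteBuffer buf = ByteBuffer.wrap(frame);
    int uncompressedLength = buf.getInt();
    byte[] compressed = new byte[frame.length - 4];
    buf.get(compressed);
    return DECOMPRESSOR.decompress(compressed, uncompressedLength);
  }
}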
Sharding V2 — Distributed Sharding
Future experiment: HTTP/2
• One TCP connection for all requests between two servers
• Libraries
• Square’s OkHttp
• Google’s gRPC
• Jetty client in 9.3+ — appears to be Solr’s choice
Sharding V2 — Distributed Sharding
Implementation note
• Separated fanout from individual request processing
• SolrJ client via an EmbeddedSolrServer containing an empty RAM directory (sketched below)
• Saves a network hop
• Makes shards easier to profile, tune
• Can return result to SolrJ without sending merged results over the network
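A hedged sketch of the mixer side, assuming SolrJ’s EmbeddedSolrServer and the stock HTTP shard handler; the solr home path, core name, and shard list are hypothetical:

import java.nio.file.Paths;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MixerClient {
  // Core backed by an empty RAM directory: it holds no documents and only runs
  // Solr's distributed fanout and merge logic in-process on the mixer.
  private final EmbeddedSolrServer mixer =
      new EmbeddedSolrServer(Paths.get("/etc/mixer/solr-home"), "listings");

  public QueryResponse query(String q, String shards) throws Exception {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", q);
    params.set("shards", shards); // fanout to the real shards; merged results stay in-process
    return mixer.query(params);
  }
}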
Sharding V2 — Distributed Sharding
• Good
• Individual shard times demonstrate very low average latency
• Bad
• Overall p95, p99 nowhere near averages
• Why? Lagging shards due to GC, filterCache misses, etc.
• More shards means more chances to hit outliers
Sharding V2 — Distributed Sharding
• Solutions
• See The Tail at Scale by Jeff Dean, CACM 2013.
• Eliminate all sources of inter-host variability
• No filter or other cache misses
• No GC
• Eliminate OS pauses, networking hiccups, deploys, restarts, etc.
• Not realistic
Sharding V2 — Distributed Sharding
• Backup Requests
• Methods
• Brute force — send two copies of every request to different hosts, take the fastest
response
• Less crude — wait X milliseconds for the first server to respond, then send a backup request (sketched below).
• Adaptive — choose X based on the first Y% of responses to return.
• Cancellation — Cancel the slow request to save CPU once you’re sure you don’t need it.
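A minimal sketch of the “less crude” variant plus cancellation, using plain java.util.concurrent; not Etsy’s implementation, and the thread-pool choice is illustrative:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class BackupRequester {
  private final ExecutorService pool = Executors.newCachedThreadPool();

  /** Issue the primary call; if it lags past backupDelayMs, hedge with a backup call. */
  public <T> T callWithBackup(Callable<T> primary, Callable<T> backup, long backupDelayMs)
      throws Exception {
    CompletionService<T> completions = new ExecutorCompletionService<>(pool);
    List<Future<T>> inFlight = new ArrayList<>();
    inFlight.add(completions.submit(primary));

    // Wait briefly for the primary shard; only send the backup if it is still outstanding.
    Future<T> winner = completions.poll(backupDelayMs, TimeUnit.MILLISECONDS);
    if (winner == null) {
      inFlight.add(completions.submit(backup));
      winner = completions.take(); // first of the two requests to finish
    }
    try {
      return winner.get();
    } finally {
      for (Future<T> request : inFlight) {
        request.cancel(true); // cancel the laggard to save CPU; a no-op on the winner
      }
    }
  }
}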
Sharding V2 — Distributed Sharding
• “Good enough”
• Return results to user after X% of results return if there are enough results. Don’t issue
backup requests, just cancel laggards.
• Only applicable in certain domains.
• Poses questions:
• Should you cache partial results?
• How is paging affected?
Resilience Testing
Now you own a distributed system. How do you know it works?
• “The Troublemaker”
• Inspired by Netflix’s Chaos Monkey
• Authored by Etsy’s Toria Gibbs
• Make sure humans can operate it
• Failure simulation — don’t wait until 3am
• Gameday exercises and Runbooks
Bonus material!
Better Know Your Kernel
A lesson not about sharding learned while sharding…
• Linux’s futex_wait() was broken in CentOS 6.6
• Backported patches needed from Linux 3.18
• Future direction: make kernel updates independent from distribution updates
• E.g., plenty of good stuff (networking improvements, kernel introspection [see @brendangregg]) landed between 3.10 and 4.2+, but it won’t come to CentOS for years
• Updating kernel alone easier to roll out
What else are we working on?
• Mesos for cluster orchestration
• GPUs for massive increases in per-query computational capacity
Thanks for coming.
gregg@etsy.com
@greggdonovan
Questions?
@greggdonovan
gregg@etsy.com