SF ElasticSearch Meetup 2012.10.03

Scaling ElasticSearch

SF Meetup
2012.10.03

Sushant Shankar
sushant.shankar@33across.com

Agenda
• Why we need a search engine
• Monitoring
• Index Building
• Query Performance

Who is 33Across
>600,000 Publishers
Machine Learning and Graph algorithms to:
- Build advertising segments
- Extract insights out of social and interest data
- Target via high-performance distributed systems that integrate with our advertising partners


Why we really need a search engine
Batch! Good for complicated tasks
(Machine Learning, Graph Algorithms, etc.)

… …

INDEX BUILDING
1 WEEK → 3 HOURS

Mappers to build index

6 nodes, 24GB RAM
16GB for ES service
4 cores
3x 1.5TB drive

Build index using MR job and Bulk API
>1TB/index (replicated)
~300M documents
~5KB / document
~3 hours
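
The slide figures imply a sustained indexing rate that is worth working out. The arithmetic below is only a rough sketch: it assumes decimal units (1 KB = 1,000 bytes), treats the "~" values as exact, and assumes the load is spread evenly over the 6 nodes. The raw figure is source data volume, not on-disk index size.

```python
# Back-of-the-envelope figures derived from the slide numbers.
# Assumptions (not from the slides): decimal units, exact values, even load per node.
DOCS = 300_000_000         # ~300M documents
DOC_BYTES = 5 * 1000       # ~5KB per document
BUILD_SECONDS = 3 * 3600   # ~3 hours
NODES = 6                  # 6-node cluster

raw_tb = DOCS * DOC_BYTES / 10**12                    # raw source volume in TB
docs_per_second = DOCS / BUILD_SECONDS                # cluster-wide indexing rate
docs_per_second_per_node = docs_per_second / NODES    # rate each node must sustain
mb_per_second = docs_per_second * DOC_BYTES / 10**6   # cluster-wide ingest bandwidth

print(f"raw source data:   {raw_tb:.1f} TB")
print(f"indexing rate:     {docs_per_second:,.0f} docs/s")
print(f"per node:          {docs_per_second_per_node:,.0f} docs/s")
print(f"ingest bandwidth:  {mb_per_second:.0f} MB/s")
```

Under these assumptions the cluster has to absorb roughly 28,000 documents per second for the whole 3 hours.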

Parameter Optimization
Parameters varied: amount bulk indexed, # shards
Metrics observed: time taken, CPU util., mem util., disk I/O, network

Index Building: Learnings
• Bulk API
• No replicas
• 2 shards / CPU
• 10,000 documents (users) per indexing request
• Refresh off (index.refresh_interval = -1)
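
The learnings above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the talk: it only builds the request bodies (a settings body with no replicas and refresh disabled, and newline-delimited Bulk API payloads of 10,000 documents each) and leaves the HTTP transport out. The helper names, the index name `users`, and the sample document fields are hypothetical. Elasticsearch releases from the time of the talk also expect a `_type` in the action line, which is why it is an optional argument here.

```python
import json
from itertools import islice


def build_time_settings():
    """Settings body for the build phase: no replicas, refresh disabled."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def batches(docs, size=10_000):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def bulk_body(index, batch, doc_type=None):
    """Build a Bulk API payload: one action line plus one source line per doc.

    `batch` is a list of (doc_id, source_dict) pairs. The payload is
    newline-delimited JSON and must end with a trailing newline.
    """
    lines = []
    for doc_id, source in batch:
        meta = {"_index": index, "_id": doc_id}
        if doc_type is not None:  # needed by older releases only
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical documents; in the talk each document is a user.
    docs = ((str(i), {"user_id": i}) for i in range(25_000))
    for n, batch in enumerate(batches(docs)):
        body = bulk_body("users", batch)
        print(f"request {n}: {len(batch)} docs, {len(body)} bytes")
```

Each payload would be sent to the `_bulk` endpoint, and the settings body to the index's `_settings` endpoint before the build starts.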

QUERY PERFORMANCE
5 MINUTES → 10 SECONDS

Query Performance: Learnings
• 1-2 replicas (also for reliability)
• Turn refresh on again (e.g. 5s; the Elasticsearch default is 1s)
• Warm up effect (Index Warm up API 0.20+)
• Optimize API
• Simulate multiple users
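
A rough sketch of two of these points, under stated assumptions: the settings body that switches the index back to query mode (replicas on, refresh re-enabled at the 5s from the slide), and a small harness that simulates concurrent users after a few warm-up queries. The function names are hypothetical, and `fake_query` is a stand-in for a real search request, so the timings it produces say nothing about Elasticsearch itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_time_settings(replicas=1, refresh_interval="5s"):
    """Settings body for the serving phase: replicas on, refresh re-enabled."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


def simulate_users(run_query, n_users=8, queries_per_user=5, warmup=2):
    """Run `run_query` from `n_users` threads and return sorted latencies (s).

    The first `warmup` calls are made up front and not timed, so that
    cold-cache effects do not dominate the measurement.
    """
    for _ in range(warmup):
        run_query()

    def one_user(_):
        latencies = []
        for _ in range(queries_per_user):
            start = time.perf_counter()
            run_query()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=n_users) as pool:
        per_user = list(pool.map(one_user, range(n_users)))
    return sorted(t for user in per_user for t in user)


def fake_query():
    """Placeholder for a real search call; sleeps for about a millisecond."""
    time.sleep(0.001)


if __name__ == "__main__":
    lat = simulate_users(fake_query)
    median_ms = lat[len(lat) // 2] * 1000
    print(f"{len(lat)} queries, median {median_ms:.2f} ms, max {lat[-1] * 1000:.2f} ms")
```

Measuring with several threads rather than one serial loop matters because query latency under concurrent load can differ a lot from single-user latency.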

Warm Up: load into memory and cache

Other cool features
• Custom Scoring functions
• Scripts – MVEL, Python
• Facets

• Exploring:
- Real-time indexing
- Indexing images, files, etc.
- Parent-child relationships
