Solr & Lucene at Etsy
       Gregg Donovan
    Technical Lead, Search
      gregg@etsy.com
1.5 years Solr & Lucene at Etsy.com

3 years Solr & Lucene at TheLadders.com
8+ million members
9.3 million items
800k+ active sellers
1+ billion pageviews / month
Maximize Solr out-of-the-box
Hack at a low-level
Know when to do each
Or
Don’t fear trunk
builds.apache.org/job/Solr-trunk/changes
http://localhost:8983/solr/placesuggest/select?
q={!lucene}s*
&sfield=latlong&pt=37.595804,-122.364521
&sort=div(geodist(),sqrt(sum(population,50)))%20asc
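The sort above orders suggestions by distance divided by the square root of (population + 50), ascending, so a large place somewhat farther away can outrank a tiny one next door. A minimal Java sketch of that arithmetic outside Solr, assuming a haversine distance in kilometres; the class, the Place type and any sample data are hypothetical, not part of the deck:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of sort=div(geodist(),sqrt(sum(population,50))) asc. */
class PlaceRanking {
    static final double EARTH_RADIUS_KM = 6371.0;

    /** A candidate suggestion: name, coordinates in degrees, population. */
    static final class Place {
        final String name;
        final double lat;
        final double lon;
        final long population;

        Place(String name, double lat, double lon, long population) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.population = population;
        }
    }

    /** Great-circle distance in kilometres (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** distance / sqrt(population + 50); a lower key sorts first. */
    static double sortKey(Place p, double lat, double lon) {
        return haversineKm(lat, lon, p.lat, p.lon) / Math.sqrt(p.population + 50.0);
    }

    /** Returns a copy of the places ordered by ascending sort key. */
    static List<Place> rank(List<Place> places, double lat, double lon) {
        List<Place> sorted = new ArrayList<>(places);
        sorted.sort(Comparator.comparingDouble((Place p) -> sortKey(p, lat, lon)));
        return sorted;
    }
}
```

The constant 50 keeps the denominator away from zero for places with no recorded population and damps the advantage of very small populations.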
{!lucene}
 {!field}
 {!term}
 {!boost}
 {!func}
{!dismax}
{!edismax}
Cheap ranking awesomeness
ExternalFileField ftw!
schema.xml:
    <fieldType name="file" class="solr.ExternalFileField"
               keyField="treasury_id" defVal="0" valType="float"
               stored="false" indexed="true"/>
    <field name="hotness" type="file"/>

/search/data/treasury/external_hotness.1306390802088:
1=2.3
2=1.7
3=1.1

Solr query:
sort={!func}hotness+desc
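A minimal Java sketch of producing such a file, assuming the key=value line format and the external_<field>.<timestamp> naming shown above. The class and method names are hypothetical, and writing to a temporary file before an atomic rename is an assumption about how to avoid Solr reading a half-written file, not something the deck states:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that writes an ExternalFileField data file. */
class ExternalFileWriter {

    /**
     * Writes one "key=value" line per entry, sorted by key, to
     * dataDir/external_<fieldName>.<timestampMillis> and returns that path.
     */
    static Path write(Path dataDir, String fieldName,
                      Map<Long, Float> values, long timestampMillis) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, Float> e : new TreeMap<>(values).entrySet()) {
            lines.add(e.getKey() + "=" + e.getValue());
        }
        Path target = dataDir.resolve("external_" + fieldName + "." + timestampMillis);
        try {
            // The temporary name does not start with "external_", so it is
            // not mistaken for a finished data file.
            Path tmp = Files.createTempFile(dataDir, "tmp_" + fieldName + "_", ".part");
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return target;
    }
}
```

Calling write(dir, "hotness", values, 1306390802088L) with the three values above yields a file named external_hotness.1306390802088 containing the same three lines.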
ExternalFileField caveats
More relevance: boost query
http://localhost:8983/solr/listings/select?
q={!boost b=$rel v=$qq}
&rel=category:furniture^10+OR+((-material:acrylic)^5)
&qq=desk
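In the request above, b=$rel and v=$qq are parameter dereferences: the boost parser takes its boost from the separate rel parameter and the user's query from qq, so the two can be changed independently. A minimal Java sketch of assembling and URL-encoding such a request, assuming Java 10+ for URLEncoder.encode(String, Charset); the class and method names are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for a {!boost} request with dereferenced parameters. */
class BoostQueryUrl {

    /** Form-encodes one name or value (spaces become '+'). */
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Joins the parameters, in insertion order, into baseUrl?k1=v1&k2=v2... */
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "?" + query;
    }

    /** The parameters from the slide, unencoded. */
    static Map<String, String> deskExample() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "{!boost b=$rel v=$qq}");
        params.put("rel", "category:furniture^10 OR ((-material:acrylic)^5)");
        params.put("qq", "desk");
        return params;
    }
}
```

Keeping the relevance expression in its own parameter means the application only ever substitutes the user's text into qq.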
Impression tracking
etsy.com/search?q=desk&explain=1
Side-by-Side testing
Cheap performance wins
Put off sharding till you must
cat ${indexDir}/* > /dev/null
Return IDs, minimize stored fields
RAM: $10-20 / GB
SSD: 0.1ms vs 10ms seek
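The cat ${indexDir}/* > /dev/null line above reads every index file once so the operating system's page cache is warm before queries arrive. A rough Java equivalent, as a sketch only; the class name is hypothetical and, like the shell glob, it reads only regular files directly inside the directory:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical page-cache warmer: reads and discards every file in a directory. */
class PageCacheWarmer {

    /** Reads all regular files in indexDir and returns the number of bytes read. */
    static long warm(Path indexDir) {
        byte[] buffer = new byte[1 << 20]; // 1 MiB read buffer, contents are discarded
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

This only helps when the index fits in the RAM left over for the page cache, which is where the RAM price above comes in.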
Custom?
solr-user
Tools for low-level hacking
Continuous deployment
One button.
So easy a dog could do it.
MTTR > MTBF
github.com/etsy/logster
Tracking GC
export GC_DEBUG="-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC \
-XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy \
-XX:AdaptiveSizePolicyOutputInterval=1 -XX:+PrintTenuringDistribution \
-XX:+PrintGCDetails -Xloggc:/var/log/search/gc.log"
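With -XX:+PrintGCApplicationStoppedTime, the log gains one line per stop-the-world pause, which is easy to turn into a metric. A minimal Java sketch of summarizing those lines, assuming the HotSpot wording "Total time for which application threads were stopped: N seconds" (the exact format can vary between JVM versions); the class name is hypothetical and this is not how logster itself is implemented:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical summary of stop-the-world pauses found in a GC log. */
class GcPauseSummary {
    // find() is used below, so a leading date stamp on the line is tolerated.
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9]+\\.[0-9]+) seconds");

    final int pauses;
    final double totalSeconds;
    final double maxSeconds;

    private GcPauseSummary(int pauses, double totalSeconds, double maxSeconds) {
        this.pauses = pauses;
        this.totalSeconds = totalSeconds;
        this.maxSeconds = maxSeconds;
    }

    /** Scans the lines and accumulates count, total and longest pause. */
    static GcPauseSummary of(Iterable<String> logLines) {
        int count = 0;
        double total = 0.0;
        double max = 0.0;
        for (String line : logLines) {
            Matcher m = STOPPED.matcher(line);
            if (m.find()) {
                double seconds = Double.parseDouble(m.group(1));
                count++;
                total += seconds;
                max = Math.max(max, seconds);
            }
        }
        return new GcPauseSummary(count, total, max);
    }
}
```

The longest pause is usually the number worth alerting on, since a single long stop is what a searcher actually sees as a slow query.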
Alerting
Testing
SaveAsFixture
Profiling
Java Primitive Library
         fastutil
         trove4j
Know the hooks
  SolrRequestHandler
  SearchComponent
    QParserPlugin
   SolrEventListener
       SolrCache
   ValueSourceParser
SolrIndexSearcher gotchas
                reference counting
             using it as a cache key:
   WeakHashMap<SolrIndexSearcher,MyValue> myCache...
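A minimal sketch of the WeakHashMap idea, written generically so it runs without Solr on the classpath: the type parameter S stands in for SolrIndexSearcher, and the class name is hypothetical. Entries disappear once the searcher key is no longer strongly referenced anywhere else, so data derived from an old searcher does not outlive it:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical per-searcher cache; S stands in for SolrIndexSearcher. */
class PerSearcherCache<S, V> {
    // Weak keys: an entry becomes collectable once its searcher is unreachable.
    // The synchronized wrapper makes individual operations thread-safe.
    private final Map<S, V> cache = Collections.synchronizedMap(new WeakHashMap<>());

    /** Returns the cached value for this searcher, computing it on first use. */
    V get(S searcher, Function<? super S, ? extends V> loader) {
        return cache.computeIfAbsent(searcher, loader);
    }

    /** Number of entries currently held; may shrink after garbage collection. */
    int size() {
        return cache.size();
    }
}
```

One caveat that applies to any WeakHashMap: the cached value must not hold a strong reference back to its key, otherwise the entry can never be collected.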
Example:
personalized collections
fq={!term f=id}123 OR {!term f=id}456
Need a map of PK to docId
Use custom SolrCache plus SolrEventListener
                 to fill it
github.com/giokincade/FastTermFilter
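The idea above can be sketched without Lucene types: keep a map from primary key to internal docId, then turn a list of keys into a bit set with one lookup per key instead of one term lookup per key. This is a simplified illustration with hypothetical names, not the FastTermFilter code. Internal docIds change between searchers, so the map has to be rebuilt whenever a new searcher opens, which is the job the slide gives to the SolrEventListener:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical PK-to-docId lookup that turns a list of PKs into a doc bit set. */
class PkDocIdFilter {
    private final Map<Long, Integer> pkToDocId = new HashMap<>();
    private final int maxDoc;

    /** pkByDocId[i] is the primary key of the document with internal id i. */
    PkDocIdFilter(long[] pkByDocId) {
        this.maxDoc = pkByDocId.length;
        for (int docId = 0; docId < pkByDocId.length; docId++) {
            pkToDocId.put(pkByDocId[docId], docId);
        }
    }

    /** Sets one bit per requested PK that exists; unknown PKs are ignored. */
    BitSet filterFor(long... pks) {
        BitSet bits = new BitSet(maxDoc);
        for (long pk : pks) {
            Integer docId = pkToDocId.get(pk);
            if (docId != null) {
                bits.set(docId);
            }
        }
        return bits;
    }
}
```

A primitive map from fastutil or trove4j would avoid boxing every key and docId, at the cost of an extra dependency; java.util.HashMap is used here only to keep the sketch self-contained.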
i18n currency sorting and filtering
currency.xml:

<currencyConfig version="1.0">
  <currencies>
    <currency name="United States Dollar" symbol="$" code="USD"/>
    <currency name="Australian Dollar" symbol="$" code="AUD"/>
    <currency name="Canadian Dollar" symbol="$" code="CAD"/>
    <currency name="Czech Koruna" symbol="Kč" code="CZK"/>
    ...
  </currencies>
  <rates>
    <rate from="USD" to="AUD" rate="1.168750"/>
    <rate from="USD" to="CAD" rate="1.085000"/>
    <rate from="USD" to="CZK" rate="20.107500"/>
    <rate from="USD" to="DKK" rate="5.323750"/>
    ...
  </rates>
</currencyConfig>
price:[$10.00 to $50.00]

price:[10.00USD to 50.00USD]

       price:20.00EUR
MoneyFieldType.java:

  @Override
  public Query getRangeQuery(QParser parser, SchemaField field, String part1, String part2,
                             final boolean minInclusive, final boolean maxInclusive) {
    final MoneyValue p1 = MoneyValue.parse(part1, defaultCurrency);
    final MoneyValue p2 = MoneyValue.parse(part2, defaultCurrency);

    // Both bounds must be in the same currency; mixed-currency ranges are rejected.
    if (!p1.getCurrencyCode().equals(p2.getCurrencyCode())) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,
              new ParseException("Cannot parse range query " + part1 + " to " + part2 +
                      ": range queries only supported when upper and lower bound have same currency."));
    }

    // The value source yields each document's price in the query's currency.
    final String currencyCode = p1.getCurrencyCode();
    final MoneyValueSource vs = new MoneyValueSource(field, currencyCode, parser);

    return new SolrConstantScoreQuery(new ValueSourceRangeFilter(vs,
            p1.getAmount() + "", p2.getAmount() + "", minInclusive, maxInclusive));
  }
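The range query relies on two steps the slides do not show: parsing strings such as $10.00 or 20.00EUR into an amount plus currency code, and converting amounts through the USD-based rate table in currency.xml. A minimal sketch of both, under stated assumptions: the class below is hypothetical and is not the deck's MoneyValue or MoneyValueSource, a bare "$" or a missing code falls back to the default currency, and converted amounts are rounded to two decimals:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical money parsing and conversion; not the deck's MoneyValue class. */
class MoneySketch {

    static final class Money {
        final BigDecimal amount;
        final String currencyCode;

        Money(BigDecimal amount, String currencyCode) {
            this.amount = amount;
            this.currencyCode = currencyCode;
        }
    }

    // Optional "$", a decimal amount, then an optional three-letter currency code.
    private static final Pattern MONEY =
            Pattern.compile("\\$?([0-9]+(?:\\.[0-9]+)?)([A-Z]{3})?");

    /** Parses "10.00USD", "$10.00" or "10.00"; no code means defaultCurrency. */
    static Money parse(String text, String defaultCurrency) {
        Matcher m = MONEY.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Cannot parse money value: " + text);
        }
        String code = m.group(2) != null ? m.group(2) : defaultCurrency;
        return new Money(new BigDecimal(m.group(1)), code);
    }

    /** Converts via USD using rates expressed as "1 USD = rate units of code". */
    static Money convert(Money money, String target, Map<String, BigDecimal> ratesFromUsd) {
        if (money.currencyCode.equals(target)) {
            return money;
        }
        BigDecimal usd = money.amount.divide(
                rate(money.currencyCode, ratesFromUsd), 10, RoundingMode.HALF_UP);
        BigDecimal converted = usd.multiply(rate(target, ratesFromUsd))
                .setScale(2, RoundingMode.HALF_UP);
        return new Money(converted, target);
    }

    private static BigDecimal rate(String code, Map<String, BigDecimal> ratesFromUsd) {
        if (code.equals("USD")) {
            return BigDecimal.ONE;
        }
        BigDecimal r = ratesFromUsd.get(code);
        if (r == null) {
            throw new IllegalArgumentException("No rate for currency: " + code);
        }
        return r;
    }
}
```

With the USD to CAD rate of 1.085000 from currency.xml, converting 10.00USD gives 10.85 CAD.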
Replication gotcha
SOLR-2202
Related Searches
Autosuggest!
bjewlery dewelry ejewelry ejwelry ewelery ewerly ewlery fewelry
fewlery fjewelery fjewelry gewerly gewlery hewelery hewelry hewerly
hewlery hjewelry iewelry ijewelry jawelery jawlery jeawlery jeelery
jeelry jeewelery jeewelry jeewlery jeewlry jefwelry jejelry jelelry
jelery jellery jelwelery jelwelry jelwlery jemelry jemerly jemwelry
jeqwelry jerelery jerelry jerely jererly jerlery jerwelery jerwelry
jerwely jerwerly jeselery jeselry jevelry jeverly jewalery jewdelry
jewedlry jeweelrry jeweelry jeweely jeweer jeweery jeweilry jeweiry
jewejery jewejlry jewejrly jewejry jewekey jewekry jewelary jeweldy
jewele jewelee jewelelry jewelera jewelerey jewelerly jewelert
jewelerty jeweleru jeweleruy jeweleryl jewelerys jeweleryy jewelet
jewelety jeweleya jewelfry jewelfy jeweliy jewellryp jewelltry
jewelly jewelory jewelra jewelray jewelre jewelree jewelreyy
jewelrfy jewelrh jewelri jewelrky jewelrly jewelrr jewelrs jewelrsy
jewelrt jewelrty jewelru jewelruy jewelrye jewelryh jewelryl
jewelrym jewelryr jewelrys jewelryt jewelryu jewelryuk
jewelryy jewelrz jewelsry jewelsy jeweltry jewelty jewelw jewelwery
jewelwey jewelwy jewelya jewelyj jewelyr jewelyry jewelyu jewelyy
jewelzry jeweory jewerey jeweriy jewerky jewerlary jewerley jewerli
jewerlly jewerls jewerlt jewerlu jewerlyh jewerlyr jewerlys jewerlyu
jewerry jeweryl jewetry jewewlry jewewly jewewrly jewewry jeweylry
jewiery jewilary jewkery jewlary jewledy jewleery jewlelery jewlely
The TermDictionary is not a whitelist