Ferret
A Ruby Search Engine
  Brian Sam-Bodden
Agenda

• What is Ferret?
• Concepts
• Fields
• Indexing
• Installing Ferret
• The Recipe
• Documents
• Ferret::Index::Index
• FQL
• Ferret in your App
• Ferret in Rails
• Resources
What is Ferret?

• Information Retrieval (IR) Library
• Full-featured Text Search Engine
• Inspired by the Lucene Search Engine

• Ported to Ruby by David Balmain
What is Ferret?

• Initially a 100% pure Ruby port
• Since 0.9 many core functions are
  implemented in C

• Fast! Now Faster than Lucene ;-)
Concepts

• Index : Sequence of documents
• Document : Sequence of fields
• Field : Named sequence of terms
• Term : A text string, keyed by field name
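
The containment hierarchy above can be sketched with plain Ruby structs. This is only an illustrative model, not Ferret's internal representation: the names SketchTerm, SketchField, SketchDocument, SketchIndex, and make_field are hypothetical, and splitting on whitespace is an assumed stand-in for real analysis.

```ruby
# Illustrative model only; these structs are hypothetical, not Ferret classes.
SketchTerm     = Struct.new(:field, :text)  # a text string, keyed by field name
SketchField    = Struct.new(:name, :terms)  # a named sequence of terms
SketchDocument = Struct.new(:fields)        # a sequence of fields
SketchIndex    = Struct.new(:documents)     # a sequence of documents

# Build a field by splitting text on whitespace (an assumed, naive analysis).
def make_field(name, text)
  terms = text.downcase.split.map { |word| SketchTerm.new(name, word) }
  SketchField.new(name, terms)
end

doc = SketchDocument.new([
  make_field(:title, "Ferret Search"),
  make_field(:body, "A Ruby search engine")
])
index = SketchIndex.new([doc])

# Walk the hierarchy: index -> documents -> fields -> terms.
index.documents.each_with_index do |document, doc_id|
  document.fields.each do |field|
    puts "doc #{doc_id} #{field.name}: #{field.terms.map(&:text).join(', ')}"
  end
end
```

Note that every term carries its field name, so the same word in title and in body counts as two distinct terms.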
Fields of a Document in an Index

• Fields are individually searchable units
  that can be:
  • Stored: The original Terms of the field are stored
  • Indexed: Inverted to rapidly find all Documents
    containing any of the Terms
  • Tokenized: Individual Terms are extracted and
    indexed
  • Vectored: Frequency and location of Terms are
    stored
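
The four field properties can be illustrated with a small pure-Ruby toy. This is a sketch under stated assumptions, not how Ferret implements fields: the class ToyFieldIndex and its store, tokenize, and vector options are hypothetical names chosen to mirror the bullets above.

```ruby
# Toy illustration of stored / indexed / tokenized / vectored.
# Hypothetical class; not Ferret's implementation.
class ToyFieldIndex
  attr_reader :stored, :postings, :vectors

  def initialize
    @stored   = {}                                   # doc_id => original text
    @postings = Hash.new { |h, term| h[term] = [] }  # term => [doc_id, ...]
    @vectors  = {}                                   # doc_id => { term => [positions] }
  end

  def add(doc_id, text, store: true, tokenize: true, vector: true)
    @stored[doc_id] = text if store                  # Stored: keep the original
    # Tokenized: split into individual terms, otherwise index the whole value.
    terms = tokenize ? text.downcase.scan(/\w+/) : [text]
    positions = Hash.new { |h, term| h[term] = [] }
    terms.each_with_index { |term, pos| positions[term] << pos }
    # Indexed: invert, so each term points back at the documents containing it.
    positions.each_key { |term| @postings[term] << doc_id }
    # Vectored: remember where (and therefore how often) each term occurs.
    @vectors[doc_id] = positions if vector
  end

  def docs_with(term)
    @postings.fetch(term, [])
  end
end

idx = ToyFieldIndex.new
idx.add(0, "Ferret is fast and Ferret is fun")
idx.add(1, "Lucene is fast")
idx.add(2, "Brian Sam-Bodden", tokenize: false)

p idx.docs_with("fast")
p idx.vectors[0]["ferret"]
p idx.docs_with("Brian Sam-Bodden")
```

An untokenized field is matched only by its whole value, which suits identifiers and names that should not be split.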
It’s all about Indexing

• Indexing is the processing of a source
  document into plain text tokens that Ferret
  can manipulate
• For non-plaintext sources such as PDF,
  Word, or Excel, you need to:
  • Extract the plain text
  • Analyze it into tokens
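
The extract-then-analyze pipeline can be sketched in a few lines of Ruby. This is a minimal sketch, assuming HTML as the non-plaintext source because it needs no extra libraries; PDF, Word, or Excel would need a format-specific extractor. The method names extract and analyze and the STOP_WORDS list are hypothetical, and Ferret's own analyzers are not used here.

```ruby
# Illustrative stop-word list (an assumption, not Ferret's list).
STOP_WORDS = %w[a an and is the of to].freeze

# Extract: reduce a marked-up source to plain text (crude tag stripping).
def extract(html)
  html.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
end

# Analyze: turn plain text into normalized tokens, dropping stop words.
def analyze(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |token| STOP_WORDS.include?(token) }
end

source = "<h1>Ferret</h1><p>Ferret is a Ruby search engine.</p>"
plain  = extract(source)
tokens = analyze(plain)

puts plain
p tokens
```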
Installing Ferret

gem install ferret

• Pick the latest version
  for your platform
The Recipe

1. Create some Documents

2. Create an Index

3. Adding Documents to the Index

4. Perform some Queries
Example Documents
  Create some Documents

• "Any String is a Document"
• ["This", "is also", "a document"]
Ferret::Index::Index
  Create an Index

• Indexes are encapsulated by the class
 ➡ Ferret::Index::Index
• Use the alias Ferret::I for convenience
• An Index can be persistent
 ➡ index = Ferret::I.new(:path => '/somepath')
• Or completely in memory
 ➡ index = Ferret::I.new()
Ferret::Index::Index
  Adding Documents to the Index

• Index provides the add_document
  method
• It also provides the << alias
• Adding documents is then as easy as:
 ➡ index << "This is a document"
 ➡ index << {:first => "Bob", :last => "Smith"}
Ferret::Index::Index
  Perform some Queries

• Index provides the search and
  search_each methods
• The search method takes a query and an
  optional set of parameters:
 ➡ search(query, options = {})
• The search_each method provides an
  iterator block
 ➡ search_each(query, options = {}) {|doc, score| ... }
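
To make the shape of these calls concrete without requiring the gem, here is a toy stand-in that mirrors the method names from the slides (add_document, <<, search, search_each). It is a sketch, not Ferret: the class ToyIndex, the :limit option, and the occurrence-count score are assumptions made for illustration, and real Ferret parses the query string as FQL rather than treating it as a single term.

```ruby
# Toy stand-in mirroring the method names on the slides; it is not Ferret.
class ToyIndex
  def initialize
    @docs = []
  end

  # Accept a String or a Hash of field => text, as in the slides.
  def add_document(doc)
    @docs << (doc.is_a?(Hash) ? doc : { content: doc })
    self
  end
  alias_method :<<, :add_document

  # Return [doc_id, score] pairs, best first.
  # Assumed scoring: number of occurrences of the single query term.
  def search(query, options = {})
    limit = options.fetch(:limit, 10)
    term  = query.downcase
    hits = @docs.each_with_index.map do |doc, id|
      words = doc.values.join(" ").downcase.scan(/\w+/)
      [id, words.count(term)]
    end
    hits.select { |_, score| score > 0 }
        .sort_by { |id, score| [-score, id] }
        .first(limit)
  end

  # Iterator form: yields the document id and its score.
  def search_each(query, options = {})
    search(query, options).each { |id, score| yield id, score }
  end

  def [](id)
    @docs[id]
  end
end

index = ToyIndex.new
index << "This is a document"
index << { first: "Bob", last: "Smith" }
index << "A document about a document"

index.search_each("document") do |id, score|
  puts "doc #{id} scored #{score}: #{index[id].inspect}"
end
```

The block receives a document id rather than the document itself, so the stored fields are looked up from the index afterwards.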
Playing with Ferret in irb
Ferret Query Language

• Ferret's own query language, FQL, is a
  powerful way to specify search queries

• FQL supports many query types,
  including:

     • Term         • Range
     • Phrase       • Wildcard
     • Field        • Fuzzy
     • Boolean
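
For orientation, here is one example query string per type listed above. The syntax is recalled from Ferret's query parser documentation rather than taken from the slides, so treat each string as an assumption to verify against your Ferret version. The field names title and date and the constant FQL_EXAMPLES are hypothetical.

```ruby
# One example FQL string per query type. Field names are hypothetical and
# the syntax should be checked against your Ferret version.
FQL_EXAMPLES = {
  term:     'ferret',                    # a single word
  phrase:   '"quick brown fox"',         # words in sequence
  field:    'title:ferret',              # restrict the match to one field
  boolean:  'ruby AND NOT java',         # combine clauses
  range:    'date:[20060101 20061231]',  # inclusive range of terms
  wildcard: 'ferr*',                     # pattern match
  fuzzy:    'feret~'                     # approximate spelling
}.freeze

FQL_EXAMPLES.each do |type, query|
  puts format("%-9s %s", type, query)
end
```

Any of these strings would be passed as the query argument of search or search_each.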
Index.explain

• The explain method of Index describes
  how a document scores against a query
 • Very useful for debugging
 • And for learning how Ferret works
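
To give a feel for what a score explanation breaks down, here is a deliberately simplified sketch. It is not Ferret's explain output: toy_explain is a hypothetical helper, and the plain tf × idf formula is an assumed simplification of the Lucene-style scoring that Ferret inherits, which has further factors.

```ruby
# Simplified tf * idf breakdown for one term in one document.
# Hypothetical helper for teaching purposes; not Ferret code.
def toy_explain(docs, term, doc_id)
  tokenized = docs.map { |text| text.downcase.scan(/\w+/) }
  tf  = tokenized[doc_id].count(term)                      # occurrences in this doc
  df  = tokenized.count { |words| words.include?(term) }   # docs containing the term
  idf = 1.0 + Math.log(docs.size.to_f / (df + 1))          # rarer terms weigh more
  { tf: tf, df: df, idf: idf, score: tf * idf }
end

docs = ["Ferret is fast", "Ferret is a Ruby port", "Lucene is written in Java"]
explanation = toy_explain(docs, "ruby", 1)
explanation.each { |factor, value| puts "#{factor}: #{value}" }
```

Reading the factors separately shows why a document ranked where it did: a term that appears in every document contributes little, however often it occurs.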
Ferret in your App

[Diagram: the Application gathers data (Database, Web, File System, Manual Input) and indexes Documents into a Ferret Index; it then gets the User's query, searches the Index, and presents the search results.]
Ferret in Rails

• Acts As Ferret is an ActiveRecord
  extension

• Available as a plugin
• Provides a simplified interface to
  Ferret
• Maintained by Jens Krämer
Ferret in Rails

• Adding an index to an ActiveRecord
  model is as simple as:
Ferret in Rails

• A simple model has two searchable
  fields, title and body:
Ferret in Rails

• After a quick rake db:migrate we now
  have some data to play with
• Fire up the Rails Console and let’s see
  what acts_as_ferret can do for our
  models
Want more?

• Ferret is improving constantly
• Acts As Ferret seems to catch up
  quickly

• Real-life usage seems to require some
  good engineering on your part

  • Background indexing
  • Hot swap of indexes?
Want more?

• We only covered the simplest
  constructs in Ferret

• Ferret’s API provides enough
  flexibility for the most demanding
  searching needs
Online Resources

• http://ferret.davebalmain.com
• http://lucene.apache.org
• http://lucenebook.com
• http://projects.jkraemer.net/acts_as_ferret
In-Print Resources
Thanks!
