Lucene And Solr Introduction, by Pascal Dimassimo
About me: Java developer with 10+ years of experience
Working for OpenText/Nstein on the Semantic Navigation application
http://semanticnavigation.opentext.com/
History: Lucene launches in 2000
Solr launches in 2006
Lucid Imagination in 2009: hires the core developers of Lucene and Solr and offers commercial support
Lucene Revolution in 2010
Buzz: according to IDC, “53% of companies using open source use Lucene”
“Largely responsible for significant decline in commercial OEM revenue” (source: http://lucenerevolution.com/sites/default/files/slides/Lucene%20Rev%20Preso%20IDC_MarketTrends_Reynolds.pdf)
Lucene? “Lucene is a powerful Java search library that lets you easily add search to any application” (Lucene in Action, 2nd Edition)
NOT an application
Text indexing and searching
Open Source
Mature
Easy to learn API
Typical Search App [figure taken from Lucene in Action, 2nd Edition; label: Lucene]
Search? Naive approach: linear search (à la grep)
O(n) -> slow...
You want to find a word in a book: how do you do it?
Inverted Index
Inverted Index [two figure slides; original slides from Michael Busch, available at http://goo.gl/0MQvy]
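To make the book-index analogy concrete, here is a minimal plain-Java sketch of an inverted index. It is not Lucene code: the class name `InvertedIndexSketch`, the naive lowercase-and-split tokenization, and the second and third sample sentences are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexing: split the text into lowercase terms and record the document id under each term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Searching: a single map lookup instead of a scan over every document.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The old night keeper keeps the keep in the town");
        index.add(2, "In the big old house in the big old gown");
        index.add(3, "The house in the town had the big old keep");
        System.out.println(index.search("keep")); // [1, 3]
        System.out.println(index.search("big"));  // [2, 3]
    }
}
```

The cost of the scan is paid once, at indexing time; each later search touches only the entry for the queried term.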
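Lucene Document
    import java.io.File;
    import org.apache.lucene.analysis.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.FSDirectory;

    FSDirectory dir = FSDirectory.open(new File("./index"));
    SimpleAnalyzer analyzer = new SimpleAnalyzer();
    IndexWriter.MaxFieldLength len = IndexWriter.MaxFieldLength.UNLIMITED;
    // true = create a new index, overwriting any existing one
    IndexWriter writer = new IndexWriter(dir, analyzer, true, len);

    String content = "The old night keeper keeps the keep in the town";
    Document doc = new Document();
    // Store the original text and index its analyzed terms
    doc.add(new Field("content", content, Field.Store.YES, Field.Index.ANALYZED));
    writer.addDocument(doc);
    writer.commit();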
Lucene Document: a document is what is returned as a search result
Organized in fields. A field must be specified at query time!
Schema-less
Plain text
Fields. Indexed: the content is put in the inverted index.
Analyzed: the content is split into normalized terms, which are added to the inverted index.
Stored: the original content is kept on disk.
Multivalued: the same field is repeated several times in the same document with different values.
Lucene Document
    String content = "The old night keeper keeps the keep in the town";
    String author = "Peter Smith";
    String category1 = "Fiction";
    String category2 = "Canadian";
    String isbn = "978-1-933988-17-7";
    String id = "ABY123";

    Document doc = new Document();
    // Analyzed fields: split into terms and searchable word by word
    doc.add(new Field("content", content, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("author", author, Field.Store.YES, Field.Index.ANALYZED));
    // Multivalued field: "category" is added twice with different values
    doc.add(new Field("category", category1, Field.Store.YES, Field.Index.ANALYZED));
    doc.add(new Field("category", category2, Field.Store.YES, Field.Index.ANALYZED));
    // Indexed as a single term, without analysis
    doc.add(new Field("isbn", isbn, Field.Store.YES, Field.Index.NOT_ANALYZED));
    // Stored only: returned with the document but not searchable
    doc.add(new Field("id", id, Field.Store.YES, Field.Index.NO));
    // writer is the IndexWriter created on the earlier slide
    writer.addDocument(doc);
    writer.commit();
Lucene Demo: indexing unit tests written by David
Relevancy: how do you tell which document is more important?
Vector space model: N-dimensional vectors for documents and queries
Score represents how close the vectors are
tf-idf (term frequency-inverse document frequency)
Documents with many of the search terms are scored higher
Shorter documents are scored higher
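The two scoring rules of thumb above can be illustrated with a small plain-Java sketch. It is a simplified tf-idf variant with length normalization, not Lucene's actual scoring code: the class `TfIdfSketch`, the exact formulas (square-root term frequency, logarithmic idf, one over the square root of the document length) and the sample documents are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class TfIdfSketch {
    private final List<List<String>> docs = new ArrayList<>();

    // Adds a document and returns its id (its position in the list).
    int add(String text) {
        docs.add(Arrays.asList(text.toLowerCase().split("\\s+")));
        return docs.size() - 1;
    }

    // Inverse document frequency: terms found in fewer documents weigh more.
    double idf(String term) {
        long docFreq = docs.stream().filter(d -> d.contains(term)).count();
        return 1.0 + Math.log((double) docs.size() / (docFreq + 1));
    }

    // Sum over the query terms of tf * idf, scaled down for longer documents.
    double score(String query, int docId) {
        List<String> doc = docs.get(docId);
        double lengthNorm = 1.0 / Math.sqrt(doc.size());
        double score = 0.0;
        for (String term : query.toLowerCase().split("\\s+")) {
            double tf = Math.sqrt(Collections.frequency(doc, term));
            score += tf * idf(term) * lengthNorm;
        }
        return score;
    }

    public static void main(String[] args) {
        TfIdfSketch index = new TfIdfSketch();
        int shortDoc = index.add("lucene search library");
        int longDoc = index.add("lucene is a java search library for text search applications");
        int bothTerms = index.add("lucene and solr");
        // The shorter document wins when both contain the term once.
        System.out.println(index.score("lucene", shortDoc) > index.score("lucene", longDoc)); // true
        // The document matching both query terms wins over one matching a single term.
        System.out.println(index.score("lucene solr", bothTerms) > index.score("lucene solr", shortDoc)); // true
    }
}
```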
Analyzer [figure taken from Lucene in Action, 2nd Edition]
Analyzer: converts text into terms
Used when indexing and querying
Tokenizer + Filters
Custom analyzers
Analyzer "The quick brown fox jumped over the lazy dog"
    WhitespaceAnalyzer  [The] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
    SimpleAnalyzer      [the] [quick] [brown] [fox] [jumped] [over] [the] [lazy] [dog]
    StopAnalyzer        [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
    StandardAnalyzer    [quick] [brown] [fox] [jumped] [over] [lazy] [dog]
    Example from Lucene in Action, 2nd Edition
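The "Tokenizer + Filters" idea can be sketched in plain Java as a tokenizer followed by a chain of token filters. This is only a model of the concept, not Lucene's Analyzer API: `AnalyzerSketch`, its method names and the three-word stop list are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

final class AnalyzerSketch {
    // Tiny illustrative stop list; real stop lists are longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "an", "the"));

    // Tokenizer: splits the text at whitespace.
    static List<String> whitespaceTokenizer(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Filter: lowercases every token.
    static List<String> lowerCaseFilter(List<String> tokens) {
        return tokens.stream().map(t -> t.toLowerCase(Locale.ROOT)).collect(Collectors.toList());
    }

    // Filter: drops tokens found in the stop list.
    static List<String> stopFilter(List<String> tokens) {
        return tokens.stream().filter(t -> !STOP_WORDS.contains(t)).collect(Collectors.toList());
    }

    // Analyzer: one tokenizer, then the filters applied in order.
    @SafeVarargs
    static List<String> analyze(String text, UnaryOperator<List<String>>... filters) {
        List<String> tokens = whitespaceTokenizer(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumped over the lazy dog";
        // No filters: [The, quick, brown, fox, jumped, over, the, lazy, dog]
        System.out.println(analyze(text));
        // Lowercase, then stop words: [quick, brown, fox, jumped, over, lazy, dog]
        System.out.println(analyze(text, AnalyzerSketch::lowerCaseFilter, AnalyzerSketch::stopFilter));
    }
}
```

Filter order matters here: lowercasing first is what lets the stop filter catch the capitalized "The".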

Lucene And Solr Intro

  • 47. Analyzer "XY&Z Corporation - xyz@example.com"
        WhitespaceAnalyzer  [XY&Z] [Corporation] [-] [xyz@example.com]
        SimpleAnalyzer      [xy] [z] [corporation] [xyz] [example] [com]
        StopAnalyzer        [xy] [z] [corporation] [xyz] [example] [com]
        StandardAnalyzer    [xy&z] [corporation] [xyz@example.com]
        Example from Lucene in Action, 2nd Edition
  • 48. Custom Analyzers
        WhitespaceTokenizer  Tokenizes at whitespace
        KeywordTokenizer     Tokenizes the input as a single token
        StandardTokenizer    Tokenizes at whitespace but keeps high-level entities (email addresses, etc.) as single tokens
        LowerCaseFilter      Lowercases token text
        StopFilter           Removes words that exist in a provided set of words
        PorterStemFilter     Stems each token using the Porter stemming algorithm. For example, country and countries both stem to countri.
        Some descriptions from Lucene in Action, 2nd Edition
  • 49. Query: asking Lucene “what documents contain this word?”
  • 50. Lucene applies an Analyzer to each word queried
  • 51. Queries can be built programmatically
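  • 53. Query code
        SimpleAnalyzer analyzer = new SimpleAnalyzer();
        // "content" is the default field for terms that do not name a field
        QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
        Query query = parser.parse("big");
        // searcher: an IndexSearcher opened on the index (not shown on the slide)
        TopDocs docs = searcher.search(query, 10);  // top 10 hits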
  • 54. Query Syntax: Basic. title:montreal (field: title, text: montreal)
  • 55. Query Syntax: Range. name:[a TO k] (field: name, range: [a TO k])
  • 56. Query Syntax: Boolean. title:(java AND programming) (field: title, operator: AND)
  • 57. Query Syntax: Boolean. title:java OR name:pascal (fields: title and name, operator: OR)
  • 58. Query Syntax: Phrase. title:"Lucene in Action" (field: title, phrase: "Lucene in Action")
  • 59. Query Syntax: Wildcard. title:program* (field: title, term prefix: program)
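How boolean, range and prefix queries can be answered from a sorted term dictionary is sketched below in plain Java. This is a conceptual model rather than Lucene's implementation: the class `QueryEvalSketch` is hypothetical, it handles a single field only, and it compares terms with `String.compareTo` ordering via a `TreeMap`.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class QueryEvalSketch {
    // Sorted term dictionary: term -> ids of the documents containing it.
    private final TreeMap<String, Set<Integer>> terms = new TreeMap<>();

    void add(int docId, String... docTerms) {
        for (String t : docTerms) {
            terms.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    // Basic term query, e.g. java
    Set<Integer> term(String t) {
        return new TreeSet<>(terms.getOrDefault(t, Collections.emptySet()));
    }

    // Range query, e.g. [a TO k]: every term between lo and hi, both inclusive.
    Set<Integer> range(String lo, String hi) {
        return union(terms.subMap(lo, true, hi, true).values());
    }

    // Prefix query, e.g. program*: every term that starts with the prefix.
    Set<Integer> prefix(String p) {
        return union(terms.subMap(p, true, p + Character.MAX_VALUE, false).values());
    }

    // AND keeps the documents present in both results.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR keeps the documents present in either result.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    private static Set<Integer> union(Collection<Set<Integer>> postingLists) {
        Set<Integer> result = new TreeSet<>();
        for (Set<Integer> list : postingLists) {
            result.addAll(list);
        }
        return result;
    }

    public static void main(String[] args) {
        QueryEvalSketch index = new QueryEvalSketch();
        index.add(1, "java", "programming");
        index.add(2, "java", "lucene");
        index.add(3, "programs", "python");
        System.out.println(and(index.term("java"), index.term("programming"))); // [1]
        System.out.println(or(index.term("java"), index.term("python")));       // [1, 2, 3]
        System.out.println(index.prefix("program"));                            // [1, 3]
        System.out.println(index.range("a", "k"));                              // [1, 2]
    }
}
```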
  • 60. Lucene Demo: searching unit tests written by David
  • 61. Lucene summary: inverted index for fast document retrieval
  • 64. Solr: created by Yonik Seeley in 2004 and released as open source in 2006
  • 65. HTTP application built around Lucene
  • 66. Makes it easy to develop search solutions
  • 67. Advanced features developed on top of Lucene
  • 68. As of 2010, Lucene and Solr are merged
  • 69. Solr Schema: Solr lets you administer one or more Lucene indexes
  • 70. Each index has its own schema
  • 71. Lists all fields allowed for an index
  • 72. Defines the analyzers for each field
  • 73. Solr Schema
        <field name="id" type="string" indexed="true" stored="true" required="true" />
        <field name="title" type="text" indexed="true" stored="true" />
        <field name="presenter" type="text_ws" indexed="true" stored="true" />
        <field name="date" type="date" indexed="true" stored="true" />
        <field name="abstract" type="text" indexed="true" stored="true" />
  • 74. Solr Schema
        <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.ISOLatin1AccentFilterFactory" />
            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
          </analyzer>
        </fieldType>
  • 75. Solr Schema
        <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
          </analyzer>
        </fieldType>
  • 77. XML by default, but also CSV
  • 79. Advanced features: binary document extraction, DB plugin
  • 80. Solr Indexation
        <add>
          <doc>
            <field name="id">002</field>
            <field name="title">Lucene And Solr Introduction</field>
            <field name="presenter">Pascal Dimassimo</field>
            <field name="date">2010-11-18T00:00:00Z</field>
            <field name="abstract">...</field>
          </doc>
          <doc>...</doc>
        </add>

        curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary @add.xml
  • 83. Response in XML by default, but other formats are supported (json, php, ruby)
  • 84. Solr Query
        curl http://localhost:8983/solr/select?q=title:Lucene

        <response>
          <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">269</int>
            <lst name="params">
              <str name="q">title:Lucene</str>
            </lst>
          </lst>
          <result name="response" numFound="1" start="0">
            <doc>
              <str name="id">002</str>
              <str name="title">Lucene And Solr Introduction</str>
              <str name="presenter">Pascal Dimassimo</str>
              <date name="date">2010-11-18T00:00:00Z</date>
              <str name="abstract">...</str>
            </doc>
          </result>
        </response>
  • 85. Solr Query Parameters
        q      Lucene query
        sort   Field to sort on. Defaults to score
        start  Offset of the results page to display. Default: 0
        rows   Number of results to display per page. Default: 10
        fq     Filter query. Defaults to all documents
        fl     List of fields to display per document. Defaults to all fields
        wt     Format of the result. Defaults to xml
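As an illustration of how these parameters combine into one request, here is a small plain-Java sketch that assembles a select URL. The helper class `SolrSelectUrlSketch` is hypothetical, and the parameter values (the `presenter:Pascal` filter, the `date desc` sort, the `json` format) are example assumptions based on the schema and formats shown in the deck; only the URL is built, no request is sent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class SolrSelectUrlSketch {
    // Appends "/select" and the URL-encoded parameters to the Solr base URL.
    static String selectUrl(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl).append("/select");
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator).append(encode(p.getKey())).append('=').append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "title:Lucene");       // the Lucene query
        params.put("fq", "presenter:Pascal");  // filter query
        params.put("sort", "date desc");       // instead of the default sort by score
        params.put("start", "0");              // first page
        params.put("rows", "10");              // ten results per page
        params.put("fl", "id,title");          // only return these fields
        params.put("wt", "json");              // response format
        System.out.println(selectUrl("http://localhost:8983/solr", params));
    }
}
```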
  • 86. Solr Facets: for a query's results, the list of all distinct indexed values of a field, with their frequencies
  • 87. Useful for drilling down into a result set
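What a facet count amounts to can be sketched in a few lines of plain Java. This shows only the idea, not how Solr computes facets: `FacetSketch` is a hypothetical class, and documents are modelled as maps from a field name to its (possibly multiple) values.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FacetSketch {
    // For the documents matching a query, counts each distinct value of one field.
    static Map<String, Integer> facet(List<Map<String, List<String>>> results, String field) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, List<String>> doc : results) {
            for (String value : doc.getOrDefault(field, Collections.emptyList())) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc1 =
                Collections.singletonMap("category", Arrays.asList("Fiction", "Canadian"));
        Map<String, List<String>> doc2 =
                Collections.singletonMap("category", Arrays.asList("Fiction"));
        Map<String, List<String>> doc3 =
                Collections.singletonMap("author", Arrays.asList("Peter Smith"));
        List<Map<String, List<String>>> results = Arrays.asList(doc1, doc2, doc3);
        System.out.println(facet(results, "category")); // {Canadian=1, Fiction=2}
    }
}
```

Drilling down then means rerunning the query with one of the listed values added as a filter.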
  • 88. SolrJ: library to connect to and interact with Solr
        String url = "http://localhost:8983/solr";
        CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "id1", 1.0f);    // third argument is the field boost
        doc.addField("name", "doc1", 1.0f);
        doc.addField("price", 10);
        server.add(doc);
        server.commit();                    // make the document searchable
  • 89. Solr Demo Using Evernote Data
  • 92. Solr Features Text Highlighting

Editor's Notes

  • #6: Do one thing well. Apache License. 10 years. Version 3.0. It is fast!
  • #7: Analyze documents: split out each word. Get documents in. Lucene returns a list of documents as the search result.
  • #8: Book example: you search from the beginning every time you look for a word. Much simpler to use an index. Inverted index: for a word, list the documents that contain it.
  • #10: Analysis: transform the content into terms. A term could be more than one word: “New York”. Position is also stored. Binary search: O(log n) -> logarithmic. Boolean search. Wildcard search.
  • #11: Lucene generates an id for each document. Stored = original content stored “as is” on disk; it can be returned to the user when the document is returned. When Lucene returns a document, it returns its id. You can retrieve the stored content with the id.
  • #12: Document: email, article, user. Email fields: sender, recipient, title, content, attachment. Article fields: author, title, category, content, publication date. Database analogy: document = row, field = column. Documents with different fields can be stored.
  • #14: Lucene generates an id for each document. Stored = original content stored “as is” on disk; it can be returned to the user when the document is returned. When Lucene returns a document, it returns its id. You can retrieve the stored content with the id.
  • #16: Lucene can return results sorted by a field.
  • #19: Terms are almost synonymous with words.
  • #24: Basic Query instance: TermQuery. Use PerFieldAnalyzerWrapper to specify the analyzer for each field.
  • #26: Terms are stored in alphabetical order, using String.compareTo. Returns all docs for each term in the range.
  • #27: Supports AND, OR, NOT. Supports +, -.
  • #28: Supports AND, OR, NOT. Supports +, -.
  • #43: CNET used it to help users find products more easily.