Search engine.
Elasticsearch
Andriy S.
What is a Search Engine?
A search engine is a set of programs designed to search for information. It is usually
part of a larger search system.
The main criteria for the quality of a search engine are relevance (how well the
results found match the request), completeness of the index, and support for the
morphology of the language.
Popular search services: Sphinx, Solr, Elasticsearch, etc.
Elasticsearch
Elasticsearch is a search engine with a JSON REST API. It is built on Lucene and written in Java.
Apache Lucene is a free, open-source library for full-text search. It is implemented in
Java, supported by the Apache Software Foundation, and released under the
Apache License.
Client libraries: Java, C#, Python, JavaScript, PHP, Perl, Ruby
Requirements
When developing high-load websites or corporate systems, it is often hard to build a
search engine that is fast and easy to use. The following are, in my opinion, the most
important requirements for such a service:
◆ Speed
◆ Easy installation and configuration
◆ Price (preferably free and open source)
◆ Information exchange format JSON (over HTTP)
◆ Indexing in real time
◆ Multi-tenancy (flexible settings for individual user)
Index
In familiar terms, an index is like a database and a type is like a table in it.
A document is a JSON document stored in Elasticsearch. It is like a row in
a relational database. Each document is stored in an index and has a type and an ID. A
document is a JSON object (known in other languages as a hash, HashMap, or associative
array) that contains zero or more fields, or key-value pairs. The original JSON document
that was indexed is stored in the _source field, which is returned by default when getting or searching for a document.
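To make the description above concrete, here is a minimal sketch of what a stored document might look like. The index name, type, ID, and field values are made up for illustration; only the metadata keys (_index, _type, _id, _source) come from the text above.

```python
import json

# Hypothetical stored document: the metadata keys follow the slide,
# the values ("blog", "post", "1") and the fields are invented.
stored = {
    "_index": "blog",
    "_type": "post",
    "_id": "1",
    "_source": {"title": "Hello", "tags": ["search", "lucene"], "views": 0},
}


def field_names(doc):
    """Return the sorted top-level field names of the original JSON (_source)."""
    return sorted(doc["_source"].keys())


print(json.dumps(stored, indent=2))
print(field_names(stored))
```

The _source object is the original JSON exactly as it was sent for indexing; the other keys say where the document lives.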
Analysis
Analysis is the process of converting text, like the body of any email, into tokens or
terms which are added to the inverted index for searching. Analysis is performed
by an analyzer which can be either a built-in analyzer or a custom analyzer
defined per index.
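The idea of analysis can be sketched in a few lines of plain Python. This is a toy illustration, not how Elasticsearch or Lucene implement it: the analyze function below is a hypothetical analyzer that only lowercases text and splits it on non-alphanumeric characters, and the sample documents are invented.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Invented sample documents keyed by document ID.
docs = {1: "The quick brown fox", 2: "Quick reply to your email"}
inverted = build_inverted_index(docs)
print(inverted["quick"])  # → [1, 2]
```

A search for a term then becomes a dictionary lookup instead of a scan over every document, which is what makes the inverted index fast.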
Elasticsearch Mapping
Mapping is the process of defining how a document, and the fields it contains, are
stored and indexed. For instance, use mappings to define:
◆ which string fields should be treated as full text fields.
◆ which fields contain numbers, dates, or geolocations.
◆ whether the values of all fields in the document should be indexed into the catch-all _all field.
◆ the format of date values.
◆ custom rules to control the mapping for dynamically added fields.
Elasticsearch Mapping
Each field has a data type which can be:
◆ a simple type like text, keyword, date, long, double, boolean or ip.
◆ a type which supports the hierarchical nature of JSON such as object or nested.
◆ or a specialised type like geo_point, geo_shape, or completion.
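As a rough sketch, a mapping that uses several of the field types listed above could be written as the request body below. The field names are hypothetical, and the layout assumes the typeless "mappings" / "properties" format of newer Elasticsearch versions; older versions nest the properties under a type name. The code only builds the JSON body and does not contact a server.

```python
import json


def build_mapping():
    """Build an index-creation body with one field per common data type."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},          # analyzed full-text field
                "status": {"type": "keyword"},      # exact-value string
                "published": {"type": "date", "format": "yyyy-MM-dd"},
                "views": {"type": "long"},
                "location": {"type": "geo_point"},  # latitude-longitude point
            }
        }
    }


mapping = build_mapping()
print(json.dumps(mapping, indent=2))
```

Such a body would typically be sent when the index is created, before any documents are indexed.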
Documents CRUD
Often, we use the terms object and document interchangeably. However, there is a distinction. An object
is just a JSON object—similar to what is known as a hash, hashmap, dictionary, or associative array.
Objects may contain other objects. In Elasticsearch, the term document has a specific meaning. It refers
to the top-level, or root object that is serialized into JSON and stored in Elasticsearch under a unique ID.
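The four CRUD operations on a document can be sketched as the REST requests below. This is an illustration under assumptions: the paths use the /_doc and /_update endpoints of newer Elasticsearch versions (older versions put a type name in the path), and the index name, ID, and fields are made up. The helper only describes the requests and does not send them.

```python
def crud_requests(index, doc_id, document, changes):
    """Return (method, path, body) tuples for create, read, update and delete."""
    base = f"/{index}/_doc/{doc_id}"
    return [
        ("PUT", base, document),                                   # create or replace
        ("GET", base, None),                                       # read
        ("POST", f"/{index}/_update/{doc_id}", {"doc": changes}),  # partial update
        ("DELETE", base, None),                                    # delete
    ]


# Hypothetical index, ID and document.
requests_plan = crud_requests("blog", "1", {"title": "Hello"}, {"views": 1})
for method, path, body in requests_plan:
    print(method, path, body)
```

The document's unique ID appears in every path, which is what the definition above means by "stored under a unique ID".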
Query and filter context
The behaviour of a query clause depends on whether it is used in query context or in filter context:
1. Query context
A query clause used in query context answers the question “How well does this document match this query clause?”
Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well
the document matches, relative to other documents.
2. Filter context
In filter context, a query clause answers the question “Does this document match this query clause?” The answer is a
simple Yes or No; no scores are calculated. Filter context is mostly used for filtering structured data.
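The two contexts can be combined in a single bool query. In the sketch below, the clause under "must" runs in query context and contributes to _score, while the clause under "filter" runs in filter context and only includes or excludes documents. The field names title and status are assumptions made for the example.

```python
def build_search(text, status):
    """Build a search body mixing query context (must) and filter context (filter)."""
    return {
        "query": {
            "bool": {
                # Query context: "how well does the title match?" (scored)
                "must": [{"match": {"title": text}}],
                # Filter context: "is the status exactly this value?" (yes/no)
                "filter": [{"term": {"status": status}}],
            }
        }
    }


body = build_search("search engine", "published")
print(body)
```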
Search examples with DSL Builder
Search Response
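As a sketch of how a search response might be read, the helper below pulls the total and the matching documents out of the "hits" section. The sample response is hand-written for illustration, not real server output. The helper accepts the total either as a plain number or as an object with a "value" key, since the shape differs between Elasticsearch versions.

```python
def summarize_hits(response):
    """Return (total, [(id, score, source), ...]) from a search response dict."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):  # newer versions wrap the count in an object
        total = total["value"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return total, rows


# Hand-written sample response; all values are invented.
sample = {
    "took": 3,
    "timed_out": False,
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 1.2,
        "hits": [
            {"_index": "blog", "_id": "1", "_score": 1.2,
             "_source": {"title": "Hello"}},
        ],
    },
}

total, rows = summarize_hits(sample)
print(total, rows)
```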
Geolocation Filter
Elasticsearch offers two ways of representing geolocations: latitude-longitude points using the geo_point field type, and
complex shapes defined in GeoJSON, using the geo_shape field type.
Geo-points allow you to find points within a certain distance of another point, to calculate distances between two points for
sorting or relevance scoring, or to aggregate into a grid to display on a map. Geo-shapes, on the other hand, are used
purely for filtering. They can be used to decide whether two shapes overlap, or whether one shape completely contains
other shapes.
Four geo-point filters can be used to include or exclude documents by geolocation:
● geo_bounding_box
Find geo-points that fall within the specified rectangle.
● geo_distance
Find geo-points within the specified distance of a central point.
● geo_distance_range
Find geo-points within a specified minimum and maximum distance from a central point.
● geo_polygon
Find geo-points that fall within the specified polygon. This filter is very expensive. If you find yourself wanting to use
it, you should be looking at geo-shapes instead.
Geo Distance Filter Example
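A geo_distance filter could look like the sketch below, which finds documents whose geo-point lies within a given distance of a central point. This is an illustrative body, not the example from the original slide: the field name location, the coordinates, and the distance are assumptions, and the field is assumed to be mapped as geo_point.

```python
def geo_distance_search(field, lat, lon, distance):
    """Build a search body that keeps only documents within `distance` of a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,             # e.g. "5km"
                        field: {"lat": lat, "lon": lon},  # central point
                    }
                }
            }
        }
    }


# Hypothetical field name and coordinates.
geo_body = geo_distance_search("location", 50.45, 30.52, "5km")
print(geo_body)
```

Because the clause sits under "filter", it runs in filter context: documents are either inside the radius or not, and no score is calculated for it.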
Thank You!
Questions? Comments?
