Welcome to the
BELARUS, MINSK
AZURE SEARCH
Alexej Sommer
Supported database types
■ Azure Cosmos DB
■ Azure SQL Database
■ SQL Server hosted in an Azure VM
■ Azure Blob Storage
■ Azure Table Storage
Azure Search elements
■ Datasources
■ Indexes
■ Indexers
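A minimal sketch of how the three elements reference each other, written as Python dicts that mirror the JSON bodies sent to the REST API: the indexer names a datasource and a target index. The names `hotels-ds`, `hotels-index`, `hotels-indexer`, the `Hotels` table, the field list and the connection-string placeholder are all hypothetical.

```python
import json

# Hypothetical names used only for illustration.
DATASOURCE_NAME = "hotels-ds"
INDEX_NAME = "hotels-index"

# Datasource: where the data lives (here a table in Azure SQL Database).
datasource = {
    "name": DATASOURCE_NAME,
    "type": "azuresql",
    "credentials": {"connectionString": "<your Azure SQL connection string>"},
    "container": {"name": "Hotels"},
}

# Index: the searchable schema; exactly one field is the document key.
index = {
    "name": INDEX_NAME,
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "info", "type": "Edm.String", "searchable": True},
    ],
}

# Indexer: pulls rows from the datasource into the index on a schedule.
indexer = {
    "name": "hotels-indexer",
    "dataSourceName": DATASOURCE_NAME,
    "targetIndexName": INDEX_NAME,
    "schedule": {"interval": "PT2H"},  # ISO 8601 duration: every 2 hours
}

if __name__ == "__main__":
    for body in (datasource, index, indexer):
        print(json.dumps(body, indent=2))
```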
Pricing
■ Free (3 indexes, up to 50 MB storage)
■ Basic: $75/month (15 indexes, 2 GB storage, up to 3 search units, up to 3 replicas)
■ Standard: $250/month (50 indexes, 25 GB storage, up to 36 search units, up to 12 replicas, up to 12 partitions)
■ … up to $2850 (with High Density support)
Creating a datasource in the portal UI
■ If you use Cosmos DB, add
Database=YOUR_DATABASE_NAME to the connection string
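A small helper sketch for the tip above: it appends `Database=<name>` to a Cosmos DB connection string unless one is already present. The helper name `with_database` and the sample account/key values are hypothetical; only the `Database=` suffix comes from the slide.

```python
def with_database(connection_string: str, database: str) -> str:
    """Append Database=<name> to a Cosmos DB connection string if it is missing."""
    # Split on ';' and drop empty parts (e.g. from a trailing semicolon).
    parts = [p for p in connection_string.split(";") if p]
    if any(p.lower().startswith("database=") for p in parts):
        return ";".join(parts)  # already specifies a database
    parts.append(f"Database={database}")
    return ";".join(parts)
```

For example, `with_database("AccountEndpoint=https://myaccount.documents.azure.com:443/;AccountKey=KEY;", "hotels")` yields the same string ending in `;Database=hotels`.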
Scoring profiles
"scoringProfiles": [
  {
    "name": "newDocuments",
    "text": {
      "weights": {
        "title": 2,
        "info": 1
      }
    }
  }
]
Boosting
"functions": [ {
"type": "freshness",
"fieldName": "date",
"boost": 10,
"interpolation": "quadratic",
"freshness": {
"boostingDuration": "P7D"
} } ]
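In an index definition, the text weights and the freshness function shown above sit inside the same scoring profile, and a query opts into it by name. A minimal sketch as Python dicts, assuming the index has fields `title`, `info` and a date field `date` (of type `Edm.DateTimeOffset`); the search text `hotel` is a hypothetical example.

```python
scoring_profile = {
    "name": "newDocuments",
    # Matches in "title" weigh twice as much as matches in "info".
    "text": {"weights": {"title": 2, "info": 1}},
    # Recent documents (within 7 days) get a boost of up to 10x,
    # fading with age according to the quadratic interpolation.
    "functions": [
        {
            "type": "freshness",
            "fieldName": "date",
            "boost": 10,
            "interpolation": "quadratic",
            "freshness": {"boostingDuration": "P7D"},
        }
    ],
}

# Fragment of the index definition that carries the profile.
index_fragment = {"scoringProfiles": [scoring_profile]}

# At query time the profile is selected with the scoringProfile parameter.
search_params = {"search": "hotel", "scoringProfile": scoring_profile["name"]}
```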
Suggesters
"suggesters": [
  {
    "name": "sg",
    "searchMode": "analyzingInfixMatching",
    "sourceFields": ["title", "name"]
  }
]
Suggester vs Autocomplete
The Suggestions API suggests documents: it returns the IDs of documents that
contain the query term.
The Autocomplete API returns potential terms from the index that complete the
partial term in the query.
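To make the difference concrete, here is a sketch that only builds the two GET URLs against the `sg` suggester; it sends nothing. The service name, index name and `api-version` value are assumptions, and a real call also needs an `api-key` header.

```python
from urllib.parse import urlencode

# Hypothetical service and index names; the api-version is an assumption.
SERVICE = "my-search-service"
INDEX = "hotels-index"
API_VERSION = "2019-05-06"

def docs_url(operation: str, **params: str) -> str:
    """Build a GET URL for /indexes/<INDEX>/docs/<operation>."""
    query = urlencode({"api-version": API_VERSION, **params})
    return f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs/{operation}?{query}"

# Suggestions: returns matching documents for the partial input "lux".
suggest_url = docs_url("suggest", search="lux", suggesterName="sg")

# Autocomplete: returns completed terms (such as "luxury") for the same input.
autocomplete_url = docs_url("autocomplete", search="lux", suggesterName="sg",
                            autocompleteMode="oneTerm")
```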
Simple query syntax
wifi+luxury  searches for documents containing both wifi and luxury
"luxury hotel"  searches for the phrase
wifi | luxury  searches for wifi or luxury
wifi -luxury  searches for wifi without luxury
motel+(wifi | luxury)  operators can be combined with parentheses
lux*  searches for words starting with lux
Lucene query syntax - queryType=full
Range searches (e.g. mod_date:[20020101 TO 20030101]) are not supported in the query;
in Azure Search they are constructed through $filter expressions
OR or ||
AND, && or +
NOT, ! or -
searchMode=any
wifi -luxury  matches documents that contain wifi OR do not contain luxury
searchMode=all
wifi -luxury  matches documents that contain wifi AND do not contain luxury
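A toy model of the two modes above, treating each document as a plain set of terms. It is not Azure Search's implementation; it only shows why `wifi -luxury` matches far more documents under `searchMode=any`. The documents `d1` to `d4` are hypothetical.

```python
def matches(doc_terms, positive, negated, search_mode):
    """Toy evaluation of a query such as 'wifi -luxury' against a set of terms.

    Each positive term gives the clause "term is present"; each negated term
    gives the clause "term is absent".
    search_mode="any": the document matches if ANY clause holds (OR).
    search_mode="all": the document matches only if ALL clauses hold (AND).
    """
    clauses = [t in doc_terms for t in positive] + [t not in doc_terms for t in negated]
    return any(clauses) if search_mode == "any" else all(clauses)

docs = {
    "d1": {"wifi", "luxury"},
    "d2": {"wifi"},
    "d3": {"pool"},
    "d4": {"luxury"},
}

for mode in ("any", "all"):
    hits = [name for name, terms in docs.items() if matches(terms, ["wifi"], ["luxury"], mode)]
    print(mode, hits)
# any ['d1', 'd2', 'd3']
# all ['d2']
```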
Stemming and lemmatization
Stemming is the process of finding the stem (base) of a given
source word.
Lemmatization is the process of reducing a word form to its lemma,
i.e. its normal (dictionary) form.
Analyzers
The default analyzer is Standard Lucene
Lucene's English analyzer applies stemming using the Porter
stemming algorithm
Microsoft's English analyzer performs lemmatization instead of
stemming
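The difference between the two approaches can be sketched with deliberately tiny toys: a suffix-stripping stemmer (this is NOT the Porter algorithm, just the "cut the ending off" idea) and a dictionary-lookup lemmatizer. The suffix list and the lemma dictionary are hypothetical.

```python
# Toy stemmer: strips a known ending if enough of the word remains.
SUFFIXES = ("ing", "es", "ed", "s")

def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemmatizer: looks the word form up in a dictionary of forms.
LEMMAS = {"went": "go", "goes": "go", "better": "good", "hotels": "hotel"}

def toy_lemmatize(word: str) -> str:
    return LEMMAS.get(word, word)
```

The stemmer handles regular forms (`toy_stem("booking")` gives `book`) but cannot map the irregular `went` to anything, while the dictionary lookup turns `went` into `go`.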
Escaping and encoding special characters
The following characters
+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
must be escaped with a backslash (\)
Characters that are unsafe in URLs
" ` < > # % { } | \ ^ ~ [ ]
must be URL-encoded. For example, # becomes %23
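Both rules can be applied in sequence: escape the query-syntax characters first, then percent-encode the result for a GET URL. A minimal sketch; the helper names are hypothetical, and escaping `&` and `|` one at a time also covers the `&&` and `||` operators.

```python
from urllib.parse import quote

# Characters from the list above; '\\' is the backslash itself.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_lucene(term: str) -> str:
    """Prefix every special query-syntax character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)

def encode_for_url(query: str) -> str:
    """Percent-encode the query so it can be placed in a URL (# becomes %23)."""
    return quote(query, safe="")
```

For example, `encode_for_url(escape_lucene("1+1"))` gives `1%5C%2B1`: the backslash and the plus sign are each encoded.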
Fuzzy search
"blue~" or "blue~1" would return "blue", "blues", and "glue"
but…
"business~analyst" means business OR analyst.
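Fuzzy matching is based on edit distance: `blue~1` accepts terms at most one edit away from `blue`. A sketch using plain Levenshtein distance (insert, delete, substitute); Lucene uses the Damerau-Levenshtein variant, which also counts a transposition of adjacent letters as one edit. The vocabulary is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_matches(term, vocabulary, max_edits=1):
    """Terms within max_edits of the query term, like term~max_edits."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

print(fuzzy_matches("blue", ["blue", "blues", "glue", "green"]))
# ['blue', 'blues', 'glue']
```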
Proximity search
"hotel airport"~5
will find the terms "hotel" and "airport"
within 5 words of each other in a document
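A rough approximation of `"hotel airport"~5`: check whether the two words occur at most 5 word positions apart. Lucene's actual proximity ("slop") counts the moves needed to bring the terms into phrase order, so this sketch is only an approximation, and its whitespace tokenizer is a simplification.

```python
def within_proximity(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """True if term_a and term_b occur at most max_distance positions apart."""
    words = text.lower().split()
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)
```

With the hypothetical text `"hotel near the big international airport"`, the terms are 5 positions apart, so a distance of 5 matches and a distance of 4 does not.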
Term boosting
rock^2 electronic
documents that contain the search term rock will be ranked
higher than documents that contain only the search term electronic
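All of these Lucene-syntax examples only work when the request sets `queryType=full`. A sketch that builds the GET search URL for them; the service name, index name and `api-version` are assumptions, and `urlencode` percent-encodes characters such as `^`, `/` and `[`.

```python
from urllib.parse import urlencode

SERVICE = "my-search-service"  # hypothetical
INDEX = "hotels-index"         # hypothetical
API_VERSION = "2019-05-06"     # assumed API version

def full_query_url(search: str, search_mode: str = "any") -> str:
    """GET /indexes/<INDEX>/docs with the full Lucene query syntax enabled."""
    params = {"api-version": API_VERSION, "queryType": "full",
              "searchMode": search_mode, "search": search}
    return f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?" + urlencode(params)

if __name__ == "__main__":
    for q in ('blue~1', '"hotel airport"~5', 'rock^2 electronic', '/[mh]otel/'):
        print(full_query_url(q))
```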
Regular expression search
lucene.apache.org/core/4_10_2/core/org/apache/lucene/util/automaton/RegExp.html
bit.ly/2Ug2CKq
Place the regular expression between forward slashes (/):
/[mh]otel/
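A regex query is matched against whole indexed terms, not substrings. A sketch approximating that with Python's `re.fullmatch`; Lucene's `RegExp` dialect differs from Python's in places, so this only illustrates simple patterns like the one above. The term list is hypothetical.

```python
import re

def regex_term_matches(query: str, terms):
    """Return the terms fully matched by a /pattern/ query (Python approximation)."""
    # Strip the surrounding slashes used by the query syntax.
    pattern = query[1:-1] if len(query) > 1 and query.startswith("/") and query.endswith("/") else query
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

print(regex_term_matches("/[mh]otel/", ["motel", "hotel", "hostel", "hotels"]))
# ['motel', 'hotel']
```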
OData support
{
"name": "Scott",
"filter": "(age ge 25 and age lt 50) or surname eq 'Guthrie'"
}
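Because range conditions go into `$filter` rather than the search text, it can help to build filter strings programmatically. A sketch that reproduces the filter above; the helper names are hypothetical, and string literals are quoted by doubling embedded single quotes, as OData requires.

```python
def odata_string(value: str) -> str:
    """Quote a string literal for an OData filter; single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"

def age_range_or_surname(min_age: int, max_age: int, surname: str) -> str:
    """Range condition on age (ge/lt) OR an exact match on surname."""
    return f"(age ge {min_age} and age lt {max_age}) or surname eq {odata_string(surname)}"

print(age_range_or_surname(25, 50, "Guthrie"))
# (age ge 25 and age lt 50) or surname eq 'Guthrie'
```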
Data Change Detection Policy for SQL
"dataChangeDetectionPolicy" : {
"@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
"highWaterMarkColumnName" : "[a rowversion or last_updated column name]" }

"dataChangeDetectionPolicy" : {
"@odata.type" : "#Microsoft.Azure.Search.SqlIntegratedChangeTrackingPolicy" }
SQL Server ChangeTracking
ALTER DATABASE AdventureWorks
SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE AdventureWorks
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
Data Change Detection Policy for Cosmos DB
{
"@odata.type" : "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
"highWaterMarkColumnName" : "_ts"
}
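Conceptually, a high-water-mark policy re-indexes only the rows whose tracked value (a SQL `rowversion`/`last_updated` column, or `_ts` in Cosmos DB) is greater than the value stored after the previous run, which is why that value must increase on every change. The sketch models the idea in memory; the `Row` type and its `version` field are hypothetical stand-ins, and the real indexer does this work on the service side.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Row:
    id: str
    title: str
    version: int  # stands in for rowversion / last_updated / _ts

def changed_since(rows: List[Row], high_water_mark: int) -> Tuple[List[Row], int]:
    """Return rows changed after the stored mark (oldest first) and the new mark."""
    changed = sorted((r for r in rows if r.version > high_water_mark),
                     key=lambda r: r.version)
    new_mark = changed[-1].version if changed else high_water_mark
    return changed, new_mark
```

On the next run the returned mark is passed back in, so rows that have not changed since are skipped.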
Data Deletion Detection Policy
"dataDeletionDetectionPolicy" : {
"@odata.type" :
"#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
"softDeleteColumnName" : "IsDeleted",
"softDeleteMarkerValue" : "true"
}
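With soft deletes, a row flagged `IsDeleted = "true"` makes the indexer remove the matching document instead of updating it. A minimal in-memory sketch of that decision, producing a batch in the shape accepted by the push API (`{"value": [...]}` with an `@search.action` of `upload` or `delete`); the function name and the sample rows are hypothetical.

```python
def to_index_actions(rows, key="id", soft_delete_column="IsDeleted", marker="true"):
    """Turn changed rows into an index batch of upload/delete actions."""
    actions = []
    for row in rows:
        if str(row.get(soft_delete_column)).lower() == marker.lower():
            # Soft-deleted row: only the key is needed to delete the document.
            actions.append({"@search.action": "delete", key: row[key]})
        else:
            # Live row: upload it without the soft-delete column.
            doc = {k: v for k, v in row.items() if k != soft_delete_column}
            actions.append({"@search.action": "upload", **doc})
    return {"value": actions}
```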
Development
NuGet package Microsoft.Azure.Search
REST API
Node.js and Java samples
Cognitive Search
Natural language processing skills
entity recognition
language detection
key phrase extraction
text manipulation
sentiment detection
Image processing
Optical Character Recognition (OCR)
textExtractionAlgorithm "handwritten" (English only)
textExtractionAlgorithm "printed"
Identification of visual features
provided by Computer Vision in Cognitive Services
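A sketch of an OCR skill definition for a skillset, enforcing the English-only restriction on `handwritten` noted above. It assumes the `#Microsoft.Skills.Vision.OcrSkill` definition format of that period (with `textExtractionAlgorithm`); the output name `extractedText` and the helper name are hypothetical.

```python
def ocr_skill(algorithm: str = "printed", language: str = "en") -> dict:
    """Build an OCR skill definition for the given extraction algorithm."""
    if algorithm not in ("printed", "handwritten"):
        raise ValueError("textExtractionAlgorithm must be 'printed' or 'handwritten'")
    if algorithm == "handwritten" and language != "en":
        raise ValueError("handwritten text extraction is English-only")
    return {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "textExtractionAlgorithm": algorithm,
        "defaultLanguageCode": language,
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "extractedText"}],  # hypothetical name
    }
```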


Editor's Notes

  • #4: https://docs.microsoft.com/en-us/azure/search/search-what-is-azure-search
  • #5: Indexers are available for Azure Cosmos DB, Azure SQL Database, Azure Blob Storage, and SQL Server hosted in an Azure VM.
  • #6: The High Density tier is targeted at SaaS providers who build applications that support a large number of relatively small indexes in a single search service. Max indexes per partition: 1000 (max 3000 per service).
  • #10: https://docs.microsoft.com/en-us/azure/search/index-add-suggesters
  • #11: https://azure.microsoft.com/en-us/blog/autocomplete-in-azure-search-now-in-public-preview/
  • #12: This syntax is the default. https://docs.microsoft.com/en-us/azure/search/query-simple-syntax
  • #13: The only difference between Azure's flavor and the classic Lucene syntax is the lack of range queries (mod_date:[20020101 TO 20030101] cannot be written like this in Azure Search). https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax
  • #14: https://docs.microsoft.com/en-us/azure/search/query-simple-syntax#not-operator
  • #15: Stem (English): base, stalk, origin. Lemma (linguistics): the canonical, base form of a word. Stemming uses algorithms (often truncating words by removing suffixes and endings to obtain the stem). Lemmatization uses lookups in dictionaries that contain the various forms of words.
  • #16: Lucene's English and Microsoft's English analyzers perform better than the default one.
  • #22: An OData filter can be used in addition to the simple or full query syntax used in the search parameter. https://docs.microsoft.com/en-us/rest/api/searchservice/support-for-odata https://docs.microsoft.com/en-us/azure/search/query-odata-filter-orderby-syntax
  • #23: Two options for SQL. The second works only for tables, and not for tables that have a composite primary key.
  • #24: https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/work-with-change-tracking-sql-server?view=sql-server-2017 https://docs.microsoft.com/en-us/sql/t-sql/statements/set-transaction-isolation-level-transact-sql?view=sql-server-2017 SET TRANSACTION ISOLATION LEVEL SNAPSHOT; BEGIN TRAN -- Verify that the version of the previous synchronization is valid. -- Obtain the version to use next time. -- Obtain changes. COMMIT TRAN
  • #26: Soft-delete detection is needed when HighWaterMarkChangeDetectionPolicy is selected.
  • #27: https://docs.microsoft.com/en-us/azure/search/search-get-started-nodejs https://docs.microsoft.com/en-us/azure/search/search-get-started-java
  • #29: Sentiment detection classifies text as positive or negative reviews, sentiments, or emotions.
  • #30: Generates tags for images, or identifies celebrities and landmarks. Image formats: .JPEG .JPG .PNG .BMP .GIF .TIFF