21. Inverted index: a data structure that maps each token to the documents that contain it
Text analysis
Indexing
Doc#  Document content
1     Microsoft is introducing SQL Server
2     Windows Server on Azure
3     Microsoft is introducing Azure
4     Application programming on Microsoft Azure
Word (token)   Documents containing it
microsoft      1, 3, 4
introducing    1, 3
sql            1
server         1, 2
windows        2
azure          2, 3, 4
application    4
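The token-to-document mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the service's actual implementation: the `analyze` function and the stop-word list (`is`, `on`) are assumptions chosen so that the result resembles the slide's table.

```python
import re
from collections import defaultdict

# The four sample documents from the slide, keyed by Doc#.
DOCS = {
    1: "Microsoft is introducing SQL Server",
    2: "Windows Server on Azure",
    3: "Microsoft is introducing Azure",
    4: "Application programming on Microsoft Azure",
}

# Assumption: a tiny stop-word list that drops "is" and "on".
STOP_WORDS = {"is", "on"}


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


def build_inverted_index(docs):
    """Map each token to the sorted list of Doc# values that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


index = build_inverted_index(DOCS)
print(index["microsoft"])  # [1, 3, 4]
print(index["server"])     # [1, 2]
```

A single-term query then becomes a dictionary lookup instead of a scan over every document.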
48.
                 Standard1                                              Standard2
Document count   15 million per partition, or 180 million per service   60 million per partition, or 720 million per service
Storage size     25 GB per partition, or 300 GB per service             100 GB per partition, or 1.2 TB per service
[Figure: service layouts drawn as partitions (P1 to P5) and replicas (R1)]
49. https://azure.microsoft.com/en-us/documentation/articles/search-limits-quotas-capacity/
              1 Partition   2 Partitions   3 Partitions   4 Partitions   6 Partitions   12 Partitions
12 replicas   12 SU         24 SU          36 SU          N/A            N/A            N/A
6 replicas    6 SU          12 SU          18 SU          24 SU          36 SU          N/A
5 replicas    5 SU          10 SU          15 SU          20 SU          30 SU          N/A
4 replicas    4 SU          8 SU           12 SU          16 SU          24 SU          N/A
3 replicas    3 SU          6 SU           9 SU           12 SU          18 SU          36 SU
2 replicas    2 SU          4 SU           6 SU           8 SU           12 SU          24 SU
1 replica     1 SU          2 SU           3 SU           4 SU           6 SU           12 SU
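Every cell in the table is the product of replicas and partitions, and combinations whose product exceeds 36 are shown as N/A. A minimal sketch of that rule follows; the function name and the constant are illustrative, and the limit of 36 is read off the table rather than taken from an API.

```python
# Upper limit implied by the table: larger products are listed as N/A.
MAX_SEARCH_UNITS = 36


def search_units(replicas, partitions):
    """Return replicas x partitions, or None where the table shows N/A."""
    su = replicas * partitions
    return su if su <= MAX_SEARCH_UNITS else None


print(search_units(3, 12))  # 36
print(search_units(6, 4))   # 24
print(search_units(4, 12))  # None
```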
50. Text analysis
Indexing
Doc#  Document content
1     Microsoft is introducing SQL Server
2     Windows Server on Azure
3     Microsoft is introducing Azure
4     Application programming on Microsoft Azure
Terms         Doc#
microsoft     1, 3, 4
introducing   1, 3
sql           1
server        1, 2
windows       2
azure         2, 3, 4
application   4
programming   4
Inverted index
51. Query: Microsoft
Terms         Doc#
microsoft     1, 3, 4
introducing   1, 3
sql           1
server        1, 2
windows       2
azure         2, 3, 4
application   4
programming   4
Doc#  Document content
1     Microsoft is introducing SQL Server
2     Windows Server on Azure
3     Microsoft is introducing Azure
4     Application programming on Microsoft Azure
52. Query: Microsoft AND Azure
Doc#  Document content
1     Microsoft is introducing SQL Server
2     Windows Server on Azure
3     Microsoft is introducing Azure
4     Application programming on Microsoft Azure
Terms         Doc#
microsoft     1, 3, 4
introducing   1, 3
sql           1
server        1, 2
windows       2
azure         2, 3, 4
application   4
programming   4
[Figure: Venn diagram of the sets "Microsoft" (documents 1, 3, 4) and "Azure" (documents 2, 3, 4); documents 3 and 4 lie in the overlap]
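An AND query amounts to intersecting the posting lists of the two terms. The sketch below uses a two-pointer merge over sorted lists, a common textbook approach; whether the service does exactly this is not stated on the slide.

```python
def intersect(a, b):
    """Intersect two sorted posting lists with a two-pointer merge."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


# Posting lists copied from the slide's term table.
postings = {"microsoft": [1, 3, 4], "azure": [2, 3, 4]}
print(intersect(postings["microsoft"], postings["azure"]))  # [3, 4]
```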
53.
Doc#  Document content
4     Application programming on Microsoft Azure
Terms         Doc#:offset
application   4:0
programming   4:12
microsoft     4:27
azure         4:37
Indexing
Offset of each token within the document:
(0)Application (12)programming (27)Microsoft (37)Azure
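The offsets above are the character positions at which each token starts. A minimal sketch that reproduces them for Doc#4 follows; the regex-based tokenizer and the stop-word set are assumptions for illustration.

```python
import re

# Assumption: "on" is treated as a stop word, as in the slide's term table.
STOP_WORDS = {"on"}


def token_offsets(text):
    """Return (lowercased token, start offset) pairs, skipping stop words."""
    pairs = []
    for match in re.finditer(r"\w+", text):
        token = match.group().lower()
        if token not in STOP_WORDS:
            pairs.append((token, match.start()))
    return pairs


doc4 = "Application programming on Microsoft Azure"
print(token_offsets(doc4))
# [('application', 0), ('programming', 12), ('microsoft', 27), ('azure', 37)]
```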
54.
Doc#  Document content
1     Microsoft is introducing SQL Server
2     Windows Server on Azure
3     Microsoft is introducing Azure
4     Application programming on Microsoft Azure
Terms         Doc#:offset
microsoft     1:0, 3:0, 4:27
introducing   1:13, 3:13
sql           1:25
server        1:29, 2:8
windows       2:0
azure         2:18, 3:25, 4:37
application   4:0
programming   4:12
Query: "Microsoft Azure"
Phrase query: enclose the phrase in double quotes.
Find the documents in which the offset of keyword 1, plus the length of keyword 1 and one space (1), equals the offset of keyword 2.
For Doc#4:
k1len: length of keyword 1 ("Microsoft") = 9
k1off: offset of keyword 1 = 27
k2off: offset of keyword 2 ("Azure") = 37
⇒ k1off + (k1len + 1) = k2off, that is, 27 + (9 + 1) = 37
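The offset check described for Doc#4 can be applied to every document that contains both keywords. The sketch below hard-codes the positional postings of the two query terms from the slide; the data layout and function name are illustrative assumptions.

```python
# Positional postings for the two query terms, copied from the slide:
# term -> {Doc#: [character offsets]}
POSITIONS = {
    "microsoft": {1: [0], 3: [0], 4: [27]},
    "azure": {2: [18], 3: [25], 4: [37]},
}


def phrase_match(index, k1, k2):
    """Return Doc# values where k2 starts right after k1 plus one space."""
    hits = []
    for doc_id, k1_offsets in index.get(k1, {}).items():
        k2_offsets = set(index.get(k2, {}).get(doc_id, []))
        # The rule from the slide: k1off + (k1len + 1) == k2off
        if any(k1off + len(k1) + 1 in k2_offsets for k1off in k1_offsets):
            hits.append(doc_id)
    return sorted(hits)


print(phrase_match(POSITIONS, "microsoft", "azure"))  # [4]
```

Doc#3 contains both words but is rejected, because 0 + (9 + 1) = 10 is not the offset of "azure" (25) in that document.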
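69. NOTE: combining operators with searchMode
(1) search=A-B&searchMode=any
⇒ search=A OR (NOT B)
(2) search=A-B&searchMode=all
⇒ search=A AND (NOT B)
AND search "+": A+B matches documents that contain both A and B
query: Azure+Search
OR search "|": A|B matches documents that contain A, B, or both
query: Azure|Search
NOT search "-": A-B matches A OR (NOT B) when searchMode=any
query: Azure-Search
[Figure: Venn diagram labeled "A NOT B"]
Wildcard search "*": case-insensitive
query: Azu*
Phrase search (double quotes): "A B" matches only documents in which A and B appear in that order
query: "Azure-Search"
Grouping "()": A+(B|C) matches A+B or A+C
query: Azure+(AD|Search)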
70. Example: search=A B
(1) search=A B&searchMode=any ⇒ search=A OR B
(2) search=A B&searchMode=all ⇒ search=A AND B
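The effect of searchMode on both "A B" and "A-B" can be checked with a toy evaluator. Everything in this sketch is hypothetical: the four-document corpus, the clause format (a leading "-" means NOT), and the function names exist only to illustrate the any/all semantics and are not part of the service's API.

```python
# Hypothetical toy corpus: Doc# -> set of terms in that document.
DOCS = {
    1: {"a", "b"},
    2: {"a"},
    3: {"b"},
    4: {"c"},
}


def clause_matches(clause, terms):
    """A clause is a plain term, or '-term' meaning NOT term."""
    if clause.startswith("-"):
        return clause[1:] not in terms
    return clause in terms


def search(clauses, search_mode="any"):
    """searchMode=any ORs the clauses together; searchMode=all ANDs them."""
    combine = any if search_mode == "any" else all
    return sorted(
        doc_id
        for doc_id, terms in DOCS.items()
        if combine(clause_matches(c, terms) for c in clauses)
    )


print(search(["a", "b"], "any"))   # [1, 2, 3]  A OR B
print(search(["a", "b"], "all"))   # [1]        A AND B
print(search(["a", "-b"], "any"))  # [1, 2, 4]  A OR (NOT B)
print(search(["a", "-b"], "all"))  # [2]        A AND (NOT B)
```

With searchMode=any, the NOT clause pulls in document 4, which contains neither term. This is why a negated term combined with searchMode=any can return more results than expected.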
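71. Field scope "field:term": restricts the search to the specified field
query: session:Azure AND Search
query: session:"Azure Search" AND "Azure AD"
Fuzzy search "term~" or "term~N" (N = 0 to 2, default 2): matches every term that is within N edits of the given term
query: Azure~1
Proximity search ("A B"~N): matches documents in which A and B are within N words of each other
query: "Azure Search"~3
[Figure: "Azure" and "search" separated by 3 words]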
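Fuzzy matching within N edits can be sketched with the classic Levenshtein distance. This is a simplification: plain Levenshtein counts insertions, deletions, and substitutions only, whereas Lucene-based engines typically also count a transposition of adjacent characters as a single edit. The term list is made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the indexed terms within max_edits of the lowercased query."""
    q = query.lower()
    return [t for t in terms if edit_distance(q, t) <= max_edits]


# Hypothetical indexed terms.
TERMS = ["azure", "azur", "azores", "secure", "search"]
print(fuzzy_match("Azure", TERMS, max_edits=1))  # ['azure', 'azur']
print(fuzzy_match("Azure", TERMS, max_edits=2))  # ['azure', 'azur', 'azores']
```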
92. en.lucene
• Extension of StandardAnalyzer
• Stemming (Porter stemming)
• Stop-word removal
en.microsoft
• Microsoft English NLP
• Lemmatization rather than stemming
• Processing details are not published
after such a fall as this, I shall think nothing of tumbling down stairs!, Why, I wouldn't say anything about it, even if I fell off the top of the house!'
or she fell very slowly, for she had plenty of time as she went down to look about her and to wonder what was going to happen next
[Figure: both passages shown once per analyzer, with the words "she", "fell", and "fall" highlighted]
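One way to see why stemming and lemmatization can behave differently on an irregular verb such as fall/fell is the toy comparison below. It is only an illustration: `toy_stem` is a naive suffix stripper, not the Porter algorithm, and `LEMMAS` is a hypothetical three-entry dictionary, not the en.microsoft analyzer, whose internals are not published.

```python
def toy_stem(word):
    """Naive suffix stripping (a stand-in for rule-based stemming)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary covering a few irregular forms.
LEMMAS = {"fell": "fall", "fallen": "fall", "went": "go"}


def toy_lemmatize(word):
    """Look up irregular forms first, then fall back to suffix stripping."""
    return LEMMAS.get(word, toy_stem(word))


for word in ("fall", "falling", "fell"):
    print(word, toy_stem(word), toy_lemmatize(word))
# fall fall fall
# falling fall fall
# fell fell fall
```

Under suffix rules alone, "fell" and "fall" remain different index terms, while a dictionary-based lemmatizer maps both to the same term.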
108.
name              type (char_filter_type)    Description and Options
html_strip        HtmlStripCharFilter        Character filter that removes HTML tags
mapping           MappingCharFilter          Character filter that maps characters to characters
pattern_replace   PatternReplaceCharFilter   Character filter that rewrites strings matching a regular-expression pattern
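Character filters run on the raw text before tokenization. The three kinds listed above can be imitated with plain string operations, as in the sketch below. These are stand-in functions written for illustration, not the service's implementations, and the sample text and mapping are made up.

```python
import re


def html_strip(text):
    """Remove HTML tags (a naive regex stand-in for an HTML-aware filter)."""
    return re.sub(r"<[^>]+>", "", text)


def mapping(text, mappings):
    """Replace each source string with its target string."""
    for source, target in mappings.items():
        text = text.replace(source, target)
    return text


def pattern_replace(text, pattern, replacement):
    """Rewrite every match of a regular expression."""
    return re.sub(pattern, replacement, text)


raw = "<p>Call 03-1234-5678 about <b>Azure</b> Search</p>"
text = html_strip(raw)
print(text)  # Call 03-1234-5678 about Azure Search
text = mapping(text, {"-": ""})
print(text)  # Call 0312345678 about Azure Search
text = pattern_replace(text, r"\d+", "#")
print(text)  # Call # about Azure Search
```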
109.
name        type (tokenizer_type)        Description and Options
nGram       NGramTokenizer               Tokenizer that splits text into n-grams
edgeNGram   EdgeNGramTokenizer           Tokenizer that splits text into edge n-grams
-           MicrosoftLanguageTokenizer   • maxTokenLength: maximum token length
                                         • isSearchTokenizer: whether it is used as the search or the index tokenizer (the behavior may differ depending on the language)
                                         • language: allowed values are "bengali", "bulgarian", "catalan", "chinese_simplified", "chinese_traditional", "croatian", "czech", "danish", "dutch", "english", "french", "german", "greek", "gujarati", "hindi", "icelandic", "indonesian", "italian", "japanese", "kannada", "korean", "malay", "malayalam", "marathi", "norwegian_bokmaal", "polish", "portuguese", "portuguese_brazilian", "punjabi", "romanian", "russian", "serbian_cyrillic", "serbian_latin", "slovenian", "spanish", "swedish", "tamil", "telugu", "thai", "ukrainian", "urdu", "vietnamese"
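The difference between n-gram and edge n-gram tokenization is easiest to see on a single word. The sketch below is a simplification with illustrative parameter names: it emits n-grams of one fixed size, and edge n-grams anchored at the start of the string.

```python
def ngrams(text, n):
    """All contiguous substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def edge_ngrams(text, min_gram, max_gram):
    """Prefixes of text with lengths from min_gram to max_gram."""
    longest = min(max_gram, len(text))
    return [text[:n] for n in range(min_gram, longest + 1)]


print(ngrams("azure", 2))          # ['az', 'zu', 'ur', 're']
print(edge_ngrams("azure", 1, 3))  # ['a', 'az', 'azu']
```

Edge n-grams are a common basis for prefix matching such as type-ahead, because every indexed token is a prefix of the original word.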
138. • Fully managed (PaaS)
• Indexer add-in
  • Pull only
• Pre-built skills
  • Azure Cognitive Services and more
• Region
  • South Central US or West Europe
• API version
  • api-version=2017-11-11-Preview
• Extensibility
  • Call any REST API
• Currently no additional charge!
155. “Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi…”
Class A
Class B
Class C
156. “Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi…”
Entity type A
Entity type B
186.
var q = encodeURIComponent($("#q").val());
// Query the decodesessions2016 index: top 12 hits, selected fields only.
var searchAPI =
    "https://yoichikademo0.search.windows.net/indexes/decodesessions2016/docs" +
    "?$top=12&$select=id,title,track,url,thumbnail,description" +
    "&api-version=2015-02-28&search=" + q;
inSearch = true;
$.ajax({
    url: searchAPI,
    beforeSend: function (request) {
        // The API key is sent in the api-key request header.
        request.setRequestHeader("api-key", "A86C8C8929A5225D5120A151B584C5B6");
        request.setRequestHeader("Content-Type", "application/json");
        request.setRequestHeader("Accept", "application/json; odata.metadata=none");
    },
    type: "GET",
    success: function (data) {
        // The slide is cut off here; the result handling is not shown.
    }
});
187.
Interface                  Latest version       Status
.NET SDK                   3.0                  Generally Available, released November 2016
.NET SDK Preview           2.0-preview          Preview, released August 2016
Service REST API           2016-09-01           Generally Available
Service REST API Preview   2015-02-28-Preview   Preview
.NET Management SDK        2015-08-19           Generally Available
Management REST API        2015-08-19           Generally Available
https://docs.microsoft.com/en-us/azure/search/search-api-versions