To find a needle in a haystack
Information retrieval
Computer vision
Parametric search
http://market.yandex.ua
Text search
Basic data structures
Simple matching
// Exact match: linear scan over every title.
String needle = getUserInput();
for (String title : dictionary) {
    if (title.equals(needle)) {
        System.out.println("Gotcha!");
    }
}
Simple matching: better
// Case-insensitive prefix match: both sides are lowercased before comparing.
String needle = getUserInput().toLowerCase();
for (String title : dictionary) {
    if (title.toLowerCase().startsWith(needle)) {
        System.out.println("Gotcha!");
    }
}
Simple matching: awesomest!
String needle = "транс";
comparisons start here
...
терминатор
транзит
трансамерика
трансильвания
трансформеры
...
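The sorted-dictionary idea above can be sketched as a binary search for the position where the needle would be inserted, followed by a forward scan while titles still share the prefix. This is a minimal sketch, not the speaker's code: the class and method names are hypothetical, and the dictionary is assumed to be lowercased, free of duplicates, and sorted in natural String order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class SortedPrefixSearch {
    /** Returns all titles starting with needle; sortedTitles must be sorted and duplicate-free. */
    static List<String> findByPrefix(List<String> sortedTitles, String needle) {
        int pos = Collections.binarySearch(sortedTitles, needle);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int start = pos >= 0 ? pos : -pos - 1;
        List<String> matches = new ArrayList<>();
        for (int i = start; i < sortedTitles.size(); i++) {
            String title = sortedTitles.get(i);
            if (!title.startsWith(needle)) {
                break; // sorted order: no later title can share the prefix
            }
            matches.add(title);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "терминатор", "транзит", "трансамерика", "трансильвания", "трансформеры");
        System.out.println(findByPrefix(titles, "транс"));
    }
}
```

Only about log2(n) comparisons are needed to find the starting point, after which the scan touches just the matching titles plus one.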
Simple matching. Results.
Hash-based search
bucket index = .hashCode() % buckets.length

Title                      .hashCode()
Терминатор                 1207003689
Альф                       32058595
Поймай меня если сможешь   824411401
Умри, но не сейчас         1792373633
Зловещие мертвецы          1534549542
Терминал                   1534215135

buckets: 00 01 02 03 04 05 09 ...
Hash-based search
05: Терминал, Альф, ...
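A minimal sketch of the bucket lookup shown on these slides, assuming a fixed number of buckets and list-based collision chains. The class `BucketIndex` and its methods are hypothetical names. One deliberate difference from the slide's formula: `String.hashCode()` can be negative in Java, so the sketch uses `Math.floorMod` instead of a bare `%` to keep the index in range.

```java
import java.util.ArrayList;
import java.util.List;

class BucketIndex {
    private final List<List<String>> buckets = new ArrayList<>();

    BucketIndex(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    /** floorMod keeps the index non-negative even when hashCode() is negative. */
    int bucketOf(String title) {
        return Math.floorMod(title.hashCode(), buckets.size());
    }

    void add(String title) {
        buckets.get(bucketOf(title)).add(title);
    }

    /** Only one bucket is scanned; colliding titles are told apart by equals(). */
    boolean contains(String needle) {
        return buckets.get(bucketOf(needle)).contains(needle);
    }

    public static void main(String[] args) {
        BucketIndex index = new BucketIndex(10);
        index.add("Терминал");
        index.add("Альф");
        System.out.println(index.contains("Альф"));
    }
}
```

The lookup cost depends on the length of one chain rather than on the size of the whole dictionary, which is why the number of buckets should grow with the number of titles.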
Radix trie
ме
├─ довый месяц
└─ с
   ├─ яц в деревне
   └─ т
      ├─ "о "
      │  ├─ сына
      │  └─ встречи
      └─ "ь в"
         ├─ уду
         ├─ ампира
         └─ илли
...
Медовый месяц
Месяц в деревне
Месть вампира
Месть вуду
Месть вилли
Место сына
Место встречи
...
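A radix trie merges every single-child chain into one edge, as in the diagram above. The sketch below swaps in a plain, uncompressed trie (one character per edge): it answers the same prefix queries with simpler code at the cost of more nodes. The class `PrefixTrie` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns every stored word that starts with prefix, in character order. */
    List<String> startingWith(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return result; // no stored word has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1); // backtrack
        }
    }
}
```

The cost of reaching the subtree depends only on the prefix length, not on how many titles are stored.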
Advanced approaches
Fuzzy search factors
Трансформеры
transposition   трансфромеры
insertion       транссформеры
deletion        тр_нсформеры
substitution    трансформиры
Levenshtein distance
A mathematician, Doctor of Physical and Mathematical Sciences.
A function named after him can be found in almost any programming language.
http://ethw.org/images/8/83/Levenshtein.jpg
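For illustration, a minimal sketch of the classic dynamic-programming Levenshtein distance with two rolling rows. The class name is hypothetical. Note that plain Levenshtein counts only insertions, deletions and substitutions, so a transposition of two adjacent letters costs 2, not 1.

```java
class Levenshtein {
    /** Minimum number of single-character insertions, deletions and substitutions turning a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("трансформеры", "трансформиры")); // substitution: prints 1
        System.out.println(distance("трансформеры", "трансфромеры")); // transposition: prints 2
    }
}
```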
Jaro-Winkler distance
s - word length
m - number of matching characters
t - half the number of transpositions
l - length of the common prefix
p - scaling factor (~0.1)
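For reference, the standard Jaro and Jaro-Winkler definitions written with these symbols. The subscripts are an assumption added here: s_1 and s_2 denote the lengths of the two words being compared. By convention d_j = 0 when m = 0, and l is usually capped at 4.

```latex
d_j = \frac{1}{3}\left(\frac{m}{s_1} + \frac{m}{s_2} + \frac{m - t}{m}\right),
\qquad
d_w = d_j + l \, p \, (1 - d_j)
```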
Jaro-Winkler distance
m = 5
W1       W2         d
хорошо   шорох      0.7
пробка   коробка    0.78
звезда   звездный   0.89
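A minimal sketch of the standard Jaro-Winkler similarity, which reproduces the three values in the table above when called with p = 0.1. The class and method names are hypothetical; the matching window (half the longer length, minus one) and the prefix cap of 4 follow the usual formulation rather than anything stated on the slides.

```java
class JaroWinkler {
    static double jaro(String a, String b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        // Characters match only if equal and at most `window` positions apart.
        int window = Math.max(0, Math.max(a.length(), b.length()) / 2 - 1);
        boolean[] usedA = new boolean[a.length()];
        boolean[] usedB = new boolean[b.length()];
        int m = 0;
        for (int i = 0; i < a.length(); i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(b.length() - 1, i + window);
            for (int k = from; k <= to; k++) {
                if (!usedB[k] && a.charAt(i) == b.charAt(k)) {
                    usedA[i] = true;
                    usedB[k] = true;
                    m++;
                    break;
                }
            }
        }
        if (m == 0) {
            return 0.0;
        }
        // Walk the matched characters of both words in order and count mismatches.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < a.length(); i++) {
            if (!usedA[i]) {
                continue;
            }
            while (!usedB[j]) {
                j++;
            }
            if (a.charAt(i) != b.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        double t = outOfOrder / 2.0; // half the number of out-of-order matches
        return (m / (double) a.length() + m / (double) b.length() + (m - t) / m) / 3.0;
    }

    static double jaroWinkler(String a, String b, double p) {
        double dj = jaro(a, b);
        int maxPrefix = Math.min(4, Math.min(a.length(), b.length()));
        int l = 0;
        while (l < maxPrefix && a.charAt(l) == b.charAt(l)) {
            l++;
        }
        return dj + l * p * (1 - dj);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("звезда", "звездный", 0.1));
    }
}
```

Unlike Levenshtein distance, the result is a similarity in the range 0 to 1, and a shared prefix raises the score.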
Depth-first search (DFS)
A
├─ B
│  ├─ D
│  └─ C
│     ├─ D
│     └─ B
└─ C
   ├─ A
   └─ C
      ├─ D
      ├─ E
      └─ F
Needle = ABCD
ABCD
ABC(B→D)
AB(+C)D
A(C→B)CD
A(C→B)C(E→D)
A(C→B)C(F→D)
A(+B)C(A→D)
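One common way to implement such a walk, sketched under assumptions: a depth-first traversal of a character trie that carries one row of the Levenshtein table per node and prunes a branch once every cell in the row exceeds the allowed number of edits. The class `FuzzyTrie`, its methods, and the choice of Levenshtein as the edit measure are illustrative, not taken from the slides.

```java
import java.util.Map;
import java.util.TreeMap;

class FuzzyTrie {
    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    /** Returns stored words within maxDistance edits of needle, mapped to their distance. */
    Map<String, Integer> search(String needle, int maxDistance) {
        Map<String, Integer> found = new TreeMap<>();
        int[] firstRow = new int[needle.length() + 1];
        for (int i = 0; i < firstRow.length; i++) {
            firstRow[i] = i; // distance from the empty path to each needle prefix
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            dfs(e.getValue(), e.getKey(), String.valueOf(e.getKey()), needle, firstRow, maxDistance, found);
        }
        return found;
    }

    private void dfs(Node node, char c, String path, String needle,
                     int[] prev, int maxDistance, Map<String, Integer> found) {
        int[] row = new int[prev.length];
        row[0] = prev[0] + 1;
        int best = row[0];
        for (int i = 1; i < row.length; i++) {
            int cost = needle.charAt(i - 1) == c ? 0 : 1;
            row[i] = Math.min(Math.min(row[i - 1] + 1, prev[i] + 1), prev[i - 1] + cost);
            best = Math.min(best, row[i]);
        }
        int distance = row[row.length - 1];
        if (node.isWord && distance <= maxDistance) {
            found.put(path, distance);
        }
        if (best <= maxDistance) { // otherwise no descendant can get back under the limit
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                dfs(e.getValue(), e.getKey(), path + e.getKey(), needle, row, maxDistance, found);
            }
        }
    }
}
```

With the seven root-to-leaf paths of the tree above stored as words and the needle ABCD, a limit of 1 returns ABCD, ABCB, ABD and ACCD, while a limit of 2 returns all seven.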
N-gram
АЛО   начАЛО, русАЛОчка, волшебное зеркАЛО ...
АЧА   нАЧАло, удАЧА, нАЧАльники ...
НАЧ   НАЧало, НАЧальники, геркулес НАЧало ...
ЧАЛ   наЧАЛо, мгновения пеЧАЛи, наЧАЛьники ...
Начало (Inception), 2010
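The index above can be sketched as a map from each trigram to the titles containing it. A lookup then counts how many of the needle's trigrams each title shares, and the titles with the most hits become candidates for a more precise comparison. The class `TrigramIndex` and its methods are hypothetical names, and N = 3 is assumed from the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class TrigramIndex {
    private final Map<String, Set<String>> index = new HashMap<>();

    /** Splits text into overlapping lowercase 3-character windows. */
    static List<String> trigrams(String text) {
        String s = text.toLowerCase();
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    void add(String title) {
        for (String gram : trigrams(title)) {
            index.computeIfAbsent(gram, k -> new TreeSet<>()).add(title);
        }
    }

    /** For every indexed title, counts how many distinct trigrams of needle it contains. */
    Map<String, Integer> candidates(String needle) {
        Map<String, Integer> hits = new TreeMap<>();
        for (String gram : new LinkedHashSet<>(trigrams(needle))) {
            for (String title : index.getOrDefault(gram, Collections.emptySet())) {
                hits.merge(title, 1, Integer::sum);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TrigramIndex idx = new TrigramIndex();
        idx.add("начало");
        idx.add("начальники");
        idx.add("удача");
        System.out.println(idx.candidates("начало"));
    }
}
```

A misspelled needle still shares most of its trigrams with the intended title, which is what makes the index usable for fuzzy lookup.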
N-gram
*in my subjective experience
Lookup improvements
Template hashing
T R A N S F O R M E R S
1 0 1 0 0 0 1 1 1 1 0 0 0
A C E G I K M O Q S U W Y
B D F H J L N P R T V X Z
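One reading of the table above, as a minimal sketch: each of the 13 bits covers a pair of adjacent letters (A/B, C/D, ..., Y/Z) and is set when the word contains either letter of the pair. The class `TemplateHash`, its methods, and the use of differing bits as a pre-filter are assumptions for illustration.

```java
import java.util.Locale;

class TemplateHash {
    /** 13-bit signature; the leftmost bit is the A/B pair, the rightmost is Y/Z. */
    static int signature(String word) {
        int bits = 0;
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c < 'A' || c > 'Z') {
                continue; // characters outside A-Z are ignored in this sketch
            }
            int pair = (c - 'A') / 2; // 0 for A/B ... 12 for Y/Z
            bits |= 1 << (12 - pair);
        }
        return bits;
    }

    /** Renders the signature as a 13-character string of 0s and 1s. */
    static String asBits(int signature) {
        return String.format("%13s", Integer.toBinaryString(signature)).replace(' ', '0');
    }

    /** Number of signature bits in which two words differ. */
    static int differingBits(String a, String b) {
        return Integer.bitCount(signature(a) ^ signature(b));
    }

    public static void main(String[] args) {
        System.out.println(asBits(signature("TRANSFORMERS"))); // prints 1010001111000
    }
}
```

Because the signature ignores letter order and repetition, a transposition such as TRANSFROMERS yields the same bits, and a single substitution changes at most two of them. Comparing signatures is therefore a cheap way to discard clearly unrelated words before running a full distance function.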
Bloom filter
0 1 0 0 0 0 1 1 0 0
Needle
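A minimal Bloom filter sketch: each word sets a few bits in a fixed-size bit array, and a lookup reports "possibly present" only if all of those bits are set. The class name, the double-hashing scheme built on `String.hashCode()`, and the parameters are assumptions for illustration; a production filter would use stronger hash functions.

```java
import java.util.BitSet;

class BloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** i-th bit position for a word, derived from two base hashes (double hashing). */
    private int position(String word, int i) {
        int h1 = word.hashCode();
        int h2 = (h1 >>> 16) | 1; // second hash, forced odd so it is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String word) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(word, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String word) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(word, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilter filter = new BloomFilter(1024, 3);
        filter.add("Терминатор");
        System.out.println(filter.mightContain("Терминатор")); // prints true
    }
}
```

A negative answer lets the search skip the expensive lookup entirely. A positive answer can be a false positive, so it still has to be confirmed against the real index.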
Thank you!
Questions? :)

SE2016 Java Alexey Tokar "To find a needle in a haystack"