Be Lazy & Scale

Full-Text Tagging Billions Of Messages

reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
pam_unix(sshd:session): session opened for user xxxxxx by (uid=0)
Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx
pam_unix(sshd:session): session opened for user xxxxxx by (uid=0)

Percolator
Traditionally you design documents based on your data, store them into an
index, and then define queries via the search API in order to retrieve these
documents. The percolator works in the opposite direction. First you store
queries into an index and then, via the percolate API, you define documents
in order to retrieve these queries.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
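As a rough illustration of that inversion (plain Python, not the Elasticsearch percolate API), the sketch below stores queries first and then percolates each incoming document against all of them. It rests on two simplifying assumptions: the analyzer only lowercases and splits on non-alphanumeric characters, and every stored query means "all of these terms are present" (term order is ignored). `ToyPercolator`, `register` and `percolate` are hypothetical names.

```python
import re


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class ToyPercolator:
    """Stores queries first, then matches documents against them."""

    def __init__(self) -> None:
        # tag -> terms that must all appear in a document (simplified query)
        self.queries: dict[str, set[str]] = {}

    def register(self, tag: str, query_text: str) -> None:
        self.queries[tag] = set(analyze(query_text))

    def percolate(self, document: str) -> list[str]:
        terms = set(analyze(document))
        # The loop runs over stored queries, not over stored documents.
        return [tag for tag, required in self.queries.items() if required <= terms]


p = ToyPercolator()
p.register("break-in", "possible break-in attempt!")
p.register("bad-protocol", "bad protocol version identification")
p.register("session", "session opened")
print(p.percolate("reverse mapping checking getaddrinfo for xxxxx "
                  "[xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!"))  # ['break-in']
```

Each tag is simply a stored query; tagging a log line means percolating it and collecting the tags of the queries that match.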

reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
Tag queries:
"possible break-in attempt!"
"bad protocol version identification"
"session opened"

Bad protocol version identification ...
Phrase Query: "bad protocol"
Term Query: version
Prefix Query: ident*
Boolean Query: AND, OR, NOT
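To make the four query types concrete, here is a sketch that evaluates them over an analyzed token list. The classes are hypothetical stand-ins written for illustration, not Lucene's or Elasticsearch's query classes, and the analyzer is the same simplified lowercase/split assumption.

```python
import re
from dataclasses import dataclass


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


@dataclass
class Term:
    term: str

    def matches(self, tokens: list[str]) -> bool:
        return self.term in tokens


@dataclass
class Phrase:
    terms: list[str]

    def matches(self, tokens: list[str]) -> bool:
        # Terms must appear consecutively and in order.
        n = len(self.terms)
        return any(tokens[i:i + n] == self.terms for i in range(len(tokens) - n + 1))


@dataclass
class Prefix:
    prefix: str

    def matches(self, tokens: list[str]) -> bool:
        return any(t.startswith(self.prefix) for t in tokens)


@dataclass
class And:
    clauses: list

    def matches(self, tokens: list[str]) -> bool:
        return all(c.matches(tokens) for c in self.clauses)


@dataclass
class Or:
    clauses: list

    def matches(self, tokens: list[str]) -> bool:
        return any(c.matches(tokens) for c in self.clauses)


@dataclass
class Not:
    clause: object

    def matches(self, tokens: list[str]) -> bool:
        return not self.clause.matches(tokens)


tokens = analyze("Bad protocol version identification 'root' from xxx.xx.xxx.xx port xxxxx")
query = And([Phrase(["bad", "protocol"]), Term("version"), Prefix("ident")])
print(query.matches(tokens))  # True
```

A tag query is typically a Boolean tree whose leaves (terminal clauses) are term, phrase and prefix queries like these.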

Setup: 160 tags (real life), 500000 runs (based on real messages), ~33% matches
Baseline: 105s
1 Big OR: 109s (+3.8%)
Using single char message 'a': 96s (-8.5%)

Setup: 160 tags (real life), ~295 terminal clauses, 500000 runs (based on real messages), ~33% matches
Baseline: 105s
Trivial 1 term clause / tag: 28.6s (-72.8%)
Keep only 1 clause / tag: 62.7s (-41%)

Register queries --> percolator queries index
Percolate request ("Bad protocol ...") --> in-memory index (Bad, protocol, ...)
Execute each query against the in-memory index --> percolate response
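The flow above can be mimicked in a few lines. This sketch assumes every registered query is a phrase (a list of analyzed terms); for each request it builds a single-document index with term positions and then runs every stored query against it. Elasticsearch's percolator relies on Lucene's `MemoryIndex` for this per-request index; the functions below are hypothetical and far simpler.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_memory_index(document: str) -> dict[str, list[int]]:
    """Index a single document: term -> positions where it occurs."""
    index = defaultdict(list)
    for position, term in enumerate(analyze(document)):
        index[term].append(position)
    return dict(index)


def phrase_matches(index: dict[str, list[int]], terms: list[str]) -> bool:
    """True if the terms occur at consecutive positions."""
    if not terms:
        return True
    return any(
        all(start + offset in index.get(term, []) for offset, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


def percolate(queries: dict[str, list[str]], document: str) -> list[str]:
    index = build_memory_index(document)  # built once per percolate request
    # Every registered query is executed against the index,
    # so the work grows with the number of queries.
    return [tag for tag, phrase in queries.items() if phrase_matches(index, phrase)]


queries = {
    "break-in": ["possible", "break", "in", "attempt"],
    "bad-protocol": ["bad", "protocol"],
}
msg = ("reverse mapping checking getaddrinfo for xxxxx "
       "[xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!")
print(percolate(queries, msg))  # ['break-in']
```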

Query Term Index
possible --> 0
break --> 1
in --> 2
attempt --> 3
version --> 4

Query Clauses --> Rewritten Clauses
"POSSIBLE BREAK-IN ATTEMPT!" --> [0, 1, 2, 3]
connect* --> connect*
version --> 4
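A sketch of one reading of this slide: every distinct term used by any registered clause gets a small integer id, and clauses are rewritten into those ids, while prefix clauses such as connect* stay as strings. `build_term_index` and the convention "single term = one id, phrase = list of ids" are assumptions chosen to mirror the slide.

```python
import re


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_term_index(clauses: list[str]):
    """Assign each distinct query term an integer id and rewrite the clauses.

    Returns (term_ids, rewritten): phrases become lists of ids, single terms
    become one id, and prefix clauses ("connect*") are kept as strings.
    """
    term_ids: dict[str, int] = {}
    rewritten = []
    for clause in clauses:
        if clause.endswith("*"):
            rewritten.append(clause)  # a prefix cannot map to a single term id
            continue
        ids = [term_ids.setdefault(term, len(term_ids)) for term in analyze(clause)]
        rewritten.append(ids[0] if len(ids) == 1 else ids)
    return term_ids, rewritten


term_ids, rewritten = build_term_index(["POSSIBLE BREAK-IN ATTEMPT!", "connect*", "version"])
print(term_ids)   # {'possible': 0, 'break': 1, 'in': 2, 'attempt': 3, 'version': 4}
print(rewritten)  # [[0, 1, 2, 3], 'connect*', 4]
```

The index only has to be rebuilt when queries are registered or removed, not for every message.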

Raw Message:
reverse mapping checking getaddrinfo for xxxxx [xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!
Analyzed Message:
[reverse, mapping, checking, getaddrinfo, for, xxxxx, xxx, xxx, xxx, xxx, failed, possible, break, in, attempt]
Message Rewritten in Query Space:
[-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, 3]
Query Term Presence Bitset:
[true, true, true, true, false]
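The per-message work on this slide can be sketched as: analyze once, map every token to its query term id (-1 when no query uses it), then mark which query terms occur at all. The function names are hypothetical; a list of booleans stands in for the bitset for readability, and a real implementation could use a packed bitset instead.

```python
import re


def analyze(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rewrite_message(tokens: list[str], term_ids: dict[str, int]) -> list[int]:
    """Replace each token by its query term id, or -1 if no query uses it."""
    return [term_ids.get(token, -1) for token in tokens]


def presence_bitset(rewritten: list[int], num_terms: int) -> list[bool]:
    """bits[i] is True if query term i occurs anywhere in the message."""
    bits = [False] * num_terms
    for term_id in rewritten:
        if term_id >= 0:
            bits[term_id] = True
    return bits


term_ids = {"possible": 0, "break": 1, "in": 2, "attempt": 3, "version": 4}
tokens = analyze("reverse mapping checking getaddrinfo for xxxxx "
                 "[xxx.xxx.xxx.xxx] failed - POSSIBLE BREAK-IN ATTEMPT!")
rewritten = rewrite_message(tokens, term_ids)
bits = presence_bitset(rewritten, len(term_ids))
print(rewritten)  # [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, 3]
print(bits)       # [True, True, True, True, False]
```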

Phrase clause: "POSSIBLE BREAK-IN ATTEMPT!" --> [0, 1, 2, 3]
Quick check / early termination: are terms 0, 1, 2 and 3 all set in the query term presence bitset?
Actual check: the rewritten message ~ contains [0, 1, 2, 3]
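A sketch of the two-stage phrase check, assuming the rewritten message and bitset shown on the previous slide: the bitset rejects most non-matching clauses after a few lookups, and only the survivors pay for the sequence search. The function names are hypothetical.

```python
def contains_sequence(haystack: list[int], needle: list[int]) -> bool:
    """True if needle occurs in haystack as a contiguous run."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))


def phrase_clause_matches(clause_ids: list[int], rewritten: list[int], bits: list[bool]) -> bool:
    # Quick check / early termination: a missing term rules the clause out cheaply.
    if not all(bits[term_id] for term_id in clause_ids):
        return False
    # Actual check: the ids must appear consecutively in the rewritten message.
    return contains_sequence(rewritten, clause_ids)


rewritten = [-1] * 11 + [0, 1, 2, 3]    # message rewritten in query space
bits = [True, True, True, True, False]  # query term presence bitset
print(phrase_clause_matches([0, 1, 2, 3], rewritten, bits))  # True
```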

Prefix clause: connect*
Brute force / startsWith over the analyzed message tokens (FAST!)
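A prefix clause has no single term id, so a simple approach is to compare it against every analyzed token. Log lines are short, which is presumably why this brute-force scan is cheap enough in practice. `prefix_clause_matches` is a hypothetical name.

```python
def prefix_clause_matches(prefix_clause: str, tokens: list[str]) -> bool:
    """Brute force: does any analyzed token start with the prefix?"""
    prefix = prefix_clause.removesuffix("*")
    return any(token.startswith(prefix) for token in tokens)


tokens = ["reverse", "mapping", "checking", "getaddrinfo", "for", "xxxxx",
          "xxx", "xxx", "xxx", "xxx", "failed", "possible", "break", "in", "attempt"]
print(prefix_clause_matches("connect*", tokens))  # False
```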

Term clause: version --> 4
Simple lookup in the query term presence bitset
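Putting the three clause kinds together, a hypothetical evaluator can dispatch on the rewritten clause: an id is a bitset lookup, a string is a prefix scan, and a list of ids is a phrase check. The encoding (int / str / list) is an assumption that mirrors the "Rewritten Clauses" column, not a documented format.

```python
from typing import Union

# Hypothetical clause encoding mirroring the rewritten clauses:
# int = term id, list = phrase of term ids, str = prefix such as "connect*".
Clause = Union[int, list, str]


def clause_matches(clause: Clause, tokens: list[str], rewritten: list[int], bits: list[bool]) -> bool:
    if isinstance(clause, int):
        # Term clause: simple lookup in the query term presence bitset.
        return bits[clause]
    if isinstance(clause, str):
        # Prefix clause: brute-force startswith over the analyzed tokens.
        prefix = clause.removesuffix("*")
        return any(token.startswith(prefix) for token in tokens)
    # Phrase clause: quick bitset check first, then look for consecutive ids.
    if not all(bits[term_id] for term_id in clause):
        return False
    n = len(clause)
    return any(rewritten[i:i + n] == clause for i in range(len(rewritten) - n + 1))


tokens = ["reverse", "mapping", "checking", "getaddrinfo", "for", "xxxxx",
          "xxx", "xxx", "xxx", "xxx", "failed", "possible", "break", "in", "attempt"]
rewritten = [-1] * 11 + [0, 1, 2, 3]
bits = [True, True, True, True, False]
results = [clause_matches(c, tokens, rewritten, bits) for c in [[0, 1, 2, 3], "connect*", 4]]
print(results)  # [True, False, False]
```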

Results (500000 runs, ~33% matches):
160 tags: 105s --> 7.3s (x14.4 faster)
320 tags: 195s --> 8.8s (x22.2 faster)
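The speedups can be re-derived from the raw timings. This sketch also converts them to time per run, assuming one run means percolating one message (the slides only say "Runs (based on real messages)").

```python
def speedup(before_s: float, after_s: float) -> float:
    return before_s / after_s


def micros_per_run(total_s: float, runs: int) -> float:
    return total_s / runs * 1_000_000


RUNS = 500_000
for tags, before, after in [(160, 105.0, 7.3), (320, 195.0, 8.8)]:
    print(f"{tags} tags: x{speedup(before, after):.1f} faster, "
          f"{micros_per_run(before, RUNS):.1f} -> {micros_per_run(after, RUNS):.1f} us per run")
```

The same numbers show how each approach scales: doubling the tags from 160 to 320 raised the baseline time by about 1.86x (105s to 195s), but the new approach only by about 1.21x (7.3s to 8.8s).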
