Montreal Elasticsearch Meetup
Who am I?
Loïc Bertron
Director of Research & Development @Cedrom-SNI
• Working on Big Data for Cedrom-SNI: social media, TV & radio aggregation
• Introduced Elasticsearch at Cedrom-SNI
Cedrom-SNI
• 10k+ different sources, 750k+ new docs/day
• Our job: ingesting, enriching, and extracting analytics and intelligence from docs
loic.bertron@cedrom-sni.com
linkedin.com/in/loicbertron
@loicbertron
ElasticSearch
«ElasticSearch offers advanced search features to any application or website, easily, while scaling to large amounts of data.»
What is ElasticSearch?
• Simple: Plug & Play - Schema free - RESTful API
• Elastic: nodes automatically discover the other instances
• Strong: Replication & Load balancing - Scales massively - Lucene based
• Fast: Requests executed in parallel - Near real time
• Full featured: Search, Analytics, Facets, Percolator, Geo search, Suggest, Plugins …
Document as JSON
• Object representing your data
• Grouped in an index
• One index can have multiple types of documents
{
    "message": "Introducing #ElasticSearch",
    "post_date": "2014-03-12T18:30:00",
    "author": {
        "first_name": "Loïc",
        "email": "loic.bertron@cedrom-sni.com"
    },
    "employee_at_Cedrom": true,
    "Tags": ["Meetup", "Montreal"]
}
API
• REST API: http://host:port/[index]/[type]/[_action/id]
• HTTP methods: GET, POST, PUT, DELETE
• Documents
  • http://node1:9200/twitter/tweet/1 (POST: index)
  • http://node1:9200/twitter/tweet/1 (GET: retrieve)
  • http://node1:9200/twitter/tweet/1 (DELETE: delete)
• Search
  • http://node1:9200/twitter/tweet/_search (GET: one type)
  • http://node1:9200/twitter/_search (GET: one index)
  • http://node1:9200/_search (GET: all indices)
• Metadata
  • http://node1:9200/twitter/_status (GET)
  • http://node1:9200/_shutdown (POST)
Index a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch"
}'
{
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 1,
    "created": true
}
Update a document
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:40:00",
    "message": "Introducing #ElasticSearch to the #Community"
}'
{
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 2,
    "created": false
}
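Indexing again under an existing _id replaces the whole document and increments _version, as the two responses show. A hypothetical in-memory model of that behaviour (`ToyIndex` is not an Elasticsearch API, just a sketch of the semantics):

```python
from typing import Any, Dict, Tuple


class ToyIndex:
    """Hypothetical in-memory stand-in for one index/type (not an Elasticsearch API)."""

    def __init__(self) -> None:
        # doc id -> (version, source)
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def put(self, doc_id: str, source: Dict[str, Any]) -> Dict[str, Any]:
        """Index `source` under `doc_id`: replace any previous document and bump its version."""
        previous_version = self._docs.get(doc_id, (0, {}))[0]
        version = previous_version + 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version, "created": previous_version == 0}

    def get(self, doc_id: str) -> Dict[str, Any]:
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "_source": source}


idx = ToyIndex()
first = idx.put("1", {"user": "loicbertron", "message": "Introducing #ElasticSearch"})
second = idx.put("1", {"user": "loicbertron", "message": "Introducing #ElasticSearch to the #Community"})
print(first["_version"], second["_version"])  # 1 2
```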
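Search for documents
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "match": { "message": "ElasticSearch" }
    }
}'
A match query analyzes its text first, so "ElasticSearch" is lowercased to the indexed token "elasticsearch"; with the default analyzer, a term query (not analyzed) would find nothing here.
The same search as a URI query:
$ curl -XGET "http://node1:9200/twitter/tweet/_search?q=elasticsearch"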
{
  "took" : 24,
  "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "hits" : {
    "total" : 1,
    "max_score" : 0.227,
    "hits" : [ {
      "_index" : "twitter",
      "_type" : "tweet",
      "_id" : "1",
      "_score" : 0.227,
      "_source" : {
        "user" : "loicbertron",
        "post_date" : "2014-03-12T18:40:00",
        "message" : "Introducing #ElasticSearch to the #Community"
      }
    } ]
  }
}
The response, field by field:
• took: execution time, in milliseconds
• hits.total: number of documents matching
• _shards, _index, _type, _id: information about the request and about each hit
• _score: relevance score of each hit
• _source: the document itself
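The fields above can be read programmatically. A sketch parsing the response from the slide with Python's json module; note that this is the 1.x shape, and Elasticsearch 7.0 and later return hits.total as an object ({"value": ..., "relation": ...}) rather than a plain number, so `summarize` would need adjusting there:

```python
import json

# The search response from the slide, as returned by Elasticsearch 1.x.
RAW = """
{
  "took": 24,
  "timed_out": false,
  "_shards": {"total": 2, "successful": 2, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.227,
    "hits": [{
      "_index": "twitter", "_type": "tweet", "_id": "1", "_score": 0.227,
      "_source": {
        "user": "loicbertron",
        "post_date": "2014-03-12T18:40:00",
        "message": "Introducing #ElasticSearch to the #Community"
      }
    }]
  }
}
"""


def summarize(response: dict) -> dict:
    """Extract execution time, number of matches, and (id, score, document) for each hit."""
    return {
        "took_ms": response["took"],
        "total": response["hits"]["total"],
        "hits": [(h["_id"], h["_score"], h["_source"]) for h in response["hits"]["hits"]],
    }


summary = summarize(json.loads(RAW))
for doc_id, score, source in summary["hits"]:
    print(doc_id, score, source["message"])
```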
Search for documents: query string syntax
Search operand   Example
Terms            quebec
                 quebec ontario
Phrases          "city of montréal"
Proximity        "montreal collusion"~5
Fuzzy            schwarzenegger~0.8
Wildcards        queb*
Boosting         Quebec^5 montreal
Range            [2011/03/12 TO 2014/03/12]
                 [java TO json]
Boolean          quebec AND NOT montreal
                 +quebec -montreal
                 (quebec OR ottawa) AND NOT toronto
Fields           title:montreal^10 OR body:montreal
$ curl -XGET "http://node1:9200/twitter/tweet/_search?q=<Your Query>"
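These query strings contain spaces, quotes and + signs, which must be URL-encoded when passed with ?q= (an unencoded + is decoded as a space). With curl, -G together with --data-urlencode does this; in code, the standard library can build the URL. A sketch (the `uri_search_url` helper is hypothetical; host and index are the ones used in the slides):

```python
from urllib.parse import urlencode

BASE = "http://node1:9200/twitter/tweet/_search"  # host and index from the slides


def uri_search_url(query_string: str) -> str:
    """Build a URI-search URL; urlencode escapes spaces, quotes, '+', accents, etc."""
    return BASE + "?" + urlencode({"q": query_string})


print(uri_search_url('"city of montréal"'))
print(uri_search_url("+quebec -montreal"))
```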
Query DSL
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "author.first_name": {
                  "query": "loic",
                  "fuzziness": 0.1
                }
              }
            },
            {
              "multi_match": {
                "query": "elasticsearch",
                "fields": ["title^10", "body"]
              }
            }
          ]
        }
      },
      "filter": {
        "and": [
          { "terms": { "tags": ["search", "scale", "store"] } },
          { "range": { "created_at": { "from": "2013" } } },
          { "term": { "featured": true } }
        ]
      }
    }
  }
}'
The query has two parts:
• query: a bool query whose must clauses all have to match and contribute to the score: a fuzzy match on author.first_name, and a multi_match on title (boosted x10 with ^10) and body.
• filter: an and filter combining a terms filter (any of the listed tags), a range filter (created_at from 2013) and a term filter (featured is true). Filters only include or exclude documents; they do not affect the score, and their results can be cached.
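Building the body in code avoids hand-written JSON slips such as a repeated "must" key, which many JSON parsers resolve by silently keeping only the last value. A sketch assuming the Elasticsearch 1.x syntax used on the slides (the filtered query and the and filter were later removed, in 5.0, in favour of a bool query with a filter clause); `build_query` is a hypothetical helper:

```python
import json
from typing import List


def build_query(first_name: str, text: str, tags: List[str], since: str) -> dict:
    """Return the filtered/bool query body from the slide (Elasticsearch 1.x syntax)."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        # A list lets one bool query hold several "must" clauses.
                        "must": [
                            {"match": {"author.first_name": {"query": first_name, "fuzziness": 0.1}}},
                            {"multi_match": {"query": text, "fields": ["title^10", "body"]}},
                        ]
                    }
                },
                # Filters restrict the result set without affecting scores.
                "filter": {
                    "and": [
                        {"terms": {"tags": tags}},
                        {"range": {"created_at": {"from": since}}},
                        {"term": {"featured": True}},
                    ]
                },
            }
        }
    }


body = build_query("loic", "elasticsearch", ["search", "scale", "store"], "2013")
print(json.dumps(body, indent=2))
```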
Facets
[Figure: example search page with term facets and range facets]
Tag Cloud
$ curl -XPOST http://node1:9200/articles/_search -d '{
    "aggregations" : {
        "tag_cloud" : { "terms" : { "field" : "tags" } }
    }
}'
"aggregations" : {
    "tag_cloud" : {
        "buckets" : [
            { "key" : "Quebec", "doc_count" : 5 },
            { "key" : "Montréal", "doc_count" : 3 },
            ...
        ]
    }
}
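A terms aggregation buckets documents by each distinct value of a field and counts them. An in-memory sketch with hypothetical sample articles; in a real cluster each shard computes its own counts and the coordinating node merges them, so counts can be approximate when size is small:

```python
from collections import Counter
from typing import Dict, Iterable, List


def terms_agg(docs: Iterable[dict], field: str, size: int = 10) -> Dict[str, List[dict]]:
    """Count documents per distinct value of `field`, in the bucket shape shown above."""
    counts: Counter = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        # A document counts once per distinct value, even if the value is repeated.
        counts.update(set(values))
    # Buckets ordered by descending doc_count; ties broken by key for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return {"buckets": [{"key": k, "doc_count": n} for k, n in ordered]}


# Hypothetical sample articles.
articles = [{"tags": ["Quebec", "Montréal"]}, {"tags": ["Quebec", "Quebec"]}, {"tags": "Quebec"}, {}]
print(terms_agg(articles, "tags"))
```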
Stats
$ curl -XPOST "http://node1:9200/students/_search?search_type=count" -d '{
    "facets": {
        "scores-per-subject" : {
            "terms_stats" : {
                "key_field" : "subject",
                "value_field" : "score"
            }
        }
    }
}'
"facets" : {
    "scores-per-subject" : {
        "_type" : "terms_stats",
        "missing" : 0,
        "terms" : [ {
            "term" : "math",
            "count" : 4,
            "total_count" : 4,
            "min" : 25.0,
            "max" : 92.0,
            "total" : 267.0,
            "mean" : 66.75
        }, … ]
    }
}
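The terms_stats facet groups documents by key_field and computes statistics on value_field within each group. A sketch with hypothetical student scores, chosen so that the math bucket reproduces the numbers on the slide (`terms_stats` here is our own function, not a client call):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def terms_stats(docs: Iterable[dict], key_field: str, value_field: str) -> List[dict]:
    """Per distinct key, compute count/min/max/total/mean of value_field."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(float(doc[value_field]))
    result = []
    for term, values in groups.items():
        total = sum(values)
        result.append({
            "term": term,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": total,
            "mean": total / len(values),
        })
    # Ordered here by count (descending), then by term.
    return sorted(result, key=lambda r: (-r["count"], r["term"]))


# Hypothetical scores.
students = [
    {"subject": "math", "score": 25}, {"subject": "math", "score": 70},
    {"subject": "math", "score": 80}, {"subject": "math", "score": 92},
    {"subject": "art", "score": 60},
]
print(terms_stats(students, "subject", "score")[0])
```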
Advanced facets: Aggregations
{
    "rank": 21,
    "city": "Boston",
    "state": "MA",
    "population2012": 636479,
    "population2010": 617594,
    "land_area": 48.277,
    "density": 12793,
    "ansi": "619463",
    "location": {
        "lat": 42.332,
        "lon": -71.0202
    }
}
$ curl -XGET "http://node1:9200/cities/_search?pretty" -d '{
    "aggs" : {
        "mean_density_by_state" : {
            "terms" : {
                "field" : "state"
            },
            "aggs" : {
                "mean_density" : {
                    "avg" : {
                        "field" : "density"
                    }
                }
            }
        }
    }
}'
"aggregations" : {
    "mean_density_by_state" : {
        "buckets" : [ {
            "key" : "CA",
            "doc_count" : 69,
            "mean_density" : { "value" : 5558.623188405797 }
        }, {
            "key" : "TX",
            "doc_count" : 32,
            "mean_density" : { "value" : 2496.625 }
        }, {
            "key" : "FL",
            "doc_count" : 20,
            "mean_density" : { "value" : 4006.6 }
        }, {
            "key" : "CO",
            "doc_count" : 11,
            …
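The nested aggregation reads as: one bucket per state, then an avg computed only over the documents inside that bucket. A sketch with hypothetical cities (the function `avg_by_terms` and its `agg_name` parameter are our own names):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def avg_by_terms(docs: Iterable[dict], group_field: str, value_field: str,
                 agg_name: str = "mean_density") -> List[dict]:
    """A terms bucket on group_field, with an avg sub-aggregation on value_field in each bucket."""
    doc_counts: Dict[str, int] = defaultdict(int)
    value_sums: Dict[str, float] = defaultdict(float)
    value_counts: Dict[str, int] = defaultdict(int)
    for doc in docs:
        key = doc[group_field]
        doc_counts[key] += 1
        if value_field in doc:  # the avg only sees documents that have the field
            value_sums[key] += float(doc[value_field])
            value_counts[key] += 1
    buckets = []
    for key in sorted(doc_counts, key=lambda k: (-doc_counts[k], k)):
        n = value_counts[key]
        avg = value_sums[key] / n if n else None  # no values in this bucket: report None
        buckets.append({"key": key, "doc_count": doc_counts[key], agg_name: {"value": avg}})
    return buckets


# Hypothetical cities.
cities = [{"state": "CA", "density": 100}, {"state": "CA", "density": 300},
          {"state": "TX", "density": 50}, {"state": "TX"}]
print(avg_by_terms(cities, "state", "density"))
```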
Facets
• Terms
• Terms Stats
• Statistical
• Range
• Histogram
• Date Histogram
• Filter
• Query
• Geo Distance
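Architecture
Node 1 starts on its own and forms a cluster. Cluster state: Green.
Create an index with 2 primary shards and 1 replica per shard:
$ curl -XPUT localhost:9200/twitter -d '{
    "index" : {
        "number_of_shards" : 2,
        "number_of_replicas" : 1
    }
}'
Node 1 now holds Shard 0 and Shard 1. A replica is never allocated on the same node as its primary, so both replicas stay unassigned. Cluster state: Yellow.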
Adding a second node: Node 2 joins the cluster and receives the replicas of Shard 0 and Shard 1. Cluster state: Green.
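The Green/Yellow transitions above follow from two rules: a replica is never placed on the same node as its primary, and cluster health reflects unassigned copies (red: a primary is unassigned; yellow: all primaries are assigned but some replica is not; green: everything is assigned). A toy sketch of those two rules; the allocator is hypothetical and far simpler than Elasticsearch's, which also rebalances shards across nodes:

```python
from typing import Dict, List, Optional

Allocation = Dict[int, List[Optional[str]]]  # shard number -> [primary node, replica node, ...]


def allocate(nodes: List[str], shards: int, replicas: int) -> Allocation:
    """Toy allocator: copy 0 is the primary, the others are replicas.

    All copies of one shard must sit on distinct nodes; a copy that cannot be
    placed stays None (unassigned).
    """
    table: Allocation = {}
    for shard in range(shards):
        copies: List[Optional[str]] = []
        for copy in range(1 + replicas):
            # Try nodes in a rotated order so that copies spread over the cluster.
            candidates = [nodes[(shard + copy + i) % len(nodes)] for i in range(len(nodes))]
            copies.append(next((n for n in candidates if n not in copies), None))
        table[shard] = copies
    return table


def health(table: Allocation) -> str:
    """red: a primary is unassigned; yellow: only replicas are; green: everything is assigned."""
    if any(copies[0] is None for copies in table.values()):
        return "red"
    if any(copy is None for copies in table.values() for copy in copies[1:]):
        return "yellow"
    return "green"


print(health(allocate(["node1"], shards=2, replicas=1)))           # yellow
print(health(allocate(["node1", "node2"], shards=2, replicas=1)))  # green
```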
Adding Node 3 and Node 4: the shard copies are rebalanced so that each of the four nodes holds one of the four copies (Shard 0 and Shard 1, each as a primary and a replica).
Indexing a document in the cluster:
$ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:30:00",
    "message": "Introducing #ElasticSearch"
}'
Doc 1 is routed to one shard (here Shard 0), written on that shard's primary, then copied to its replica before the response is returned:
{
    "_index": "twitter",
    "_type": "tweet",
    "_id": "1",
    "_version": 1,
    "created": true
}
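Elasticsearch picks the shard for a document with shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. The sketch below uses a DJB2-style string hash as a stand-in (the exact hash function depends on the Elasticsearch version); the point is that the same id always lands on the same shard, which is also why the number of primary shards cannot simply be changed once the index exists:

```python
from typing import Optional


def djb2(key: str) -> int:
    """DJB2-style string hash kept to a signed 32-bit int (a stand-in, not ES's exact function)."""
    h = 5381
    for ch in key:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def shard_for(doc_id: str, number_of_shards: int, routing: Optional[str] = None) -> int:
    """shard = hash(_routing) % number_of_primary_shards, with _routing defaulting to _id."""
    key = routing if routing is not None else doc_id
    # Python's % with a positive divisor always returns a non-negative result.
    return djb2(key) % number_of_shards


for doc_id in ["1", "2", "3"]:
    print(doc_id, "->", "Shard", shard_for(doc_id, 2))
```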
Indexing a second document:
$ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T18:45:00",
    "message": "The crowd is on fire #ElasticSearch"
}'
Doc 2 is routed to Shard 1 and stored on both of its copies:
{
    "_index": "twitter",
    "_type": "tweet",
    "_id": "2",
    "_version": 1,
    "created": true
}
Searching in the cluster:
$ curl -XGET http://node1:9200/twitter/tweet/_search -d '{
    "query": {
        "match": { "message": "ElasticSearch" }
    }
}'
The node receiving the request sends the query to one copy (primary or replica) of every shard, then merges the partial results into a single response.
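Searching is a scatter/gather: the coordinating node queries one copy of every shard and merges the partial top hits. A toy sketch under simplifying assumptions (score = number of occurrences of the term, shards as plain dicts, hypothetical document texts); real Elasticsearch runs this as query-then-fetch, first collecting ids and scores from every shard, then fetching only the winning documents:

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc id)


def search_shard(shard_docs: Dict[str, str], term: str) -> List[Hit]:
    """Query phase on one shard: toy score = number of times the term occurs."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = float(text.lower().split().count(term.lower()))
        if score > 0:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards: List[Dict[str, str]], term: str, size: int = 10) -> List[Hit]:
    """Send the query to every shard, then keep the best `size` hits overall."""
    partial = [search_shard(s, term) for s in shards]
    return heapq.nlargest(size, (hit for hits in partial for hit in hits))


# Hypothetical contents of Shard 0 and Shard 1.
shards = [
    {"1": "introducing elasticsearch", "3": "elasticsearch elasticsearch everywhere"},
    {"2": "the crowd is on fire"},
]
print(scatter_gather(shards, "ElasticSearch"))
```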
Losing a node: Node 1 leaves the cluster and its copy of Shard 0 is gone. The remaining copy of Shard 0 becomes the primary, and a new replica of Shard 0 is rebuilt from it on another node, so Doc 1 is not lost. Until the new replica is ready, the cluster state is Yellow.
Indexing keeps working with three nodes (the request now goes to a node that is still up):
$ curl -X PUT http://node2:9200/twitter/tweet/3 -d '{
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch"
}'
Doc 3 is routed to Shard 0 and written on both of its copies:
{
    "_index": "twitter",
    "_type": "tweet",
    "_id": "3",
    "_version": 1,
    "created": true
}
Search also keeps working and still sees all three documents:
$ curl -XGET http://node2:9200/twitter/tweet/_search -d '{
    "query": {
        "match": { "message": "ElasticSearch" }
    }
}'
How do users see search?
User query -> list of results
How does a search engine work?
1. Fetch the document field
2. Pick the configured analyzer
3. Parse the text into tokens
4. Apply token filters
5. Store the tokens into the index
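The five steps above can be sketched as a tiny inverted-index builder. The regex tokenizer and the filter list are simplified stand-ins for a configured analyzer, not Elasticsearch's implementation; `tokenize`, `lowercase` and `index_documents` are hypothetical names:

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

TokenFilter = Callable[[List[str]], List[str]]


def tokenize(text: str) -> List[str]:
    # Stand-in tokenizer: split on anything that is not a letter, digit, '_' or apostrophe.
    return [t for t in re.split(r"[^\w']+", text) if t]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def index_documents(docs: Dict[str, str], filters: List[TokenFilter]) -> Dict[str, Set[str]]:
    """Build token -> set of doc ids, following the five steps."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():   # 1. fetch the document field
        tokens = tokenize(text)          # 2-3. the configured analyzer's tokenizer
        for token_filter in filters:     # 4. the analyzer's token filters
            tokens = token_filter(tokens)
        for token in tokens:             # 5. store into the index
            inverted[token].add(doc_id)
    return dict(inverted)


index = index_documents({"1": "Édith Piaf", "2": "piaf!"}, [lowercase])
print(index)
```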
Analyzer
$ curl -XGET "http://localhost:9200/docs/_analyze?analyzer=standard&pretty=1" -d "Édith Piaf vedette du feu d'artifice"
{
  "tokens" : [
    { "token" : "édith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "vedette", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 },
    { "token" : "du", "start_offset" : 19, "end_offset" : 21, "type" : "<ALPHANUM>", "position" : 4 },
    { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 },
    { "token" : "d'artifice", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 }
  ]
}
Analyzer: composed of a single tokenizer and zero or more token filters.
Tokenizer
Splits a string into words (tokens):
• Whitespace tokenizer: «Édith Piaf» -> «Édith», «Piaf»
• Standard tokenizer: «Édith Piaf!» -> «Édith», «Piaf» (punctuation dropped; lowercasing is done by a separate filter, as in the standard analyzer)
Filters
Modify, delete or add tokens:
• Asciifolding filter: «Édith Piaf» -> «Edith Piaf»
• Stemmer filter (english): «stemming» -> «stem»; «fishing», «fished» -> «fish»; «cats» -> «cat»
• Phonetic filter (Soundex): «quick» -> «Q200», «quik» -> «Q200»
• Edge nGram filter (min_gram 3, max_gram 5): «Montreal» -> «Mon», «Mont», «Montr»
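A few of these filters are easy to mimic in plain Python. Sketch only: the accent folding below relies on Unicode decomposition, so unlike Lucene's ASCII folding it leaves characters such as «ø» or «æ» untouched; `edge_ngram` is called with min_gram=3 and max_gram=5, the values that produce «Mon», «Mont», «Montr». All function names are hypothetical:

```python
import unicodedata
from typing import List, Set


def asciifolding(tokens: List[str]) -> List[str]:
    """Strip accents: decompose (NFKD) and drop combining marks, e.g. 'Édith' -> 'Edith'."""
    return ["".join(c for c in unicodedata.normalize("NFKD", t) if not unicodedata.combining(c))
            for t in tokens]


def stop(tokens: List[str], stopwords: Set[str]) -> List[str]:
    """Remove stop words (compared case-insensitively)."""
    return [t for t in tokens if t.lower() not in stopwords]


def edge_ngram(tokens: List[str], min_gram: int, max_gram: int) -> List[str]:
    """Emit the prefixes of each token whose length is between min_gram and max_gram."""
    out = []
    for t in tokens:
        for n in range(min_gram, min(max_gram, len(t)) + 1):
            out.append(t[:n])
    return out


def analyze(text: str) -> List[str]:
    """Hypothetical chain: whitespace split, lowercase, fold accents, then edge n-grams."""
    tokens = [t.lower() for t in text.split()]
    return edge_ngram(asciifolding(tokens), 3, 5)


print(analyze("Montréal"))  # ['mon', 'mont', 'montr']
```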
Analyzer
The same sentence through an analyzer configured for French (accents folded, «du» removed as a stop word, «d'» elided, words stemmed):
{
  "tokens" : [
    { "token" : "edith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "vedet", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 },
    { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 },
    { "token" : "artific", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 }
  ]
}
Regular search engine usage
1. Documents get indexed
2. I come back often to the search page to run my query
3. I hope my document will be ranked high enough to appear at the top of the results page
4. If not, I will never see my document
Percolator
1. Register my query
2. When a document gets indexed, the percolator matches it against the registered queries
Real-time updates!
Register a query:
curl -XPUT 'http://node1:9200/twitter/.percolator/elasticsearch' -d '{
"query" : {
"match" : {
"message" : "elasticsearch"
}
}
}'
Percolate a document:
$ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{
"doc" : {
    "user": "loicbertron",
    "post_date": "2014-03-12T19:00:00",
    "message": "A third message about #ElasticSearch"
}
}'
Response: the registered query "elasticsearch" matches the document:
{
    "took" : 19,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
    },
    "total" : 1,
    "matches" : [
        {
             "_index" : "twitter",
             "_id" : "elasticsearch"
        }
    ]
}
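Conceptually the percolator inverts search: queries are stored, and each incoming document is run against all of them. A toy in-memory version; `match_query` and `ToyPercolator` are hypothetical names, and the word matching is far cruder than a real match query:

```python
from typing import Callable, Dict, List

Query = Callable[[dict], bool]


def match_query(field: str, text: str) -> Query:
    """Toy 'match': true if any lowercased word of `text` occurs in the field (ignoring '#')."""
    wanted = set(text.lower().split())

    def run(doc: dict) -> bool:
        words = {w.strip("#").lower() for w in str(doc.get(field, "")).split()}
        return bool(wanted & words)

    return run


class ToyPercolator:
    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def register(self, query_id: str, query: Query) -> None:
        """Store a query under an id, like PUT /twitter/.percolator/<id>."""
        self._queries[query_id] = query

    def percolate(self, doc: dict) -> List[str]:
        """Return the ids of all registered queries that match `doc`."""
        return [qid for qid, query in self._queries.items() if query(doc)]


percolator = ToyPercolator()
percolator.register("elasticsearch", match_query("message", "elasticsearch"))
percolator.register("hadoop", match_query("message", "hadoop"))
print(percolator.percolate({"message": "A third message about #ElasticSearch"}))  # ['elasticsearch']
```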
Inner objects
{
    "name": "Jules Verne",
    "biography": "One of the greatest authors",
    "books": [
        {
            "title": "Vingt mille lieues sous les mers",
            "genre": "Novel",
            "publisher": "Hetzel"
        },
        {
            "title": "Les Châteaux en Californie",
            "genre": "Drama",
            "publisher": "Marc Soriano"
        }
    ]
}
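By default, an array of inner objects is flattened when indexed: every field becomes a multi-valued field on the parent document (books.title, books.genre, ...), and the link between fields of the same book is lost. The nested type and the parent/child relation exist to address this. A sketch of the flattening, using the document above (the `flatten` helper is hypothetical):

```python
from typing import Any, Dict, List, Optional


def flatten(obj: Any, path: str = "", out: Optional[Dict[str, List[Any]]] = None) -> Dict[str, List[Any]]:
    """Flatten nested objects and arrays into dotted field name -> list of values."""
    if out is None:
        out = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flatten(value, f"{path}.{key}" if path else key, out)
    elif isinstance(obj, list):
        for item in obj:
            flatten(item, path, out)
    else:
        out.setdefault(path, []).append(obj)
    return out


author = {
    "name": "Jules Verne",
    "books": [
        {"title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel"},
        {"title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano"},
    ],
}
flat = flatten(author)
# No single book is a Novel published by Marc Soriano, yet the flattened fields match both.
cross_match = "Novel" in flat["books.genre"] and "Marc Soriano" in flat["books.publisher"]
print(flat["books.genre"], cross_match)
```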
Parents / Children
curl -XPUT node1:9200/authors/bare_author/1 -d '{
    "name": "Jules Verne",
    "biography": "One of the greatest authors"
}'
curl -XPOST "node1:9200/authors/book/1?parent=1" -d '{
    "title": "Les Châteaux en Californie",
    "genre": "Drama",
    "publisher": "Marc Soriano"
}'
curl -XPOST "node1:9200/authors/book/2?parent=1" -d '{
    "title": "Vingt mille lieues sous les mers",
    "genre": "Novel",
    "publisher": "Hetzel"
}'
The book type must first be mapped with "_parent": { "type": "bare_author" } before children can be indexed.
Other features
• Suggest API: Did you mean?, Autocomplete, …
• Results highlighting
• More like this
• Backup data: Snapshot / Restore
  • File System
  • Amazon S3
  • HDFS
  • Google Compute Engine
  • Microsoft Azure
• Hadoop connector
Clients
• Perl
• Python
• Ruby
• PHP
• JavaScript
• Java
• .NET
• Scala
• Clojure
• Erlang
• EventMachine
• Bash
• OCaml
• Smalltalk
• ColdFusion
Who’s using it ?
Questions
Thank you
Thank you David Pilato for his presentation: https://speakerdeck.com/dadoonet/tours-jug-elasticsearch
Thank you Kevin Kluge for his presentation: https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
Bonus :)
Suggest
curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{
  "suggest" : {
    "my-title-suggestions-1" : {
      "text" : "devloping",
      "term" : {
        "size" : 3,
        "field" : "title"  
      }
    }
  }
}'
Response:
"suggest": {
    "my-title-suggestions-1": [
      {
        "text": "devloping",
        "offset": 0,
        "length": 9,
        "options": [
          {
            "text": "developing",
            "freq": 77,
            "score": 0.8888889
          },
          {
            "text": "deloping",
            "freq": 1,
            "score": 0.875
          },
          {
            "text": "deploying",
            "freq": 2,
            "score": 0.7777778
          }
        ]
      }
    ]
}
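The term suggester proposes indexed terms that are a small edit distance away from the input. A simplified model (the real suggester, built on Lucene's spell checker, has many more options): candidates within 2 edits are scored as 1 - distance / length of the shorter word, which happens to reproduce the three scores above when given the frequencies from the response. `levenshtein` and `suggest` are our own helpers:

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(text: str, term_freqs: Dict[str, int], size: int = 3,
            max_edits: int = 2) -> List[Tuple[str, int, float]]:
    """Return (term, freq, score) options, best score first, then highest frequency."""
    options = []
    for term, freq in term_freqs.items():
        if term == text:
            continue
        distance = levenshtein(text, term)
        if distance <= max_edits:
            options.append((term, freq, 1 - distance / min(len(text), len(term))))
    options.sort(key=lambda o: (-o[2], -o[1]))
    return options[:size]


print(suggest("devloping", {"developing": 77, "deloping": 1, "deploying": 2}))
```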
More Like This
curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1'
The same feature as a more_like_this query in the Query DSL, starting from free text:
{
    "more_like_this" : {
        "fields" : ["name.first", "name.last"],
        "like_text" : "text like this one",
        "min_term_freq" : 1,
        "max_query_terms" : 12,
        "percent_terms_to_match" : 0.95
    }
}
Highlight
{
    "query" : {...},
    "highlight" : {
        "number_of_fragments" : 3,
        "fragment_size" : 150,
        "tag_schema" : "styled",
        "fields" : {
            "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },
            "bio.title" : { "number_of_fragments" : 0 },
            "bio.author" : { "number_of_fragments" : 0 },
            "bio.content" : { "number_of_fragments" : 5, "order" : "score" }
        }
    }
}
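Highlighting wraps the matched terms of each returned field in pre_tags/post_tags. A rough regex-based approximation (the real highlighters work on analyzed tokens and offsets and build fragments, which this sketch does not); `highlight` is a hypothetical helper:

```python
import re
from typing import List


def highlight(text: str, terms: List[str], pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap whole-word, case-insensitive matches of `terms` in pre/post tags."""
    if not terms:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: pre + m.group(0) + post, text)


print(highlight("Introducing #ElasticSearch to the #Community", ["elasticsearch"]))
```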
Hadoop
• Java library for integrating Elasticsearch and Hadoop
• Pig, Hive, Cascading, MapReduce
• Search and Real Time Analytics with Elasticsearch, Hadoop as Data Lake
• Scales with Hadoop

More Related Content

PDF
MongoDB Europe 2016 - Debugging MongoDB Performance
PDF
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
PPTX
MongoDB 3.2 - Analytics
PPTX
MongoDB - Back to Basics - La tua prima Applicazione
PPTX
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
TXT
Kevin milla arbieto informatica piktochart backup data
PDF
MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...
KEY
Schema design
MongoDB Europe 2016 - Debugging MongoDB Performance
MongoDB Europe 2016 - Enabling the Internet of Things at Proximus - Belgium's...
MongoDB 3.2 - Analytics
MongoDB - Back to Basics - La tua prima Applicazione
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
Kevin milla arbieto informatica piktochart backup data
MongoDB .local Munich 2019: New Encryption Capabilities in MongoDB 4.2: A Dee...
Schema design

What's hot (20)

PDF
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
KEY
Mongo db presentation
PPTX
Building a Scalable Inbox System with MongoDB and Java
PDF
Webinar: Building Your First App with MongoDB and Java
PPTX
ElasticSearch - Introduction to Aggregations
PPTX
Webinar: General Technical Overview of MongoDB for Dev Teams
KEY
Managing Social Content with MongoDB
PDF
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
PPTX
Back to Basics Webinar 5: Introduction to the Aggregation Framework
PDF
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
PDF
Online | MongoDB Atlas on GCP Workshop
PPTX
Back to Basics Webinar 3: Schema Design Thinking in Documents
PDF
elasticsearch - advanced features in practice
PDF
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
PDF
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
PPTX
Webinar: Exploring the Aggregation Framework
PPTX
Beyond the Basics 2: Aggregation Framework
KEY
MongoDB In Production At Sailthru
PDF
Curiosity, outil de recherche open source par PagesJaunes
PPTX
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
MongoDB .local Munich 2019: Best Practices for Working with IoT and Time-seri...
Mongo db presentation
Building a Scalable Inbox System with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
ElasticSearch - Introduction to Aggregations
Webinar: General Technical Overview of MongoDB for Dev Teams
Managing Social Content with MongoDB
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Sy...
Back to Basics Webinar 5: Introduction to the Aggregation Framework
Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany
Online | MongoDB Atlas on GCP Workshop
Back to Basics Webinar 3: Schema Design Thinking in Documents
elasticsearch - advanced features in practice
MongoDB .local Chicago 2019: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
Webinar: Exploring the Aggregation Framework
Beyond the Basics 2: Aggregation Framework
MongoDB In Production At Sailthru
Curiosity, outil de recherche open source par PagesJaunes
Webinar: MongoDB 2.4 Feature Demo and Q&A on Hash-based Sharding
Ad

Similar to Montreal Elasticsearch Meetup (20)

PPTX
quick intro to elastic search
PDF
Elasticsearch Quick Introduction
PDF
Elasticsearch in 15 Minutes
PPTX
ElasticSearch AJUG 2013
PDF
Real-time search in Drupal with Elasticsearch @Moldcamp
PPTX
Introducing ElasticSearch - Ashish
PDF
Hopper Elasticsearch Hackathon
PPTX
Elasticsearch
PDF
DRUPAL AND ELASTICSEARCH
PDF
Elasticsearch in 15 minutes
PDF
Faster and better search results with Elasticsearch
PDF
Elasticsearch
PDF
Enhancement of Searching and Analyzing the Document using Elastic Search
KEY
Elasticsearch & "PeopleSearch"
PPTX
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
PPTX
Getting Started With Elasticsearch In .NET
PPTX
Getting started with Elasticsearch in .net
PDF
Introduction to Elasticsearch
PPT
Craig Brown speaks on ElasticSearch
PPTX
Introduction to Elasticsearch with basics of Lucene
quick intro to elastic search
Elasticsearch Quick Introduction
Elasticsearch in 15 Minutes
ElasticSearch AJUG 2013
Real-time search in Drupal with Elasticsearch @Moldcamp
Introducing ElasticSearch - Ashish
Hopper Elasticsearch Hackathon
Elasticsearch
DRUPAL AND ELASTICSEARCH
Elasticsearch in 15 minutes
Faster and better search results with Elasticsearch
Elasticsearch
Enhancement of Searching and Analyzing the Document using Elastic Search
Elasticsearch & "PeopleSearch"
Philly PHP: April '17 Elastic Search Introduction by Aditya Bhamidpati
Getting Started With Elasticsearch In .NET
Getting started with Elasticsearch in .net
Introduction to Elasticsearch
Craig Brown speaks on ElasticSearch
Introduction to Elasticsearch with basics of Lucene
Ad

Recently uploaded (20)

PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Spectral efficient network and resource selection model in 5G networks
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Approach and Philosophy of On baking technology
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Empathic Computing: Creating Shared Understanding
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Electronic commerce courselecture one. Pdf
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Mobile App Security Testing_ A Comprehensive Guide.pdf
20250228 LYD VKU AI Blended-Learning.pptx
Encapsulation_ Review paper, used for researhc scholars
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Network Security Unit 5.pdf for BCA BBA.
Spectral efficient network and resource selection model in 5G networks
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Approach and Philosophy of On baking technology
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Understanding_Digital_Forensics_Presentation.pptx
Empathic Computing: Creating Shared Understanding
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Electronic commerce courselecture one. Pdf
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf

Montreal Elasticsearch Meetup

  • 2. Loïc Bertron Director of Research & Development @Cedrom-SNI ! Working on Big Data for Cedrom-SNI : social media, tv & radio aggregation Introduced Elasticsearch at Cedrom-Sni ! Cedrom-Sni ! 10k+ different sources, 750k+ new docs/days Our job : Ingesting, enriching, extracting analytics and intelligence from docs loic.bertron@cedrom-sni.com linkedin.com/in/loicbertron @loicbertron Who am I ?
  • 3. ElasticSearch is offering advanced search features to any application or website easily, scaling on a large amount of data. « » ElasticSearch
  • 4. Simple : Plug & Play - Schema free - RESTful API ! Elastic : Automatically discover all others instances ! Strong : Replication & Load balancing - Scales massively - Lucene based ! Fast : Requests executed in parallel - Real Time ! Full featured : Search, Analytics, Facets, Percolator, Geo search, Suggest, Plugins … What is ElasticSearch ?
  • 5. Document as JSON • Object representing your data • Grouped in an index • One index can have multiples types of documents {     "message": "Introducing #ElasticSearch", "post_date": "2014-03-12T18:30:00",     "author": { "first_name" : "Loïc", "email" : "loic.bertron@cedrom-sni.com" }, "employee_at_Cedrom" : true, "Tags" : ["Meetup","Montreal"] }
  • 6. • API REST : http://host:port/[index]/[type]/[_action/id]
 HTTP Methods: GET, POST, PUT, DELETE • Documents • http://node1:9200/twitter/tweet/1 (POST) • http://node1:9200/twitter/tweet/1 (GET) • http://node1:9200/twitter/tweet/1 (DELETE) • Search • http://node1:9200/twitter/tweet/_search (GET) • http://node1:9200/twitter/_search (GET) • http://node1:9200/_search (GET) • Metadata • http://node1:9200/twitter/_status (GET) • http://node1:9200/_shutdown (POST) API
  • 7. Index a document $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{     "user": "loicbertron",     "post_date": "2014-03-12T18:30:00",     "message": "Introducing #ElasticSearch" }'
  • 9. Update a document $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{     "user": "loicbertron",     "post_date": "2014-03-12T18:40:00",     "message": "Introducing #ElasticSearch to the #Community" }'
  • 11. $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{     "query": {     "term": { "message": "ElasticSearch" } } }' Search for documents $ curl -XGET http://node1:9200/twitter/tweet/_search?q=elasticsearch
  • 12-17. Search for documents { "took" : 24, "timed_out" : false, "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.227, "hits" : [ { "_index" : "twitter", "_type" : "tweet", "_id" : "1", "_score" : 0.227, "_source" : { "user": "loicbertron", "post_date": "2014-03-12T18:40:00", "message": "Introducing #ElasticSearch to the #Community" } } ] } } Reading the response: • took: execution time (ms) • hits.total: # of documents matching • _index, _type, _id: infos about each hit • _score: score • _source: the document
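The annotated fields can also be read programmatically. A short sketch using only the standard library, assuming the 1.x response shape shown on the slides (later major versions return `hits.total` as an object rather than a number); `summarize` is a hypothetical helper.

```python
import json

# Response body from the slide.
raw = '''{ "took" : 24, "timed_out" : false,
  "_shards" : { "total" : 2, "successful" : 2, "failed" : 0 },
  "hits" : { "total" : 1, "max_score" : 0.227,
    "hits" : [ { "_index" : "twitter", "_type" : "tweet", "_id" : "1", "_score" : 0.227,
      "_source" : { "user" : "loicbertron", "post_date" : "2014-03-12T18:40:00",
                    "message" : "Introducing #ElasticSearch to the #Community" } } ] } }'''

response = json.loads(raw)


def summarize(resp):
    """Pick out execution time, hit count, shard status and (id, score, message) per hit."""
    return {
        "took_ms": resp["took"],
        "total_hits": resp["hits"]["total"],
        "all_shards_ok": resp["_shards"]["failed"] == 0 and not resp["timed_out"],
        "hits": [(hit["_id"], hit["_score"], hit["_source"]["message"])
                 for hit in resp["hits"]["hits"]],
    }


print(summarize(response))
```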
  • 18. Search operands (query string syntax) • Terms: quebec, quebec ontario • Phrases: "city of montréal" • Proximity: "montreal collusion"~5 • Fuzzy: schwarzenegger~0.8 • Wildcards: queb* • Boosting: Quebec^5 montreal • Range: [2011/03/12 TO 2014/03/12], [java TO json] • Boolean: quebec AND NOT montreal, +quebec -montreal, (quebec OR ottawa) AND NOT toronto • Fields: title:montreal^10 OR body:montreal $ curl -XGET 'http://node1:9200/twitter/tweet/_search?q=<Your Query>'
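When one of these queries is sent in the `q=` URL parameter, quotes, spaces and operators such as `+` must be percent-encoded; a raw `+` would be decoded as a space. A sketch using Python's `urllib.parse.urlencode`; the base URL reuses the slide's host and index, and `uri_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE = "http://node1:9200/twitter/tweet/_search"  # host and index from the slides


def uri_search_url(query):
    """Build a URI-search URL, percent-encoding the query string syntax."""
    return BASE + "?" + urlencode({"q": query})


print(uri_search_url('"city of montréal"'))
print(uri_search_url("+quebec -montreal"))
```

`urlencode` encodes spaces as `+` and a literal `+` as `%2B`, so the required-term operator survives the trip.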
  • 19-22. Query DSL $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{ "query": { "filtered" : { "query" : { "bool" : { "must" : [ { "match" : { "author.first_name" : { "query" : "loic", "fuzziness" : 0.1 } } }, { "multi_match" : { "query" : "elasticsearch", "fields" : ["title^10","body"] } } ] } }, "filter": { "and" : [ {"terms" : { "tags" : ["search","scale","store"] } }, {"range" : { "created_at" : {"from": "2013" } } }, {"term": { "featured" : true } } ] } } } }' Breakdown: • "must": a fuzzy match on author.first_name • "must": a multi_match for "elasticsearch" on title (boosted ×10) and body • "filter": any of the listed tags, created_at from 2013, and featured = true
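Building the same request body in code makes its structure easier to check. This sketch assumes the 1.x-era `filtered` query and `and` filter used on the slide (both were removed in later major versions); `build_query` is a hypothetical helper. It puts both full-text clauses in one `must` array, because a JSON object that repeats the `must` key is ambiguous: many parsers, Python's `json` included, silently keep only the last value.

```python
import json


def build_query(first_name, text, tags, since, featured=True):
    """Assemble the slide's filtered/bool query as a dict (1.x query DSL)."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [  # one array instead of two "must" keys
                            {"match": {"author.first_name": {"query": first_name, "fuzziness": 0.1}}},
                            {"multi_match": {"query": text, "fields": ["title^10", "body"]}},
                        ]
                    }
                },
                "filter": {
                    "and": [
                        {"terms": {"tags": tags}},
                        {"range": {"created_at": {"from": since}}},
                        {"term": {"featured": featured}},
                    ]
                },
            }
        }
    }


body = build_query("loic", "elasticsearch", ["search", "scale", "store"], "2013")
print(json.dumps(body, indent=2))

# A repeated key is not an error for Python's json module: the last value wins.
print(json.loads('{"must": {"a": 1}, "must": {"b": 2}}'))
```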
  • 25. Tag Cloud $ curl -XPOST http://node1:9200/articles/_search -d '{ "aggregations" : { "tag_cloud" : { "terms" : {"field" : "tags"} } } }' Response: "aggregations" : { "tag_cloud" : { "buckets" : [ {"key": "Quebec", "doc_count" : 5}, {"key": "Montréal", "doc_count" : 3}, ... ] } }
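Conceptually, a `terms` aggregation creates one bucket per distinct value and counts the documents in each. A toy sketch over hypothetical in-memory articles (no shards, no per-shard size approximation), ordered by descending count like the default:

```python
from collections import Counter

# Hypothetical articles; only the "tags" field matters here.
articles = [
    {"tags": ["Quebec", "Montréal"]},
    {"tags": ["Quebec"]},
    {"tags": ["Quebec", "Montréal", "Ottawa"]},
]


def terms_agg(docs, field, size=10):
    """One bucket per distinct value; doc_count = number of docs containing it."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))  # count each document once per term
    return {"buckets": [{"key": key, "doc_count": n} for key, n in counts.most_common(size)]}


print(terms_agg(articles, "tags"))
```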
  • 26. Stats $ curl -XPOST 'http://node1:9200/students/_search?search_type=count' -d '{ "facets": { "scores-per-subject" : { "terms_stats" : { "key_field" : "subject", "value_field" : "score" } } } }' Response: "facets" : { "scores-per-subject" : { "_type" : "terms_stats", "missing" : 0, "terms" : [ { "term" : "math", "count" : 4, "total_count" : 4, "min" : 25.0, "max" : 92.0, "total" : 267.0, "mean" : 66.75 }, … ] } }
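`terms_stats` groups documents by one field and computes statistics over another. A plain-Python sketch of the same computation; the student records are hypothetical, with the four math scores chosen so that count, min, max, total and mean match the slide's math row.

```python
from collections import defaultdict

# Hypothetical records; the math scores reproduce the slide (25 + 92 + 70 + 80 = 267).
students = [
    {"subject": "math", "score": 25.0},
    {"subject": "math", "score": 92.0},
    {"subject": "math", "score": 70.0},
    {"subject": "math", "score": 80.0},
    {"subject": "history", "score": 60.0},
]


def terms_stats(docs, key_field, value_field):
    """Per distinct key: count, min, max, total and mean of the value field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    rows = []
    for term, values in groups.items():
        total = sum(values)
        rows.append({"term": term, "count": len(values), "min": min(values),
                     "max": max(values), "total": total, "mean": total / len(values)})
    return sorted(rows, key=lambda row: row["count"], reverse=True)


print(terms_stats(students, "subject", "score")[0])
```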
  • 27. Advanced facets: Aggregations { "rank": "21", "city": "Boston", "state": "MA", "population2012": "636479", "population2010": "617594", "land_area": "48.277", "density": "12793", "ansi": "619463", "location": { "lat": "42.332", "lon": "-71.0202" } }
  • 28. curl -XGET "node1:9200/cities/_search?pretty" -d '{ "aggs" : { "mean_density_by_state" : { "terms" : { "field" : "state" }, "aggs": { "mean_density": { "avg" : { "field" : "density" } } } } } }' Advanced facets : Aggregations
  • 29. "aggregations" : { "mean_density_by_state" : { "terms" : [ { "term" : "CA", "doc_count" : 69, "mean_density" : { "value" : 5558.623188405797 } }, { "term" : "TX", "doc_count" : 32, "mean_density" : { "value" : 2496.625 } }, { "term" : "FL", "doc_count" : 20, "mean_density" : { "value" : 4006.6 } }, { "term" : "CO", "doc_count" : 11, Advanced facets : Aggregations
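The nested aggregation amounts to "group by state, then average density". A plain-Python sketch over hypothetical city documents (only the Boston values come from the slides; the other two are made up). The sample document stores `density` as a string; in Elasticsearch the field would need a numeric mapping for an `avg` aggregation, and the sketch converts with `float()`. `avg_by_term` is a hypothetical helper.

```python
from collections import defaultdict

cities = [
    {"city": "Boston", "state": "MA", "density": "12793"},
    {"city": "Town A", "state": "TX", "density": "3000"},  # hypothetical
    {"city": "Town B", "state": "TX", "density": "4000"},  # hypothetical
]


def avg_by_term(docs, term_field, value_field, sub_name="mean_density"):
    """One bucket per value of term_field, with doc_count and the mean of value_field."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for doc in docs:
        key = doc[term_field]
        sums[key] += float(doc[value_field])  # stored as a string in the sample data
        counts[key] += 1
    buckets = [{"key": key, "doc_count": counts[key],
                sub_name: {"value": sums[key] / counts[key]}}
               for key in counts]
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)


print(avg_by_term(cities, "state", "density"))
```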
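  • 32. Architecture: a one-node cluster (Node 1), cluster state Green. Creating the index with 2 shards and 1 replica puts Shard 0 and Shard 1 on Node 1; a replica is never allocated on the same node as its primary, so the replicas stay unassigned and the cluster state turns Yellow. $ curl -XPUT localhost:9200/twitter -d '{ "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 } }'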
  • 33. Architecture: adding a second node. Node 2 joins the cluster and receives the replicas of Shard 0 and Shard 1; every shard copy is now allocated and the cluster state turns Green.
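The Yellow/Green transitions follow from one rule: two copies of the same shard never share a node. The sketch below uses a hypothetical greedy allocator (least-loaded node first), not Elasticsearch's real allocation logic, to reproduce the cluster states on these slides.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Hypothetical greedy allocator: each copy goes to the least-loaded node
    that does not already hold a copy of the same shard.
    Returns (layout, health) with layout mapping node -> [(shard, kind), ...]."""
    layout = {node: [] for node in nodes}
    unassigned_primaries = unassigned_replicas = 0
    for shard in range(number_of_shards):
        used = set()  # nodes already holding a copy of this shard
        for kind in ["primary"] + ["replica"] * number_of_replicas:
            candidates = [n for n in nodes if n not in used]
            if not candidates:
                if kind == "primary":
                    unassigned_primaries += 1
                else:
                    unassigned_replicas += 1
                continue
            node = min(candidates, key=lambda n: len(layout[n]))
            layout[node].append((shard, kind))
            used.add(node)
    if unassigned_primaries:
        health = "red"      # some data cannot be served at all
    elif unassigned_replicas:
        health = "yellow"   # all data available, redundancy incomplete
    else:
        health = "green"
    return layout, health


for nodes in (["node1"], ["node1", "node2"], ["node1", "node2", "node3", "node4"]):
    layout, health = allocate(nodes, number_of_shards=2, number_of_replicas=1)
    print(health, layout)
```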
  • 34. Architecture: Node 1 and Node 2 each hold a copy of Shard 0 and Shard 1.
  • 35. Architecture: a third node joins; shards move so that Node 1 holds Shard 0, Node 3 holds Shard 1, and Node 2 holds Shard 1 and Shard 0.
  • 37. Architecture: a fourth node (Node 4) joins; the cluster rebalances until each of the four nodes holds one shard copy.
  • 40. Architecture: indexing a first document through Node 1 $ curl -X PUT http://node1:9200/twitter/tweet/1 -d '{ "user": "loicbertron", "post_date": "2014-03-12T18:30:00", "message": "Introducing #ElasticSearch" }'
  • 41. Architecture: Doc 1 is routed to Shard 0 and written to its primary copy.
  • 43. Architecture: Doc 1 is copied to the replica of Shard 0 on another node; Node 1 to Node 4 each hold one shard copy.
  • 44. Architecture: once the primary and its replica hold Doc 1, Node 1 replies: { "ok":true, "_index":"twitter", "_type":"tweet", "_id":"1", "_version":1 }
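Which shard receives a document is decided by hashing its routing value (the `_id` by default) modulo the number of primary shards, which is why `number_of_shards` cannot simply be changed on an existing index. In this sketch `zlib.crc32` is a stand-in hash; Elasticsearch uses its own hash function, so the shard numbers printed here will not match a real cluster. `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id, number_of_shards, routing=None):
    """shard = hash(routing) % number_of_shards; routing defaults to the document _id.

    zlib.crc32 is a stand-in hash, not the function Elasticsearch uses.
    """
    key = doc_id if routing is None else routing
    return zlib.crc32(key.encode("utf-8")) % number_of_shards


for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", pick_shard(doc_id, number_of_shards=2))

# Documents sharing a routing value land on the same shard; parent/child
# documents rely on this by using the parent id as routing.
print(pick_shard("2", 2, routing="1") == pick_shard("1", 2))
```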
  • 45. Architecture: indexing a second document $ curl -X PUT http://node1:9200/twitter/tweet/2 -d '{ "user": "loicbertron", "post_date": "2014-03-12T18:45:00", "message": "The crowd is on fire #ElasticSearch" }'
  • 46. Architecture: Doc 2 is routed to Shard 1 and written to its primary copy.
  • 48. Architecture: Doc 2 is copied to the replica of Shard 1.
  • 49. Architecture: the write is acknowledged: { "ok":true, "_index":"twitter", "_type":"tweet", "_id":"2", "_version":1 }
  • 50. Architecture: searching the cluster through Node 1 $ curl -XGET http://node1:9200/twitter/tweet/_search -d '{ "query": { "match": { "message": "ElasticSearch" } } }'
  • 51. Architecture: the query is sent to one copy of each shard (Shard 0 and Shard 1), and Node 1 gathers and merges the partial results.
  • 55. Architecture: Doc 1 (Shard 0) and Doc 2 (Shard 1) are each stored on two of the four nodes.
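The scatter/gather step can be sketched in plain Python. This models the default query-then-fetch flow with hypothetical per-shard hit lists; the fetch phase that loads `_source` is only described in a comment, and `query_then_fetch` is a hypothetical helper.

```python
import heapq

# Hypothetical per-shard results: (score, doc_id) pairs found for the query.
shard_hits = {
    0: [(0.31, "1"), (0.12, "5")],
    1: [(0.27, "2"), (0.22, "4")],
}


def query_then_fetch(shard_hits, size):
    """Query phase: each shard returns only its own top `size` (score, id) pairs.
    Merge: the coordinating node keeps the global top `size`.
    A fetch phase would then load _source for just those ids."""
    per_shard_top = [heapq.nlargest(size, hits) for hits in shard_hits.values()]
    merged = heapq.nlargest(size, (hit for top in per_shard_top for hit in top))
    return [doc_id for _, doc_id in merged]


print(query_then_fetch(shard_hits, size=3))  # ['1', '2', '4']
```

Each shard only ever sends `size` candidates, so the coordinating node merges at most `size × shards` entries regardless of index size.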
  • 56. Architecture: Node 1 goes down, taking one copy of Shard 0 (Doc 1) with it; Node 2, Node 3 and Node 4 remain.
  • 57. Architecture: the remaining copy of Shard 0 keeps serving Doc 1, and a new replica of Shard 0 is rebuilt on another node, so each shard again has two copies.
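When a node leaves, a surviving replica of each primary it held can be promoted. This sketch models only that promotion step over a hypothetical layout; rebuilding a fresh replica on another node, as the slides show, is left out. `fail_node` is a hypothetical helper.

```python
def fail_node(layout, dead_node):
    """Drop `dead_node` and promote a surviving replica for every primary it held.

    layout maps node -> {shard_number: "primary" | "replica"}.
    Returns (new_layout, shards_with_no_copy_left).
    """
    lost_copies = layout[dead_node]
    new_layout = {node: dict(copies) for node, copies in layout.items() if node != dead_node}
    lost_shards = []
    for shard, kind in lost_copies.items():
        holders = [node for node, copies in new_layout.items() if shard in copies]
        if not holders:
            lost_shards.append(shard)  # every copy is gone: shard unavailable
        elif kind == "primary":
            new_layout[holders[0]][shard] = "primary"  # promote a replica
    return new_layout, lost_shards


# Hypothetical layout: one copy per node, two copies per shard.
layout = {"node1": {0: "primary"}, "node2": {1: "primary"},
          "node3": {0: "replica"}, "node4": {1: "replica"}}
after_one, lost_first = fail_node(layout, "node1")
print(after_one, lost_first)
after_two, lost_second = fail_node(after_one, "node3")
print(after_two, lost_second)
```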
  • 59. Architecture: Node 1 is gone, so a third document is indexed through Node 2 $ curl -X PUT http://node2:9200/twitter/tweet/3 -d '{ "user": "loicbertron", "post_date": "2014-03-12T19:00:00", "message": "A third message about #ElasticSearch" }'
  • 60. Architecture: Doc 3 is routed to Shard 0 and written to its primary copy.
  • 61. Architecture: Doc 3 is copied to the replica of Shard 0, so both copies of Shard 0 now hold Doc 1 and Doc 3.
  • 62. Architecture: the write is acknowledged: { "ok":true, "_index":"twitter", "_type":"tweet", "_id":"3", "_version":1 }
  • 63. Architecture: searching again across the three remaining nodes $ curl -XGET http://node2:9200/twitter/tweet/_search -d '{ "query": { "match": { "message": "ElasticSearch" } } }'
  • 65. Architecture: only Node 2 and Node 4 remain, both holding Shard 1 (Doc 2); no copy of Shard 0 is left, so Doc 1 and Doc 3 are unavailable.
  • 66. How do users see search? User -> Query -> List of results
  • 67. How does a search engine work? 1. Fetch the document field 2. Pick the configured analyzer 3. Parse the text into tokens 4. Apply token filters 5. Store the tokens in the index
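The indexing steps above can be sketched as a small pipeline: a tokenizer, then token filters applied in order, then an inverted index from terms to document ids. The tokenizer, filters and the tiny French stop-word list are hypothetical simplifications, not Lucene's implementations.

```python
from collections import defaultdict


def whitespace_tokenizer(text):
    """Split on whitespace only (case and punctuation are kept)."""
    return text.split()


def lowercase_filter(tokens):
    return [token.lower() for token in tokens]


def stop_filter(tokens, stopwords=frozenset({"du", "de", "la", "le"})):
    """Drop a tiny, hypothetical list of French stop words."""
    return [token for token in tokens if token not in stopwords]


def analyze(text, tokenizer, token_filters):
    tokens = tokenizer(text)
    for token_filter in token_filters:  # filters run in the configured order
        tokens = token_filter(tokens)
    return tokens


def build_inverted_index(docs, tokenizer, token_filters):
    """Map every term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, tokenizer, token_filters):
            index[term].add(doc_id)
    return index


docs = {"1": "Édith Piaf vedette du feu d'artifice", "2": "Piaf à Montréal"}
index = build_inverted_index(docs, whitespace_tokenizer, [lowercase_filter, stop_filter])
print(sorted(index["piaf"]))  # ['1', '2']
```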
  • 69. Analyzer { "tokens" : [ { "token" : "édith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "vedette", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "du", "start_offset" : 19, "end_offset" : 21, "type" : "<ALPHANUM>", "position" : 4 }, { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "d'artifice", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 } ] }
  • 70. Analyzer: composed of a single tokenizer and zero or more token filters
  • 71. Tokenizer: splits a string into words (tokens) • Whitespace tokenizer: «Édith piaf» -> «Édith», «piaf» • Standard tokenizer: «Édith piaf!» -> «Édith», «piaf» (punctuation is dropped; lowercasing is left to a token filter)
  • 72. Filters: modify, delete or add tokens • Asciifolding filter: «Édith Piaf» -> «Edith Piaf» • Stemmer filter (English): «stemming» -> «stem»; «fishing», «fished», «fisher» -> «fish»; «cats», «catlike» -> «cat» • Phonetic (Soundex): «quick» -> «Q200», «quik» -> «Q200» • Edge nGram: «Montreal» -> [«Mon», «Mont», «Montr»]
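Three of these filters are easy to approximate in plain Python. This is a sketch under stated assumptions: `ascii_fold` only strips combining accents (the real filter also maps letters such as `ø` or `æ`), `edge_ngrams` uses lengths 3 to 5 to reproduce the Montreal example, and `soundex` is classic American Soundex. All three functions are hypothetical stand-ins, not the Lucene filters.

```python
import unicodedata

# American Soundex digit for each consonant; vowels, h, w and y have none.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def ascii_fold(text):
    """Decompose accented letters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=3, max_gram=5):
    """Prefixes of the token with lengths min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def soundex(word):
    """Classic American Soundex: first letter plus three digits."""
    word = word.lower()
    out = [word[0].upper()]
    previous = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            out.append(code)
        if ch not in "hw":  # h and w do not separate letters with the same code
            previous = code
    return ("".join(out) + "000")[:4]


print(ascii_fold("Édith Piaf"))           # Edith Piaf
print(edge_ngrams("Montreal"))            # ['Mon', 'Mont', 'Montr']
print(soundex("quick"), soundex("quik"))  # Q200 Q200
```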
  • 73. Analyzer: the same text after token filters (ASCII folding, stop-word removal, elision, stemming); «du» is dropped, hence the gap at position 4 { "tokens" : [ { "token" : "edith", "start_offset" : 0, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "piaf", "start_offset" : 6, "end_offset" : 10, "type" : "<ALPHANUM>", "position" : 2 }, { "token" : "vedet", "start_offset" : 11, "end_offset" : 18, "type" : "<ALPHANUM>", "position" : 3 }, { "token" : "feu", "start_offset" : 22, "end_offset" : 25, "type" : "<ALPHANUM>", "position" : 5 }, { "token" : "artific", "start_offset" : 26, "end_offset" : 36, "type" : "<ALPHANUM>", "position" : 6 } ] }
  • 74. Regular search engine usage 1. Documents get indexed 2. I often come back to the search page to run my query 3. I hope my document will rank well enough to be at the top of the results page 4. If not, I will never see my document
  • 75. Percolator 1. Register my query 2. When a document gets indexed, the percolator looks for matches against the registered queries
  • 76. Percolator: Real Time Updates
  • 77. Percolator curl -XPUT 'http://node1:9200/twitter/.percolator/elasticsearch' -d '{ "query" : { "match" : { "message" : "elasticsearch" } } }'
  • 78. Percolator $ curl -X GET http://node1:9200/twitter/tweet/_percolate -d '{ "doc" : {     "user": "loicbertron",     "post_date": "2014-03-12T19:00:00",     "message": "A third message about #ElasticSearch" } }'
  • 79. Percolator {     "took" : 19,     "_shards" : {         "total" : 5,         "successful" : 5,         "failed" : 0     },     "total" : 1,     "matches" : [         {              "_index" : "twitter",              "_id" : "elasticsearch"         }     ] }
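The percolator inverts the usual flow: queries are stored, and each new document is tested against them. A toy sketch in which a registered "query" is just a set of terms and a document matches if its `message` shares at least one of them (roughly a `match` query with the default OR operator); `register` and `percolate` are hypothetical helpers.

```python
import re


def terms_of(text):
    """Lowercased word tokens, a rough stand-in for the field's analyzer."""
    return {token.lower() for token in re.findall(r"\w+", text)}


def register(registry, query_id, query_text):
    registry[query_id] = terms_of(query_text)


def percolate(registry, doc, field="message"):
    """Return the ids of registered queries that match `doc`."""
    doc_terms = terms_of(doc.get(field, ""))
    return sorted(qid for qid, terms in registry.items() if terms & doc_terms)


registry = {}
register(registry, "elasticsearch", "elasticsearch")
register(registry, "hadoop", "hadoop")
doc = {"user": "loicbertron", "post_date": "2014-03-12T19:00:00",
       "message": "A third message about #ElasticSearch"}
print(percolate(registry, doc))  # ['elasticsearch']
```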
  • 80. Inner objects { "name": "Jules Verne", "biography": "One of the greatest authors", "books": [ { "title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel" }, { "title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano" } ] }
  • 81. Parents / Children (the book type needs a _parent mapping pointing to bare_author) curl -XPUT node1:9200/authors/bare_author/1 -d '{ "name": "Jules Verne", "biography": "One of the greatest authors" }' curl -XPOST 'node1:9200/authors/book/1?parent=1' -d '{ "title": "Les Châteaux en Californie", "genre": "Drama", "publisher": "Marc Soriano" }' curl -XPOST 'node1:9200/authors/book/2?parent=1' -d '{ "title": "Vingt mille lieues sous les mers", "genre": "Novel", "publisher": "Hetzel" }'
  • 82. Other features • Suggest API: Did you mean?, Autocomplete, … • Results Highlight • More like this • Backup Data: Snapshot / Restore • File System • Amazon S3 • HDFS • Google Compute Engine • Microsoft Azure • Hadoop connector
  • 83. Clients • Perl • Python • Ruby • PHP • JavaScript • Java • .NET • Scala • Clojure • Erlang • EventMachine • Bash • OCaml • Smalltalk • ColdFusion
  • 86. Thank you! Thanks to David Pilato for his presentation: https://speakerdeck.com/dadoonet/tours-jug-elasticsearch • Thanks to Kevin Kluge for his presentation: https://speakerdeck.com/elasticsearch/elasticsearch-in-20-minutes
  • 88. Suggest curl -s -XPOST 'localhost:9200/_search?search_type=count' -d '{   "suggest" : {     "my-title-suggestions-1" : {       "text" : "devloping",       "term" : {         "size" : 3,         "field" : "title"         }     }   } }'
  • 89. Suggest "suggest": {     "my-title-suggestions-1": [       {         "text": "devloping",         "offset": 0,         "length": 9,         "options": [           {             "text": "developing",             "freq": 77,             "score": 0.8888889           },           {             "text": "deloping",             "freq": 1,             "score": 0.875           },           {             "text": "deploying",             "freq": 2,             "score": 0.7777778           }         ]       }
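The scores in this response are consistent with 1 − (edit distance ÷ length of the shorter term): "developing" is one edit from "devloping" (1 − 1/9), "deloping" one edit (1 − 1/8), "deploying" two edits (1 − 2/9). Treat that as an observation about these numbers, not as the suggester's documented internals. The sketch below ranks a hypothetical vocabulary with that formula; the frequencies come from the slide except for the made-up "database" entry.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def similarity(a, b):
    return 1 - levenshtein(a, b) / min(len(a), len(b))


def suggest(text, vocabulary, size=3):
    """Rank indexed terms by similarity, then by frequency."""
    scored = [(similarity(text, term), freq, term)
              for term, freq in vocabulary.items() if term != text]
    scored.sort(reverse=True)
    return [{"text": term, "freq": freq, "score": round(score, 7)}
            for score, freq, term in scored[:size]]


vocabulary = {"developing": 77, "deloping": 1, "deploying": 2, "database": 5}
for option in suggest("devloping", vocabulary):
    print(option)
```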
  • 90. More Like This curl -XGET 'http://node1:9200/twitter/tweet/1/_mlt?mlt_fields=tag,content&min_doc_freq=1' {     "more_like_this" : {         "fields" : ["name.first", "name.last"],         "like_text" : "text like this one",         "min_term_freq" : 1,         "max_query_terms" : 12,         "percent_terms_to_match" : 0.95     } }
  • 92. {     "query" : {...},     "highlight" : {         "number_of_fragments" : 3,         "fragment_size" : 150,         "tag_schema" : "styled",         "fields" : {             "_all" : { "pre_tags" : ["<em>"], "post_tags" : ["</em>"] },             "bio.title" : { "number_of_fragments" : 0 },             "bio.author" : { "number_of_fragments" : 0 },             "bio.content" : { "number_of_fragments" : 5, "order" : "score" }         }     } } Highlight
  • 94. Hadoop • Java library for integrating Elasticsearch and Hadoop • Pig, Hive, Cascading, MapReduce • Search and Real Time Analytics with Elasticsearch, Hadoop as Data Lake • Scales with Hadoop