Elasticsearch Applications
PEGGY
Field datatypes
- a simple type such as string, date, long, double, boolean, or ip;
- a type that supports the hierarchical nature of JSON, such as object or nested;
- or a specialised type such as geo_point, geo_shape, or completion.
Search
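GET - retrieve a document
(_index, _type, and _id stand for the index name, the type name, and the document ID.)
- http://localhost:9200/_index/_type/_id
- http://localhost:9200/_index/_type/_id?pretty (pretty-prints the JSON response)
GET search
- http://localhost:9200/_index/_type/_search
- http://localhost:9200/_index/_type/_search?q=xxx&pretty (q takes a query_string query)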
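A minimal Python sketch of building these URLs, assuming a node at localhost:9200 as above. `search_url` is a hypothetical helper, not part of any client library; the point is that `q` must be percent-encoded once it contains colons, spaces, or non-ASCII terms such as 手機.

```python
from urllib.parse import quote, urlencode

# Assumption: a local node on the default HTTP port, as in the URLs above.
BASE_URL = "http://localhost:9200"


def search_url(index, doc_type=None, q=None, pretty=False):
    """Build a GET _search URL (hypothetical helper); the type segment is optional."""
    parts = [BASE_URL, quote(index, safe="")]
    if doc_type:
        parts.append(quote(doc_type, safe=""))
    parts.append("_search")
    url = "/".join(parts)

    params = {}
    if q is not None:
        params["q"] = q  # urlencode percent-encodes ':' and UTF-8 bytes
    if pretty:
        params["pretty"] = "true"
    if params:
        url += "?" + urlencode(params)
    return url
```

For example, `search_url("test2", "type", q="name:Tony", pretty=True)` yields `http://localhost:9200/test2/type/_search?q=name%3ATony&pretty=true`.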
POST search & query_string
- http://localhost:9200/_index/_type/_search
Sending the query in the request body is equivalent to the q= URL parameter (here, ?q=*):
{
  "query": {
    "query_string": {
      "query": "*"
    }
  }
}
query_string
{
  "query_string": {
    "fields": ["content", "name"],
    "query": "this AND that"
  }
}
matches the same documents as
{
  "query_string": {
    "query": "(content:this OR name:this) AND (content:that OR name:that)"
  }
}
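To make the equivalence concrete, this hypothetical helper writes out the explicit per-field form for a list of terms. It only builds the query string: Elasticsearch performs the expansion itself when `fields` is given, and relevance scores may differ between the two forms even though the matching documents are the same.

```python
def expand_across_fields(terms, fields, operator="AND"):
    """Rewrite terms searched over several fields into explicit field:term clauses.

    Each term becomes (f1:term OR f2:term ...), and the per-term groups
    are joined with `operator`.
    """
    groups = []
    for term in terms:
        alternatives = " OR ".join(f"{field}:{term}" for field in fields)
        groups.append(f"({alternatives})")
    return f" {operator} ".join(groups)
```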
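query_string - query syntax
- string (terms are combined with OR by default)
  - "手機" 套 = "手機" OR 套 (手機 = mobile phone, 套 = case)
  - apple phone = apple OR phone
  - title:"手機" OR title:套 = title:("手機" 套) != (title:"手機" 套)
    (in the last form only "手機" is restricted to title; 套 is searched in the default field)
- boolean
  - isPCT:true
- date & range ([ ] includes a bound, { } excludes it, * leaves it open)
  - dateName:[2012-01-01 TO 2012-12-31]
  - dateName:[2012-01-01 TO *]
  - dateName:{2011-12-31 TO *]
  - range:[1 TO 5]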
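The bracket rules can be captured in a small hypothetical helper that emits a query_string range clause; `None` stands for an open bound (`*`).

```python
def range_clause(field, lower=None, upper=None, include_lower=True, include_upper=True):
    """Build a query_string range clause such as dateName:[2012-01-01 TO *]."""
    open_bracket = "[" if include_lower else "{"   # [ inclusive, { exclusive
    close_bracket = "]" if include_upper else "}"  # ] inclusive, } exclusive
    lo = "*" if lower is None else str(lower)
    hi = "*" if upper is None else str(upper)
    return f"{field}:{open_bracket}{lo} TO {hi}{close_bracket}"
```

For example, `range_clause("dateName", "2011-12-31", include_lower=False)` gives `dateName:{2011-12-31 TO *]`.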
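query_string - query syntax (continued)
- object fields: use dot notation
  - inventorsRaw.name:Nicky
- _missing_ & _exists_
  - _missing_:title (documents with no value in title)
  - _exists_:title (documents with a non-null value in title)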
query_string - nested
Fields of a nested object are queried inside a nested query whose path names the nested field:
{
  "query": {
    "nested": {
      "path": "relatedDocumentsRaw",
      "query": {
        "query_string": {
          "query": "relatedDocumentsRaw.type:*"
        }
      }
    }
  }
}
query - size & from
- size (default: 10)
  The maximum number of hits to return.
- from (default: 0)
  The offset of the first hit to return.
- from + size may not exceed 10000 (the index setting index.max_result_window); a deeper page fails with:
  [query_phase_execution_exception] Result window is too large, from + size must be less than or equal to: [10000]. See the scroll api for a more efficient way to request large data sets.
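A sketch of page-based paging under the result-window limit. `page_params` is a hypothetical helper, and `max_window` assumes the index still uses the default index.max_result_window of 10000.

```python
MAX_RESULT_WINDOW = 10000  # default index.max_result_window


def page_params(page, size=10, max_window=MAX_RESULT_WINDOW):
    """Return the from/size pair for a 1-based page number."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    offset = (page - 1) * size
    if offset + size > max_window:
        # Elasticsearch would reject this request; switch to scan & scroll instead.
        raise ValueError(
            f"from + size = {offset + size} exceeds {max_window}; use the scroll API"
        )
    return {"from": offset, "size": size}
```

With `size=10`, page 1000 (from 9990) is the last page that fits; page 1001 is rejected.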
query - sort & _source
- sort: adds one or more sort clauses on specific fields.
- _source: controls which parts of the _source field are returned with every hit.
The request below returns hits 11 to 15, newest pubDate first, with only pubDate in each _source:
{
  "query": "…",
  "size": 5,
  "from": 10,
  "sort": [{ "pubDate": "desc" }],
  "_source": ["pubDate"]
}
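The same body can be assembled in code; serialising with `json.dumps` also rules out trailing commas, which strict JSON parsers reject. Wrapping the query in `query_string` is an assumption, since the slide elides the query as "…", and `search_body` is a hypothetical helper.

```python
import json


def search_body(query, size=10, from_=0, sort=None, source=None):
    """Assemble a search request body with paging, sorting, and _source filtering."""
    body = {
        "query": {"query_string": {"query": query}},  # assumed query type
        "size": size,
        "from": from_,
    }
    if sort:
        # sort is a list of (field, order) pairs, e.g. [("pubDate", "desc")]
        body["sort"] = [{field: order} for field, order in sort]
    if source is not None:
        body["_source"] = source  # list of fields to keep in each hit
    return json.dumps(body, ensure_ascii=False)  # keep CJK terms readable
```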
query - filter
{
  "query": {
    "query_string": { "query": "*" }
  },
  "filter": {
    "script": {
      "script": {
        "lang": "groovy",
        "file": "fileName",
        "params": {
          "params1": "date1",
          "params2": "date2"
        }
      }
    }
  }
}
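query - aggregations (aggs)
- The aggregations framework provides aggregated data based on a search query.
- size: number of buckets to return
  - default: 10
  - size: 0 returns all buckets
- min_doc_count: minimum number of documents a bucket needs in order to be returned
- order: sort order of the buckets
- date_histogram: buckets documents by date
- terms: buckets documents by term value, ordered by doc_count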
{
  "query": "…",
  "aggs": {
    "date_agg": {
      "date_histogram": {
        "field": "pubDate",
        "interval": "day",
        "format": "yyyy-MM-dd",
        "order": { "_count": "desc" },
        "min_doc_count": 1
      }
    },
    "kindCode_agg": {
      "terms": {
        "field": "kindCode",
        "size": 20,
        "shard_size": 20
      }
    }
  }
}
query - aggregations (aggs): response
{
  "aggregations": {
    "kindCode_agg": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "U", "doc_count": 75879 },
        { "key": "A", "doc_count": 73732 },
        { "key": "B", "doc_count": 44115 },
        { "key": "S", "doc_count": 38981 }
      ]
    },
    "appDocs": {
      "buckets": [
        { "key_as_string": "2016-01-06", "key": 1452038400000, "doc_count": 56079 },
        { "key_as_string": "2016-01-13", "key": 1452643200000, "doc_count": 54256 },
        { "key_as_string": "2016-01-20", "key": 1453248000000, "doc_count": 80021 },
        { "key_as_string": "2016-01-27", "key": 1453852800000, "doc_count": 42351 }
      ]
    }
  }
}
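Reading the buckets back means walking `aggregations.<name>.buckets`. This hypothetical helper maps each bucket key to its doc_count, preferring `key_as_string` for date buckets so the keys stay human-readable.

```python
def bucket_counts(response, agg_name):
    """Map each bucket key (key_as_string when present) to its doc_count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {
        bucket.get("key_as_string", bucket["key"]): bucket["doc_count"]
        for bucket in buckets
    }
```

Applied to the response above, `bucket_counts(resp, "kindCode_agg")["U"]` would be 75879.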
_timestamp field
Mapping:
{
  "mappings": {
    "my_type": {
      "_timestamp": {
        "enabled": true
      }
    }
  }
}
Query result:
{
  "_index": "test2",
  "_type": "type",
  "_id": "2",
  "_score": 1,
  "_timestamp": 1454051014319,
  "_source": {
    "name": "Tony",
    "day": "1990-03-21"
  }
}
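The `_timestamp` value is in epoch milliseconds (UTC), as are the numeric `key` values of date_histogram buckets. A small conversion sketch; `epoch_millis_to_utc` is a hypothetical helper, and it uses integer `timedelta` arithmetic to avoid float rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis_to_utc(millis):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)
```

The `_timestamp` above, 1454051014319, corresponds to 2016-01-29 07:03:34.319 UTC.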
Scan & Scroll
scan&scroll
- POST
- http://localhost:9200/{{_index}}/({{type}}/)_search?search_type=scan&scroll=1m
  (the {{type}}/ segment is optional)
{
  "query": {
    "query_string": {
      "query": "*"
    }
  }
}
Keeping the search context alive
- The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive.
- Its value (e.g. 1m; see the "Time units" section of the reference) does not need to be long enough to process all the data; it only needs to be long enough to process the previous batch of results.
- Each scroll request (with the scroll parameter) sets a new expiry time.
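Response to the POST (with search_type=scan, the first response carries only the _scroll_id and the total; hits is empty):
{
  "_scroll_id": "c2Nhbjs1OzE5NjMzOkxXdWt2d2V2UVFHTVvdGFsX2hpdHM6MT……..",
  "took": 487,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1041712,
    "max_score": 0,
    "hits": []
  }
}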
scroll
- GET
- http://localhost:9200/_search/scroll/{{_scroll_id}}?scroll=1m
Response (each call returns the next batch of hits and a _scroll_id for the following call):
{
  "_scroll_id": "c2Nhbjs1OzE5NjMzOkxXdWt2d2V2UVFHTVvdGFsX2hpdHM6MT……..",
  "took": 487,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1041712,
    "max_score": 0,
    "hits": [ {…}, {…}, {…}, {…}, {…}, {…}, {…}, {…} ]
  }
}
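The scan and scroll steps above form a loop: POST once, then keep calling GET /_search/scroll with the latest _scroll_id until a batch comes back empty. In this sketch the HTTP call is left to a caller-supplied function `fetch_next` (hypothetical), so only the control flow is shown; `scroll_all` is likewise a hypothetical name.

```python
def scroll_all(first_response, fetch_next):
    """Collect every hit from a scan/scroll session.

    fetch_next(scroll_id) is assumed to perform
    GET /_search/scroll/<scroll_id>?scroll=1m and return the parsed JSON.
    """
    hits = list(first_response["hits"]["hits"])  # empty for search_type=scan
    scroll_id = first_response["_scroll_id"]
    while True:
        page = fetch_next(scroll_id)
        batch = page["hits"]["hits"]
        if not batch:  # an empty batch means the scroll is exhausted
            break
        hits.extend(batch)
        scroll_id = page["_scroll_id"]  # always continue with the latest ID
    return hits
```

Passing the most recent _scroll_id each time is the safe choice whether or not the ID changes between calls (see the Stack Overflow reference below).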
Ref.
- Bulk
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html
- Scan & Scroll
  - https://www.elastic.co/guide/en/elasticsearch/guide/current/scan-scroll.html
  - http://stackoverflow.com/questions/25453872/why-does-this-elasticsearch-scan-and-scroll-keep-returning-the-same-scroll-id
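Bulk
Cheaper in Bulk
A bulk request body is newline-delimited JSON: every action line ends with a newline (\n), and actions that carry a document are followed by a request-body line that also ends with \n, including the last line.
{ action: { metadata }}\n
{ request body }\n
{ action: { metadata }}\n
{ request body }\n
…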
action
- delete (no request body)
  { "delete": { "_index": "website", "_type": "blog", "_id": "123" }}\n
- create (fails if the document already exists)
  { "create": { "_index": "website", "_type": "blog", "_id": "123" }}\n
  { "title": "My first blog post" }\n
- index (creates or replaces the document; an ID is generated when none is given)
  { "index": { "_index": "website", "_type": "blog" }}\n
  { "title": "My second blog post" }\n
- update (partial update; _retry_on_conflict retries on version conflicts)
  { "update": { "_index": "website", "_type": "blog", "_id": "123", "_retry_on_conflict" : 3} }\n
  { "doc" : {"title" : "My updated blog post"} }\n
status
- 200: OK
- 201: Created
{
  "took": 4,
  "errors": false,
  "items": [
    { "delete": { "_index": "website", "_type": "blog", "_id": "123", "_version": 2, "status": 200, "found": true }},
    { "create": { "_index": "website", "_type": "blog", "_id": "123", "_version": 3, "status": 201 }},
    { "create": { "_index": "website", "_type": "blog", "_id": "EiwfApScQiiy7TIKFxRCTw", "_version": 1, "status": 201 }},
    { "update": { "_index": "website", "_type": "blog", "_id": "123", "_version": 4, "status": 200 }}
  ]
}
Error Example
{ "create": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "Cannot create - it already exists" }
{ "index": { "_index": "website", "_type": "blog", "_id": "123" }}
{ "title": "But we can update it" }
Error Example (response)
{
  "took": 3,
  "errors": true,
  "items": [
    { "create": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "status": 409,
        "error": "DocumentAlreadyExistsException [[website][4] [blog][123]: document already exists]"
    }},
    { "index": {
        "_index": "website",
        "_type": "blog",
        "_id": "123",
        "_version": 5,
        "status": 200
    }}
  ]
}
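Because "errors": true only says that at least one item failed, a client has to inspect each item's status; the other items in the same request still succeed. A hypothetical helper that collects the failures:

```python
def failed_items(bulk_response):
    """Return (action, _id, status, error) for every bulk item that failed."""
    if not bulk_response.get("errors"):
        return []  # fast path: no item failed
    failures = []
    for item in bulk_response["items"]:
        action, result = next(iter(item.items()))  # each item has exactly one key
        if result.get("status", 0) >= 400:  # 4xx/5xx, e.g. 409 conflict
            failures.append((action, result["_id"], result["status"], result.get("error")))
    return failures
```

For the response above, this yields a single entry: the create of document 123 with status 409.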

Editor's Notes
- #17: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html
- #18: https://www.elastic.co/guide/en/elasticsearch/guide/current/_search_options.html#search-type
- #23: http://stackoverflow.com/questions/25453872/why-does-this-elasticsearch-scan-and-scroll-keep-returning-the-same-scroll-id