Elastic development. Implementing Big Data search Grzegorz Kołpuć
Grzegorz Kolpuc
@gkolpuc
https://pl.linkedin.com/pub/grzegorz-kolpuc/55/b7/700
grzegorzkolpuc@gmail.com
Event Platform
Financial & Risk
Legal
Tax & Accounting
IP & Science
News
Technology & OPS
60,000+
EMPLOYEES
10,000+
IN TECHNOLOGY
1200+
EMPLOYEES IN GDYNIA
150+
IN TECHNOLOGY
Eikon
Elastic Development
Implementing big data search
Distributed Search Engine
Full-text search
Analytics
open source
document-oriented
based on Lucene
distributed
search
analytics
full-text search
filtering (exact matches, ranges, geo)
cacheable
Plugins
API extensions
scripting
analysis
scripting
Marvel
Native client
fluent API
single request
bulk requests
bulk processor
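The difference between single and bulk requests can be sketched through the body that the `_bulk` endpoint accepts: newline-delimited JSON in which every document is preceded by an action line. The Python sketch below is illustrative only. It builds payloads without contacting a cluster, the index name `events` and the documents are made up, and the `chunked` helper is a simplified stand-in for what a bulk processor does when it flushes by action count. Older Elasticsearch versions also expect a `_type` in the action line or URL, which is omitted here.

```python
import json

def build_bulk_body(index, docs):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    Each (doc_id, source) pair becomes two lines: an action line
    followed by the document source.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format requires a newline after the last line as well.
    return "\n".join(lines) + "\n"

def chunked(items, size):
    """Yield fixed-size batches (hypothetical helper, flush-by-count only)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Made-up documents for illustration.
docs = [(str(i), {"title": f"event {i}"}) for i in range(5)]
batches = [build_bulk_body("events", batch) for batch in chunked(docs, 2)]
```

Five documents with a batch size of two yield three request bodies instead of five single index requests.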
1. Highlight offset
standard highlighter
"highlight": {
  "StrEvntBriefBlob": [
    "continues to see <b>pressure</b> on internal growth"
  ]
}
fvhEx highlighter
"snippets": [
  {
    "snippet": "continues to see <b>pressure</b> on internal growth",
    "startOff": 2082,
    "endOff": 2126,
    "pages": [0],
    "snippetID": 1
  }
]
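The relationship between a highlighted fragment and its offsets can be sketched in a few lines of Python. This is not how the fvhEx highlighter is implemented (the slides do not show that); it is a client-side approximation that strips the `<b>` tags and searches for the plain fragment in the source text. The function name and the sample text are assumptions. The output uses the same field names as the slide, with `endOff` treated as exclusive, which matches the slide's numbers (2126 - 2082 = 44, the length of the untagged fragment).

```python
import re

TAG = re.compile(r"</?b>")

def snippet_with_offsets(source_text, highlighted, snippet_id=1):
    """Locate a highlighted fragment in the source text and report offsets.

    Returns None when the untagged fragment does not occur in the text.
    """
    plain = TAG.sub("", highlighted)
    start = source_text.find(plain)
    if start < 0:
        return None
    return {
        "snippet": highlighted,
        "startOff": start,
        "endOff": start + len(plain),  # exclusive end offset
        "snippetID": snippet_id,
    }

# Made-up source text: ten filler characters, the fragment, then a tail.
text = "x" * 10 + "continues to see pressure on internal growth" + " in Q4"
result = snippet_with_offsets(
    text, "continues to see <b>pressure</b> on internal growth"
)
```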
2. Cross cluster search
cluster_one
cluster_two
cluster_three
news
filings
research
events
Tribe Node
tribe:
  t1:
    cluster.name: cluster_one
  t2:
    cluster.name: cluster_two
  t3:
    cluster.name: cluster_three
3. Analytics
grouping search result
statistics
metrics
"aggregations": {
"events_grouped": {
"terms": {"field": "group"},
"aggregations": {
"top_events": {
"top_hits": {
"size": 10,
"sort": [
{"date": {"order": "desc"}}
]
}
}
}
}
}
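What the `terms` plus `top_hits` combination above computes can be sketched in plain Python: group documents by the `group` field, then keep the newest `size` documents per group. This is an in-memory illustration with made-up documents, not a client call. It ignores details such as the order in which Elasticsearch returns the buckets.

```python
from collections import defaultdict

def top_hits_per_group(docs, group_field="group", sort_field="date", size=10):
    """Group documents by one field and keep the newest `size` per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc)
    return {
        key: sorted(members, key=lambda d: d[sort_field], reverse=True)[:size]
        for key, members in groups.items()
    }

# Made-up events; ISO dates sort correctly as strings.
events = [
    {"group": "earnings", "date": "2015-11-30", "title": "A"},
    {"group": "earnings", "date": "2015-12-02", "title": "B"},
    {"group": "guidance", "date": "2015-12-01", "title": "C"},
    {"group": "earnings", "date": "2015-12-01", "title": "D"},
]
grouped = top_hits_per_group(events, size=2)
```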
"aggs": {
"level2": {
"date_histogram": {
"field": "timestamp",
"interval": "day",
"order": {"_key": "desc"}
},
"aggs": {
"level3": {
"terms": {"field": "host"},
"aggs": {
"level4": {
"terms": {"field": "apiRequest.appId"}
}
}
}
}
}
}
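Nested bucket aggregations like the one above come back as nested `buckets` arrays, which are often easier to consume once flattened into rows. The sketch below walks such a response recursively. The sample response is hand-written for illustration (it follows the usual bucket layout with `key`, `key_as_string` and `doc_count`), and the function name is an assumption.

```python
def flatten_buckets(aggs, levels, path=()):
    """Walk nested bucket aggregations and yield (key path, doc_count) rows."""
    name = levels[0]
    for bucket in aggs[name]["buckets"]:
        # Date histogram buckets carry a readable key_as_string.
        key = bucket.get("key_as_string", bucket["key"])
        here = path + (key,)
        if len(levels) == 1:
            yield here, bucket["doc_count"]
        else:
            yield from flatten_buckets(bucket, levels[1:], here)

# Hand-written sample response, not real cluster output.
response = {
    "aggregations": {
        "level2": {"buckets": [{
            "key_as_string": "2015-12-02T00:00:00.000Z",
            "key": 1449014400000,
            "doc_count": 3,
            "level3": {"buckets": [{
                "key": "host-a",
                "doc_count": 3,
                "level4": {"buckets": [
                    {"key": "app-1", "doc_count": 2},
                    {"key": "app-2", "doc_count": 1},
                ]},
            }]},
        }]}
    }
}
rows = list(flatten_buckets(response["aggregations"],
                            ["level2", "level3", "level4"]))
```

Each row is a (day, host, appId) path with its document count, which maps directly onto a table or CSV export.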
4. Geo search
distance filter
polygon filter
bounding box filter
shape query
"location": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "1m"
}
"position": {
"type": "geo_point"
}
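The distance and bounding box filters listed above can be illustrated with a small in-memory sketch. It uses the haversine formula on a spherical Earth, which is an assumption for illustration and not a statement about how Elasticsearch computes distances. The city coordinates are approximate, the function names are made up, and the bounding box check ignores boxes that cross the antimeridian.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(points, center, radius_km):
    """Names of points no farther than radius_km from center."""
    return [name for name, (lat, lon) in points.items()
            if haversine_km(center[0], center[1], lat, lon) <= radius_km]

def within_bounding_box(points, top_left, bottom_right):
    """Names of points inside the box (no antimeridian handling)."""
    return [name for name, (lat, lon) in points.items()
            if bottom_right[0] <= lat <= top_left[0]
            and top_left[1] <= lon <= bottom_right[1]]

# Approximate coordinates, for illustration only.
gdynia = (54.5189, 18.5305)
points = {"Gdansk": (54.3520, 18.6466), "Warsaw": (52.2297, 21.0122)}

near = within_distance(points, gdynia, 50)
boxed = within_bounding_box(points, top_left=(55.0, 18.0),
                            bottom_right=(54.0, 19.0))
```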
5. Scripting
use only if nothing else works
decreases performance
‘total count of snippets’
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {"query": "Average \"Running five\""}
      },
      "script_score": {>>>>},
      "boost_mode": "replace"
    }
  },
  "sort": [{"_score": {}}]
}
"script_score": {
"script_id": "scount6",
"lang": "groovy",
"params": {
"terms": ["average"],
"phases": [
{
"distance": 1,
"phase": ["running","five"]
}
],
"fields": [{"field": "StrEvntTranscriptBlob", "grp": 111}]
}
},
"boost_mode": "replace"
6. Alerting
users subscribed to data
notify users when data changes
Elasticsearch
Percolate
Elasticsearch
index documents
query
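Percolation inverts the usual search flow: queries are stored up front, and each incoming document is matched against them to find the subscriptions that should fire. The toy Python sketch below illustrates that idea with an in-memory registry. The class, its methods and the "all terms must occur" matching rule are assumptions made for illustration, and real percolator queries are full Elasticsearch queries rather than term sets.

```python
class Percolator:
    """Toy reverse search: store queries, then match documents against them."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        """Store a subscription that matches when all terms occur."""
        self._queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the ids of stored queries that match this document."""
        tokens = set(text.lower().split())
        return sorted(query_id for query_id, terms in self._queries.items()
                      if terms <= tokens)

# Made-up subscriptions and document.
percolator = Percolator()
percolator.register("user-1", ["pressure", "growth"])
percolator.register("user-2", ["dividend"])
matches = percolator.percolate("Continues to see pressure on internal growth")
```

Every id in `matches` identifies a subscription whose user should be notified about the new document.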
Any Cons?
Low throughput
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException
org.elasticsearch.cluster.block.ClusterBlockException
{
  "error": "NoShardAvailableActionException[[std_events_sem_021215][5] null]",
  "status": 503
}
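A common client-side response to rejections such as `EsRejectedExecutionException`, which signals that a thread pool queue on the cluster is full, is to retry with exponential backoff instead of resending immediately. The slides do not say how the team handled this, so the sketch below is a generic pattern under stated assumptions: `RejectedExecution` is a hypothetical stand-in exception, and `call_with_backoff` is a made-up helper, not part of any Elasticsearch client.

```python
import time

class RejectedExecution(Exception):
    """Hypothetical stand-in for a rejection reported by the cluster."""

def call_with_backoff(action, retries=5, base_delay=0.1, sleep=time.sleep):
    """Call action(), retrying rejected calls with exponential backoff."""
    for attempt in range(retries + 1):
        try:
            return action()
        except RejectedExecution:
            if attempt == retries:
                raise  # give up after the last allowed retry
            sleep(base_delay * 2 ** attempt)

# Simulated action that is rejected twice before succeeding.
calls = {"n": 0}
delays = []

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RejectedExecution("queue full")
    return "ok"

# Record the delays instead of sleeping, to keep the example fast.
result = call_with_backoff(flaky, sleep=delays.append)
```

Backoff only smooths out short bursts. Sustained rejections usually point to a capacity or batching problem that retries cannot fix.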
Support
Reindex
Apply mapping changes online
Updatable number of shards in the index
Shard splitting
15+
Clusters
3+
percolate clusters
~75
nodes per cluster
250+
indices
16+
TB of indexed data
7,000,000,000+
documents
100,000+
queries per day (EVENTS)
500,000+
queries per day (expected for research)
Q?