SIZMEK MDX NXT
USING COUCHBASE AND
ELASTICSEARCH AS DATA LAYER
TAL MAAYANI YUVAL PERRY
Using Couchbase and Elasticsearch as data layers
HIGH LEVEL MISSION
Building a scalable, highly available, large and fast
ad management platform
MAIN TECHNOLOGIES USED
AWS Deployment
Microservices architecture
RxJava
Couchbase
Elasticsearch
RabbitMQ
Consul
WHY NOSQL?
NoSQL Document Database  | Relational Database
Unstructured data        | Structured data
Memory-first approach    | Disk-first approach
No transactions          | Transactional
Scales horizontally      | Scales vertically
A NoSQL DB allows
• Fast reads and writes
• A variety of data models
• Large data volumes
• Cloud-friendly deployment
• No single point of failure
We still need to handle a transactionless, eventually consistent data source
WHY COUCHBASE?
• JSON support
• Indexing and querying
• Cross data center replication
• Incremental MapReduce
OUR DATA LAYER
Generic Data Access Layer
Query
Get(Id)
Save / Update
XDCR
N1QL
DEMO – SIZMEK COUCHBASE ADMINISTRATOR TOOL
• In-house tool for running ES queries as well as N1QL queries
• Usage
• Data investigation
• Data migration
HOW WE MAINTAIN ATOMICITY ON A TRANSACTIONLESS DATA SOURCE
Transaction manager service – maintains flow state across changes to multiple entities
Provides atomicity and tracking
Example: "save smart version ad" flow
[Diagram] Participants: Dynamic Campaign Optimization, Transaction Manager, Asset Mgmt, Ad Service
Operations: Upload ad assets, Create Ad, Create Smart version
Flow states tracked:
1. Assets created
2. Ad created
3. Smart version created
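The slide does not show how the Transaction Manager is built. As a rough illustration only, the three tracked states above could be recorded by a small tracker like the one below, so that an abandoned flow knows which steps to undo. The class `FlowTracker`, its methods, and the idea of undoing steps in reverse order are assumptions made for this sketch, not the authors' implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical tracker: records which steps of a multi-entity flow have completed. */
final class FlowTracker {
    /** The three states listed on the slide, in the order they are reached. */
    enum Step { ASSETS_CREATED, AD_CREATED, SMART_VERSION_CREATED }

    private final Deque<Step> completed = new ArrayDeque<>();

    /** Records a finished step; steps must arrive in declaration order. */
    void markCompleted(Step step) {
        if (step.ordinal() != completed.size()) {
            throw new IllegalStateException("Step " + step + " is out of order");
        }
        completed.push(step); // most recent step sits at the head
    }

    /** True once every step of the flow has been recorded. */
    boolean isFlowComplete() {
        return completed.size() == Step.values().length;
    }

    /** Steps to compensate, most recent first, if the flow has to be abandoned. */
    List<Step> stepsToUndo() {
        return new ArrayList<>(completed);
    }
}
```

A service coordinating the flow would call `markCompleted` after each entity change and consult `stepsToUndo` when a later step fails.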
ELASTICSEARCH – CONSISTENCY PROBLEM
AND HOW TO OVERCOME IT IN AUTOMATED TESTING
The problem
In a clustered Elasticsearch environment, a document update is not immediately reflected on all nodes.
This caused inconsistent results in automated testing.
Example
Change a campaign name from A to B.
An automated test verifies that the change actually took place by getting the entity and checking its name.
Possible Solutions
• Wait a few seconds before checking for the updated status
• Use the Elasticsearch refresh API to force an update of the in-memory index
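A fixed sleep makes tests either slow or flaky. One variation on the "wait" option is to poll until the change becomes visible or a timeout expires. The helper below is a minimal sketch in plain Java; `EventuallyAssert` and `eventually` are names invented here, and the condition passed in would be whatever lookup the test already performs.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition until it holds or a timeout elapses. */
final class EventuallyAssert {
    private EventuallyAssert() {}

    /** Returns true as soon as the condition holds, false if the timeout passes first. */
    static boolean eventually(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In the campaign example, the test would pass a condition that fetches the campaign and checks that its name equals B. The refresh option avoids polling altogether: Elasticsearch exposes it over REST as `POST /<index>/_refresh`.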
NAME UNIQUENESS IMPLEMENTATION
HOW TO IMPLEMENT UNIQUE CONSTRAINTS USING COUCHBASE
Problem
Maintain a unique entity name
Real use case
Keep the advertisement name unique system-wide
Possible Solution
Save a uniqueness document
Key: entity name
Value: entity ID
Flowchart: Input: entity to save -> Save uniqueness document -> Save succeeded? -> Save entity, or Return error; Delete uniqueness doc
Orphan uniqueness documents still need to be taken care of
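The flow can be sketched with an in-memory map standing in for the bucket. Only the insert-if-absent behaviour matters: in the Couchbase Java SDK 2.x, `bucket.insert` fails with `DocumentAlreadyExistsException` when the key already exists, which is what `putIfAbsent` imitates here. The class `UniqueNameStore`, its methods, and the choice to delete the uniqueness document when the entity save fails are assumptions of this sketch; the slide does not spell out where the delete step sits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a bucket holding entities and uniqueness documents. */
class UniqueNameStore {
    private final Map<String, String> uniquenessDocs = new ConcurrentHashMap<>(); // name -> entity id
    private final Map<String, String> entities = new ConcurrentHashMap<>();       // entity id -> name

    /** Returns true if the entity was saved, false if the name is already taken. */
    boolean saveEntity(String entityId, String name) {
        // Step 1: insert the uniqueness document; this fails if the key (the name) exists.
        if (uniquenessDocs.putIfAbsent(name, entityId) != null) {
            return false; // the "Return error" branch
        }
        try {
            // Step 2: save the entity itself.
            writeEntity(entityId, name);
            return true;
        } catch (RuntimeException e) {
            // Step 3: compensate so that no orphan uniqueness document is left behind.
            uniquenessDocs.remove(name, entityId);
            throw e;
        }
    }

    /** Entity write; a real write could fail after step 1 has already succeeded. */
    void writeEntity(String entityId, String name) {
        entities.put(entityId, name);
    }

    boolean isNameTaken(String name) {
        return uniquenessDocs.containsKey(name);
    }
}
```

The compensation in step 3 only runs if the process survives long enough to run it. A crash between steps 1 and 2 still leaves an orphan, which is why a separate cleanup is needed.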
N1QL EXAMPLE
• Use the Query Workbench Developer Preview
• Example queries
1. select mvbucket.`key` from mvbucket where payload._type = 'AdSmartVersion' and payload.createdOn is not missing
2. select * from mvbucket where payload._type = 'AdSmartVersion' and payload.masterAdId = 1073741825 and payload.createdOn between 1349057369158 and 1449057369158
3. select payload.masterAdId, count(1) from mvbucket where payload._type = 'AdSmartVersion' and payload.createdOn between 1349057369158 and 1449057369158 group by payload.masterAdId
4. select payload.masterAdId, count(1) as count from mvbucket where payload._type = 'AdSmartVersion' group by payload.masterAdId
COUCHBASE JAVA CLIENT 2
NOTES ON JAVA CLIENT
• Built-in support for JSON documents
• Supports counters
• Asynchronous client based on RxJava
• Lets us reuse our existing reactive business logic
• Efficient parallel processing
• Built-in error handling – for example, retrying a document get with exponential backoff
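// Imports as used with Couchbase Java SDK 2.x and RxJava 1.x
import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.util.retry.RetryBuilder;
import java.util.concurrent.TimeUnit;
import rx.Observable;

// Fetch every document in docIds; on BackpressureException retry with
// exponential backoff capped at 100 ms, giving up after 10 attempts.
Observable
    .from(docIds)
    .flatMap(id -> bucket.async()
        .get(id)
        .retryWhen(RetryBuilder
            .anyOf(BackpressureException.class)
            .delay(Delay.exponential(TimeUnit.MILLISECONDS, 100))
            .max(10)
            .build()))
    .subscribe();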
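To make the retry policy concrete without the SDK, here is a minimal plain-Java sketch of capped exponential backoff. It is not the SDK's implementation. `BackoffRetry`, `delayMillis` and `callWithRetry` are names invented here, and the assumption that delays start at 1 ms and double on each retry is ours.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper illustrating capped exponential backoff. */
final class BackoffRetry {
    private BackoffRetry() {}

    /** Delay before retry number `attempt` (1-based): 1, 2, 4, ... ms, capped at ceilingMillis. */
    static long delayMillis(int attempt, long ceilingMillis) {
        long uncapped = 1L << Math.min(attempt - 1, 30); // clamp the shift to avoid overflow
        return Math.min(uncapped, ceilingMillis);
    }

    /** Runs the task; on failure retries up to maxRetries times, then rethrows the last error. */
    static <T> T callWithRetry(Callable<T> task, int maxRetries, long ceilingMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                if (attempt > maxRetries) {
                    throw e; // give up and propagate the error
                }
                Thread.sleep(delayMillis(attempt, ceilingMillis));
            }
        }
    }
}
```

With a 100 ms ceiling the delays under these assumptions are 1, 2, 4, 8, 16, 32 and 64 ms, then 100 ms for every later retry.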
OUR USE OF ELASTICSEARCH
QUERY ENGINE
• Free text search – user Boolean queries
• Data filtering – data grid filtering
• Grouping – data grid grouping
• Authorization – filter documents according to user permissions
• Batch processing – internal services that use scan and scroll to operate on large data sets
ELASTICSEARCH – SOME BEST PRACTICES
• Carefully maintain the index schema
  • Avoid dynamic mapping
    • Data type collisions
    • Large data sets – do not save data that is not used
  • Build a static schema from the data model
  • Updating a searchable field in the data model triggers a build of a new index
  • Some schema changes require re-indexing, e.g. adding a mandatory field or changing an enumeration value
• Inconsistency – updated data does not immediately appear in query results
  • The overall system design must be aware of this limitation
• Throttling – the number of writes must be controlled
COUCHBASE 4.1
OUR USAGE
• Use optimistic locking – update operations are done through an updater lambda function
• N1QL
  • Does not meet our performance needs for ORDER BY queries on large data sets
  • Took more than 5 sec to query 250 entities
  • Used for business logic where no sorting is required
  • Used when consistency is important
• XDCR
  • Customized the plugin to index only the required entities
  • Added support for parent-child relationships in Elasticsearch
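The optimistic locking item can be illustrated with a compare-and-set loop. Couchbase documents carry a CAS value, and in the Java SDK 2.x a replace with a stale CAS fails with `CASMismatchException`. The sketch below imitates that with an `AtomicReference` instead of a bucket. `OptimisticUpdater`, `Versioned` and `update` are hypothetical names, and the retry limit is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical helper: applies an updater function under optimistic locking. */
final class OptimisticUpdater {
    private OptimisticUpdater() {}

    /** Immutable value plus a counter that plays the role of the CAS token. */
    static final class Versioned<T> {
        final T value;
        final long cas;

        Versioned(T value, long cas) {
            this.value = value;
            this.cas = cas;
        }
    }

    /** Applies the updater and retries if another writer changed the document in between. */
    static <T> Versioned<T> update(AtomicReference<Versioned<T>> doc,
                                   UnaryOperator<T> updater,
                                   int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned<T> current = doc.get(); // read the value together with its CAS
            Versioned<T> next = new Versioned<>(updater.apply(current.value), current.cas + 1);
            if (doc.compareAndSet(current, next)) { // write only if nobody else wrote first
                return next;
            }
        }
        throw new IllegalStateException("Too many concurrent modifications");
    }
}
```

Passing the change as a function rather than a new value is what makes the retry safe: on a conflict the updater is re-applied to the freshly read document, not to a stale copy.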
QUESTIONS
AND ANSWERS
Thank you

Editor's Notes

  • #3: For more details http://www.slideshare.net/erwinsizmek/sizmek-company-profile-2014?next_slideshow=2
  • #5: NoSQL is the most critical notion in the architecture
  • #7: RDBMS optimization requires a DBA team. NoSQL is like a spaceship, whereas an RDBMS can be like a Rolls-Royce
  • #8: Add N1QL
  • #9: Internal tool for Couchbase DB queries
  • #10: N1QL – fetch documents by Ids
  • #13: Do not ensure isolation (ACID) Add title
  • #14: A refresh effectively calls a reopen on the lucene index reader, so that the point in time snapshot of the data that you can search on gets updated. This lucene feature is part of the lucene near real-time api. An elasticsearch refresh makes your documents available for search, but it doesn't make sure that they are written to disk to a persistent storage, as it doesn't call fsync, thus doesn't guarantee durability. What makes your data durable is a lucene commit (flush), which is way more expensive. An elasticsearch flush effectively triggers a lucene commit, and empties also the transaction log, since once data is committed on the lucene level, durability can be guaranteed by lucene itself. Flush is exposed as an api too and can be tweaked, although usually that is not necessary. Flush happens automatically depending on how many operations get added to the transaction log, how big they are, and when the last flush happened.
  • #18: The following code retries with an exponential backoff (with a 100 millisecond ceiling), but stops after 10 attempts and propagates the error. For more info see http://developer.couchbase.com/documentation/server/4.0/sdks/java-2.2/documents-bulk.html
  • #20: Elasticsearch is usually used for free text search and autocomplete
  • #21: Show rebuild of static index with high availability
  • #23: Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Resilience is achieved by replication, containment, isolation and delegation. Elastic systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. Being message driven ensures loose coupling, isolation and location transparency, and provides the means to delegate errors as messages