Using Couchbase and Elasticsearch as data layers

SIZMEK MDX NXT
USING COUCHBASE AND
ELASTICSEARCH AS DATA LAYER
TAL MAAYANI, YUVAL PERRY

HIGH LEVEL MISSION
Building a scalable, highly available, large and fast
ad management platform

MAIN TECHNOLOGIES USED
AWS Deployment
Micro services architecture
RxJava
Couchbase
Elasticsearch
RabbitMQ
Consul

WHY NOSQL?
NoSQL Document Database  | Relational Database
Unstructured data        | Structured data
Memory-first approach    | Disk-first approach
No transactions          | Transactional
Scales horizontally      | Scales vertically
A NoSQL database allows:
• Fast reads and writes
• A variety of data models
• Large data volumes
• Cloud-friendly deployment
• No single point of failure
We still need to handle a transactionless, eventually consistent data source

WHY COUCHBASE?
• JSON support
• Indexing and querying
• Cross data center replication
• Incremental Map Reduce

OUR DATA LAYER
Generic Data Access Layer
Query
Get(Id)
Save / Update
XDCR
N1QL

DEMO – SIZMEK COUCHBASE ADMINISTRATOR TOOL
• In-house development tool for running Elasticsearch queries as well as N1QL queries
• Usage
• Data investigation
• Data migration

HOW WE MAINTAIN ATOMICITY ON A TRANSACTIONLESS DATA SOURCE
Transaction manager service – maintains flow state across changes to multiple entities
Provides atomicity and tracking
Example: Save smart version ad flow
Services: Dynamic Campaign Optimization, Transaction Manager, Asset Mgmt, Ad Service
Operations: Upload ad assets, Create Ad, Create Smart version
Flow state:
1. Assets created
2. Ad created
3. Smart version created
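
A flow like the one above could be tracked roughly as follows. This is a minimal sketch, not the actual transaction manager service: the class `FlowTracker`, the `Step` type, and the strategy of undoing completed steps in reverse order when a later step fails are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: tracks the state of a multi-entity flow and undoes
// completed steps if a later step fails (assumed compensation strategy).
class FlowTracker {
    // One step of a flow: an action plus the action that undoes it.
    static final class Step {
        final String name;
        final Runnable action;
        final Runnable undo;

        Step(String name, Runnable action, Runnable undo) {
            this.name = name;
            this.action = action;
            this.undo = undo;
        }
    }

    private final List<String> log = new ArrayList<>();

    // Runs the steps in order. Returns true if all succeeded; on failure,
    // undoes the completed steps in reverse order and returns false.
    boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action.run();
                completed.push(step);
                log.add(step.name + ": done");
            } catch (RuntimeException e) {
                log.add(step.name + ": failed");
                while (!completed.isEmpty()) {
                    Step done = completed.pop();
                    done.undo.run();
                    log.add(done.name + ": undone");
                }
                return false;
            }
        }
        return true;
    }

    // Tracking: the recorded state transitions of the flow.
    List<String> log() {
        return log;
    }
}
```

In the smart version example, the three steps would be "assets", "ad" and "smart version", each paired with a delete call on the owning service. A real service would also have to persist the log so that a crashed flow can be resumed or rolled back.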

ELASTICSEARCH – CONSISTENCY PROBLEM
AND HOW TO OVERCOME IT IN AUTOMATED TESTING
The problem
In a clustered Elasticsearch environment, a document update is not immediately reflected on all nodes.
This caused inconsistent results in automated testing.
Example
Change a campaign name from A to B.
The automated test verifies that the change actually took place by fetching the entity and checking its name.
Possible Solutions
• Wait a few seconds before checking for the updated status
• Use the Elasticsearch refresh operation to force an in-memory index update
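
The waiting option tends to be more reliable as a bounded poll than as a fixed sleep. Below is a minimal sketch in plain Java with no Elasticsearch dependency; `Eventually.await` and its parameters are hypothetical names, not part of any test framework.

```java
import java.util.function.BooleanSupplier;

// Hypothetical test helper: polls a condition until it holds or a timeout
// expires, instead of sleeping for a fixed number of seconds.
class Eventually {
    // Returns true as soon as the check passes, false on timeout or interrupt.
    static boolean await(BooleanSupplier check, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test for the rename example could then assert `Eventually.await(() -> "B".equals(fetchCampaignName(id)), 5000, 100)`, where `fetchCampaignName` is a placeholder for whatever query the test already uses.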

NAME UNIQUENESS IMPLEMENTATION
HOW TO IMPLEMENT UNIQUE CONSTRAINTS USING COUCHBASE
Problem
Maintain unique entity name
Real use case
Keep advertisement name unique system wide
Possible Solution
Save uniqueness document
Key: entity name
Value : entity id
Flowchart steps:
• Input: entity to save
• Save uniqueness document
• Save succeeded?
• Save entity
• Return error
• Delete uniqueness doc
We still need to handle orphaned uniqueness documents
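
The uniqueness flow can be sketched as follows. This is an illustration under stated assumptions: a `ConcurrentHashMap` stands in for the Couchbase bucket, with `putIfAbsent` playing the role of an insert that fails when the key already exists. The class and method names are hypothetical, and the ordering of the steps (delete the uniqueness document when saving the entity fails) is an assumed reading of the flowchart.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: in-memory maps stand in for Couchbase documents.
class UniqueNameStore {
    // Uniqueness documents. Key: entity name, value: entity id.
    private final Map<String, String> uniquenessDocs = new ConcurrentHashMap<>();
    // Entities. Key: entity id, value: entity name.
    private final Map<String, String> entities = new ConcurrentHashMap<>();
    // Test hook that simulates a failed entity save.
    boolean failEntitySave = false;

    // Returns true if the entity was saved, false if the name is taken
    // or the entity save failed.
    boolean save(String entityId, String name) {
        // Step 1: save the uniqueness document; fails if the key exists.
        if (uniquenessDocs.putIfAbsent(name, entityId) != null) {
            return false; // name already in use: return error
        }
        try {
            // Step 2: save the entity itself.
            if (failEntitySave) {
                throw new IllegalStateException("simulated save failure");
            }
            entities.put(entityId, name);
            return true;
        } catch (RuntimeException e) {
            // Step 3: delete the uniqueness document so the name is free again.
            uniquenessDocs.remove(name, entityId);
            return false;
        }
    }

    boolean isNameTaken(String name) {
        return uniquenessDocs.containsKey(name);
    }
}
```

If the process dies between step 1 and step 3, the uniqueness document stays behind without an entity, which is the orphan case that needs separate cleanup.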

N1QL EXAMPLE
• Use Query Workbench Developer Preview
• Example queries
1. select mvbucket.`key` from mvbucket
   where payload._type = 'AdSmartVersion' and payload.createdOn is not missing
2. select * from mvbucket
   where payload._type = 'AdSmartVersion' and payload.masterAdId = 1073741825
   and payload.createdOn between 1349057369158 and 1449057369158
3. select payload.masterAdId, count(1) from mvbucket
   where payload._type = 'AdSmartVersion'
   and payload.createdOn between 1349057369158 and 1449057369158
   group by payload.masterAdId
4. select payload.masterAdId, count(1) as count from mvbucket
   where payload._type = 'AdSmartVersion'
   group by payload.masterAdId

COUCHBASE JAVA CLIENT 2
NOTES ON JAVA CLIENT
• Built-in support for JSON documents
• Supports counters
• Asynchronous client built on RxJava
• Lets us reuse our existing reactive business logic
• Efficient parallel processing
• Built-in error handling – for example, retrying a document get with exponential backoff
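import java.util.concurrent.TimeUnit;
import com.couchbase.client.core.BackpressureException;
import com.couchbase.client.core.time.Delay;
import com.couchbase.client.java.util.retry.RetryBuilder;
import rx.Observable;

// Fetch all documents in parallel; retry each get on backpressure with an
// exponential delay (upper bound 100 ms), at most 10 times.
Observable
    .from(docIds)
    .flatMap(id -> bucket.async()
        .get(id)
        .retryWhen(RetryBuilder
            .anyOf(BackpressureException.class)
            .delay(Delay.exponential(TimeUnit.MILLISECONDS, 100))
            .max(10)
            .build()))
    .subscribe();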

OUR USE OF ELASTICSEARCH
QUERY ENGINE
• Free text search – user boolean queries
• Data filtering – data grid filtering
• Grouping – data grid grouping
• Authorization – filter document according to user permissions
• Batch processing – internal services that use scan and scroll to operate on large data sets

ELASTICSEARCH – SOME BEST PRACTICES
• Carefully maintain the index schema
• Avoid using dynamic mapping
• Data type collisions
• Large data sets – do not save data that is not used
• Build a static schema from the data model
• Updating a searchable field in the data model triggers a build of a new index
• Some schema changes require re-indexing, e.g. adding a mandatory field or changing an enumeration value
• Inconsistency – updated data does not immediately appear in query results
• The overall system design must be aware of this limitation
• Throttling – the number of writes must be controlled
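
One simple way to control the number of writes is to cap how many are in flight at once. The sketch below is an assumption about how that could look, not a description of the platform's throttling: `WriteThrottle` and `tryWrite` are hypothetical names, and the `Runnable` stands in for the actual index call.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: caps the number of index writes in flight.
class WriteThrottle {
    private final Semaphore permits;

    WriteThrottle(int maxConcurrentWrites) {
        this.permits = new Semaphore(maxConcurrentWrites);
    }

    // Runs the write if a permit becomes free within the wait time.
    // Returns false (write not executed) when the throttle is saturated.
    boolean tryWrite(Runnable write, long maxWaitMillis) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            write.run();
            return true;
        } finally {
            permits.release();
        }
    }
}
```

A caller that gets `false` back can queue the write for later or report the failure upstream; which of the two fits depends on how stale the index is allowed to become.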

COUCHBASE 4.1
OUR USAGE
• Use optimistic locking – update operations are done through an updater lambda function
• N1QL
• Does not meet our performance needs for ORDER BY queries on large data sets
• Took more than 5 seconds to query 250 entities
• Used for business logic where no sorting is required
• Used when consistency is important
• XDCR
• Customized the plugin to index the required entities
• Added support for parent-child relationships in Elasticsearch
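
The optimistic locking bullet can be illustrated with a read, update, compare-and-swap loop. This is a minimal sketch under stated assumptions: an `AtomicReference` holding a versioned snapshot stands in for a Couchbase document and its CAS value, and `VersionedDoc` and `update` are hypothetical names rather than the data layer's real API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical sketch: an in-memory document with a CAS-style version
// stands in for a Couchbase document.
class VersionedDoc<T> {
    // Immutable snapshot: a value plus the version it was written at.
    private static final class Snapshot<T> {
        final T value;
        final long version;

        Snapshot(T value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot<T>> current;

    VersionedDoc(T initial) {
        current = new AtomicReference<>(new Snapshot<>(initial, 0));
    }

    // Applies the updater to the latest value and writes the result only if
    // nobody else wrote in between; otherwise re-reads and tries again.
    T update(UnaryOperator<T> updater, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Snapshot<T> read = current.get();
            T updated = updater.apply(read.value);
            Snapshot<T> next = new Snapshot<>(updated, read.version + 1);
            if (current.compareAndSet(read, next)) {
                return updated;
            }
        }
        throw new IllegalStateException("too many concurrent updates");
    }

    T get() {
        return current.get().value;
    }

    long version() {
        return current.get().version;
    }
}
```

Passing the change as a lambda matters because the updater may run more than once: each retry re-applies it to the freshly read value, so it should be free of side effects.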
