©2016 Couchbase Inc.
Tuning For Performance
Indexes And Queries
Agenda
• Data Model
• Query Execution
• Indexing Options
• Index Design
• Query Tuning
• Deployment & Configuration
Data Model
Document Data Modeling for N1QL
• Define document boundaries
• Identifying parent and child objects
• Deciding whether to embed child objects
• Defining relationships
• Parent-child relationships
• Independent relationships
• Expressing relationships
Identifying parent and child objects
• A parent object has an independent lifecycle
• It is not deleted as part of deleting any other object
• E.g. a registered user of a site
• A child object has a dependent lifecycle; it has no meaningful existence without its parent
• It must be deleted when its parent is deleted
• E.g. a comment on a blog (child of the blog object)
Deciding whether to embed child objects
• Couchbase provides per-document atomicity
• If the child and parent must be atomically updated or deleted together, the child must be embedded
• There is no key-value lookup for embedded objects. If the child requires key-value lookup, it should not be embedded.
• Performance trade-off
• Embedding the child makes it faster to read the parent together with all its children (single document fetch)
• If the child has high cardinality, embedding the child makes the parent bigger and slower to store and fetch
Defining & Expressing Relationships
• Defining Relationships
• Parent-child relationships
• If we model the child as a separate document rather than embedding it, we have defined a (parent-child) relationship
• Independent relationships
• Relationships between two independent objects
• Expressing relationships
• Three ways to express relationships in Couchbase
• Parent contains keys of children (outbound)
• Children contain key of parent (inbound)
• Both of the above (dual)
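The three ways of expressing a relationship can be sketched with plain Python dictionaries standing in for JSON documents. This is only an illustration: the document keys (`blog_1`, `comment_1`, `comment_2`) and the field names `comment_keys` and `blog_key` are hypothetical and do not come from the deck.

```python
# Hypothetical documents, keyed by document key (names are illustrative only).

# Outbound: the parent holds the keys of its children.
outbound = {
    "blog_1": {"type": "blog", "title": "Hello",
               "comment_keys": ["comment_1", "comment_2"]},
    "comment_1": {"type": "comment", "text": "First"},
    "comment_2": {"type": "comment", "text": "Second"},
}

# Inbound: each child holds the key of its parent.
inbound = {
    "blog_1": {"type": "blog", "title": "Hello"},
    "comment_1": {"type": "comment", "text": "First", "blog_key": "blog_1"},
    "comment_2": {"type": "comment", "text": "Second", "blog_key": "blog_1"},
}


def children_outbound(docs, parent_key):
    """Follow the parent's key list: one key-value lookup per child."""
    return [docs[k] for k in docs[parent_key]["comment_keys"]]


def children_inbound(docs, parent_key):
    """Search for children that point at the parent (an index scan in practice)."""
    return [d for d in docs.values() if d.get("blog_key") == parent_key]


# Dual: both directions are stored, so either lookup style works.
dual = {k: {**outbound[k], **inbound[k]} for k in outbound}
```

The outbound form favors key-value lookups from the parent, while the inbound form relies on finding children by a field value, which is why it pairs naturally with an index on that field.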
Using JSON to Store Data
{
  "Name": "Jane Smith",
  "DOB": "1990-01-30",
  "Billing": [
    {
      "type": "visa",
      "cardnum": "5827-2842-2847-3909",
      "expiry": "2019-03"
    },
    {
      "type": "master",
      "cardnum": "6274-2842-2847-3909",
      "expiry": "2019-03"
    }
  ],
  "Connections": [
    {
      "CustId": "XYZ987",
      "Name": "Joe Smith"
    },
    {
      "CustId": "PQR823",
      "Name": "Dylan Smith"
    },
    {
      "CustId": "PQR823",
      "Name": "Dylan Smith"
    }
  ],
  "Purchases": [
    { "id": 12, "item": "mac", "amt": 2823.52 },
    { "id": 19, "item": "ipad2", "amt": 623.52 }
  ]
}
DocumentKey: CBL2015

Customer
CustomerID  Name        DOB
CBL2015     Jane Smith  1990-01-30

Billing
CustomerID  Type    Cardnum  Expiry
CBL2015     visa    5827…    2019-03
CBL2015     master  6274…    2018-12

Connections
CustomerID  ConnId  Name
CBL2015     XYZ987  Joe Smith
CBL2015     SKR007  Sam Smith

Purchases
CustomerID  item   amt
CBL2015     mac    2823.52
CBL2015     ipad2  623.52

Contacts
CustomerID  ConnId  Name
CBL2015     XYZ987  Joe Smith
CBL2015     SKR007  Sam Smith
Travel-Sample
Document types: airline, airport, route, landmark, hotel

key: airline_24
{
  "id": "24",
  "type": "airline",
  "callsign": "AMERICAN",
  "iata": "AA"
}

key: airport_3577
{
  "id": 3577,
  "type": "airport",
  "faa": "SEA",
  "icao": "KSEA"
}

key: route_5966
{
  "id": 5966,
  "type": "route",
  "airlineid": "airline_24",
  "sourceairport": "SEA"
}
Key reference: "airlineid" in the route document holds the key of the airline document (airline_24)

key: landmark_21661
{
  "id": 21661,
  "type": "landmark",
  "country": "France",
  "email": null
}

key: hotel_25592
{
  "id": 25592,
  "type": "hotel",
  "country": "San Francisco",
  "phone": "+1 415 440-5600"
}
Travel-sample: Hotel Document
"docid": "hotel_25390"
{
  "address": "321 Castro St",
  …
  "city": "San Francisco",
  "country": "United States",
  "description": "An upscale bed and breakfast in a restored house.",
  "directions": "at 16th",
  "geo": {
    "accuracy": "ROOFTOP",
    "lat": 37.7634,
    "lon": -122.435
  },
  "id": 25390,
  "name": "Inn on Castro",
  "phone": "+1 415 861-0321",
  "price": "$95–$190",
  "public_likes": ["John Smith", "Joe Carl", "Jane Smith", "Kate Smith"],
  "reviews": [
    {
      "author": "Mason Koepp",
      "content": "blah-blah",
      "date": "2012-08-23 16:57:56 +0300",
      "ratings": {
        "Check in / front desk": 3,
        "Cleanliness": 3,
        "Location": 4,
        "Overall": 2,
        "Rooms": 2,
        "Service": -1,
        "Value": 2
      }
    }
  ],
  "state": "California",
  "type": "hotel",
  "url": "http://www.innoncastro.com/"
}
Annotations:
• docid: the document key
• city: attribute (key-value pair)
• geo: object; 1:1 relationship
• public_likes: array of strings; embedded 1:many relationship
• reviews: array of objects; embedded 1:N relationship
• ratings: object within an array
N1QL Access Methods and Performance
Fastest to slowest, 1 to 5

#  Method              Description
1  USE KEYS            Document fetch, no index scan
2  Covered index scan  The query (or part of the query, during a JOIN) is processed with the index scan only
3  Index scan          Partial index scan, then fetches
4  JOIN                Fetch of the left-hand side, then fetches of the right-hand side
5  Primary scan        Full bucket scan, then fetches
Child Representation and Access Method
#  Child representation   Access method  Notes
1  Embedded               USE KEYS       Parent with children loaded via USE KEYS; child can be surfaced via UNNEST
2  Outbound relationship  JOIN           Parent contains child keys; children loaded via JOIN
3  Inbound relationship   Index scan     Children contain parent key; child.parent_key is indexed; index is scanned to load children
4  Not modeled            Primary scan   Relationship not explicitly modeled
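As a rough illustration of the first three rows, the helpers below build the N1QL text for each access method. The keyspace name `app`, the field names `children`, `child_keys` and `parent_key`, and the index name `ix_parent_key` are all hypothetical; the statements are only assembled as strings and are not sent to a server.

```python
# Hypothetical keyspace and field names; statements are built as text only.
KEYSPACE = "`app`"


def embedded_children(parent_key):
    # 1. Embedded: fetch the parent by key, surface its children with UNNEST.
    return (f'SELECT c FROM {KEYSPACE} p USE KEYS "{parent_key}" '
            f'UNNEST p.children AS c')


def outbound_children(parent_key):
    # 2. Outbound: the parent lists child keys; children load via JOIN ... ON KEYS.
    return (f'SELECT c.* FROM {KEYSPACE} p USE KEYS "{parent_key}" '
            f'JOIN {KEYSPACE} c ON KEYS p.child_keys')


def inbound_children(parent_key):
    # 3. Inbound: children carry parent_key, which must be indexed (see below).
    return f'SELECT c.* FROM {KEYSPACE} c WHERE c.parent_key = "{parent_key}"'


# Index that lets the inbound query use an index scan instead of a primary scan.
INBOUND_INDEX = f"CREATE INDEX ix_parent_key ON {KEYSPACE}(parent_key)"
```

String interpolation keeps the sketch short; in application code, query parameters would be the safer way to pass the key.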
Query Execution
NoSQL
Input: JSON documents (keyspaces such as CUSTOMER, Orders, LoyaltyInfo), for example the Jane Smith customer document shown under "Using JSON to Store Data"
Output: JSON documents (ResultDocuments)
N1QL: Query Execution Flow
1. Client -> Query Service: submit the query over the REST API
2. Query Service: parse, analyze, create plan
3. Query Service -> Index Service: scan request with index filters
4. Index Service -> Query Service: qualified doc keys
5. Query Service -> Data Service: fetch request with doc keys
6. Data Service -> Query Service: fetch the documents
7. Query Service: evaluate (filter, join, aggregate, sort, paginate)
8. Query Service -> Client: query result

Example query:
SELECT c_id,
       c_first,
       c_last,
       c_max
FROM CUSTOMER
WHERE c_id = 49165;

Example result:
{
  "c_first": "Joe",
  "c_id": 49165,
  "c_last": "Montana",
  "c_max": 50000
}
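The flow above can be mimicked with a toy in-memory model, which may help show why the index scan returns only keys and the documents come from a separate fetch. Everything here is a stand-in: the dictionaries play the role of the Data and Index services, and the document keys `cust_1` and `cust_2` and the second customer are invented for the example.

```python
# Toy in-memory stand-ins for the Data and Index services (illustrative only).
DATA = {  # document key -> document
    "cust_1": {"c_id": 49165, "c_first": "Joe", "c_last": "Montana", "c_max": 50000},
    "cust_2": {"c_id": 7, "c_first": "Ann", "c_last": "Lee", "c_max": 100},
}
# An index on c_id stores (indexed value, document key) pairs in sorted order.
INDEX_ON_C_ID = sorted((doc["c_id"], key) for key, doc in DATA.items())


def index_scan(c_id):
    """Steps 3-4: the index returns the keys of qualifying documents."""
    return [key for value, key in INDEX_ON_C_ID if value == c_id]


def fetch(keys):
    """Steps 5-6: the data service returns the documents for those keys."""
    return [DATA[k] for k in keys]


def run_query(c_id, fields=("c_id", "c_first", "c_last", "c_max")):
    """Steps 7-8: re-apply the filter, then project the requested fields."""
    docs = fetch(index_scan(c_id))
    return [{f: d[f] for f in fields} for d in docs if d["c_id"] == c_id]
```

Calling `run_query(49165)` returns the single customer document from the example result.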
Inside a Query Service
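Client -> Query Service
Query Service pipeline: Parse -> Plan -> Scan -> Fetch -> Join -> Filter -> Pre-Aggregate -> Aggregate -> Sort -> Offset -> Limit -> Project
Scan reads from the Index Service; Fetch reads from the Data Service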
Indexing Options
Index Options
#  Index type                 Description
1  Primary index              Index on the document key, over the whole bucket
2  Named primary index        A primary index with a name; allows multiple primary indexes in the cluster
3  Secondary index            Index on a key-value or on the document key
4  Secondary composite index  Index on more than one key-value
5  Functional index           Index on a function or expression over key-values
6  Array index                Index on individual elements of arrays
7  Partial index              Index on a subset of the items in the bucket
8  Covering index             The query is answered using only the data in the index and skips retrieving the item
9  Duplicate index            Not a type of index but an indexing feature that allows load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
Design Query & Index
Get the 6th through 15th lowest ids between 0 and 1000 for airlines in the "United States". Also return the country and name values along with each id.
Sample Document:
META().id: "airline_9833"
{
"callsign": "Epic",
"country": "United States",
"iata": "FA",
"icao": "4AA",
"id": 9833,
"name": "Epic Holiday",
"type": "airline"
}
Type of document  Count
airline           187
airport           1968
route             24024
landmark          4495
hotel             917
total             31591

Predicate                                                                        Count
type = "airline" AND country = "United States"                                   127
type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000         18
Design Query & Index
• Using Primary Index
• The data source has 31,591 documents
• A primary index scan gets all the document keys from the index, fetches all the documents, applies the predicates, sorts, and then paginates to return 10 documents
• Using Secondary Index
• The predicate (type = "airline") is pushed to the indexer; 187 documents are fetched
• Two predicates are not pushed to the indexer: (country = "United States" AND id BETWEEN 0 AND 1000)
SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000
ORDER BY id
LIMIT 10
OFFSET 5;
CREATE INDEX ts_ix1 ON `travel-sample`(type);
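The difference between the two plans comes down to how many documents are fetched only to be thrown away by the filter. A small arithmetic sketch, using the counts from the travel-sample tables in this deck (the function name is just for illustration):

```python
# Document counts taken from the travel-sample tables in this deck.
TOTAL_DOCS = 31591   # all documents in the bucket (primary scan fetches these)
AIRLINES = 187       # documents with type = "airline" (fetched via the index on type)
QUALIFYING = 18      # documents for which all three predicates are true


def wasted_fetches(fetched, qualifying=QUALIFYING):
    """Documents fetched from the data service but discarded by the filter."""
    return fetched - qualifying


primary_scan_waste = wasted_fetches(TOTAL_DOCS)  # 31573
type_index_waste = wasted_fetches(AIRLINES)      # 169
```

Even the single-key index on type still fetches 169 documents that the remaining predicates reject, which motivates pushing more predicates into the index.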
Design Query & Index
Composite index on all attributes in query predicates
• All predicates are pushed to the indexer; fetches 18+ documents.
CREATE INDEX ts_ix1 ON `travel-sample`(type, id, country);
Partial composite index
• The document type can be an index condition
• Because the document type check is an equality predicate, remove type from the index keys.
• A leaner index performs better (saves I/O, memory, CPU, network)
CREATE INDEX ts_ix1 ON `travel-sample`(id, country) WHERE type = "airline";
Covering partial composite index
• Add all referenced attributes to the index keys, e.g. name
• A covered query avoids the document fetch
CREATE INDEX ts_ix1 ON `travel-sample`(id, country, name) WHERE type = "airline";
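A simplified way to reason about covering is a set check: a query is covered when every field it references is available from the index. The helper below is a sketch of that rule only, not of the optimizer's actual logic; it assumes, as the deck states for 4.5.0+, that a field fixed by an equality predicate in the partial index condition does not need to be an index key.

```python
def is_covered(index_keys, referenced_fields, condition_fields=()):
    """True when every field the query references is available from the index.

    condition_fields: fields fixed by an equality predicate in the partial
    index's WHERE condition (assumed available without being index keys).
    """
    available = set(index_keys) | set(condition_fields)
    return set(referenced_fields) <= available


# Fields referenced by the example query (SELECT list, WHERE, ORDER BY).
query_fields = ["country", "id", "name", "type"]

partial_index = is_covered(["id", "country"], query_fields, ["type"])           # False
covering_index = is_covered(["id", "country", "name"], query_fields, ["type"])  # True
```

The only difference between the last two indexes is name, which is projected but never filtered on; adding it is what removes the document fetch.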
Design Query & Index
ORDER BY optimization
• The index stores data pre-sorted by the index keys
• The ORDER BY list should match the index key list order, left to right.
• Exploit the index order to avoid an additional fetch and sort
LIMIT pushdown to the indexer
"spans": [
  {
    "Range": {
      "High": [
        "1000",
        "successor(\"United States\")"
      ],
      "Inclusion": 1,
      "Low": [
        "0",
        "\"United States\""
      ]
    }
  }
]
Pushing LIMIT to the indexer improves efficiency and performance
Conditions:
• Exact predicates are pushed down to the indexer
• ORDER BY matches the index key order
• The indexer evaluates all of the predicates
• Unsupported: JOINs, GROUP BY
Design Query & Index
Optimizer for ORDER BY with LIMIT
• The query has an equality predicate on country and a range predicate on id
• These exact predicates will produce exact results
• Changing to ORDER BY country, id gives the same result, and LIMIT can be pushed down to the indexer
Offset pushdown
• Pushed to the indexer as (limit + offset); the query then skips over the first offset rows
CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";
SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000
ORDER BY country, id
LIMIT 10
OFFSET 5;
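The LIMIT and OFFSET pushdown can be pictured as a range scan over a sorted list that stops early. The sketch below is a toy model, not the indexer's implementation: the index entries are made up, and `bisect` stands in for locating the Low and High bounds of the span.

```python
from bisect import bisect_left, bisect_right

# A toy index on (country, id); entries are kept sorted by the index keys.
# The rows are made up for illustration.
INDEX = sorted([
    ("United States", 10), ("United States", 22), ("United States", 1500),
    ("France", 5), ("United States", 3), ("United Kingdom", 40),
])


def range_scan(low, high, limit=None, offset=0):
    """Return entries with low <= entry <= high, already in index order.

    With LIMIT pushed down, the scan stops after limit + offset entries;
    the first `offset` entries are then skipped.
    """
    start = bisect_left(INDEX, low)
    stop = bisect_right(INDEX, high)
    if limit is not None:
        stop = min(stop, start + limit + offset)
    return INDEX[start:stop][offset:]


# country = "United States" AND id BETWEEN 0 AND 1000, LIMIT 2 OFFSET 1
rows = range_scan(("United States", 0), ("United States", 1000), limit=2, offset=1)
```

Because the entries come back in (country, id) order, no separate sort is needed and the scan reads at most limit + offset entries.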
Final Query & Index
CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";
SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000
ORDER BY country, id
LIMIT 10
OFFSET 5;
Design Query & Index
Get the 6th through 15th highest ids between 0 and 1000 for airlines in the "United States". Also return the country and name values along with each id.
Index & Query (for numbers only)
CREATE INDEX ts_ix1 ON `travel-sample`(country, -id, name) WHERE type = "airline";
SELECT country, -(-id), name
FROM `travel-sample`
WHERE type = "airline" AND country = "United States" AND -id BETWEEN -1000 AND 0
ORDER BY country, -id
LIMIT 10
OFFSET 5;
SELECT country, id, name
FROM `travel-sample`
WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000
ORDER BY country, id DESC
LIMIT 10
OFFSET 5;
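The negation trick works because scanning -id in ascending order visits id in descending order. A minimal sketch with made-up ids (the function name and values are illustrative only):

```python
# Toy ascending-only index keyed on -id (the ids are made up for illustration).
ids = [3, 10, 22, 840, 999]
index_on_negated_id = sorted(-i for i in ids)  # ascending order of -id


def highest_ids(limit, offset=0, low=0, high=1000):
    """Scan -id BETWEEN -high AND -low in ascending order, then negate back."""
    in_range = [k for k in index_on_negated_id if -high <= k <= -low]
    return [-k for k in in_range[offset:offset + limit]]
```

Negating back in the projection, as `-(-id)` does in the query above, restores the original values. This only applies to numeric keys, as the slide title notes.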
Index Design
Advice on Index Design. Part 1
• Standard GSI: smaller number of mutations, larger index
• Memory-optimized index (MOI): 100% of the data in memory; large number of mutations; better performance
• Avoid primary index scans in production.
• Avoid creating the primary index itself.
• A primary scan is the equivalent of a table scan.
• The query needs the right predicates for the right index to be chosen
• The query needs to have predicates on the leading index keys
• Explore all combinations of index options.
• Divide and conquer with partial indexes. They support complex expressions.
• An index can have a large number of keys, with a maximum total key size of 4096 bytes.
• Create the index with predicate attributes as the leading keys, followed by non-predicate attributes for covering.
• If the query is not covered, the index keys should only be attributes used in query predicates
Advice on Index Design. Part 2
• Index key order:
• Attributes typically used with EQUALITY and IN predicates
• Followed by BETWEEN ({<,<=} AND {>,>=})
• Followed by less than (<, <=)
• Followed by greater than (>)
• If the partial index condition has an equality predicate on a field, don't include that field in the index keys; this keeps the index lean (4.5.0+)
• META().id is always present in the index. If META().id is not part of the predicate, don't include it in the index keys.
• The only indexable META() field is META().id; all others require a fetch of the items.
• Remove unused indexes.
• If the index doesn't fit in memory (for MOI), use a partial index.
• If an index is heavily used, create a duplicate index.
• Add index nodes.
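The key-order advice can be expressed as a small sorting rule. The helper below is a sketch of that rule, not a Couchbase tool: the function and parameter names are hypothetical, and ranking `>=` together with `>` is an assumption, since the slide lists only `>`.

```python
# Rank of each predicate kind, following the key-order advice above.
# Ranking ">=" with ">" is an assumption; the slide lists only ">".
PREDICATE_RANK = {"=": 0, "IN": 0, "BETWEEN": 1, "<": 2, "<=": 2, ">": 3, ">=": 3}


def order_index_keys(predicates, partial_index_fields=()):
    """Order predicate fields for a composite index.

    predicates: list of (field, operator) pairs from the WHERE clause.
    partial_index_fields: fields already fixed by an equality predicate in
    the index's WHERE condition; they are left out to keep the index lean.
    """
    kept = [(f, op) for f, op in predicates if f not in partial_index_fields]
    # sorted() is stable, so fields with equal rank keep their original order.
    return [f for f, op in sorted(kept, key=lambda p: PREDICATE_RANK[p[1]])]


keys = order_index_keys(
    [("id", "BETWEEN"), ("type", "="), ("country", "=")],
    partial_index_fields=("type",),
)
```

For the airline query this yields country before id, which matches the leading keys of the final index in this deck; name is then appended only for covering.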
Query Tuning
Advice on Query Performance
• Use EXPLAIN to analyze the query plan
• Index selection; spans for pushdown of as many predicates as possible (the more the merrier)
• Pushdown of LIMIT, OFFSET
• Index order for ORDER BY
• Covering index
• Simple COUNT queries can take advantage of the index count
• Exploit the index for MIN queries
• For ANY, ANY AND EVERY, and WITHIN predicates, use an ARRAY index.
• For UNNEST, use an ARRAY index. The array key has to be the leading key (only for UNNEST)
• Use IN instead of WITHIN
• Use pretty=false (4.5.1) and max_parallelism when queries return a large result set
• Improve fetch performance by increasing pipeline-cap and pipeline-batch
• Exploit array fetch by query rewrite
• Execute the query and explore the monitoring stats for each phase of the query.
• Monitor CPU and memory usage and adjust the number of Query Service nodes.
SELECT: JOIN
SELECT COUNT(1)
FROM `beer-sample` beer
INNER JOIN
`beer-sample` brewery ON KEYS beer.brewery_id
WHERE state = 'CA';
• The JOIN operation stitches two keyspaces together
• The JOIN criteria are based on the ON KEYS clause
• The outer table uses an index scan, if possible
• The inner table (brewery) is fetched document by document
• 4.6 improves this by fetching in batches.
SELECT: JOIN
• Why not get all of the required document IDs from the index scan, then do one big bulk get on the outer table?
• Two ways to do it:
a) Use the array aggregate (ARRAY_AGG()) to create the list
b) Use RAW to create the array and then use that to JOIN.

Using ARRAY_AGG():
SELECT COUNT(1)
FROM (
    SELECT ARRAY_AGG(META().id) karray
    FROM `beer-sample` beer
    WHERE state = 'CA') AS b
INNER JOIN
    `beer-sample` brewery ON KEYS b.karray;

Using RAW:
SELECT COUNT(1)
FROM (
    SELECT RAW META().id
    FROM `beer-sample` beer
    WHERE state = 'CA') AS blist
INNER JOIN
    `beer-sample` brewery ON KEYS blist;
DISTINCT
1. SELECT DISTINCT type FROM `travel-sample`;
2. SELECT MIN(type) FROM `travel-sample` WHERE type IS NOT MISSING;
3. SELECT MIN(type) FROM `travel-sample` WHERE type > "airline";

import requests

url = "http://localhost:8093/query"
s = requests.Session()
s.auth = ('Administrator', 'password')

# Statement 2: the smallest value of type in the index
query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING;'}
r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
result = r.json()['results'][0]
lastval = result['minval']

# MIN() returns null (None) once no larger value exists
while lastval is not None:
    print(lastval)
    # Statement 3: the next value of type greater than the last one seen
    stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    lastval = result['minval']
GROUP, COUNT()
SELECT type, count(type)
FROM `travel-sample`
GROUP BY type;
SELECT type, count(type)
FROM `travel-sample`
WHERE type IS NOT MISSING
GROUP BY type;
Step 1: Get the first entry in the index for type.
Step 2: Then, COUNT() from the data set where type = first-value.
Step 3: Now use the index to find the next value for type.
Step 4: Repeat steps 2 and 3 for all the remaining values of type.
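The four steps amount to hopping from one distinct value to the next in a sorted index instead of visiting every entry's document. A toy model of that walk, using a sorted Python list as the index (the entry counts are made up):

```python
from bisect import bisect_right

# Toy sorted index on `type`: one entry per document (counts are made up).
TYPE_INDEX = sorted(["airline"] * 3 + ["airport"] * 2 + ["hotel"] * 4)


def group_counts(index):
    """Walk the distinct values of a sorted index, counting each group."""
    counts = {}
    pos = 0
    while pos < len(index):
        value = index[pos]                 # steps 1 and 3: next smallest value
        end = bisect_right(index, value)   # end of this value's run of entries
        counts[value] = end - pos          # step 2: COUNT where type = value
        pos = end                          # step 4: jump past the whole group
    return counts
```

The loop runs once per distinct value rather than once per document, which is the point of driving GROUP BY from MIN() and COUNT() index lookups.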
GROUP, COUNT()
import requests

url = "http://localhost:8093/query"
s = requests.Session()
s.auth = ('Administrator', 'password')

# Step 1: the smallest value of type in the index
query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING;'}
r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
result = r.json()['results'][0]
lastval = result['minval']

# MIN() returns null (None) once no larger value exists
while lastval is not None:
    # Step 2: count the documents with this value of type
    stmt = 'SELECT COUNT(type) tcount FROM `travel-sample` WHERE type = "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    tcount = result['tcount']
    print(lastval, tcount)
    # Step 3: the next value of type greater than the last one seen
    stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
    query = {'statement': stmt}
    r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
    result = r.json()['results'][0]
    lastval = result['minval']
Deployment & Configuration
Deployment
• Couchbase Cluster Services
• Data
• Index
• Query
• FTS
• Analytics
• Data Service
• Enough RAM to cache reads
• Enough Disk to eventually persist writes
• CPU primarily for View and XDCR
• At least 3 nodes; replication is at the bucket level
• Minimum requirements: 4GB RAM, 8 CPU cores
• Index Service
• Primarily RAM and disk I/O bound
• ForestDB persistence engine
• MOI: Memory Optimized Index
• At least 2 nodes for HA; each index is replicated individually
• Minimum requirements: 8GB RAM, 8 CPU cores, fast disk
• Query Service
• Primarily CPU bound
• Very low disk requirements
• At least 2 nodes for HA; queries are automatically load balanced by the Couchbase SDKs
• Minimum requirements: 8GB RAM, 16+ CPU cores
Deployment
• Multi Dimensional Scalability (MDS)
• Option 1: All services enabled on all the nodes
• Option 2: Separated services; node sizing depends on the workload.
Query Configuration
curl -u Administrator:password http://localhost:8093/admin/settings >z.json
{
"completed-limit": 4000,
"completed-threshold": 1000,
"cpuprofile": "",
"debug": false,
"keep-alive-length": 16384,
"loglevel": "INFO",
"max-parallelism": 1,
"memprofile": "",
"pipeline-batch": 16,
"pipeline-cap": 512,
"request-size-cap": 67108864,
"scan-cap": 0,
"servicers": 32,
"timeout": 0
}
Query Configuration
{
"completed-limit": 4000,
"completed-threshold": 1000,
"cpuprofile": "",
"debug": false,
"keep-alive-length": 16384,
"loglevel": "INFO",
"max-parallelism": 1,
"memprofile": "",
"pipeline-batch": 1024,
"pipeline-cap": 4096,
"request-size-cap": 67108864,
"scan-cap": 0,
"servicers": 32,
"timeout": 0
}
curl -u Administrator:password http://localhost:8093/admin/settings -XPOST -d@./z.json
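The same edit to z.json can be scripted rather than done by hand. The sketch below only prepares the updated settings body; the function name `tuned_settings` is hypothetical, the defaults are the values shown on the slide, and sending the body back to /admin/settings is left to the curl command above.

```python
import json


def tuned_settings(current, pipeline_batch=1024, pipeline_cap=4096):
    """Return a copy of the settings with larger fetch pipeline values."""
    updated = dict(current)  # leave the caller's dictionary untouched
    updated["pipeline-batch"] = pipeline_batch
    updated["pipeline-cap"] = pipeline_cap
    return updated


# Example input: a subset of the settings returned by /admin/settings.
before = {"max-parallelism": 1, "pipeline-batch": 16, "pipeline-cap": 512}
after = tuned_settings(before)
payload = json.dumps(after)  # body to POST back to /admin/settings
```

Working on a copy keeps the original values available, which makes it easy to compare before and after or to roll back.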
Query Configuration
curl -X POST -u Administrator:<password> http://127.0.0.1:9000/diag/eval/ \
  -d 'ns_config:set({node, node(), {query, extra_args}}, ["--pipeline-batch=1024", "--pipeline-cap=4096"])'
• Updating parameters via ns_server changes the values permanently
• The values survive a restart
• You can change any of the parameters on the command line
Query Configuration.
-acctstore="gometrics:": Accounting store address (http://URL or stub:)
-certfile="": HTTPS certificate file
-completed-limit=4000: maximum number of completed requests
-completed-threshold=1000: cache completed query lasting longer than this many milliseconds
-configstore="stub:": Configuration store address (http://URL or stub:)
-cpuprofile="": write cpu profile to file
-datastore="": Datastore address (http://URL or dir:PATH or mock:)
-debug=false: Debug mode
-enterprise=true: Enterprise mode
-http=":8093": HTTP service address
-https=":18093": HTTPS service address
-keep-alive-length=16384: maximum size of buffered result
-keyfile="": HTTPS private key file
-logger="": Logger implementation
-loglevel="info": Log level: debug, trace, info, warn, error, severe, none
-max-parallelism=1: Maximum parallelism per query; use zero or negative value to disable
-memprofile="": write memory profile to this file
-metrics=true: Whether to provide metrics
-mutation-limit=0: Maximum LIMIT for data modification statements; use zero or negative value to disable
-namespace="default": Default namespace
-order-limit=0: Maximum LIMIT for ORDER BY clauses; use zero or negative value to disable
-pipeline-batch=16: Number of items execution operators can batch
-pipeline-cap=512: Maximum number of items each execution operator can buffer
-plus-servicers=256: Plus servicer count
-pretty=true: Pretty output
-readonly=false: Read-only mode
-request-cap=1024: Maximum number of queued requests per logical CPU
-request-size-cap=67108864: Maximum size of a request
-scan-cap=0: Maximum buffer size for primary index scans; use zero or negative value to disable
-servicers=64: Servicer count
-signature=true: Whether to provide signature
-ssl_minimum_protocol="tlsv1": TLS minimum version ('tlsv1'/'tlsv1.1'/'tlsv1.2')
-static-path="static": Path to static content
-timeout=0: Server execution timeout, e.g. 500ms or 2s; use zero or negative value to disable
Keshav Murthy
Director
keshav@couchbase.com
Sitaram Vemulapalli
Sr. Software Engineer
Sitaram.vemulapalli@couchbase.com
Thank You!
45

More Related Content

PPTX
Understanding N1QL Optimizer to Tune Queries
PPTX
N1QL workshop: Indexing & Query turning.
PPTX
From SQL to NoSQL: Structured Querying for JSON
PDF
MongoDB .local Munich 2019: Still Haven't Found What You Are Looking For? Use...
PDF
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
PPTX
Doing Joins in MongoDB: Best Practices for Using $lookup
PDF
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
PPTX
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2
Understanding N1QL Optimizer to Tune Queries
N1QL workshop: Indexing & Query turning.
From SQL to NoSQL: Structured Querying for JSON
MongoDB .local Munich 2019: Still Haven't Found What You Are Looking For? Use...
MongoDB .local Chicago 2019: Practical Data Modeling for MongoDB: Tutorial
Doing Joins in MongoDB: Best Practices for Using $lookup
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB
Joins and Other Aggregation Enhancements Coming in MongoDB 3.2

What's hot (20)

PDF
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
PPTX
MongoDB and Hadoop: Driving Business Insights
PDF
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL
PPTX
Indexing Strategies to Help You Scale
PPTX
Jumpstart: Introduction to MongoDB
PPTX
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
PDF
MongoDB Europe 2016 - MongoDB 3.4 preview and introduction to MongoDB Atlas
PDF
MongoDB Launchpad 2016: What’s New in the 3.4 Server
KEY
Managing Social Content with MongoDB
PPTX
Back to Basics Webinar 3: Schema Design Thinking in Documents
PPTX
MongoDB + Spring
PDF
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
PDF
MongoDB and Schema Design
PDF
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
PPTX
Utilizing Arrays: Modeling, Querying and Indexing
PDF
Webinar: Working with Graph Data in MongoDB
PDF
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
PPTX
Tutorial: Building Your First App with MongoDB Stitch
PPTX
Beyond the Basics 2: Aggregation Framework
PPTX
Webinar: Schema Design
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB and Hadoop: Driving Business Insights
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL
Indexing Strategies to Help You Scale
Jumpstart: Introduction to MongoDB
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
MongoDB Europe 2016 - MongoDB 3.4 preview and introduction to MongoDB Atlas
MongoDB Launchpad 2016: What’s New in the 3.4 Server
Managing Social Content with MongoDB
Back to Basics Webinar 3: Schema Design Thinking in Documents
MongoDB + Spring
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB and Schema Design
MongoDB.local DC 2018: Tutorial - Data Analytics with MongoDB
Utilizing Arrays: Modeling, Querying and Indexing
Webinar: Working with Graph Data in MongoDB
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Tutorial: Building Your First App with MongoDB Stitch
Beyond the Basics 2: Aggregation Framework
Webinar: Schema Design
Ad

Viewers also liked (11)

PPTX
Drilling on JSON
PPTX
Introduction to NoSQL and Couchbase
PPTX
Building Event Driven API Services Using Webhooks
PPTX
Query in Couchbase. N1QL: SQL for JSON
PDF
Couchbase Day
PPTX
Deep dive into N1QL: SQL for JSON: Internals and power features.
PPTX
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
PDF
Couchbase @ Big Data France 2016
PDF
SDEC2011 Using Couchbase for social game scaling and speed
PPTX
Accelerating analytics on the Sensor and IoT Data.
PDF
APIs That Make Things Happen
Drilling on JSON
Introduction to NoSQL and Couchbase
Building Event Driven API Services Using Webhooks
Query in Couchbase. N1QL: SQL for JSON
Couchbase Day
Deep dive into N1QL: SQL for JSON: Internals and power features.
Real Time Interactive Queries IN HADOOP: Big Data Warehousing Meetup
Couchbase @ Big Data France 2016
SDEC2011 Using Couchbase for social game scaling and speed
Accelerating analytics on the Sensor and IoT Data.
APIs That Make Things Happen
Ad

Similar to Tuning for Performance: indexes & Queries (20)

PPTX
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
PPTX
Querying NoSQL with SQL - KCDC - August 2017
PPTX
N1QL: What's new in Couchbase 5.0
PPTX
NoSQL Data Modeling using Couchbase
PPTX
Querying NoSQL with SQL - MIGANG - July 2017
PPTX
JSON Data Modeling - GDG Indy - April 2020
PPTX
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
PPTX
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
PPTX
NoSQL’s biggest secret: NoSQL never went away
PPTX
Querying Nested JSON Data Using N1QL and Couchbase
PPTX
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
PPTX
Putting the SQL Back in NoSQL - October 2022 - All Things Open
PPTX
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
PDF
Couchbase 5.5: N1QL and Indexing features
PDF
Three Things You Need to Know About Document Data Modeling in NoSQL
PPTX
Couchbase N1QL: Index Advisor
PPTX
Json data modeling june 2017 - pittsburgh tech fest
PPTX
Introducing N1QL: New SQL Based Query Language for JSON
PDF
NoSQL's biggest lie: SQL never went away - Martin Esmann
PPTX
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
Querying NoSQL with SQL - KCDC - August 2017
N1QL: What's new in Couchbase 5.0
NoSQL Data Modeling using Couchbase
Querying NoSQL with SQL - MIGANG - July 2017
JSON Data Modeling - GDG Indy - April 2020
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Big Data Day LA 2015 - Introducing N1QL: SQL for Documents by Jeff Morris of ...
NoSQL’s biggest secret: NoSQL never went away
Querying Nested JSON Data Using N1QL and Couchbase
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
Putting the SQL Back in NoSQL - October 2022 - All Things Open
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Couchbase 5.5: N1QL and Indexing features
Three Things You Need to Know About Document Data Modeling in NoSQL
Couchbase N1QL: Index Advisor
Json data modeling june 2017 - pittsburgh tech fest
Introducing N1QL: New SQL Based Query Language for JSON
NoSQL's biggest lie: SQL never went away - Martin Esmann
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5

More from Keshav Murthy (13)

PDF
N1QL New Features in couchbase 7.0
PPTX
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
PPTX
Couchbase N1QL: Language & Architecture Overview.
PPTX
Couchbase Query Workbench Enhancements By Eben Haber
PPTX
Mindmap: Oracle to Couchbase for developers
PPTX
Extended JOIN in Couchbase Server 4.5
PPTX
Enterprise Architect's view of Couchbase 4.0 with N1QL
PPTX
You know what iMEAN? Using MEAN stack for application dev on Informix
PPT
Informix SQL & NoSQL: Putting it all together
PPT
Informix SQL & NoSQL -- for Chat with the labs on 4/22
PDF
NoSQL Deepdive - with Informix NoSQL. IOD 2013
PDF
Informix NoSQL & Hybrid SQL detailed deep dive
PPT

Tuning for Performance: Indexes & Queries

  • 1. ©2016 Couchbase Inc. Tuning for Performance: Indexes and Queries
  • 2. Agenda • Data Model • Query Execution • Indexing Options • Index Design • Query Tuning • Deployment & Configuration
  • 3. Data Model
  • 4. Document Data Modeling for N1QL • Define document boundaries • Identifying parent and child objects • Deciding whether to embed child objects • Defining relationships • Parent-child relationships • Independent relationships • Expressing relationships
  • 5. Identifying parent and child objects • A parent object has an independent lifecycle • It is not deleted as part of deleting any other object • E.g. a registered user of a site • A child object has a dependent lifecycle; it has no meaningful existence without its parent • It must be deleted when its parent is deleted • E.g. a comment on a blog (child of the blog object)
  • 6. Deciding whether to embed child objects • Couchbase provides per-document atomicity • If the child and parent must be atomically updated or deleted together, the child must be embedded • There is no key-value lookup for embedded objects. If a child requires key-value lookup, it should not be embedded. • Performance trade-off • Embedding the child makes it faster to read the parent together with all its children (single document fetch) • If the child has high cardinality, embedding the child makes the parent bigger and slower to store and fetch
  • 7. Defining & Expressing Relationships • Defining relationships • Parent-child relationships • If we model the child as a separate document rather than embedding it, we have defined a (parent-child) relationship • Independent relationships • Relationships between two independent objects • Expressing relationships • 3 ways to express relationships in Couchbase • Parent contains keys of children (outbound) • Children contain key of parent (inbound) • Both of the above (dual)
  • 8. Using JSON to Store Data • Document key: CBL2015
      {
        "Name": "Jane Smith",
        "DOB": "1990-01-30",
        "Billing": [
          { "type": "visa", "cardnum": "5827-2842-2847-3909", "expiry": "2019-03" },
          { "type": "master", "cardnum": "6274-2842-2847-3909", "expiry": "2019-03" }
        ],
        "Connections": [
          { "CustId": "XYZ987", "Name": "Joe Smith" },
          { "CustId": "PQR823", "Name": "Dylan Smith" }
        ],
        "Purchases": [
          { "id": 12, "item": "mac", "amt": 2823.52 },
          { "id": 19, "item": "ipad2", "amt": 623.52 }
        ]
      }
    The same data as relational tables (table labels on the slide: Customer, Billing, Connections, Purchases, Contacts):
      CustomerID | Name | DOB
      CBL2015 | Jane Smith | 1990-01-30

      CustomerID | Type | Cardnum | Expiry
      CBL2015 | visa | 5827… | 2019-03
      CBL2015 | master | 6274… | 2018-12

      CustomerID | ConnId | Name
      CBL2015 | XYZ987 | Joe Smith
      CBL2015 | SKR007 | Sam Smith

      CustomerID | item | amt
      CBL2015 | mac | 2823.52
      CBL2015 | ipad2 | 623.52
  • 9. Travel-Sample • Document types: airline, airport, route, landmark, hotel • Key reference: a route points to its airline by document key ("airlineid": "airline_24")
      key: airline_24 { "id": "24", "type": "airline", "callsign": "AMERICAN", "iata": "AA" }
      key: airport_3577 { "id": 3577, "type": "airport", "faa": "SEA", "icao": "KSEA" }
      key: route_5966 { "id": 5966, "type": "route", "airlineid": "airline_24", "sourceairport": "SEA" }
      key: landmark_21661 { "id": 21661, "type": "landmark", "country": "France", "email": null }
      key: hotel_25592 { "id": 25592, "type": "hotel", "country": "San Francisco", "phone": "+1 415 440-5600" }
  • 10. Travel-sample: Hotel Document • Document key: "hotel_25390"
      {
        "address": "321 Castro St",
        …
        "city": "San Francisco",
        "country": "United States",
        "description": "An upscale bed and breakfast in a restored house.",
        "directions": "at 16th",
        "geo": { "accuracy": "ROOFTOP", "lat": 37.7634, "lon": -122.435 },
        "id": 25390,
        "name": "Inn on Castro",
        "phone": "+1 415 861-0321",
        "price": "$95–$190",
        "public_likes": ["John Smith", "Joe Carl", "Jane Smith", "Kate Smith"],
        "reviews": [
          {
            "author": "Mason Koepp",
            "content": "blah-blah",
            "date": "2012-08-23 16:57:56 +0300",
            "ratings": { "Check in / front desk": 3, "Cleanliness": 3, "Location": 4, "Overall": 2, "Rooms": 2, "Service": -1, "Value": 2 }
          }
        ],
        "state": "California",
        "type": "hotel",
        "url": "http://www.innoncastro.com/"
      }
    Annotations: • city: attribute (key-value pair) • geo: object, 1:1 relationship • public_likes: array of strings, embedded 1:many relationship • reviews: array of objects, embedded 1:N relationship • ratings: object within an array
  • 11. N1QL Access Methods and Performance • Fastest to slowest, 1 to 5:
      # | Method | Description
      1 | USE KEYS | Document fetch, no index scan
      2 | Covered index scan | The query (or part of the query, during a JOIN) is processed with an index scan only
      3 | Index scan | Partial index scan, then fetches
      4 | JOIN | Fetch of the left-hand side, then fetches of the right-hand side
      5 | Primary scan | Full bucket scan, then fetches
  • 12. Child Representation and Access Method
      # | Child representation | Access method | Notes
      1 | Embedded | USE KEYS | Parent with children loaded via USE KEYS; child can be surfaced via UNNEST
      2 | Outbound relationship | JOIN | Parent contains child keys; children loaded via JOIN
      3 | Inbound relationship | Index scan | Children contain parent key; child.parent_key is indexed; index is scanned to load children
      4 | Not modeled | Primary scan | Relationship not explicitly modeled
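As a rough illustration of the first three child representations in the slide 12 table, the Python sketch below uses a plain dict as a stand-in for a bucket. The document keys, the field names (`comments`, `comment_keys`, `blog_key`), and the helper functions are all hypothetical; none of them come from the slides or from a Couchbase SDK.

```python
# In-memory stand-in for a bucket: document key -> JSON document.
# Every key and field name here is invented for illustration.
bucket = {
    # 1. Embedded: the children live inside the parent document.
    "blog_1": {"type": "blog", "title": "Embedded",
               "comments": [{"text": "first"}, {"text": "second"}]},
    # 2. Outbound: the parent holds the keys of its children.
    "blog_2": {"type": "blog", "title": "Outbound",
               "comment_keys": ["comment_21", "comment_22"]},
    "comment_21": {"type": "comment", "text": "third"},
    "comment_22": {"type": "comment", "text": "fourth"},
    # 3. Inbound: each child holds the key of its parent.
    "blog_3": {"type": "blog", "title": "Inbound"},
    "comment_31": {"type": "comment", "text": "fifth", "blog_key": "blog_3"},
    "comment_32": {"type": "comment", "text": "sixth", "blog_key": "blog_3"},
}


def embedded_children(parent_key):
    """A single key-value fetch returns the parent and all its children."""
    return bucket[parent_key]["comments"]


def outbound_children(parent_key):
    """Fetch the parent, then fetch each child by key (JOIN-like access)."""
    return [bucket[key] for key in bucket[parent_key]["comment_keys"]]


def build_parent_index():
    """Index on child.blog_key: parent key -> sorted list of child keys."""
    index = {}
    for key, doc in bucket.items():
        if "blog_key" in doc:
            index.setdefault(doc["blog_key"], []).append(key)
    return {parent: sorted(keys) for parent, keys in index.items()}


def inbound_children(parent_key, index):
    """Scan the index for the parent key, then fetch the matching children."""
    return [bucket[key] for key in index.get(parent_key, [])]


parent_index = build_parent_index()
print([c["text"] for c in embedded_children("blog_1")])
print([c["text"] for c in outbound_children("blog_2")])
print([c["text"] for c in inbound_children("blog_3", parent_index)])
```

The embedded case needs one lookup, the outbound case needs one lookup per child, and the inbound case needs an index before the children can be found at all, which mirrors the ordering of access methods on slide 11.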
  • 13. Query Execution
  • 14. NoSQL • Input: JSON documents (CUSTOMER, Orders, LoyaltyInfo), e.g. the Jane Smith customer document from slide 8 • Output: JSON documents (result documents)
  • 15. N1QL: Query Execution Flow • Participants: Clients, Query Service, Index Service, Data Service • 1. Submit the query over REST API • 2. Parse, analyze, create plan • 3. Scan request; index filters • 4. Get qualified doc keys • 5. Fetch request, doc keys • 6. Fetch the documents • 7. Evaluate: filter, join, aggregate, sort, paginate • 8. Query result
      SELECT c_id, c_first, c_last, c_max FROM CUSTOMER WHERE c_id = 49165;
      { "c_first": "Joe", "c_id": 49165, "c_last": "Montana", "c_max": 50000 }
  • 16. Inside a Query Service • Operator pipeline between the client and the result: Parse, Plan, Scan, Fetch, Join, Filter, Pre-Aggregate, Aggregate, Sort, Offset, Limit, Project • Scan talks to the Index Service; Fetch talks to the Data Service
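The eight-step flow on slide 15 can be mimicked with plain Python data structures. This is only a toy model: `data_service`, `index_service`, and `run_query` are hypothetical stand-ins rather than Couchbase APIs, the document keys are invented, and only the customer 49165 values come from the slide.

```python
# Toy model of the flow on slide 15: scan -> fetch -> filter -> project.

# "Data Service": document key -> document. Keys are made up.
data_service = {
    "cust_49165": {"c_id": 49165, "c_first": "Joe", "c_last": "Montana",
                   "c_max": 50000, "c_note": "not projected"},
    "cust_49166": {"c_id": 49166, "c_first": "Ann", "c_last": "Example",
                   "c_max": 1000, "c_note": "not projected"},
}

# "Index Service": index on c_id, mapping indexed value -> document keys.
index_service = {}
for doc_key, doc in data_service.items():
    index_service.setdefault(doc["c_id"], []).append(doc_key)


def run_query(c_id, fields):
    """SELECT <fields> FROM CUSTOMER WHERE c_id = <c_id>, in miniature."""
    # Steps 3-4: the scan request returns the qualifying document keys.
    doc_keys = index_service.get(c_id, [])
    # Steps 5-6: the fetch request retrieves the documents for those keys.
    fetched = [data_service[key] for key in doc_keys]
    # Step 7: evaluate. Re-apply the filter, then project the fields.
    matching = [doc for doc in fetched if doc["c_id"] == c_id]
    return [{name: doc[name] for name in fields} for doc in matching]


result = run_query(49165, ["c_id", "c_first", "c_last", "c_max"])
print(result)
```

Only one document is fetched here because the predicate is answered by the index; without the index, every document would have to be fetched and filtered.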
  • 17. Indexing Options
  • 18. Index Options
      # | Index type | Description
      1 | Primary index | Index on the document key, over the whole bucket
      2 | Named primary index | Gives the primary index a name; allows multiple primary indexes in the cluster
      3 | Secondary index | Index on a key-value or the document key
      4 | Secondary composite index | Index on more than one key-value
      5 | Functional index | Index on a function or expression over key-values
      6 | Array index | Indexes individual elements of arrays
      7 | Partial index | Indexes a subset of the items in the bucket
      8 | Covering index | The query can be answered using the data in the index alone and skips retrieving the item
      9 | Duplicate index | Not a type of index, but an indexing feature that allows load balancing, providing scale-out, multi-dimensional scaling, performance, and high availability
  • 19. Design Query & Index • Goal: get the lowest 6 to 15 ids between 0 and 1000 of the airlines in the "United States", along with the country and name values • Sample document: META().id: "airline_9833" { "callsign": "Epic", "country": "United States", "iata": "FA", "icao": "4AA", "id": 9833, "name": "Epic Holiday", "type": "airline" }
      Type of document | Count
      airline | 187
      airport | 1968
      route | 24024
      landmark | 4495
      hotel | 917
      total | 31591

      Predicate | Count
      type = "airline" AND country = "United States" | 127
      type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000 | 18
  • 20. Design Query & Index • Using the primary index • The data source has 31591 documents • The primary index scan gets all the document keys from the index; the query then fetches the documents, applies the predicates, sorts, and paginates to return 10 documents • Using a secondary index • The predicate (type = "airline") is pushed to the indexer; 187 documents are fetched • Two predicates are not pushed to the indexer: (country = "United States" AND id BETWEEN 0 AND 1000)
      SELECT country, id, name FROM `travel-sample` WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000 ORDER BY id LIMIT 10 OFFSET 5;
      CREATE INDEX ts_ix1 ON `travel-sample`(type);
  • 21. Design Query & Index • Composite index on all attributes in the query predicates • All predicates are pushed to the indexer; fetches 18+ documents
      CREATE INDEX ts_ix1 ON `travel-sample`(type, id, country);
    • Partial composite index • The document type can be an index condition • Because the document type check is an equality predicate covered by the index condition, remove it from the index keys • A leaner index performs better (saves I/O, memory, CPU, network)
      CREATE INDEX ts_ix1 ON `travel-sample`(id, country) WHERE type = "airline";
    • Covering partial composite index • Add all referenced attributes to the index keys, e.g. name • A covered query avoids the document fetch
      CREATE INDEX ts_ix1 ON `travel-sample`(id, country, name) WHERE type = "airline";
  • 22. Design Query & Index • ORDER BY optimization • The index stores data pre-sorted by the index keys • The ORDER BY list should match the index key list order, left to right • Exploit index order to avoid an additional fetch and sort • LIMIT pushdown to the indexer • Pushing LIMIT to the indexer improves efficiency and performance • Conditions: • Exact predicates are pushed down to the indexer • ORDER BY matches the index key order • The indexer evaluates all of the predicates • Unsupported: JOINs, GROUP BY
      "spans": [ { "Range": { "High": [ "1000", "successor(\"United States\")" ], "Inclusion": 1, "Low": [ "0", "\"United States\"" ] } } ]
  • 23. Design Query & Index • Optimizing ORDER BY with LIMIT • The query has an equality predicate on country and a range predicate on id • These exact predicates will produce exact results • Changing to ORDER BY country, id gives the same result, and LIMIT can be pushed down to the indexer • OFFSET pushdown • Pushed to the indexer as (limit + offset); the query then skips over the first offset entries
      CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";
      SELECT country, id, name FROM `travel-sample` WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000 ORDER BY country, id LIMIT 10 OFFSET 5;
  • 24. Final Query & Index
      CREATE INDEX ts_ix1 ON `travel-sample`(country, id, name) WHERE type = "airline";
      SELECT country, id, name FROM `travel-sample` WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000 ORDER BY country, id LIMIT 10 OFFSET 5;
  • 25. Design Query & Index • Goal: get the highest 6 to 15 ids between 0 and 1000 of the airlines in the "United States", along with the country and name values • Index & query (for numbers only):
      CREATE INDEX ts_ix1 ON `travel-sample`(country, -id, name) WHERE type = "airline";
      SELECT country, -(-id), name FROM `travel-sample` WHERE type = "airline" AND country = "United States" AND -id BETWEEN -1000 AND 0 ORDER BY country, -id LIMIT 10 OFFSET 5;
    • The same request written with DESC:
      SELECT country, id, name FROM `travel-sample` WHERE type = "airline" AND country = "United States" AND id BETWEEN 0 AND 1000 ORDER BY country, id DESC LIMIT 10 OFFSET 5;
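The ORDER BY, LIMIT, and OFFSET pushdown described on slides 22 to 25 can be sketched with a sorted list standing in for the index. This is a simplified model, not how the indexer is implemented: the sample documents are generated for illustration (they are not the real travel-sample data), `build_index` and `index_scan` are hypothetical helpers, and ids are assumed to be integers.

```python
import bisect

# Generated sample documents: ids 0, 50, ..., 1950, alternating countries.
docs = [
    {"type": "airline", "id": i * 50,
     "country": "United States" if i % 2 == 0 else "France",
     "name": f"Airline {i * 50}"}
    for i in range(40)
]
docs.append({"type": "airport", "id": 100,
             "country": "United States", "name": "Not an airline"})


def build_index(documents, negate_id=False):
    """Partial covering index on (country, id, name) WHERE type = "airline".

    With negate_id=True the key is (country, -id, name), as on slide 25.
    """
    sign = -1 if negate_id else 1
    return sorted((d["country"], sign * d["id"], d["name"])
                  for d in documents if d["type"] == "airline")


def index_scan(index, country, low, high, limit, offset):
    """Range scan on (country = ..., low <= key <= high), already sorted.

    Only limit + offset entries are read from the index; the caller side
    then skips the first `offset` of them. Assumes integer keys.
    """
    start = bisect.bisect_left(index, (country, low))
    stop = bisect.bisect_left(index, (country, high + 1))
    stop = min(stop, start + limit + offset)
    return index[start:stop][offset:]


asc_index = build_index(docs)
desc_index = build_index(docs, negate_id=True)

# Lowest 6 to 15 ids between 0 and 1000.
lowest = index_scan(asc_index, "United States", 0, 1000, limit=10, offset=5)

# Highest 6 to 15 ids: scan -id BETWEEN -1000 AND 0, then negate back.
highest = [(country, -neg_id, name) for country, neg_id, name in
           index_scan(desc_index, "United States", -1000, 0, limit=10, offset=5)]

print([entry[1] for entry in lowest])
print([entry[1] for entry in highest])
```

Because the index entries are already ordered by (country, id), no sort step is needed, and no document is fetched since name is part of the index key.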
  • 26. Index Design
  • 27. Advice on Index Design, Part 1 • Standard GSI: smaller number of mutations, larger index • MOI (memory-optimized index): 100% of the data in memory; large number of mutations; better performance • Avoid primary index scans in production • Avoid creating the primary index itself • A primary scan is the equivalent of a table scan • Make sure the query has the right predicates to select the right index • The query needs to have predicates on the leading index keys • Explore all combinations of index options • Divide and conquer with partial indexes; they support complex expressions • An index can have a large number of keys, with a maximum total key size of 4096 bytes • Create the index with predicate attributes as the leading keys, followed by non-predicate attributes for covering • If the query is not covered, the index keys should only be attributes used in query predicates
  • 28. Advice on Index Design, Part 2 • Index key order: • Attributes typically used with EQUALITY & IN predicates • Followed by BETWEEN ({<, <=} AND {>, >=}) • Followed by less than (<, <=) • Followed by greater than (>) • If the partial index condition has an equality predicate on a field, don't include that field in the index keys, to keep the index lean (4.5.0+) • META().id is always present. If META().id is not part of the predicate, don't include it in the index keys. • The only indexable META() field is META().id; all others require a fetch of the items • Remove unused indexes • If the index doesn't fit in memory (for MOI), use a partial index • If an index is heavily used, create a duplicate index • Add index nodes
  • 29. Query Tuning
  • 30. Advice on Query Performance • Use EXPLAIN to analyze the query plan • Index selection; spans that push down as many predicates as possible (the more the merrier) • Pushdown of LIMIT, OFFSET • Index order for ORDER BY • Covering index • Simple COUNT queries can take advantage of index count • Exploit the index for MIN queries • For ANY, ANY AND EVERY, and WITHIN predicates, use an ARRAY index • For UNNEST, use an ARRAY index; the array key has to be the leading key (only for UNNEST) • Use IN instead of WITHIN • Use pretty=false (4.5.1) and max_parallelism when queries return large result sets • Improve fetch performance by increasing pipeline-cap and pipeline-batch • Exploit array fetch by rewriting the query • Execute the query and explore each phase in the query monitoring stats • Monitor CPU and memory usage and adjust the number of Query Service nodes
  • 31. SELECT: JOIN • The JOIN operation stitches two keyspaces together • The JOIN criteria are based on the ON KEYS clause • The outer table uses an index scan, if possible • The inner table (brewery) is fetched document by document • 4.6 improves this by fetching in batches
      SELECT COUNT(1) FROM `beer-sample` beer INNER JOIN `beer-sample` brewery ON KEYS beer.brewery_id WHERE state = 'CA'
  • 32. SELECT: JOIN • Why not get all of the required document IDs from the index scan, then do a big bulk get on the outer table? • Two ways to do it: a) Use the array aggregate (ARRAY_AGG()) to create the list b) Use RAW to create the array, then use that to JOIN
      SELECT COUNT(1) FROM ( SELECT ARRAY_AGG(META().id) karray FROM `beer-sample` beer WHERE state = 'CA') as b INNER JOIN `beer-sample` brewery ON KEYS b.karray;
      SELECT COUNT(1) FROM ( SELECT RAW META().id FROM `beer-sample` beer WHERE state = 'CA') as blist INNER JOIN `beer-sample` brewery ON KEYS blist;
  • 33. DISTINCT
      1. SELECT DISTINCT type FROM `travel-sample`;
      2. SELECT MIN(type) FROM `travel-sample` WHERE type IS NOT MISSING;
      3. SELECT MIN(type) FROM `travel-sample` WHERE type > "airline";

      import requests

      url = "http://localhost:8093/query"
      s = requests.Session()
      s.auth = ('Administrator', 'password')

      # Statement 2: the smallest value of type in the index.
      query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING ;'}
      r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
      result = r.json()['results'][0]
      lastval = result['minval']

      while lastval is not None:
          print(lastval)
          # Statement 3: the next distinct value after the last one seen.
          stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
          query = {'statement': stmt}
          r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
          result = r.json()['results'][0]
          lastval = result['minval']
  • 34. GROUP, COUNT() • Step 1: Get the first entry in the index for type. • Step 2: COUNT() from the data set where type = first-value. • Step 3: Use the index to find the next value of type. • Step 4: Repeat steps 2 and 3 for all the values of type.
      SELECT type, count(type) FROM `travel-sample` GROUP BY type;
      SELECT type, count(type) FROM `travel-sample` WHERE type IS NOT MISSING GROUP BY type;
  • 35. GROUP, COUNT()
      import requests

      url = "http://localhost:8093/query"
      s = requests.Session()
      s.auth = ('Administrator', 'password')

      # Step 1: the first (smallest) value of type in the index.
      query = {'statement': 'SELECT MIN(type) minval FROM `travel-sample` WHERE type IS NOT MISSING ;'}
      r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
      result = r.json()['results'][0]
      lastval = result['minval']

      while lastval is not None:
          # Step 2: count the documents with this value of type.
          stmt = 'SELECT COUNT(type) tcount FROM `travel-sample` WHERE type = "' + lastval + '";'
          query = {'statement': stmt}
          r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
          result = r.json()['results'][0]
          tcount = result['tcount']
          print(lastval, tcount)
          # Step 3: the next value of type after the current one.
          stmt = 'SELECT MIN(type) minval FROM `travel-sample` WHERE type > "' + lastval + '";'
          query = {'statement': stmt}
          r = s.post(url, data=query, stream=False, headers={'Connection': 'close'})
          result = r.json()['results'][0]
          lastval = result['minval']
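The four steps on slide 34 can also be written as a small generator that is independent of the transport, which makes the loop easy to try without a running cluster. This is a sketch under assumptions: `distinct_counts`, `run_query`, and `fake_run_query` are hypothetical names, the `$val` placeholder assumes the query runner supports named parameters, and the fake runner simply replays the per-type document counts from slide 19.

```python
def distinct_counts(run_query, keyspace="`travel-sample`", field="type"):
    """Yield (value, count) pairs by hopping through the index value by value.

    run_query(statement, params) is a caller-supplied function that returns
    the list of result rows; how it reaches a server is up to the caller.
    """
    # Step 1: the first value in the index.
    rows = run_query(
        f"SELECT MIN({field}) AS minval FROM {keyspace} "
        f"WHERE {field} IS NOT MISSING", {})
    lastval = rows[0]["minval"]
    while lastval is not None:
        # Step 2: count the entries with the current value.
        rows = run_query(
            f"SELECT COUNT({field}) AS tcount FROM {keyspace} "
            f"WHERE {field} = $val", {"val": lastval})
        yield lastval, rows[0]["tcount"]
        # Step 3: the next value after the current one.
        rows = run_query(
            f"SELECT MIN({field}) AS minval FROM {keyspace} "
            f"WHERE {field} > $val", {"val": lastval})
        lastval = rows[0]["minval"]


# Stand-in for a query service, using the counts listed on slide 19.
SAMPLE_COUNTS = {"airline": 187, "airport": 1968, "route": 24024,
                 "landmark": 4495, "hotel": 917}


def fake_run_query(statement, params):
    """Answer the three statement shapes used above from SAMPLE_COUNTS."""
    if "COUNT(" in statement:
        return [{"tcount": SAMPLE_COUNTS.get(params["val"], 0)}]
    candidates = [value for value in SAMPLE_COUNTS
                  if "val" not in params or value > params["val"]]
    return [{"minval": min(candidates) if candidates else None}]


groups = list(distinct_counts(fake_run_query))
for value, count in groups:
    print(value, count)
```

Passing the value as a parameter instead of concatenating it into the statement also avoids quoting problems when a value contains a double quote.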
  • 36. Deployment & Configuration
  • 37. Deployment • Couchbase cluster services: Data, Index, Query, FTS, Analytics • Data Service • Enough RAM to cache reads • Enough disk to eventually persist writes • CPU primarily for Views and XDCR • At least 3 nodes; replication at the bucket level • Minimum requirements: 4GB RAM, 8 CPU cores • Index Service • Primarily RAM and disk I/O bound • ForestDB persistence engine • MOI: Memory Optimized Index • At least 2 nodes for HA; each index is replicated individually • Minimum requirements: 8GB RAM, 8 CPU cores, fast disk • Query Service • Primarily CPU bound • Very low disk requirements • At least 2 nodes for HA; queries are automatically load balanced by the Couchbase SDKs • Minimum requirements: 8GB RAM, 16+ CPU cores
  • 38. Deployment • Multi-Dimensional Scaling (MDS) • Option 1: All services enabled on all the nodes • Option 2: Separated services; node sizing depends on the workload
  • 39. Query Configuration • Read the current settings:
      curl -u Administrator:password http://localhost:8093/admin/settings > z.json
      { "completed-limit": 4000, "completed-threshold": 1000, "cpuprofile": "", "debug": false, "keep-alive-length": 16384, "loglevel": "INFO", "max-parallelism": 1, "memprofile": "", "pipeline-batch": 16, "pipeline-cap": 512, "request-size-cap": 67108864, "scan-cap": 0, "servicers": 32, "timeout": 0 }
  • 40. Query Configuration • Edit z.json (here pipeline-batch and pipeline-cap are raised) and post it back:
      { "completed-limit": 4000, "completed-threshold": 1000, "cpuprofile": "", "debug": false, "keep-alive-length": 16384, "loglevel": "INFO", "max-parallelism": 1, "memprofile": "", "pipeline-batch": 1024, "pipeline-cap": 4096, "request-size-cap": 67108864, "scan-cap": 0, "servicers": 32, "timeout": 0 }
      curl -u Administrator:password http://localhost:8093/admin/settings -XPOST -d@./z.json
  • 41. Query Configuration • Resulting settings:
      { "completed-limit": 4000, "completed-threshold": 1000, "cpuprofile": "", "debug": false, "keep-alive-length": 16384, "loglevel": "INFO", "max-parallelism": 1, "memprofile": "", "pipeline-batch": 1024, "pipeline-cap": 4096, "request-size-cap": 67108864, "scan-cap": 0, "servicers": 32, "timeout": 0 }
  • 42. Query Configuration • Updating parameters via ns_server changes the values permanently • The values survive a restart • You can change any of the parameters over the command line
      curl -X POST -u Administrator:<password> http://127.0.0.1:9000/diag/eval/ -d 'ns_config:set({node, node(), {query, extra_args}}, ["--pipeline-batch=1024", "--pipeline-cap=4096"])'
  • 43. Query Configuration • Query service command-line options:
      -acctstore="gometrics:": Accounting store address (http://URL or stub:)
      -certfile="": HTTPS certificate file
      -completed-limit=4000: maximum number of completed requests
      -completed-threshold=1000: cache completed query lasting longer than this many milliseconds
      -configstore="stub:": Configuration store address (http://URL or stub:)
      -cpuprofile="": write cpu profile to file
      -datastore="": Datastore address (http://URL or dir:PATH or mock:)
      -debug=false: Debug mode
      -enterprise=true: Enterprise mode
      -http=":8093": HTTP service address
      -https=":18093": HTTPS service address
      -keep-alive-length=16384: maximum size of buffered result
      -keyfile="": HTTPS private key file
      -logger="": Logger implementation
      -loglevel="info": Log level: debug, trace, info, warn, error, severe, none
      -max-parallelism=1: Maximum parallelism per query; use zero or negative value to disable
      -memprofile="": write memory profile to this file
      -metrics=true: Whether to provide metrics
      -mutation-limit=0: Maximum LIMIT for data modification statements; use zero or negative value to disable
      -namespace="default": Default namespace
      -order-limit=0: Maximum LIMIT for ORDER BY clauses; use zero or negative value to disable
      -pipeline-batch=16: Number of items execution operators can batch
      -pipeline-cap=512: Maximum number of items each execution operator can buffer
      -plus-servicers=256: Plus servicer count
      -pretty=true: Pretty output
      -readonly=false: Read-only mode
      -request-cap=1024: Maximum number of queued requests per logical CPU
      -request-size-cap=67108864: Maximum size of a request
      -scan-cap=0: Maximum buffer size for primary index scans; use zero or negative value to disable
      -servicers=64: Servicer count
      -signature=true: Whether to provide signature
      -ssl_minimum_protocol="tlsv1": TLS minimum version ('tlsv1'/'tlsv1.1'/'tlsv1.2')
      -static-path="static": Path to static content
      -timeout=0: Server execution timeout, e.g. 500ms or 2s; use zero or negative value to disable
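The edit step between slides 39 and 40 (read the settings, raise `pipeline-batch` and `pipeline-cap`, post the result back) can be done programmatically instead of by hand-editing z.json. The sketch below only builds the JSON payload; `tuned_settings` is a hypothetical helper, and the starting values are copied from slide 39 rather than read from a live server.

```python
import json

# Settings as returned on slide 39 (copied here, not fetched from a server).
current = {
    "completed-limit": 4000, "completed-threshold": 1000, "cpuprofile": "",
    "debug": False, "keep-alive-length": 16384, "loglevel": "INFO",
    "max-parallelism": 1, "memprofile": "", "pipeline-batch": 16,
    "pipeline-cap": 512, "request-size-cap": 67108864, "scan-cap": 0,
    "servicers": 32, "timeout": 0,
}


def tuned_settings(settings, overrides):
    """Return a copy of settings with overrides applied.

    Unknown names are rejected so that a typo such as "pipeline_batch"
    does not silently add a setting that did not exist before.
    """
    unknown = sorted(set(overrides) - set(settings))
    if unknown:
        raise KeyError(f"unknown settings: {unknown}")
    updated = dict(settings)
    updated.update(overrides)
    return updated


payload = json.dumps(
    tuned_settings(current, {"pipeline-batch": 1024, "pipeline-cap": 4096}),
    indent=2)
print(payload)
```

The printed JSON matches the document shown on slide 40 and could be saved to z.json and posted with the curl command from that slide.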
  • 44. Keshav Murthy, Director, keshav@couchbase.com • Sitaram Vemulapalli, Sr. Software Engineer, Sitaram.vemulapalli@couchbase.com

Editor's Notes

  • #2: There are three things important in databases: performance, performance, performance. From a simple query to fetch a document to a query joining millions of documents, designing the right data models and indexes is important. There are many indices you can create, and many options you can choose for each index. This talk will help you understand how to tune N1QL queries, exploit the various types of indices, analyze system behavior, and size the system correctly.
  • #9: So, finally, you have a JSON document that represents a CUSTOMER. In a single JSON document, the relationships between the data are implicit in the use of sub-structures, arrays, and arrays of sub-structures.
  • #17: Data-parallel: query latency scales with the number of cores. Memory-bound.
  • #23: The query doesn't have any JOINs, GROUP BY, or other clauses that can change the results produced by the indexer.