Alexander Tokarev
Oracle In-Memory from trenches
Speed-of-light faceted search
deep tech dive
NOUG
• Alexander Tokarev
• Age 38
• Database performance architect at DataArt:
1. Solution design
2. Performance tuning
3. POCs and crazy ideas
• First experience with Oracle - 2001, Oracle 8.1.7
• First experience with In-Memory solutions - 2015
• Lovely In-Memory databases:
1. Oracle InMemory
2. Exasol
3. Tarantool
• Hobbies: spearfishing, drums
Who am I
DataArt
Consulting, solution design, re-engineering
20 development centers
>2500 employees
Famous clients: NASDAQ
S&P
Ocado
JacTravel
Maersk
Regus, etc.
Who is my employer
• Faceted search
• Project
• Architecture
• Faceted search place
• Performance issues
• In-memory internals
• Implementation steps and traps
• Key findings
• Q&A
Agenda
Faceted search
Facet
Constraint
Facet count
Facet types
Values/Terms
Interval
Range
1. Filter by multiple taxonomies
2. Combine text, taxonomy terms and numeric values
3. Discover relationships or trends between objects
4. Make huge items volume navigable
5. Simplify "advanced" search UI
What for
• Tag-based
• Plain-structure based
Faceted search base
Object | Author | Category | Format | Price | Days to deliver
The Oracle Book: Answers to Life's Questions | Cerridwen Greenleaf | Fortune telling | Paper | 18 | 4
Ask a Question, Find Your Fate: The Oracle Book | Ben Hale | Fantasy | Kindle | 12 | 2
Object | Tags
Book 1 | Paper, Fortune telling, Very good book, worth reading
Book 2 | Ben Halle, Fantasy, Kindle
Tags
• Keyword assigned to an object
• Chosen informally and personally by item's creator or viewer
• Assigned by the viewer + unlimited = Folksonomy
• Assigned by the creator + limited = Taxonomy
1. Facet source: taxonomy + folksonomy
2. Facet types: terms mostly
3. Implementation type: tag-based
Our case facets
Our case statistics
Extracted entities: Objects, Tags, Tags of objects, Facet types
Date: 2016 to 2017
Tagged objects: 3 000 000
Applied tags: 42 000 000
Unique tags count: 100 000
Max tags count for an object: 15
Max tag length: 50
Facets count: 150
Data volume = StackOverflow x 3!
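The extracted entities (objects, tags, tags of objects, facet types) can be sketched as a minimal tag-based schema. All table and column names below are illustrative assumptions, not the project's actual DDL:

```sql
-- Hypothetical tag-based storage: objects, tags (taxonomy + folksonomy),
-- and a link table carrying the ~42,000,000 applied tags.
CREATE TABLE objects (
  object_id   NUMBER PRIMARY KEY,
  object_type VARCHAR2(50),
  name        VARCHAR2(200)
);

CREATE TABLE tags (
  tag_id     NUMBER PRIMARY KEY,
  facet_type VARCHAR2(50),   -- one of ~150 facets
  tag_name   VARCHAR2(50)    -- max tag length in our case is 50
);

CREATE TABLE object_tags (
  object_id NUMBER REFERENCES objects,
  tag_id    NUMBER REFERENCES tags
);
```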
Architecture
[Diagram: Client browser → Load balancer → Application servers (with In-memory cluster) → Oracle main → Oracle DG replica; 10000 RPS]
Architecture
~20 fine-tuned SQL queries
UI
Database structure
OLTP DWH
ONLY ACTIVE VALUES
Search table structure
[Chart: query response times, 0.001–1 s, logarithmic scale]
Search by PK
Search by 1 tag
Search by 5 tags
No In-Memory
Performance
Full Text Search server
Act III
To FTS, or not to FTS, that is the question
Solution design
Implementation
It's not quite our use case!
1. Limited POC resources: people + time
2. Customer has a license
3. Wide search table
4. A lot of rows
5. A lot of equal values:
• Object types
• Facet types
• Tag names
6. Size is fine for InMemory
7. Queries use a lot of aggregate functions
Why try it
Dual storage format!
Internals
Search table structure
Naive implementation
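Per the speaker notes, the naive setup amounted to carving out a 10 GB In-Memory area and assigning the search table to it without compression. A sketch (the table name is illustrative):

```sql
-- Reserve a 10 GB In-Memory column store (instance restart required).
ALTER SYSTEM SET INMEMORY_SIZE = 10G SCOPE = SPFILE;

-- Populate the search table in columnar format with no compression,
-- on the (naive) assumption that uncompressed data scans faster.
ALTER TABLE tag_search INMEMORY NO MEMCOMPRESS PRIORITY CRITICAL;
```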
InMemory size
Options | Volume, Gb
data in row format | 6.5
no compress InMemory | 7.2
Performance
Performance
Performance
Performance profit
2x <> 21x!
Where is our performance?!
InMemory internals
1. IMCU – InMemory Compression Unit
Size = 1 Mb
Columnar Data
2. SMU – Snapshot Metadata Unit
Size = 64 Kb
Zone Map Based Index
Zone Maps
ZoneMap on
State column
InMemory Zone Maps
Implementation
Implementation
InMemory Zone Maps
Doesn’t work!!!
InMemory Zone Maps
Zone Maps pruning fix
1. CREATE TABLE AS SELECT * FROM … ORDER BY
2. CLUSTERING BY LINEAR ORDER YES ON DATA MOVEMENT
+
ALTER TABLE MOVE ONLINE
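Spelled out, the two tricks look roughly like this (table and column names are illustrative; MOVE ONLINE and attribute clustering assume 12.2+ syntax):

```sql
-- Trick 1 (the old one): physically order data by the search column
-- so each IMCU holds a narrow min/max range for zone map pruning.
CREATE TABLE tag_search_sorted AS
  SELECT * FROM tag_search ORDER BY tag_name;

-- Trick 2 (the new one): attribute clustering plus a segment move,
-- which re-sorts the existing data during the data movement.
ALTER TABLE tag_search
  ADD CLUSTERING BY LINEAR ORDER (tag_name)
  YES ON DATA MOVEMENT;
ALTER TABLE tag_search MOVE ONLINE;
```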
InMemory size
Options | Volume, Gb | Load time, seconds
data in row format | 6.5 | 0
no compress InMemory | 7.2 | 40
memcompress for dml | 6 | 45
memcompress for query low | 4 | 45
memcompress for query high | 2.5 | 49
memcompress for capacity low | 3.5 | 48
memcompress for capacity high | 2 | 50
No significant difference in load time!
FastStart
3x loading speed boost!
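FastStart persists the populated columnar format to a designated tablespace, so after a restart the IM store is reloaded at file-read speed instead of being rebuilt from row data. Enabling it is a one-liner (the tablespace name is illustrative):

```sql
-- Persist IMCUs to the FS_TBS tablespace; after an instance restart
-- the column store is read back from disk rather than repopulated.
EXEC DBMS_INMEMORY_ADMIN.FASTSTART_ENABLE('FS_TBS');
```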
Columnar format
Test rerun
Where is our performance?!
Same metrics!
Faceted search SQL
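The real production SQL is under NDA, but a simplified facet-count query of the kind profiled here looks like this (all names are illustrative):

```sql
-- Count matching objects per constraint within each facet,
-- restricted by a constraint the user has already picked.
SELECT facet_type,
       tag_name,
       COUNT(DISTINCT object_id) AS facet_count
FROM   tag_search
WHERE  object_id IN (SELECT object_id
                     FROM   tag_search
                     WHERE  tag_name = 'Fantasy')
GROUP  BY facet_type, tag_name
ORDER  BY facet_count DESC;
```

Queries like this are heavy on aggregation and grouping, which is exactly what In-Memory aggregations target.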
InMemory aggregations
Sort of Bloom filter
Zone Maps and JOINs
1. Zone Maps – not efficient!
2. Key Vector transformation ignored – replaced by a Bloom filter!
The same for VARCHAR!
Join groups
Sort of Oracle JOIN CLUSTER
Join groups
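A join group is declared once and then used by the optimizer transparently; there is no hint to force it. A sketch (table and column names are illustrative):

```sql
-- Both tables must be INMEMORY and share a common dictionary on the
-- join column; the join can then run on dictionary codes without
-- decompressing the IMCUs.
CREATE INMEMORY JOIN GROUP tag_jg (tag_search(tag_id), tags(tag_id));
```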
Join groups limitations
1. JOIN condition must use only JOIN GROUP fields
2. Compression ratio must be equal for both tables
3. Simple, narrow data types required
4. Fails after big DML
5. Requires additional PGA
6. FastStart doesn't work
Hard to predict!
Table structure
1. Java code changed
2. All indexes dropped
Final table structure
Searchable via Full Text Search indexes!
Final table structure
+ all indexes are dropped
No InMemory
• Sporadic search degradation – 5-10%
• Happens during heavy DML activity
Performance spikes
1. Changed records -> Mark as stale
2. Stale records -> Read from row storage
• buffer cache
• disk
3. Stale records -> repopulate IMCU:
• Staleness threshold
• Trickle repopulation process - every 2 minutes
• processes count - INMEMORY_MAX_REPOPULATE_SERVERS
• processes utilization - INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT
Transaction processing
IMCU invalidation parameters
Row storage read evidence
Should equal 0!
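A quick way to spot row-store fallbacks is to check the session-level IM scan statistics after running a query; the cache-read counters should stay at zero. A sketch (the LIKE patterns hedge over the exact statistic names):

```sql
-- Non-zero values mean stale rows were fetched from the buffer cache
-- or from disk instead of the In-Memory column store.
SELECT n.name, s.value
FROM   v$mystat s
JOIN   v$statname n ON n.statistic# = s.statistic#
WHERE  n.name LIKE 'IM scan rows cache%'
   OR  n.name LIKE 'IM scan blocks cache%';
```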
Repopulation process
Repopulation process
Default values
1. INMEMORY_MAX_REPOPULATE_SERVERS = 4
2. INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT = 8
Performance spikes elimination
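The fix that made the spikes disappear was simply giving repopulation more resources; the values below are the ones from this workload, not universal recommendations:

```sql
-- Allow up to 4 background servers for IMCU repopulation...
ALTER SYSTEM SET INMEMORY_MAX_REPOPULATE_SERVERS = 4;
-- ...and let trickle repopulation use up to 8% of their time.
ALTER SYSTEM SET INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT = 8;
```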
Troubleshooting
More than 500 statistics!
More than 190 parameters!
Troubleshooting
8 statistics cover 90% of cases!
Troubleshooting
3 parameters cover 90% of cases!
Troubleshooting
View name | Description
V$IM_COL_CU | Detailed SMU information per column
V$IM_SMU_HEAD + V$IM_HEADER | SMU header statistics
V$IM_SEGMENTS | InMemory segment parameters
4 views cover 90% of cases!
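For a first look, V$IM_SEGMENTS alone answers the most common question: is everything actually populated? A sketch:

```sql
-- BYTES_NOT_POPULATED > 0 means part of the segment is still being
-- (or has failed to be) loaded into the column store.
SELECT segment_name,
       inmemory_size,
       bytes_not_populated,
       populate_status
FROM   v$im_segments;
```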
Performance with In-Memory final
Performance with In-Memory final
Performance with In-Memory final
• InMemory size <> table data size
• All data InMemory <> High performance
• Takes decent time to load after restart
• 8 IM statistics are enough
• Big updates => full InMemory repopulation
• Small updates => trickle repopulation
• ROLLBACK after DML = repopulation
• Align trickle parameters with the workload
DBA findings
• Advanced IM features <> significant profit
• Zone maps = numeric and date data types only
• Dictionary pruning – not in Oracle 12.2 and 18
• Simple data types = High performance
• High compression <> Slow ingestion
Developer findings
• Extra memory
• POC with IM DWH
• 523 IM statistics + 190 parameters
• FastStart options
• 18c features
Further plans
• Always try and measure
• IM works for short queries as well
• Understanding of IM internals is a must
• Application changes are required
• No extra software/hardware introduced
• Fast POC followed by production deployment
• 4x performance boost
• Huge license costs
Conclusion
Thank you for your time!
Alexander Tokarev
Database expert
DataArt
shtock@mail.ru
https://github.com/shtock
https://www.linkedin.com/in/alexander-tokarev-14bab230
Editor's Notes
  • #2: Hello, everyone. My name is Alex. Have you ever used the Oracle In-Memory option to get a performance profit? Perfect. Today we are going to discuss how we improved faceted search performance with the Oracle In-Memory option for one of our clients.
  • #3: Let me introduce myself. I'm a performance architect and my main focus is data warehousing for big clients. I have a lot of Oracle experience, but I work with the Exasol and Tarantool databases as well.
  • #4: My employer is a big global consulting company. We have a lot of famous clients in many industries, and you probably know some of them.
  • #5: We have a rather straightforward agenda for today's discussion. We'll discuss what faceted search is, give a project overview with its performance issues, and I'll try to explain how the Oracle In-Memory option is implemented internally, but without digging too deep into details. My main focus is how we made the customer happy; I'll also share the issues you could face yourselves. And of course I'll answer your questions.
  • #6: Everybody knows what faceted search is. We have groups named facets, values named constraints, and figures named facet counts.
  • #7: Common facet types are terms, intervals, and ranges.
  • #8: What do we need faceting for? We need facets if users need to filter content using multiple taxonomy terms at the same time, want to combine text searches, taxonomy term filtering, and other search criteria, are trying to discover relationships or trends between contents, need to navigate inside tons of data, or are already frightened by "advanced" search forms in enterprise applications.
  • #9: There are 2 common approaches to storing data for facets: tag-based, where we store term facets in a list of somehow-delimited values, or plain-based, where we store all values in a wide table. The latter is used mostly for range facets.
  • #10: The first option to implement faceted search is tags. Everyone knows what tags are, but let me remind you. A tag is a keyword assigned to an object. Mostly it is chosen informally. If it is assigned by the viewer and the tags count is unlimited, it is a folksonomy. When assigned by the creator and the set of tags is limited, it is a taxonomy.
  • #11: We use taxonomy and folksonomy simultaneously, so most of our facets are terms and we chose tag-based storage.
  • #12: Let me show some statistics. We extract objects, tags, and facets for 2 years. We have about 3,000,000 objects tagged, 42,000,000 terms applied, 100,000 unique terms, a maximum tags count of 15, and about 150 facets. Is this a decent volume or not? I think it is, because we are three times bigger than StackOverflow by tag count.
  • #13: We have an ordinary enterprise architecture: web browser, balancer, a set of application servers which deal with an in-memory grid, and Oracle with a replica on the backend. Pay attention that there is no full text search server: it was prohibited by customer policies.
  • #14: We have a microservice architecture, and one of the services is the tagging service, which is responsible for data ingestion and faceted search. It is widely used by many other services and receives about 10,000 requests per second. To serve search requests there are about 20 fine-tuned SQL queries.
  • #15: I can't show you real screenshots from the application because of an NDA, so I implemented the mockup in Paint. As you can see, the UI is full of features: predefined faceted search templates, full text search by random tags. We can search via audit information, and we have faceted navigation, for sure. We can filter out objects where some tags are present (not tag values), and there is user paging. It is a rather sophisticated UI.
  • #16: In terms of storage, we store tags in denormalized form in JSON and in a wide relational table for the sake of performance.
  • #17: We can look into the table in detail. It contains tagged object types, facet information, tag information, and values. There are denormalized object name and object description tags because they are used mostly in full text search.
  • #18: Let's look at the performance metrics. I used a logarithmic scale to put everything in one diagram. At first glance the figures look good: we do complicated searches in less than 0.2 seconds. But actually it is bad, because we need to serve more and more users and data, so queries become slow.
  • #19: According to one of my other presentations, the best option for faceted search is a dedicated full text search server, but we lacked the time to implement it and were reluctant to introduce additional layers of complexity. Moreover, Oracle 18 has its own faceted search implementation in the Oracle Text feature. We even did a proof of concept for that option and were happy, but we use 12.1.
  • #20: We started looking into In-Memory solutions, especially since the customer already had an Oracle In-Memory license. According to the Oracle documentation it should be used for analytical queries, which isn't quite our case, but we decided to try.
  • #21: Why? 1. Because we were limited. 2. The customer has a license. 3. The table is rather wide, full of data with a lot of equal values. 4. So it will fit in memory, especially given the compression features. 5. And last but not least, the queries are full of aggregate functions.
  • #22: According to the Oracle documentation, the In-Memory feature is a sort of silver bullet, so we decided not to waste time understanding how it works internally. What we knew was that the option stores row and columnar data in dual mode, maintaining their consistency automatically.
  • #23: Let me remind you our table
  • #24: We set the In-Memory area size to 10 gigabytes and assigned the table to the area. We decided that performance would be faster without compression, so the NO MEMCOMPRESS option was used.
  • #25: It is significant to stress that In-Memory brought some overhead without compression.
  • #26: So we started our tests and found that the difference in performance was extremely insignificant, which is expected for search by primary key.
  • #27: But for search by 1 tag
  • #28: And for search by 5 tags we don't observe a huge performance profit either.
  • #29: When we started our POC, we relied on the Lufthansa use case, where they got an average 21x performance profit, but our profit was 10 times less. It meant we were doing something wrong and needed to understand how In-Memory works internally.
  • #30: Do you recall I mentioned the In-Memory area? It actually consists of 2 parts. The first one is the IMCU (In-Memory Compression Unit), which stores data in columnar format, and the second is the SMU (Snapshot Metadata Unit), which stores transaction-related metadata, dictionaries, and zone maps. Do you know what a zone map is?
  • #31: Let's look at a simplified example. We have a sales table which is split into 1 Mb IMCU blocks. Each block stores the minimum and maximum values for each column, so when we need to search sales for California we scan the zone maps and check whether our value falls within the min/max boundaries, so we do not scan extra blocks.
  • #32: I converted the SMU zone maps into a readable format. As you can see, they are stored on a per-column basis, as I said before. You see numeric zone maps and character ones. Please pay attention that the dictionary_entries column contains only zeros.
  • #33: As we understood, this happens because compression was not switched on.
  • #34: Let’s enable the compression
  • #35: We saw that dictionary entries had been populated, which should give dictionary-based pruning according to the documentation. I presume it happened because the zone maps have all values from A to Z in one IMCU. Moreover, I haven't found any statistic which could distinguish the CU pruning reason: whether it is dictionary-based or zone-map-based.
  • #36: We saw that dictionary entries had been populated, which should give dictionary-based pruning according to the documentation. Moreover, the screenshot gives us a very interesting case where zone maps cannot be used. Try to guess why. Because the IMCU stores all possible character values, the pruning will not be efficient at all.
  • #37: To fix it I found 2 tricks: the old one and the new one. Any ideas what I'm talking about? The trick is to arrange the data in a proper fashion by the field we use for search.
  • #38: The compression ratio is rather high, but it depends on your data, for sure. You can see that compression doesn't affect ingestion, so high compression can be used as the default. Frankly speaking, loading 6 gigabytes in 1 minute is too much, so we decided to look at the FastStart feature.
  • #39: The feature writes the In-Memory area in columnar format, and once the database is rebooted it loads the data into memory at the speed of file reading, so we got a 3x loading profit.
  • #40: It is time to rerun our tests. Try to guess which performance profit we got once we used the high compression ratio. Actually, no profit at all. Let's continue digging into the details.
  • #41: Let's look at one of the simplified SQL queries used for faceted search. It has some aggregate functions and a GROUP BY clause.
  • #42: To improve facet count and search calculation, In-Memory aggregations should be used. They create a key vector which is used in the join process, as well as doing aggregate function calculations on the fly.
  • #43: If we recall the denormalized table structure, we see that most of the fields have a binary type, because we use globally unique identifiers as keys. So zone maps aren't used, and the key vector transformation is replaced by a Bloom filter. The same happens if the keys are varchar.
  • #44: To address the challenge we decided to use join groups. It is a sort of join cluster, which I've never seen in production, though. Join groups are intended to implement joins without decompressing the In-Memory Compression Unit, based on dictionary entries.
  • #45: Let's create a join group. There are no statistics to show whether join groups were used or not, so you have to parse Real-Time SQL Monitor output to find out.
  • #46: After some investigation we decided not to use join groups because they work with the simplest join conditions only, the compression ratio should be equal, only simple data types are allowed (joins by big varchar2 fields don't work), join groups are ignored if a lot of information is loaded into In-Memory tables, more PGA is required by sessions, and FastStart refuses to work.
  • #47: So we changed all foreign-key field data types to number, which required changes in our application, and we removed all indexes.
  • #48: We understood that there was no reason to store all columns in memory, because we search by some of them via Oracle full text search indexes.
  • #49: So the final structure is simple data types without b-tree indexes, and 2 fields are removed from In-Memory storage.
  • #50: We did some performance tests and found sporadic search performance degradation, which happens when records are inserted or updated.
  • #51: Let's understand how Oracle provides read consistency for In-Memory. IM introduces a third mechanism of read consistency: the first one is the ordinary one with rollback segments, the second is latches for the Result Cache, and In-Memory has a mechanism very close to Postgres. Once a record is changed, it is marked as stale. All stale records are read either from the buffer cache or from disk. Oracle repopulates stale records by a staleness threshold or every 2 minutes by the so-called trickle repopulation process. It can run in parallel, plus we can state how many resources can be consumed for stale record repopulation.
  • #52: I suspect these parameters are responsible for IMCU repopulation, but a substring search gives us about 34 undocumented parameters, so I'm reluctant to change any of them.
  • #53: To understand where data was scanned from, there are 2 statistics: scan row cache and scan block cache. Both of them should be equal to zero.
  • #54: To monitor the repopulation process, use the v$im_smu_head view.
  • #55: The second view is v$im_header. Let's look at the details: there is a very interesting case here. Initially we repopulate a small number of blocks with the trickle repopulation process, but 2 seconds later another trickle repopulation run identifies that too many rows are affected and it is better to repopulate the data block completely.
  • #56: Let's look at what we have by default: Oracle allocates 2 parallel servers and spends 1 percent of their resources on the trickle process.
  • #57: As soon as we set 4 repopulation servers and 8 percent of time to repopulate stale records, the performance spikes disappeared.
  • #58: The In-Memory option is very easy to trace and maintain, but let me share some secret information with you: we needn't know all of it at all.
  • #59: 8 statistics cover the best part of In-Memory performance tuning. Let me share one more secret: only 1 statistic is enough. Try to guess which one and why. I can state that the query I took these statistics from is inefficient. It shows how efficiently zone map indexes are used.
  • #60: So 3 settings cover the best part of DBA activities.
  • #61: And 2 views give us enough information for a basic In-Memory understanding.
  • #62: Let's look at the final figures. No performance profit for the PK scan, which is expected.
  • #63: With a proper In-Memory implementation we have a 4x performance profit for one-tag search.
  • #64: And a 5x performance profit for search by 5 tags, so it is time to move to the conclusion.
  • #65: Let's elaborate on the DBA findings. The In-Memory area size should be estimated taking into account compression and your data. Having all data in memory doesn't mean we are extremely fast. In-Memory takes time to be loaded after a server reboot. 8 statistics are enough to help developers with the best part of their issues. Align trickle process settings with your workload.
  • #66: What we found is that in our use case extremely advanced In-Memory features like join groups and In-Memory aggregations haven't brought a significant performance profit. Zone maps work perfectly with numeric and date data types. Dictionary-based pruning doesn't work, in our Oracle version at least. The simpler the data types, the more performance we get. It is extremely significant that even high compression doesn't affect ingestion speed. A different read consistency model means that even rollback statements can slow down your queries.
  • #67: So our plan is rather straightforward: add memory, test the main purpose In-Memory was intended for, try to understand all the statistics, enable FastStart in production, and play with Oracle 18 cool features like lightweight parallel In-Memory scans.
  • #68: Our main conclusion is that you shouldn't rely on documentation and experts: try and measure your particular use case. That can prove that IM works with short queries as well. You must understand how it works internally, otherwise you will not get a performance profit. Unfortunately, the application had to be changed, but no extra layers, message queues, or hardware needed to be added. We managed to put our IM POC into production very fast (in a week or so), and the business got a 4x performance boost, while spending a lot of money on licenses. That's it from my side for today, so...
  • #69: Thank you for your time.