(UN)ORTHODOX
THE JOURNEY OF AN OPTIMIZATION
SPEED
SCALE
SERVICE
SIAN LERK LAU
linkedin.com/in/sianlerk
@kiawin
A little bit about Jewel…
We’re a Singapore-based company with offices in KL and HK too.
We’re hiring!
We’re always looking for talented developers, data scientists, data engineers, devops engineers and more!
Check out our Careers page at
https://www.jewelpaymentech.com/careers.html
Jewel Paymentech
© Copyright 2018 Jewel Paymentech Pte. Ltd. – Reproduction and distribution of this presentation without written permission is prohibited.
VISION
JOIN THE TEAM
Confidential
Python Developers
Data Scientists
Data Engineers
And many more…
Scan the QR code to
visit our Careers
page:
BACKGROUND
(OUR STORY)
Mohamad Yusri Shaharin Woon Wai Keen
The journey of an (un)orthodox optimization
SPEED.
SCALE.
SERVICE.
THAT’S A LOT OF WORK TO DO
PROJECT 300s.
4 PROBLEMS
1. SINGLE QUEUE
2. 1NF, 2NF, 3NF, BCNF, 4NF
THE IDEALS
THE REALITY
3. *SQL.
(our data)
SIZE DOES MATTER
result:
+----------------------+
| count(DISTINCT r.id) |
+----------------------+
|               294493 |
+----------------------+
1 row in set (4.58 sec)
JOIN PARTY!
MariaDB> SELECT count(DISTINCT r.id)
FROM resource r
LEFT JOIN resource_location rl ON (r.id = rl.resource AND ...)
LEFT JOIN `ssl` s ON (r.ssl = s.id)
LEFT JOIN location ON (rl.location = location.id AND ...)
LEFT JOIN user pub ON (r.publisher = pub.id AND ...)
LEFT JOIN user resOp ON (pub.principal = resOp.id AND ...)
LEFT JOIN user edgeOp ON (location.operator = edgeOp.id AND ...)
LEFT JOIN peer ON (location.id = peer.location AND peer.buyer = resOp.id)
LEFT JOIN edge ON (location.id = edge.location AND ... AND
    (edge.operator = resOp.id OR (edge.operator = peer.seller AND ...)))
LEFT JOIN location edgeLocation ON (edgeLocation.id = edge.location);
(resource_location_sql)
MORE JOIN PARTIES!
MariaDB> SELECT count(DISTINCT r.id)
FROM resource r
JOIN resource_edgeGroup reg ON (r.id = reg.resource AND ...)
LEFT JOIN `ssl` s ON (r.ssl = s.id)
LEFT JOIN edgeGroup eg ON (eg.id = reg.edgeGroup AND ...)
LEFT JOIN edgeGroup_location egl ON (eg.id = egl.edgeGroup)
LEFT JOIN location ON (egl.location = location.id AND ...)
LEFT JOIN user pub ON (r.publisher = pub.id AND ...)
LEFT JOIN user resOp ON (pub.principal = resOp.id AND ...)
LEFT JOIN user edgeOp ON (location.operator = edgeOp.id AND ...)
LEFT JOIN peer ON (location.id = peer.location AND peer.buyer = resOp.id)
LEFT JOIN edge ON (location.id = edge.location AND ... AND
    (edge.operator = resOp.id OR (edge.operator = peer.seller AND ...)))
LEFT JOIN location edgeLocation ON (edgeLocation.id = edge.location);
(resource_edgegroup_sql)
THE COST OF THE PARTIES
result:
1272632 rows in set (6 min 34.43 sec)
3+. MySQL FTW.
IT’S ALL ABOUT INDEXING
MariaDB> show create table resource\G
*************************** 1. row ***************************
Table: resource
Create Table: CREATE TABLE `resource` (
...
PRIMARY KEY (`id`),
UNIQUE KEY `publishedName_extant` (`publishedName`,`extant`),
KEY `publisher` (`publisher`),
KEY `pushOrigin` (`pushOrigin`),
KEY `ssl` (`ssl`),
...
)
1 row in set (0.00 sec)
Temporary Table is Faster *
CREATE OR REPLACE TEMPORARY TABLE active_resource (
PRIMARY KEY(`id`),
KEY `publisher` (`publisher`),
KEY `pushOrigin` (`pushOrigin`),
KEY `ssl` (`ssl`)
) AS (
SELECT ...
FROM resource
WHERE ... AND extant = 1
);
CREATE OR REPLACE TEMPORARY TABLE active_peer (
...
) AS (
...
);
CREATE OR REPLACE TEMPORARY TABLE active_edge (
...
) AS (
...
);
CREATE OR REPLACE TEMPORARY TABLE temp_result AS (
SELECT DISTINCT
r.id AS resource,
...
FROM active_resource r
JOIN resource_edgeGroup reg ON (r.id = reg.resource)
LEFT JOIN `ssl` s ON (r.ssl = s.id)
LEFT JOIN edgeGroup eg ON (eg.id = reg.edgeGroup)
LEFT JOIN edgeGroup_location egl ON (eg.id = egl.edgeGroup)
LEFT JOIN location ON (egl.location = location.id AND ...)
LEFT JOIN user pub ON (r.publisher = pub.id AND ...)
LEFT JOIN user resOp ON (pub.principal = resOp.id AND ...)
LEFT JOIN user edgeOp ON (location.operator = edgeOp.id AND ...)
LEFT JOIN active_peer peer ON (location.id = peer.location AND ...)
LEFT JOIN active_edge edge ON (location.id = edge.location AND
(edge.operator = resOp.id OR (edge.operator = peer.seller AND ...))
)
LEFT JOIN location edgeLocation ON (edgeLocation.id = edge.location)
WHERE eg.status = "ACTIVE"
AND location.status = "ACTIVE" AND location.extant = 1
AND pub.status IN ("RESTRICTED", "ACTIVE") AND pub.extant = 1
AND resOp.status = "ACTIVE" AND resOp.extant = 1
AND edgeOp.status IN ("RESTRICTED", "ACTIVE") AND edgeOp.extant = 1
);
FOR ALL WE KNOW ABOUT MYSQL
(correction: MariaDB)
result:
Query OK, 22127 rows affected (1.09 sec)
Query OK, 11183 rows affected (0.44 sec)
Query OK, 404 rows affected (0.02 sec)
Query OK, 966172 rows affected (3 min 50.94 sec)
PROFILING MYSQL
mysql> SET PROFILING=1;
mysql> SELECT * FROM resource;
mysql> SHOW PROFILES;
+----------+----------+--------------------------+
| Query_ID | Duration | Query                    |
+----------+----------+--------------------------+
|        0 | 0.000088 | SET PROFILING = 1        |
|        1 | 0.000136 | SELECT * FROM resource   |
+----------+----------+--------------------------+
2 rows in set (0.00 sec)
Disclaimer: Adapted from https://dev.mysql.com/doc/refman/8.0/en/show-profile.html
PROFILING MYSQL
mysql> SHOW PROFILE FOR QUERY 1;
+--------------------+----------+
| Status             | Duration |
+--------------------+----------+
| query end          | 0.000107 |
| freeing items      | 0.000008 |
| logging slow query | 0.000015 |
| cleaning up        | 0.000006 |
+--------------------+----------+
4 rows in set (0.00 sec)
Disclaimer: Adapted from https://dev.mysql.com/doc/refman/8.0/en/show-profile.html
PROFILING MYSQL
mysql> SHOW PROFILE CPU FOR QUERY 1;
+----------------------+----------+----------+------------+
| Status               | Duration | CPU_user | CPU_system |
+----------------------+----------+----------+------------+
| checking permissions | 0.000040 | 0.000038 |   0.000002 |
| creating table       | 0.000056 | 0.000028 |   0.000028 |
| After create         | 0.011363 | 0.000217 |   0.001571 |
| query end            | 0.000375 | 0.000013 |   0.000028 |
| freeing items        | 0.000089 | 0.000010 |   0.000014 |
| logging slow query   | 0.000019 | 0.000009 |   0.000010 |
| cleaning up          | 0.000005 | 0.000003 |   0.000002 |
+----------------------+----------+----------+------------+
7 rows in set (0.00 sec)
Disclaimer: Adapted from https://dev.mysql.com/doc/refman/8.0/en/show-profile.html
4. “Self-Documenting” Code.
WHICH ONE?
REALITY HURTS
(if not your eyes)
SETTING OUR BASELINE
sianlerk@host:~$ time python oldbdbtest.py
Resource Locations: 23695
Resource Locations (non-empty): 18718
Resource Type: 23695
name=oldbdbtest.py level=INFO {"msg": "Store resource completed",
"completed_ms": 347310}
real 5m48.521s
user 0m43.167s
sys 0m1.264s
#notbad #maybe
if … not in …
resource_id = x['resource']
if resource_id not in existing_resource_locations:
    existing_resource_locations[resource_id] = []
Big-O NOTATION
1. if … not in …
resource_id = x['resource']
if resource_id not in existing_resource_locations:
    existing_resource_locations[resource_id] = []
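O(1) on average when the container is a dict, but two hash lookups (test, then assign); O(n) only if the container is a list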
1. USE setdefault FOR NEW value
# before
resource_id = x['resource']
if resource_id not in existing_resource_locations:
    existing_resource_locations[resource_id] = []
# after
resource_id = x['resource']
existing_resource_locations.setdefault(resource_id, [])
O(1)
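A minimal sketch of the `setdefault` pattern in use, assuming `existing_resource_locations` is a plain dict and each row `x` is a dict-like record. The sample rows and the `location` field are made up for illustration. Because `setdefault` returns the stored list, the append can be chained onto the same call.

```python
# Hypothetical sample rows; in the talk these come from a database cursor.
rows = [
    {"resource": 1, "location": "sg"},
    {"resource": 1, "location": "hk"},
    {"resource": 2, "location": "kl"},
]

existing_resource_locations = {}
for x in rows:
    resource_id = x["resource"]
    # Insert an empty list the first time a key is seen, then append to it.
    existing_resource_locations.setdefault(resource_id, []).append(x["location"])

print(existing_resource_locations)
```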
2. NO set for uniqueness
resource_locations = set()
O(n)
2. USE DISTINCT for uniqueness
resource_locations = set()
SELECT DISTINCT id ...
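To illustrate pushing uniqueness into the query, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for MariaDB. The rows are invented for the example, and the two-column table is only an assumption about the shape of `resource_location`.

```python
import sqlite3

# In-memory SQLite database standing in for the real MariaDB server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_location (resource INTEGER, location INTEGER)")
conn.executemany(
    "INSERT INTO resource_location VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10)],  # note the duplicated (1, 10)
)

# The database removes duplicates, so no Python-side set() is needed.
rows = conn.execute(
    "SELECT DISTINCT resource, location FROM resource_location "
    "ORDER BY resource, location"
).fetchall()
conn.close()

print(rows)
```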
3. USE MORE dict FOR COMMONLY USED VALUES
while True:
    x = cursor.fetchone()
    if not x:
        break
    resource_id = x['resource']
    resource_operator[resource_id] = x['resourceOperator']
    resource_type[resource_id] = x["type"]
O(1)
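The loop above can be tried end to end with a sketch like the following. It assumes a DB-API cursor whose rows can be indexed by column name; `sqlite3` with `sqlite3.Row` is used here purely as a stand-in for the real MariaDB connection, and the table `r` and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets rows be indexed by column name
conn.execute("CREATE TABLE r (resource INTEGER, resourceOperator INTEGER, type TEXT)")
conn.executemany("INSERT INTO r VALUES (?, ?, ?)", [(1, 7, "http"), (2, 8, "stream")])

resource_operator = {}
resource_type = {}
cursor = conn.execute("SELECT resource, resourceOperator, type FROM r")
for x in cursor:  # iterating the cursor replaces the while/fetchone loop
    resource_id = x["resource"]
    resource_operator[resource_id] = x["resourceOperator"]
    resource_type[resource_id] = x["type"]
conn.close()

print(resource_operator, resource_type)
```

Later stages can then look up an operator or type by resource id without going back to the database.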
4. USE get
if resource_is_http:
    edges = (mart_location_http_edges.get(location_id, {}) if is_marketplace
             else location_http_edges.get(location_id, {}))
elif resource_is_stream:
    edges = (mart_location_stream_edges.get(location_id, {}) if is_marketplace
             else location_stream_edges.get(location_id, {}))
else:
    continue
O(1)
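A small sketch of `dict.get` with a default, assuming the lookup tables are dicts keyed by location id. The table contents and the helper `http_edges` are hypothetical and exist only to show the behavior for a missing key.

```python
# Hypothetical lookup tables keyed by location id.
location_http_edges = {10: {"edge-a": 1}}
mart_location_http_edges = {11: {"edge-m": 1}}


def http_edges(location_id, is_marketplace):
    """Return the edges for a location, or an empty dict if it has none."""
    table = mart_location_http_edges if is_marketplace else location_http_edges
    # get() returns the default instead of raising KeyError for a missing key.
    return table.get(location_id, {})


print(http_edges(10, False))
print(http_edges(10, True))
```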
5. MORE dict REPLACING JOIN
# The problematic resource_edgegroup_sql
- resource
- ssl
- edgeGroup
- location
- user, user, user (!!)
- peer
- edge
- location (again)
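The idea of replacing JOINs with dict lookups can be sketched as follows. This is not the talk's actual code: the rows, the `SUSPENDED` status and the variable names are invented, and each list stands for the result of one simple single-table query.

```python
# Each "table" is fetched with its own simple query, then indexed by id.
resources = [{"id": 1, "publisher": 100}, {"id": 2, "publisher": 101}]
users = [{"id": 100, "status": "ACTIVE"}, {"id": 101, "status": "SUSPENDED"}]
resource_edgegroup_rows = [(1, 5), (1, 6), (2, 5)]  # (resource, edgeGroup) pairs

user_by_id = {u["id"]: u for u in users}

edgegroups_by_resource = {}
for resource_id, edgegroup_id in resource_edgegroup_rows:
    edgegroups_by_resource.setdefault(resource_id, []).append(edgegroup_id)

# The "join": one dict lookup per row instead of a database JOIN.
result = {}
for r in resources:
    publisher = user_by_id.get(r["publisher"])
    if publisher is None or publisher["status"] != "ACTIVE":
        continue  # mirrors a status filter on the publisher
    result[r["id"]] = edgegroups_by_resource.get(r["id"], [])

print(result)
```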
PROCESS OF ELIMINATION
(by isolation)
LO AND BEHOLD
(5th Attempt)
sianlerk@host:~$ time python bdb-v5.py
Resource Locations: 19071
Resource Locations (non-empty): 18730
Resource Operator: 19071
Resource Type: 19071
real 0m47.964s
user 0m34.398s
sys 0m0.804s
#notbad
GOOD JOB
HOWEVER
FASTER?
6. FULLY DECONSTRUCT JOIN
# The problematic resource_edgegroup_sql
- resource
- ssl
- edgeGroup
- location
- user, user, user (!!)
- peer
- edge
- location (again)
7. USE try … except … *
try:
    edgegroups = resource_edgegroups[resource_id]
except KeyError:
    continue
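A runnable sketch of the same `try … except KeyError` pattern inside a loop, with made-up data. Entering a `try` block costs little when no exception is raised, while handling the exception is comparatively expensive, so this style tends to pay off when most keys are present. Whether that holds for a given workload is best confirmed by profiling.

```python
resource_edgegroups = {1: [5, 6]}  # hypothetical data

processed = []
for resource_id in (1, 2, 3):
    try:
        edgegroups = resource_edgegroups[resource_id]
    except KeyError:
        continue  # no edge groups for this resource, skip it
    processed.append((resource_id, edgegroups))

print(processed)
```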
(9th Attempt)
sianlerk@host:~$ time python bdb-v9.py
Resource Locations: 18865
Resource Locations (non-empty): 18525
Resource Operator: 19022
Resource Type: 19022
real 0m17.424s
user 0m15.069s
sys 0m0.596s
OVER-ENGINEERING
(we forgot: let the master do its job)
8. KEEP CALM, dict IS NOT THE SOLUTION
(for everything)
resource_edgegroup_sql = """
SELECT
    reg.resource, reg.edgeGroup
FROM resource_edgeGroup reg
LEFT JOIN edgeGroup eg ON (eg.id = reg.edgeGroup)
WHERE eg.status = "ACTIVE"
"""
8. KEEP CALM, USE GROUP_CONCAT
SELECT
r.id AS resource,
...
GROUP_CONCAT(reg.edgeGroup) AS edgeGroups,
...
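A sketch of how a `GROUP_CONCAT` column might be consumed on the Python side. SQLite's `group_concat` (via the built-in `sqlite3` module) stands in for MariaDB here, and the rows are invented. The result arrives as one comma-separated string per resource, which is split back into ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_edgeGroup (resource INTEGER, edgeGroup INTEGER)")
conn.executemany(
    "INSERT INTO resource_edgeGroup VALUES (?, ?)", [(1, 5), (1, 6), (2, 5)]
)

rows = conn.execute(
    "SELECT resource, GROUP_CONCAT(edgeGroup) AS edgeGroups "
    "FROM resource_edgeGroup GROUP BY resource ORDER BY resource"
).fetchall()
conn.close()

# One row per resource; split the comma-separated string back into ids.
# Sorting makes the result independent of the concatenation order.
resource_edgegroups = {
    resource: sorted(int(g) for g in edge_groups.split(","))
    for resource, edge_groups in rows
}

print(resource_edgegroups)
```

In MariaDB and MySQL the length of the concatenated string is capped by the `group_concat_max_len` system variable, so long lists may need that limit raised.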
(10th Attempt)
sianlerk@host:~$ time python bdb-v10.py
Resource Locations: 18865
Resource Locations (non-empty): 18525
Resource Operator: 19022
Resource Type: 19022
real 0m16.764s
user 0m13.549s
sys 0m0.512s
(11th Attempt)
sianlerk@host:~$ time python bdb-v11.py
Resource Locations: 18865
Resource Locations (non-empty): 18525
Resource Operator: 19022
Resource Type: 19022
real 0m15.207s
user 0m11.945s
sys 0m0.384s
(12th Attempt)
sianlerk@host:~$ time python bdb-v12.py
Resource Locations: 18525
Resource Locations (non-empty): 18525
Resource Operator: 19022
Resource Type: 19022
real 0m14.721s
user 0m11.313s
sys 0m0.380s
#jobdone
DISCLAIMER
(we have to credit one more thing)
Python Profiling
# Part 1 - Execute the code with cProfile
sianlerk@host:~$ python -m cProfile -o something.prof something.py
# Part 2 - Visualize the profiling result with SnakeViz
sianlerk@host:~$ pip install snakeviz
sianlerk@host:~$ snakeviz something.prof
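cProfile can also be driven from inside a script, which is handy for profiling a single function. The sketch below uses only the standard library. `build_lookup` is a toy workload standing in for the real script, not code from the talk.

```python
import cProfile
import io
import pstats


def build_lookup(n):
    """Toy workload standing in for the real script."""
    lookup = {}
    for i in range(n):
        lookup.setdefault(i % 100, []).append(i)
    return lookup


profiler = cProfile.Profile()
profiler.enable()
lookup = build_lookup(10000)
profiler.disable()

# Collect the five most expensive calls by cumulative time into a string.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Calling `profiler.dump_stats("something.prof")` instead writes a stats file in the same format that the `-o` option produces.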
THANK YOU
linkedin.com/in/sianlerk
@kiawin
