Improving MySQL-based applications performance with Sphinx

Maciej Dobrzański
(Мачей Добжаньски)
Percona, Inc.
INTRODUCTION
Who am I?
  – Consultant at Percona, Inc.
  – What do I do?
     • Performance audits
     • Fix broken systems
     • Design architectures
  – Typically work from home
INTRODUCTION
What is Percona, Inc.?
   – Consulting company
   – Provides services for MySQL applications
   – Develops open-source software
      • Scalability patches for InnoDB
      • XtraDB storage engine for MySQL
      • Xtrabackup – free backup solution for InnoDB/XtraDB
WHAT IS MYSQL?
MySQL is...
   – Open-source relational database management system
   – Popular enough to assume everyone here knows it
WHAT IS SPHINX?
A standalone full-text search engine
   – Consists of two major applications
      • indexer
      • searchd
   – More efficient than MySQL FULLTEXT
      • On larger data sets
WHAT IS SPHINX?
A standalone full-text search engine
   – Can be easily scaled horizontally
      • Sphinx indexes can be distributed across many servers
      • Allows parallel searching
      • One instance becomes a dispatcher
          – Forwards queries to other instances
          – Combines results before sending them back to clients
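The dispatcher's job of combining results can be sketched in a few lines of Python. This is only an illustration of the idea, not how searchd is implemented: the (weight, doc_id) tuple layout, the sample shard lists, and the function name merge_shard_results are all assumptions made for the example.

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, limit):
    """Combine per-shard result lists into one global top-`limit` list.

    Each shard list holds hypothetical (weight, doc_id) tuples that are
    already sorted by weight, best first, as each shard would return them.
    """
    # heapq.merge lazily merges sorted inputs; reverse=True expects
    # every input to be sorted from largest to smallest.
    merged = heapq.merge(*shard_results, reverse=True)
    return list(islice(merged, limit))

shard_a = [(9.1, 101), (4.0, 105)]
shard_b = [(7.5, 202), (6.2, 207), (1.3, 209)]
top = merge_shard_results([shard_a, shard_b], limit=3)
print(top)
```

Because every shard returns its own results already sorted, the dispatcher only needs a cheap merge rather than a full re-sort.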
WHAT IS SPHINX?
Many additional features beyond just full-text search
   – Indexable attributes for non-FTS filtering
      • numerical, multi-value and now also text
      • Example: limit results to rows which have
        article_score>=2
   – Sorting results by an attribute or an expression
      • Example: @weight+(article_score)*0.1
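To make the two examples concrete, here is a small Python sketch of what filtering on an attribute and sorting by an expression amount to. It only mimics the semantics; the dictionaries, their field names, and the rank function are hypothetical and do not reflect Sphinx internals.

```python
def rank(matches, min_score=2, limit=10):
    """Filter by an attribute, then sort by weight + article_score * 0.1."""
    # Attribute filter: keep rows with article_score >= min_score.
    kept = [m for m in matches if m["article_score"] >= min_score]
    # Sort expression: full-text weight plus a small boost from the attribute.
    kept.sort(key=lambda m: m["weight"] + m["article_score"] * 0.1, reverse=True)
    return [m["id"] for m in kept[:limit]]

matches = [
    {"id": 1, "weight": 10, "article_score": 1},  # filtered out: score < 2
    {"id": 2, "weight": 10, "article_score": 5},  # 10 + 0.5 = 10.5
    {"id": 3, "weight": 12, "article_score": 2},  # 12 + 0.2 = 12.2
]
ordered = rank(matches)
print(ordered)
```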
WHAT IS SPHINX?
Many additional features beyond just full-text search
   – Grouping results by an attribute
      • Additional support for timestamp attributes
      • Returns also row count per group – may be approximate
   – Calculating expressions
      • Much faster than in MySQL as per recent benchmarks
WHAT IS SPHINX?
Anything else?
   – On-line re-indexing
   – Live index updates
   – Extensive API available for many programming languages
      •   PHP
      •   Python
      •   Java
      •   many more
WHAT IS SPHINX?
There’s even more!
   – SphinxQL – MySQL server protocol compatible
      • Connect with any MySQL client
         – command line
         – API call, e.g. mysql_connect()
      • Run SQL-like queries
WHAT IS SPHINX?
Example use of SphinxQL
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is an external application; not part of MySQL
   – Uses own data files
   – Needs memory
   – Has to be queried separately
      • Sphinx API
      • SphinxQL
      • Sphinx Storage Engine for MySQL
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is an external application; not part of MySQL
   – Updating Sphinx indexes has to be done separately too
      • Periodic data re-indexing with indexer
          – Some information may be outdated for a while
          – Can be optimized through re-indexing the latest changes only
      • Live index updates from applications
          – Applications need to write twice to both MySQL and Sphinx
          – Available only for attributes; full-text updates to come
HOW DOES MYSQL WORK WITH SPHINX?
Example data source for Sphinx index
sql_query = SELECT mi.id, mi.movie_id, t.production_year, \
   t.title, mi.info FROM movie_info mi JOIN title t \
   ON t.id = mi.movie_id
sql_attr_uint = movie_id
sql_attr_uint = production_year
• Notice the source can be any valid SQL query
   – Uses joins to denormalize data for Sphinx
• Two integer attributes – movie_id and production_year
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is not a full database (yet?)
   – It’s primarily a search engine
   – It can return values stored as attributes, e.g:
     movie_id, production_year
   – …but not any full-text searchable columns
   – Results from Sphinx can be used to fetch full details from
     database
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Uses B-TREE indexes to improve search performance
   – Works great for equality operator (=)
   – …and small range lookups: >, >=, <, <=, IN (list), LIKE 'prefix%'
      • Range size relative to table size, not an absolute value
      • Large range often turns into plain scan
IMPORTANT FACTS TO KNOW ABOUT MYSQL
MySQL can use any left-most part of an index
   – INDEX (a, b, c) can fully optimize both:
      (1) SELECT * FROM T WHERE a=9
      (2) SELECT * FROM T WHERE a=9 AND b IN (1,2) AND c=4
     …but not any of:
      (3) SELECT * FROM T WHERE b=7 AND c=1
      (4) SELECT * FROM T WHERE a=9 AND c=2 (may still use index for a=9 only)
   – No good indexes means you may need a new one
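The left-most prefix rule can be modelled with a short Python sketch. It is a deliberately simplified model, not the MySQL optimizer: it assumes every predicate is an equality or IN list, and the function name usable_prefix is made up for the illustration.

```python
def usable_prefix(index_columns, predicate_columns):
    """Return the leading index columns a WHERE clause can use.

    Simplified model: every predicate is an equality (or IN list) and the
    lookup stops at the first index column that has no predicate on it.
    """
    prefix = []
    for column in index_columns:
        if column not in predicate_columns:
            break
        prefix.append(column)
    return prefix

index = ["a", "b", "c"]
q1 = usable_prefix(index, {"a"})            # query (1)
q2 = usable_prefix(index, {"a", "b", "c"})  # query (2)
q3 = usable_prefix(index, {"b", "c"})       # query (3)
q4 = usable_prefix(index, {"a", "c"})       # query (4)
print(q1, q2, q3, q4)
```

Under this model, queries (1) and (2) use every column they filter on, query (3) cannot use the index at all, and query (4) uses it for column a only.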
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Each index slows down writes to a table
   – Index is an organized structure, it has to be maintained
   – There can’t be too many or performance will suffer
MySQL can typically use only one index per query
   – There are rare exceptions – index merge optimizations
   – Merges are often not good enough – an observation
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These work great in MySQL
   – Index optimized searching
      • A query which uses indexes efficiently is fast enough
      • B-TREE lookups are typically very efficient
      • FULLTEXT indexes can be the exception
   – Index optimized sorting and grouping
      • Rows are read in the proper order
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Full table scans
      • No index is used
      • Query reads entire table row by row checking for matches
   – Large scans related to poor selectivity
      • An index is used, but it is not selective enough
      • MySQL has to read a lot of rows and reject many of them
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Search on many combinations of columns in a single table
      • Each combination may require new index
      • Can’t have too many indexes in table at the same time
   – Handling multi-value properties in searches
      • Keywords, tags
      • Such queries often can’t be optimized very well
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Sorting or grouping not done through indexes
      • Requires rewriting rows into temporary storage
      • At least one additional pass over results to complete
      • LIMIT does not work until all matches are found and
        sorted/grouped
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Indexes and data may be cached in memory
   – key_buffer and filesystem cache for MyISAM tables
   – innodb_buffer_pool for InnoDB tables
   – No guarantees what is in RAM
      • MySQL has no option to lock certain data in buffers
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Full-text support in MySQL
   – Available through FULLTEXT keys
   – Only supported by MyISAM engine (InnoDB gained FULLTEXT support later, in MySQL 5.6)
      • MyISAM uses table level locking
      • May become a showstopper for busy databases
   – Cannot be used together with any other index
      • Even index merge will not work
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Search remembers no more than max_matches results
  | total           | 1000   |
  | total_found     | 2255   |
  –   Other results are ignored before sending them to client
  –   Saves some CPU and RAM
  –   All results are often unnecessary
  –   Accuracy costs: a larger max_matches means more CPU and RAM per query
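The difference between total and total_found can be illustrated with a bounded top-N buffer. The Python sketch below is an assumption-laden model of the behaviour described above, not Sphinx's actual code: the (weight, doc_id) pairs, the synthetic weights, and the search function are invented for the example.

```python
import heapq

def search(weighted_matches, max_matches):
    """Keep only the best `max_matches` results while counting all matches.

    `weighted_matches` is an iterable of hypothetical (weight, doc_id) pairs.
    """
    best = []        # min-heap holding the current top results
    total_found = 0  # every match is counted, even if it is later dropped
    for match in weighted_matches:
        total_found += 1
        if len(best) < max_matches:
            heapq.heappush(best, match)
        elif match > best[0]:
            # New match beats the weakest kept result; swap it in.
            heapq.heapreplace(best, match)
    results = sorted(best, reverse=True)
    return {"total": len(results), "total_found": total_found, "matches": results}

# 2255 synthetic matches with made-up weights in the range 0..6.
meta = search(((doc_id % 7, doc_id) for doc_id in range(2255)), max_matches=1000)
print(meta["total"], meta["total_found"])
```

Memory stays bounded by max_matches no matter how many documents match, which is where the CPU and RAM savings come from.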
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Grouping is done in fixed memory
   – Results may be approximate
      • When number of matches exceeds max_matches
   – Inaccuracy depends on max_matches setting
      • The larger the more accurate grouping results
      • Growing max_matches can reduce performance
   – Accuracy costs
IMPORTANT FACTS TO KNOW ABOUT SPHINX
MySQL
SELECT ..., COUNT(1) _c
   FROM movie_info
WHERE
   MATCH (info)
   AGAINST ('"story"' IN BOOLEAN MODE)
   GROUP BY movie_id
   ORDER BY _c DESC LIMIT 4

Sphinx (uses SphinxQL)
SELECT *
   FROM movies
WHERE
   MATCH ('@info "story"')
   GROUP BY movie_id
   ORDER BY @count DESC LIMIT 4
IMPORTANT FACTS TO KNOW ABOUT SPHINX
MySQL                     Sphinx
+----------+----------+   +----------+--------+
| movie_id | COUNT(1) |   | movie_id | @count |
+----------+----------+   +----------+--------+
|    30372 |       15 |   |    30372 |     15 |
|   855624 |       13 |   |   855624 |     13 |
|   590071 |       13 |   |   143384 |     12 |
|   143384 |       12 |   |   590071 |     12 |
+----------+----------+   +----------+--------+
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Full copy of attributes is always kept in RAM
   –   If attribute storage was set to ‘extern’ – the typical use
   –   Preloaded on start
   –   Never read from disk again once Sphinx is up
   –   Guarantees certain performance
   –   Calculate the storage requirements properly
        • Sphinx may want to allocate too much memory
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Sphinx stores rows in blocks
   – 64 rows per block
   – Meta data contains (min, max) range of every attribute
   – Allows quick rejection when filtering by attributes
      • No need to scan every row individually
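Block-level rejection can be sketched as follows. This Python model only illustrates the idea of per-block (min, max) metadata; the in-memory layout, the function names, and the sample data are assumptions, not the real Sphinx attribute storage format.

```python
BLOCK_SIZE = 64  # rows per block, as stated above

def build_blocks(values):
    """Split attribute values into blocks and record (min, max) per block."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        rows = values[start:start + BLOCK_SIZE]
        blocks.append((min(rows), max(rows), rows))
    return blocks

def filter_range(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually scanned."""
    hits, scanned = [], 0
    for block_min, block_max, rows in blocks:
        if block_max < low or block_min > high:
            continue  # whole block rejected from its metadata alone
        scanned += 1
        hits.extend(v for v in rows if low <= v <= high)
    return hits, scanned

years = list(range(1900, 2028))  # 128 rows, so exactly 2 blocks
blocks = build_blocks(years)
hits, scanned = filter_range(blocks, 1990, 2000)
print(len(hits), scanned)
```

In this sample the first block (1900 to 1963) is skipped without touching its rows. The benefit is largest when attribute values are clustered, as they are here.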
MYSQL VS SPHINX PERFORMANCE
FULL-TEXT SEARCH PERFORMANCE

           USES FULL IMDB DATABASE
 IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
FULL-TEXT SEARCH PERFORMANCE
MySQL
SELECT COUNT(1)
   FROM movie_info
WHERE
   MATCH (info)
   AGAINST ('"james bond"' IN BOOLEAN MODE)

Sphinx (uses SphinxQL)
SELECT *
   FROM movies
WHERE
   MATCH ('@info "james bond"')
FULL-TEXT SEARCH PERFORMANCE
MySQL                     Sphinx
+----------+              +---------------+-------+
| COUNT(1) |              | Variable_name | Value |
+----------+              +---------------+-------+
|     2255 |              | total         | 1000  |
+----------+              | total_found   | 2255  |
1 row in set (0.13 sec)   | time          | 0.003 |
                          ...
SCAN PERFORMANCE

          USES FULL IMDB DATABASE
IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
SCAN PERFORMANCE
MySQL                           Sphinx (uses SphinxQL)
SELECT COUNT(1)                 SELECT *
   FROM title                      FROM titles
WHERE                           WHERE
   production_year >= 1990         production_year >= 1990
   AND                             AND
   production_year <= 2000         production_year <= 2000

No index on `production_year`
SCAN PERFORMANCE
MySQL                     Sphinx
+----------+              +---------------+--------+
| COUNT(1) |              | Variable_name | Value  |
+----------+              +---------------+--------+
|   239203 |              | total         | 1000   |
+----------+              | total_found   | 239203 |
1 row in set (1.09 sec)   | time          | 0.051  |
                          ...
MORE COMPLEX CASE
      SEARCH BY KEYWORDS
          USES FULL IMDB DATABASE
IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
SEARCH BY KEYWORDS
MySQL
SELECT t.id FROM title t
   JOIN movie_keyword mk ON mk.movie_id = t.id
   JOIN keyword k ON k.id = mk.keyword_id
WHERE
   k.keyword IN ('beautiful-woman', 'women', 'murder')
GROUP BY t.id
ORDER BY production_year DESC LIMIT 3

Sphinx (uses SphinxQL)
SELECT *
   FROM keywords
WHERE
   MATCH ('@keywords ("beautiful-woman"|"women"|"murder")')
ORDER BY production_year DESC LIMIT 3
SEARCH BY KEYWORDS
MySQL                      Sphinx
+--------+                 +--------+
| id     |                 | id     |
+--------+                 +--------+
| 561959 |                 | 561959 |
|  74273 |                 |  74273 |
| 344814 |                 | 344814 |
+--------+                 +--------+
3 rows in set (1.84 sec)   time = 0.015
SEARCH BY KEYWORDS
Sphinx returns
   – Values of the indexed attributes
   – Meta information about search and results
   – No text
      • Recent versions can actually store and return short strings
      • But only when defined as attributes, not full-text searchable fields
SEARCH BY KEYWORDS
Use that information to fetch full details from MySQL

mysql> SELECT t.id, t.title FROM title t WHERE
        t.id IN (561959, 74273, 344814);
   +--------+---------------------------------------+
   | id     | title                                 |
   +--------+---------------------------------------+
   |  74273 | Blue Silence                          |
   | 344814 | Marvin: The Life Story of Marvin Gaye |
   | 561959 | The Red Man's View                    |
   +--------+---------------------------------------+
SEARCH BY KEYWORDS
MySQL                            Sphinx
+--------+-------------------+   +--------+-----------------+
| id     | title             |   | id     | production_year |
+--------+-------------------+   +--------+-----------------+
|  74273 | Blue Silence      |   | 561959 |            2014 |
| 344814 | Marvin: The Li... |   |  74273 |            2013 |
| 561959 | The Red Man's ... |   | 344814 |            2012 |
+--------+-------------------+   +--------+-----------------+
       Notice MySQL returned rows in a different order!
SEARCH BY KEYWORDS
The order in SQL can only be guaranteed with ORDER BY!
What is the solution?
   – Append ORDER BY production_year DESC to the MySQL query
        • applies to only a small number of rows, so it’s probably okay
   or
   – Remember the order of Sphinx results in application
   – Restore it after receiving data from MySQL
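The second option, restoring the Sphinx order in the application, might look like the Python sketch below. The function name restore_order and the (id, title) tuple shape are assumptions for illustration; the sample ids and titles are the ones from the result sets above.

```python
def restore_order(sphinx_ids, mysql_rows):
    """Reorder rows fetched from MySQL to match the order Sphinx returned.

    `mysql_rows` is a list of (id, title) tuples in arbitrary order.
    Ids missing from MySQL (e.g. deleted since indexing) are skipped.
    """
    by_id = {row[0]: row for row in mysql_rows}
    return [by_id[doc_id] for doc_id in sphinx_ids if doc_id in by_id]

sphinx_ids = [561959, 74273, 344814]  # order returned by Sphinx
mysql_rows = [                        # order returned by MySQL
    (74273, "Blue Silence"),
    (344814, "Marvin: The Life Story of Marvin Gaye"),
    (561959, "The Red Man's View"),
]
ordered = restore_order(sphinx_ids, mysql_rows)
for doc_id, title in ordered:
    print(doc_id, title)
```

The dictionary lookup keeps this linear in the number of rows, and it also copes with ids that were indexed by Sphinx but have since disappeared from MySQL.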
SEARCH BY KEYWORDS
What if "keywords" were numerical identifiers?
   – Create "fake keywords" and index them as text
   – Convert numbers into strings when building index
     sql_query = SELECT t.id, \
     GROUP_CONCAT(CONCAT('KEY_', mk.keyword_id)) \
     FROM title t JOIN movie_keyword mk ON t.id = mk.movie_id \
     GROUP BY t.id

   – Run full-text searches using strings such as "KEY_1234"
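On the application side, the same conversion has to be applied when building the search query. A minimal Python sketch, assuming the KEY_ prefix from the sql_query above and the "a"|"b" OR syntax used in the earlier keyword search; the helper names fake_keyword and match_any are hypothetical.

```python
def fake_keyword(keyword_id):
    """Turn a numeric identifier into an indexable token such as KEY_1234."""
    return f"KEY_{int(keyword_id)}"

def match_any(keyword_ids):
    """Build a query string matching documents with any of the given keywords."""
    return "|".join(f'"{fake_keyword(k)}"' for k in keyword_ids)

query = match_any([1234, 77])
print(query)
```

The int() call is a cheap guard: it rejects anything that is not a number before it ends up inside the query string.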
FLEXIBLE SEARCH
A data structure describing user profile
CREATE TABLE `members` (
   `user_id` int(10) unsigned,
   `user_firstname` varchar(50),
   `user_surname` varchar(50),
   `user_dob` date,
   `user_lastvisit` datetime,
   `user_datetime` datetime,
   `user_bio` text,  -- type missing on the slide; text is an assumption
   `user_hasphoto` tinyint(2) unsigned,
   `user_hasvideo` tinyint(2) unsigned,
   ...
FLEXIBLE SEARCH
Flexible search typically means
   – Search conditions may involve any number of columns in
     any combination
   – Sorting may be done on one of many columns as well

Often impossible to add all necessary indexes in MySQL
FLEXIBLE SEARCH
Many columns may have very low cardinality
   – Example: user_gender
   – MySQL would not even consider using index for such
     column

It may be very difficult to make it work fast in MySQL
   – When tables or traffic are large enough
FLEXIBLE SEARCH
How does Sphinx help?
   –   Scans are optimized
   –   Optimizations apply to all columns
   –   Possibility to use "fake keywords"
   –   Data can be split across several instances
        • Parallel search
        • No extra application logic necessary to combine results
SUMMARY
Sphinx can be of great help to many MySQL-based apps
   – Developed to work better where MySQL performs poorly
      •   Text search
      •   Large scans
      •   Filtering on many combinations of columns
      •   Handling multi-value properties
SUMMARY
Sphinx can be of great help to many MySQL-based apps
   –   Comes with features that can actually replace database
   –   Easily scalable
   –   Actively developed
   –   You can sponsor development and have features you need
       done soon
        • No need to wait long until some functionality "appears"
Sphinx
http://www.sphinxsearch.com/

Percona Consulting
http://www.percona.com/
THANK YOU!

Sphinx new

  • 1. Improving MySQL- based applications performance with Sphinx Maciej Dobrzaoski (Мачей Добжаньски) Percona, Inc.
  • 2. INTRODUCTION Who am I? – Consultant at Percona, Inc. – What do I do? • Performance audits • Fix broken systems • Design architectures – Typically work from home
  • 3. INTRODUCTION What is Percona, Inc.? – Consulting company – Provides services for MySQL applications – Develops open-source software • Scalability patches for InnoDB • XtraDB storage engine for MySQL • Xtrabackup – free backup solution for InnoDB/XtraDB
  • 5. WHAT IS MYSQL? MySQL is... – Open-source relational database management system – Popular enough to assume everyone here knows it
  • 7. WHAT IS SPHINX? A standalone full-text search engine – Consists of two major applications • indexer • searchd – More efficient than MySQL FULLTEXT • On larger data sets
  • 8. WHAT IS SPHINX? A standalone full-text search engine – Can be easily scaled horizontally • Sphinx indexes can be distributed across many servers • Allows parallel searching • One instance becomes a dispatcher – Forwards queries to other instances – Combines results before sending them back to clients
  • 10. WHAT IS SPHINX? Many additional features beyond just full-text search – Indexable attributes for non-FTS filtering • numerical, multi-value and now also text • Example: limit results to rows which have article_score>=2 – Sorting results by an attribute or an expression • Example: @weight+(article_score)*0.1
  • 11. WHAT IS SPHINX? Many additional features beyond just full-text search – Grouping results by an attribute • Additional support for timestamp attributes • Returns also row count per group – may be approximate – Calculating expressions • Much faster than in MySQL as per recent benchmarks
  • 12. WHAT IS SPHINX? Anything else? – On-line re-indexing – Live index updates – Extensive API available for many programming languages • PHP • Python • Java • many more
  • 13. WHAT IS SPHINX? There’s even more! – SphinxQL – MySQL server protocol compatible • Connect with any MySQL client – command line – API call, e.g. mysql_connect() • Run SQL-like queries
  • 14. WHAT IS SPHINX? Example use of SphinxQL
  • 15. HOW DOES SPHINX WORK WITH MYSQL?
  • 16. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is external application; not part of MYSQL – Uses own data files – Needs memory – Has to be queried separately • Sphinx API • SphinxQL • Sphinx Storage Engine for MySQL
  • 17. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is external application; not part of MySQL – Updating Sphinx indexes has to be done separately too • Periodic data re-indexing with indexer – Some information may be outdated for a while – Can be optimized through re-indexing the latest changes only • Live index updates from applications – Applications need to write twice to both MySQL and Sphinx – Available only for attributes; full-text updates to come
  • 18. HOW DOES MYSQL WORK WITH SPHINX? Example data source for Sphinx index sql_query = SELECT mi.id, mi.movie_id, t.production_year, t.title, mi.info FROM movie_info mi JOIN title t ON t.id = mi.movie_id sql_attr_uint = movie_id sql_attr_uint = production_year • Notice the source can be any valid SQL query – Uses joins to denormalize data for Sphinx • Two integer attributes – movie_id and production_year
  • 19. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is not a full database (yet?) – It’s primarily a search engine – It can return values stored as attributes, e.g: movie_id, production_year – …but not any full-text searchable columns – Results from Sphinx can be used to fetch full details from database
  • 20. IMPORTANT FACTS TO KNOW ABOUT MYSQL
  • 21. IMPORTANT FACTS TO KNOW ABOUT MYSQL Uses B-TREE indexes to improve search performance – Works great for equality operator (=) – …and small range lookups: >, >=, <, <=, IN (list), LIKE • Range size relative to table size, not an absolute value • Large range often turns into plain scan
  • 22. IMPORTANT FACTS TO KNOW ABOUT MYSQL MySQL can use any left-most part of an index – INDEX (a, b, c) can fully optimize both: (1) SELECT * FROM T WHERE a=9 (2) SELECT * FROM T WHERE a=9 AND b IN (1,2) AND c=4 …but not any of: (3) SELECT * FROM T WHERE b=7 AND c=1 (4) SELECT * FROM T WHERE a=9 AND c=2 (may still use index for a=9 only) – No good indexes means you may need a new one
  • 23. IMPORTANT FACTS TO KNOW ABOUT MYSQL Each index slows down writes to a table – Index is an organized structure, it has to be maintained – There can’t be too many or performance will suffer MySQL can typically use only one index per query – There are rare exceptions – index merge optimizations – Merges are often not good enough – an observation
  • 24. IMPORTANT FACTS TO KNOW ABOUT MYSQL These work great in MySQL – Index optimized searching • A query which uses indexes efficiently is fast enough • B-TREE lookups are typically very efficient • FULLTEXT indexes can be the exception – Index optimized sorting and grouping • Rows are read in the proper order
  • 25. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Full table scans • No index is used • Query reads entire table row by row checking for matches – Large scans related to poor selectivity • An index is used, but it is not selective enough • MySQL has to read a lot of rows and reject many of them
  • 26. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Search on many combinations of columns in a single table • Each combination may require new index • Can’t have too many indexes in table at the same time – Handling multi-value properties in searches • Keywords, tags • Such queries often can’t be optimized very well
  • 27. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Sorting or grouping not done through indexes • Requires rewriting rows into temporary storage • At least one additional pass over results to complete • LIMIT does not work until all matches are found and sorted/grouped
  • 28. IMPORTANT FACTS TO KNOW ABOUT MYSQL Indexes and data may be cached in memory – key_buffer and filesystem cache for MyISAM tables – innodb_buffer_pool for InnoDB tables – No guarantees what is in RAM • MySQL has no option to lock certain data in buffers
  • 29. IMPORTANT FACTS TO KNOW ABOUT MYSQL Full-text support in MySQL – Available through FULLTEXT keys – Only supported by MyISAM engine • MyISAM uses table level locking • May become a showstopper for busy databases – Cannot be used together with any other index • Even index merge will not work
  • 30. IMPORTANT FACTS TO KNOW ABOUT SPHINX
  • 31. IMPORTANT FACTS TO KNOW ABOUT SPHINX Search remembers no more than max_matches results | total | 1000 | | total_found | 2255 | – Other results are ignored before sending them to client – Saves some CPU and RAM – All results are often unnecessary – Accuracy costs
  • 32. IMPORTANT FACTS TO KNOW ABOUT SPHINX
  • 33. IMPORTANT FACTS TO KNOW ABOUT SPHINX Grouping is done in fixed memory – Results may be approximate • When number of matches exceeds max_matches – Inaccuracy depends on max_matches setting • The larger the more accurate grouping results • Growing max_matches can reduce performance – Accuracy costs
  • 34. IMPORTANT FACTS TO KNOW ABOUT SPHINX MySQL Sphinx (uses SphinxQL) SELECT ..., COUNT(1) _c SELECT * FROM movie_info FROM movies WHERE WHERE MATCH (info) MATCH ('@info "story"') AGAINST ('"story"' GROUP BY movie_id IN BOOLEAN MODE) ORDER BY @count DESC 4 GROUP BY movie_id ORDER BY _c DESC LIMIT 4
  • 35. IMPORTANT FACTS TO KNOW ABOUT SPHINX MySQL Sphinx +----------+----------+ +----------+--------+ | movie_id | COUNT(1) | | movie_id | @count | +----------+----------+ +----------+--------+ | 30372 | 15 | | 30372 | 15 | | 855624 | 13 | | 855624 | 13 | | 590071 | 13 | | 143384 | 12 | | 143384 | 12 | | 590071 | 12 | +----------+----------+ +----------+--------+
  • 36. IMPORTANT FACTS TO KNOW ABOUT SPHINX
    Full copy of attributes is always kept in RAM
       – If attribute storage is set to 'extern' (the typical use)
       – Preloaded on start
       – Never read from disk again once Sphinx is up
       – Guarantees certain performance
       – Calculate the storage requirements properly
          • Sphinx may want to allocate too much memory
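A rough way to calculate the storage requirement is sketched below. It assumes the rule of thumb of (1 + number of attributes) × number of documents × 4 bytes, which is how the Sphinx manual is understood to describe 'extern' storage with 32-bit document IDs and 32-bit attribute values. Treat the formula as an assumption and check it against the manual for the version in use; multi-value and string attributes need additional memory. The function name is hypothetical.

```python
def extern_attr_ram_bytes(num_docs, num_attrs, bytes_per_value=4):
    """Estimate RAM for attributes kept in memory.

    Assumed formula: one slot for the document ID plus one slot per
    attribute, each `bytes_per_value` wide, for every document.
    """
    return (1 + num_attrs) * num_docs * bytes_per_value


# Example: 1,000,000 documents with 3 attributes each.
estimate = extern_attr_ram_bytes(1_000_000, 3)
print(f"{estimate / 1_000_000:.0f} MB")
```

Running the estimate for every index on a server before starting searchd shows whether the preloaded attributes fit in the available memory.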
  • 37. IMPORTANT FACTS TO KNOW ABOUT SPHINX
    Sphinx stores rows in blocks
       – 64 rows per block
       – Metadata contains (min, max) range of every attribute
       – Allows quick rejection when filtering by attributes
          • No need to scan every row individually
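The block-level rejection can be illustrated with a short Python sketch. It is a simplified model for a single attribute, not Sphinx's internal data layout; the names `build_blocks` and `filter_range` are hypothetical.

```python
BLOCK_SIZE = 64  # rows per block, as stated on the slide


def build_blocks(values, block_size=BLOCK_SIZE):
    """Split attribute values into blocks and record each block's (min, max)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((min(chunk), max(chunk), start, chunk))
    return blocks


def filter_range(blocks, lo, hi):
    """Return matching row numbers and the number of blocks actually scanned."""
    matched = []
    scanned = 0
    for block_min, block_max, start, chunk in blocks:
        if block_max < lo or block_min > hi:
            continue  # whole block rejected using its metadata only
        scanned += 1
        for offset, value in enumerate(chunk):
            if lo <= value <= hi:
                matched.append(start + offset)
    return matched, scanned
```

The benefit depends on how the values are distributed: when rows with similar values sit close together, most blocks are rejected from their (min, max) range alone, while randomly scattered values force most blocks to be scanned.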
  • 38. MYSQL V SPHINX PERFORMANCE
  • 39. FULL-TEXT SEARCH PERFORMANCE
    Uses full IMDB database imported into MySQL and indexed with Sphinx
  • 40. FULL-TEXT SEARCH PERFORMANCE
    MySQL
        SELECT COUNT(1)
        FROM movie_info
        WHERE MATCH (info)
              AGAINST ('"james bond"' IN BOOLEAN MODE)
    Sphinx (uses SphinxQL)
        SELECT *
        FROM movies
        WHERE MATCH ('@info "james bond"')
  • 41. FULL-TEXT SEARCH PERFORMANCE
    MySQL
        +----------+
        | COUNT(1) |
        +----------+
        |     2255 |
        +----------+
        1 row in set (0.13 sec)
    Sphinx
        +---------------+-------+
        | Variable_name | Value |
        +---------------+-------+
        | total         | 1000  |
        | total_found   | 2255  |
        | time          | 0.003 |
        ...
  • 42. SCAN PERFORMANCE
    Uses full IMDB database imported into MySQL and indexed with Sphinx
  • 43. SCAN PERFORMANCE
    MySQL (no index on `production_year`)
        SELECT COUNT(1)
        FROM title
        WHERE production_year >= 1990
          AND production_year <= 2000
    Sphinx (uses SphinxQL)
        SELECT *
        FROM titles
        WHERE production_year >= 1990
          AND production_year <= 2000
  • 44. SCAN PERFORMANCE
    MySQL
        +----------+
        | COUNT(1) |
        +----------+
        |   239203 |
        +----------+
        1 row in set (1.09 sec)
    Sphinx
        +---------------+--------+
        | Variable_name | Value  |
        +---------------+--------+
        | total         | 1000   |
        | total_found   | 239203 |
        | time          | 0.051  |
        ...
  • 45. MORE COMPLEX CASE: SEARCH BY KEYWORDS
    Uses full IMDB database imported into MySQL and indexed with Sphinx
  • 46. SEARCH BY KEYWORDS
    MySQL
        SELECT t.id
        FROM title t
        JOIN movie_keyword mk ON mk.movie_id = t.id
        JOIN keyword k ON k.id = mk.keyword_id
        WHERE k.keyword IN ('beautiful-woman', 'women', 'murder')
        GROUP BY t.id
        ORDER BY production_year DESC
        LIMIT 3
    Sphinx (uses SphinxQL)
        SELECT *
        FROM keywords
        WHERE MATCH ('@keywords ("beautiful-woman"|"women"|"murder")')
        ORDER BY production_year DESC
        LIMIT 3
  • 47. SEARCH BY KEYWORDS
    MySQL
        +--------+
        | id     |
        +--------+
        | 561959 |
        |  74273 |
        | 344814 |
        +--------+
        3 rows in set (1.84 sec)
    Sphinx
        +--------+
        | id     |
        +--------+
        | 561959 |
        |  74273 |
        | 344814 |
        +--------+
        time = 0.015
  • 48. SEARCH BY KEYWORDS
    Sphinx returns
       – Values of the indexed attributes
       – Meta information about search and results
       – No text
          • Recent versions can actually store and return short strings
          • But only when defined as attributes, which are not full-text searchable
  • 49. SEARCH BY KEYWORDS
    Use that information to fetch full details from MySQL
        mysql> SELECT t.id, t.title FROM title t WHERE t.id IN(561959, 74273, 344814)
        +--------+---------------------------------------+
        | id     | title                                 |
        +--------+---------------------------------------+
        |  74273 | Blue Silence                          |
        | 344814 | Marvin: The Life Story of Marvin Gaye |
        | 561959 | The Red Man's View                    |
        +--------+---------------------------------------+
  • 50. SEARCH BY KEYWORDS
    MySQL
        +--------+-------------------+
        | id     | title             |
        +--------+-------------------+
        |  74273 | Blue Silence      |
        | 344814 | Marvin: The Li... |
        | 561959 | The Red Man's ... |
        +--------+-------------------+
    Sphinx
        +--------+-----------------+
        | id     | production_year |
        +--------+-----------------+
        | 561959 |            2014 |
        |  74273 |            2013 |
        | 344814 |            2012 |
        +--------+-----------------+
    Notice MySQL returned rows in different order!
  • 51. SEARCH BY KEYWORDS
    The order in SQL can only be guaranteed with ORDER BY!
    What is the solution?
       – Append ORDER BY production_year DESC
          • Applies to only a small number of rows, so it's probably okay
    or
       – Remember the order of Sphinx results in the application
       – Restore it after receiving data from MySQL
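The second option, restoring the Sphinx order in the application, can be sketched in a few lines of Python. The function name `restore_order` and the use of dictionaries for rows are assumptions for illustration; the IDs and titles are the ones shown on the earlier slides.

```python
def restore_order(sphinx_ids, mysql_rows, key="id"):
    """Return MySQL rows rearranged to follow the order of the Sphinx IDs."""
    rows_by_id = {row[key]: row for row in mysql_rows}
    # IDs missing from MySQL (e.g. deleted since indexing) are skipped.
    return [rows_by_id[i] for i in sphinx_ids if i in rows_by_id]


# Order returned by Sphinx (sorted by production_year DESC).
sphinx_ids = [561959, 74273, 344814]

# Rows as MySQL happened to return them for the IN (...) query.
mysql_rows = [
    {"id": 74273, "title": "Blue Silence"},
    {"id": 344814, "title": "Marvin: The Life Story of Marvin Gaye"},
    {"id": 561959, "title": "The Red Man's View"},
]

ordered = restore_order(sphinx_ids, mysql_rows)
print([row["title"] for row in ordered])
```

Building the lookup dictionary once keeps the reordering linear in the number of rows, which matters little for three rows but avoids repeated list scans for larger result pages.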
  • 52. SEARCH BY KEYWORDS
    What if "keywords" were numerical identifiers?
       – Create "fake keywords" and index them as text
       – Convert numbers into strings when building index
            sql_query = SELECT t.id, \
                GROUP_CONCAT(CONCAT('KEY_', mk.keyword_id)) \
                FROM title t JOIN movie_keyword mk ON t.id = mk.movie_id \
                GROUP BY t.id
       – Run full-text searches using strings such as "KEY_1234"
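On the application side, the numerical identifiers have to be turned into the same "fake keywords" before searching. The Python sketch below builds a full-text query string in the style of the earlier keyword example. The helper names and the field name `keywords` are hypothetical, and the sketch assumes the index tokenizer keeps tokens such as KEY_1234 intact.

```python
def fake_keyword(keyword_id, prefix="KEY_"):
    """Turn a numerical keyword ID into the text token stored in the index."""
    return f"{prefix}{int(keyword_id)}"


def build_match_query(keyword_ids, field="keywords"):
    """Build a query string matching documents that have any of the given IDs."""
    terms = "|".join(f'"{fake_keyword(k)}"' for k in keyword_ids)
    return f"@{field} ({terms})"


print(build_match_query([1234, 77]))
```

Converting through `int()` rejects anything that is not a number, so user input cannot inject extra query operators through the keyword list.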
  • 54. FLEXIBLE SEARCH
    A data structure describing user profile
        CREATE TABLE `members` (
          `user_id` int(10) unsigned,
          `user_firstname` varchar(50),
          `user_surname` varchar(50),
          `user_dob` date,
          `user_lastvisit` datetime,
          `user_datetime` datetime,
          `user_bio` text,  -- type missing on the slide; text is an assumption
          `user_hasphoto` tinyint(2) unsigned,
          `user_hasvideo` tinyint(2) unsigned,
          ...
  • 55. FLEXIBLE SEARCH
    Flexible search typically means
       – Search conditions may involve any number of columns in any combination
       – Sorting may be done on one of many columns as well
    Often impossible to add all necessary indexes in MySQL
  • 56. FLEXIBLE SEARCH
    Many columns may have very low cardinality
       – Example: user_gender
       – MySQL would not even consider using an index for such a column
    It may be very difficult to make it work fast in MySQL
       – When tables or traffic are large enough
  • 57. FLEXIBLE SEARCH
    How does Sphinx help?
       – Scans are optimized
       – Optimizations apply to all columns
       – Possibility to use "fake keywords"
       – Data can be split across several instances
          • Parallel search
          • No extra application logic necessary to combine results
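For intuition only, the combining step that a dispatching instance performs on behalf of the application can be modelled as a merge of per-instance result lists that are already sorted. This Python sketch is a conceptual model, not Sphinx's implementation; the function name, the `(production_year, id)` tuples, and the fourth sample row are made up for illustration.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, limit):
    """Merge result lists, each sorted by year descending, and keep the top `limit`."""
    merged = heapq.merge(*shard_results, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))


# Each instance returns its own best rows as (production_year, id).
shard_a = [(2014, 561959), (2012, 344814)]
shard_b = [(2013, 74273), (2001, 5)]   # (2001, 5) is invented sample data

top = merge_shard_results([shard_a, shard_b], limit=3)
print(top)
```

Because each list is already sorted, the merge only needs to look at the head of every list, so the cost of combining grows with the requested limit rather than with the total number of matches.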
  • 59. SUMMARY
    Sphinx can be of great help to many MySQL-based apps
       – Developed to work better where MySQL performs poorly
          • Text search
          • Large scans
          • Filtering on many combinations of columns
          • Handling multi-value properties
  • 60. SUMMARY
    Sphinx can be of great help to many MySQL-based apps
       – Comes with features that can actually replace a database
       – Easily scalable
       – Actively developed
       – You can sponsor development and have the features you need done soon
          • No need to wait long until some functionality "appears"