Getting Started with MySQL Full Text Search
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
MySQL Full-Text Search
Matt Lord
MySQL Product Manager
@mattalord
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
MySQL Full-Text Search : Agenda
1. An Introduction to Full-Text Search
2. Common Terms and Concepts
3. What’s New in MySQL 5.6 and 5.7
4. A Real World Example
5. Integration with Lucene, Solr, and Elasticsearch
6. What’s Next for MySQL Full-Text Search
An Introduction to Full-Text Search
What is it?
• Search entire documents
– Character based fields
• CHAR, VARCHAR, TEXT
• For a search string
– Combinations of words
– Phrases: “specific string to match”
– Wildcards: *
– Requirements: +, -, ~
– Expressions: (…)
– Relevancy weight characters: <, >
Searching Without an Index
Searching With an Index
What Would I Use it For?
• Content management
– What metadata should be used to describe the information?
– This helps to make your searches far more useful
• Search services
– What documents or metadata contain certain terms or tokens?
– What documents are most relevant to the current view?
– What data do you think this user would be most interested in?
How Would I Use It?
Collect, Store, Index, Search
• Collect search data
– Existing documents describing the content
– Generated metadata from the incoming content
• Store the data
– Within MySQL tables
• Index the data
– Add Full-Text indexes on the content columns
• Allow for efficient searches
– Provide users with an efficient way to search the content
Common Terms and Concepts
Common Terms
• Token
– Word or a series of characters
• Dictionary
– What words are related, mean the same thing, are abbreviations for, etc.
• Stop Words
– Words that should not be indexed
• Relevancy and Weight
– How should we weight search terms and calculate document relevancy?
Tokens
• Tokens
– Words, or a series of characters that together form common meaning
• Related Server options
– innodb_ft_min_token_size – Don’t bother to index words shorter than this
• These would typically be words that are invalid, or are extremely common
– So they increase the size of the index and decrease search efficiency w/o real benefit
– innodb_ft_max_token_size – Don’t bother to index words longer than this
• These would typically be words that are invalid
– So again, they increase the size of the index and decrease search efficiency w/o real benefit
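As a rough illustration of the two size limits, here is a minimal Python sketch. It is not InnoDB's parser: the function name `filter_by_size` and the simple word-splitting regex are assumptions made for the example, and the 3 and 84 limits are the defaults quoted later in the walkthrough.

```python
import re

MIN_TOKEN_SIZE = 3   # stands in for innodb_ft_min_token_size
MAX_TOKEN_SIZE = 84  # stands in for innodb_ft_max_token_size

def filter_by_size(text, min_size=MIN_TOKEN_SIZE, max_size=MAX_TOKEN_SIZE):
    """Split text into lowercase words and keep those inside the size range."""
    words = re.findall(r"[a-z0-9_]+", text.lower())
    return [w for w in words if min_size <= len(w) <= max_size]

# Very short words such as "a", "is", and "to" are dropped.
print(filter_by_size("A boy is going to war"))
```

Words outside the range never reach the index, so they cannot be matched by a later search.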
Stop Words
• Server options
– innodb_ft_enable_stopword – Should stop words be used at all for new indexes?
– innodb_ft_server_stopword_table – Use this global table for the list of stop words
– innodb_ft_user_stopword_table – Use this table for my own stop word list
• All of the above only affect indexes created while they are set
– CREATE INDEX, ALTER TABLE, OPTIMIZE TABLE, ANALYZE TABLE
• Default stop word list
– SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
Relevancy and Weight
• Term Frequency (TF)
– Measure of how often a token/word appears in an individual document
• Inverse Document Frequency (IDF)
– Measure of how common a token/word is across all documents
• Coordinate Level Matching
– Number of query terms that are found within an individual document
• How close together are the matching terms?
• User Modifications
– '>' and '<' characters can be used to give terms higher or lower weight (in boolean mode)
– '+' and '-' characters can be used to require terms be present or absent (in boolean mode)
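To make the four operator characters concrete, the following Python sketch sorts the terms of a search string by their leading operator. It is only an illustration: `parse_boolean_terms` and its bucket names are hypothetical, and it ignores phrases, parentheses, `~`, and `*`, which a real boolean mode parser also handles.

```python
def parse_boolean_terms(query):
    """Group whitespace-separated terms by their leading operator character."""
    buckets = {"+": "required", "-": "excluded", ">": "boosted", "<": "demoted"}
    parsed = {"required": [], "excluded": [], "boosted": [], "demoted": [], "optional": []}
    for term in query.split():
        has_operator = term[0] in buckets
        bucket = buckets[term[0]] if has_operator else "optional"
        word = term[1:] if has_operator else term
        if word:  # skip a lone operator with no word attached
            parsed[bucket].append(word.lower())
    return parsed

print(parse_boolean_terms("+mysql -oracle >search <slides tika"))
```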
A Full Text Index
• It’s an inverted Index of relationships between tokens and documents
Document 1: This movie is about a boy going to war.
Document 2: This movie is about a girl starting an auto-shop.
Document 3: This movie is about flowers.
Token Filters
– Stop Words: a about an are as at be by com de en for from how i in is it la of on or that the this to was what when where who will with und the www
– Token Size: Min Token Size, Max Token Size
Diagram stages: Documents, Tokenizer, Token Filters, Indexer
Full Text / Inverted Index
ID TOKEN DOCUMENT
1 movie 1,2,3
2 boy 1
3 girl 2
4 going 1
5 starting 2
6 war 1
7 auto-shop 2
8 flowers 3
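The slide's pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how InnoDB is implemented: the names `tokenize` and `build_index` are made up, and the tokenizer deliberately keeps hyphens inside a word so that "auto-shop" stays one token as on the slide (a real parser may split it).

```python
import re
from collections import defaultdict

STOP_WORDS = set("""a about an are as at be by com de en for from how i in is it la of on or
that the this to was what when where who will with und the www""".split())

DOCS = {
    1: "This movie is about a boy going to war.",
    2: "This movie is about a girl starting an auto-shop.",
    3: "This movie is about flowers.",
}

def tokenize(text, min_size=3, max_size=84):
    """Tokenizer plus token filters: size limits first, then stop words."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [w for w in words if min_size <= len(w) <= max_size and w not in STOP_WORDS]

def build_index(docs):
    """Indexer: map each surviving token to the documents that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for token in tokenize(docs[doc_id]):
            if doc_id not in index[token]:
                index[token].append(doc_id)
    return dict(index)

index = build_index(DOCS)
for token, doc_ids in index.items():
    print(token, doc_ids)
```

The result holds the same eight tokens as the inverted index table on the slide, each pointing at its documents.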
Document Searches
• Search for “movie about girl”
• Term Frequency (TF)
– “movie” occurs 1 time in Docs 1,2,3
– “girl” occurs 1 time in Doc 2
• No Doc has more than 1 occurrence of either word
• Inverse Document Frequency (IDF)
– “movie” occurs in Docs 1,2,3
– “girl” occurs only in Doc 2
• “girl” is more meaningful or “weighted”
• Docs 1,2,3 match our search, but Doc 2 is most relevant
Full Text / Inverted Index
ID TOKEN DOCUMENT
1 movie 1,2,3
2 boy 1
3 girl 2
4 going 1
5 starting 2
6 war 1
7 auto-shop 2
8 flowers 3
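A small Python sketch of the lookup for "movie about girl", using the posting lists from the table above. It only counts how many query terms each document matches (the coordinate level matching idea); the function name `match_counts` is hypothetical and no TF or IDF weighting is applied here.

```python
INDEX = {
    "movie": [1, 2, 3], "boy": [1], "girl": [2], "going": [1],
    "starting": [2], "war": [1], "auto-shop": [2], "flowers": [3],
}

def match_counts(query, index):
    """Return {doc_id: number of distinct query terms found in that document}."""
    counts = {}
    for term in set(query.lower().split()):
        # Terms missing from the index, such as the stop word "about", match nothing.
        for doc_id in index.get(term, []):
            counts[doc_id] = counts.get(doc_id, 0) + 1
    return counts

counts = match_counts("movie about girl", INDEX)
ranked = sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))
print(counts, ranked)
```

All three documents match, but only Doc 2 matches two of the query terms, so it sorts first.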
Additional Options & Variables
• innodb_ft_aux_table – View index details for this table
– Via the INNODB_FT_INDEX_TABLE, INNODB_FT_INDEX_CACHE, INNODB_FT_CONFIG, INNODB_FT_DELETED, and INNODB_FT_BEING_DELETED Information_Schema tables
• innodb_ft_cache_size – In memory cache size for each table with a Full-Text index
• innodb_ft_total_cache_size – Total in memory cache size limit per server
• innodb_ft_num_word_optimize – Number of words processed by each OPTIMIZE TABLE run on a Full-Text index
• innodb_ft_result_cache_limit – In memory cache size limit for individual searches
• innodb_ft_sort_pll_degree – Number of parallel threads to use during index builds
Example Walkthrough
• Now let’s quickly demonstrate all of these terms & concepts in action
• We’ll use a very simple made up series of silly short stories
Example Walkthrough: Table and Data
mysql> create table short_stories (author varchar(100), story text);
Query OK, 0 rows affected (0.23 sec)
mysql> insert into short_stories values ("Matt Lord", "I've worked at MySQL and Oracle for about
12 years now. I'm currently the Product Manager for MySQL.");
Query OK, 1 row affected (0.03 sec)
mysql> insert into short_stories values ("Sid Lord", "I'm 10 years old. I like to eat and play
video games. That's pretty much it.");
Query OK, 1 row affected (0.12 sec)
mysql> insert into short_stories values ("Lily Lord", "I'm almost 7 years old. I like to make
art, play with toys, and play video games. And also, dress up. Yay!");
Query OK, 1 row affected (0.03 sec)
• This is the table, column, and data that we’ll add a Full Text index on
Example Walkthrough: Custom Stop Words
mysql> create table example.ss_words select * from information_schema.INNODB_FT_DEFAULT_STOPWORD;
Query OK, 36 rows affected (0.40 sec)
mysql> insert into ss_words values ("oracle"), ("and"), ("like");
Query OK, 3 rows affected (0.04 sec)
mysql> select group_concat(value) as stop_words from ss_words\G
*************************** 1. row ***************************
stop_words: a,about,an,are,as,at,be,by,com,de,en,for,from,how,i,in,is,it,la,of,on,or,that,the,this,to,was,what,when,where,who,will,with,und,the,www,oracle,and,like
1 row in set (0.00 sec)
mysql> set global innodb_ft_server_stopword_table="example/ss_words";
Query OK, 0 rows affected (0.00 sec)
• This is how we define words that will NOT be included in the Full Text index
Example Walkthrough: Token Sizes
• We can define the min and max token/word sizes
– Words that fall outside of this min/max range will NOT be included in the index
• And thus NOT used for searches
• We set constraints on the min and max length of words/tokens that we want to include in the index
– Very short or very long words are typically invalid or so common as to be worthless
• E.g.: a, an, de, ta, someverylongsentencethataccidentallygotstucktogethersomehowwhoops
• We’ll go with the defaults
– innodb_ft_min_token_size=3 and innodb_ft_max_token_size=84
– Words/Tokens outside of the 3-84 character range are ignored for the index
Example Walkthrough: Adding the Index
mysql> alter table short_stories add fulltext index (story);
Query OK, 0 rows affected, 1 warning (2.07 sec)
# Here we’re setting up the information_schema views so that we can see the index
# record details (on the next slide)
mysql> set global innodb_ft_aux_table="example/short_stories";
Query OK, 0 rows affected (0.00 sec)
Example Walkthrough: The Final Index
mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
+-----------+--------------+-------------+-----------+--------+----------+
| WORD      | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
+-----------+--------------+-------------+-----------+--------+----------+
| almost    |            4 |           4 |         1 |      4 |        4 |
| also      |            4 |           4 |         1 |      4 |       86 |
| art       |            4 |           4 |         1 |      4 |       39 |
| currently |            2 |           2 |         1 |      2 |       60 |
| dress     |            4 |           4 |         1 |      4 |       92 |
| eat       |            3 |           3 |         1 |      3 |       28 |
| games     |            3 |           4 |         2 |      3 |       47 |
| games     |            3 |           4 |         2 |      4 |       75 |
…
| video     |            3 |           4 |         2 |      3 |       41 |
| video     |            3 |           4 |         2 |      4 |       69 |
| worked    |            2 |           2 |         1 |      2 |        5 |
| yay       |            4 |           4 |         1 |      4 |      102 |
| years     |            2 |           4 |         3 |      2 |       45 |
| years     |            2 |           4 |         3 |      3 |        7 |
| years     |            2 |           4 |         3 |      4 |       13 |
+-----------+--------------+-------------+-----------+--------+----------+
29 rows in set (0.00 sec)
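The index contents above can be approximated outside MySQL. The Python sketch below applies the custom stop word list and the 3-84 token size range to the three stories and records each word's character offset. Treating the apostrophe as a word break is an assumption, chosen because it yields the same 29 rows; the DOC_ID values 2, 3, and 4 are simply copied from the output above, and `index_rows` is a hypothetical name.

```python
import re

DEFAULT_STOP_WORDS = """a about an are as at be by com de en for from how i in is it la of on or
that the this to was what when where who will with und the www""".split()
# The three custom additions from the walkthrough.
STOP_WORDS = set(DEFAULT_STOP_WORDS) | {"oracle", "and", "like"}

STORIES = {
    2: "I've worked at MySQL and Oracle for about 12 years now. "
       "I'm currently the Product Manager for MySQL.",
    3: "I'm 10 years old. I like to eat and play video games. That's pretty much it.",
    4: "I'm almost 7 years old. I like to make art, play with toys, and play video games. "
       "And also, dress up. Yay!",
}

def index_rows(stories, min_size=3, max_size=84):
    """Return sorted (word, doc_id, position) rows for every indexed token."""
    rows = []
    for doc_id, text in stories.items():
        for match in re.finditer(r"[A-Za-z0-9_]+", text):
            word = match.group().lower()
            if min_size <= len(word) <= max_size and word not in STOP_WORDS:
                rows.append((word, doc_id, match.start()))
    return sorted(rows)

rows = index_rows(STORIES)
print(len(rows))
for row in rows[:3]:
    print(row)
```

Words such as "and", "like", and "Oracle" are missing from the rows because of the custom stop word table, and "12" or "up" are missing because they are shorter than three characters.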
Example Walkthrough: Our Final Sample Query
mysql> SELECT author, story, MATCH(story) AGAINST("toys and games") AS relevancy
-> FROM short_stories WHERE MATCH(story) AGAINST("toys and games")
-> ORDER BY relevancy DESC\G
*************************** 1. row ***************************
author: Lily Lord
story: I'm almost 7 years old. I like to make art, play with toys, and play video
games. And also, dress up. Yay!
relevancy: 0.25865283608436584
*************************** 2. row ***************************
author: Sid Lord
story: I'm 10 years old. I like to eat and play video games. That's pretty much
it.
relevancy: 0.031008131802082062
2 rows in set (0.00 sec)
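The two relevancy values can be reproduced by hand. The MySQL manual describes the InnoDB ranking as TF * IDF * IDF per term, with IDF = log10(total rows / matching rows). The Python sketch below applies that formula to the three stories; the tokenizer and the names `tokens` and `rank` are assumptions for illustration, and only the one stop word that appears in the query is listed.

```python
import math
import re

STORIES = {
    "Matt Lord": "I've worked at MySQL and Oracle for about 12 years now. "
                 "I'm currently the Product Manager for MySQL.",
    "Sid Lord": "I'm 10 years old. I like to eat and play video games. "
                "That's pretty much it.",
    "Lily Lord": "I'm almost 7 years old. I like to make art, play with toys, "
                 "and play video games. And also, dress up. Yay!",
}
QUERY_STOP_WORDS = {"and"}  # the walkthrough's full stop word list is longer

def tokens(text):
    """Lowercase words of at least three characters."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9_]+", text) if len(w) >= 3]

def rank(query, stories):
    """Sum TF * IDF * IDF over the query terms for each matching story."""
    total = len(stories)
    doc_tokens = {author: tokens(text) for author, text in stories.items()}
    scores = {}
    for term in tokens(query):
        if term in QUERY_STOP_WORDS:
            continue
        matching = [a for a, toks in doc_tokens.items() if term in toks]
        if not matching:
            continue
        idf = math.log10(total / len(matching))
        for author in matching:
            tf = doc_tokens[author].count(term)
            scores[author] = scores.get(author, 0.0) + tf * idf * idf
    return scores

scores = rank("toys and games", STORIES)
for author, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(author, round(score, 6))
```

"toys" appears in one of three stories and "games" in two, so "toys" carries most of the weight. That is why Lily's story, which contains both words, ranks well above Sid's.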
What’s New in MySQL 5.6 and 5.7
What’s New?
• MySQL 5.6
– InnoDB Full-Text Index support
• Fully ACID compliant, MVCC search
• With performance improvements over MyISAM
• Easily customizable stop-word lists
• MySQL 5.7
– Pluggable Full-Text Parser support
– CJK Support
• N-gram parser for Chinese, Japanese, and Korean
• MeCab parser for Japanese
A Real World Example
An Internal Content Management System
• I have tons of valuable business related content
– But it’s spread across various locations and formats
• Wiki pages, PPTs, Word Docs, Txt docs, …
– How can I ingest, aggregate, and correlate this data
– How can I provide a useful search tool
• Let’s build something to vastly increase the value of our intranet content
– Something similar to Google Desktop search or Apple’s Spotlight
• But for the vast amounts of data strewn across our company intranet
– We can then incorporate the search into a MySQL based intranet tool
Gathering The Contents of Our Existing Data
• Use any existing metadata that you already have
• Pull metadata from existing files
– Specialized tools to extract metadata
• Exiftool to gather metadata on image files & Exif2maps to pull location data from image files
• Taglib to pull metadata from sound files
• `libreoffice --headless --convert-to …` to extract plain text from Office formats
• GNU Libextractor to pull metadata and location data from all file types
• Extract text content from binary format files (.ppt, .doc, .pdf, etc.)
– Apache Tika (originally part of Lucene)
• Auto-detects file format and uses appropriate parsing library
• Extracts metadata and structured text content from all popular/common document and file formats
Apache Tika and MySQL
Extract Plain Text, Load Text Docs
Apache Tika Example
• Downloads, docs, etc. can be found at https://tika.apache.org
shell> java -jar tika-app-1.7.jar -z -t /tmp/MySQL_FTS.pptx
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
1
MySQL Full-Text Search
Matt Lord
MySQL Product Manager
2
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
2
3
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a commitment
…
Apache Tika Example Cont.
shell> ls /tmp/*.p*
/tmp/MySQL_5.7_GIS.pptx /tmp/MySQL_5.7_GIS_reborn.pptx /tmp/MySQL_FTS.pptx
/tmp/MySQLGroupReplication.pdf
shell> for file in /tmp/*.p*; do java -jar tika-app-1.7.jar -z -t "$file" > "$file.txt" && echo -n "#DOC_END" >> "$file.txt"; done
shell> ls /tmp/*.txt
/tmp/MySQL_5.7_GIS.pptx.txt /tmp/MySQL_5.7_GIS_reborn.pptx.txt /tmp/MySQL_FTS.pptx.txt
/tmp/MySQLGroupReplication.pdf.txt
shell> sed -n '55,62'p /tmp/MySQLGroupReplication.pdf.txt
Program Agenda
MySQL Group Replication Background
Zoom in: Major Building Blocks
Zoom in: The Complete Stack
Our MySQL Table
mysql> show create table intranet_doc\G
*************************** 1. row ***************************
Table: intranet_doc
Create Table: CREATE TABLE `intranet_doc` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`type` varchar(50) DEFAULT NULL,
`fs_path` varchar(200) DEFAULT NULL,
`doc_host` varchar(60) DEFAULT NULL,
`txt_content` longtext,
PRIMARY KEY (`id`),
KEY `type` (`type`),
FULLTEXT KEY `txt_content` (`txt_content`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.01 sec)
Loading in the Text Content
shell> for file in /tmp/*.txt; do mysql -D intranet_search -e \
"load data infile '$file' into table intranet_doc \
lines terminated by '#DOC_END' (txt_content) SET fs_path='$file', \
doc_host='`uname -n`', \
type=substring_index(substring_index('$file', '.', -2), '.', 1) "; done
mysql> select fs_path, type, doc_host from intranet_doc;
+------------------------------------+------+-------------------+
| fs_path                            | type | doc_host          |
+------------------------------------+------+-------------------+
| /tmp/MySQL_5.7_GIS.pptx.txt        | pptx | mylab.localdomain |
| /tmp/MySQL_5.7_GIS_reborn.pptx.txt | pptx | mylab.localdomain |
| /tmp/MySQL_FTS.pptx.txt            | pptx | mylab.localdomain |
| /tmp/MySQLGroupReplication.pdf.txt | pdf  | mylab.localdomain |
+------------------------------------+------+-------------------+
4 rows in set (0.00 sec)
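The nested substring_index() call is what turns a path such as /tmp/MySQL_FTS.pptx.txt into the type "pptx": the inner call keeps the last two dot-separated pieces, and the outer call keeps the first of those. A Python rendering of the same logic, with `substring_index` and `doc_type` written here purely as illustrative helpers:

```python
def substring_index(s, delim, count):
    """Mimic MySQL's SUBSTRING_INDEX for a non-empty delimiter."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""

def doc_type(path):
    """Original extension of a file that had '.txt' appended after text extraction."""
    return substring_index(substring_index(path, ".", -2), ".", 1)

for path in ("/tmp/MySQL_5.7_GIS.pptx.txt", "/tmp/MySQLGroupReplication.pdf.txt"):
    print(path, doc_type(path))
```

Working from the right-hand side matters because file names such as MySQL_5.7_GIS contain dots of their own.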
Our Final Search Query
• Search for PowerPoint docs that mention Apache Tika
mysql> SELECT fs_path, doc_host, type
-> FROM intranet_doc
-> WHERE type LIKE "ppt%"
-> AND MATCH(txt_content) AGAINST ("+Tika" IN BOOLEAN MODE);
+-------------------------+-------------------+------+
| fs_path                 | doc_host          | type |
+-------------------------+-------------------+------+
| /tmp/MySQL_FTS.pptx.txt | mylab.localdomain | pptx |
+-------------------------+-------------------+------+
1 row in set (0.00 sec)
Integration with Lucene/Solr/Elasticsearch
Apache Lucene
• Lucene is the core Full-text search library
– Written in Java
• Originally created by Doug Cutting (creator of Hadoop)
• Open source project (since 2003)
• Mature
• Easy to learn API
• Stores its indexes as files on disk
• Solr and Elasticsearch provide web services built on top of Lucene
MySQL Native Full Text VS Lucene
MySQL Native Full Text
• Eliminates complexity
• Single canonical source
• No need for synchronization
• Single query language (SQL)
• No additional maintenance
• Use
– MySQL based app with basic full-text search
• e.g. E-commerce app with a product description search
Lucene
• Supports very complex searches
• Supports stemming & fuzzy searches
• Very scalable
• Rich document handling (PDF, PPT, …)
• Easy to use RESTful web services
– Solr, Elasticsearch, …
• Use
– Full blown advanced search focused app
• e.g. IMDB
Solr and MySQL
• Create simple custom DataImportHandler
– http://wiki.apache.org/solr/DataImportHandler
• Full and incremental indexing
• Scheduled re-indexing to keep the two in sync
Solr and MySQL
Custom DataImportHandler XML
MySQL Connector/J
• Easy integration
– Index sample sakila database
• http://localhost:8983/solr/sakila/collection1/dataimport?command=full-import
Elasticsearch and MySQL
• Easy integration
– Index sample sakila.country table
• curl -XPUT 'localhost:9200/_river/sakila_country/_meta' -d '{
"type" : "jdbc", "jdbc" : { "url" : "jdbc:mysql://localhost:3306/sakila",
"user" : "root", "password" : "mypass",
"sql" : "select * from country"
}
}'
JDBC River Plugin
MySQL Connector/J
What’s Next for MySQL Full-Text Search
Additional Features
• Improved performance
• More efficient disk space usage
• Support for stemming and facets
• Support for fuzzy string searches
• Support for aliases, synonyms, abbreviations, etc.
• Proximity search and use in relevancy scores
• Automatic ordering by relevancy
• What else would you like to see?
– Let us know!
Appendix : Additional Resources
• Manual
– https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
• Community forum
– http://forums.mysql.com/list.php?107
• Apache Tika
– https://tika.apache.org
• Report Full-Text bugs and submit feature requests
– http://bugs.mysql.com/
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
46
Getting Started with MySQL Full Text Search

More Related Content

PPTX
3G Drive Test Procedure_ By Md Joynal Abaden
PPTX
Pilot Pollution
PPT
Amvdd Data Converter Fundamentals
PDF
03 150323115803-conversion-gate01
PDF
NEI_LTE1235_ready_02.pdf
PPT
Mw training slide
PPTX
CloudAir, Dynamic Spectrum Sharing
PDF
Performance Requirement and Lessons Learnt of LTE Terminal_Transmitter Part
3G Drive Test Procedure_ By Md Joynal Abaden
Pilot Pollution
Amvdd Data Converter Fundamentals
03 150323115803-conversion-gate01
NEI_LTE1235_ready_02.pdf
Mw training slide
CloudAir, Dynamic Spectrum Sharing
Performance Requirement and Lessons Learnt of LTE Terminal_Transmitter Part

What's hot (20)

PDF
E nodeb kpi reference(v100r005c00 02)(pdf)-en
PPTX
Drive Test Using Tems Investation 16
PPTX
WCDMA Based Events
PDF
Intro to Single / Two Rate Three Color Marker (srTCM / trTCM)
PDF
Radio Optimization In Telco - Part 2
PPTX
BGP Update Source
PDF
Basic cdma for 2 g and 3g
PPTX
Kogge Stone Adder
PPTX
GSM , RF & DT
PDF
Lte basic parameters
PDF
Carrier Aggregation Discussion
PPTX
Signal degradation
PDF
Introduction To Antenna Impedance Tuner And Aperture Switch
PPT
08. DRIVE TEST Analysis
PPT
Wcdma Radio Network Planning And Optimization
PDF
How BGP Works
PDF
Performance requirement and lessons learnt of LTE terminal---transmitter part
PDF
AIRCOM LTE Webinar 5 - LTE Capacity
PDF
System(board level) noise figure analysis and optimization
PDF
huawei-lte-kpi-ref
E nodeb kpi reference(v100r005c00 02)(pdf)-en
Drive Test Using Tems Investation 16
WCDMA Based Events
Intro to Single / Two Rate Three Color Marker (srTCM / trTCM)
Radio Optimization In Telco - Part 2
BGP Update Source
Basic cdma for 2 g and 3g
Kogge Stone Adder
GSM , RF & DT
Lte basic parameters
Carrier Aggregation Discussion
Signal degradation
Introduction To Antenna Impedance Tuner And Aperture Switch
08. DRIVE TEST Analysis
Wcdma Radio Network Planning And Optimization
How BGP Works
Performance requirement and lessons learnt of LTE terminal---transmitter part
AIRCOM LTE Webinar 5 - LTE Capacity
System(board level) noise figure analysis and optimization
huawei-lte-kpi-ref
Ad

Similar to Getting Started with MySQL Full Text Search (20)

PDF
Developer day v2
PDF
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
PPTX
OUG Scotland 2014 - NoSQL and MySQL - The best of both worlds
PPTX
AWR and ASH Deep Dive
PPTX
Biwa summit 2015 oaa oracle data miner hands on lab
PDF
AWR and ASH in an EM12c World
PDF
Database trendsv4
PPTX
Kellyn Pot'Vin-Gorman - Awr and Ash
PPTX
Getting started-php unit
PDF
Webinar: Simpler Semantic Search with Solr
PDF
JDD2014: Multitenant Search - Pablo Barros
PPT
Database Developers: the most important developers on earth?
PPTX
MySQL Quick Dive
PPTX
Big Data Management System: Smart SQL Processing Across Hadoop and your Data ...
PPTX
Building a personalized web scale application - tht11005 - v1.1
PDF
AIOUG -GroundBreakers-Jul 2019 - Introduction to Machine Learning - From DBA'...
PDF
Using MySQL Enterprise Monitor for Continuous Performance Improvement
PDF
LAD -GroundBreakers-Jul 2019 - Introduction to Machine Learning - From DBA's ...
PDF
LAD -GroundBreakers-Jul 2019 - The Machine Learning behind the Autonomous Dat...
PDF
MySQL For Linux Sysadmins
Developer day v2
NoSQL and SQL - Why Choose? Enjoy the best of both worlds with MySQL
OUG Scotland 2014 - NoSQL and MySQL - The best of both worlds
AWR and ASH Deep Dive
Biwa summit 2015 oaa oracle data miner hands on lab
AWR and ASH in an EM12c World
Database trendsv4
Kellyn Pot'Vin-Gorman - Awr and Ash
Getting started-php unit
Webinar: Simpler Semantic Search with Solr
JDD2014: Multitenant Search - Pablo Barros
Database Developers: the most important developers on earth?
MySQL Quick Dive
Big Data Management System: Smart SQL Processing Across Hadoop and your Data ...
Building a personalized web scale application - tht11005 - v1.1
AIOUG -GroundBreakers-Jul 2019 - Introduction to Machine Learning - From DBA'...
Using MySQL Enterprise Monitor for Continuous Performance Improvement
LAD -GroundBreakers-Jul 2019 - Introduction to Machine Learning - From DBA's ...
LAD -GroundBreakers-Jul 2019 - The Machine Learning behind the Autonomous Dat...
MySQL For Linux Sysadmins
Ad

More from Matt Lord (13)

PPTX
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
PPTX
MongDB Mobile: Bringing the Power of MongoDB to Your Device
PPTX
MongoDB Mobile: Bringing the Power of MongoDB to Your Device
PPTX
Using MySQL Containers
PDF
Why MySQL High Availability Matters
PDF
MySQL High Availability -- InnoDB Clusters
PDF
Unlocking Big Data Insights with MySQL
PDF
OpenStack Days East -- MySQL Options in OpenStack
PDF
MySQL Group Replication - an Overview
PDF
OpenStack and MySQL
PPTX
MySQL DBaaS with OpenStack Trove
PPTX
Using MySQL in the Cloud
PDF
MySQL 5.7 GIS
Vitess VReplication: Standing on the Shoulders of a MySQL Giant
MongDB Mobile: Bringing the Power of MongoDB to Your Device
MongoDB Mobile: Bringing the Power of MongoDB to Your Device
Using MySQL Containers
Why MySQL High Availability Matters
MySQL High Availability -- InnoDB Clusters
Unlocking Big Data Insights with MySQL
OpenStack Days East -- MySQL Options in OpenStack
MySQL Group Replication - an Overview
OpenStack and MySQL
MySQL DBaaS with OpenStack Trove
Using MySQL in the Cloud
MySQL 5.7 GIS

Recently uploaded (20)

PPTX
L1 - Introduction to python Backend.pptx
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PDF
PTS Company Brochure 2025 (1).pdf.......
PDF
Navsoft: AI-Powered Business Solutions & Custom Software Development
PDF
Softaken Excel to vCard Converter Software.pdf
PDF
Adobe Illustrator 28.6 Crack My Vision of Vector Design
PPTX
ai tools demonstartion for schools and inter college
PDF
medical staffing services at VALiNTRY
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PDF
top salesforce developer skills in 2025.pdf
PDF
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
PDF
Digital Strategies for Manufacturing Companies
PPTX
Introduction to Artificial Intelligence
PDF
System and Network Administration Chapter 2
PDF
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
PDF
2025 Textile ERP Trends: SAP, Odoo & Oracle
PDF
System and Network Administraation Chapter 3
PPTX
Online Work Permit System for Fast Permit Processing
L1 - Introduction to python Backend.pptx
Design an Analysis of Algorithms II-SECS-1021-03
PTS Company Brochure 2025 (1).pdf.......
Navsoft: AI-Powered Business Solutions & Custom Software Development
Softaken Excel to vCard Converter Software.pdf
Adobe Illustrator 28.6 Crack My Vision of Vector Design
ai tools demonstartion for schools and inter college
medical staffing services at VALiNTRY
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Design an Analysis of Algorithms I-SECS-1021-03
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
top salesforce developer skills in 2025.pdf
T3DD25 TYPO3 Content Blocks - Deep Dive by André Kraus
Digital Strategies for Manufacturing Companies
Introduction to Artificial Intelligence
System and Network Administration Chapter 2
Why TechBuilder is the Future of Pickup and Delivery App Development (1).pdf
2025 Textile ERP Trends: SAP, Odoo & Oracle
System and Network Administraation Chapter 3
Online Work Permit System for Fast Permit Processing

Getting Started with MySQL Full Text Search

  • 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | MySQL Full-Text Search Matt Lord MySQL Product Manager @mattalord
  • 3. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. 3
  • 4. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | MySQL Full-Text Search : Agenda 1 2 3 4 5 An Introduction to Full-Text Search Common Terms and Concepts What’s New in MySQL 5.6 and 5.7 A Real World Example Integration with Lucene, Solr, and Elasticsearch What’s Next for MySQL Full-Text Search 4 6
  • 5. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | An Introduction to Full-Text Search 5
  • 6. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | What is it? • Search entire documents – Character based fields • VARCHAR, TEXT, BLOB • For a search string – Combinations of words – Phrases: “specific string to match” – Wildcards: * – Requirements: +, -, ~ – Expressions: (…) – Relevancy weight characters: <, > 6
  • 9. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | What Would I Use it For? • Content management – What metadata should be used to describe the information – This helps to make your searches far more useful • Search services – What documents or meta-data contain certain terms or tokens – What documents are most relevant to the current view – What data do you think this user would be most interested in 9
  • 10. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | How Would I Use It? 10 StoreCollect IndexSearch • Collect search data – Existing documents describing the content – Generated metadata from the incoming content • Store the data – Within MySQL tables • Index the data – Add Full-Text indexes on the content columns • Allow for efficient searches – Provide users with an efficient way to search the content
  • 11. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Common Terms and Concepts 11
  • 12. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Common Terms • Token – Word or a series of characters • Dictionary – What words are related, mean the same thing, are abbreviations for, etc. • Stop Words – Words that should not be indexed • Relevancy and Weight – How should weight search terms and calculate document relevancy? 12
  • 13. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Tokens • Tokens – Words, or a series of characters that together form common meaning • Related Server options – innodb_ft_min_token_size – Don’t bother to index words shorter than this • These would typically be words that are invalid, or are extremely common – So they increase the size of the index and decrease search efficiency w/o real benefit – innodb_ft_max_token_size – Don’t bother to index words longer than this • These would typically be words that are invalid – So again, they increase the size of the index and decrease search efficiency w/o real benefit 13
  • 14. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Stop Words • Server options – innodb_ft_enable_stopword – Should stop words be used at all for new indexes? – innodb_ft_server_stopword_table – Use this global table for the list of stop words – innodb_ft_user_stopword_table – Use this table for my own stop word list • All of the above only affect indexes created while they are set – CREATE INDEX, ALTER TABLE, OPTIMIZE TABLE, ANALYZE TABLE • Default stop word list – SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD; 14
  • 15. Relevancy and Weight
    • Term Frequency (TF)
      – Measure of how often a token/word appears in an individual document
    • Inverse Document Frequency (IDF)
      – Measure of how common a token/word is across all documents; rarer tokens carry more weight
    • Coordinate Level Matching
      – Number of query terms that are found within an individual document
        • How close together are the matching terms?
    • User Modifications
      – ‘>’ and ‘<’ characters can be used to give terms higher or lower weight
      – ‘+’ and ‘-’ characters can be used to require that terms be present or absent
  • 16. A Full Text Index
    • It’s an inverted index of relationships between tokens and documents
    [Diagram: Documents, Tokenizer, Token Filters (Stop Words; Token Size: Min Token Size, Max Token Size), Indexer, Full Text / Inverted Index]
    Documents
      Document 1: This movie is about a boy going to war.
      Document 2: This movie is about a girl starting an auto-shop.
      Document 3: This movie is about flowers.
    Stop Words
      a about an are as at be by com de en for from how i in is it la of on or that the this to was what when where who will with und the www
    Full Text / Inverted Index
      ID  TOKEN      DOCUMENT
      1   movie      1,2,3
      2   boy        1
      3   girl       2
      4   going      1
      5   starting   2
      6   war        1
      7   auto-shop  2
      8   flowers    3
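The pipeline on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how InnoDB is implemented: the function names `tokenize` and `build_inverted_index` are made up for this sketch, and keeping a hyphenated word such as "auto-shop" as a single token is an assumption chosen to match the slide's table.

```python
import re

# Default stop words as listed on the slide.
STOP_WORDS = set(
    "a about an are as at be by com de en for from how i in is it la of on or "
    "that the this to was what when where who will with und the www".split()
)
MIN_TOKEN_SIZE = 3   # mirrors the innodb_ft_min_token_size default
MAX_TOKEN_SIZE = 84  # mirrors the innodb_ft_max_token_size default

DOCUMENTS = {
    1: "This movie is about a boy going to war.",
    2: "This movie is about a girl starting an auto-shop.",
    3: "This movie is about flowers.",
}


def tokenize(text):
    """Lower-case the text and return runs of letters/digits, allowing inner hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def build_inverted_index(documents):
    """Map each token that survives the filters to the sorted ids of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for token in tokenize(text):
            if token in STOP_WORDS:
                continue  # stop word filter
            if not MIN_TOKEN_SIZE <= len(token) <= MAX_TOKEN_SIZE:
                continue  # token size filter
            index.setdefault(token, set()).add(doc_id)
    return {token: sorted(doc_ids) for token, doc_ids in index.items()}
```

Calling `build_inverted_index(DOCUMENTS)` yields one entry per token in the slide's table, each mapped to its list of document ids.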
  • 17. Document Searches
    • Search for “movie about girl”
    • Term Frequency (TF)
      – “movie” occurs 1 time in each of Docs 1, 2, and 3
      – “girl” occurs 1 time in Doc 2
        • No Doc has more than 1 occurrence of either word
    • Inverse Document Frequency (IDF)
      – “movie” occurs in Docs 1, 2, and 3
      – “girl” occurs only in Doc 2
        • “girl” is more meaningful or “weighted”
    • Docs 1, 2, and 3 match our search, but Doc 2 is most relevant
    Full Text / Inverted Index
      ID  TOKEN      DOCUMENT
      1   movie      1,2,3
      2   boy        1
      3   girl       2
      4   going      1
      5   starting   2
      6   war        1
      7   auto-shop  2
      8   flowers    3
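The reasoning on this slide can be checked with a small ranking sketch. It uses the textbook form of the score (TF multiplied by the natural log of total documents over matching documents), which is an assumption made for illustration and not necessarily the exact formula the server applies. The `INDEX` dictionary is copied from the slide's table, and `rank` is a hypothetical helper name.

```python
import math

# Token -> ids of the documents containing it, copied from the slide's inverted index.
INDEX = {
    "movie": [1, 2, 3],
    "boy": [1],
    "girl": [2],
    "going": [1],
    "starting": [2],
    "war": [1],
    "auto-shop": [2],
    "flowers": [3],
}
TOTAL_DOCS = 3


def rank(query_terms):
    """Return (doc_id, score) pairs, best first, scoring each doc as the sum of TF * IDF."""
    scores = {}
    for term in query_terms:
        doc_ids = INDEX.get(term, [])
        if not doc_ids:
            continue  # e.g. "about" is a stop word, so it is not in the index
        idf = math.log(TOTAL_DOCS / len(doc_ids))
        for doc_id in doc_ids:
            tf = 1  # every token occurs exactly once per document in this example
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With the query terms `["movie", "about", "girl"]`, all three documents match, but "movie" contributes nothing because it appears everywhere, so only Doc 2 ends up with a non-zero score.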
  • 18. Additional Options & Variables
    • innodb_ft_aux_table
      – View index details for this table
      – Via the INNODB_FT_INDEX_TABLE, INNODB_FT_INDEX_CACHE, INNODB_FT_CONFIG, INNODB_FT_DELETED, and INNODB_FT_BEING_DELETED Information_Schema tables
    • innodb_ft_cache_size
      – In-memory Full-Text index cache size, defined per table
    • innodb_ft_total_cache_size
      – Total in-memory cache size limit per server
    • innodb_ft_num_word_optimize
      – Number of words processed by each OPTIMIZE TABLE run on a Full-Text index
    • innodb_ft_result_cache_limit
      – In-memory cache size limit for individual searches
    • innodb_ft_sort_pll_degree
      – Number of parallel threads to use during index builds
  • 19. Example Walkthrough
    • Now let’s quickly demonstrate all of these terms & concepts in action
    • We’ll use a very simple, made-up series of silly short stories
  • 20. Example Walkthrough: Table and Data
    • This is the table, column, and data that we’ll add a Full Text index on
    mysql> create table short_stories (author varchar(100), story text);
    Query OK, 0 rows affected (0.23 sec)
    mysql> insert into short_stories values ("Matt Lord", "I've worked at MySQL and Oracle for about 12 years now. I'm currently the Product Manager for MySQL.");
    Query OK, 1 row affected (0.03 sec)
    mysql> insert into short_stories values ("Sid Lord", "I'm 10 years old. I like to eat and play video games. That's pretty much it.");
    Query OK, 1 row affected (0.12 sec)
    mysql> insert into short_stories values ("Lily Lord", "I'm almost 7 years old. I like to make art, play with toys, and play video games. And also, dress up. Yay!");
    Query OK, 1 row affected (0.03 sec)
  • 21. Example Walkthrough: Custom Stop Words
    • This is how we define words that will NOT be included in the Full Text index
    mysql> create table example.ss_words select * from information_schema.INNODB_FT_DEFAULT_STOPWORD;
    Query OK, 36 rows affected (0.40 sec)
    mysql> insert into ss_words values ("oracle"), ("and"), ("like");
    Query OK, 3 rows affected (0.04 sec)
    mysql> select group_concat(value) as stop_words from ss_words\G
    *************************** 1. row ***************************
    stop_words: a,about,an,are,as,at,be,by,com,de,en,for,from,how,i,in,is,it,la,of,on,or,that,the,this,to,was,what,when,where,who,will,with,und,the,www,oracle,and,like
    1 row in set (0.00 sec)
    mysql> set global innodb_ft_server_stopword_table="example/ss_words";
    Query OK, 0 rows affected (0.00 sec)
  • 22. Example Walkthrough: Token Sizes
    • We can define the min and max token/word sizes
      – Words that fall outside of this min/max range will NOT be included in the index
        • And thus NOT used for searches
    • We set constraints on the min and max length of words/tokens that we want to include in the index
      – Very short or very long words are typically invalid or so common as to be worthless
        • E.g.: a, an, de, ta, someverylongsentencethataccidentallygotstucktogethersomehowwhoops
    • We’ll go with the defaults
      – innodb_ft_min_token_size=3 and innodb_ft_max_token_size=84
      – Words/Tokens outside of the 3-84 character range are ignored for the index
  • 23. Example Walkthrough: Adding the Index
    mysql> alter table short_stories add fulltext index (story);
    Query OK, 0 rows affected, 1 warning (2.07 sec)
    # Here we’re setting up the information_schema views so that we can see the index
    # record details (on the next slide)
    mysql> set global innodb_ft_aux_table="example/short_stories";
    Query OK, 0 rows affected (0.00 sec)
  • 24. Example Walkthrough: The Final Index
    mysql> select * from information_schema.INNODB_FT_INDEX_TABLE;
    +-----------+--------------+-------------+-----------+--------+----------+
    | WORD      | FIRST_DOC_ID | LAST_DOC_ID | DOC_COUNT | DOC_ID | POSITION |
    +-----------+--------------+-------------+-----------+--------+----------+
    | almost    |            4 |           4 |         1 |      4 |        4 |
    | also      |            4 |           4 |         1 |      4 |       86 |
    | art       |            4 |           4 |         1 |      4 |       39 |
    | currently |            2 |           2 |         1 |      2 |       60 |
    | dress     |            4 |           4 |         1 |      4 |       92 |
    | eat       |            3 |           3 |         1 |      3 |       28 |
    | games     |            3 |           4 |         2 |      3 |       47 |
    | games     |            3 |           4 |         2 |      4 |       75 |
    …
    | video     |            3 |           4 |         2 |      3 |       41 |
    | video     |            3 |           4 |         2 |      4 |       69 |
    | worked    |            2 |           2 |         1 |      2 |        5 |
    | yay       |            4 |           4 |         1 |      4 |      102 |
    | years     |            2 |           4 |         3 |      2 |       45 |
    | years     |            2 |           4 |         3 |      3 |        7 |
    | years     |            2 |           4 |         3 |      4 |       13 |
    +-----------+--------------+-------------+-----------+--------+----------+
    29 rows in set (0.00 sec)
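To see where the rows in this output come from, the following Python sketch re-derives the WORD, DOC_ID, and POSITION columns from the three stories. It rests on assumptions made for illustration: tokens are runs of letters and digits (so an apostrophe splits a word), POSITION is the character offset of the token within the story, and the DOC_ID values 2 to 4 are simply copied from the output. `index_rows` is a hypothetical helper, not a MySQL function.

```python
import re

DEFAULT_STOP_WORDS = (
    "a about an are as at be by com de en for from how i in is it la of on or "
    "that the this to was what when where who will with und the www"
).split()
CUSTOM_STOP_WORDS = ["oracle", "and", "like"]  # added in the custom stop word step
STOP_WORDS = set(DEFAULT_STOP_WORDS + CUSTOM_STOP_WORDS)
MIN_TOKEN_SIZE, MAX_TOKEN_SIZE = 3, 84  # the defaults used in the walkthrough

# Keys are the DOC_ID values shown in the INNODB_FT_INDEX_TABLE output.
STORIES = {
    2: "I've worked at MySQL and Oracle for about 12 years now. "
       "I'm currently the Product Manager for MySQL.",
    3: "I'm 10 years old. I like to eat and play video games. That's pretty much it.",
    4: "I'm almost 7 years old. I like to make art, play with toys, "
       "and play video games. And also, dress up. Yay!",
}


def index_rows(stories):
    """Return sorted (word, doc_id, position) rows for every token that gets indexed."""
    rows = []
    for doc_id, text in stories.items():
        for match in re.finditer(r"[A-Za-z0-9]+", text):
            word = match.group().lower()
            if word in STOP_WORDS:
                continue
            if not MIN_TOKEN_SIZE <= len(word) <= MAX_TOKEN_SIZE:
                continue
            rows.append((word, doc_id, match.start()))
    return sorted(rows)
```

Under these assumptions `index_rows(STORIES)` produces the same number of rows as the query above, and the positions of the words visible in the output line up with it.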
  • 25. Example Walkthrough: Our Final Sample Query
    mysql> SELECT author, story, MATCH(story) AGAINST("toys and games") AS relevancy
        -> FROM short_stories WHERE MATCH(story) AGAINST("toys and games")
        -> ORDER BY relevancy DESC\G
    *************************** 1. row ***************************
       author: Lily Lord
        story: I'm almost 7 years old. I like to make art, play with toys, and play video games. And also, dress up. Yay!
    relevancy: 0.25865283608436584
    *************************** 2. row ***************************
       author: Sid Lord
        story: I'm 10 years old. I like to eat and play video games. That's pretty much it.
    relevancy: 0.031008131802082062
    2 rows in set (0.00 sec)
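The two relevancy values can be reproduced by hand. The MySQL manual describes the InnoDB ranking as TF × IDF × IDF per search word, with IDF = log10(total rows / rows containing the word), summed over the words in the query. The sketch below applies that formula to this example; the `OCCURRENCES` data is read off the stories above, `relevancy` is a hypothetical helper, and small differences in the last digits are expected because the server does not compute in double precision.

```python
import math

TOTAL_ROWS = 3  # three stories in short_stories

# word -> {author: number of occurrences in that author's story}.
# "and" is a stop word in this walkthrough, so it has no entry.
OCCURRENCES = {
    "toys": {"Lily Lord": 1},
    "games": {"Sid Lord": 1, "Lily Lord": 1},
}


def relevancy(query_words):
    """Sum TF * IDF * IDF over the query words for each matching author."""
    scores = {}
    for word in query_words:
        postings = OCCURRENCES.get(word, {})
        if not postings:
            continue  # word is not in the index
        idf = math.log10(TOTAL_ROWS / len(postings))
        for author, tf in postings.items():
            scores[author] = scores.get(author, 0.0) + tf * idf * idf
    return scores
```

For `relevancy(["toys", "and", "games"])`, "games" appears in two of three rows and so contributes only a small amount to both matching stories, while "toys" appears in a single row and accounts for most of Lily Lord's score.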
  • 26. What’s New in MySQL 5.6 and 5.7
  • 27. What’s New?
    • MySQL 5.6
      – InnoDB Full-Text Index support
        • Fully ACID compliant, MVCC search
        • With performance improvements over MyISAM
        • Easily customizable stop-word lists
    • MySQL 5.7
      – Pluggable Full-Text Parser support
      – CJK Support
        • N-gram parser for Chinese, Japanese, and Korean
        • MeCab parser for Japanese
  • 28. A Real World Example
  • 29. An Internal Content Management System
    • I have tons of valuable business related content
      – But it’s spread across various locations and formats
        • Wiki pages, PPTs, Word Docs, Txt docs, …
      – How can I ingest, aggregate, and correlate this data?
      – How can I provide a useful search tool?
    • Let’s build something to vastly increase the value of our intranet content
      – Something similar to Google Desktop search or Apple’s Spotlight
        • But for the vast amounts of data strewn across our company intranet
      – We can then incorporate the search into a MySQL based intranet tool
  • 30. Gathering The Contents of Our Existing Data
    • Use any existing metadata that you already have
    • Pull metadata from existing files
      – Specialized tools to extract metadata
        • Exiftool to gather metadata on image files & Exif2maps to pull location data from image files
        • Taglib to pull metadata from sound files
        • `libreoffice --headless --convert-to …` to extract plain text from Office formats
        • GNU Libextractor to pull metadata and location data from all file types
    • Extract text content from binary format files (.ppt, .doc, .pdf, etc.)
      – Apache Tika (originally part of Lucene)
        • Auto-detects file format and uses appropriate parsing library
        • Extracts metadata and structured text content from all popular/common document and file formats
  • 31. Apache Tika and MySQL
    [Diagram: Extract Plain Text, Load Text Docs]
  • 32. Apache Tika Example
    • Downloads, docs, etc. can be found at https://tika.apache.org
    shell> java -jar tika-app-1.7.jar -z -t /tmp/MySQL_FTS.pptx
    Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
    1
    MySQL Full-Text Search
    Matt Lord
    MySQL Product Manager
    2
    Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
    Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
    2
    3
    Safe Harbor Statement
    The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment …
  • 33. Apache Tika Example Cont.
    shell> ls /tmp/*.p*
    /tmp/MySQL_5.7_GIS.pptx /tmp/MySQL_5.7_GIS_reborn.pptx /tmp/MySQL_FTS.pptx /tmp/MySQLGroupReplication.pdf
    shell> for file in `ls /tmp/*.p*`; do java -jar tika-app-1.7.jar -z -t $file > $file.txt && echo -n "#DOC_END" >> $file.txt; done
    shell> ls /tmp/*.txt
    /tmp/MySQL_5.7_GIS.pptx.txt /tmp/MySQL_5.7_GIS_reborn.pptx.txt /tmp/MySQL_FTS.pptx.txt /tmp/MySQLGroupReplication.pdf.txt
    shell> sed -n '55,62'p /tmp/MySQLGroupReplication.pdf.txt
    Program Agenda
    MySQL Group Replication Background
    Zoom in: Major Building Blocks
    Zoom in: The Complete Stack
  • 34. Our MySQL Table
    mysql> show create table intranet_doc\G
    *************************** 1. row ***************************
           Table: intranet_doc
    Create Table: CREATE TABLE `intranet_doc` (
      `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `type` varchar(50) DEFAULT NULL,
      `fs_path` varchar(200) DEFAULT NULL,
      `doc_host` varchar(60) DEFAULT NULL,
      `txt_content` longtext,
      PRIMARY KEY (`id`),
      KEY `type` (`type`),
      FULLTEXT KEY `txt_content` (`txt_content`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.01 sec)
  • 35. Loading in the Text Content
    shell> for file in `ls /tmp/*.txt`; do mysql -D intranet_search -e "load data infile '$file' into table intranet_doc lines terminated by '#DOC_END' (txt_content) SET fs_path='$file', doc_host='`uname -n`', type=substring_index(substring_index('$file', '.', -2), '.', 1) "; done
    mysql> select fs_path, type, doc_host from intranet_doc;
    +------------------------------------+------+-------------------+
    | fs_path                            | type | doc_host          |
    +------------------------------------+------+-------------------+
    | /tmp/MySQL_5.7_GIS.pptx.txt        | pptx | mylab.localdomain |
    | /tmp/MySQL_5.7_GIS_reborn.pptx.txt | pptx | mylab.localdomain |
    | /tmp/MySQL_FTS.pptx.txt            | pptx | mylab.localdomain |
    | /tmp/MySQLGroupReplication.pdf.txt | pdf  | mylab.localdomain |
    +------------------------------------+------+-------------------+
    4 rows in set (0.00 sec)
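The nested `substring_index` call is what turns a path such as `/tmp/MySQL_FTS.pptx.txt` into the `type` value `pptx`: the inner call keeps the last two dot-separated pieces (`pptx.txt`), and the outer call keeps the first of those. The Python sketch below mimics that behaviour so the two steps can be followed outside the database; `substring_index` here is a re-implementation written for illustration (for a non-empty delimiter only), and `doc_type` is a hypothetical helper name.

```python
def substring_index(text, delimiter, count):
    """Mimic MySQL's SUBSTRING_INDEX for a non-empty delimiter.

    A positive count keeps everything before the count-th delimiter from the left;
    a negative count keeps everything after the count-th delimiter from the right.
    """
    if count == 0:
        return ""
    parts = text.split(delimiter)
    if count > 0:
        return delimiter.join(parts[:count])
    return delimiter.join(parts[count:])


def doc_type(fs_path):
    """Derive the original file extension from a path like '/tmp/name.pptx.txt'."""
    last_two = substring_index(fs_path, ".", -2)  # e.g. 'pptx.txt'
    return substring_index(last_two, ".", 1)      # e.g. 'pptx'
```

Dots earlier in the file name, as in `MySQL_5.7_GIS.pptx.txt`, do not matter because only the last two pieces are kept before the first one is taken.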
  • 36. Our Final Search Query
    • Search for PowerPoint docs that mention Apache Tika
    mysql> SELECT fs_path, doc_host, type
        -> FROM intranet_doc
        -> WHERE type LIKE "ppt%"
        -> AND MATCH(txt_content) AGAINST ("+Tika");
    +-------------------------+-------------------+------+
    | fs_path                 | doc_host          | type |
    +-------------------------+-------------------+------+
    | /tmp/MySQL_FTS.pptx.txt | mylab.localdomain | pptx |
    +-------------------------+-------------------+------+
    1 row in set (0.00 sec)
  • 37. Integration with Lucene/Solr/Elasticsearch
  • 38. Apache Lucene
    • Lucene is the core Full-text search library
      – Written in Java
    • Originally created by Doug Cutting (creator of Hadoop)
    • Open source project (since 2003)
    • Mature
    • Easy to learn API
    • Stores its indexes as files on disk
    • Solr and Elasticsearch provide web services built on top of Lucene
  • 39. MySQL Native Full Text vs. Lucene
    MySQL Native Full Text
    • Eliminates complexity
    • Single canonical source
    • No need for synchronization
    • Single query language (SQL)
    • No additional maintenance
    • Use
      – MySQL based app with basic full-text search
        • e.g. E-commerce app with a product description search
    Lucene
    • Supports very complex searches
    • Supports stemming & fuzzy searches
    • Very scalable
    • Rich document handling (PDF, PPT, …)
    • Easy to use RESTful web services
      – Solr, Elasticsearch, …
    • Use
      – Full blown advanced search focused app
        • e.g. IMDB
  • 40. Solr and MySQL
    • Create simple custom DataImportHandler
      – http://wiki.apache.org/solr/DataImportHandler
    • Full and incremental indexing
    • Scheduled re-indexing to keep the two in sync
  • 41. Solr and MySQL
    [Diagram: Custom DataImportHandler XML, MySQL Connector/J]
    • Easy integration
      – Index sample sakila database
        • http://localhost:8983/solr/sakila/collection1/dataimport?command=full-import
  • 42. Elasticsearch and MySQL
    [Diagram: JDBC River Plugin, MySQL Connector/J]
    • Easy integration
      – Index sample sakila.country table
    curl -XPUT 'localhost:9200/_river/sakila_country/_meta' -d '{
      "type" : "jdbc",
      "jdbc" : {
        "url" : "jdbc:mysql://localhost:3306/sakila",
        "user" : "root",
        "password" : "mypass",
        "sql" : "select * from country"
      }
    }'
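Request bodies like the one in this curl command are easy to break when copied from slides, because typographic quotes are not valid JSON string delimiters. One way to avoid that is to build the body with a JSON library instead of typing it by hand. The sketch below only constructs and serializes the document shown above; it does not contact Elasticsearch, and `river_meta` is a hypothetical helper name introduced for illustration.

```python
import json


def river_meta(url, user, password, sql):
    """Build the river definition shown on the slide as a plain dictionary."""
    return {
        "type": "jdbc",
        "jdbc": {
            "url": url,
            "user": user,
            "password": password,
            "sql": sql,
        },
    }


# Same values as in the curl example.
payload = json.dumps(
    river_meta(
        url="jdbc:mysql://localhost:3306/sakila",
        user="root",
        password="mypass",
        sql="select * from country",
    ),
    indent=2,
)
```

The resulting `payload` string could then be written to a file and passed to curl with `-d @file`, which keeps shell quoting out of the picture.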
  • 43. What’s Next for MySQL Full-Text Search
  • 44. Additional Features
    • Improved performance
    • More efficient disk space usage
    • Support for stemming and facets
    • Support for fuzzy string searches
    • Support for aliases, synonyms, abbreviations, etc.
    • Proximity search and use in relevancy scores
    • Automatic ordering by relevancy
    • What else would you like to see?
      – Let us know!
  • 45. Appendix: Additional Resources
    • Manual
      – https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html
    • Community forum
      – http://forums.mysql.com/list.php?107
    • Apache Tika
      – https://tika.apache.org
    • Report Full-Text bugs and submit feature requests
      – http://bugs.mysql.com/
  • 46. Safe Harbor Statement
    The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
