Step 1 — Creating Test Data
In order to try full-text search, we need some data. In this step, we’ll create a database
called testdb with a table called news, which we’ll populate with some example data
representing articles from a fictional news aggregator site.
Note: If you have your own table with text data that you’d like to use instead, you can skip
to Step 2 and make appropriate substitutions while following along.
First, access the MySQL console. You’ll be prompted to enter the root password you set
when you installed MySQL.
mysql -u root -p
Once you’re connected, your prompt will change to mysql>.
Next, create a new database called testdb. This database will contain the test data.
CREATE DATABASE testdb;
Switch to using the testdb database by default so you won’t have to specify the
database’s name to create or update things in it.
USE testdb;
Next, create a table in the database called news with columns for an example news
aggregator’s articles.
CREATE TABLE news (
id INT NOT NULL AUTO_INCREMENT,
title TEXT NOT NULL,
content TEXT NOT NULL,
author TEXT NOT NULL,
PRIMARY KEY (id)
);
Let’s walk through what this command does:
CREATE TABLE is a SQL command that creates a table, similar to many other
databases.
news is the name of the table.
title, content and author are textual columns with unlimited length.
NOT NULL is a declaration used to mark the columns that cannot have null
values (although they may contain empty strings).
id is the table’s primary index with the special type AUTO_INCREMENT, which
automatically fills in the ID field with the next available ID.
Now add some example data to the table.
INSERT INTO news (id, title, content, author) VALUES
(1, 'Pacific Northwest high-speed rail line', 'Currently there are
only a few options for traveling the 140 miles between Seattle and
Vancouver and none of them are ideal.', 'Greg'),
(2, 'Hitting the beach was voted the best part of life in the region',
'Exploring tracks and trails was second most popular, followed by visiting
the shops and then traveling to local parks.', 'Ethan'),
(3, 'Machine Learning from scratch', 'Bare bones implementations of
some of the foundational models and algorithms.', 'Jo');
Let’s walk through what this command does:
INSERT inserts data.
INTO specifies where the data should be inserted. In this case, it’s the news table.
(id, title, content, author) VALUES specifies the columns where each
entry’s data values should be stored.
The last three lines are the three rows of data we’re adding to the table. Each
contains an example article for a news website with a title, some content, and
the author’s name.
Each entry also has a unique identifier which is automatically entered into the database
index. The database index is a data structure that improves the performance of data
retrieval operations. This index is stored separately from the main data. It updates on
any change in the table content at the cost of additional writes and comparatively little
storage space. Its small size and tailored data structure allow indexes to operate much
more effectively than using the main table space for selecting queries.
Now that we have some data, we can start writing queries to search that data using FTS.
Step 2 — Creating a FTS Index and Using FTS Functions
Let’s make an index for the text columns we have so we can use FTS.
To do this, we’ll use a MySQL-exclusive command called FULLTEXT. This command tells
MySQL to put all the fields we want to be able to search with FTS into an internal index.
ALTER TABLE news ADD FULLTEXT (title, content, author);
This works by combining all of the text columns and sanitizing them (e.g. removing the
punctuation and making uppercase letters lowercase). Now that this index is created, it
will be updated by any SQL query that changes the content of the source table.
Next, try performing a full-text search for “Seattle beach” using the function MATCH()
AGAINST().
SELECT * FROM news WHERE MATCH (title,content,author) AGAINST ('Seattle
beach' IN NATURAL LANGUAGE MODE)\G
The MATCH() portion of the command specifies which set of columns are indexed using
FTS; it must match the column list you used to create the index. The AGAINST() portion
specifies which word we’re performing a full-text search for, which is “Seattle beach” in
this example.
IN NATURAL LANGUAGE MODE means the search words are provided directly from user
input without any pre-processing. MySQL assumes natural language mode by default,
so you don’t have to specify it explicitly.
Note: In comparison to natural language mode, word stemming is another useful FTS
technique which makes the index strip the affix of a word, storing only the root portion. For
example, the words “fits” and “fitted” would be identical using FTS with word stemming.
Unfortunately, MySQL doesn’t support word stemming. Stemming is in MySQL’s worklog,
but there’s no time frame to implement and release it yet. FTS is still useful because it’s still
much faster than LIKE clauses. If you’d like to use word stemming, you can investigate
integrating with the Snowball library.