MYSQL Indexing
Notes taken from High Performance MYSQL (Chapter 6) and various internet sources.
Basics
Indexes (also called ‘Keys’ in MYSQL) are data structures that storage engines use to
find rows quickly.
Indexes are critical for good performance.
Poor indexing can also cause a lot of unwanted issues and a cause of real-world
problems.
Optimal Indexing can improve performance by multiple orders of magnitude.
An index contains values from one column or more columns in the table. Order of
columns is very important in an index because MYSQL can only efficiently search on
leftmost prefix of the index. Creating an index on two columns is not same as
creating two separate single column indexes.
Types Of Indexes
B-Tree indexes:
It uses B-Tree data structure to store its data.
The general idea of all that values are stored in order, and each leaf page is the same
distance from the root.
A B-Tree index speeds up data access because the storage engine doesn’t have to
scan the whole table to find the desired data. Instead, it starts at the root node. The
slots in the root node hold pointers to child nodes, and the storage engine follows
these pointers.
The storage engine finds the right pointer by looking at the values in the node pages,
which define the upper and lower bounds of values of the child nodes.
Eventually, the storage engine either determines that the desired value doesn’t
exists or successfully reaches a leaf page.
Leaf pages are special, because they have pointers to indexed data instead of
pointers to other pages.
Types Of Queries that can use a B-Tree Index: An index on last name, first name and DOB.
Match the full value: Match on full key value specifies values for all columns in the
index.
SELECT * from user_table where last_name=”Kumar”
AND first_name=”Ankit”
and DOB=”2010-12-09”;
This query utilizes all the columns of the index.
Match a leftmost prefix: This index can help us find all the people with last name
Allen. This uses only first column in the index.
SELECT * from user_table where last_name =”Kumar” ;
Match a column prefix: We can match on the first part of the column’s value .We can
match last names which begin with “H”. This only uses the first column of the index.
Match a range of values: This index can help us to find people with last names are
between Kumar and Singh. This only uses the first column of the index.
Match one part exactly and match range on another part: This index can help find
people with last_name is Kumar and first_name starts with letter K. This is an exact
match on last_name and range query on second column of index i.e is the first name.
Index only queries: B-Tree indexes can normally support index-only queries, which
are queries that access only the index , not the row storage.
In general , if B-Tree can help you find a row in a particular way , it can help you sort rows by
the same criteria.
Limitations of B-Tree index:
These are not useful if the lookup does not start from the leftmost prefix of the index
i.e the last_name in this case. If we only search on first_name or DOB , then this
index cannot be used.
We cannot skip columns in the index i.e we cannot query on last_name and DOB
only . Index is not used in this case.
The storage engine cannot optimize accesses with any columns to the right of the
first range condition i.e if query is last_name=”Kumar” and first_name LIKE “J%” And
DOB=”2010-09-09”. The index access will use only the first two columns in the index
because the LIKE is a range condition.
For optimal performance we might need to create indexes with same columns in different
orders to satisfy our queries.
Read about Hash Indexes, Spatial Indexes and Full text indexes from the book.