In general, when you want to make a
slow SELECT ... WHERE faster, the first thing to check is whether or not you
can add an index.
All references between different
tables should usually be done with indexes.
You can use the EXPLAIN command to
determine which indexes are used for a SELECT.
Some general tips:
To help MySQL optimize queries better,
run myisamchk --analyze on a table after it has been loaded with relevant data.
This updates a value for each index part that indicates the average number of
rows that have the same value. (For unique indexes, this is always 1, of
course.). MySQL will use this to decide which index to choose when you connect
two tables with 'a non-constant expression'. You can check the result from the
analyze run by doing SHOW INDEX FROM table_name and examining the Cardinality
column.
To sort an index and data according to
an index, use myisamchk --sort-index --sort-records=1 (if you want to sort on
index 1). If you have a unique index from which you want to read all records in
order according to that index, this is a good way to make that faster. Note,
however, that this sorting isn't written optimally and will take a long time
for a large table!
How MySQL Optimizes WHERE Clauses ?
The WHERE optimizations are put in the
SELECT part here because they are mostly used with SELECT, but the same
optimizations apply for WHERE in DELETE and UPDATE statements.
Also note that this section is
incomplete. MySQL does many optimizations, and we have not had time to document
them all.
Some of the optimizations performed by
MySQL are listed below:
Removal of unnecessary parentheses:
((a AND b) AND c OR (((a AND b) AND (c
AND d))))
-> (a AND b AND c) OR (a AND b AND
c AND d)
Constant folding:
(a
-> b>5 AND b=c AND a=5
Constant condition removal (needed
because of constant folding):
(B>=5 AND B=5) OR (B=6 AND 5=50) OR
(B=7 AND 5=6)
-> B=5 OR B=6
Constant expressions used by indexes
are evaluated only once.
COUNT(*) on a single table without a
WHERE is retrieved directly from the table information. This is also done for
any NOT NULL expression when used with only one table.
Early detection of invalid constant
expressions. MySQL quickly detects that some SELECT statements are impossible
and returns no rows.
HAVING is merged with WHERE if you
don't use GROUP BY or group functions (COUNT(), MIN()...).
For each sub-join, a simpler WHERE is
constructed to get a fast WHERE evaluation for each sub-join and also to skip
records as soon as possible.
All constant tables are read first,
before any other tables in the query. A constant table is:
An empty table or a table with 1 row.
A table that is used with a WHERE
clause on a UNIQUE index, or a PRIMARY KEY, where all index parts are used with
constant expressions and the index parts are defined as NOT NULL.
All the following tables are used as
constant tables:
mysql> SELECT * FROM t WHERE
primary_key=1;
mysql> SELECT * FROM t1,t2
WHERE t1.primary_key=1 AND
t2.primary_key=t1.id;
The best join combination to join the
tables is found by trying all possibilities. If all columns in ORDER BY and in
GROUP BY come from the same table, then this table is preferred first when
joining.
If there is an ORDER BY clause and a
different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from
tables other than the first table in the join queue, a temporary table is
created.
If you use SQL_SMALL_RESULT, MySQL
will use an in-memory temporary table.
Each table index is queried, and the
best index that spans fewer than 30% of the rows is used. If no such index can
be found, a quick table scan is used.
In some cases, MySQL can read rows
from the index without even consulting the data file. If all columns used from
the index are numeric, then only the index tree is used to resolve the query.
Before each record is output, those
that do not match the HAVING clause are skipped.
Some examples of queries that are very
fast:
mysql> SELECT COUNT(*) FROM
tbl_name;
mysql> SELECT
MIN(key_part1),MAX(key_part1) FROM tbl_name;
mysql> SELECT MAX(key_part2) FROM
tbl_name
WHERE key_part_1=constant;
mysql> SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,... LIMIT
10;
mysql> SELECT ... FROM tbl_name
ORDER BY key_part1 DESC,key_part2
DESC,... LIMIT 10;
The following queries are resolved
using only the index tree (assuming the indexed columns are numeric):
mysql> SELECT key_part1,key_part2
FROM tbl_name WHERE key_part1=val;
mysql> SELECT COUNT(*) FROM tbl_name
WHERE key_part1=val1 AND
key_part2=val2;
mysql> SELECT key_part2 FROM
tbl_name GROUP BY key_part1;
The following queries use indexing to
retrieve the rows in sorted order without a separate sorting pass:
mysql> SELECT ... FROM tbl_name
ORDER BY key_part1,key_part2,..
. mysql> SELECT ... FROM tbl_name
ORDER BY key_part1 DESC,key_part2 DESC,...
How MySQL Optimizes DISTINCT ?
DISTINCT is converted to a GROUP BY on
all columns, DISTINCT combined with ORDER BY will in many cases also need a
temporary table.
When combining LIMIT # with DISTINCT,
MySQL will stop as soon as it finds # unique rows.
If you don't use columns from all used
tables, MySQL will stop the scanning of the not used tables as soon as it has
found the first match.
SELECT DISTINCT t1.a FROM t1,t2 where
t1.a=t2.a;
In the case, assuming t1 is used
before t2 (check with EXPLAIN), then MySQL will stop reading from t2 (for that
particular row in t1) when the first row in t2 is found.
How MySQL Optimizes LEFT JOIN and
RIGHT JOIN ?
A LEFT JOIN B in MySQL is implemented
as follows:
The table B is set to be dependent on
table A and all tables that A is dependent on.
The table A is set to be dependent on
all tables (except B) that are used in the LEFT JOIN condition.
All LEFT JOIN conditions are moved to
the WHERE clause.
All standard join optimizations are
done, with the exception that a table is always read after all tables it is
dependent on. If there is a circular dependence then MySQL will issue an error.
All standard WHERE optimizations are
done.
If there is a row in A that matches
the WHERE clause, but there wasn't any row in B that matched the LEFT JOIN
condition, then an extra B row is generated with all columns set to NULL.
If you use LEFT JOIN to find rows that
don't exist in some table and you have the following test: column_name IS NULL
in the WHERE part, where column_name is a column that is declared as NOT NULL,
then MySQL will stop searching after more rows (for a particular key
combination) after it has found one row that matches the LEFT JOIN condition.
RIGHT JOIN is implemented analogously
as LEFT JOIN.
The table read order forced by LEFT
JOIN and STRAIGHT JOIN will help the join optimizer (which calculates in which
order tables should be joined) to do its work much more quickly, as there are
fewer table permutations to check.
Note that the above means that if you
do a query of type:
SELECT * FROM a,b LEFT JOIN c ON
(c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key
MySQL will do a full scan on b as the
LEFT JOIN will force it to be read before d.
The fix in this case is to change the
query to:
SELECT * FROM b,a LEFT JOIN c ON
(c.key=a.key) LEFT JOIN d (d.key=a.key) WHERE b.key=d.key
How MySQL Optimizes LIMIT ?
In some cases MySQL will handle the
query differently when you are using LIMIT # and not using HAVING:
If you are selecting only a few rows
with LIMIT, MySQL will use indexes in some cases when it normally would prefer
to do a full table scan.
If you use LIMIT # with ORDER BY,
MySQL will end the sorting as soon as it has found the first # lines instead of
sorting the whole table.
When combining LIMIT # with DISTINCT,
MySQL will stop as soon as it finds # unique rows.
In some cases a GROUP BY can be
resolved by reading the key in order (or do a sort on the key) and then
calculate summaries until the key value changes. In this case LIMIT # will not
calculate any unnecessary GROUP BY's.
As soon as MySQL has sent the first #
rows to the client, it will abort the query.
LIMIT 0 will always quickly return an
empty set. This is useful to check the query and to get the column types of the
result columns.
The size of temporary tables uses the
LIMIT # to calculate how much space is needed to resolve the query.
No comments:
Post a Comment