Unsorted tips for faster systems:
Use persistent connections to the database to avoid the connection overhead. If you can't use persistent connections and you open a lot of new connections to the database, you may want to change the value of the thread_cache_size variable.
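A minimal sketch of the second option, as a my.cnf entry; the value 16 is an illustrative assumption, not a recommendation:

    [mysqld]
    # Keep finished threads around for reuse by new connections.
    thread_cache_size = 16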
Always check that all your queries
really use the indexes you have created in the tables. In MySQL you can do this
with the EXPLAIN command.
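For example (the table and column names here are hypothetical):

    EXPLAIN SELECT * FROM customer WHERE name = 'Smith';

If the type column of the output shows ALL, the query is doing a full table scan instead of using an index.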
Try to avoid complex SELECT queries on tables that are updated a lot; this avoids problems with table locking. The new MyISAM tables can insert rows into a table without deleted rows at the same time other threads are reading from it. If this is important to you, consider methods where you don't have to delete rows, or run OPTIMIZE TABLE after you have deleted a lot of rows.
Use ALTER TABLE ... ORDER BY expr1, expr2, ... if you mostly retrieve rows in expr1, expr2, ... order. By using this option after big changes to the table, you may be able to get higher performance.
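For example (table and column names are hypothetical):

    ALTER TABLE customer ORDER BY last_name, first_name;

Note that the order is not maintained by later inserts and deletes, so this is only worth repeating after large batch changes.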
In some cases it may make sense to introduce a column that is 'hashed' based on information from other columns. If this column is short and reasonably unique, it may be much faster than a big index on many columns. In MySQL it's very easy to use such an extra column:

    SELECT * FROM table_name
    WHERE hash = MD5(CONCAT('constant1', 'constant2'))
      AND col1 = 'constant1' AND col2 = 'constant2';

Note that the hash is computed from the constants, not from the columns, so the index on the hash column can be used for the lookup.

For tables that change a lot you should try to avoid all VARCHAR or BLOB columns. You get dynamic row length as soon as you use a single VARCHAR or BLOB column.
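To make the hashed-column tip above concrete, here is a minimal sketch; the table layout and names are illustrative assumptions:

    CREATE TABLE lookup_table (
      col1 CHAR(100) NOT NULL,
      col2 CHAR(100) NOT NULL,
      hash CHAR(32) NOT NULL,   -- MD5 of col1 and col2; short, so the index stays small
      INDEX (hash)
    );
    -- The application keeps the hash up to date on insert:
    INSERT INTO lookup_table (col1, col2, hash)
    VALUES ('constant1', 'constant2', MD5(CONCAT('constant1', 'constant2')));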
It's not normally useful to split a table into different tables just because the rows get 'big'. To access a row, the biggest performance hit is the disk seek needed to find the first byte of the row. After finding the data, most modern disks can read the whole row fast enough for most applications. The only cases where it really matters to split up a table are when it's a dynamic-row-size table (see above) that you can change to a fixed row size, or when you very often need to scan the table but don't need most of the columns.
If you very often need to calculate things based on information from a lot of rows (like counts of things), it's probably much better to introduce a new table and update the counter in real time. An update of the form UPDATE tbl_name SET count = count + 1 WHERE index_column = constant is very fast! This is really important when you use databases like MySQL that only have table locking (multiple readers / single writer). It will also give better performance with most other databases, as the row-locking manager will have less to do in this case.
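A minimal sketch of such a counter table; the names are hypothetical:

    CREATE TABLE page_counts (
      page_id INT NOT NULL PRIMARY KEY,
      views   INT NOT NULL DEFAULT 0
    );
    -- Incrementing one indexed row is very cheap:
    UPDATE page_counts SET views = views + 1 WHERE page_id = 42;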
If you need to collect statistics from big log tables, use summary tables instead of scanning the whole table. Maintaining the summaries should be much faster than trying to do the statistics 'live'. It's also much faster to regenerate the summary tables from the logs when things change (depending on business decisions) than to have to change the running application! If possible, classify reports as 'live' or 'statistical', and generate the data needed for statistical reports only from summary tables that are built from the actual data.
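A hedged sketch of the summary-table approach; the log table and its columns are assumptions:

    -- Rebuilt as a batch job whenever the summaries need to change:
    CREATE TABLE daily_hits
    SELECT DATE(hit_time) AS day, COUNT(*) AS hits
    FROM access_log
    GROUP BY DATE(hit_time);
    -- Statistical reports then read daily_hits instead of scanning access_log.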
Take advantage of the fact that columns have default values. Insert values explicitly only when the value to be inserted differs from the default. This reduces the parsing that MySQL needs to do and improves the insert speed.

In some cases it's convenient to pack and store data into a BLOB. In this case you have to add some extra code in your application to pack/unpack things in the BLOB, but this may save a lot of accesses at some stage. This is practical when you have data that doesn't conform to a static table structure.
Normally you should try to keep all
data non-redundant (what is called 3rd normal form in database theory), but you
should not be afraid of duplicating things or creating summary tables if you
need these to gain more speed.
Stored procedures or UDFs (user-defined functions) may be a good way to get more performance. In this case you should, however, always have a way to do things some other (slower) way if you use a database that doesn't support them.

You can always gain something by caching queries/answers in your application and by trying to do many inserts/updates at the same time. If your database supports table locks (as MySQL and Oracle do), this should help to ensure that the index cache is only flushed once after all the updates.
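A minimal sketch of batching updates under a table lock in MySQL; the table and columns are hypothetical:

    LOCK TABLES orders WRITE;
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.00);
    INSERT INTO orders (customer_id, amount) VALUES (2, 24.50);
    -- ... many more inserts/updates ...
    UNLOCK TABLES;   -- the index cache is flushed once, here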
Use INSERT /*! DELAYED */ when you do
not need to know when your data is written. This speeds things up because many
records can be written with a single disk write.
Use INSERT /*! LOW_PRIORITY */ when
you want your selects to be more important.
Use SELECT /*! HIGH_PRIORITY */ to get
selects that jump the queue. That is, the select is done even if there is
somebody waiting to do a write.
Use the multi-line INSERT statement to store many rows with one SQL command (many SQL servers support this).
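For example (table and columns are hypothetical):

    INSERT INTO tbl_name (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');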
Use LOAD DATA INFILE to load bigger
amounts of data. This is faster than normal inserts and will be even faster
when myisamchk is integrated in mysqld.
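A sketch; the file path, table, and field terminator are assumptions:

    LOAD DATA INFILE '/tmp/access_log.txt'
    INTO TABLE access_log
    FIELDS TERMINATED BY '\t';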
Use AUTO_INCREMENT columns to make
unique values.
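For example (the table is hypothetical):

    CREATE TABLE customer (
      id   INT NOT NULL AUTO_INCREMENT,
      name CHAR(60) NOT NULL,
      PRIMARY KEY (id)
    );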
Use OPTIMIZE TABLE once in a while to
avoid fragmentation when using dynamic table format.
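For example, after many deletes or updates on a dynamic-row table (the table name is hypothetical):

    OPTIMIZE TABLE access_log;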
Use HEAP tables to get more speed when
possible.
When using a normal Web server setup, images should be stored as files. That is, store only a file reference in the database. The main reason for this is that a normal Web server is much better at caching files than database contents, so it's much easier to get a fast system if you are using files.
Use in-memory tables for non-critical data that is accessed often (like information about the last shown banner for users that don't have cookies).
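Tying this to the HEAP tip above, a sketch of such a table; the layout is an illustrative assumption (newer servers spell TYPE=HEAP as ENGINE=MEMORY):

    CREATE TABLE last_banner (
      user_id   INT NOT NULL PRIMARY KEY,
      banner_id INT NOT NULL
    ) TYPE=HEAP;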
Columns with identical information in
different tables should be declared identical and have identical names. Before
Version 3.23 you got slow joins otherwise. Try to keep the names simple (use
name instead of customer_name in the customer table). To make your names
portable to other SQL servers you should keep them shorter than 18 characters.
If you need REALLY high speed, you should take a look at the low-level interfaces for data storage that the different SQL servers support! For example, by accessing MySQL's MyISAM files directly, you could get a speed increase of 2-5 times compared to using the SQL interface. To be able to do this, the data must be on the same server as the application, and usually it should only be accessed by one process (because external file locking is really slow). These problems could be eliminated by introducing low-level MyISAM commands in the MySQL server (this could be one easy way to get more performance if needed). By carefully designing the database interface, it should be quite easy to support this type of optimization.

In many cases it's faster to access data from a database (using a live connection) than from a text file, just because the database is likely to be more compact than the text file (if you are using numerical data), and this will involve fewer disk accesses. You will also save code because you don't have to parse your text files to find line and column boundaries.

You can also use replication to speed things up.
Declaring a table with DELAY_KEY_WRITE=1 will make the updating of indexes faster, as the index changes are not flushed to disk until the file is closed. The downside is that you should run myisamchk on these tables before you start mysqld, to ensure that they are okay if something killed mysqld in the middle of an update. As the key information can always be generated from the data, you should not lose anything by using DELAY_KEY_WRITE.
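A sketch of the declaration; the table definition is illustrative:

    CREATE TABLE hit_log (
      id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      page_id INT NOT NULL,
      INDEX (page_id)
    ) DELAY_KEY_WRITE=1;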