In most cases you can estimate the
performance by counting disk seeks. For small tables, you can usually find the
row in 1 disk seek (as the index is probably cached). For bigger tables, you
can estimate that (using B++ tree indexes) you will need: log(row_count) /
log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1
seeks to find a row.
In MySQL an index block is usually
1024 bytes and the data pointer is usually 4 bytes. A 500,000 row table with an
index length of 3 (medium integer) gives you: log(500,000)/log(1024/3*2/(3+4))
+ 1 = 4 seeks.
As the above index would require about
500,000 * 7 * 3/2 = 5.2M, (assuming that the index buffers are filled to 2/3,
which is typical) you will probably have much of the index in memory and you
will probably only need 1-2 calls to read data from the OS to find the row.
For writes, however, you will need 4
seek requests (as above) to find where to place the new index and normally 2
seeks to update the index and write the row.
Note that the above doesn't mean that
your application will slowly degenerate by N log N! As long as everything is
cached by the OS or SQL server things will only go marginally slower while the
table gets bigger. After the data gets too big to be cached, things will start
to go much slower until your applications is only bound by disk-seeks (which
increase by N log N). To avoid this, increase the index cache as the data
grows.
No comments:
Post a Comment