Wednesday, July 15, 2015

MySQL - Design Choices



MySQL keeps row data and index data in separate files. Many (almost all) other databases mix row and index data in the same file. We believe that the MySQL choice is better for a very wide range of modern systems.

Another way to store the row data is to keep the information for each column in a separate area (examples are SDBM and Focus). This will cause a performance hit for every query that accesses more than one column. Because this degenerates so quickly when more than one column is accessed, we believe that this model is not good for general purpose databases.

The more common case is that the index and data are stored together (like in Oracle/Sybase et al). In this case you will find the row information at the leaf page of the index. The good thing with this layout is that it, in many cases, depending on how well the index is cached, saves a disk read. The bad things with this layout are:

Table scanning is much slower because you have to read through the indexes to get at the data.
You can't use only the index table to retrieve data for a query.
You lose a lot of space, as you must duplicate indexes from the nodes (as you can't store the row in the nodes).
Deletes will degenerate the table over time (as indexes in nodes are usually not updated on delete).
It's harder to cache ONLY the index data.

MySQL Design Limitations/Tradeoffs
Because MySQL uses extremely fast table locking (multiple readers / single writers) the biggest remaining problem is a mix of a steady stream of inserts and slow selects on the same table.

We believe that for a huge number of systems the extremely fast performance in other cases make this choice a win. This case is usually also possible to solve by having multiple copies of the table, but it takes more effort and hardware.

We are also working on some extensions to solve this problem for some common application niches.

No comments: