Wednesday, July 15, 2015

MySQL - Portability



Because all SQL servers implement different parts of SQL, it takes work to write portable SQL applications. For very simple selects/inserts it is very easy, but the more you need the harder it gets. If you want an application that is fast with many databases it becomes even harder!

To make a complex application portable you need to choose a number of SQL servers that it should work with.

You can use the MySQL crash-me program/web-page http://www.mysql.com/information/crash-me.php to find functions, types, and limits you can use with a selection of database servers. Crash-me now tests far from everything possible, but it is still comprehensive with about 450 things tested.

For example, you shouldn't have column names longer than 18 characters if you want to be able to use Informix or DB2.

Both the MySQL benchmarks and crash-me programs are very database-independent. By taking a look at how we have handled this, you can get a feeling for what you have to do to write your application database-independent. The benchmarks themselves can be found in the `sql-bench' directory in the MySQL source distribution. They are written in Perl with DBI database interface (which solves the access part of the problem).

See http://www.mysql.com/information/benchmarks.html for the results from this benchmark.

As you can see in these results, all databases have some weak points. That is, they have different design compromises that lead to different behavior.

If you strive for database independence, you need to get a good feeling for each SQL server's bottlenecks. MySQL is VERY fast in retrieving and updating things, but will have a problem in mixing slow readers/writers on the same table. Oracle, on the other hand, has a big problem when you try to access rows that you have recently updated (until they are flushed to disk). Transaction databases in general are not very good at generating summary tables from log tables, as in this case row locking is almost useless.

To get your application really database-independent, you need to define an easy extendable interface through which you manipulate your data. As C++ is available on most systems, it makes sense to use a C++ classes interface to the databases.

If you use some specific feature for some database (like the REPLACE command in MySQL), you should code a method for the other SQL servers to implement the same feature (but slower). With MySQL you can use the /*! */ syntax to add MySQL-specific keywords to a query. The code inside /**/ will be treated as a comment (ignored) by most other SQL servers.

If REAL high performance is more important than exactness, as in some Web applications, a possibility is to create an application layer that caches all results to give you even higher performance. By letting old results 'expire' after a while, you can keep the cache reasonably fresh. This is quite nice in case of extremely high load, in which case you can dynamically increase the cache and set the expire timeout higher until things get back to normal.

In this case the table creation information should contain information of the initial size of the cache and how often the table should normally be refreshed.

What Have We Used MySQL For?
During MySQL initial development, the features of MySQL were made to fit our largest customer. They handle data warehousing for a couple of the biggest retailers in Sweden.

From all stores, we get weekly summaries of all bonus card transactions, and we are expected to provide useful information for the store owners to help them find how their advertisement campaigns are affecting their customers.

The data is quite huge (about 7 million summary transactions per month), and we have data for 4-10 years that we need to present to the users. We got weekly requests from the customers that they want to get 'instant' access to new reports from this data.

We solved this by storing all information per month in compressed 'transaction' tables. We have a set of simple macros (script) that generates summary tables grouped by different criteria (product group, customer id, store ...) from the transaction tables. The reports are Web pages that are dynamically generated by a small Perl script that parses a Web page, executes the SQL statements in it, and inserts the results. We would have used PHP or mod_perl instead but they were not available at that time.

For graphical data we wrote a simple tool in C that can produce GIFs based on the result of a SQL query (with some processing of the result). This is also dynamically executed from the Perl script that parses the HTML files.

In most cases a new report can simply be done by copying an existing script and modifying the SQL query in it. In some cases, we will need to add more fields to an existing summary table or generate a new one, but this is also quite simple, as we keep all transactions tables on disk. (Currently we have at least 50G of transactions tables and 200G of other customer data.)

We also let our customers access the summary tables directly with ODBC so that the advanced users can themselves experiment with the data.

We haven't had any problems handling this with quite modest Sun Ultra SPARCstation (2x200 Mhz). We recently upgraded one of our servers to a 2 CPU 400 Mhz UltraSPARC, and we are now planning to start handling transactions on the product level, which would mean a ten-fold increase of data. We think we can keep up with this by just adding more disk to our systems.

We are also experimenting with Intel-Linux to be able to get more CPU power cheaper. Now that we have the binary portable database format (new in Version 3.23), we will start to use this for some parts of the application.

Our initial feelings are that Linux will perform much better on low-to-medium load and Solaris will perform better when you start to get a high load because of extreme disk IO, but we don't yet have anything conclusive about this. After some discussion with a Linux Kernel developer, this might be a side effect of Linux giving so much resources to the batch job that the interactive performance gets very low. This makes the machine feel very slow and unresponsive while big batches are going. Hopefully this will be better handled in future Linux Kernels.

What is the difference between mysql_fetch_array and mysql_fetch_object?
mysql_fetch_array — Fetch a result row as an associative ARRAY, a numeric array, or both
mysql_fetch_object — Fetch a result row as an OBJECT

What are the different table present in MYsql?
MyISAM: This is default. Based on Indexed Sequntial Access Method. The above SQL will create a MyISA table.
ISAM : same
HEAP : Fast data access, but will loose data if there is a crash. Cannot have BLOB, TEXT & AUTO INCRIMENT fields
BDB : Supports Transactions using COMMIT & ROLLBACK. Slower that others.
InoDB : same as BDB


No comments: