Friday, April 22, 2011

5 MySQL Best Practices When Designing Schemas

As a follow up to my previous article on Common MySQL Performance Optimizations, I decided to post some of my research findings of MySQL best practices that are commonly suggested by MySQL experts in the community.

Following a set of established good practices can usually prevent common mistakes/errors or performances losses. Admittedly, some MySQL 'best practices' can be elusive, so it is nice to see a few all in one place. Here is a list of practices I try to stick to when creating database schemas:


1) Always specify NOT NULL if possible

For any field that does not need to hold a NULL value (usually most if not all of my fields in a table), specifically define that field as 'NOT NULL'. Each table that does not specify this will actually be one byte larger than it should be. If you had a table with five INTEGER fields, none of which specify NOT NULL, than you are looking at:

5 fields X (4-byte INTEGER size + 1-byte NULL option) == 25 bytes per record.

If you specified NOT NULL:

5 fields X 4-byte INTEGER size == 20 bytes per record.

That's a 20% decrease in record size! This can cause huge performance gains when you're talking potentially hundreds of thousands of records.


2) Specify the correct field types for the data you are storing.

If you have a field that will hold values between 1 and 10, don't specify that field as BIGINT (or even INT, for that matter). Store it in a TINYINT instead.  A Tinyint field can store values between -128 and 127 (0 and 256 if Unsigned), and is only 1-byte in size (as opposed to 4-byte Integer or 8-byte Bigint). If you are storing an IP address, consider using PHP (or your favorite language) to convert it to an integer, and store it in an INT type rather than a VARCHAR. If you know your username's or passwords cannot be longer than 40 characters, don't store them in a CHAR(256). Also, if you have a field that may store different string values such as this:

player_type      VARCHAR(32);   -- can be: 'Admin', 'Standard'

consider instead making player_type an ENUM value.  The nice thing about enum values is that you can still interface with the variable using strings, but the actual values are stored as Ints!

player_type      ENUM('Admin', 'Standard');   -- Yay!

For more information on MySQL data types, check out the official documentation.


3)  Always used UNSIGNED when possible

If you have integer fields in your tables which cannot or should not contain negative values, store them as UNSIGNED.  Not only will this double the amount of values that can be held in the variable, but it also ensures data integrity be not allowing the field to contain erroneous values.


4) Use Indexes to your advantage

Properly defined indexes can be one of the biggest optimizations you can apply to your database.  When used incorrectly however, it can actually decrease performance.  The hardest thing about indexes is knowing when to use them, and when not to use them.  In general I think a good rule of thumb is if you are using a field in a WHERE clause, apply an index to it.  Doing so will help speed up searches, especially when performing joins. As always, be sure to utilize the EXPLAIN and SHOW aspects of MySQL in order to fully optimize your tables based on the queries used in your applications.


5) Use LIMIT 1 when grabbing unique rows from a table

While this does not actually fit into the design stage of a schema, it was still a MySQL best practice that I learned while researching the topic.  Using LIMIT 1 ensures that once your unique row is found, MySQL will stop searching and return the results.  This can potentially provide a small performance boost to your applications. 


There are certainly more established best-practices touted by other MySQL experts in the community, and this is not an exhaustive list by far.  These were simply some of my favorites that I try to stick to whenever I am creating a new schema.  If you know of other best practices that should be followed when using MySQL, let us know about it in the comments!


Tuesday, April 19, 2011

Common MySQL Database Performance Optimizations

While developing the database schema used by Fireteams.net, I realized there were many performance improvements I could make upon my initial design.  After some research, I found that some of the performance boosts I applied were commonly asked questions on the internet. During this research I also learned some new tricks that I thought were very useful. While not a comprehensive list, I thought I would share some common performance optimizations and MySQL best practices that are commonly recommended amongst the MySQL community:



1) Normalize those Tables

It is highly recommended that you start your database designs in (at least) 3rd normal form. If you are unfamiliar with database normalization, the MySQL developer zone has a great introductory article on the subject. In general, normalizing your database involves creating tables such that every non-key element in the table is dependent upon a primary key, and only that key. Doing so increases data integrity by removing large amounts of data redundancy and (in my opinion) makes your database much easier to maintain. Normalization makes for a great starting point into further performance enhancements.



2) Choose the right database Engine.

In MySQL, the choice generally boils down to MyISAM vs. InnoDB.  While a google search here can give you more information than I can about these choices, I will point out some interesting information about the two.  First, is that MyISAM does not support the use of Foreign Keys.  Foreign keys can be very important for data integrity, so anyone wishing to use this functionality will most likely want to use InnoDB as their storage engine.  InnoDB also supports the use of Transactions, another invaluable tool for data integrity (and once again, something MyISAM tables do not support).


Something I learned recently about InnoDB regards its performance when counting rows in a table.  Basically, little metadata is kept about a table. If you wanted to count all rows in a table like this:

SELECT COUNT(*) from Users;

You will always end up iterating through each row in the table. This can cause a huge performance hit if you are running this query often on a table with hundreds of thousands of rows.  MyISAM tables do not have this problem, as the number of rows in the table is kept as metadata and is used directly by MySQL when calculating this value. Work around? Well, the generally accepted way is to create a separate table which stores the row counts. Of course this requires manually keeping a count of the rows, but if you need this data often then it may very well be worth the effort.

Another note about MyISAM is that it is geared towards tables that are heavy on reads, with very little writes. If you have a table with a high number of updates or inserts vs the amount of selects, you will most likely have better performance using InnoDB instead.  For a great reference on the different types of storage engines, check out Nilesh Pawar's blog entry on MySQL Storage Engines. He also has a bunch of other great articles as well.



3) Vertical Partitioning


The concepts of vertical partitioning were new to me. After some poking around however, it turns out that when done correctly it can lead to a substantial performance boost.  The concept is simple: split a large table such that some of its columns exist in one table, and the rest of its columns exist in another table. Initially I thought it seemed rather pointless. You would still need to join the data back together, so wouldn't it simply hurt performance? Not when done correctly... Consider the following example table:

Table User (
      user_id,
      username,
      password,
      email,
      alternate_email,
      date_registered,
      subscribed_to_newsletter
);

The types of these fields are not important (hence them being omitted in this table). What can we note about this table? First is that every time a user logs in, the username, password, and maybe email fields will be touched. The other fields will most likely only be touched in specific portions of the application. When MySQL is searching/reading/writing records, the entire record must be brought into memory. The larger the record (the more fields in the table), the longer is operation can take. In other words, if this table was smaller we might increase the performance of accessing common User fields. If the tables were split "Vertically" like this:


Table User (
      user_id,
      username,
      password,
      email
);

Table User_Profile (
      user_id,
      alternate_email,
      date_registered,
      subscribed_to_newsletter
);


Accessing username and password become much faster, since it no longer has to load each of the other fields into memory just to access them. When date_registered or alternate_email are required, a simple join would re-create the full table. For more information about how vertical partitioning can increase performance, I found this article to be quite helpful.

So, Vertical partitioning. You love it. You want to use it. How do you know which fields to split into another table?  There are many ways you can split a table, and it depends on the goals of your performance boost. Here are some common ways you could think about splitting your tables:


A) Split your read-heavy data from write-heavy data.

When you have to write to a table, on top of access times you also may invalidate the record in the cache, which means slower access times when you want to read it again. By splitting read-heavy from write-heavy data into separate tables, your read-heavy data will stay in the cache, and both sets of data will also benefit from quicker access times (due to each table being smaller).

B) Remove variable-length fields into separate tables.

MySQL makes a distinction between 'static' tables and 'variable-length' tables. Basically, if the table contains any fields which are not fixed in size (such as VARCHAR or TEXT fields) it is treated as variable-length. MySQL treats these tables slightly differently, and can iterate static tables faster than variable-length tables.  If you had a table such as the following:

Table Product (
      id                  INTEGER,
      cost               MEDIUMINT,
      weight          MEDIUMINT,
      description    TEXT
);

You may find that access times will increase if you created a new table called Product_Descriptions, which only held the description field of Products. This example (and Vertical Partitioning in general) will not yield a performance benefit if the majority of your reads to this table also required reading the product description, as now you will also have to perform a join on nearly every query. If you only sometimes need the description, but find yourself reading the cost and weight often, then this type of partitioning may yield a nice performance boost in your application.



4) Horizontal Partitioning


Horizontal partitioning is a type of partitioning supported by MySQL.  It involves logically splitting a table into different physical locations, generally based on some sort of sorting value/function/hash. MySQL supports many different types of functions to use when partitioning a table.  The benefit of horizontal partitioning is that you keep the same logical table, and MySQL will take care of the details behind the scenes.  Each partition has its own index table, so performing queries become much more efficient. As an example, take a look at the following table definition:

CREATE TABLE Log_Entry (
      id             INTEGER UNSIGNED NOT NULL PRIMARY KEY,
      type         TINYINT UNSIGNED NOT NULL
) PARTITION BY RANGE (type)
(
      PARTITION p0 VALUES LESS THAN (10),
      PARTITION p1 VALUES LESS THAN (100),
      PARTITION p2 VALUES LESS THAN MAXVALUE
);

Data will now be placed into one of the three physical partitions based on the value of 'type'.  The performance gain comes into play when performing queries on this table with 'type' in the WHERE clause. The MySQL Optimizer is very good at pruning partitions, so in this scenario the 'type' is in the where clause, MySQL will only search the partitions where the data could be placed in.  Naturally, this pruning prevents much of the data from being scanned, resulting in potentially large performance gains. Partitioning really shines when used on tables used primarily for storage, especially when that data may consist of hundreds of thousands to millions of records.

This sounds like an easy way to get performance gains out of any database, but there IS a catch.  Horizontal partitioning cannot be used on tables which use Foreign Keys, nor can you use Foreign Keys to reference a partitioned table.  That being said, partitioning can still be used very effectively to get large performance boosts, and I would definitely suggest looking into using it as a performance optimization for large tables.

While a quick google search will yield lots of information about horizontal partitioning in MySQL, here is a great success story where partitioning had a positive impact on a large database table.



5) Triggers for data integrity


While not necessarily a performance optimization, using Triggers can be a good practice in regards to ensuring data integrity.  Triggers can be defined as "Before Insert," "After Insert," "Before Delete," etc.. and thus can be used for all sort of great purposes.  Triggers can be used for logging, initializing or validating fields, performing certain operations or logic on other tables, the possibilities are quite endless. For a great in-depth tutorial on Triggers, check out this NetTut's tutorial.



There are many more ways to optimize for better performance, and this post really just scratches the surface as far as what is possible in MySQL.  What do you think of these optimizations?  Do you have a great performance boost that others (like myself) may benefit from knowing? Do share!