While developing the database schema used by Fireteams.net, I realized there were many performance improvements I could make over my initial design. After some research, I found that several of the optimizations I applied address questions that are commonly asked online, and I also picked up some new tricks that I found very useful. This is not a comprehensive list, but here are some common performance optimizations and MySQL best practices that are widely recommended in the MySQL community:
1) Normalize those Tables
It is highly recommended that you start your database designs in (at least) third normal form (3NF). If you are unfamiliar with database normalization, the MySQL developer zone has a great introductory article on the subject. In general, normalizing your database means designing tables so that every non-key column depends on the primary key, and only on that key. Doing so improves data integrity by removing much of the data redundancy and (in my opinion) makes your database much easier to maintain. Normalization is also a good starting point for further performance enhancements.
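To make the idea concrete, here is a minimal sketch using hypothetical tables (the names are illustrative, not from Fireteams.net). In the first design, the customer's name and email are repeated on every order, so they depend on customer_id rather than on the order's own key; the normalized version stores each customer fact once:

```sql
-- Hypothetical denormalized design: customer details are copied onto every
-- order, so they depend on customer_id rather than on order_id.
CREATE TABLE Order_Denormalized (
    order_id       INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    customer_id    INTEGER UNSIGNED NOT NULL,
    customer_name  VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL,
    order_total    DECIMAL(10, 2) NOT NULL
);

-- Normalized: each customer fact is stored once, keyed by customer_id ...
CREATE TABLE Customer (
    customer_id    INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    customer_name  VARCHAR(100) NOT NULL,
    customer_email VARCHAR(255) NOT NULL
);

-- ... and each order only references its customer.
-- (ORDER is a reserved word in MySQL, hence the longer table name.)
CREATE TABLE Customer_Order (
    order_id    INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    customer_id INTEGER UNSIGNED NOT NULL,
    order_total DECIMAL(10, 2) NOT NULL
);
```

With this layout, changing a customer's email touches one row in Customer instead of every order that customer has placed.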
2) Choose the right storage engine.
In MySQL, the choice generally boils down to MyISAM vs. InnoDB. A Google search will give you more detail about these choices than I can, but I will point out a few interesting differences between the two. First, MyISAM does not support foreign keys. Foreign keys can be very important for data integrity, so anyone who wants this functionality will most likely want to use InnoDB as their storage engine. InnoDB also supports transactions, another invaluable tool for data integrity (and, once again, something MyISAM tables do not support).
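As a rough sketch of both features, the following uses two hypothetical tables (Team and Team_Member are illustrative names). Both are created with InnoDB so that the foreign key is actually enforced, and the two inserts are wrapped in a transaction:

```sql
-- Both tables use InnoDB; for engines such as MyISAM, MySQL parses the
-- FOREIGN KEY clause but does not enforce it.
CREATE TABLE Team (
    team_id INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    name    VARCHAR(50) NOT NULL
) ENGINE = InnoDB;

CREATE TABLE Team_Member (
    team_id INTEGER UNSIGNED NOT NULL,
    user_id INTEGER UNSIGNED NOT NULL,
    PRIMARY KEY (team_id, user_id),
    -- Deleting a team also deletes its membership rows.
    FOREIGN KEY (team_id) REFERENCES Team (team_id) ON DELETE CASCADE
) ENGINE = InnoDB;

-- Either both rows are committed, or (on ROLLBACK or error) neither is.
START TRANSACTION;
INSERT INTO Team (team_id, name) VALUES (1, 'Alpha');
INSERT INTO Team_Member (team_id, user_id) VALUES (1, 42);
COMMIT;
```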
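Something I learned recently about InnoDB concerns its performance when counting the rows in a table. InnoDB does not store an exact row count in the table's metadata, because with multi-version concurrency control, different transactions can see a different number of rows at the same moment. Consider a query that counts all rows in a table: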
SELECT COUNT(*) from Users;
To answer it, InnoDB has to scan the table (or one of its indexes) and count the rows visible to the current transaction. This can be a significant performance hit if you run this query often on a table with hundreds of thousands of rows. MyISAM tables do not have this problem, because they store the number of rows as metadata, and MySQL uses that value directly for a COUNT(*) with no WHERE clause. The workaround? The generally accepted approach is to create a separate table that stores the row counts. Of course, this means keeping the count up to date yourself, but if you need this number often, it may very well be worth the effort.
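One way to maintain such a counter is with triggers. This is a minimal sketch: Table_Row_Count and the trigger names are hypothetical, and the small Users definition is only a stand-in so the example runs on its own:

```sql
-- Stand-in for the Users table from the query above.
CREATE TABLE IF NOT EXISTS Users (
    user_id  INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    username VARCHAR(50) NOT NULL
) ENGINE = InnoDB;

-- Hypothetical counter table: one row per table whose size we track.
CREATE TABLE Table_Row_Count (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  BIGINT UNSIGNED NOT NULL
) ENGINE = InnoDB;

-- Seed the counter once with the current size of Users.
INSERT INTO Table_Row_Count (table_name, row_count)
SELECT 'Users', COUNT(*) FROM Users;

-- Keep the counter in step with every inserted and deleted row.
CREATE TRIGGER users_count_after_insert AFTER INSERT ON Users
FOR EACH ROW
    UPDATE Table_Row_Count SET row_count = row_count + 1 WHERE table_name = 'Users';

CREATE TRIGGER users_count_after_delete AFTER DELETE ON Users
FOR EACH ROW
    UPDATE Table_Row_Count SET row_count = row_count - 1 WHERE table_name = 'Users';

-- Reading the count is now a single primary-key lookup.
SELECT row_count FROM Table_Row_Count WHERE table_name = 'Users';
```

A couple of caveats with this design: every insert or delete updates the same counter row, so under heavy concurrent writes that row can become a point of contention, and TRUNCATE TABLE does not fire DELETE triggers. If an approximate figure is good enough, the TABLE_ROWS column of information_schema.TABLES reports InnoDB's estimate without a scan.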
Another note about MyISAM is that it is geared toward tables that are heavy on reads and light on writes. MyISAM uses table-level locking for writes, while InnoDB uses row-level locking, so if a table receives a high number of updates or inserts relative to selects, you will most likely get better performance from InnoDB. For a great reference on the different storage engines, check out Nilesh Pawar's blog entry on MySQL Storage Engines; he has a bunch of other great articles as well.
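To see which engine your tables use, and to move a write-heavy table over to InnoDB, something along these lines should work. Page_View is a hypothetical table, created here with MyISAM only so there is something to convert:

```sql
-- Hypothetical write-heavy table that was originally created with MyISAM.
CREATE TABLE Page_View (
    view_id   INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    page      VARCHAR(255) NOT NULL,
    viewed_at DATETIME NOT NULL
) ENGINE = MyISAM;

-- List the storage engine of each table in the current database.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- Convert the table to InnoDB. MySQL rebuilds the table to do this,
-- so on a large table expect it to take a while and to block writes.
ALTER TABLE Page_View ENGINE = InnoDB;
```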
3) Vertical Partitioning
Vertical partitioning was new to me. After some poking around, however, it turns out that when done correctly it can lead to a substantial performance boost. The concept is simple: split a large table so that some of its columns live in one table and the rest live in another. Initially this seemed rather pointless to me. You would still need to join the data back together, so wouldn't that simply hurt performance? Not when done correctly. Consider the following example table:
Table User (
user_id,
username,
password,
email,
alternate_email,
date_registered,
subscribed_to_newsletter
);
The types of these fields are not important, which is why they are omitted here. What can we note about this table? First, every time a user logs in, the username, password, and perhaps email fields will be touched. The other fields will most likely be used only in specific parts of the application. When MySQL searches, reads, or writes records, it generally reads the entire row into memory, so the larger the row (the more fields in the table), the longer the operation can take. In other words, if this table were smaller, we might improve the performance of accessing the common User fields. Suppose the table were split "vertically" like this:
Table User (
user_id,
username,
password,
email
);
Table User_Profile (
user_id,
alternate_email,
date_registered,
subscribed_to_newsletter
);
Accessing username and password becomes faster, since MySQL no longer has to load all of the other fields into memory just to read them. When date_registered or alternate_email are needed, a simple join on user_id recreates the full row. For more information about how vertical partitioning can increase performance, I found this article to be quite helpful.
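Here is one way the split might look as real DDL, together with the two kinds of queries. This is a sketch: the column types, the unique key on username, and the sample value 'alice' are assumptions, since the article deliberately leaves the types out:

```sql
-- Narrow table: only the columns needed on every login (types assumed).
CREATE TABLE User (
    user_id  INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    username VARCHAR(50)  NOT NULL,
    password VARCHAR(255) NOT NULL,
    email    VARCHAR(255) NOT NULL,
    UNIQUE KEY (username)  -- supports the login lookup below
) ENGINE = InnoDB;

-- Rarely used columns, one row per user, sharing the same key.
CREATE TABLE User_Profile (
    user_id                  INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    alternate_email          VARCHAR(255) NULL,
    date_registered          DATETIME NOT NULL,
    subscribed_to_newsletter BOOLEAN NOT NULL DEFAULT FALSE,
    FOREIGN KEY (user_id) REFERENCES User (user_id) ON DELETE CASCADE
) ENGINE = InnoDB;

-- The frequent login path only touches the narrow table ...
SELECT user_id, password FROM User WHERE username = 'alice';

-- ... and the occasional full view joins the two halves back together.
SELECT u.user_id, u.username, u.email,
       p.alternate_email, p.date_registered, p.subscribed_to_newsletter
FROM User AS u
JOIN User_Profile AS p ON p.user_id = u.user_id
WHERE u.user_id = 1;
```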
So, vertical partitioning. You love it. You want to use it. How do you know which fields to split into another table? There are many ways to split a table, and the right one depends on the goals of your optimization. Here are some common ways to think about it:
A) Split your read-heavy data from write-heavy data.
When you write to a table, besides the cost of the write itself, you may also invalidate cached data for that table, which makes the next read slower. MySQL's query cache (removed in MySQL 8.0), for example, discards every cached result that references a table whenever that table is modified. By splitting read-heavy data from write-heavy data into separate tables, your read-heavy data is more likely to stay in the cache, and both sets of data benefit from quicker access times because each table is smaller.
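As a hypothetical illustration (Account and Account_Activity are made-up names), the login bookkeeping that changes on every login is moved out of the account row, which is read often but rarely updated:

```sql
-- Read-heavy: looked up constantly, updated rarely.
CREATE TABLE Account (
    account_id INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    username   VARCHAR(50)  NOT NULL,
    email      VARCHAR(255) NOT NULL
) ENGINE = InnoDB;

-- Write-heavy: updated on every login.
CREATE TABLE Account_Activity (
    account_id  INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    last_login  DATETIME NULL,
    login_count INTEGER UNSIGNED NOT NULL DEFAULT 0
) ENGINE = InnoDB;

-- Recording a login only writes to Account_Activity, so cached data for
-- Account (e.g. query-cache entries that reference only Account) is untouched.
INSERT INTO Account_Activity (account_id, last_login, login_count)
VALUES (1, NOW(), 1)
ON DUPLICATE KEY UPDATE last_login = NOW(), login_count = login_count + 1;
```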
B) Move variable-length fields into separate tables.
The MyISAM engine makes a distinction between 'static' (fixed-length) tables and 'dynamic' (variable-length) tables. Basically, if a table contains any fields that are not fixed in size (such as VARCHAR, BLOB, or TEXT fields), it is stored in the dynamic format. MySQL handles the two formats differently, and it can scan static tables faster than variable-length ones. Suppose you had a table such as the following:
CREATE TABLE Product (
id INTEGER NOT NULL PRIMARY KEY,
cost MEDIUMINT,
weight MEDIUMINT,
description TEXT
) ENGINE = MyISAM;
You may find that access times improve if you create a new table called Product_Descriptions that holds only the description field of each product. This example (and vertical partitioning in general) will not yield a performance benefit if most of your reads from this table also need the product description, since you would then have to perform a join on nearly every query. If you need the description only occasionally but read the cost and weight often, this type of partitioning may give your application a nice performance boost.
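A minimal sketch of the split layout follows. The product_id column name and the sample row are assumptions; the rest follows the Product example above. With the TEXT column gone, MyISAM can store Product in its static format:

```sql
-- Product keeps only fixed-width columns, so MyISAM can use the
-- static (fixed-length) row format for it.
CREATE TABLE Product (
    id     INTEGER NOT NULL PRIMARY KEY,
    cost   MEDIUMINT NOT NULL,
    weight MEDIUMINT NOT NULL
) ENGINE = MyISAM;

-- The variable-length column lives in its own table.
CREATE TABLE Product_Descriptions (
    product_id  INTEGER NOT NULL PRIMARY KEY,
    description TEXT
) ENGINE = MyISAM;

-- Frequent path: no join needed.
SELECT cost, weight FROM Product WHERE id = 1;

-- Occasional path: LEFT JOIN in case a product has no description row.
SELECT p.id, p.cost, p.weight, d.description
FROM Product AS p
LEFT JOIN Product_Descriptions AS d ON d.product_id = p.id
WHERE p.id = 1;
```

You can confirm the row format MyISAM chose by checking the ROW_FORMAT column of information_schema.TABLES (or SHOW TABLE STATUS).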
4) Horizontal Partitioning
Horizontal partitioning is a type of partitioning supported natively by MySQL (since version 5.1). It involves splitting one logical table across different physical locations, generally based on a value, function, or hash of some column. MySQL supports several partitioning types, such as RANGE, LIST, HASH, and KEY. The benefit of horizontal partitioning is that you keep the same logical table, and MySQL takes care of the details behind the scenes. Each partition has its own indexes, so queries that only need a few partitions can be much more efficient. As an example, take a look at the following table definition:
CREATE TABLE Log_Entry (
id INTEGER UNSIGNED NOT NULL,
type TINYINT UNSIGNED NOT NULL,
-- Every unique key, including the primary key, must contain the partitioning column.
PRIMARY KEY (id, type)
) PARTITION BY RANGE (type)
(
PARTITION p0 VALUES LESS THAN (10),
PARTITION p1 VALUES LESS THAN (100),
PARTITION p2 VALUES LESS THAN MAXVALUE
);
Data will now be placed into one of the three physical partitions based on the value of 'type'. The performance gain comes into play when you query this table with 'type' in the WHERE clause. The MySQL optimizer is good at pruning partitions: when 'type' appears in the WHERE clause, MySQL only searches the partitions that could contain matching rows. This pruning keeps much of the data from being scanned at all, which can result in large performance gains. Partitioning really shines on tables used primarily for storage, especially when they hold hundreds of thousands to millions of records.
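Partitioning by a function of a column is common for log data. The sketch below uses a hypothetical Event_Log table partitioned by YEAR(created_at), then shows how to check which partitions a query touches and how rows are distributed:

```sql
-- Hypothetical log table partitioned by a function of a column.
-- The partitioning column must be part of the primary key.
CREATE TABLE Event_Log (
    id         INTEGER UNSIGNED NOT NULL,
    created_at DATETIME NOT NULL,
    message    VARCHAR(255) NOT NULL,
    PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (YEAR(created_at)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The partitions column of the EXPLAIN output lists the partitions the
-- optimizer will actually read (MySQL 5.7+; older versions need
-- EXPLAIN PARTITIONS).
EXPLAIN SELECT * FROM Event_Log
WHERE created_at BETWEEN '2010-01-01 00:00:00' AND '2010-12-31 23:59:59';

-- Rows per partition (for InnoDB, TABLE_ROWS is an estimate).
SELECT PARTITION_NAME, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'Event_Log';
```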
This sounds like an easy way to get performance gains out of any database, but there IS a catch. Partitioned tables cannot have foreign keys, and foreign keys cannot reference a partitioned table. In addition, every unique key on a partitioned table (including the primary key) must include all of the columns used in the partitioning expression. That being said, partitioning can still be used very effectively for large performance boosts, and I would definitely suggest looking into it as an optimization for large tables.
A quick Google search will turn up lots of information about horizontal partitioning in MySQL, but here is a great success story in which partitioning had a positive impact on a large database table.
5) Triggers for data integrity
While not necessarily a performance optimization, using triggers can be a good practice for ensuring data integrity. Triggers can be defined to fire before or after an INSERT, UPDATE, or DELETE, so they can serve all sorts of purposes: logging, initializing or validating fields, performing operations or logic on other tables, and much more. For a great in-depth tutorial on triggers, check out this NetTuts tutorial.
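As a small sketch of the "initializing or validating fields" case, here is a BEFORE INSERT trigger on a hypothetical Inventory_Item table. It normalizes a field, fills in a default, and rejects invalid rows with SIGNAL (available since MySQL 5.5):

```sql
-- Hypothetical table used only for this example.
CREATE TABLE Inventory_Item (
    item_id  INTEGER UNSIGNED NOT NULL PRIMARY KEY,
    sku      VARCHAR(32) NOT NULL,
    quantity INTEGER NOT NULL,
    created  DATETIME NULL
) ENGINE = InnoDB;

-- The mysql command-line client needs a temporary delimiter so the
-- semicolons inside BEGIN ... END don't end the CREATE TRIGGER statement.
DELIMITER //
CREATE TRIGGER inventory_item_before_insert
BEFORE INSERT ON Inventory_Item
FOR EACH ROW
BEGIN
    -- Initialize: normalize the SKU and default the creation time.
    SET NEW.sku = UPPER(TRIM(NEW.sku));
    IF NEW.created IS NULL THEN
        SET NEW.created = NOW();
    END IF;
    -- Validate: reject negative stock levels.
    IF NEW.quantity < 0 THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'quantity must not be negative';
    END IF;
END//
DELIMITER ;
```

This only covers inserts; a matching BEFORE UPDATE trigger would be needed to enforce the same rules when rows are modified.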
There are many more ways to optimize for better performance, and this post really just scratches the surface of what is possible in MySQL. What do you think of these optimizations? Do you have a great performance boost that others (like me) may benefit from knowing? Do share!