Sunday, June 19, 2011

MySQL: Comparing NULL values returns an empty set

It has been a little while since I touched some MySQL. Long enough that I forgot about an issue with how MySQL treats NULL values.  What's the issue? You cannot use the typical SQL comparison operators when testing for a NULL value.  It even says so in the MySQL documentation:

http://dev.mysql.com/doc/refman/5.0/en/problems-with-null.html

It states two facts:
  * "In SQL, the NULL value is never true in comparison to any other value, even NULL."
  * "To look for NULL values, you must use the IS NULL test."


In other words, MySQL provides special operators for dealing with NULL values.  So if you were trying to perform a simple query like this:

SELECT id FROM users WHERE email = NULL

you would not get a syntax error. Instead, the query just returns an empty result. Thanks, MySQL. (I sadly spent a good couple of hours tracking down this exact problem.)  Performing the query like this instead:

SELECT id FROM users WHERE email IS NULL

will yield the expected results.  There is also IS NOT NULL, and a function IFNULL(), to aid in your NULL comparisons.  The documentation is a pretty good read, as MySQL handles NULL quite differently than other languages.
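To make this concrete, here are a few quick examples against the users table from above (all of these operators and functions are part of MySQL):

```sql
-- Finds rows where email is missing
SELECT id FROM users WHERE email IS NULL;

-- Finds rows where email is present
SELECT id FROM users WHERE email IS NOT NULL;

-- IFNULL() substitutes a fallback value when the column is NULL
SELECT id, IFNULL(email, 'none') AS email FROM users;

-- MySQL also has a NULL-safe equality operator, <=>,
-- for which NULL <=> NULL evaluates to true (1)
SELECT id FROM users WHERE email <=> NULL;
```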

Hopefully this will help save some innocent keyboards from being destroyed out of frustration!



Friday, April 22, 2011

5 MySQL Best Practices When Designing Schemas

As a follow up to my previous article on Common MySQL Performance Optimizations, I decided to post some of my research findings on MySQL best practices that are commonly suggested by MySQL experts in the community.

Following a set of established good practices can usually prevent common mistakes, errors, or performance losses. Admittedly, some MySQL 'best practices' can be elusive, so it is nice to see a few all in one place. Here is a list of practices I try to stick to when creating database schemas:


1) Always specify NOT NULL if possible

For any field that does not need to hold a NULL value (usually most, if not all, of the fields in my tables), explicitly define that field as 'NOT NULL'. Each nullable field adds an extra byte to every record. If you had a table with five INTEGER fields, none of which specify NOT NULL, then you are looking at:

5 fields X (4-byte INTEGER size + 1-byte NULL option) == 25 bytes per record.

If you specified NOT NULL:

5 fields X 4-byte INTEGER size == 20 bytes per record.

That's a 20% decrease in record size! This can cause huge performance gains when you're talking potentially hundreds of thousands of records.
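As a small sketch of this rule (a hypothetical table; the column names are made up), declaring every field NOT NULL looks like this:

```sql
CREATE TABLE scores (
    player_id INT NOT NULL,   -- each NOT NULL avoids the 1-byte NULL flag
    level     INT NOT NULL,
    kills     INT NOT NULL,
    deaths    INT NOT NULL,
    assists   INT NOT NULL    -- 5 fields x 4 bytes = 20 bytes per record
);
```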


2) Specify the correct field types for the data you are storing.

If you have a field that will hold values between 1 and 10, don't specify that field as BIGINT (or even INT, for that matter). Store it in a TINYINT instead.  A TINYINT field can store values between -128 and 127 (0 and 255 if UNSIGNED), and is only 1 byte in size (as opposed to a 4-byte INT or 8-byte BIGINT). If you are storing an IP address, consider using PHP (or your favorite language) to convert it to an integer, and store it in an INT type rather than a VARCHAR. If you know your usernames or passwords cannot be longer than 40 characters, don't store them in a CHAR(255). Also, if you have a field that may store a small fixed set of string values such as this:

player_type      VARCHAR(32);   -- can be: 'Admin', 'Standard'

consider instead making player_type an ENUM.  The nice thing about ENUM values is that you can still interface with the field using strings, but the actual values are stored internally as integers!

player_type      ENUM('Admin', 'Standard');   -- Yay!

For more information on MySQL data types, check out the official documentation.
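Pulling these tips together, a hypothetical table might look like the sketch below. Note that MySQL even ships with INET_ATON() and INET_NTOA(), so you don't strictly need PHP for the IP conversion:

```sql
CREATE TABLE players (
    rating      TINYINT UNSIGNED NOT NULL,   -- values 1-10 fit in 1 byte
    ip_address  INT UNSIGNED NOT NULL,       -- IPv4 address as an integer
    username    VARCHAR(40) NOT NULL,
    player_type ENUM('Admin', 'Standard') NOT NULL
);

INSERT INTO players (rating, ip_address, username, player_type)
VALUES (7, INET_ATON('192.168.0.1'), 'alice', 'Admin');

-- Convert back to dotted-quad form when reading
SELECT username, INET_NTOA(ip_address) FROM players;
```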


3)  Always use UNSIGNED when possible

If you have integer fields in your tables which cannot or should not contain negative values, store them as UNSIGNED.  Not only will this double the range of positive values the field can hold, but it also helps ensure data integrity by not allowing the field to contain erroneous negative values.


4) Use Indexes to your advantage

Properly defined indexes can be one of the biggest optimizations you can apply to your database.  When used incorrectly, however, they can actually decrease performance.  The hardest thing about indexes is knowing when to use them, and when not to.  In general, I think a good rule of thumb is: if you are using a field in a WHERE clause, apply an index to it.  Doing so will help speed up searches, especially when performing joins. As always, be sure to utilize MySQL's EXPLAIN and SHOW statements in order to fully optimize your tables based on the queries used in your applications.
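For instance (the table and column names are illustrative), if email shows up in your WHERE clauses, an index plus a quick EXPLAIN check might look like:

```sql
-- Index the field used in WHERE clauses
CREATE INDEX idx_users_email ON users (email);

-- EXPLAIN shows whether MySQL actually uses the index
EXPLAIN SELECT id FROM users WHERE email = 'alice@example.com';
```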


5) Use LIMIT 1 when grabbing unique rows from a table

While this does not actually fit into the design stage of a schema, it is still a MySQL best practice that I learned while researching the topic.  Using LIMIT 1 ensures that once your unique row is found, MySQL will stop searching and return the result.  This can provide a small performance boost to your applications.
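A minimal sketch, assuming username holds unique values:

```sql
-- MySQL stops scanning as soon as the single matching row is found
SELECT id FROM users WHERE username = 'alice' LIMIT 1;
```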


There are certainly more established best-practices touted by other MySQL experts in the community, and this is not an exhaustive list by far.  These were simply some of my favorites that I try to stick to whenever I am creating a new schema.  If you know of other best practices that should be followed when using MySQL, let us know about it in the comments!


Tuesday, April 19, 2011

Common MySQL Database Performance Optimizations

While developing the database schema used by Fireteams.net, I realized there were many performance improvements I could make upon my initial design.  After some research, I found that some of the performance boosts I applied were commonly asked questions on the internet. During this research I also learned some new tricks that I thought were very useful. While not a comprehensive list, I thought I would share some common performance optimizations and MySQL best practices that are commonly recommended amongst the MySQL community:



1) Normalize those Tables

It is highly recommended that you start your database designs in (at least) 3rd normal form. If you are unfamiliar with database normalization, the MySQL developer zone has a great introductory article on the subject. In general, normalizing your database involves creating tables such that every non-key element in the table is dependent upon a primary key, and only that key. Doing so increases data integrity by removing large amounts of data redundancy and (in my opinion) makes your database much easier to maintain. Normalization makes for a great starting point into further performance enhancements.



2) Choose the right database Engine.

In MySQL, the choice generally boils down to MyISAM vs. InnoDB.  While a Google search can give you more information than I can about these choices, I will point out some interesting differences between the two.  First is that MyISAM does not support the use of foreign keys.  Foreign keys can be very important for data integrity, so anyone wishing to use this functionality will most likely want to use InnoDB as their storage engine.  InnoDB also supports transactions, another invaluable tool for data integrity (and, once again, something MyISAM tables do not support).


Something I learned recently about InnoDB regards its performance when counting rows in a table.  Basically, InnoDB keeps little metadata about its tables. If you wanted to count all rows in a table like this:

SELECT COUNT(*) from Users;

You will always end up iterating through each row in the table. This can cause a huge performance hit if you run this query often on a table with hundreds of thousands of rows.  MyISAM tables do not have this problem, as the number of rows in the table is kept as metadata and used directly by MySQL when calculating this value. The workaround? The generally accepted way is to create a separate table which stores the row counts. Of course, this requires manually keeping a count of the rows, but if you need this figure often then it may very well be worth the effort.
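A sketch of that counter-table approach (the names are hypothetical, and the counts must be maintained by your application or by triggers):

```sql
CREATE TABLE row_counts (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  INT UNSIGNED NOT NULL DEFAULT 0
);

-- Bump the counter whenever you insert into Users
UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'Users';

-- "Counting" Users is now a single-row lookup instead of a full scan
SELECT row_count FROM row_counts WHERE table_name = 'Users';
```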

Another note about MyISAM is that it is geared towards tables that are read-heavy with very few writes. If you have a table with a high number of updates or inserts versus selects, you will most likely see better performance using InnoDB instead.  For a great reference on the different types of storage engines, check out Nilesh Pawar's blog entry on MySQL Storage Engines. He has a bunch of other great articles as well.



3) Vertical Partitioning


The concept of vertical partitioning was new to me. After some poking around, however, it turns out that when done correctly it can lead to a substantial performance boost.  The concept is simple: split a large table such that some of its columns exist in one table, and the rest exist in another table. Initially this seemed rather pointless to me. You would still need to join the data back together, so wouldn't it simply hurt performance? Not when done correctly... Consider the following example table:

Table User (
      user_id,
      username,
      password,
      email,
      alternate_email,
      date_registered,
      subscribed_to_newsletter
);

The types of these fields are not important (hence their omission in this table). What can we note about this table? First, every time a user logs in, the username, password, and maybe email fields will be touched. The other fields will most likely only be touched in specific portions of the application. When MySQL is searching, reading, or writing records, the entire record must be brought into memory. The larger the record (the more fields in the table), the longer this operation can take. In other words, if this table were smaller, we might increase the performance of accessing common User fields. If the table were split "vertically" like this:


Table User (
      user_id,
      username,
      password,
      email
);

Table User_Profile (
      user_id,
      alternate_email,
      date_registered,
      subscribed_to_newsletter
);


Accessing username and password becomes much faster, since MySQL no longer has to load each of the other fields into memory just to reach them. When date_registered or alternate_email is required, a simple join re-creates the full record. For more information about how vertical partitioning can increase performance, I found this article to be quite helpful.
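For example, a sketch of the join that stitches the two tables back together when the profile fields are needed:

```sql
SELECT u.username, u.email, p.date_registered, p.alternate_email
FROM User u
JOIN User_Profile p ON p.user_id = u.user_id
WHERE u.user_id = 42;   -- 42 is just an example id
```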

So, Vertical partitioning. You love it. You want to use it. How do you know which fields to split into another table?  There are many ways you can split a table, and it depends on the goals of your performance boost. Here are some common ways you could think about splitting your tables:


A) Split your read-heavy data from write-heavy data.

When you write to a table, on top of the access time you may also invalidate the record in the cache, which means slower access when you want to read it again. By splitting read-heavy data from write-heavy data into separate tables, your read-heavy data will stay in the cache, and both sets of data will benefit from quicker access times (due to each table being smaller).

B) Move variable-length fields into separate tables.

MySQL makes a distinction between 'static' (fixed-length) tables and 'variable-length' tables. Basically, if the table contains any fields which are not fixed in size (such as VARCHAR or TEXT fields), it is treated as variable-length. MySQL treats these tables slightly differently, and can scan static tables faster than variable-length ones.  If you had a table such as the following:

Table Product (
      id           INTEGER,
      cost         MEDIUMINT,
      weight       MEDIUMINT,
      description  TEXT
);

You may find that access times improve if you create a new table called Product_Descriptions, which holds only the description field of each Product. This example (and vertical partitioning in general) will not yield a performance benefit if the majority of your reads to this table also require the product description, as you will then have to perform a join on nearly every query. If you only sometimes need the description, but find yourself reading the cost and weight often, then this type of partitioning may yield a nice performance boost in your application.
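A sketch of that split (Product_Descriptions is the hypothetical table named above):

```sql
-- All columns fixed in size, so the table stays 'static'
CREATE TABLE Product (
    id      INTEGER NOT NULL PRIMARY KEY,
    cost    MEDIUMINT NOT NULL,
    weight  MEDIUMINT NOT NULL
);

-- The variable-length field lives in its own table
CREATE TABLE Product_Descriptions (
    product_id  INTEGER NOT NULL PRIMARY KEY,
    description TEXT NOT NULL
);
```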



4) Horizontal Partitioning


Horizontal partitioning is a type of partitioning supported natively by MySQL.  It involves physically splitting a table into different locations, generally based on some sort of sorting value, function, or hash. MySQL supports many different partitioning functions.  The benefit of horizontal partitioning is that you keep the same logical table, and MySQL takes care of the details behind the scenes.  Each partition has its own index, so performing queries becomes much more efficient. As an example, take a look at the following table definition:

CREATE TABLE Log_Entry (
      id     INTEGER UNSIGNED NOT NULL,
      type   TINYINT UNSIGNED NOT NULL,
      PRIMARY KEY (id, type)  -- MySQL requires the partitioning column in every unique key
) PARTITION BY RANGE (type)
(
      PARTITION p0 VALUES LESS THAN (10),
      PARTITION p1 VALUES LESS THAN (100),
      PARTITION p2 VALUES LESS THAN MAXVALUE
);

Data will now be placed into one of the three physical partitions based on the value of 'type'.  The performance gain comes into play when performing queries on this table with 'type' in the WHERE clause. The MySQL optimizer is very good at pruning partitions, so when 'type' appears in the WHERE clause, MySQL will search only the partitions in which the matching data could reside.  Naturally, this pruning prevents much of the data from being scanned, resulting in potentially large performance gains. Partitioning really shines on tables used primarily for storage, especially when the data may run from hundreds of thousands to millions of records.
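As a quick sketch against the Log_Entry table above:

```sql
-- Only partition p0 (type < 10) needs to be scanned
SELECT id FROM Log_Entry WHERE type = 5;

-- EXPLAIN PARTITIONS reports which partitions the optimizer considered
EXPLAIN PARTITIONS SELECT id FROM Log_Entry WHERE type = 5;
```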

This sounds like an easy way to get performance gains out of any database, but there IS a catch.  Horizontal partitioning cannot be used on tables which use Foreign Keys, nor can you use Foreign Keys to reference a partitioned table.  That being said, partitioning can still be used very effectively to get large performance boosts, and I would definitely suggest looking into using it as a performance optimization for large tables.

While a quick google search will yield lots of information about horizontal partitioning in MySQL, here is a great success story where partitioning had a positive impact on a large database table.



5) Triggers for data integrity


While not necessarily a performance optimization, using triggers can be a good practice for ensuring data integrity.  Triggers can be defined as "Before Insert," "After Insert," "Before Delete," etc., and thus can be used for all sorts of purposes.  Triggers can be used for logging, initializing or validating fields, or performing certain operations or logic on other tables; the possibilities are nearly endless. For a great in-depth tutorial on triggers, check out this NetTuts tutorial.
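A minimal "After Insert" logging trigger might be sketched like this (the table and column names are hypothetical):

```sql
DELIMITER //

CREATE TRIGGER log_new_user
AFTER INSERT ON users
FOR EACH ROW
BEGIN
    -- NEW refers to the row that was just inserted
    INSERT INTO user_log (user_id, action, logged_at)
    VALUES (NEW.id, 'created', NOW());
END//

DELIMITER ;
```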



There are many more ways to optimize for better performance, and this post really just scratches the surface as far as what is possible in MySQL.  What do you think of these optimizations?  Do you have a great performance boost that others (like myself) may benefit from knowing? Do share!


Sunday, March 27, 2011

Initial thoughts on Yii..

My initial thoughts on the Yii PHP framework can be summed up in one word:  Hell yeah!  Okay, it was more like two words, but it doesn't change the fact that I am very impressed so far with Yii.  Admittedly I have not done any extensive work with the framework as of yet.  The majority of my experience with Yii to date has been reading the (very good) documentation, playing around with its code generation utilities, and looking at how the demo applications were built.

So far, the aspect of the framework I am most impressed with is how it integrates with jQuery.  CakePHP injects JavaScript code throughout the page, a practice that most jQuery developers cringe at.  This is also why most Cake developers have abandoned the JS helpers in favor of writing their own code.  This is why I like Yii's jQuery integration so much: for the most part it separates the jQuery code into separate files and includes them at the top of the page.  This makes me happy.  It will also reduce the amount of time I spend creating my own custom jQuery scripts, since I will actually be using the helpful functions that are included with Yii.

Another aspect of Yii that I am impressed with is its documentation.  Cake has some good documentation, but often key elements are outdated or missing. What is also frustrating is the lack of documented best practices for the framework. I often found my answers in blog posts or forums, and wondered why such simple answers weren't included in the documentation to begin with.  This is not the case with Yii.  The documentation itself is very thorough, updated constantly, and very clearly defines best practices. What's more, the generated code, as well as the demos it ships with, shows the majority of Yii's functionality in use.  This is great because if I want to see how to create a form in Yii, I can either consult the documentation or pull up any of the demos that came with the framework. Handy!

Finally, the code generation utility shipped with Yii is excellent.  In just a matter of minutes, Yii can generate a fully functioning skeleton application complete with a login/logout system, contact form, CRUD, database access, etc., all integrated with jQuery.  This has been a great skeleton to start with, and has saved me loads of setup time integrating basic components into the application.  (I can't say the same for the skeleton application Cake generates.)

When I began playing around with Yii, I fully expected to find parts of the framework where I would sit back and reminisce of my days in the Cake world.  To my surprise, the more I learn about Yii the happier I am to embrace it.  Nothing against Cake, but I would recommend Yii to any developer looking for a powerful MVC PHP framework.

Sunday, March 20, 2011

Switching PHP Framework from CakePHP to Yii

After about a year of developing with CakePHP, I have decided to make the switch to Yii.  There are several reasons I decided to make this switch, but the biggest one was performance related.  Don't get me wrong, I think CakePHP is a great development framework.  There were simply enough things I personally did not like about CakePHP to justify learning a new MVC framework.  Here are a few of the reasons I decided to make the switch:

1) Yii is PHP 5 only.

      I really like PHP 5.  While CakePHP supports PHP 5, it also has to support PHP 4, and thus does some things under the hood that hinder its performance a little in order to support legacy PHP code.

2) Yii supports composite primary keys.

      One problem I had with CakePHP was that it does not support composite primary keys.  While this wasn't a huge deal breaker for me in Cake, it was definitely annoying.  Instead, every table had to have a unique ID as the sole primary key, and your "composite" keys basically had to be maintained through your own code.  Not terrible, but still rather annoying.  One thing I really did not like was how CakePHP thought a "Has One" relationship should work in a database.  Take the example of Apple has one Banana.  Every textbook will tell you that, to be in a properly normalized state, the Apple table should contain a field called 'banana_id', which holds the ID of the corresponding Banana row.  This ensures that an Apple can reference only one Banana, which is easily verified since Apple.banana_id holds one and only one value at a time.  The CakePHP hasOne relationship documentation clearly states quite the opposite.  Does this produce a hasOne relationship?  Well, when you access models via Cake's API, sure, it will hand you back one row, but your database is not in a normalized form.  There is technically nothing stopping you from firing up phpMyAdmin and adding a few rows with the same Banana.apple_id.  This also means that if you are not careful with your queries, or the way you access your data in Cake, you may end up with data redundancy.  This is why putting your databases in a normalized form is so important. There are ways around this in Cake, but they require quite a bit of extra work, which of course defeats the purpose of its advertised "magic" in the first place.

3) CakePHP is too slow for my needs.

      If you search Google for "CakePHP" and "slow," you'll see many, many posts from developers wondering why their application is responding so slowly.  You'll also notice that there are no real good answers, either.  Is CakePHP really that slow?

The quick answer:  No actually, it isn't slow.

Let me explain myself.  CakePHP can be fast.  By default, however, it is very, very slow.  When you develop your application you will need to keep caching and performance in mind, something most developers new to PHP frameworks don't do.  The problem is that in order to develop a fast CakePHP application, you cannot use the majority of the magic that Cake provides.  For example, Cake's documentation and tutorials will lead you to use mod_rewrite for pretty URLs (using .htaccess) and $this->link() to create formatted HTML links quickly and easily.  This creates a lot of easily maintainable functionality fast, but your performance will suffer greatly down the road.  Even an application relying heavily on XCache and Memcached still needs lots of customized work tailored specifically to your application in order to speed it up.  While this will be true of an application developed in Yii as well, Yii's default performance is significantly greater than Cake's.  So while people complaining that Cake is slow are not necessarily right, you will have to work hard to get it speedy.

This post isn't meant to bash Cake.  I really do like the framework.  As I have gotten more used to developing in the MVC architecture, I realize that Cake provided a great learning experience for me and served as a great stepping stone toward creating an MVC application.  I think that Yii will be the next step forward for my project, and future posts in this blog will highlight my experiences using Yii.

Are you currently using CakePHP?  Did you end up switching to Yii, or another framework?  I'd love to hear your thoughts!

Monday, March 14, 2011

Creating MARPAT digital camouflage textures with Photoshop

Fireteams.net was in need of a camouflage texture. After a couple of hours scouring the web for a decent MARPAT (MARine PATtern) image, I gave up and decided to make my own.  I put together this little Photoshop tutorial showing the techniques I used to create a digital desert camouflage texture.

Just a warning: I'm a programmer, not an artist.  Also, my Photoshop abilities are the result of following tutorials on the internet. If you know of better or easier ways to accomplish the same thing, I would love to hear about them in the comments!

Here is the final MARPAT camouflage image we'll be creating:

MARPAT Digital Desert Camouflage

We'll start our tutorial by taking a quick look at real desert MARPAT camouflage.  This will give us a better idea of the look we want to achieve in our final image.

Sample of real desert MARPAT camouflage.
Courtesy of http://www.helloelastic2.co.uk/justlooking/

It's easy to see that there are some unique characteristics that define MARPAT:

Color
  • For Desert MARPAT, four colors are used in the image.
  • Colors are layered with lighter colors underneath darker ones.
  • Blobs become smaller and less connected as they get darker.
  • In digital desert camouflage, each color is a different shade of brown.

Shape
  • Every blob-like shape in this image is composed of small square shapes. 
  • Blobs tend to look stretched, flowing in more of a horizontal direction.
  • Blobs are randomly shaped and have many holes in them.

Now that we have some background on what we want to achieve, let's get started with our tutorial!


1) Begin by selecting four colors to use with your image. You can place each color into a swatch or paint them into a layer for easy access.  Here are the four colors I chose to use:

Base Color: #d7cebd
Light Color: #c1b59c
Medium Color:   #a99265
Dark Color: #88602b


2) Make a new layer and paint it black. Apply a "Difference Clouds" filter on this layer:

Filter > Render > Difference Clouds

This will create a nice spread of light and dark shades which will help us to create our blob shapes later on.

Difference Clouds!


3) Increase the contrast of the clouds.  This can be done through the following menu option:

Image > Adjustments > Brightness / Contrast

Increase the contrast to a value between 75 and 100.  I used a value of 90 for each of my cloud layers.



4) Apply a mosaic filter to the clouds:

Filter > Pixelate > Mosaic

Select a small cell size; I chose a size of 5.  Sizes of 7-8 will produce images that look more zoomed in.  The cell size represents the smallest "square" that will appear in the image.  I encourage playing around with this value until you find the size you desire.



5) Repeat steps 2 through 4 two more times. This will give us a total of three layers of pixelated difference clouds.  The variability found in each cloud layer will help prevent our blobs from having too similar of patterns.


6) Now that we have our clouds, it is time to start creating our camouflage!  We'll start with the lightest color first: the Base color.  Make a new layer and use the paint bucket tool to color the layer with our Base color. That's it! (at least for this layer).  I wish the other layers were this easy though...


7) Before we continue, I would like to quickly explain what we are going to try to accomplish here.  Using the Magic Wand Tool, we're going to select multiple areas of a cloud layer (by holding the Shift key while selecting with the wand), then fill in our selection with one of our three remaining colors. If while clicking around you accidentally let go of Shift (and lose all of your selections), just hit Ctrl-Z to get all your work back. We will use a different cloud image for each of our colors.

Select the Magic Wand Tool (W is the keyboard shortcut) and choose a very small Tolerance. I chose a value of 2. You can play around with larger values, but I recommend no larger than 4. For this layer we're going to create the Light color portion of the camo. This color should cover about 75% of the base color, and should contain large, connected blobs.  Here are a couple images of my selections in progress. This will give you a good idea on how to progress.

Result after a few shift-clicks

Final selection. Takes a while, but I swear it is worth it!


8) Now that we have our selection, copy (Ctrl-C) and paste (Ctrl-V) your selection into a new layer.  Set a color overlay on this layer to the color you chose for your Light color.  This can be done through:

Layer > Layer Style > Color Overlay

Make sure that Blend Mode is set to Normal, and Opacity to 100%.  Click on the colored icon, and set it to your color.  This is only one of many ways to set the color for our selection. I like this method as it makes it very easy to change the color of the entire layer at once (especially handy if you want to change the color later).

Result after a copy and paste.  I moved the Base layer
above the clouds so that we can actually see the selection.

Result after setting the blending options. Finally starting
to look like camouflage.


9) Repeat steps 7 and 8 for our next two colors, using a different cloud layer for each color. Remember that these blobs get smaller and less connected as they get darker. Below are the results for my next two colors:

Selection mask for Medium color.

Result after applying medium color layer.


Dark color selection mask.

Final camouflage image

That is all there is to it! This technique can be done on a larger scale to produce nice backgrounds or textures in games. If this tutorial was useful to you, or you found better techniques or easier ways to accomplish things, please let me know in the comments!

Hope you found this useful!