Sunday, November 23, 2008

Looking to optimize "Group by" in MySQL?

Few months back I was asked for the reason behind the recommendation by the consultants from MySQL. When the system was performing poorly the consultants had walked in and simply requested them to add the "order by NULL" clause to the queries. To the joy of the client and amazement of the tecnical team at the client location, this simple fix had done the trick and suddenly there was a marked improvement in performance! This had left the technical team confused, as they started to wonder why and how this happened. WhenI heard this I was confused ( as always ;) ), as I could not think of any possible logical explanation for this behaviour.

Last week while I was reading the planet MySQL I bumbed into an article talking about order by NULL and I think I had found the possible reason. Any how it is not a statement, which would optimize all queries , instead it would optimize all the statements with a group by clause.

To make things simple lets take small example,

EXPLAIN SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode \G

This gives the following output,

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: City
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4079
Extra: Using temporary; Using filesort

The extra column indicates that both filesort and temporary is being used, which is an indication of poor performance.

Alternatively when its applied with a Order by NULL as shown below,

EXPLAIN SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode order by NULL \G

gives an ouput,

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: City
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 4079
Extra: Using temporary

Which indicates a bit of an improvement as it does not any longer have Using filesort.

This addition of a order by NULL could give a performance increament by many folds at times.

The next most important question is how does this happen?

MySQL by default when having a group by statement would also order by the same column, which would require an additional sorting to take place. Any how when the order by NULL is included, it does not do a sort and thereby give a performance improvement.

No comments: