MySQL data compression

A table with millions of records, one of them a long text field was taking up too much space, especially that it is on a solid state drive (Expensive and small), so i decided it was not done correctly

The data in the text field was a page, there was no need to search the page, and the data is only needed when we know the row, so i decided the text should be compressed

The PHP site could compress the data before storing it, and decompress it when displaying it, surely there is overhead in compression and decompression, but since we are not talking about millions of hits a day, it should not be a big problem

to begin with, there were 2 candidate functions to do the mission

gzdeflate – gzinflate
bzcompress – bzdecompress

So, i chose deflate since it is more efficient (Much faster), and while testing produced results comparable to bz2, infact, the results were almost identical.

I did the testing on the first 9k records

What you need to pay attention to is that the text field should no longer be UTF_GENERAL_xx , the same compressed data when the filed was UTF8 was 80Mb (the first 9000 records), while it was only 50Mb when the data was stored as binary large object.

Here are the results of storing the first 9k records

Plain text in UTF8 field = 183MB
gzdeflate in UTF8 field = 80MB
gzdeflate in BLOB field (The right way to do it) = 50MB

Once i knew what i was supposed to do, the data ended taking up 1/4 of the space it required in plain text.

You may ask why 1/4 when the database compresses to 10% of the size when we have a SQLDUMP file, the truth is, every field of every row is compressed alone, if we compressed all the fields into one gz string, we would have got the 1/10 but since we can only compress field by field, 1/4 sounds good to me.

Enjoy

This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply