A table with millions of records, one of them a long text field was taking up too much space, especially that it is on a solid state drive (Expensive and small), so i decided it was not done correctly
The data in the text field was a page, there was no need to search the page, and the data is only needed when we know the row, so i decided the text should be compressed
The PHP site could compress the data before storing it, and decompress it when displaying it, surely there is overhead in compression and decompression, but since we are not talking about millions of hits a day, it should not be a big problem
to begin with, there were 2 candidate functions to do the mission
gzdeflate – gzinflate
bzcompress – bzdecompress
So, i chose deflate since it is more efficient (Much faster), and while testing produced results comparable to bz2, infact, the results were almost identical.
I did the testing on the first 9k records
What you need to pay attention to is that the text field should no longer be UTF_GENERAL_xx , the same compressed data when the filed was UTF8 was 80Mb (the first 9000 records), while it was only 50Mb when the data was stored as binary large object.
Here are the results of storing the first 9k records
Plain text in UTF8 field = 183MB
gzdeflate in UTF8 field = 80MB
gzdeflate in BLOB field (The right way to do it) = 50MB
Once i knew what i was supposed to do, the data ended taking up 1/4 of the space it required in plain text.
You may ask why 1/4 when the database compresses to 10% of the size when we have a SQLDUMP file, the truth is, every field of every row is compressed alone, if we compressed all the fields into one gz string, we would have got the 1/10 but since we can only compress field by field, 1/4 sounds good to me.
Enjoy