What a rush

So yeah, my work managed to get in the way of my documentation efforts, and I’ve been running like mad for the last month.

As of today, I have committed code for 30 days in a row. I have also sent out four grant applications and set up a number of other business-related things.

I think I might take a rest sometime soon.

Compressing floats

As a test, I implemented TSXor for floating-point compression. I need to shunt large amounts of time series and geospatial data around, and it's nice if that doesn't eat up all the bandwidth and storage space.

TSXor uses a three-pronged approach to compression. It first looks for the value in a sliding window of recent values. If that fails, it tries a Gorilla-style XOR of the value against an entry in that window. As a last resort, it simply writes out the raw value, prefixed by a full byte that flags this fallback case.
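To make the three cases concrete, here is a heavily simplified sketch in Python for 64-bit doubles only. It is not the paper's exact byte layout. The window size of 127, the tag values, and the header byte that packs leading and trailing zero-byte counts are all assumptions for illustration, and `encode`/`decode` are hypothetical names.

```python
import struct
from collections import deque

WINDOW = 127      # assumed window size; indices must fit in the tag byte
FALLBACK = 0xFF   # assumed marker byte: 8 raw bytes follow

def _bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def _float(b: int) -> float:
    """Inverse of _bits."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def _zero_bytes(x: int) -> tuple:
    """Leading and trailing zero bytes of a non-zero 64-bit XOR."""
    raw = x.to_bytes(8, "big")
    return len(raw) - len(raw.lstrip(b"\0")), len(raw) - len(raw.rstrip(b"\0"))

def encode(values):
    window = deque(maxlen=WINDOW)
    out = bytearray()
    for v in values:
        b = _bits(v)
        if b in window:
            # Case 1: exact repeat of a recent value, one byte holding its index.
            out.append(window.index(b))
        else:
            best = None  # (cost, index, lead, trail, xor)
            for i, ref in enumerate(window):
                x = b ^ ref
                lead, trail = _zero_bytes(x)
                cost = 2 + 8 - lead - trail  # tag + header + middle bytes
                if best is None or cost < best[0]:
                    best = (cost, i, lead, trail, x)
            if best is not None and best[0] < 9:
                # Case 2: XOR against the best window entry, keep the middle bytes.
                _, i, lead, trail, x = best
                out.append(128 + i)
                out.append((lead << 4) | trail)
                out += x.to_bytes(8, "big")[lead:8 - trail]
            else:
                # Case 3: fallback, marker byte followed by the raw 8 bytes.
                out.append(FALLBACK)
                out += b.to_bytes(8, "big")
        window.append(b)
    return bytes(out)

def decode(data):
    window = deque(maxlen=WINDOW)
    values, pos = [], 0
    while pos < len(data):
        tag = data[pos]
        pos += 1
        if tag < 128:
            b = window[tag]  # case 1: repeat from the window
        elif tag == FALLBACK:
            b = int.from_bytes(data[pos:pos + 8], "big")  # case 3: raw value
            pos += 8
        else:
            # Case 2: rebuild the XOR from its middle bytes, then undo it.
            lead, trail = data[pos] >> 4, data[pos] & 0x0F
            n = 8 - lead - trail
            mid = data[pos + 1:pos + 1 + n]
            x = int.from_bytes(bytes(lead) + mid + bytes(trail), "big")
            pos += 1 + n
            b = window[tag - 128] ^ x
        window.append(b)
        values.append(_float(b))
    return values
```

The encoder and decoder keep identical windows, so a window index written by one always points at the same value for the other. Picking the reference with the cheapest XOR, rather than always using the previous value, is what separates this from plain Gorilla.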

The TSXor paper claims significantly better compression ratios and decompression speeds than both FPC and Gorilla, though its compression is much slower than Gorilla's. That still sounds pretty good to me.

My first implementation, run on my own test data, gave compression ratios between 0.94 and 1.03. In many cases the file was actually getting bigger rather than smaller. I implemented it for both 32-bit and 64-bit IEEE 754 floating-point numbers, and (perhaps unsurprisingly, given how TSXor is defined) the 32-bit version was consistently bad. Fixed per-value overhead such as the prefix byte is a much larger fraction of a 4-byte value than of an 8-byte one.
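To see why the 32-bit case suffers, here is the worst case for the fallback path alone. It assumes a one-byte marker as described above and takes the compression ratio as compressed size divided by original size, so anything above 1 means the output grew. The helper name `fallback_ratio` is hypothetical.

```python
def fallback_ratio(value_bytes: int, marker_bytes: int = 1) -> float:
    """Compressed/original size if every value takes the raw fallback path."""
    return (marker_bytes + value_bytes) / value_bytes

print(fallback_ratio(4))  # float32: 1.25
print(fallback_ratio(8))  # float64: 1.125
```

A stream of unpredictable 32-bit values can therefore grow by up to 25%, compared with 12.5% for doubles. Any other fixed per-value header byte is likewise twice as expensive relative to a float32.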

I don't yet know whether I did something wrong in my implementation or simply have more difficult data than the authors expected. Either way, I'm definitely getting the impression that converting to fixed point and storing the values as delta-of-delta coded integers is a better strategy.
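Here is a rough sketch of that alternative, assuming a fixed number of decimal places (the hypothetical `decimals` parameter) is acceptable. The conversion is therefore lossy beyond that precision. The first integer is stored as-is, the second as a delta, and the rest as deltas of deltas. Each value is then zigzag-mapped and written as a LEB128-style varint. Gorilla's own timestamp encoding uses bit-level variable-width buckets here, so byte-aligned varints are a simpler stand-in, and all function names are illustrative.

```python
def delta_of_delta(ints):
    """First value raw, second as a delta, the rest as deltas of deltas."""
    out, prev, prev_delta = [], None, 0
    for x in ints:
        if prev is None:
            out.append(x)
        else:
            delta = x - prev
            out.append(delta - prev_delta)
            prev_delta = delta
        prev = x
    return out

def undo_delta_of_delta(dods):
    out, prev, prev_delta = [], None, 0
    for d in dods:
        if prev is None:
            x = d
        else:
            prev_delta += d
            x = prev + prev_delta
        out.append(x)
        prev = x
    return out

def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1

def unzigzag(z):
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)

def encode(values, decimals):
    scale = 10 ** decimals
    fixed = [round(v * scale) for v in values]  # lossy beyond `decimals`
    out = bytearray()
    for z in map(zigzag, delta_of_delta(fixed)):
        while True:  # LEB128 varint: 7 bits per byte, high bit = more follows
            byte, z = z & 0x7F, z >> 7
            out.append(byte | 0x80 if z else byte)
            if not z:
                break
    return bytes(out)

def decode(data, decimals):
    zs, z, shift = [], 0, 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            zs.append(z)
            z, shift = 0, 0
    scale = 10 ** decimals
    return [x / scale for x in undo_delta_of_delta([unzigzag(z) for z in zs])]
```

For a smooth series such as `[21.50, 21.52, 21.54, 21.56, 21.58, 21.60]` at two decimals, the delta-of-delta stream is `[2150, 2, 0, 0, 0, 0]`. That encodes to 7 bytes instead of 48 for raw doubles.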

I might have to get hold of the data sets used in the paper to verify my metrics.