The Last of Jan!

It’s the last day of January! Y’all know what that means! Good, because I don’t. Today is snowy and cold in Reykjavík, but the coffee is warm, the lunch-pizza is baked and eaten, and a ton of code has been written. I struggled with focus in the early morning, but still got an acceptable amount done today. Tapered off towards the end though… but all in all, pretty good.


I submitted a talk to MCH2022 today. Here’s the abstract, but I don’t know yet if it’ll be accepted:

People have been modeling different parts of Earth’s systems for decades, on different scales and with different goals, from short-term weather forecasting through actuarial risk prediction to long-term climate models. In this talk I’ll explore some of the typical models, methods, data formats, infrastructure layouts and design assumptions that go into such models, and discuss some low-hanging fruit available to improve them.

Either way, very much looking forward to going camping! I’ve been to quite a few of the Dutch hacker camps. My first was HAL2001, more than 20 years ago. Yikes, I’m old.

I still don’t know which village I’ll be in or anything, but lmk if you’re coming.

The Goa Project

Yesterday I talked at an event of The Goa Project, about transparency in government operations. The main thrust of my argument is that while transparency is important, it’s useful to frame it in terms of increasing the availability of government services. This helps to align the incentives of the people working within government agencies with those of the people in society who need the services. If you just ask for all government documents to be publicly available, you’re actually asking for a massive increase in bureaucracy. But if you ask for the digitization of services, in order to standardize interactions, reduce workload, guarantee good recordkeeping, and increase self-service, then documents frequently get digitized (or rather, never become non-digital) as a matter of course, and adding transparency on top of that becomes an easy and obvious side effect.

I might post my notes for the session if somebody asks for them.

Packing bytes

Okay, this morning’s code was about efficient delivery of data. It is depressingly common for large volumes of data to be passed to web pages as JSON just because it’s easy to read and most web developers are scared of binary data. But when you’re dealing with, say, ocean temperature measurements at 0.1°×0.1° resolution, a JSON blob can quickly become pretty big. How big? A global 0.1° grid is 3600×1800 = 6.48 million samples, so even assuming we are extremely frugal, at five or six characters per value, we’re talking about 37MB.
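The 37MB figure is roughly this back-of-envelope arithmetic, sketched in TypeScript; the six-characters-per-value figure is my own frugal assumption:

```typescript
// Back-of-envelope for a 0.1° global grid serialized as JSON.
const lonSamples = Math.round(360 / 0.1);  // 3600 columns
const latSamples = Math.round(180 / 0.1);  // 1800 rows
const points = lonSamples * latSamples;    // 6,480,000 values

// A frugal JSON array entry like "12.3," is roughly 5-6 characters.
const bytesPerValue = 6; // assumption
const jsonBytes = points * bytesPerValue;

console.log(points);                       // 6480000
console.log((jsonBytes / 1e6).toFixed(1)); // "38.9" — the right ballpark
```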

So what we need right now is a compact, self-describing binary format for gridded geographical data. Tasks are:

  • A definition of the format.
  • An encoder for this in Jai.
  • A decoder for this in Typescript. (We’ll probably never encode data from the browser.)
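For illustration, the TypeScript decoder’s header-reading side could look something like this; the field names, offsets, and magic value are placeholder assumptions, not the actual format:

```typescript
// Hypothetical header layout for a self-describing gridded-data format:
// 4-byte magic, 1-byte version, 1-byte compression flag, then two
// 4-byte little-endian dimensions. All of this is illustrative.
interface GridHeader {
  magic: number;       // format identifier
  version: number;     // format version
  compression: number; // 0 = uncompressed (the optional-compression flag)
  width: number;       // samples along longitude
  height: number;      // samples along latitude
}

function decodeHeader(buf: ArrayBuffer): GridHeader {
  const view = new DataView(buf);
  return {
    magic: view.getUint32(0, true),
    version: view.getUint8(4),
    compression: view.getUint8(5),
    width: view.getUint32(6, true),
    height: view.getUint32(10, true),
  };
}
```

Reading through a DataView rather than a typed array keeps the byte order explicit, which matters once the encoder lives in a different language.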

I gave myself just over 2 hours, and got 2 of the 3 items done, albeit with some further testing still needed. The early results suggest that the same ocean measurements would come out to about 6MB in this format. That’s a helluvalot better.

In practice, we probably don’t want to be sending the entire planet at once, and will probably split the planet’s surface up into some number of tiles. But even then, we’d like to avoid sending 6x more data than necessary.

Obviously the correct thing to do is also compress this data. I put a flag in the format header to allow for optional compression, and will probably either do some kind of super simple entropy encoding, or just use LZ4, because it’s pretty simple and reasonably common. Actually, since the payload is essentially a long sequence of numbers in similar value ranges across the planet, the deltas between neighbouring samples tend to be small, which might make this an ideal candidate for Golomb coding.
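To sketch the idea, here’s a toy Rice coder (Golomb coding with a power-of-two parameter M = 2^k), which suits runs of small non-negative integers such as deltas between neighbouring samples. It uses a string of '0'/'1' as the bit stream purely for readability; a real implementation obviously wouldn’t:

```typescript
// Rice coding: each value v is split into quotient q = v >> k (unary-coded
// as q ones plus a terminating zero) and a k-bit binary remainder.
function riceEncode(values: number[], k: number): string {
  let bits = "";
  for (const v of values) {
    const q = v >> k;
    bits += "1".repeat(q) + "0";                                // unary quotient
    bits += (v & ((1 << k) - 1)).toString(2).padStart(k, "0");  // k-bit remainder
  }
  return bits;
}

function riceDecode(bits: string, k: number): number[] {
  const out: number[] = [];
  let i = 0;
  while (i < bits.length) {
    let q = 0;
    while (bits[i] === "1") { q++; i++; }
    i++; // skip the terminating zero
    const r = parseInt(bits.slice(i, i + k) || "0", 2);
    i += k;
    out.push((q << k) | r);
  }
  return out;
}
```

The knob is k: small values cost k+1 bits each, and the cost grows gracefully (one extra bit per 2^k) as values get larger, which is why it works well when most deltas are tiny.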

Somebody might argue that the compressed JSON and the compressed binary blob are probably going to be similar in size, because that’s how information density works. In theory, yes. In practice, no: formats vary in quality and nothing is perfect. In fact, the pigeonhole principle tells us that no lossless compressor can shrink all inputs, so every format makes trade-offs about which data it handles well. But I will definitely do the comparison.

Either way, I can’t afford more time on this now; other things need attention. Will revisit in coming days.

Moar authentication

I should probably write something insightful here about how over-the-top silly the Redux thing in the React ecosystem is, and how 99.9% of the complexity would just go away if people were less freaked out about managing global state. Global state is not your enemy. Poorly designed code is your enemy.

Anyway, I’m a little bit more annoyed at React than I was going into the last weekend.