Slow Friday

Booster shot today. Worried it might screw up my next day. Round 2 took me out for a bit. We’ll see. Not a lot of code today, but lots of planning work and meetings. Slowly converging on infrastructure and nailing a bunch of details. Whee. I’m going to say that I won’t bore you with the details, mostly because I can’t be bothered to write them out. But honestly, the details were fun, and you’re missing out.

Schemas and data transformations

Sturgeon was right, he just spelled CRUD wrong. 90% of everything is in fact CRUD: Creating, Reading, Updating, Deleting. Except nowadays we have schemas to at least partially automate the highly efficient creation of an infinity of crap.

My Tech Stack is Janky

The problem arises when you have six operational domains:

1. Data warehouse (original data)
2. Batch processing data store (largely Parquet files derived from the original data)
3. The programs that do the batch processing (mostly Jai, some Fortran, C and Python), which need some internal representation of the data they are processing
4. Operational data store (PostgreSQL tables derived from original data or batch outputs)
5. Django ORM (Python schema reflecting the ODS)
6. React frontend schemas (TypeScript schemas reflecting the output from the Django REST API)

The closer we get to the user, the more the data will have been transformed into something directly useful.
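As a minimal sketch of what "the same data, reshaped per layer" looks like in practice (the field names here are hypothetical, not from the actual project): a record as the Django ORM layer might model it, and the JSON-friendly shape the REST API would serve, which a TypeScript interface on the frontend then mirrors.

```python
from dataclasses import dataclass, asdict
from datetime import date

# Hypothetical ODS row, roughly as a Django model might represent it.
@dataclass
class ForecastRow:
    station_id: str
    valid_date: date
    temp_c: float

def to_api_shape(row: ForecastRow) -> dict:
    """Transform the ODS shape into the JSON shape the REST API
    serves (and that a TypeScript interface would mirror)."""
    d = asdict(row)
    d["valid_date"] = row.valid_date.isoformat()  # dates become strings over JSON
    return d

row = ForecastRow("OSL", date(2022, 1, 17), -4.2)
print(to_api_shape(row))
```

Each boundary crossing is one more place where the schemas can drift out of sync, which is what makes six domains painful.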

Zombie mode (work)

I am as fascinated by the world’s fascination with zombies as I am unfascinated by zombies themselves. They appear to represent (in the modern manifestation) some kind of deep fear of the uncontrollable masses. Le Corbusier asked rhetorically, “Is there anything more pitiful than an undisciplined crowd?” The answer is yes. For instance, such a crowd trapped in mediated self-loathing, by way of a zombie trope. Somebody suggested a zombie/vampire feedback loop that follows a right-wing/left-wing sentiment shift: when people fear for their prosperity due to competition from other workers, they shift right wing and zombie movies get made.

Confusion (Work log)

One of the many ironies of Electron-based apps is that nowadays I rely on so many of them that I wish I could arrange them into tabs. Good morning! Another day, another box of stolen pens, amirite? No, but seriously. I found some energy lurking in the shadowy depths of my torpid existence last night and did half of the conversion to TypeScript that I was bitching about yesterday. It’s amazing what you can get done when you just stop complaining for half a minute.

Work log, 12022-01-17

I’ve been doing these “work logs” publicly for a week now. The process is easy – when I sit down to start a workday, I start a new Markdown file. During the day I document what I’m doing, broadly, noting things like breakthroughs, frustrations, insights, useful links that come up. At the end of the day, instead of just forgetting the document exists, I commit it to my blog repo. Perhaps the public nature of this will come back to bite me at some point, but for now this is having a positive effect on me.

Work notes 12022-01-14

Short day today; have to go carry heavy objects for people. Just a few things. MuPDF: Cloned MuPDF to try to determine how hard it would be to use as a library for something. Saw it had a makefile. Typed “make”, expecting the usual shenanigans, but no. It just freaking compiled. And quickly, too. No CMake shit, no configure script, none of that madness. Just a well-crafted Makefile.

Data truncation and plant characteristics

Let us return briefly to the entire GRIB saga. Yesterday I complained about some weird overlaps in the imported data. Upon further inspection, it turns out the dates are being interpreted wrong. The way GRIB files store dates is that the beginning of the message indicates the start date and the templates indicate time step size. Data is then tagged as X time steps from origin, more or less. Except, how the time step is interpreted depends on the template itself, and apparently wgrib2 is reading that wrong for my data.
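The overlap bug above can be sketched with plain datetime arithmetic (the dates and step sizes here are made up for illustration): the reference date comes from the message header, the step unit comes from the product definition template, and if a reader assumes the wrong unit, the computed valid times land in the wrong place and consecutive runs appear to overlap.

```python
from datetime import datetime, timedelta

# Reference (start) date from the GRIB message header (illustrative value).
origin = datetime(2022, 1, 1, 0, 0)

def valid_time(steps: int, step_unit: timedelta) -> datetime:
    """Data is tagged as N time steps from the origin;
    the template determines what one step means."""
    return origin + steps * step_unit

# Interpreted as 6-hourly steps, offset 4 is one day out:
print(valid_time(4, timedelta(hours=6)))  # 2022-01-02 00:00
# The same offset misread as daily steps lands four days out:
print(valid_time(4, timedelta(days=1)))   # 2022-01-05 00:00
```

Same stored integer, two different valid times, which is exactly the kind of silent misinterpretation that shows up downstream as overlapping dates.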

Meteorological data fun (day 3)

I’m making some progress. Naturally, I was being a bit dense by the end of the day yesterday. Of course Apache Arrow provides Debian repositories and good instructions for installation. I mean, it’s not quite as good as just working™, but close enough. Solved. I also read up on the GLib bindings mechanism that they provide, which makes a lot of things easier. The Ruby bindings are based on it, for instance, and probably the Python bindings too.

GRIB Woes, Part 2

Yesterday I spent a few hours making a decoder for the GRIB file format, which was a mostly pleasant experience, although the deeper I got the more I realized that there was something very wrong with the state of affairs. The data I’m trying to work with is the ECMWF’s Seasonal-to-Subseasonal daily averaged data, which is a pretty typical data set as meteorology and climate science go. It’s very well curated, very precisely defined, and deployed to the world in a really difficult-to-use way.

The GRIB file format

I love a good file format. GRIB, however, appears to be… I don’t even know. It’s a file format for meteorological data, and stands for GRIdded Binary. There are two main versions, and they appear to be binary-incompatible. Unsurprisingly for meteorological formats, a lot of the reference code is in Fortran. Sigh. GDAL has support for the GRIB format, but unsurprisingly, since GDAL is designed more around managing geographical imagery data, it makes it a bit complicated to just read the data out of a GRIB file.
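One small mercy amid the version incompatibility: both editions share the start of section 0, so telling them apart is easy. A minimal sketch, assuming only the published section 0 layout (magic bytes "GRIB" in octets 1–4, edition number in octet 8 for both editions):

```python
def grib_edition(header: bytes) -> int:
    """Return the GRIB edition number from the first 8 bytes of a
    message. Section 0 begins with the magic 'GRIB', and in both
    edition 1 and edition 2 the eighth octet holds the edition."""
    if len(header) < 8 or header[:4] != b"GRIB":
        raise ValueError("not a GRIB message")
    return header[7]

# Synthetic GRIB2-style section 0 start: 'GRIB', two reserved bytes,
# discipline byte, edition byte = 2.
print(grib_edition(b"GRIB\x00\x00\x00\x02"))  # 2
```

Everything after that byte diverges, which is why a decoder ends up being two decoders in a trench coat.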