Data truncation and plant characteristics

2022-01-13

Let us return briefly to the entire GRIB saga. Yesterday I complained about some weird overlaps in the imported data. Upon further inspection, it turns out the dates are being interpreted wrong.

The way GRIB files store dates is that the beginning of the message indicates the start date and the templates indicate time step size. Data is then tagged as X time steps from origin, more or less. Except, how the time step is interpreted depends on the template itself, and apparently wgrib2 is reading that wrong for my data. As a result it’s thinking my input data is at roughly 3.5 day increments instead of roughly 6 hour increments. Bdöh.
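To make the failure mode concrete, here is a minimal sketch of how one step count turns into very different timestamps depending on which time unit the reader assumes. The unit codes are a subset of GRIB2 Code Table 4.4 as I understand it. The function name and the reference date are made up for illustration, and this is not a reconstruction of what wgrib2 does internally.

```python
from datetime import datetime, timedelta

# Subset of GRIB2 Code Table 4.4 (indicator of unit of time range).
# Months and years are omitted because they are not fixed-length.
UNIT_LENGTH = {
    0: timedelta(minutes=1),
    1: timedelta(hours=1),
    2: timedelta(days=1),
    10: timedelta(hours=3),
    11: timedelta(hours=6),
    12: timedelta(hours=12),
    13: timedelta(seconds=1),
}


def valid_time(reference, step_count, unit_code):
    """Return reference time plus step_count steps of the given unit."""
    return reference + step_count * UNIT_LENGTH[unit_code]


if __name__ == "__main__":
    reference = datetime(2022, 1, 1)  # made-up reference time
    # The same three step counts, read with two different unit codes.
    for unit_code in (11, 2):
        stamps = [valid_time(reference, n, unit_code) for n in range(3)]
        print(unit_code, [s.isoformat() for s in stamps])
```

If the unit code (or the template that says where to find it) is read wrong, every timestamp after the first shifts by a constant factor, which looks a lot like evenly spaced data at the wrong interval.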

So, with this in mind, I started to play with the -time_processing flag on wgrib2. After my first experiment on a month of GRIB data, I noticed that nothing at all was happening – no errors, but also no processing. This seemed awfully strange, and I had a hunch about what had happened. Sure enough, wgrib2 had truncated my GRIB input file to 0 bytes.

I’m really glad that wasn’t the only copy of that data. Well done, software.
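One cheap way to make this kind of accident harmless is to never hand a tool the original file at all. The following is a minimal sketch, assuming the tool takes the input path as a command-line argument. `run_on_scratch_copy` and the `build_command` callback are hypothetical names of my own, and the demo uses a deliberately destructive Python one-liner rather than any real wgrib2 invocation.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_on_scratch_copy(source, build_command):
    """Run an external command against a throwaway copy of `source`.

    `build_command` receives the path of the copy and returns the
    argument list to execute. Anything the command writes into the
    scratch directory is discarded, so outputs should go elsewhere.
    """
    source = Path(source)
    size_before = source.stat().st_size
    with tempfile.TemporaryDirectory() as scratch:
        working_copy = Path(scratch) / source.name
        shutil.copy2(source, working_copy)
        result = subprocess.run(
            build_command(working_copy), capture_output=True, text=True
        )
    # Sanity check: the original should be exactly as large as before.
    if source.stat().st_size != size_before:
        raise RuntimeError(f"{source} changed size during the run")
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        original = Path(demo_dir) / "input.grb"
        original.write_bytes(b"GRIB" * 4)  # stand-in bytes, not a real message
        # A command that truncates whatever file it is given.
        truncate = lambda path: [
            sys.executable, "-c",
            "import sys; open(sys.argv[1], 'w').close()", str(path),
        ]
        run_on_scratch_copy(original, truncate)
        print("original size after run:", original.stat().st_size)
```

The copy costs disk space and time for large archives, so marking the originals read-only is a lighter alternative, though it only helps if the tool respects permissions.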

At any rate, I need to fix the date handling. No time for that today though.

“I suppose my lack of wings is also a lack of a dorsal fin”

I’ve been looking for a good database of plant characteristics for the better part of a year. A friend of mine, Liam, has insisted that the only way to obtain a database that isn’t poorly constructed is to talk to a guy named Úlfur. This morning Liam showed up and dragged me out on a road trip, causing me to inadvertently spend the day in what I believe the experts call “shit-ass meteorological conditions”.

Talked to Úlfur, who provided some ideas and insights, in particular around things that can be measured and could be beneficial as inputs. Generally, my takeaway is that this type of thing is hard – hardly a stunning insight. There are simply so many traits, so many factors that are affected by biological communities, and so little understanding of the interactions. Most of the traits aren’t even useful for describing most species – a lack of wings may also be a lack of dorsal fins, to steal an observation.

But there’s some low-hanging fruit available too. Generally the quality of available data is pretty low, the communication between scientists and practitioners isn’t great, and (apparently) plenty of people don’t really care if a few thousand tree seedlings die for unknown reasons, because operations run at the scale of hundreds of thousands. Also, data is gathered somewhat sporadically, which might also explain the low quality.

I had come across the TRY database a while back, so I put in a data request to them today. The data came back in the form of an Excel file, and either I asked for the wrong data or this is somewhat disappointing. Aside from most of the fields being kind of useless (again, I perhaps requested the wrong ones), there isn’t even a hint of normalization in this data. I shall have to revisit this later.
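For what I mean by normalization, here is a minimal sketch that splits a flat spreadsheet-style export into a species table and a measurements table keyed by ID. The column names (`species`, `trait`, `value`, `unit`) and the sample rows are invented for illustration. They are not the actual TRY export format, and the numbers are not real measurements.

```python
def normalize(rows):
    """Split flat rows into a species table and a measurements table."""
    species_ids = {}
    measurements = []
    for row in rows:
        name = row["species"]
        # Assign the next free ID the first time a species name appears.
        species_id = species_ids.setdefault(name, len(species_ids) + 1)
        measurements.append({
            "species_id": species_id,
            "trait": row["trait"],
            "value": float(row["value"]),
            "unit": row["unit"],
        })
    species = [{"species_id": i, "name": n} for n, i in species_ids.items()]
    return species, measurements


if __name__ == "__main__":
    # Invented sample rows; the values are placeholders, not real data.
    flat_rows = [
        {"species": "Betula pubescens", "trait": "height", "value": "12.0", "unit": "m"},
        {"species": "Larix sibirica", "trait": "height", "value": "20.0", "unit": "m"},
        {"species": "Betula pubescens", "trait": "seed mass", "value": "0.3", "unit": "mg"},
    ]
    species, measurements = normalize(flat_rows)
    print(species)
    print(measurements)
```

The species name is stored once and referenced by ID everywhere else, so a spelling fix happens in one place instead of in every row.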

Some papers that were interesting

These are mostly unrelated to the above: