Since reading Unix: A History and a Memoir by Brian Kernighan (of K&R), I’ve been a little enamored with the idea of writing small command line tools. The article Hints for Writing Unix Tools hits on a bunch of things and is worth reading, but one point really jumped out at me:
Output should be simple to parse and compose. This usually means representing each record as a single, plain-text formatted line of output whose columns are separated by whitespace. (No JSON, please.) Most venerable Unix tools—grep, sort, and sed among them—assume this.
Trying to parse the output of something like this on the command line is incredibly frustrating:
{
myThing = "Blah"
percent = 0.5
...
}
However, if the output is tab delimited:
myThing Blah
percent 0.5
Parsing it with Awk, sed, or another Unix filter tool can usually be done as easily as awk '/myThing/ { print $2 }'. I wish more tools paid attention to this. Maybe if you were writing Unix from scratch today, you’d have processes communicate using structured data (I think this is what PowerShell does?), but you have to work with the system you have.
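To make the awk one-liner concrete, here is a minimal Python sketch, assuming a hypothetical tool that prints one record per line as a key, a tab, and a value. The function names write_records and awk_print_field are made up for illustration, and Python's re syntax stands in for awk's own regular expressions, so this is only an approximation of what awk does.

```python
import io
import re
import sys
from typing import Iterable, List, TextIO


def write_records(records: dict, out: TextIO = sys.stdout) -> None:
    """Hypothetical tool output: one record per line, key<TAB>value."""
    for key, value in records.items():
        out.write(f"{key}\t{value}\n")


def awk_print_field(lines: Iterable[str], pattern: str, n: int) -> List[str]:
    """Rough stand-in for: awk '/pattern/ { print $n }' (n >= 1).

    str.split() with no argument splits on runs of whitespace and ignores
    leading/trailing whitespace, similar to awk's default field separator.
    Every matching line contributes one result; a line with fewer than n
    fields yields an empty string, as awk would print an empty field.
    """
    results = []
    for line in lines:
        if re.search(pattern, line):
            fields = line.split()
            results.append(fields[n - 1] if len(fields) >= n else "")
    return results


if __name__ == "__main__":
    # Capture the tool's output in memory instead of piping it.
    buf = io.StringIO()
    write_records({"myThing": "Blah", "percent": 0.5}, buf)
    print(awk_print_field(buf.getvalue().splitlines(), "myThing", 2))
```

Running it prints ['Blah']. The consumer never needs to know anything about the producer beyond "one record per line, whitespace-separated columns", which is the same assumption grep, sort, and awk make.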
Everyone knows that regular expressions are terrible, but my theory is that a large percentage of the “how can I parse …” questions on places like Stack Overflow that get solved with a crazy regex wouldn’t exist if more tools had sane output.