Unix comes equipped with some powerful special-purpose code generators for purposes like building lexical analyzers (tokenizers) and parsers; we'll survey these in Chapter 15. But there are much simpler, lighter-weight sorts of code generation we can use to make life easier without having to know any compiler theory or write (error-prone) procedural logic.
Here are a couple of simple case studies to illustrate this point:
Called without arguments, ascii generates a usage screen that looks like Example 9.5.
The most naïve way to generate the usage screen would have been to put each line into a C initializer in the ascii.c source code, and then have all lines be written out by code that steps through the initializer. The problem with this method is that the extra data in the C initializer format (trailing newline, string quotes, comma) would make the lines longer than 79 characters, causing them to wrap and making it rather difficult to map the appearance of the code to the appearance of the output. This, in turn, would make the display difficult to edit, which was annoying when I was tinkering with it to fit in 24×80 screen cells.
A more sophisticated method using the string-pasting behavior of the ANSI C preprocessor collided with a variant of the same problem. Essentially, any way of inlining the usage screen explicitly would involve punctuation at start and end of line that there's no room for.[98] And copying the table to the screen from a file at runtime seemed like a fragile expedient; after all, the file could get lost.
Here's the solution. The source distribution contains a file named splashscreen that holds just the usage screen, exactly as listed above. The C source contains the following function:
void
showHelp(FILE *out, char *progname)
{
  fprintf(out,"Usage: %s [-dxohv] [-t] [char-alias...]\n", progname);
#include "splashscreen.h"
  exit(0);
}
And splashscreen.h is generated by a makefile production:
splashscreen.h: splashscreen
	sed <splashscreen >splashscreen.h \
		-e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/.*/puts("&");/'
By generating the code from data, we get to keep the editable version of the usage screen identical to its display appearance. This promotes transparency. Furthermore, we can modify the usage screen at will without touching the C code at all, and the right thing will automatically happen on the next build.
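For readers who find the sed expressions opaque, here is a minimal Python sketch of the same transformation. It is an illustration rather than part of the ascii distribution: the function names c_puts_line and generate_header are hypothetical, and the sample text is invented. Each line of display text has its backslashes and double quotes escaped, then is wrapped in a puts() call.

```python
def c_puts_line(line):
    """Turn one line of display text into a C puts() statement."""
    # Escape backslashes first, so that the backslashes added in
    # front of quotes are not themselves doubled.
    escaped = line.replace("\\", "\\\\").replace('"', '\\"')
    return 'puts("' + escaped + '");'


def generate_header(lines):
    """Convert an iterable of text lines into a list of C statements."""
    return [c_puts_line(line.rstrip("\n")) for line in lines]


# Invented sample input, standing in for the splashscreen file.
sample = ['Type "ascii -h" for help\n', 'A backslash: \\\n']
for statement in generate_header(sample):
    print(statement)
```

The order of the two replacements matters for the same reason it does in the sed command: escaping quotes first would introduce backslashes that the second step would then double. To use the sketch as a filter, one could pass sys.stdin to generate_header in place of the sample list.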
Let's suppose that we want to put a page of tabular data on a Web page. We want the first few lines to look like Example 9.6.
The superficially clever way to handle this would be to make this data a three-column relation in a database, then use some fancy CGI[99] technique or a database-capable templating engine like PHP to generate the page on the fly. But suppose we know that the list will not change very often, don't want to run a database server just to be able to display this list, and don't want to load the server with unnecessary CGI traffic?
There's a better solution. We put the data in a tabular flat-file format like Example 9.7.
We then write a script in shell, Perl, Python, or Tcl that massages this file into an HTML table, and run that each time we add an entry. The old-school Unix way would revolve around the following nigh-unreadable sed(1) invocation:
sed -e 's,^,<tr><td>,' -e 's,$,</td></tr>,' -e 's,:,</td><td>,g'
or this perhaps slightly more scrutable awk(1) program:
awk -F: '{printf("<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n", \
                 $1, $2, $3)}'
(If either of these examples interests but mystifies, read the documentation for sed(1) or awk(1). We explained in Chapter 8 that the latter has largely fallen out of use. The former is still an important Unix tool that we haven't examined in detail because (a) Unix programmers already know it, and (b) it's easy for non-Unix programmers to pick up from the manual page once they grasp the basic ideas about pipelines and redirection.)
A new-school solution might center on this Python code, or on equivalent Perl:
import sys

# Split each colon-separated input line into fields, one table row per line.
for row in map(lambda x: x.rstrip().split(':'), sys.stdin.readlines()):
    print("<tr><td>" + "</td><td>".join(row) + "</td></tr>")
These scripts took about five minutes each to write and debug, certainly less time than would have been required to either hand-hack the initial HTML or create and verify the database. The combination of the table and this code will be much simpler to maintain than either the under-engineered hand-hacked HTML or the over-engineered database.
A further advantage of this way of solving the problem is that the master file stays easy to search and modify with an ordinary text editor. Another is that we can experiment with different table-to-HTML transformations by tweaking the generator script, or easily make a subset of the report by putting a grep(1) filter before it.
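As one illustration of how cheap such experiments are, here is a hedged sketch of a variant generator. The function name rows_to_html, its parameters, and the sample rows are all hypothetical. It makes two changes to the basic script: an optional regular expression plays the role of a grep(1) filter in front of the generator, and each field is passed through html.escape from the Python 3 standard library so that characters like & and < cannot break the markup. Skipping blank lines is a further assumption about what the master file may contain.

```python
import html
import re


def rows_to_html(lines, pattern=None, sep=":"):
    """Convert separator-delimited lines into HTML table rows.

    If pattern is given, only lines matching it are kept, much as
    if the input had first been piped through grep(1).
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: blank lines carry no data
        if pattern is not None and not re.search(pattern, line):
            continue  # filtered out of this subset report
        # Escape markup-significant characters in every field.
        cells = [html.escape(field) for field in line.split(sep)]
        rows.append("<tr><td>" + "</td><td>".join(cells) + "</td></tr>")
    return rows


# Invented sample data; the real master file would be read from disk.
sample = ["alpha:one:R&D\n", "beta:two:QA\n"]
for row in rows_to_html(sample, pattern="^alpha"):
    print(row)
```

Neither change requires touching the master file, which is the point: the data stays in its simplest editable form and all presentation decisions live in the generator.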
I actually use this technique to maintain the Web page that lists fetchmail test sites; the example above is science-fictional only because publishing the real data would reveal account usernames and passwords.