When a Unix program starts up, the environment accessible to it includes a set of name to value associations (names and values are both strings). Some of these are set manually by the user; others are set by the system at login time, or by your shell or terminal emulator (if you're running one). Under Unix, environment variables tend to carry information about file search paths, system defaults, the current user ID and process number, and other key bits of information about the runtime einvironment of programs. At a shell prompt, typing set followed by a newline will list all currently defined shell variables.
In C and C++ these values can be queried with the library function getenv(3). Perl and Python initialize environment-dictionary objects at startup. Other languages generally follow one of these two models.
There are a number of well-known environment variables you can expect to find defined on startup of a program from the Unix shell. These (especially HOME) will often need to be evaluated before you read a local dotfile.
Login name of the account under which this session is logged in (BSD convention).
Login name of the account under which this session is logged in (System V convention).
Home directory of the user running this session.
The number of character-cell columns on the controlling terminal or terminal-emulator window.
The number of character-cell rows on the controlling terminal or terminal-emulator window.
The name of the user's command shell (often used by shellout commands).
The list of directories that the shell searches when looking for executable commands to match a name.
Name of the terminal type of the session console or terminal emulator window (see the terminfo case study in Chapter 6 for background). TERM is special in that programs to create remote sessions over the network (such as telnet and ssh) are expected to pass it through and set it in the remote session.
(This list is representative, but not exhaustive.)
The HOME variable is especially important, because many programs use it to find the calling user's dotfiles (others call some functions in the C runtime library to get the calling user's home directory).
Note that some or all of these system environment variables may not be set when a program is started by some other method than a shell spawn. In particular, daemon listeners on a TCP/IP socket often don't have these variables set — and if they do, the values are unlikely to be useful.
Finally, note that there is a tradition (exemplified by the PATH variable) of using a colon as a separator when an environment variable must contain multiple fields, especially when the fields can be interpreted as a search path of some sort. Note that some shells (notably bash and ksh) always interpret colon-separated fields in an environment variable as filenames, which means in particular that they expand ~ in these fields to the user's home directory.
Although applications are free to interpret environment variables outside the system-defined set, it is nowadays fairly unusual to actually do so. Environment values are not really suitable for passing structured information into a program (though it can in principle be done via parsing of the values). Instead, modern Unix applications tend to use run-control files and dotfiles.
There are, however, some design patterns in which user-defined environment variables can be useful:
Application-independent preferences that need to be shared by a large number of different programs. This set of ‘standard’ preferences changes only slowly, because lots of different programs need to recognize each one before it becomes useful.[102] Here are the standard ones:
The name of the user's preferred editor (often used by shellout commands).[103]
The name of the user's preferred mail user agent (often used by shellout commands).
The name of the user's preferred program for browsing plaintext.
The name of the user's preferred program for browsing Web URLs. This one, as of 2003, is still very new and not yet widely implemented.
What both user and system environment variables have in common is that it would be annoying to have to replicate the information they contain in a large number of application run-control files, and extremely annoying to have to change that information everywhere when your preference changes. Typically, the user sets these variables in his or her shell session startup file.
A value varies across several contexts that share dotfiles, or a parent needs to pass information to multiple child processes. Some pieces of start-up information are expected to vary across several contexts in which the calling user would share common run-control files and dotfiles. For a system-level example, consider several shell sessions open through terminal emulator windows on an X desktop. They will all see the same dotfiles, but might have different values of COLUMNS, LINES, and TERM. (Old-school shell programming used this method extensively; makefiles still do.)
A value varies too often for dotfiles, but doesn't change on every startup. A user-defined environment variable may (for example) be used to pass a file system or Internet location that is the root of a tree of files that the program should play with. The CVS version-control system interprets the variable CVSROOT this way, for example. Several newsreader clients that fetch news from servers using the NNTP protocol interpret the variable NNTPSERVER as the location of the server to query.
A process-unique override needs to be expressed in a way that doesn't require the command-line invocation to be changed. A user-defined environment variable can be useful for situations in which, for whatever reason, it would be inconvenient to have to change an application dotfile or supply command-line options (perhaps it is expected that the application will normally be used inside a shell wrapper or within a makefile). A particularly important context for this sort of use is debugging. Under Linux, for example, manipulating the variable LD_LIBRARY_PATH associated with the ld(1) linking loader enables you to change where libraries are loaded from — perhaps to pick up versions that do buffer-overflow checking or profiling.
In general, a user-defined environment variable can be an effective design choice when the value changes often enough to make editing a dotfile each time inconvenient, but not necessarily every time (so always setting the location with a command-line option would also be inconvenient). Such variables should typically be evaluated after a local dotfile and be permitted to override settings in it.
There is one traditional Unix design pattern that we do not recommend for new programs. Sometimes, user-set environment variables are used as a lightweight substitute for expressing a program preference in a run-control file. The venerable nethack(1) dungeon-crawling game, for example, reads a NETHACKOPTIONS environment variable for user preferences. This is an old-school technique; modern practice would lean toward parsing them from a .nethack or .nethackrc run-control file.
The problem with the older style is that it makes tracking where your preference information lives more difficult than it would be if you knew the program had a run-control file under your home directory. Environment variables can be set anywhere in several different shell run-control files — under Linux these are likely to include .profile, .bash_profile, and .bashrc at least. These files are cluttered and fragile things, so as the code overhead of having an option-parser has come to seem less significant preference information has tended to migrate out of environment variables into dotfiles.
Environment variables have only very limited portability off Unix. Microsoft operating systems have an environment-variable feature modeled on that of Unix, and use a PATH variable as Unix does to set the binary search path, but most of other variables that Unix shell programmers take for granted (such as process ID or current working directory) are not supported. Other operating systems (including classic MacOS) generally do not have any local equivalent of environment variables.
[102] Nobody knows a really graceful way to represent this sort of distributed preference data; environment variables probably are not it, but all the known alternatives have equally nasty problems.
[103] Actually, most Unix programs first check VISUAL, and only if that's not set will they consult EDITOR. That's a relic from the days when people had different preferences for line-oriented editors and visual editors.