All the communication methods we've discussed so far have a sort of implicit hierarchy about them, with one program effectively controlling or driving another and zero or limited feedback passing in the opposite direction. In communications and networking we frequently need channels that are peer-to-peer, usually (but not necessarily) with data flowing freely in both directions. We'll survey peer-to-peer communications methods under Unix here, and develop some case studies in later chapters.
The use of tempfiles as communications drops between cooperating programs is the oldest IPC technique there is. Despite drawbacks, it's still useful in shellscripts, and in one-off programs where a more elaborate and coordinated method of communication would be overkill.
The most obvious problem with using tempfiles as an IPC technique is that it tends to leave garbage lying around if processing is interrupted before the tempfile can be deleted. A less obvious risk is that of collisions between multiple instances of a program using the same name for a tempfile. This is why it is conventional for shellscripts that make tempfiles to include $$ in their names; this shell variable expands to the process-ID of the enclosing shell and effectively uniquifies the filename (it's also supported in Perl).
Finally, if an attacker knows the location to which a tempfile will be written, they can step on that name and possibly either read the producer's data or spoof the consumer process by inserting modified or spurious data into the file. [55] This is a security risk. If the processes involved have root privileges, it is a very serious one.
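In C, the usual defense against both the collision and the predictable-name problems is mkstemp(3), which picks an unpredictable name and opens the file atomically. A minimal sketch (the template prefix is illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* mkstemp(3) replaces the XXXXXX with a unique suffix and
           opens the file atomically with mode 0600, closing the
           window an attacker could exploit via a predictable name. */
        char name[] = "/tmp/myprog-XXXXXX";   /* illustrative template */
        int fd = mkstemp(name);
        if (fd == -1) {
            perror("mkstemp");
            return 1;
        }

        dprintf(fd, "data for the consumer process\n");
        close(fd);

        /* ... hand 'name' to the consumer, then clean up ... */
        unlink(name);
        return 0;
    }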
All these problems aside, tempfiles still have a niche because they're easy to set up, they're flexible, and they're less vulnerable to deadlocks or race conditions than more elaborate methods. And sometimes, nothing else will do. The calling conventions of your child process may require that it be handed a file to operate on. Our first example of a shellout to an editor demonstrates this perfectly.
The simplest and crudest way for two processes on the same machine to communicate with each other is for one to send the other a signal. Unix signals are a form of soft interrupt; each one has a default effect on the receiving process (usually to kill it). A process can declare a signal handler which overrides the default action for the signal; the handler is a function which is executed asynchronously when the signal is received.
Signals were originally designed into Unix as a way for the operating system to notify programs of certain errors and critical events, not as an IPC facility. The SIGHUP signal, for example, is sent to every program started from a given terminal session when that session is terminated. The SIGINT signal is sent to whatever process is currently attached to the keyboard when the user enters the currently-defined interrupt character (often control-C). Nevertheless, signals can be useful for some IPC situations (and the POSIX-standard signal set includes two signals, SIGUSR1 and SIGUSR2, intended for this use). They are often employed as a control channel for daemons (programs that run constantly, invisibly, in background), a way for an operator or another program to tell a daemon that it needs to either re-initialize itself, wake up to do work, or write internal-state/debugging information to a known location.
A technique often used with signal IPC is the so-called pidfile. Programs that will need to be signaled will write a small file to a known location (often in /var/run or the invoking user's home directory) containing their process ID or PID. Other programs can read that file to discover that PID. The pidfile may also function as an implicit lock file in cases where no more than one instance of the daemon should be running simultaneously.
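A minimal sketch of the startup side in C (the path is illustrative; real daemons usually construct it from their own name or take it from configuration):

    #include <stdio.h>
    #include <unistd.h>

    /* Illustrative location; real daemons usually use
       /var/run/<name>.pid or a per-user equivalent. */
    #define PIDFILE "/var/run/mydaemon.pid"

    int write_pidfile(void)
    {
        FILE *fp = fopen(PIDFILE, "w");
        if (fp == NULL)
            return -1;
        fprintf(fp, "%ld\n", (long)getpid());
        fclose(fp);
        return 0;
    }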
There are actually two different flavors of signals. In the older implementations (notably V7, System III, and early System V), the handler for a given signal is reset to the default for that signal whenever the handler fires. The result of sending two of the same signal in quick succession is therefore usually to kill the process, no matter what handler was set.
The BSD 4.x versions of Unix changed to “reliable” signals, which do not reset unless the user explicitly requests it. They also introduced primitives to block or temporarily suspend processing of a given set of signals. Modern Unixes support both styles. You should use the BSD-style non-resetting entry points for new code, but program defensively in case your code is ever ported to an implementation that does not support them.
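As an illustration, here is a minimal sketch of the non-resetting style using sigaction(2); the choice of SIGUSR1 and the flag-setting handler are illustrative, but setting a flag of type sig_atomic_t is about the only thing a handler can portably do safely:

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t wakeup = 0;

    static void on_sigusr1(int signo)
    {
        /* Only async-signal-safe operations belong in a handler;
           setting a sig_atomic_t flag is the classic safe pattern. */
        (void)signo;
        wakeup = 1;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sigusr1;
        sigemptyset(&sa.sa_mask);  /* block no extra signals in the handler */
        sa.sa_flags = 0;           /* no SA_RESETHAND: the handler persists */
        if (sigaction(SIGUSR1, &sa, NULL) == -1)
            return 1;

        for (;;) {
            pause();               /* sleep until some signal arrives */
            if (wakeup) {
                wakeup = 0;
                /* ... do whatever work the signal requested ... */
            }
        }
    }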
Receiving N signals does not necessarily invoke the signal handler N times. Under the older System V signal model, two or more signals spaced very closely together (that is, within a single time-slice of the target process) can result in various race conditions[56] or anomalies. Depending on which variant of signal semantics the system supports, the second and later instances may be ignored, may cause an unexpected process kill, or may have their delivery delayed until earlier instances have been processed (on modern Unixes the last is most likely).
The modern signals API is portable across all recent Unix versions, but not to Windows or classic (pre-OS X) MacOS.
Many well-known system daemons accept SIGHUP (originally the signal sent to programs on a serial-line drop, such as was produced by hanging up a modem connection) as a signal to re-initialize (that is, reload their configuration files); examples include Apache and the Linux implementations of bootpd(8), gated(8), inetd(8), mountd(8), named(8), nfsd(8) and ypbind(8). In a few cases, SIGHUP is accepted in its original sense of a session-shutdown signal (notably in Linux pppd(8)), but that role nowadays generally goes to SIGTERM.
SIGTERM (‘terminate’) is often accepted as a graceful-shutdown signal (as distinct from SIGKILL, which does an immediate process kill and cannot be blocked or handled). SIGTERM actions often involve cleaning up temp files, flushing final updates out to databases, and the like.
When writing daemons, follow the Rule of Least Surprise: use these conventions, and read the manual pages to look for existing models.
The fetchmail utility is normally set up to run as a daemon in background, periodically collecting mail from all remote sites defined in its run-control file and passing the mail to the local SMTP listener on port 25 without user intervention. Fetchmail sleeps for a user-defined interval (defaulting to 15 minutes) between collection attempts, so as to avoid constantly loading the network.
When you invoke fetchmail with no arguments, it checks to see if you have a fetchmail daemon already running (it does this by looking for a pidfile). If no daemon is running, fetchmail starts up normally using whatever control information has been specified in its run-control file. If a daemon is running, on the other hand, the new fetchmail instance just signals the old one to wake up and collect mail immediately; then the new instance terminates. In addition, fetchmail -q sends a termination signal to any running fetchmail daemon.
Thus, typing fetchmail means, in effect, “poll now and leave a daemon running to poll later; don't bother me with the detail of whether a daemon was already running or not.” Observe that the detail of which particular signals are used for wakeup and termination is something the user doesn't have to know.
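The logic is roughly like the following sketch; this is not fetchmail's actual code, and the pidfile path and the choice of SIGUSR1 as the wakeup signal are illustrative:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>

    /* Illustrative; fetchmail actually computes this per-user. */
    #define PIDFILE "/var/run/mydaemon.pid"

    int main(void)
    {
        FILE *fp = fopen(PIDFILE, "r");
        if (fp != NULL) {
            long pid;
            int ok = (fscanf(fp, "%ld", &pid) == 1);
            fclose(fp);
            /* A daemon is (probably) running: tell it to poll now. */
            if (ok && kill((pid_t)pid, SIGUSR1) == 0)
                return 0;   /* the old daemon does the work; we exit */
        }
        /* No running daemon found: start up normally. */
        /* ... daemon initialization would go here ... */
        return 0;
    }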
Sockets were developed in the BSD lineage of Unix as a way to encapsulate access to data networks. Two programs communicating over a socket typically see a bidirectional byte stream (there are other socket modes and transmission methods, but they are of only minor importance). The byte stream is both sequenced (that is, even single bytes will be received in the same order sent) and reliable (socket users are guaranteed that the underlying network will do error detection and retry to ensure delivery). Socket descriptors, once obtained, behave essentially like file descriptors.
At the time a socket is created, you specify a protocol family which tells the network layer how the name of the socket is interpreted. Sockets are usually thought of in connection with the Internet, as a way of passing data between programs running on different hosts; this is the AF_INET socket family, in which addresses are interpreted as host-address and service-number pairs. However, the AF_UNIX (aka AF_LOCAL) protocol family supports the same socket abstraction for communication between two processes on the same machine (names are interpreted as the locations of special files analogous to bidirectional named pipes). As an example, client programs and servers using the X window system typically use AF_LOCAL sockets to communicate.
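A minimal client-side sketch in C, assuming a server has already bound an AF_UNIX socket at an agreed-on path (the path here is illustrative):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    int main(void)
    {
        /* Illustrative rendezvous point; the server creates it with bind(2). */
        const char *path = "/tmp/myservice.sock";
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd == -1) {
            perror("socket");
            return 1;
        }

        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
            perror("connect");
            return 1;
        }

        /* From here on the descriptor acts like a bidirectional file. */
        (void)write(fd, "hello\n", 6);
        close(fd);
        return 0;
    }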
All modern Unixes support BSD-style sockets, and as a matter of design they are usually the right thing to use for bidirectional IPC no matter where your cooperating processes are located. Performance pressure may push you to use shared memory or tempfiles or other techniques that make stronger locality assumptions, but under modern conditions it is best to assume that your code will need to be scaled up to distributed operation. More importantly, those locality assumptions may mean that portions of your system get chummier with each other's internals than ought to be the case in a good design. The separation of address spaces that sockets enforce is a feature, not a bug.
To use sockets gracefully, in the Unix tradition, start by designing an application protocol for use between them — a set of requests and responses which expresses the semantics of what your programs will be communicating about in a succinct way. We've already discussed the design of application protocols in Chapter 5 (Textuality).
Sockets are supported in all recent Unixes, under Windows, and under classic MacOS as well.
Whereas two processes using sockets to communicate may live on different machines (and, in fact, be separated by an Internet connection spanning half the globe), shared memory requires producers and consumers to be co-resident on the same hardware. But, if your communicating processes can get access to the same physical memory, shared memory will be the fastest way to pass information between them.
Shared memory may be disguised under different APIs, but on modern Unixes the implementation normally depends on the use of mmap(2) to map files into memory that can be shared between processes. POSIX defines a shm_open(3) facility with an API that supports using files as shared memory; this is mostly a hint to the operating system that it need not flush the pseudo-file data to disk.
Because access to shared memory is not automatically serialized by a discipline resembling read and write calls, programs doing the sharing have to handle contention and deadlock issues themselves, typically by using semaphore variables located in the shared segment. The issues here resemble those in multithreading (see the end of this chapter for discussion), but are more manageable because they're better contained, since the default is not to share memory.
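A minimal sketch of the POSIX style, assuming a system with shm_open(3) and process-shared semaphores (the segment name and layout are illustrative; a real producer/consumer pair would also agree on who calls sem_init and who cleans up with shm_unlink):

    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Illustrative layout of the shared segment: a semaphore to
       serialize access, plus the data being shared. */
    struct shared {
        sem_t lock;
        char message[128];
    };

    int main(void)
    {
        /* shm_open(3) names live in their own namespace; "/myseg"
           is illustrative. */
        int fd = shm_open("/myseg", O_CREAT | O_RDWR, 0600);
        if (fd == -1 || ftruncate(fd, sizeof(struct shared)) == -1) {
            perror("shm_open/ftruncate");
            return 1;
        }

        struct shared *seg = mmap(NULL, sizeof(struct shared),
                                  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (seg == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* The creator initializes the semaphore exactly once; the
           second argument of 1 makes it shared between processes. */
        sem_init(&seg->lock, 1, 1);

        sem_wait(&seg->lock);   /* enter the critical section */
        strcpy(seg->message, "hello from the producer");
        sem_post(&seg->lock);   /* leave the critical section */
        return 0;
    }

On Linux this typically links with -pthread (older glibc also wants -lrt for shm_open).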
On systems where it is available and reliable, the Apache webserver's scoreboard facility uses shared memory for communication between an Apache master process and the pool of Apache images that it manages to handle connections. Modern X implementations also use shared memory, to pass large images between client and server when they are resident on the same machine, in order to avoid the overhead of socket communication. Both uses are performance hacks justified by experience and testing, rather than architectural choices.
The mmap(2) call is supported under all modern Unixes, including Linux and the open-source BSD versions; this is described in the Single Unix Specification. It will not normally be available under Windows, MacOS classic, and other operating systems.
After Version 7 and the split between the BSD and System V lineages, the evolution of Unix inter-process communication took two different directions. The BSD direction led to sockets. The AT&T line, on the other hand, developed named pipes (as previously discussed) and an IPC facility specifically designed for passing binary data, based on shared-memory bidirectional message queues. This is called ‘System V IPC’ — or, among old-timers, ‘Indian Hill’ IPC after the AT&T facility where it was first written.
The upper, message-passing layer of System V IPC has largely fallen out of use. The lower layer, which consists of shared memory and semaphores, still has significant applications in circumstances where one needs to do mutual-exclusion locking and some global data sharing among processes running on the same machine. By using the shared-memory and semaphore facilities (shmget(2), semget(2), and friends) one can avoid the overhead of copying data through the network stack.
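A sketch of the System V style using shmget(2) and shmat(2); the key value is illustrative (real code usually derives one with ftok(3)), and the mutual-exclusion locking one would do with semget(2)/semop(2) is omitted for brevity:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        /* The key is the rendezvous point; cooperating processes
           that use the same key get the same segment. */
        int id = shmget((key_t)0x1234, 4096, IPC_CREAT | 0600);
        if (id == -1) {
            perror("shmget");
            return 1;
        }

        char *mem = shmat(id, NULL, 0);   /* map segment into our space */
        if (mem == (char *)-1) {
            perror("shmat");
            return 1;
        }

        strcpy(mem, "data visible to every attached process");
        shmdt(mem);                       /* detach; the segment persists */
        return 0;
    }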
Large commercial databases (including Oracle, DB2, Sybase, and Informix) use this technique heavily. As of 2003 it is supported under Linux and the BSDs, but not under Windows or classic MacOS.
Unix (born 1969) long predates TCP/IP (born 1980) and the ubiquitous networking of the 1990s and later. Anonymous pipes, redirection, and shellout have been in Unix since very early days, but the history of Unix is littered with the corpses of APIs tied to obsolescent IPC and networking models, beginning with the mx() facility that appeared in Version 6 (1976) and was dropped before Version 7 (1979).
Eventually BSD sockets won out as IPC was unified with networking. But this didn't happen until after fifteen years of experimentation that left a number of relics behind. It's useful to know about these because there are likely to be references to them in your Unix documentation that might give the misleading impression that they're still in use. These obsolete methods are described in more detail in Unix Network Programming [Stevens90].
The real explanation for all the dead IPC facilities in old AT&T Unixes was politics. The Unix Support Group was headed by a low-level manager, while some projects that used Unix were headed by vice presidents. They had ways to make irresistible requests, and would not brook the objection that most IPC mechanisms are interchangeable.
--Doug McIlroy
These are message-passing facilities based on the System V shared memory facility we described earlier.
Programs which cooperate using System V IPC usually define shared protocols based on exchanging short (up to 8K) binary messages. The relevant manual pages are msgctl(2) and friends. As this style has been largely superseded by text protocols passed between sockets, we shall not give an example here.
The System V IPC facilities are present in Linux and other modern Unixes. However, as they are a legacy feature, they are not exercised very often. The Linux version is still known to have bugs as of mid-2003. Nobody seems to care enough to fix them.
Streams networking was invented for Unix Version 8 (1985) by Dennis Ritchie, and first became available in the 3.0 release of System V Unix (1986). The streams facility provided a full-duplex interface (functionally not unlike a BSD socket and, like sockets, accessed through normal read(2) and write(2) operations after initial setup) between a user process and a specified device driver in the kernel. The device driver might be hardware such as a serial or network card, or it might be a software-only pseudo-device set up to pass data between user processes.
An interesting feature of streams is that it is possible to push protocol-translation modules into the kernel's processing path, so that the device the user process ‘sees’ through the full-duplex channel is actually filtered. This could be used, for example, to implement a line-editing protocol for a terminal device. Or one can implement protocols such as IP or TCP without wiring them directly into the kernel.
Streams didn't take over the world because sockets and TCP/IP did. Streams began as a research exercise apparently stimulated by the now-dead OSI 7-layer networking model; as TCP/IP drove out other protocol stacks and migrated into Unix kernels, the extra flexibility provided by streams had less and less utility. In 2003, System V Unix still supports streams, as do some System V/BSD hybrids such as Digital Unix and Solaris.
Linux and other open-source Unixes have effectively discarded streams. Linux kernel modules and libraries are available from the LiS project, but (as of mid-2003) are not integrated into the stock Linux kernel and have significant known bugs. They will not be supported under non-Unix operating systems.
[55] A particularly nasty variant of this attack is to drop a named Unix-domain socket where the producer and consumer programs are expecting the tempfile to be.
[56] A ‘race condition’ is a class of problem in which correct behavior of the system relies on two independent events happening in the right order, but there is no mechanism for ensuring that they actually will. Race conditions produce intermittent, timing-dependent problems that can be devilishly difficult to debug.