Problems in the Design of Unix

Plan 9 cleans up Unix, but only really adds one new concept (private namespaces) to its basic set of design ideas. But are there serious problems with those basic design ideas? In Chapter 1 we touched on several issues that Unix arguably got wrong. Now that the open-source movement has put the design future of Unix back in the hands of programmers and technical people, these are no longer decisions we have to live with forever. We'll reexamine them in order to get a better handle on how Unix might evolve in the future.

A Unix file is just a big bag of bytes, with no other attributes. In particular, there is no capability to store information about the file type or a pointer to an associated application program outside the file's actual data.

More generally, everything is a byte stream; even hardware devices are byte streams. This metaphor was a tremendous success of early Unix, and a real advance over a world in which (for example) compiled programs could not produce output that could be fed back to the compiler. Pipes and shell programming sprang from this metaphor.

But Unix's byte-stream metaphor is so central that Unix has trouble integrating software objects with operations that don't fit neatly into the byte stream or file repertoire of operations (create, open, read, write, delete). This is especially a problem for GUI objects such as icons, windows, and ‘live’ documents. Within a classical Unix model of the world, the only way to extend the everything-is-a-byte-stream metaphor is through ioctl calls, a notoriously ugly collection of back doors into kernel space.

Fans of the Macintosh family of operating systems tend to be vociferous about this. They advocate a model in which a single filename may have both data and resource ‘forks’, the data fork corresponding to the Unix byte stream and the resource fork being a collection of name/value pairs. Unix partisans prefer approaches that make file data self-describing so that effectively the same sort of metadata is stored within the file.

The problem with the Unix approach is that every program that writes the file has to know about it. Thus, for example, if we want the file to carry type information inside it, every tool that touches it has to take care to either preserve the type field unaltered or interpret and then rewrite it. While this would be theoretically possible to arrange, in practice it would be far too fragile.

On the other hand, supporting file attributes raises awkward questions about which file operations should preserve them. It's clear that a copy of a named file to another name should copy the source file's attributes as well as its data — but suppose we cat(1) the file, redirecting the output of cat(1) to a new name?

The answer to this question depends on whether the attributes are actually properties of filenames or are in some magical way bundled with the file's data as a sort of invisible preamble or postamble. Then the question becomes: Which operations make the properties visible?

Xerox PARC file-system designs grappled with this problem as far back as the 1970s. They had an ‘open serialized’ call that returned a byte stream containing both attributes and content. If applied to a directory, it returned a serialization of the directory's attributes plus the serialization of all the files in it. It is not clear that this approach has ever been bettered.

Linux 2.5 already supports attaching arbitrary name/value pairs as properties of a filename, but at time of writing this capability is not yet much used by applications. Recent versions of Solaris have a roughly equivalent feature.

The Unix experience proves that using a handful of metaphors as the basis for a framework is a powerful strategy (recall the discussion of frameworks and shared context in Chapter 13). The visual metaphor at the heart of modern GUIs (files represented by icons, and opened by clicking which invokes some designated handler program, typically able to create and edit these files) has proven both successful and long-lived, exerting a strong hold on users and interface designers ever since Xerox PARC pioneered it in the 1970s.

Despite considerable recent effort, in 2003 Unix still supports this metaphor only poorly and grudgingly — there are lots of layers, few conventions, and only weak construction utilities. A typical reaction from a Unix old hand is to suspect that this reflects deeper problems with the GUI metaphor itself.

 

I think part of the problem is that we still don't have the metaphor right. For example, on the Mac I drag a file to the trashcan to delete it, but when I drag it to a disc it gets copied, and can't drag it to a printer icon to print it because that's done through the menus. I could go on and on. It's like files were in OS/360, before Unix came along with its simple (but not too simple), file idea.

 
-- Steve Johnson  

We quoted Brian Kernighan and Mike Lesk to similar effect in Chapter 11. But the inquiry can't stop with indicting the GUI, because with all its flaws there is tremendous demand for GUIs from end users. Supposing we could get the metaphor right at the level of the design of user interactions, would Unix be capable of supporting it gracefully?

The answer is: probably not. We touched on this problem in considering whether the bag-of-bytes model is adequate. Macintosh-style file attributes may help provide the mechanism for richer support of GUIs, but it seems very unlikely that they are the whole answer. Unix's object model doesn't include the right fundamental constructs. We need to think through what a really strong framework for GUIs would be like — and, just as importantly, how it can be integrated with the existing frameworks of Unix. This is a hard problem, demanding fundamental insights that have yet to emerge from the noise and confusion of ordinary software engineering or academic research.

People with VMS experience, or who remember TOPS-20 often miss these systems' file-versioning facilities. Opening an existing file for write or deleting it actually renamed it in a predictable way including a version number; only an explicit removal operation on a version file actually erased data.

Unix does without this, at a not inconsiderable cost in user irritation when the wrong files get deleted through a typo or unexpected effects of shell wildcarding.

There does not seem to be any foreseeable prospect that this will change at the operating system level. Unix developers like clear, simple operations that do what the user tells them to do, even if the user's instructions could amount to commanding “shoot me in the foot”. Their instinct is to say that protecting the user from himself should be done at the GUI or application level, not in the operating system.

Unix has, in one sense, a very static model of the world. Programs are implicitly assumed to run only briefly, so the background of files and directories can be assumed static during their execution. There is no standard, well-established way to ask the system to notify an application if and when a specified file or directory changes. This becomes a significant issue when writing long-lived user-interface software which wants to know about changes to the background.

Linux has file- and directory-change notification features,[156] and some versions of BSD have copied them, but these are not yet portable to other Unixes.

Apart from the ability to suspend processes (in itself a trivial addition to the scheduler which could be made fairly inoffensive) what job control is about is switching a terminal among multiple processes. Unfortunately, it does the easiest part — deciding where keystrokes go — and punts all the hard parts, like saving and restoring the state of the screen, to the application.

A really good implementation of such a facility would be completely invisible to user processes: no dedicated signals, no need to save and restore terminal modes, no need for the applications to redraw the screen at random times. The model ought to be a virtual keyboard that is sometimes connected to the real one (and blocks you if you ask for input when it isn't connected) and a virtual screen which is sometimes visible on the real one (and might or might not block on output when it's not), with the system doing the multiplexing in the same way it multiplexes access to the disk, the processor, etc... and no impact on user programs at all.[157]

Doing it right would have required the Unix tty driver to track the entire current screen state rather than just maintaining a line buffer, and to know about terminal types at kernel level (possibly with help from a daemon process) so it could do restores properly when a suspended process is foregrounded again. A consequence of doing it wrong is that the Unix kernel can't detach a session, such as an xterm or Emacs job, from one terminal and re-attach it to another (which could be of a different type).

As Unix usage has shifted to X displays and terminal emulators, job control has become relatively less important, and this issue does not have quite the force it once did. It is still annoying that there is no suspend/attach/detach, however; this feature could be useful for saving the state of terminal sessions between logins.

A common open-source program called screen(1) solves several of these problems.[158] However, since it has to be called explicitly by the user, its facilities are not guaranteed to be present in every terminal session; also, the kernel-level code that overlaps with it in function has not been removed.

C lacks a facility for throwing named exceptions with attached data.[159] Thus, the C functions in the Unix API indicate errors by returning a distinguished value (usually −1 or a NULL character pointer) and setting a global errno variable.

In retrospect, this is the source of many subtle errors. Programmers in a hurry often neglect to check return values. Because no exception is thrown, the Rule of Repair is violated; program flow continues until the error condition manifests as some kind of failure or data corruption later in execution.

The absence of exceptions also means that some tasks which ought to be simple idioms — like aborting from a signal handler on a version with Berkeley-style signals — have to be performed with code that is complex, subject to portability glitches, and bug-prone.

This problem can be (and normally is) hidden by bindings of the Unix API in languages such as Python or Java that have exceptions.

The lack of exceptions is actually an indicator of a problem with larger immediate implications; C's weak type ontology makes communication between higher-level languages implemented in it problematic. Most of the more modern languages, for example, have lists and dictionaries as primary data types — but, because these don't have any canonical representation in the universe of C, attempting to pass lists between (say) Perl and Python is an unnatural act requiring a lot of glue.

There are technologies that address the larger problem, such as CORBA, but they tend to involve a lot of runtime translation and be unpleasantly heavyweight.

The ioctl(2) and fcntl(2) mechanisms provide a way to write hooks into a device driver. The original, historical use of ioctl(2) was to set parameters like baud rate and number of framing bits in a serial-communications driver, thus the name (for ‘I/O control’). Later, ioctl calls were added for other driver functions, and fcntl(2) was added as a hook into the file system.

Over the years, ioctl and fcntl calls have proliferated. They are often poorly documented, and often a source of portability problems as well. With each one comes a grubby pile of macro definitions describing operation types and special argument values.

The underlying problem is the same as ‘big bag of bytes’; Unix's object model is weak, leaving no natural places to put many auxiliary operations. Designers have an untidy choice among unsatisfactory alternatives; fcntl/ioctl going through devices in /dev, new special-purpose system calls, or hooks through special-purpose virtual file systems that hook into the kernel (e.g. /proc under Linux and elsewhere).

It is not clear whether or how Unix's object model will be enriched in the future. If MacOS-like file attributes become a common feature of Unix, tweaking magic named attributes on device drivers may take over the role ioctl/fcntl now have (this would at least have the merit of not requiring piles of macro definitions before the interface could be used). We've already seen that Plan 9, which uses the named file server or file system as its basic object, rather than the file/bytestream, presents another possible path.

Perhaps root is too powerful, and Unix should have finer-grained capabilities or ACLs (Access Control Lists) for system-administration functions, rather than one superuser that can do anything. People who take this position argue that too many system programs have permanent root privileges through the set-user-ID mechanism; if even one can be compromised, intrusions everywhere will follow.

This argument is weak, however. Modern Unixes allow any given user account to belong to multiple security groups. Through use of the execute-permission and set-group-ID bits on program executables, each group can in effect function as an ACL for files or programs.

This theoretical possibility is very little used, however, suggesting that the demand for ACLs is much less in practice than it is in theory.

Unix unified files and local devices — they're all just byte streams. But network devices accessed through sockets have different semantics in a different namespace. Plan 9 demonstrates that files can be smoothly unified with both local and remote (network) devices, and all of these things can be managed through a namespace that is dynamically adjustable per-user and even per-program.

Was having a file system at all the wrong thing? Since the late 1970s there has been an intriguing history of research into persistent object stores and operating systems that don't have a shared global file system at all, but rather treat disk storage as a huge swap area and do everything through virtualized object pointers.

Modern efforts in this line (such as EROS[160]) hint that such designs can offer large benefits including both provable conformance to a security policy and higher performance. It must be noted, however, that if this is a failure of Unix, it is equally a failure of all of its competitors; no major production operating system has yet followed EROS's lead.[161]

Perhaps URLs don't go far enough. We'll leave the last word on possible future directions of Unix to Unix's inventor:

 

My ideal for the future is to develop a file system remote interface (a la Plan 9) and then have it implemented across the Internet as the standard rather than HTML. That would be ultimate cool.

 
-- Ken Thompson  


[156] Look for F_NOTIFY under fcntl(2).

[157] This paragraph is based on a 1984 analysis by Henry Spencer. He went on to note that job control was necessary and appropriate for POSIX.1 and later Unix standards to consider precisely because it oozes its way into every program, and hence has to be thought about in any application-to-system interface. Hence, POSIX's endorsement of a mis-design, while proper solutions were “out of scope” and hence were not even considered.

[158] There is a project site for screen(1) at http://www.math.fu-berlin.de/~guckes/screen/.

[159] For nonprogrammers, throwing an exception is a way for a program to bail out in the middle of a procedure. It's not quite an exit because the throw can be intercepted by catcher code in an enclosing procedure. Exceptions are normally used to signal errors or unexpected conditions that mean it would be pointless to try to continue normal processing.

[161] The operating systems of the Apple Newton, the AS/400 minicomputer and the Palm handheld could be considered exceptions.