The Unix GUI stack obeys Unix design principles by being constructed as a set of layers or modules connected by documented APIs, and designed so that each piece can be swapped out for alternate implementations. This gives the whole system a great degree of flexibility and robustness, though at the cost of making it difficult to guarantee a consistent user experience.
The base of the stack is, of course, hardware. In 2004, Unix systems typically feature the following devices relevant to GUI programming: a graphics card driving a monitor or flatscreen display, a keyboard, a mouse or other pointing device, and a sound card.
Improvements in this hardware are still happening at the time of writing, but the pace is slowing as the enabling technologies mature and the cost of each increment in capability rises. High-end graphics cards are approaching the point at which they will be able to refresh the display at higher speeds and resolutions than the human eye can follow; we can expect that to happen around 2006-2007. The changeover from electron-gun monitors to liquid-crystal flatscreens is actually slowing the rise in screen resolutions, as large LCD panels are even more prone to manufacturing defects than CRT shadow masks. Mice and keyboards are stable, old, and thoroughly commoditized technology; the largest recent change there has been a revival of trackballs in elegant thumb-operated versions that have better ergonomics than mice but are functionally identical to them.
Sitting atop the hardware will be an X server. The X server's job is to manage access to all the underlying hardware listed above, except for the sound card, which is handled by a separate mechanism that we'll describe further on.
The X server accepts TCP/IP connections from GUI applications. Those applications will read that connection to get input events, which include keyboard presses and releases, mouse-button presses and releases, and mouse motion notifications. The applications will write requests to the TCP/IP connection for the server to change pixels on the display in various ways. There is a standard X protocol specifying how these requests and events are structured.
Each X application makes calls to a linked copy of a service library that handles the application end of the X protocol connection to the server. The raw API of this library is notoriously complex and provides only low-level services, so the application typically calls it through a second library called a toolkit.
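To give a sense of how low-level those services are, here is a minimal sketch written directly against Xlib, the standard C client library for X: it opens the connection to the server named by the DISPLAY environment variable and asks a few questions about it. Even this trivial exchange of requests and replies has to be managed explicitly.

    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        /* Connect to the server named by DISPLAY, e.g. "localhost:0"
           or "remotehost:0" for a TCP/IP connection to another machine. */
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            exit(1);
        }

        int screen = DefaultScreen(dpy);

        /* These queries are answered from data exchanged with the server
           when the connection was set up. */
        printf("vendor:   %s\n", ServerVendor(dpy));
        printf("protocol: X%d revision %d\n",
               ProtocolVersion(dpy), ProtocolRevision(dpy));
        printf("screen:   %d x %d pixels\n",
               DisplayWidth(dpy, screen), DisplayHeight(dpy, screen));

        XCloseDisplay(dpy);
        return 0;
    }

Built with cc -o xinfo xinfo.c -lX11 (the file name is arbitrary), it prints the server's vendor string, protocol version, and root screen dimensions.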
The function of the toolkit library is to provide higher-level services and the basics of an interface policy. There are several toolkit libraries, which we'll survey later in this chapter. The services each one provides are typically organized as a collection of widgets such as buttons, scrollbars, and canvas areas that can be drawn on. An interface policy is a set of rules (or at least defaults) about how the widgets behave — a “look and feel”.
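To see what the toolkit layer buys the programmer, compare the Xlib sketch above with the following sketch of a one-button application written against a toolkit. GTK+ is assumed here purely for illustration; the toolkits surveyed later offer broadly similar facilities. The button widget, its appearance, and the machinery that routes clicks to the callback all come from the toolkit.

    #include <gtk/gtk.h>

    /* Called by the toolkit whenever the button is clicked. */
    static void on_click(GtkWidget *widget, gpointer data)
    {
        g_print("button pressed\n");
    }

    int main(int argc, char *argv[])
    {
        gtk_init(&argc, &argv);    /* opens the X connection behind the scenes */

        GtkWidget *window = gtk_window_new(GTK_WINDOW_TOPLEVEL);
        GtkWidget *button = gtk_button_new_with_label("Press me");

        g_signal_connect(button, "clicked", G_CALLBACK(on_click), NULL);
        g_signal_connect(window, "destroy", G_CALLBACK(gtk_main_quit), NULL);

        gtk_container_add(GTK_CONTAINER(window), button);
        gtk_widget_show_all(window);

        gtk_main();                /* the toolkit runs the event loop for us */
        return 0;
    }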
Toolkit libraries are written in C. If the application is as well, it will call the toolkit directly. In the increasingly common case that the application is written in a scripting language such as Tcl, Perl, or Python, the application will call the toolkit library through a language binding that translates between the language's native data objects and the C data types understood by the toolkit.
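As a rough sketch of what that glue looks like, here is a single hand-written binding function for a small Python extension module; the module name minitoolkit and the function name button_new are invented for illustration, GTK+ is again assumed as the toolkit, and real bindings are normally machine-generated and manage object lifetimes far more carefully.

    #include <Python.h>
    #include <gtk/gtk.h>

    /* Glue: convert the Python string argument into the C string the
       toolkit expects, make the toolkit call, and hand the resulting
       C pointer back to Python as an opaque object. */
    static PyObject *py_button_new(PyObject *self, PyObject *args)
    {
        const char *label;

        if (!PyArg_ParseTuple(args, "s", &label))   /* Python str -> char * */
            return NULL;

        GtkWidget *button = gtk_button_new_with_label(label);

        return PyCapsule_New(button, "GtkWidget", NULL);
    }

    static PyMethodDef methods[] = {
        {"button_new", py_button_new, METH_VARARGS, "Create a button widget."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "minitoolkit", NULL, -1, methods
    };

    PyMODINIT_FUNC PyInit_minitoolkit(void)
    {
        return PyModule_Create(&moduledef);
    }

From the scripting side the call is then just minitoolkit.button_new("Press me"), with the binding doing the type translation on every crossing of the language boundary.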
The application itself will, after initial setup, go into an event loop. The loop will accept input event notifications from the server, dispatch the input events to be handled by application logic, and ship requests for screen updates back to the server.
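Written directly against Xlib (a toolkit would normally run this loop on the application's behalf), a minimal sketch of such a loop looks like this: the client tells the server which events it wants, then repeatedly reads the next event, dispatches on its type, and sends back drawing requests.

    #include <stdio.h>
    #include <stdlib.h>
    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            exit(1);
        }

        int screen = DefaultScreen(dpy);
        Window win = XCreateSimpleWindow(dpy, RootWindow(dpy, screen),
                                         10, 10, 200, 100, 1,
                                         BlackPixel(dpy, screen),
                                         WhitePixel(dpy, screen));

        /* Tell the server which event types this client wants to see. */
        XSelectInput(dpy, win, ExposureMask | KeyPressMask | ButtonPressMask);
        XMapWindow(dpy, win);

        /* The event loop: read events from the server, dispatch them,
           and send back requests that update the screen. */
        XEvent ev;
        for (;;) {
            XNextEvent(dpy, &ev);        /* blocks until an event arrives */
            switch (ev.type) {
            case Expose:                 /* window needs repainting */
                XDrawString(dpy, win, DefaultGC(dpy, screen),
                            20, 50, "hello, world", 12);
                break;
            case ButtonPress:            /* hand off to application logic */
                printf("button %u pressed\n", ev.xbutton.button);
                break;
            case KeyPress:               /* any key ends the program */
                XCloseDisplay(dpy);
                return 0;
            }
        }
    }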
This layered organization contrasts sharply with the way things are done on non-Unix systems. Elsewhere, the graphics engine and toolkit layer are combined into a single service, and the GUI environment layer is at best semi-separable from either. This has two practical consequences. First, GUI applications must run on the same machine that hosts their display. Second, attempts to change the interface look-and-feel risk destabilizing the graphics engine code.