About This Manual
This COMIT manual is derived from the text of the book "Programming with COMIT II" by Victor H. Yngwe, MIT Press 1972. Book material is reproduced here for educational and research purposes; all copyrights remain with publisher and/or the author’s heirs (he died in 2012). This text is not for sale and should be distributed with the open-source COMIT interpreter only.
Because the purpose of this manual is strictly to document the COMIT II language, various scholarly apparatus has been jettisoned, including an introductory chapter on program decomposition and a bibliography. Material that wandered too far afield into linguistics or the mechanics of a programming-language course in 1972 has also been omitted. So have a few simple flowcharts, fashionable accessories that were taken to signify an air of seriousness in a programming text of that time but conveyed little information even then.
In a few notes that refer specifically to the implementation this manual ships with, it is referred to as "COMIT Revived".)
Historical Context
COMIT, dating from 1957, was the very first computer language designed for string processing. It was originally designed for early work in automated translation of natural languages. In this role it (and other similar and slightly later efforts such as TRAC) were, alas, largely failures; neither the naive text-transformation theory of the time nor the computing power available was adequate to the job. The first really high-quality computer translations would not become public until nearly fifty years later, and use statistical fuzzy-matching methods very different in style from COMIT’s.
Nevertheless, COMIT cast a long shadow. Text transformation turned out to be a valuable tool for domains less messy than natural languages. The first really successful language in this class, SNOBOL, inherited some central concepts and terminology from COMIT. Through SNOBOL these would propagate into the ed(1) and sed(1) utilities; this is why older versions of sed documentation used the term "workspace" for what is now often called "pattern space". What passes for control flow in sed(1) - fall through or branch to label - is also inherited from COMIT. Another relic of COMIT is the use of $1 $2 … $n as magic string variables.
Accordingly, every Unix scripting language from shell through early Awk and Perl to Python and onward owes a significant debt to COMIT.
THE BASIC NOTIONS OF COMIT
2.1 COMIT Data Structure
§2.1.0.0 Data as it is submitted to a COMIT program is in the form of strings of characters just as they are punched on cards or typed on the typewriter. The output that a COMIT program produces is also in the form of strings of characters, which are printed, punched, or stored for further use. The function of the COMIT program is to operate on strings of characters. It can perform grouping, rearranging, inserting, deleting, sorting, testing, tagging, counting, and so forth. To facilitate these manipulations in a computer, characters are organized into groups called constituents. For example, if the input data consists of text punched on cards, the data can be brought into the computer simply as a string of characters. But it might be more convenient for some purposes to organize the characters of the text into constituents representing individual words. In the COMIT language, the boundaries between constituents are represented by plus signs. For example, if the following words were punched on a card and presented to a COMIT program as data,
THIS IS DATA.
they could be read in and grouped one character per constituent for further processing:
T + H + I + S + - + I + S + - + D + A + T + A + . + *.
where the hyphens represent spaces and the constituent *. is an end-of-line symbol. It could equally well be brought in with the letters compressed into one word per constituent:
THIS + IS + DATA + . + *.
or with spaces not deleted:
THIS + - + IS + - + DATA + . +.*.
§2.1.0.1 When data is read into the computer by a COMIT program, it enters an area of storage called the workspace. The COMIT workspace is that area of storage in which most of the data manipulations take place. Figure 1 shows how the workspace occupies a central position with respect to data flow. Data can flow into the workspace from the input and from the program, out from the workspace to the output, and back and forth between the workspace and the auxiliary storage areas called the dispatcher and the shelves. The data in the dispatcher is
normally used to specify which branches the flow of control is to take, and will be described in more detail in later chapters. The shelves are mainly used to hold data temporarily while it is not being worked on. Shelf zero also functions to hold the automatic subroutine return pushdown. The flow of data between the workspace and the other areas of storage is under the control of the COMIT program that has previously been read into the computer. It is the program, too, that specifies the various manipulations that are to be carried out on the data in the workspace.
2.2 Form of COMIT Program
§2.2.0.0 The main part of the COMIT program consists of instructions for moving and manipulating the data and for directing the flow of control. These instructions are called rules. In other computer languages, an instruction may be called a statement. Normally, COMIT rules are punched one to a card, and are submitted to the computer between a special title card and an end card. When the program is being executed by the computer, the rules are executed one at a time, with the flow of control going from rule to rule according to directions written in the rules and subject to the outcome of various tests made on the data.
§2.2.0.1 A COMIT rule has five sections and can be represented schematically as follows:
name left-half = right-half // routing go-to
§2.2.1 The NAME Section The name section serves to identify the rule for the purposes of flow of control. Control can be directed to a rule from any other rule in the program by means of its rule name. If a rule does not need a name, an asterisk is written in the name section.
§2.2.2 The LEFT-HALF The left-half serves to specify the data in the workspace that is to be manipulated. It does this by a left-to-right search of the constituents in the workspace. For example, if the expression
A + B
is written in the left half, it causes the workspace to be searched from left to right for the first occurrence of two adjacent constituents A and B. If the search is successful, the two workspace constituents found are given temporary numbers 1 and 2 for later reference in the rule, and the rest of the rule is executed. But if the search is unsuccessful, there being no matching constituents in the workspace, the rule is said to fail. It does not apply, and the rest of the rule is not executed. Instead, control goes to the next rule in sequence, that is, the next rule below in the deck of cards. Thus the left-half also serves as a test that can control the flow of control and can be used, for example, to terminate a loop when there remains no more data to be operated on.
§2.2.3 The RIGHT-HALF The right-half expresses the operations that are to be carried out on the data found. For example, if the left-half search has found Vtwo constituents A and B in the workspace and has temporarily numbered them 1 and 2, the right-half expression 2 + 1, referring by means of these numbers to the workSpace constituents found, would cause the two constituents found to be interchanged, that is replaced by the second one followed by the first one.
§2.2.4 The ROUTING Section The routing section of the rule contains input and output instructions, instructions for routing the data between the workspace and the shelves, and some other miscellaneous instructions, such as instructions for compressing several workspace constituents into one or for expanding a constituent into several, one for each character.
§2.2.5 The GO-TO Section The go-to section is for flow of control. Here may be written the name of the rule to which control is to be transferred. There are several other types of go-to expressions. The most frequently used is the asterisk, which sends control to the next rule in sequence. A fraction bar (/) sends control back to the same rule again.
§2.2.6 Punctuation of the COMIT Rule A name section in a rule is required. It begins in column 1 at the extreme left of the card and is separated from the rest of the rule by one or more spaces. A go-to is required. It is the last string of nonspace characters on the card and is separated from the rest of the rule by one or more spaces. The other sections are optional except that some sections require others; a right-half requires a left-half, for example. A left-half is terminated by an equal sign and optional spaces. A routing section is introduced by two fraction bars. Additional optional spaces are allowed anywhere in the COMIT rule where a single Space is allowed or required
2.3 Manipulating Data in the Workspace
§2.3.0.0 The various parts of the rule are executed in left-to-right sequence. Under control of the left-half and the right-half, data in the workspace can be tested and manipulated. Data can be replaced, rearranged, inserted, deleted, and duplicated. In this section, some basic left-half and right-half operations will be presented. A more detailed presentation of the name, routing, and go-to sections will be deferred to later chapters.
§2.3.1 Replacement Material in the workspace can be replaced by other material. Suppose the German sentence Der Mann ist alt. (The man is old.) is represented in the workspace as
-DER + -MANN + -IST + -ALT +
and, as part of a translation program, it is desired to replace the German word -DER by the English word -THE. The rule is written as follows:
* -DER = -THE *
The left-half contains -DER, which causes the computer to search the workspace in a left-to-right search for a constituent -DER. If such a constituent is found, the rule proceeds, and the right-half expression -THE is executed, causing replacement in the workspace of -DER by -THE. But if such a constituent is not found because the workspace does not contain -DER, the search is said to fail and control goes immediately to the next rule, the rest of the rule not being executed.
§2.3.1.1 The left-half serves several functions Simultaneously. It serves to refer to the particular data in the workspace that is to be manipulated; it serves to specify a search of the workspace for a particular constituent (or constituents); and it serves to affect the flow of control by whether the search succeeds or fails. Left-half tests are the most usual tests in COMIT for loop control, and left-half failure is the most common form of exit from a loop for the reason that it automatically results when the Workspace no longer contains material appropriate for further processing.
§2.3.1.2 Thus, if the workspace contained
A+A+A
the following one-rule loop would apply three times before left-half rule failure would cause control to go to the next rule.
* A = B /
The first time through, the leftmost A would be found and replaced by a B. The second and third A would be found and replaced on the second and third time through, leaving the workspace with
B+B+B
On the fourth time through, the left-half search would fail to find an unreplaced A, causing control to go immediately to the next rule on account of rule failure, instead of control continuing through the right-half and go-to sections of the rule.
§2.3.2 Rearrangement Material in the workspace can be rearranged. Suppose that the misspelled word recieve is represented in the workspace as
R+E+C+I+E+V+E
and it is desired to correct this spelling. A possible rule is as follows:
The numbers 2 and 1 in the right-half of the rule refer to the second and first constituents mentioned in the left-half. The result is that the I and E are interchanged in the workspace. A more general rule that would mechanize part of the spelling rule i before e except after c is as follows:
* C + I + E = 1 + 3 + 2 *
§2.3.2.1 We have mentioned that one of the functions of the left-half of the rule is to refer to data. The search mechanism by which this is accomplished is as follows. Suppose the workspace contains
A + B + B + C + B + C + A
And we have a rule:
* B + C = 2 + 1 *
The search that is set up is for a constituent B in the workspace and an immediately following G. Since the search is from left to right, it will be satisfied only with the second B and the following C, that is the third and fourth constituents from the left. When the search has been satisfied, the constituents that have been found are marked temporarily for easy reference in other parts of the rule by numbers called relative constituent numbers. These numbers are assigned in numerical order from left to right to the constituents that have been found. Thus after the left-half of the above rule has been executed, the constituents found are marked as follows:
A + B + B + C + B + C + A [1] [2]
§2.3.2.2 The right-half expression then merely directs the computer to replace these workspace constituents by the constituents mentioned in the right-half in the indicated order. Constituents that have not been found by the left-half search remain unchanged. Thus our rule above would leave the workspace with the B and C reversed:
A + B + C + B + B + C + A
After the right-half has been executed, the left-half relative constituent numbers disappear. They are replaced by right-half relative constituent numbers for convenient reference in the routing. Then, when the execution of the rule has been finished, all relative constituent numbers disappear.
§2.3.3 Insertion Material may be inserted at any point in the workSpace. Suppose the workspace contains
ACE + KING + QUEEN + TEN
and it is desired to insert JACK after QUEEN. The following rule will do it:
* QUEEN = l + JACK *
After the computer has executed the rule? the workspace will contain
ACE + KING + QUEEN + JACK + TEN
There is no trouble in squeezing new constituents in between old ones.
§2.3.4 Deletion Material may be deleted from the workspace in two ways. In the first way, everything that is mentioned in the left~half is deleted by writing a zero in the right-half. As an example, suppose that the word colour is represented in the workspace as
C + O + L + O + U + R
and it is desired to Americanize the spelling by deleting the letter U. The rule is as follows:
* U = 0 *
§2.3.4.1 A rule of this type would not work properly in all cases. Suppose the word house is misspelled housue and is represented in the workspace as
H + O + U + S + U + E
Since the search is from left to right, the above rule would delete the first U and not the second one. After the execution of the rule, the workspace would contain
H + O + S + U + E
§2.3.4.2 This brings us to the second method of deletion: replace two constituents by one of them (or three by two or one of them, etc.). We have again in the workspace:
H + O + U + S + U + E
and we want to delete the U that follows the S. The rule is
* S + U = 1 *
In other words, if any operation is specified in the right-half, any constituent that is mentioned in the left-half but not in the right-half will be deleted.
§2.3.5 No Right-Half It is important to realize that if no operation is specified by the right-half, the workspace remains unchanged. A rule with no right-half might very well be used for a test. Thus the rule
* U = A *
should serve as a test that would direct control to A if the workspace contained constituent U; otherwise the rule would fail and control would go to the next rule.
§2.3.6 Duplication Material in the workspace may be duplicated. Suppose that ED been added to an English verb by an insertion rule, and, since the verb is it is necessary to double the final consonant. One has in the workspace:
P + L + A + N + E + D
The following rule will do it:
* N = 1 + 1 *
The constituent found by the left-half can be copied and inserted in the workspace as many times as necessary by merely repeating in the right-half the figure representing the constituent.
§2.4 Summary
The following left-half and right-half notations have been introduced:
* A = FOUND
Test for A. Search in the workspace for an A, and if found, go to the rule FOUND; otherwise fail and go to the next rule.
* A + B = FOUND
Test for A + B . Search in the workspace for an A followed by a B, and if found, go to the rule FOUND; otherwise fail and go to the next rule.
* A = B *
Replacement. Search in the workSpace for an A, and if found, replace it by a B and go to the next rule; otherwise fail and go to the next rule.
* A + B = 2 + 1 *
Rearrangement. Search in the wogkspace for an A followed by a B, and if found, replace them by the second followed by the first and go to the next rule; otherwise fail and go to the next rule.
* A = 1 + B *
Insertion. Search in the workspace for an A, and if found, replace it by itself followad by a B and go to the next rule; otherwise fail and go to the next rule.
* A = 0 *
Deletion. Search in the workspace for an A, and if found, delete it and go to the next rule; otherwise fail and go to the next rule.
* A + B = 2 *
Deletion. Search in the workspace for an A followed by a B, and if found, replace them by the second and go to the next rule; otherwise fail and go to the next rule.
* A = 1 + 1 *
Duplication. Search in the workspace for an A, and if found, replace it by itself followed by itself and go to the next rule; otherwise fail and go to the next rule.
-
A + B + C = 2 + D + 1 + F + 2 *
A combination. Search in the workspace for an A followed by a B followed by a C and if found, replace them in the workspace by the second followed by a D followed by the first followed by an F followed by the second and then go to the rule Q; otherwise fail and go to the next rule.
It is important to note that the only places in the preceding examples where if space is required are just after the NAME section and just before the GO-TO ction. Wherever else a space is shown, it is optional.
The reader should now be in a position to work through a few illustrative problems.
2.5 Problems for Chapter 2
-
The workspace contains several letters. If two adjacent constituents are O and R, insert a U between them. (Answer)
-
Change every O + U + R to O + R. (Answer)
-
Delete every B in the workspace. (Answer)
-
Change every B to a C. (Answer)
-
Interchange the first X immediately followed by a 8 With the B. (Answer)
-
Insert an X before the first B. (Answer)
-
The leftmost constituent in the workspace is an X, and all the other constituents are B’s. Move the X to the rightmost position. (Answer)
-
The workspace contains a number of constituents. They are all A’s except for an X somewhere in the middle. Change all the A’s that are to the left of the X to B’s. Leave everything else unchanged. (Answer)
THE FLOW OF CONTROL IN COMIT
3.1 The COMIT Program
(Editor’s note: the introductory material on card formats, COM, END, and COMSET is obsolete and included here only for historical flavor. COMIT Revived reads COMIT II programs from text files, does not require COM or END lines, and does not interpret COMSET lines.)
§3.1.0.0 In this chapter we will discuss in detail how a COMIT program is organized and how the flow of control is directed. Material to be submitted to the computer, both program and data, is punched on cards or typed in lines on an on-line typewriter. When a program is being punched or typed, only the first 72 positions on the card are used to carry program information. Since a punched card has 80 columns there are 8 extra positions at the far right that may be left blank or may be used for identification numbers. This section of the card is called the identification field or ID field. Data cards, however, can have data punched in all 80 card columns. The COMIT program can read the whole card and send information from all 80 columns to the workspace.
§3.1.1 Title Card The first card of the COMIT program is a title card, which must have COM in columns 8, 9, and 10 and may have any punching in any of the other columns, usually program identification information, such as the programmer’s name and the name of the program. This information will later be automatically printed at the top of the output so that it can readily be identified and separated from the output of other programs. The title card signals to the compiler that this is the beginning of a COMIT program. Then, when the compiler program is run, the flow of control starts with the first rule after the title card.
§3.1.2 END Card The last card of the program must be an END card, which is a card with END anywhere in the first 72 card columns, the rest being blank. The END card signals to the compiler that this is the end of the COMIT program. It also serves to terminate the flow of control during the running of the program. Input data may follow the END card or may be read separately from other input devices.
§3.1.3 COMSET Cards After the title card there may be one or more COMSET cards. These cards have various functions such as setting the limits on the length of the run, changing the bell and margin settings for printed output, and setting other options when the standard values provided are inappropriate for a partic ular program. These COMSET cards all have a blank in column 1.
§3.1.4 Rules Between the COM card and the END card and after any COMSET cards: the program itself is inserted. The rules making up the COMIT program are executed one at a time by the computer. When a rule is being executed we say that control is in that rule, and as the rules are executed one after another we say that control passes fromsone rule to another.
§3.1.4.1 A rule may extend onto more than one card if necessary. If a hyphen (or a minus sign) is the last nonblank character before column 73 on a card, the rule continues to the next card. Each rule must always start at the beginning of a card with a nonblank character in column 1. Rules are thus distinguished from COMSET cards, which have column 1 blank.
§3.1.5 Names The first and the last sections of the rule have to do only with a specification of the flow of control and are required sections of all rules. In the first section of the rule, the name section, a name or label can be written so that control can be directed to that rule by name. The optional center sections of the rule not only deal with the data but also have some function in the flow of control in that they may contain tests and branches The last section of a rule, the go-to section, is executed after all the sections of the rule have been executed. If a name is written in the go-to section, control will be directed to the named rule. Of course it would be error if there were no rule by that name or if there were two or more rules by that name.
§3.1.5.1 Names may be freely and arbitrarily invented. Programmers usually names that serve to remind them of what each rule is intended to accomplish Programs with mnemonic names of this sort tend to be easier to read. A name may be up to 12 characters in length and may consist of letters and numbers. Periods and hyphens (or minus signs) may appear in medial position, that is not at the beginning or the end. Examples of legal names are START, FIRST-RETURN, 27, READ, 25.3A. Rules may have names reflecting numbered boxes in a flowchart, and this is a great convenience when a program has been carefully flowcharted.
3.2 Methods of Directing the Flow of Control
§3.2.1 Linear Flow The simplest way to arrange COMIT rules into a program or a routine is to line them up one after another in a linear arrangement. After each rule has been executed, control will go to the next rule.
l ... = ... 12 12 ... = ... 13 13 ... = ... 2
where rule names and go-to names are shown at the beginning and end of each rule. The dots represent other sections of the rules.
§3.2.1.1 Since a linear flow of control is quite frequently used, COMIT allows the abbreviation of replacing a go-to name by an asterisk, which means go to the next rule below:
1 ... = ... * 12 ... = ... * 13 ... = ... *
§3.2.1.2 If control is not directed to a rule by name, but only from the pre- ceding rule, either by an * go-to or by rule failure in the preceding rule, then there is no need for the rule to have a name and We can use the additional abbreviation of writing an asterisk in column 1 in place of the rule
* ... = ... * * ... = ... * * ... = ... *
Of course, it does no harm to name the rules, and sometimes there is a mnemonic advantage in so doing.
§3.2.2 Tests and Loops In order to write a loop, We need only to write a test that will terminate the 100p, arranging that, in the case of one of the outcomes of the test, control is directed to the next rule to be executed within the loop, while in the case of the other outcome of the test, control is directed to some point outside of the loop.
§3.2.2.1 The simplest type of loop to write in COMIT is the one-rule loop. The left-half search is used for the test. In case the search is successful, control continues on in the rule where right-half and routing operations may be specified. The go-to carries control back again to the same rule. But if the left-half search is unsuccessful, rule failure carries control to the next rule, which is outside the one-rule loop. An example of such a loop is
LOOP1 A = B LOOP1
§3.2.2.2 Since one-rule loops are frequently written, a simpler notation is provided in COMIT, namely, the solidus (or fraction bar), which means, when written in the go-to section of a rule, to go back to the same rule. The rule may then not need a name unless it is referred to by name in some other go-to.
Thus our one-rule loop could be written in either of the two following ways:
LOOP1 A = B /
or
* A = B /
§3.2.2.3 Loops frequently involve more than one rule. The simplest case of this is similar to the one-rule loop in that the loop is terminated by rule failure that sends control to the next rule. The test must therefore be incorporated in the last rule, and the go-to of this rule must send control back to the beginning of the loop.
LOOP2 ... * * ...... * * test... LOOP2 NEXT ...
In this example, the three rules beginning with LOOP2 will be executed over and over until the result of the test is rule failure, sending control to the rule NEXT, which is outside of the loop. Sometimes it is important that the test be the first rule executed when control enters the loop. This may be the case if it is necessary to prevent the loop from being executed even once in case the test should initially fail. There are two ways of doing this. The first is to direct control from the previously executed routine directly to the test rule:
...... TEST3 LOOP3 ... * * ..... * TEST3 test... LOOP3 NEXT ...
This scheme has the possible disadvantage that control may inadvertently enter the first rule of the loop in case of rule failure of the rule above. The other way overcomes this disadvantage by moving the rule with the test from last position to first position. If exit from the loop is still to take place on rule failure, an extra rule must be inserted after the test to send control out of the loop. Also the test rule must have a go-to that jumps the loop flow of control over this exit rule. This is accomplished by the special go-to abbreviation: two asterisks, which means skip a rule.
...... * TEST4 test... * NEXT * ...... * * ...... TEST4 NEXT ...
In the above example, it can be seen that the three rules of the loop are the second, the fourth, and the fifth.
§3.2.2.4 Sometimes the exit from a loop depends on rule success rather than rule failure, although this is less frequent than loop exit on rule failure. There are several ways that such loops can be programmed, the simplest is to put the test rule first.
TEST5 test NEXT * ...... * * ...... TEST5 NEXT ...
§3.2.2.5 It is worth noting that a test that depends on rule success for loop exit can not be put last in a 100p, because the normal or nonexit outcome of the test is rule failure, which sends control to the next rule, which must be part of the loop. A disadvantage of depending on rule success for loop exit is that program errors can more easily lead to infinite loops.
§3.2.2.6 Another use for tests is in an error check. It is frequent in a COMIT program to include tests that normally would succeed unless something were wrong with the program or with the data supplied to it. Usually these tests involve rule failure, consequently the double asterisk go-to can be used in the normal case to skip the flow of control past an error exit. This error exit is a COMIT rule that sends control to an error’program provided by the programmer to cope with the eventuality. This rule might not have any center section because it might not have to do any processing. This use of the double asterisk go-to is shown below.
* ...... * * test.. ** * ERROR-EXIT * ...... *
§3.2.2.7 The normal way of terminating a run is to direct the flow of control to the END card. Since the END card does not have a rule name, the only ways of ending a program are to direct control to the END card by means of a rule failure or an asterisk go-to in the last rule, or by means of a double asterisk go-to in either of the last two rules before the END card.
§3.2.3 Branches A branching flow of control can be handled in COMIT by a rule that is more complex than those we have been talking about -a rule that is made up of a number of subrules. The first subrule of the rule looks just like the rules we have been writing, except that its name section contains, in addition to the rule name, a subrule name. The succeeding subrules of the rule start on separate cards and have only their subrule names in the name section, with the first column of the card containing a blank. Each subrule may have its own right-half, routing, and go-to, but it isflonly the first subrule that has the left-half for the entire rule. Rule failure and the asterisk go-to, of course, send control to the next rule. An example of a rule with several subrules is as follows:
BRANCH1 A + B = 2 + 1 / 2 = 0 * 3 ** 4 = D NEXT
In this rule, BRANCH is the rule name and the numbers are subrule names. When the rule is executed, first the left-half search takes place. If it is successful, the right-half and go-to of one of the subrules is executed. The choice of subrule is determined by the dispatcher in a manner that will be explained in a later chapter. One of the possibilities is that the subrule to be executed will be chosen at random, a convenient feature for certain interesting types of programs.
§3.2.4 Subroutines It is very easy to write closed subroutines and to call them in COMTT, and there is a built-in pushdown that automatically takes care of storing and obtaining the correct return point. Suppose we have a three-rule subroutine called B. It might be written as follows:
B ...... * * ...... * * ...... /
It is to be noted that a + is written in the go-to of the last rule of the subroutine to be executed. This means that the flow of control should be returned to the proper place in the calling program according to the current contents of the built-in pushdown.
§3.2.4.1 In order to call a subroutine, one simply writes the name of the subroutine in a go-to, then, after a plus sign, the name of the return point. Thus the go-to expression SUB+RET1 means go to the subroutine starting at rule SUB, and, on exit, return to the rule named RETl. If it is desired to call the above subroutine B from several points in the program //as was illustrated in figure 7, section 1.2.6.1, the following rule names and go-to names would be used:
A ...... B+C C ... . . . L ...... B+M M ... . . . X ...... B+Y Y ...
§3.2.4.2 The scheme operates as follows: when the subroutine is called, the name of the return point is automatically saved on a pushdown. Then, if the called routine calls another before returning, the second return point is stored on the same pushdown. Whenever a plus go-to is executed, the name of the proper return point is obtained from the next item on the pushdown, and this item is deleted. Control then returns to the proper point in the calling program or routine.
§3.2.4.3 As an example of the use of a rule with subrules and of the go-to, we will shortly exhibit a routine called SUB-Q that will change a Q in the workspace to a simple English sentence, selected at random, and return control to a rule called PRINT, which would print out the sentence.
§3.2.4.4 There is a very deep connection between the organization of subrou- tines and subroutine calls by means of a pushdown and the phrase-structure organization of words and phrases in the grammar of a language like English. This is shown in figure 2, which illustrates how the A+B type go-to can be used to represent the two constituents of a phrase and how a rule with subrules can be used to provide alternative words or constructions.
Figure 2. An organization of subroutines paralleling the phrase-structure organization of some simple English sentences. The boxes contain the names of the program rules. The brackets indicate alternative choices using the random-choice feature of the rule with subrules.
The routine is given next. It is assumed that the workspace contains a Q when the routine is entered. On exit to the rule PRINT, the workspace will contain a sentence that has been selected at random by the routine from the sentences described by the grammar of figure 19.
SUB-Q *+PRINT SENTENCE Q = 1 + 1 SUBJECT+PREDICATE SUBJECT A Q = -JOHN + B = -BlLL + C = -TOM + D = -HE + PREDICATE I INTRANSITIVE T TRANSlTlVE INTRANSITIVE 1 Q = -LAUGHS + 2 = -WORKS + 3 = -TALKS + TRANSITIVE Q = 1 + 1 VERB-T+OBJECT VERB-T 1 Q = -SEES + 2 -HEARS + 3 -KNOWS + 4 -LIKES + 5 -HATES + OBJECT 1 Q = -MARY + 2 -JANE + 3 -SUE + 4 -HER +
§3.2.4.5 The last type of go-to that will be described in this chapter is one similar to the A+B type used for a subroutine call, but written with two plus signs instead of one. This go-to expression is similar in operation to the one with one plus except that the name of the return point is placed one down on the pushdown so that control will return to this point only after it has re- turned to the point named at the head of the pushdown. It turns out that this behavior is just what is needed to produce a verb like "call up" in two parts with the object between them as in "call her up." This is illustrated in figure 3, which gives some additions to the program of figure 19 so that it includes verbs of this type.
Figure 3. A portion of figure 19 modified to take care of discontinuous constructions such as CALLS…UP as in CALLS MARY UP. The dotted line represents the fact that the return point ADV-D has been placed one down on the pushdown, below OBJECT}by a go-to of the A++B type.
§3.2.4.6 The modified rules and additional rules required to incorporate thi material into the previous program are given below.
VERB-T 1 Q = -SEES + 2 -HEARS + 3 -KNOWS + 4 -LIKES + 5 -HATES + 6 DISC-V DISC-V Q = 1 + 1 VERB-D++ADV-D VERB-D 1 Q = -CALLS + 2 = -BRINGS + 3 = -TAKES + ADV-D 1 Q = -UP + 2 = -DOWN + 3 = -OVER +
3.3.0 Summary
§3.3.0.0 The following is a summary of the flow-of-control conventions in COMIT. A COMIT program starts with a title card having COM in columns 8, 9, and 10. It ends with an END card. Following the COM card there may be COMSET cards to set various options. A COMSET card has a blank in column 1.
§3.3.0.1 COMIT rules follow the COM card and any COMSET cards and precede the END card. All these cards may have program information punched in the first 72 columns. The remaining 8 card columns are free for optional card identification numbers. Data cards to be read by the COMIT program may follow the END card; all 80 card columns are read.
§3.3.0.2 A COMIT rule starts with a name section that may contain the name of the rule or an asterisk in case the rule does not need a name. In either case punchng is started in column 1. A name may be up to 12 characters long and consist of letters, numbers, and periods and hyphens in medial position. The name section is separated from the center sections of the rule by one or blanks (spaces).
§3.3.0.3 In a rule with several subrules, the first subrule has the rule name starting in column 1, and succeeding subrules have one or more blanks starting in column 1. Next, each subrule has a subrule name. In the case of the first subrule, this is separated from the rule name by one or more blanks. The first subrule has the left-half for the rule, but each subrule may have its own left-half, routing, and go-to.
§3.3.0.4 The go-to section of the COMIT rule is the last section in the rule and is separated from the center sections by one or more Spaces. No spaces may appear in the string of characters that make up the go-to. The following go-to expressions have been introduced:
- NAME
-
means go to the rule named NAME.
- *
-
means go to the next rule
- **
-
means skip a rule
- NAME1+NAME2
-
means go to the rule NAMEl and store NAME2 on the pushdown.
- NAME1++NAME2
-
means go to the rule NAMEl and store NAME2 on the pushdown just beyond the item that is next.
- +
-
means go to the rule named in the next item on the pushdown and delete that item from the pushdown.
SOME POWERFUL NOTATIONS
§4.0.0.0 In chapter 2, the basic operations of replacement, rearrangement, insertion, deletion, and duplication were introduced, as well as a basic description of the left-half search and test operations. With just these notations, however, the COMIT programmer would be very restricted in expression. The greatest restriction is the fact that so far we cannot refer in the left- half to a workspace constituent unless we know exactly what it is. This chapter begins by introducing some left-half notations for reference that correspond roughly to the following English expressions: something, nothing, three things, anything but, everything up to a point, the same as. It then proceeds to the nine most important routing instructions, namely, those used for moving data between the workspace and the shelves and expanding and compressing constituent symbols.
4.1 Some Left-Half Notations
§4.1.1 Finding an Unknown Constituent It is sometimes necessary to carry out an operation like replacement, rearrangement, deletion, or duplication on a constitutent when it is not known what the constituent is but only where it is. We need a notation like the x in algebra for use in the left-half. Suppose, as in the example under Duplication in section 2.3.6, that ED has been added to one of the group of English verbs that double the final consonant. It is desired to write a rule that will apply no matter what the final consonant is.
The workspace might contain any of the following:
P+E+R+M+I+T+E+D P+L+A+N+E+D B+A+G+E+D S+H+I+P+E+D
The unknown consonant cannot be represented by X since X written in the left-half means find the letter X. The Special abbreviation $1, read dollar one, is used, which means find any single constituent. The rule to double any final consonant before ED can now be written
* $1 + E + D = 1 + 1 + 2 + 3 *
§4.1.1.1 The abbreviation $1 can also be used to locate the first constituent§ in the workspace. Suppose the workSpace contains a number of constituents resulting from some subroutine, and it is desired to insert the word RESULT before the first constituent in the workspace. The following rule will do it:
* $1 = RESULT + 1 *
The $1 in this rule will find the first constituent in the workspace. Since no other left-half match criteria have been mentioned, it will find the first constituent it comes to in the left-to-right search.
§4.1.1.2 Another important use for $1 is to test for an empty workspace. The following rule:
* $1 = NOT-EMPTY
will fail if the workspace is empty but will not fail if the workspace contains even one constituent.
§4.1.2 Finding Anything But Related to $1 is a notation that allows one to refer to any constituent that is not the one specified. This is done by $-SYM read dollar not symbol, where SYMBOL represents any of the types of workspa constituent symbols that have been introduced so far, that is, one or more compressed characters, as explained in section 2.1.0.0. As an example, take problem H of section 2.5, in which the workspace contained an X somewhere in the middle of a string of A’s, and it was desired to change all the A’s to left of the X to B'5. It can now be done in a one-rule loop:
* A + $-A = B + 2 /
The $-A will first find the X, then on succeeding times through it will find a B. When all the A’s to the left of the X have been changed to 8'5, the rule will fail because there will no longer be an A in the workspace that is followed by a constituent that is not an A.
§4.1.3 Finding n Constituents The left-half notation $2 (dollar two) will find any two consecutive constituents, $3 will find any three consecutive constitu- ents, and so on. A group of n workspace constituents found by the single left- half constituent $n can be replaced, rearranged, deleted, duplicated, and so forth, as a unit. As an example, the following rule will exchange the three constituents preceding the constituent POST with the two constituents following the constituent POST:
* $3 + POST + $2 = 3 + 2 + 1 *
§4.1.3.1 When an expression such as $3 is used to refer to several constituents at a time, the several constituents are grouped together and given a single relative constituent number. As an example, suppose the workspace contains
AB + CDE + F + CH + IJK + L + MN
and the left-half of the following rule is applied:
* F + $3 = 1 *
After the left-half search has Succeeded, the constituents found are marked with left-half relative constituent numbers as follows:
AB + CDE + F + GH + IJK + L + MN [_1_] [2__________]
The right-half, of course, will delete the group of three constituents.
§4.1.4 Null Constituents It is also possible to set up a left-half relative constituent number that is not associated with any workspace constituent. This is done with the left-half notation $¢ (dollar zero), which assigns a relative constituent number to a null constituent, that is, a place in the workspace which does not contain any workspace constituents.
§4.1.4.1 Although there are important reasons for wanting to refer in later sections of the rule to such null constituents by mentioning their relative constituent numbers, the most frequent use of $0 in COMIT programming is to specify in a left-half search either the left end of the workspace or the right end. If $0 is written at the left end of the left-half, or if it is the only notation in the left-half, it refers to the left end of the workspace; if it is written at the right end of the leftshalf, it refers to the right end of the workspace. Thus the rule:
* $0 + A = B *
will fail if the workspace does not have an A at the left end. This use of $0 is very important, because otherwise it would be necessary to explicitly mark the left end by introducing a marker constituent in a separate rule in some such fashion as the following:
* $1 = X + 1 * * X + A = B *
There is a further advantage of using the $0. When the workspace does not have an A at the left end, the rule with $¢ will fail immediately; otherwise, the whole workspace must be searched for an X followed by an A. This can be a considerable saving in computer time if the rule is frequently executed and the workspace contains many constituents.
§4.1.4.2 The following rule gives an example of the use of $Q to mark the right end of the workspace. If the last constituent in the workspace is an A, it will be changed to a B.
* A + $0 = B *
§4.1.4.3 It is important to understand how the $0 sets up a null constituent and assigns a relative constituent number to it. Suppose the workspace contains
A + B + C + D
and the left-half of the following rule is applied:
* B + $0 + C = rest of rule
The left-half search will only succeed if the workspace contains a constituent B followed by a constituent C with no constituent between them. This is the case here. After the left-half search has succeeded, the constituents are marked with relative constituent numbers as follows:
A + B + + C + D [1] [2] [3]
It can be seen that a null constituent has been set up in the workspace and given the relative constituent number 2. This provides a place in the workspace into which data can be inserted from the input or from a shelf under the direction of a routing instruction. Any null constituents left after the rule has been executed disappear along with the relative constituent numbers.
§4.1.4.4 A $0 can also be used in the right-half for this purpose. The following left-half and right-half would give the same results as the above left-half, except that the relative constituent numbers set up would be right-half relative constituent numbers.
* B + C = 1 + $0 + 2 rest of rule
§4.1.4.5 Suppose it is desired to find out whether the workspace has exactly 23 constituents in it. The following rule makes such a test:
* $0 + $23 + $0 = YES
§4.1.4.6 Of course, if the leftmost $0 finds the left end of the workspace, and the rightmost $0 finds the right end of the workspace, the following rule will be a test that will succeed only if the workspace is empty:
* $0 + $0 = EMPTY
§4.1.4.7 It should be noted that if the $Q is the only notation in the left-half, it will find the left end of the workspace even if the workspace is empty. It may thus be used to set up a null constituent for bringing data into an empty workspace.
§4.1.4.8 The use of $0 can be summarized as follows:
-
A $0 alone in the left-half or at the left end of the left-half means find the left end of the workspace.
-
A $0 at the right end of the left-half means find the right end of the workspace.
-
A left-half consisting of $0 + $0 therefore means find both ends of the workspace with nothing between, that is, it will find only an empty workspace.
-
A $0 in the left-half that is neither at the left end nor at the right end of the left-half has no effect on whether the left-half search succeeds or fails.
-
Whenever a left-half search succeeds, a null constituent is inserted in the workspace for each $0 in the left-half and provided with the appropriate left-half relative constituent number.
-
In the right-half, if $0 appears anywhere, it inserts a null constituent into the workspace and provides it with the appropriate right-half relative constituent number. (See section 2.3.2.2.)
-
When the execution of the rule is finished, all relative constituent numbers disappear, and any remaining null constituents in the workspace are deleted.
§4.1.5 Finding an Unknown Number of Constituents Sometimes even the number of constituents it is desired to find is unknown. For this we use the indefinite dollar sign $ (dollar) without a number. The left-half expression $ will find any number of constituents (including none) between specified boundary constituents or the ends of the workspace. It is the only other left-half notation that can refer to an empty workspace or can set up a null constituent.
§4.1.5.1 Suppose it is desired to delete an unknown expression between commas. The workspace might contain
-BILL + , + -THEY + -SAY + , + -IS + -RETIRED
The following rule:
* , + $ + , = 0 *
will delete the commas and everything between them, no matter how many constituents there are. The Workspace will be left with
-BILL + -IS + -RETIRED
§4.1.5.2 As another example of the use of the indefinite dollar sign $, suppose we have a filing system set up in the workspace, and it is desired to file the first constituent in the workspace under CLOTHING. The following rule will do it:
* $1 + $ + CLOTHING = 2 + 3 + 1 *
The first constituent is moved to its new place immediately to the right of CLOTHING no matter how many intervening constituents there are.
§4.1.5.3 Or, if one had a filing system with headings and subheadings, the following rule would find the first constituent filed under the PROTECTIVE that is under SHOES and move it into the first position in the workspace.
* $ + SHOES + $ + PROTECTIVE + $1 = 5 + 1 + 2 + 3 + 4 *
The first $ finds everything between the left end of the workspace and SHOES; thus it serves to mark the left end of the workspace for the desired rearrangement.
4.l.5.4 The indefinite dollar sign also makes a problem like G in section 2.5 a lot easier. It is desired to move thealeftmost constituent to the right end of the workspace. The following rule:
* $1 + $ = 2 + 1 *
will do it without the necessity for writing a loop. The $1 finds the leftmost constituent and the $ finds all the rest. And we don’t have to know that the first constituent is an X and all the rest are B’s as in the problem.
§4.1.5.5 The indefinite dollar sign $, if written alone in the left-half, will find everything in the workspace. The following rule:
* $ = START + Q *
will find everything in the workspace and replace it with the constituents
START + Q
§4.1.5.6 As another example, the following rule:
* $ = 0 *
will delete everything in the workspace. Compare this with the loop that is necessary when using only the facilities of chapter 2 for problem 3 of that chapter (section 2.5) and the necessity there of knowing what is in the workspace before it can be deleted.
§4.1.5.7 The notation $ written alone in the left-half is also capable of referring to an empty workspace. This is important because a COMIT program always starts off with an empty workspace when it is run. The following rule will insert A + B into an empty workspace, or if the workspace has something in it, it will all be replaced by A + B.
* $ = A + B *
When the workspace is empty, the $ sets up a null constituent and assigns it a left-half relative constituent number 1. This is then replaced by the A + B. The $ written alone in the left-half and the $Q written alone in the left-half will both set up a null constituent when the workspace is empty. They differ when the workspace is not empty. The $ finds everything in the workspace; the $0 finds the left end.
§4.1.5.8 The way in which a left-half search involving an indefinite dollar sign $ works is as follows: There are three cases: the $ at the left end of the left-half, the $ in medial position, and the $ at the right end of the left-half. In each case the $ will refer to all the constituents from the last definite position defined in the search to the next definite position. Thus in the rule:
* $ + A + B + $ + C + $ = 2 + 4 *
the first $ will find all constituents from the left end of the workspace, which is the first definite position, up to the first A followed by a B, which define the second definite position. If the A and B are the first and second constituents in the workspace, the $ will find a null constituent. The second $ will find everything between the A + B that has just been found and the next C, which is the next definite position. The third $ will find everything between the C that has been found and the right end of the workspace, which is the last definite position. The right half will then delete everything except the A and the constituents between the B and the C.
§4.1.5.9 Thus, the left-half of the above rule will set up, in the workspace indicated below, three null constituents and constituent numbers as follows:
A + B + C [1] [2] [3] [4] [5] [6]
But if the workspace Were as follows, it would not have to set up any null constituents:
A + A + A + B + B + B + B + C + C [1___] [2] [3] [4________] [5] [6]
§4.1.6 Finding Matching Constituents One of the most powerful left-half notations is a method of finding a constituent in the workspace that matches one already found. This can be done without knowing specifically what the constituents are. The constituent already found can act as if it had been written in the left-half. Thus if the workspace contains
A + B + C + A + D
and the following rule is executed:
* $1 + $ + 1 = 1 + 2 *
the result of the left-half search will be
A + B + C + A + D [1] [2____] [3]
The left-half number 1 finds the second A just as if the workspace A found by the $1 had been written in the rule as follows:
* $1 + $ + A = 1 + 2 *
It is worth noting that had the workspace contained
A + B + C + C + B + C + C
the rule
* $1 + $ + 1 = 1 + 2 *
would have failed in spite of there being two B’s. This is because the $1 finds the A, and whenever an expression to the left of a $ matches the workspace somewhere, it defines a definite position and will never be moved. Note also that a left-half number n can refer only to a single constituent in the workspace found by a left-half constituent such as $1, and not by $ or $n, where n is greater than 1.
§4.1.6.1 The importance of the left-half number notation is not only that one can find items in the workspace that are the same, but also that one can effectively plug items from the workspace into a rule where they can specify the search. This is one of the ways in which the data can control a COMIT program without the danger of error inherent in executing data.
§4.1.6.2 As a simple example, take the problem of algebraic factoring, where it would be desired to convert AB+AC to A(B+C). As will be explained in detail in a later chapter, + , ( , and ) are represented in the workspace by the double characters *+ , *( , and *) . The rule for factoring culd then be written as follows:
* $1 +$1 + *+ +1 + $1 = 1 + *( + 2 + 3 + 5 + *) *
§4.1.6.3 So far in this chapter, we have introduced six important left-half notations: $1, $-SYMBOL, $n, $¢, $, and n. They will be summarized at the end of the chapter.
4.2 The Routing
§4.2.0.0 Up to this point, we have covered basic features of the simple COMIT rule, and how COMIT rules are organized into routines and programs. With this apparatus, a wide diversity of tasks can already be programmed. What remains is to take up some additional features that make it much easier to write certain useful types of routines and make possible more efficient programs.
§4.2.1 Expanding and Compressing It is possible to change the grouping of characters in constituents by means of a routing expression. For example, if the workspace contains
-THE + -MAN
it is possible to expand it so that each character becomes a separate constituent.
- + T + H +E + - + M + A + N
The rule for this is as follows:
* $1 + $1 = // *E1 2 *
where the numbers after the E refer to the left-half or right-half relative constituent numbers of the workspace expression or expressions to be expanded. The abbreviation in the routing section says to expand the first and second constituents. But note that double characters such as *+ and *( remain double and are not expanded into two constituents.
§4.2.1.1 The resulting eight constituents can be compressed again into the original two by the following rule:
* $4 + $ = //*Kl, *K2 *
where the number after each K refers to a group of constituents to be compressed.
§4.2.1.2 Relative constituent numbers referred to with each *E and *K must be consecutive numbers. They may be left-half or right-half relative constituent numbers, whichever are current in the workspace. The string of constituents formed by expanding with *E is assigned, as a relative constituent number, the first of the consecutive numbers written after the *E. Null constituents are introduced into the workspace corresponding to the other numbers. Similarly, the constituent formed by compressing with a *K is assigned, as a relative constituent number, the first of the consecutive numbers written after the *K, and null constituents are introduced into the workspace corresponding to the others.
54.2.1.3 In either case, any subscripts (see next chapter) that any of the expanded or compressed constitutents may have had will be lost.
§4.2.2 The Reduction of Search Time A computer operates rapidly, but not instantaneously. If a program can be made to run ten times as fast or even twice as fast, there may be a significant saving in running time on the computer.
§4.2.2.1 In section 2.5, problem D asked to change every B to a C. The answer given was
* B = C /
The running time of this routine is very sensitive to the amount of material in the workspace. Suppose the workspace contains nothing but B’s. Then if the routine operates with a workspace having 10 times as many constituents, it will take approximately 100 times as long for the routine to run. This is because, not only will the rule be executed 10 times as often, but also the average search will be about 10 times as long, since the time of search is roughly proportional to the amount of material searched over.
§4.2.2.2 COMIT handles this situation by making it possible to remove material from the workspace when it is no longer needed for search purposes and to store it on a shelf where it is out of the way but can be found in a hurry when it is needed again. There is another reason why it is good practice to keep data that is currently not being worked onaout of the workspace. Left-half failure is a frequent means of exit from a loop. If excess material is being kept in the workspace, the left-half may find some of this material by mistake, causing an error in the loop-exit test.
§4.2.3 Shelving The procedure used to remove material from the workspace to a shelf is shown in the following routine, which replaces every B by a C. It will run much faster than the above routine if there are more than a few constituents in the workspace.
* $ + B = 1 + C // *Q23 1 2 / * $0 = // *A23 1 *
In the first rule of this routine, the left-half finds everything up to and including the first B. The right-half changes the B to a C. The instruction *Q23 1 2 in the routing section of the rule serves to queue onto the right end of shelf 23 the workspace expressions having right-half relative constituent numbers 1 and 2. Thus it removes from the workspace everything up to and including this C and moves them onto the right end of shelf 23, keeping them in their original workspace order and not disturbing anything that may already be on the shelf. When the rule is executed again, this material that has already been searched over is no longer in the workspace and will not be searched over again. After all the B’s have been changed to C’s, the rule fails and the next rule brings all from shelf 23 back to the beginning of the workspace again. The whole routine is fast because the workspace is only searched once and the shelving operations are very fast.
§4.2.3.1 For a more complex example, suppose that shelf 5 contains
X + Y + Z
and the following rule is executed:
* $2 + W + $3 = // *QS 3 1 *
If the workspace is as follows, the left-half will set up relative constituent numbers:
A + B + C + W + D + E + F + G [1_____] [2] [3________]
Then, when the shelving instruction is executed, first the constituents with relative constituent number 3 are moved as a unit to the right end of the shelf, then the constituents with the relative constituent number 1. The workspace is left with null constituents as shown:
A + + W + + G [1] [2] [3]
and the shelf is left with
X+Y+Z+D+E+F+B+C
§4.2.3.2 The two shelf instructions that send material from the workspace to a shelf operate according to the following summary:
*Q5 1 2 ││ ─┬─ ││ │ ││ └────── 1 and 2 refer to constituents by their right-half ││ relative constituent numbers. ││ │└───────── a shelf number in the range1 through 127. │ *Q means queue onto right end of shelf. *S means store onto left end of shelf.
§4.2.3.3 The two shelf instructions that send material from shelf to workspace follow the same pattern except that only one relative constituent number is mentioned to designate the workspace expression to be replaced by data from the shelf:
*A means take all from the shelf.
*N means take the next constituent from the left end of the shelf.
§4.2.3.4 It is worth noting that the routing instructions refer to workspace constituents by right-half relative constituent numbers that replace the left- half numbers after the right-half has been executed. If there is no right-half, the right-half relative constituent numbers are the same as the left-half ones.
§4.2.3.5 When material is brought off a shelf into the workspace, it replaces whatever material in the workspace is marked by the relative constituent number mentioned in the shelving instruction. When material is removed from the workspace by a shelving instruction, a null constituent is left. It is possible to bring material off a shelf into a null constituent, which thereby is no longer null if there was anything on the shelf. The expression $¢ may be used in the right-half for the purpose of setting up right-half null constituents for receiving shelf material; $0 is the only one of the $ type expressions introduced in this chapter that can be used in the right-half.
§4.2.3.6 A routing section of a rule may contain any number of instructions chosen from the group: *E, *K, *Q, *S, *A, *N. When there are more than one, they are executed in left-to-right sequence, each operating on the contents of the workspace and shelves as left after execution of the previous instruction. When the rule is finished, the relative constituent numbers and any remaining null constituents in the workspace disappear.
§4.2.4 Addressable Storage The 128 shelves, numbered from ¢ through 127, provide the further advantage that comes with addressable storage. The contents of any one shelf can be moved quickly to the workspace without a search, merely by mentioning the shelf number.
§4.2.4.1 As an example, suppose shelf 5 has a number of A’s, shelf 6 has B’s, shelf 7 has C’s and shelf 8 has D’s. It is desired to accumulate on shelf 103 a sequence A + B + C + D + A + B + C + D that continues until one of the shelves runs out of constituents. The following routine will do it. (Note that different routing instructions are separated by commas.)
REAP $ = $0 + $0 + $0 + $0 // *NS 1, *N6 2, *N7 3, *N8 4 * COUNT $4 = // *Q103 1 REAP
The process of taking the next constituents from shelves 5, 6, 7, and 8, putting them in the workspace in place of the null constituents, and queuing them onto shelf 103 will continue until one of the shelves becomes empty. At this point a null constituent will be brought from at least one of the shelves. The $4 will then fail to find four constituents, there being three or less, and the routine will terminate.
§4.2.4.2 When a COMIT program starts, all of the shelves as well as the work- Space are empty. At times, however, it is necessary to empty a group of shelves. The following rule will empty shelves 5, 6, and 7 without disturbing the workspace:
* $0 = // *A5 1, *A6 1, *A7 1, *A7 1 *w
The $0 sets up a null constituent at the left end of the workspace. The shelving instruction *A5 1 brings everything from shelf 5 into the workspace in place of the null constituent. Shelf 5 is now empty. The next instruction, *A6 1, brings everything from shelf 6, which replaces in the workspace what was previously on shelf 5. The next instruction then clears shelf 7 by bringing everything from it to the workspace where it replaces the material from shelf 6. All three shelves are now empty, but the workspace still contains the material that was on shelf 7. The last routing instruction, *A7 1, then replaces this material with what is now on shelf 7, namely, nothing. Thus the workspace is left with a null constituent as before, and all the material from the shelves has vanished. The null constituent disappears before the go-to is executed.
§4.2.5 Exchange The remaining shelf instruction to be discussed is the *X23 (exchange) instruction, which, when used, must be written last in the routing. It serves to exchange the entire contents of the workspace for the entire contents of the designated shelf. Thus the instruction *X23 would exchange the contents of the workspace with the contents of shelf 23. This instruction is useful when two or more separate tasks are being carried on concurrently. The exchange instruction is executed in approximately the time it takes to pass over two workspace constituents in a left-half search.
§4.2.6 Input and Output The two most important input and output instructions are introduced here. They are very simple and are adequate for most input and output needs. The total facilities provided by COMIT for input and output are more extensive and extremely flexible. These will be treated in later chapters.
§4.2.6.1 Input and output instructions are similar to the shelving instructions. Instead of Q, S, A, N, or X to indicate a shelf operation, they have R for read or W for write. Instead of a shelf number to designate the source or destination, of the data taken into or out of the workspace, they have a COMIT channel letter, to designate the external connection through which the data flows during reading or writing. And between these two there is a mode designation. All of this will be spelled out later. For our purposes now it is sufficient to know that the routing instruction *RCK1 means read a card from COMIT channel K (the cards that follow the END card of the COMIT program are data cards and can be read from channel K) and put it in the workspace where it replaces the workspace expression having relative constituent number 1, and that the rOuting instruction *WAM1 2 means to write in format A on the monitor (output) tape the workspace expressions referred to by the relative constituent numbers 1 and 2. These are the usual means of reading and writing. If one wants the output punched on cards instead of printed, one uses COMIT channel I, the punched- output tape, instead of M; thus we have *WAI1 2.
(Editor’s note: In COMIT Revived, R channels are ignored, with all input taken from standard input. In W instructions, channel M directs to standard error and all other channels to standard output.)
§4.2.6.2 As an example of the use of these input and output expressions, the following complete COMIT program will read a deck of cards, apply the spelling rule i before e except after c, and punch out a corrected deck.
A E + I = 2 + 1 / * C + I + E = 1 + 3 + 2 / * $ = // *WAI1, *RCK1 A
§4.2.6.3 In this program, the first two rules will fail on an empty workspace the first time through. In the third rule, the $ finds an empty workspace so the write instruction *WAI1 writes out nothing. The read instruction *RCK1 reads the first card from the input channel K, puts the material into the work- space one character per constituent, and follows it by a *. constituent to sym- bolize the end of the card. Control then goes to A and the spelling rules are applied. When control again reaches the third rule, the corrected card is found by the $ and it is punched out by the *WAI1 instruction. The process then repeats. We thus have a loop that reads, corrects, and punches. All read instructions will fail to the next rule, called end-of-file failure, when there is no more input material to be read. In our program, control exits from the loop when there are no more data cards to be read. The program terminates because, when the rule with the read instruction fails, control falls to the END card. This is the reason that the read instruction was placed in the last rule instead of in a first rule, ahead of the spelling rules.
§4.2.6.4 Another program to do the same thing faster by using shelves is given below:
A $ + E + I = // *027 1 3 2 / * $0 = // *A27 1 * * $ + C + I + E = // *Q27 1 2 4 3 / * $0 + $ = // *A27 1, *WAI1 2, *RCK1 A
4 Each spelling rule places corrected text on shelf 27 so as to reduce search time. The rearrangement is accomplished by the relative constituent numbers in the routing.
4.3 Summary
§4.3.0.0 In this chapter, the following left-half notations have been introduced:
$1 |
a single constituent. |
$-SYMBOL |
a single constituent not SYMBOL. |
$n (n>l) |
n constituents. |
$0 |
a null constituent. |
$ |
any number of constituents, or none. |
n (n>0) |
a constituent matching the constituent already found and given the relative constituent number n. |
These, together with the simple left-half notations of chapter 2 include all the COMIT notations that refer to workspace constituents composed of one or more compressed characters. The additional possible internal structure of constituents will be discussed in the next chapter.
§4.3.0.1 The expand and compress notations have been introduced:
*Ecns expand the indicated constituents.
*Kcns compress the indicated constituents.
where cns is a list of one or more right-half consecutive relative constituent numbers.
§4.3.0.2 All of the shelving instructions have been introduced, except that we have not yet introduced the scheme for indirectly referring to the shelves via the workspace constituents.
- *Qsd ns
-
queue onto the right end of shelf sd the workspace expressions with relative constituent numbers ns.
- *Ssd ns
-
store onto the left end of shelf sd the workspace expressions numbered ns.
- *Asd n
-
take all from shelf sd to the workspace where it will replace the workspace expression numbered n.
- *Nsd n
-
take the next (leftmost) constituent from shelf sd to the workspace where it will replace the workspace expression numbered n.
- *Xsd
-
exchange the data in the workspace for that on shelf sd.
sd is a shelf designation, 0 through 127, ns is a list of one or more relative constituent numbers, not necessarily consecutive, and n is a single relative constituent number.
§4.3.0.3 The following input and output instructions have been introduced. They are the most frequently used ones.
*WAMns write in format A on the monitor (printed output) tape the workspace expressions numbered ns.
*WAIns write in format A on channel I (punched output tape) the workspace expressions numbered ns.
*RCKn read in format C (a card, one character per constituent) from channel K (input) into the workspace, replacing the workspace expression numbered n.
§4.3.0.4 You are now in a position to write rather sophisticated and powerful COMIT programs. You have control of the most important basic features. In later chapters you will be introduced to some additional features that ease the programmer’s task and allow for much flexibility.
4.4 Problems for Chapter 4
-
Write a test to see if the workspace contains one A and nothing else. (Answer)
-
Write a test that will fail if there is anything in the workspace. (Answer)
-
Write a test that will fail unless the third constituent in the workspace is an A. (Answer)
-
Write a test that will fail if the third constituent in the workspace is an A. (Answer)
-
Delete the fifth constituent in the workspace. (Answer)
-
Delete the last constituent in the workspace. (Answer)
-
Write a rule to put a Z at the right end of the workspace if there isn’t one already at the right end. (Answer)
-
Interchange the second L in the workspace with the constituent following it. (Answer)
-
Insert an X after the first adjacent pair of identical constituents in the workspace. (Answer)
-
Find the first string of two or more A’s in the workspace and delete all but one of the A’s. (Answer)
-
Bring everything from shelf 23 to the left end of the workspace. (Answer)
-
Bring everything from shelf 23 to the right end of the workspace. (Answer)
-
Find the first A immediately followed by a B and insert the next item from shelf 23 between them. (Answer)
-
Put everything in the workspace on shelf 23 and a copy on shelf 24. Leave the workspace empty. (Answer)
-
Clear shelf 22 and move everything from shelf 23 onto it. (Answer)
-
Interchange the contents of shelf 22 and shelf 23 without disturbing the workspace. (Answer)
-
Exchange everything except the first constituent on shelf 23 with the contents of the workspace. (Answer)
-
Exchange everything except the first two constituents on shelf 23 with the contents of the workspace. (Answer)
-
Shelf 23 is empty. Reverse the order of the constituents in the workspace. (Answer)
-
The workspace is empty. Reverse the order of the constituents on shelf 23. (Answer)
-
Assume the workspace is empty. Test to seee if shelf 12 is empty. If it is, leave the single constituent EMPTY in the workspace. Otherwise, leave the single constituent NOT in the workspace, and leave the shelf as it was. (Answer)
SUBSCRIPTS
5.1 Types of Subscripts
§5.1.0.0 In this chapter the subscript features of the COMIT language are presented. There is a system of numerical subscripts that permits counting and simple arithmetic and a system of logical subscripts for tagging and classifying constituents and for numerous other purposes.
§5.1.1 Counting and Arithmetic There are several ways for determining when a rule or a routine has been repeated often enough. One of these ways is by counting the number of times control goes around the loop and testing to see if it has gone around the right number of times. This can be done by the use of numerical subscripts, which make possible the programming of counting and simple arithmetic in COMIT.
§5.1.1.1 As an example, figure 21 shows a routine that gives the flow of control for a program that goes around a loop 100 times under control of a numerical subscript. (Editor’s note: flowchart omitted.)
§5.1.1.2 Numerical subscripts are represented in the workspace in the following way:
A + B/.2 + C + D/.l9 + E/.476
The second constituent has a numerical subscript with a value of 2. The fourth and fifth constituents have numerical subscripts with values of 19 and 476.
Each workspace constituent has a symbol and may have one numerical subscript. The part of the constituent on the left of the fraction bar is called the symbol of the constituent. Subscripts are written to the right of the fraction bar. The numerical part of the subscript is its value. The period distinguishes it as a numerical subscript instead of a logical subscript. A numerical subscript may have as a value any integer from O to 32767 (= 215 - 1) inclusive.
§5.1.1.3 The program, corresponding to the flowchart, that will do something 100 times is as folloWs:
* $0 = C/.1 * (insert constituent with numerical subscript) A _________ * (do something) * _________ * * _________ * * $0 + C/.L100 = 2/.I1 A (test subscnipt and add 1) * C = $0 * (remove subscript)
The first rule inserts a constituent at the beginning of the workspace. This constituent has C as its symbol and .1 as its numerical subscript. The rule A and the two rules that follow are assumed to be the rules that do something. The next to the last rule has a left-half that will find the C constituent at the left end of the workspace only if it has a numerical subscript less than 100. This is represented by the abbreviation .L100. (The left-half subscript notations .100 and .G100 can be used to indicate tests for a numerical subscript equal to 100 and greater than 100. The left-half constituent C/.G7, .L12 would indicate a test for a C having a numerical subscript greater than 7 and less than 12. Here, the individual left-half subscript notations .G7 and .L12 are separated by a comma.) The right-half of the next to the last rule serves to increase the value of the numerical subscript by one. This is indicated by .I1. (The abbreviation .D1 will cause the numerical subscript to be decreased by one. The abbreviation .1 in the right-half will cause a numerical subscript with the value 1 to be inserted in the workspace on the indicated constituent where it will replace any value that may already be there.)
§5.1.1.4 Numerical subscripts can also be multiplied and divided. Details will be given in a later section. There is in addition a machine-language feature whereby a user can write subroutines in another language, such as one that is organized around arithmetic computations, and transfer control back and forth between them and his COMIT program.
§5.1.2 Indirect Addressing It is possible to specify a shelf number indirectly by a numerical subscript in the workspace instead of directly by a number in the shelving instruction. This is done by replacing the shelf number in the instruction by an * followed by the relative constituent number of the constituent having the numerical subscript. Suppose the workspace contains
A/.15 + B/.23 + 0/.54
Then the following rule will bring everything from shelf 23 and put it at the beginning of the workspace:
* $ + B = $0 + 1 + 2 // *A*3 1 *
§5.1.2.1 As another example, the following closed subroutine will place at the left end of each of the shelves numbered 1 through 127 a constituent A with a numerical subscript equal to the shelf number:
SN $ = A/.l * SHELVE A/.L128 = 1 + 1/.I1 // *S*1 1 / * $ = 0 +
§5.1.3 Logical Subscripts Besides numerical subscripts, COMIT offers a system of logical subscripts that have a number of important uses. We have seen that a workspace constituent consists of a symbol and subscripts separated by a fraction bar. There may be one numerical subscript and any number of logical subscripts, separated by commas. An example might be the following:
GEORGE/.26, SEX MALE, OCCUPATION CLERK, HOBBIES GOLF CHESS
Each logical subscript consists of a subscript name followed by up to 36 subscript value names. The subscript names and subscript value names may be freely invented by the programmer and conform to the same convention as rule names; that is, they can be up to twelve characters long and composed of letters, numbers, and in medial positions periods and hyphens.
5.2 Subscript Operations
$5.2.1 Insertion and Deletion of Subscripts Additional logical subscripts may be inserted on workspace constituents by mentioning them in the right-half of a rule.
* GEORGE = 1/MARRIED NO *
They may be deleted by mentioning the subscript name after a minus sign.
* GEORGE = 1/-HOBBIES *
The subscript mentioned, complete with any values it might have, will be deleted from the workspace. The right-half subscript expression -. will delete the numerical subscript, and -$ will delete all subscripts from a given workspace constituent.
§5.2.1.1 Sometimes it is necessary in a rule to mention simultaneously many values of a subscript. In such a case it is often easier to mention only the values the subscript does not have, instead of those it does have. This is done by preceding the values by a minus sign.
* GEORGE = l/EATS -SHRIMP CLAMS *
This is clearly easier than mentioning the list of things he does eat. COMIT understands the significance of this notation in the context of the whole pro- gram. To do this, during the compilation stage it collects all subscript values mentioned anywhere in the program as values of the subscript EATS (or as sub- rule names of a rule EATS) and uses all of these except SHRIMP and CLAMS as values. This notation is only a shorthand for use in the rule. In the workspace, the subscript would carry all the values corresponding to what he does eat .
§5.2.2 Complementing Subscripts It is possible to replace the values of a specified subscript in the workspace by the complementary set, that is, just those values it does not have. This is done by writing *C after that subscript name in the right-half. As an example, if George’s marital status changes, We would write
* GEORGE = 1/MARRIED*C *
If the subscript had been MARRIED NO and the only values mentioned anywhere in the the program were YES and NO, it would change to MARRIED YES, and conversely.
COMIT determines the full set of possible values with respect to which complementation takes place (here only YES and NO) as all the different values mentioned anywhere in the COMIT program for that given subscript or as the set of subrule names associated with a rule having the same name as the subscript. It should be noted that if there were also a rule MARRIED, it would have to have among its subrule names all of the values that any subscripts by that name had. This will be explained in the next chapter.
§5.2.3 Merging Subscript Values Besides the insertion, deletion, and complementing of subscripts, there are other frequently used operations with sub- scripts. Often a subscript will have only one value and it is desired to replace this value with another one. Suppose the workspace contains
GEORGE/OCCUPATION CLERK
and it is desired to represent a change of occupation. The following rule:
* GEORGE = 1/OCCUPATION ACCOUNTANT *
will replace the subscript value CLERK by the subscript value ACCOUNTANT, and the workspace will contain
GEORGE/OCCUPATION ACCOUNTANT
On the other hand, it is often desired not to replace some values with others but to leave the workSpace subscript with just those values that both the workspace subscript and the right-half subscript had in common. For this we would like to have it so that if the workspace contains
CULPRIT/POSSIBLY MAID BUTLER DUKE GARDENER
then the following rule:
* CULPRIT = l/POSSIBLY DUKE NEIGHBOR MAID COOK *
will reduce the possibilities to only those held in common. The workspace will then contain
CULPRIT/POSSIBLY MAID DUKE
§5.2.3.1 In order to allow for both of these possibilities, a convention is adopted in COMIT to make possible the use of the same type of right-half subscript notation without confusion or ambiguity for both the replace and the and operation. The combined operation is called merging. It operates in the following way: If the right-half subscript has no value names in common with the workspace subscript, the right-half values are substituted for the old workspace values (replacement). But if the right-half subscript does have values in : common with the workspace subscript, the new workspace Subscript will have just those overlapping values that both the right-half subscript and the old workspace subscript had in common (and operation).
§5.2.3.2 Thus the right-half notation is appropriately interpreted in the most usual situations. The result is always replacement if the right-half has just one value; the result is always replacement (actually insertion) when the workspace Subscript has no values; and the result is always replacement (actually deletion) when the right-half subscript has no values. If none of these situations holds and replacement is the desired operation, the old workspace values can be deleted before the new values are inserted. The following rule will always replace the values on the subscript SUB no matter which ones or how many it had:
* $1 = 1/SUB, SUB FIRST SECOND * 6
§5.2.3.3 In those cases where the and operation is the desired one, it is often guaranteed by the nature of the task being programmed that there will be an overlap. In those cases where overlap would not normally be expected in every case, it can be ensured by the simple expediEnt of always carrying an extra, otherwise unused value (say Z) in both workspace and rule to ensure overlap by at least one common value.
§5.2.4 The Or Operation Sometimes it is required that the subscript values in the workspace be combined with the subscript values in the right-half in such a way that the workspace subscript is left with just those values that the workspace subscript or the right-half subscript or both had. It was not deemed necessary to provide a Special notation in COMIT for this less frequent operation in light of the well-known fact that its effect can be had by combining the and and the complement operations. Thus we know from logic that A or B = -(-A and -B), where - means not or complement of. The procedure is to complement the workspace subscript values, then merge (and operation) with the complement of the right-half subscript values, then complement the result. It would be written as in the following rule, which will replace the old values on the workspace subscript SUB with new ones such that each new value either was an old value on the workspace subscript SUB or it was a value written in the right-half subscript SUB (or it was in both places):
OR $1 = 1/SUB*C, SUB -VAL1 VAL2, SUB*C *
It is of course assumed that, as usual, there is at least one value in common when using the and operation.
§5.2.5 Order of Subscripts and Values Since subscripts are merged, not inserted, when the workspace constituent already has a subscript with the same name, it is clear that a workspace constituent can never have more than one subscript with the same name. And although the order of constituents in the workspace is sig- nificant, so that A + B is not the same as B + A, the order of subscripts on a constituent in the workspace and the order of the values in a subscript are not significant, so that A/S, T is the same as A/T, S and A/S X Y is the same as A/S Y X.
§5.2.6 Carrying Over of Subscripts In the preceding sections, the subscripts that were inserted in the workspace or merged with workspace constituents came from the right-half of a rule. It is also possible to insert or merge subscripts found in the workspace.
§5.2.6.1 As an example, suppose the workspace contains either
SUBJECT/NUMBER SINGULAR + VERB
or
SUBJECT/NUMBER PLURAL + VERB
and it is desired to insert a subscript on VERB that will make the Verb agree with the number of the subject, that is, have the same subscript. For this we
can use the following rule:
* SUBJECT + VERB = 1 + 2/NUMBER*1 *
The right-half subscript instruction specifies that the NUMBER subscript is to be carried over from the number one constituent as found by the left-half, and inserted or merged onto the number two constituent. The result in the workspace will be either
SUBJECT/NUMBER SlNGULAR + VERB/NUMBER SINGULAR
or
SUBJECT/NUMBER PLURAL + VERB/NUMBER PLURAL
§5.2.6.2 As another example, logical subscripts can be used to represent, for German words, the.various possibilities of gender, number, and case that could be assigned to them by a dictionary. When words appear in a phrase, such as a noun phrase composed of an article followed by a noun, some of the possibili- ties of gender, number, and case for each word would in general not be possible in View of the necessity of grammatical agreement with the other word. In German, there are 16 possible combinations of gender, number, and case. There are the 4 possibilities of gender and number: masculine singular, feminine singular, neuter singular, and plural; and 4 possibilities of case: nominative, genitive, dative, and accusative. But there are fewer than 16 forms for each word since some forms are ambiguous. A given word form can be marked in the dictionary as to which ones of these 16 possibilities it can be. This would be done by assigning to each word a subscript with the proper set of values chosen from these 16 possibilities:
/GNC MN MG MD MA FN FG FD FA NN NG ND NA PN PG PD PA
where GNC is the subscript name and means gender-number-case, M means masculine' N means nominative, and so on. The following rule would carry over both origin-workspace subscripts to the other constituent and carry out a merge operation on each, which will be interpreted as an and operation. It could thus reduce the gender-number-case ambiguity of both an article and a noun when they appeared together.
* ART + NOUN = 1/GNC*2 + 2/GNC*1 *
§5.2.6.3 It is also possible to carry over all of the subscripts that a constituent may have. This is indicated by the following right-half subscript instruction:
/$*__j__
where j is a relative constituent number that refers to the constituent from which the subscripts are to be carried over. This instruction will also carry over a numerical subscript if there is one (replacement). The numerical subscript alone will be carried over if
/-*__j__
is written in the right-half. A numerical subscript will be carried over and i added to or subtracted from another if the right-half contains /.1.*j
or
/.D.*j
§5.2.7 The Left-Half Search with Subscripts? The general rationale behind the left-half notations for specifying searches includes an emphasis on reducing to a minimum the amOunt of writing that the programmer has to do. Thus the programmer does not have to be any more explicit than necessary to uniquely specify the workSpace constituent of interest to him. If there is only one constituent in the workspace or if the one he wants is the first one on the left, a simple $1 will suffice to find it, no matter what its symbol may be and no matter what subscripts it may have. Likewise a left half SYMBOL or $-SYMBOL will find a workspace constituent on the basis of the symbol match and independent of any subscripts. If the left-half expression $1, $-SYMBOL, or SYMBOL has a subscript expression on it, then, not only must the symbol part match, but also the subscript part must match; it is irrelevant whether the workspace constituent has additional subscripts or not. Similarly, if both a subscript name and one or more values are specified, the workspace constituent must have at least this subscript and, for this subscript, at least these values in order to match. We might call this the principle of minimum necessary specification. It operates roughly in English too. We don’t say "that diamond gray 1959 VW sunroof sedan with no radio antenna, red wheels, New York license 123456,…," when "that car" or "that VW” or "that car with New York plates" will do to distinguish it from the rest of the objects in question.
§5.2.7.1 Suppose the workspace contains a number of constituents of the GEORGE type, as discussed earlier, and it is desired to search for someone that is male, unmarried, a professional radio operator, and an amateur mountain climber.
The following left-half will set up this search:
* $1/SEX MALE, MARRIED NO, OCCUPATION RADIO-OPRTR, HOBBIES MOUNTAINEER =
But if only a radio operator is needed, then the following would suffice:
* $1/OCCUPATION RADIO-OPRTR =
§5.2.8 Other Subscript Operations Other operations with subscripts can be built up out of the search, merge, and complement Operations. Two examples follow.
§5.2.8.1 Test to see if two subscripts have values in common: Suppose that the constituents A and B in the workspace both have a subscript SUB and it is re quired to find out whether they have values in common. The following rules will do it:
* A + B = 1 + 2 + Q/SUB*1, SUB*2 + Q/SUB*2, SUB*1 * * Q + 1 = 0 YES
If the subscripts SUB on A and B have values in common, the merge operation will be interpreted as and, leaving just the values they have in common on the new subscripts on the Q’s. The left-half search in the next rule will succeed. But if the subscripts on A and B do not have values in common, the merge operation will be interpreted as replace, and since the order of replacement is different on the two Q’s, the resulting sets of values will be different and the left-half search in the next rule will fail.
§5.2.8.2 Suppose that the constituents A and B in the workspace both have a subscript SUB and it is desired to insert a constituent D in the workspace with a subscript SUB having all the values that the subscript on A has, or the subscript on B has, or both. We assume that if both the subscript on A and the subscript on B are complemented, the resulting subscripts will have values in common. This is a fact that the programmer usually knows, and he can test for it if he doesn’t. The following routine will do it:
* A + B = 1/SUB*C + 2/SUB*C * * A + B = 1/SUB*C + 2/SUB*C + D/SUB*1, SUB*2, SUB*C *
To understand how this routine works, one must realize that the numbers written in the right-half refer to left¥half relative constituent numbers, which of course refer to the constituents as originally found in the left-half search and before they have been replaced by the new constituents as specified by the right-half operations.
§5.2.9 Numerical Sorting Suppose the workspace contains a number of constituents, each with a numerical subscript, and it is desired to sort them into numerical order. There are many schemes for sorting. Of these, perhaps the simplest, although certainly not the most efficient, method is the method of repeatedly exchanging adjacent items if they are out of order.
* $1 + $l/.L.*1 = 2 + 1 /
The left-half means: find any constituent that is followed by a constituent having a numerical subscript less than the numerical subscript on the number one constituent. Successive interchanges will thus continally move smaller numbered constituents to the left, and the rule will fail when no two constituents are still out of order. If the opposite order had been desired, with the larger numbers at the left instead of the smaller ones, the letter G for "greater than" would replace the L for "less than."
§5.2.9.1 There are also the left-half subscript expressions /.*j and /SUB*j that refer to workspace constituents having, respectively, a numerical subscript or a logical subscript with name SUB that would be matched by the numerical or logical subscript on the workspace constituent already found and given relative constituent number j. The expression $*j cannot be used in the left-half. If it is desired to refer to workspace constituents having subscripts that would be matched by all the subscripts on a workspace constituent already found, one must use data structures that ensure identity of symbols so that the method of section 4.1.6 can be used. If important information must be carried in symbols and associated with subscripts, two constituents can be ‘ used for each item, one with a symbol only and the other carrying the subscripts on a standard symbol, thus:
X/.27, SEX MALE, EATS -FISH + ROY
The left-half, instead of being written
* $1 + $ + 1 =
would be written
* $1 + $1 + $ + 1 + $1 =
5.3 Further Applications
§5.3.1 Polish Prefix Notation An important application of numerical subscripts is the efficient handling in COMIT of data structures that are organized in a branching tree or hierarchical structure by representing them in a linear form in the workspace without parentheses. To explain this notation, an example from logic, the area in which the notation was developed, is most convenient. Take as an example the logical expression
(A AND B) 0R (C AND D)
This is in the ordinary infix notation, that is, the connectives AND and OR appear between the expressions they connect, and parentheses are used to distinguish this from other expressions such as
((A AND B) OR C) AND D
§5.3.1.1 These expressions could also be represented as trees, as shown in figure 22.
§5.3.1.2 In the Polish prefix notation, due to Lukasiewicz, the connective is placed to the left of the items it connects and parentheses are removed. The above two expressions would then be
OR AND A B AND C D AND OR AND A B C D
In spite of the absence of parentheses, it is always possible to read the expressons unambiguously if it is known for each connective how many items
it connects. This information could be obtained from a dictionary that could assign a subscript of 2 to AND and to OR because they are each binary connectives, and a subscript of $ to A, B, C, and D, because they do not connect anything.
OR~2~ AND~2~ A~0~ B~0~ AND~2~ C~0~ D~0~ AND~2~ OR~2~ AND~2~ A~0~ B~0~ C~0~ D~0~
such an expression can be represented in the workspace in the following way:
OR/.2 + AND/.2 + A/.0 + B/.0 + AND/.2 + C/.9 + D/.0
In order to be able to refer to subexpressions in an expression, it suffices to calculate a new set of subscripts that can take the place of parentheses. This is done by starting the procedure at the left with an initial value of 1, then moving to the right, each time calculating a new subscript by adding to the result of the previous step the subscript obtained from the dictionary and then subtracting one.
§5.3.1.3 The procedure can be expressed easily in COMIT in terms of the following closed subroutine:
P $ = START/.1 + 1 * * $1 + $1 = 1 + 2/.I.*l, .D` // *Q23 1 / * $$ = // *A23 1 +
The result for our two expressions would be
START~1~ 0R~2~ AND~3~ A~2~ B~1~ AND~2~ C~1~ D~0~ START~1~ AND~2~ 0R~3~ AND~4~ A~3~ B~2~ C~1~ D~O~
We note that if an expression is well-formed, the last subscript must be zero.
§5.3.1.4 Now if we insert a marker anywhere in such an expression, it is easy to locate the subexpression that falls immediately to the right of that marker. The procedure is simple. Give the marker a subscript one less than that of the item to its immediate left. The required subexpression then extends to the right of the marker to and including the first item with a subscript equal to that of the marker. If the subexpression is removed and the marker left in its place, the original expression is still well-formed.
§5.3.1.5 The following closed subroutine will assign the proper subscript to a marker MARK and find and remove the subexpression to shelf 23:
M $1 + MARK = 1 + 2/.*1, .D1 * * MARK + $ + $1/.*1 = // *Q23 2 3 +
And the following rule will restore the subexpression to its place:
R MARK = // *A23 1 +
§5.3.1.6 In order to make use of the prefix notation internally and still allow the user to write input expressions in the infix notation, the following subroutine will translate from fully-parenthesized infix notation to prefix notation. This routine makes use of the simplifying assumptions that operators and operands are represented by one-character symbols and that the operators are all binary and have been tagged with the subscript OP. A typical expression for translation might be
*( + A + *+/OP + *( + B + X/OP + C + *) + *)
The subroutine for translation is:
T *( + $1 + $1/OP + $1 + *) = 3 + 2 + 4 // *K1 2 3 / * $ = // *E1 +
§5.3.2 Shelf Zero and the Subroutine Return Pushdown In chapter 3, on the flow of control, the pushdown facility for storing subroutine returns was introduced. We are now in a position to describe in detail how the mechanism works. When a go-to such as SUB+RET is executed, a constituent is set up with SUB as the symbol and RET as a logical subscript. This constituent SUB/RET is then stored on a special shelf, shelf zero. Then when a plus go-to is executed, the next constituent on shelf zero is obtained, its logical subscript is used for a go-to, and the constituent is deleted.
§5.3.2.1 All the shelving instructions also work on shelf zero; so it is possible to examine and manipulate the contents of the subroutine return pushdown.
5.4 Summary of Subscript Operations
§5.4.0.0 The following subscript operations either have been introduced in this chapter or are implied by the ones introduCed. This list includes all of the left-half subscript expressions and most of the right-half subscript expressions. The rest will be introduced later in connection with other topics.
§5.4.0.1 Left-half subscript expressions and their meanings:
/ |
having |
, |
and |
.n |
a numerical subscript with the value n. |
.*j |
a numerical subscript with a value equal to the numerical sub script on the number j constituent in the workspace. |
.Gn |
a numerical subscript with a value greater than n. |
.G.*j |
a numerical subscript with a value greater than the numerical sunscript on the number j constituent in the workspace. |
.Ln |
a numerical subscript with a value less than n. |
.L.*j |
a numerical subscript with a value less than the numerical subscript on the number j constituent in the workspace. |
SUB |
at least the subscript SUB. |
SUB V1 V2 |
at least the subscript SUB having at least the values_V1_ and V2. |
SUB- V1 V2 |
at least the subscript SUB having at least all values except the values V1 and V2. |
SUB- |
at least the subscript SUB having all values (at least all values except none). |
SUB*j |
at least the subscript SUB having at least the values of the subscript SUB on the number j constituent in the workspace. |
-. |
no numerical subscript. |
-SUB |
no subscript with the name SUB. |
-$ |
no subscripts. |
§5.4.0.2 In the case of the left-half operations involving *j, there will be no match if constituent j does not have the appropriate subscript.
§5.4.0.3 Right-half Subscript expressions and their meanings:
/ |
and then |
, |
and then |
.n |
set the numerical subscript to n. |
.*j |
set the numerical subscript equal to the numerical subscript on the number j constituent in the workspace. |
.Dn |
decrease the numerical subscript by n. |
.D.*j |
decrease the numerical subscript by the numerical subscript on the number j constituent in the workspace. |
.In |
increase the numerical subscript by n. |
.I.*j |
increase the numerical subscript by the numerical subscript on the number j constituent in the workspace. |
SUB |
merge in the subscript SUB with no values. |
SUB V1 V2 |
merge in the subscript SUB with values V1 and V2. |
SUB- V1 V2 |
merge in the Subscript SUB with all values except V1 and V2. |
SUB- |
merge in the subscript SUB with all values. |
SUB*C |
complement the values of the subscript SUB. |
SUB*j |
merge in the subscript SUB found on the number j constituent in the workspace, with all its values. |
$*j |
merge in all the subscripts found on the number j constituent in the workSpace, with all their values. |
-. |
delete the numerical subscript. |
-SUB |
delete the subscript SUB. |
-$ |
delete all the subscripts. |
§5.4.0.4 In the case of the right-half operations involving *j, there will be no operation carried out if constituent j does not have the appropriate subscript.
§5.4.0.5 When workspace constituents are expanded or compressed, any subscripts they may have are lost.
5.5 Problems for Chapter 5
Assume shelf 23 is available for use if needed.
-
The workspace contains a constituent A with a numerical subscript. Write a test that will fail if the subscript equals zero; otherwise it will decrease the subscript by one and go to LOOP. (Answer)
-
The workspace contains a constituent A with a numerical subscript. Write a test that will go to LOOP if this subscript has a value from 37 to 89 inclusive; otherwise it will go to the next rule. (Answer)
-
The workspace contains a constituent A with a numerical subscript. Write a test that will go to LOOP if this subscript does not have a value from 37 to 89 inclusive; otherwise it will go to the next rule. (Answer)
-
The workspace contains three constituents, each with a numerical subscript. Replace these by a constituent having the symbol SUM and a numerical subscript equal to the sum of those on the original three. (Answer)
-
0 The workspace contains several constituents. Number them with numerical subscripts starting with 1 at the left. (Answer)
-
The workspace contains several constituents. Write a routine that will leave the workspace containing 32 = 25 copies of its original contents. Use a numerical subscript to keep count. (Answer)
-
Write a routine to find the constituent in the workspace that has the largest numerical subscript, and leave that constituent on shelf 23. (Answer)
-
Assume that the workspace and shelves 51 through 54 are empty. Move the data on shelves 1 through 50 to shelves 5 through 54. Data from shelf 1 goes to shelf 5, shelf 2 to shelf 6, and so on. (Answer)
-
Assume that shelves 21 through 50 are empty and available. It is desired to use these shelves in a pushdown of shelves arrangement for storing the contents of the workspace. Write two closed subroutines PUT and FETCH. PUT will put the contents of the workspace on the next available shelf or go to FULL if there are no free shelves left. FETCH will bring to the beginning of the workspace the contents of the most recently filled shelf or go to EMPTY if all the shelves are empty. Assume that shelf 20 contains a single constituent A with a numerical subscript for keeping track of the current shelf. This subscript has been set originally to 20. (Answer)
-
Write a test to see if George is married or not. Go to either rule YES or rule NO. ASSume constituents and subscripts in the workspace like those used in the text as examples. (Answer)
-
Write a routine to test if George eats everything. Go to either YES or NO. Assume that he eats something. (Answer)
-
Write a test to see if there are at least three constituents in the workspace with a subscript A. If there are, go to YES. (Answer)
-
Write a test to see if the second constituent in the workspace has at least a subscript FURNITURE with at least the values TABLE and DESK. (Answer)
-
Write a test to see if there is an adjacent pair of constituents in the workspace, the second of which has the same symbol as the first, the same numerical subscript if the first one has one, and at least the same logical subscripts, each with at least the same values as those on the first. (Answer)
-
Assume shelf 22 contains a file of items. Each item consists of two constituents. The first has a symbol A and one or more logical subscripts representing subject index catggories. The second constituent identifies the document that has been so indexed. The workspace contains a search request in terms of a constituent with symbol A and logical subscripts representing the index categories that must be satisfied for those documents. desired. Write a closed subroutine SEARCH that will remove to shelf 23 those items that satisfy the search request. Leave the rest of the items on 22 and the search request in the workspace. (Answer)
-
The workspace contains a constituent with a logical subscript Q having at least two values, but possibly more. Write a rule that will remove the value C if it is present. (Answer)
-
Why cannot the above rule be used to remove the value C if it is the only value present? (Answer)
-
What practical way can be used in a COMIT program so that the above rule will always work correctly? (Answer)
-
The workspace contains a constituent with a logical subscript Q. Write a rule to add the value C to this subscript. (Answer)
-
When would this rule fail to work properly, and what is to be done about (Answer)
-
A board game is played on a 5 x 5 checkerboard of 25 squares. Each piece is an irregularly shaped cardboard that will just cover one or more squares and may be played only if those squares are not already covered. Assume the workspace contains two constituents: PIECE, with subscript SQUARES and values indicating which squares that piece covers; and BOARD, with subscript SQUARES and values indicating which squares are still not covered. Write a test that goes to FIT if the piece can be played, otherwise to the next rule. Leave the workspace undisturbed in either case. (Answer)
-
Write a closed subroutine PLAY that will play a piece that fits in the above game. To play a piece, change the subscript values on the constituent BOARD to reflect the additional squares covered, and delete the constituent PIECE. Assume that a 26th square has been defined that is never covered. (Answer)
BRANCHES AND SUBRULES
§6.0.0.0 Consider a subroutine for writing an invitation to a person. Part of the logic should consist of asking whether the person is single, or a married man, or a married woman. One branch will consist of a three-way choice between paths leading to writing either "you,” or "you and your wife," or "you and your husband."
6.1 Subrules and the Dispatcher
§6.1.1 Preset Program Branches There are many circumstances where the decision as to which branch in a program should be taken can best be made at an earlier point in the program before control has reached the branch point. In the example we have been discussing, it would be much easier to test whether that person was single, etc., at the much earlier point in the program where the list of names was being examined to determine whether there was still an unchecked name on the list (figure 3). If the test were put there and the branch kept in its original place, the result of the test would have to be remembered for the intervening period.
§6.1.1.1 In COMIT, the result of a test destined to effect a later branch of control is remembered in a special section of memory called the dispatcher. The dispatcher is like the logical subscripts of a constituent, but it is not in the workspace or on a shelf. Each dispatcher entry is in the form of a logical subscript that can be set to control a program branch.
§6.1.1.2 The flowchart is illustrated in figure 5.
§6.1.1.3 The relevant parts of the COMIT program corresponding to this flowchart are
* (unchecked off name?) ** * OUT * (Single?) // GUEST SINGLE CHECK-OFF * (man?) // GUEST HUSBAND CHECK-OFF * // GUEST WIFE * CHECK-OFF * . . . GUEST SINGLE $ = -YOU * HUSBAND = -YOU-AND-YOUR-WIFE * WIFE = -YOU-AND-YOUR-HUSBAND * * $ = // *WAMl *
In this section of program, the third, fourth, and fifth rules contain dispatcher instructions written in the routing. These instructions operate like logical subscript expressions in the right-half in that they merge with the current dispatcher subscript entry to give a new one. In the operations shown, the new value will replace the old value, whatever it might have been. When a COMIT program is started, each dispatcher subscript entry starts out with no values.
§6.1.1.4 This new value is retained as the setting of the dispatcher for GUEST. When the flow of control eventually reaches the rule GUEST, the correct one of the three branches is taken. These three branches are the three subrules of the rule GUEST. Each subrule has a subrule name: SINGLE, HUSBAND, or WIFE, and the choice of which subrule of the three is executed is determined by the current setting of the dispatcher entry GUEST. It is to be noted that the rule GUEST has a single left-half, but each subrule can have its own right-half, routing and go-to. The * go-to on any subrule means, of course, to go to the next rule, and if there is failure in the left-half, control similarly goes to the next rule.
§6.1.1.5 It is thus clear that there is a correspondence between subscript names and rule names, and between subscript value names and subrule names. In fact, if a rule and a subscript have the same name, the set of subrule names is taken to be the complete set of possible value names for the subscript. If a subscript name is the same as a rule name, the associated subscript value names must all be chosen from among the subrule names of the rule. Also, rules having the same name must have identical sets of subrule names in the same order.
§6.1.2 Multiple Position Switch Another example of the use of the dispatcher to control the choice of Subrules is this subroutine for dealing a hand of bridge. We assume that shelf 5 contains 52 constituents representing 52 cards and that they are to be dealt out one at a time onto shelves 11, 12, 13, and 14.
DEAL // HAND EAST * NEXT $ = // *N5 1 * HAND NORTH $1 = // *S11 1, HAND EAST NEXT EAST // *S12 1, HAND SOUTH NEXT SOUTH // *S13 1, HAND WEST NEXT WEST // *S14 1, HAND NORTH NEXT * +
§6.1.2.1 The first rule sets the dispatcher to start the deal with EAST. The rule NEXT brings the next card off of shelf 5. The rule HAND has four subrules for NORTH, EAST, SOUTH, and WEST. Initially the subrule EAST is chosen. This subrule stores the card on shelf 12, changes the dispatcher setting to the next subrule, SOUTH, and goes to the rule NEXT, which brings the next card off of shelf 5. The process will continue, dealing the cards from shelf 5 in turn onto shelves 11, 12, 13, and 14, until there are no more cards on shelf 5. When this happens, the rule NEXT will bring a null constituent off the shelf leaving an empty workspace. The left-half of the rule HAND will fail, and the routine will terminate with a + go-to. Note again that the rule with several subrules has only one left-half, written in the first subrule. A rule may have up to 36 subrules.
§6.1.3 Random Choice In certain types of programs, the introduction of a random element is essential. In COMIT, there is a feature that provides for the random choice‘of subrules during execution of a program. When execution of a program begins, the initial state of the diSpatcher is that it contains a logi- cal subscript for each rule name or subscript name that has subrule names or value names associated with it. But none of these dispatcher subscripts has any values; so the dispatcher contains no information for choosing among subrules during execution. In this situation, the choice of subrule to be executed is determined automatically by a simple internal mathematical routine that supplies a sequence of "pseudo-random" numbers. Since the routine is deterministic, it will supply the same sequence of pseudo-random numbers each time the COMIT program is run - this amount of predictability is a great aid in finding programming errors.
§6.1.3.1 An example of the use of the random choice of Subrules is the following routine to shuffle a deck of cards. We assume that each card is repre- sented by a constituent and that the deck has been cut, part of the cards being on shelf 2 and the remainder on shelf 4. The routine shuffles these two partial decks together and places them on shelf 3.
SHUFFLE LEFT $ = // *NZ 1 * RIGHT // *N4 1 * fl * $1 = // *Q3 1 SHUFFLE * $ = $¢ + $® // *A2 1, *A4 2, *Q3 1 2 +
§6.1.3.2 Another example of a routine involving random choice is a simple Markov source, which produces a sequence of symbols with statistical properties characteristic of the source. The state diagram of one Such Markov source is given in figure 6, with the output symbols and transition probabilities indicated.
§6.1.3.3 The program is given below. The output symbols are written on the monitor tape.
S0 1 $ = A // *WAMI S1 2 = B // *WAMI S2 3 = B // *WAMI S2 S1 1 $ = A // *WAMl S0 2 = A // *WAMl S2 S2 1 $ = A // *WAMl S0 2 = A // *WAMl S0 3 = B // *WAMl S1
§6.1.4 Limited Random Choices Suppose we have the following rule in a program:
FLAVOR A $ = VANILLA * B = CHOCOLATE * C = STRAWBERRY * D = COFFEE * E = ORANGE * F = LIME *
We have seen that if the dispatcher has not been set, there will be a random choice from among the six subrules; and if the dispatcher has been set to one of the subrules, say by a dispatcher instruction FLAVOR E, then that subrule is selected. There is a third possibility; the dispatcher can be set to several of the subrules and the random choice will then be limited to those subrules. If the dispatcher were set to FLAVOR C E F, there would be a random choice of strawberry, orange, or lime.
§6.1.4.1 When a dispatcher instruction is executed, the new dispatcher entry is merged into the entry already in the dispatcher in the same way that subscripts are merged. Let us assume that the dispatcher contains FLAVOR A B C.
If a dispatcher instruction FLAVOR B C D is executed, the dispatcher will contain the values in common, FLAVOR B C. But if the dispatcher instruction is FLAVOR A B or FLAVOR D E, the dispatcher will contain the new values in either case. The dispatcher subscript may be cleared for a completely random choice again by the entry FLAVOR, with no values indicated. Therefore, to reset a dispatcher subscript when it is not known what values it has, two entries should be used, one to clear the subscript and the other to reset it: FLAVOR, FLAVOR B C D. The routing instruction *D- will clear the complete dispatcher.
§6.1.5 Merging Dispatcher Subscripts onto Workspace Constituents The righthalf subscript instructions
/ __SUB__*D / $*D
are similar in operation to the right-half subscript instructions
/ __SUB__*__j__ / $*__j__
(see section 5.2.6) except that the subscripts are carried over and merged onto the workspace constituent from the dispatcher instead of from the constituent having the relative constituent number j.
§6.1.6 Merging Subscripts into the Dispatcher The following COMIT rule will merge into the dispatcher all of the subscripts on the indicated workspace con- stituent.
An example of the use of a rule like this is in the following routine that obtains the spelling of a verb from a dictionary and adds the correct inflection. We assume that the workSpace contains a constituent that specifies, by means of subscripts, which verb is desired and how it should be inflected.
A/VERB C, INFL lNF
The routine first sets the dispatcher for these values. The verb stem is obtained from the rule VERB. The correct inflectional category is indicated in the go-to, where control passes to one of the two INFL rules. (More than two would actually be needed for English.) Finally, the exit is to rule R, and the workspace contains the correctly inflected verb.
* $1 = // *D1 * VERB A $1 = LIK B B = WALK A C = LOOK A D = BOR B . . . . . . A * INFL INF $1 = R 3RD = 1 + S R PAST = 1 + ED R PART = 1 + ING R B * INFL INF $1 = 1 + E R 3RD = 1 + ES R PAST = 1 + ED R PART = 1 + ING R
§6.1.6.1 It is worth noting the use of rules A and B with asterisk go-to’s and no center sections. They serve merely to direct the flow of control to one or the other of the two rules named INFL. Although it is legal to have two or more rules by the same name, as in the case of INFL, this name cannot be used as a go-to without causing an undefined go-to error. It is also worth noting that no confusion is caused if the same name is used as a rule name and a subrule name, or as a subscript name and a value name.
6.2 Fast Search of Left-Halves
§6.2.1 List Rules Consider the following program for word-for-word substitu- tion. We assume that the workspace contains a German sentence. Each letter is a constituent and words are separated by - characters, representing spaces. There is no punctuation, but a - at the beginning and at the end. It is desired to look the words up in a list, substitute English "equivalents," and print them out.
A - + $ + - + $ = // *WAM1, *KZ, *Q5 3 4 DICT * STOP DICT IST = IS B * DER = THE B . . . B $ = // *WAM1, *A5 1 A STOP *
§6.2.1.1 What we are really doing in this program is searching through a number of rules for a left-half that matches the workspace. The trouble with the program is that it will run quite slowly if the list is very long. In COMIT, there is a facility for writing an equivalent program that runs quite rapidly:
A - + $ + - = // *WAMl, *L2 DICT * STOP -DICT IST = IS B DER = THE B . . . B $ + - = // *WAM1 A STOP *
§6.2.1.2 In this program, the first rule locates a word as a string of characters between spaces. The first space is written out. The routing instruction *L2 says to compress these characters temporarily into one constituent without subscripts and look it up in the list mentioned in the go-to DICT. The list rule -DICT is distinguished by a hyphen (or minus sign) in column 1. The left-halves of the various list subrules are single constituents. Each list subrule can have a normal right-half, routing, and go-to. If the temporarily compressed constituent cannot be found in the list, the original uncompressed constituents are restored to the workspace complete with their subscripts, and control goes to the rule after the list rule (rule B).
§6.2.1.3 This program, written with a list rule, will run much faster than the other program because the COMIT compiler automatically sorts the left-halves of the list subrules into alphabetic order. Then whenever a constituent is looked up in a list, COMIT can use a fast search procedure. If there are more than a few items in the list, the list-rule method is to be preferred.
§6.2.1.4 If a routing section contains several instructions, the *L must be the one written last. This is true also of the *X exchange instruction; it too must be written last, but there would never be an occasion for using both in the same routing section. It is important that a list rule never be entered without having just executed a *L instruction to specify what is to be looked up in the list. Thus it is an error to enter a list rule by left-half failure in the preceding rule. It is good programming practice to anticipate this problem by always writing a special rule above the list rule to catch the flow of control in case of left-half failure. In the example, the rule between A and -DICT will do this.
§6.2.1.5 There is no limit to the length of a list rule except the number of memory registers available in the core memory of the computer. A list rule to look up words in a dictionary and tag each with a logical subscript and values could contain up to 4,000 or 5,000 entries before running out of space in core memory. If it is important to have a large number of list subrules, space can be saved by using an asterisk go-to in each subrule rather than any other kind, since the asterisk go-to saves one register of computer memory per list subrule over any other type of go-to.
§6.2.1.6 The only problem introduced by using the asterisk go-to in list subrules is that control will go to the next rule whether the item is found in the list or not. Therefore, unless it is known that items looked up in the list will always be found, a test should be added to the program to determine whether the item had been found or ngt. For example, if the item to be looked up originally consists of two or more constituents, the test $0+$1+$0 for a single constituent would fail if the item had not been found. As another ex- ample, $0+$1/-$ would fail if the item looked up had not been given a subscript by a list subrule right-half.
§6.2.2 Input Control of Program A user who has a specific and narrow set of tasks in mind will often write, in a compiler language, a program that is general enough to do all of the tasks of interest to him in response to brief instruc- tions that he invents for the purpose. An interesting example of how this can be done uses a list rule and shelves. Suppose one has some elementary special-purpose COMIT closed subroutines for processing particular types of data. These routines have been named READ, WRITE, INSERT, REMOVE, SKIP, BACK, and LOOK. In addition, there are some higher-level closed subroutines that call some of these elementary subroutines using the A+B type of go-to. The higher-level rou- tines might be called EDIT, PROCESS, BRING, SAVE, PREPARE, FINISH, and so on. The user would like to treat these words as a higher-level programming language so that he can write a program, in terms of these words, for each different data set. The following COMIT routine allows him to do this. He can punch his higher-level program on cards using these words, stack these cards behind the END card of his COMIT program, and follow with a data set.
D $ = // *Q23 1 WORD+INTERPRET INTERPRET $1 = // *L1 F * $ = // *A231, *S01 + -F READ = A/READ D WRITE = A/WRITE D INSERT = A/INSERT D REMOVE = A/REMOVE D SKIP = A/SKIP D BACK = A/BACK D LOOK = A/LOOK D EDIT = A/EDIT D PROCESS = A/PROCESS D BRING = . SAVE = . PREPARE = . FINISH = . * ERROR
§6.2.2.1 The above routine is entered at rule D with an empty workspace; thus the first time through, the shelving instruction has no effect. Control goes to a closed subroutine WORD, not shown, This subroutine will read the cards following the END card, using input instructions that will be described later, and will return control via a + go-to to the rule INTERPRET. We assume that when control returns from WORD, there will be a word, compressed into one constituent, in the workspace; but if the end of the user’s program of words has been reached, the workspace will be empty. The rule INTERPRET will cause the list rule to be entered to look up the word. Each list subrule replaces a word by one or more constituents with logical subscripts. The point of this is that after all the words of the user’s program have been translated, they are placed on shelf zero where the subscripts can be used as go-to names when a + go-to is executed. This process is started by the + go-to in the third rule. In this way the user’s program controls the flow of control in the COMIT program. If the user misspells a word or uses a word not in the list, the list rule fails and control goes to the next rule and subsequently to ERROR.
§6.2.3 List Rule Execution In order to clarify the exact operation of the list rule, it is worthwhile to consider how the relative constituent numbers are assigned and changed during the execution of the rule with the *L instruction and during the execution of the list rule itself.
§6.2.3.1 Suppose the workspace originally contains
A + B + C/D + E + F/G + H + I + J
and the rule with the *L instruction is
* B + $2 + $2 + $1 = 1 + 3 + 2 + 4 // *L2 3 LIST
After the execution of the right-half, the workspace will be rearranged and assigned right-half relative constituent numbers as follows:
A + B + F/G + H + C/D + E + I + J [1] [2______] [3______] [4]
It is the constituents numbered 2 and 3 that are specified to be looked up in the list. The *L2 3 instruction marks these constituents, which we will call the key.
F/G + H + C/D + E [key ]
The *L2 3 instruction then compresses copies of the constituents to form what is called the list long symbol (1.s.), which is to be used in comparing with the left-halves of the list subrules. The list long symbol would then be
FHCE [1.s.]
The go-to is then executed, and control goes to the list rule itself.
§6.2.3.2 In the list rule, the long symbol is compared, by means of a fast search, with the list subrule left-halves. If there is no match between the long symbol and a list subrule left-half, the long symbol disappears, the workspace remains as it was before the list rule was entered, and the list rule fails, control going to the next rule.
§6.2.3.3 But if the long symbol matches one of the list subrule left-halves, the right-half, routing, and go-to of this subrule are executed. A number 1 written in the right-half refers to the long symbol; but in the absence of a right-half, a number 1 written in the routing refers to the key. There are thus several important cases to consider.
§6.2.3.4 If the list subrule has no right-half or routing:
FHCE = R
the workspace will be left as it was before the list rule was executed and control will go to the rule R.
§6.2.3.5 If the list subrule has a rOuting but no right-half:
FHCE = // *S23 1 R
the routing instruction will move the key to shelf 23, which would then contain
F/G + H + C/D + E + . . .
and the workspace would be left with
A + B + I + J
§6.2.3.6 If the list subrule has an exchange instruction in the routing and
no right-half:
FHCE = // *X23 R
the workspace will be left as it was before the list rule was entered, and then the entire contents of the workspace will be exchanged with shelf 23.
§6.2.3.7 At this point it is convenient to introduce a new go-to, the $ go-to. It operates something like the + go-to except that it refers to the workspace constituent that has the relative cfinstituent number 1 instead of shelf 6 and the constituent is not deleted. Control goes to the rule with the same name as the name of the logical subscript on this constituent. The subscript may have been in the workspace, it may have been introduced from the right-half, or the constituent may have been introduced by a shelving instruction in the routing.
§6.2.3.8 If a list subrule has no right-half or routing but a $ go-to:
FHCE = $
the workspace will be left as it was before the list rule was executed, and the $ will refer to the first constituent in the key:
F/G + H + C/D + E
which contains the logical subscript G. Control will go to rule G.
§6.2.3.9 If a list subrule has a right-half, the key will be replaced by what- ever is written in the right-half. If the number 1 is used in the right-half, it will refer to the list long symbol, which is the same as the list subrule left-half.
§6.2.4 List Rules for Sorting The subject of efficient sorting is a very complex one, and one that has developed a considerable literature. Section 5.2.9 gave a simple interchange method of sorting according to numerical subscripts. Additional methods of sorting in COMIT using numerical subscripts have been described.
§6.2.4.1 If the items to be sorted are not already tagged with numerical sub- scripts that can be used as sort keys, the list rule provides a ready means for converting alphabetic characters in constituent symbols into numerical subscripts. Since numerical subscripts can encode up to 15 binary digits, it is possible to encode two or three characters in one numerical subscript. To do this, the characters are looked up one at a time in a list rule and provided with numerical subscripts within the range of zero to n - 1, where n is the number of characters in the alphabet used. Then the subscripts for the several characters are combined by successive multiplication by n and addition of the subscript for the next character. For multiplication see section 10.0.8.
§6.2.4.2 Alternatively the list rule can provide the nucleus for very efficient sorting methods that do not utilize numerical subscripts. One scheme is to bring each item to be sorted into the workspace and look up one of the characters in the tag or sort key. The whole item is then thrown on a shelf determined by the result of the list-rule lookup. Repetition of an inner loop that sorts each item on one character results in the items being sorted onto n shelves. They can then be brought off of the shelves in the proper sort order and the process repeated, sorting on the next character.
§6.2.4.3 As an illustration, the following inner loop of a sort program will sort items according to fixed-length keys using a scheme of working through the key characters in right-to-left order. Assume that each of the shelves onto which the items are to be sorted has been initialized by a constituent having the shelf number as a numerical subscript (section 5.1.2.1). Items to be sorted are separated by - characters. Each item consists of first a key of n characters, then a . character, and then an entry compressed into one constituent.
- + K + E + Y + . + ENTRY + -
The inner loop of the routine follows. Start at rule A with the file to be sorted in the workspace.
A $1 + $2 + - + $ = // *S3 4 3, *K2, *L1 B * (finished sorting on this character position) -B A = // *X11 C B = // *X12 C . . . . . . - = (finished sorting) * (character not in list) C $1 + $ = // *S*1 2 1, *A3 1 A
§6.2.4.4 Other somewhat similar routines can easily be developed for left-to-right sorting and for sorting with variable length keys.
§6.2.4.5 Such schemes can also be used for numerical sorting where the numerical key is in the form of constituent symbols instead of numerical subscripts. In this case, since there are only 10 characters (digits) in the alphabet, it is possible to sort on two at once and use 100 shelves with a corresponding improvement in speed and efficiency.
§6.2.4.6 It is also possible to use a variation of this scheme to sort with a complex or arbitrary sort order or collating sequence, or one that changes frequently. The actual collating sequence can be varied by varying the order in which the shelves containing the sorted items are emptied when the items are gathered up after sorting. This order can be variable, under the control of a string of constituents on a pilot shelf. Each constituent would have a different numerical subscript representing a sort shelf, and the order of the constituents on the pilot shelf would control the order in which the shelves were emptied.
6.3 Summary of New Expressions
- $ go-to
-
go to the rule named by the subscript of the first constituent in the group having the relative constituent number l.
- \// SUB V1 V2
-
merge into the dispatcher the subscript SUB with the values V1 and V2.
- \// SUB- V1 V2
-
merge into the dispatcher the subscript SUB with all values except V1 and V2.
- \// SUB
-
merge into the dispatcher the subscript SUB with no values (clear it for the subscript SUB).
- \// SUB-
-
merge into the dispatcher the subscript SUB with all values.
- \// \*D-
-
clear the dispatcher.
- \// \*Dn
-
merge into the dispatcher the subscripts on the constituent with relative constituent number n. i
- \// \*Lcns
-
look up in the list rule named in the go-to a list long symbol formed from the workspace expressions referred to by the consecutive relative constituent numbers cns.
- / _SUB*D
-
a right-half expression. Merge into the workspace constituent the subscript SUB with the values it currently has in the dispatcher.
- / $*D
-
a right-half expression. Merge into the workspace constituent all the subscripts represented in the dispatcher and having the values they currently have in the dispatcher.
6.4 Problems for Chapter 6
-
The workspace contains a number of constituents. Write a routine that will put them all on shelf 23 and leave a constituent *A in the workspace with a subscript NUMB EVEN or NUMB ODD, depending on whether the number put on the shelf is even or odd. Assume EVEN and ODD are the only values ever as- sociated with the subscript NUMB. (Answer)
-
Write a test to see if the dispatcher has at least the value A for the subscript SUB. (Answer)
-
Write a test to see if there are no values on the subscript SUB in the dispatcher. (Answer)
-
There is a rule MAYBE with subrules A, B, C, D, E, and F; and another rule TRAP. It is not known how the dispatcher has been set. Write a routine that will go to MAYBE if the dispatcher has been set for at least one or more of the values A, B, and C; and arrange it so that the rule MAYBE will choose at random between subrule D and whichever of the values A, B, and C had been originally set in the dispatcher. If neither A nor B nor C had originally been set, go to TRAP. Shelf 23 is empty and may be used if needed. (Answer)
-
Write a routine that will send control to YES with a probability of 0.6 and to NO with a probability of 0.4. (Answer)
-
A company is looking for a trade name. Write a routine that will generate at random three-letter words on the consonant-vowel-consonant pattern, using only the letters A, B, C, D, E, F, G, L, O, U, and write them out. Use subrules and the
go-to. (Answer) -
Suppose you have written a closed subroutine STPR1, and you desire to test it. Write a program that will test the subroutine by entering it five times, each time with a different character A, B, C, D, or E, in the workspace for input data. The test routine is to put the input data on shelf 1, the output: data from STPR1 on shelf 2, and call a closed subroutine PRINT each time. Indicate where the subroutines STPR1 and PRINT are to be inserted. (Answer)
-
Each of the five shelves 21, 22, 23, 24, and 25 has a constituent at its left end carrying the shelf number as a numerical subscript. Write a closed subroutine QUEUE that will queue the entire contents of the workspace onto one of the shelves, depending on whether the first constituent in the workspace is ONE, TWO, THREE, FOUR, or FIVE, respectively. Go to EMPTY if the workspace is empty and to ERROR if the first constituent is something else. Use a list rule. Shelf 20 is empty and can be used. (Answer)
OUTPUT
§7.0.0.0 Up to this point we have dealt with data in the workspace, where it is manipulated, and on the shelves, where it is out of the way until wanted. We have yet to discuss in detail how to get data out of the computer so it can be read or used; that is the topic of this chapter. In a later chapter we will cover various methods of getting data into the computer.
7.1 Character Representation
§7.1.0.0 The COMIT system has been designed to offer the programmer as much flexibility and convenience as possible in arranging his data and printing it out, or punching it on cards. It is desirable to be able to print out any of the characters available on the printer. This means that all the available characters have to be represented in the workspace. The following is a list of the complete set of (IBM 7094 FORTRAN) characters that a COMIT program can handle as input data or output data:
A...Z . , 0...9 + - = * $ / ( ) blank end of card or line
If these characters are all to be handled by COMIT, it must be possible somehow to represent them all in workspace constituent symbols and to refer to them all by constituent symbols in the rules of the COMIT program. But a prob- lem arises since some of the characters have been set aside for special uses in the rule, notably the characters + = / * and some others, particularly the numbers 0 - 9. How then will it be possible to write a rule to refer unambiguously to these characters in the workspace and still be able to use them unambiguously in their special COMIT meanings? For example, suppose one wanted to insert into the workspace a constituent with the character 1 as its symbol.
How could this be done since the rule
$ = 1
means something else? The list of characters that pose problems of this sort is:
0...9 + - = * $ / ( ) blank
§7.1.1 Transliteration The solution has already been hinted at when it was pointed out that a space (blank) is represented in the workspace by a hyphen and that the characters + ( and ) are represented in the workspace as *+ *( and *) respectively. There is a transliteration that takes place automatically when characters are read into the workspace as constituent symbols from punched cards or other external media. Then, during output, when the workspace characters are printed out or punched on cards, the reverse transliteration takes place automatically. The above list of problem characters thus need never be written in the constituent symbol of a rule. Instead We use a hyphen or minus - for the blank and write double characters for the others, formed by prefixing asterisks to them as in the following table:
*0...*9 *+ *- *= ** *$ */ *( *) \-
The required automatic translation of input characters to their representation in workspace constituent symbols, and the reverse automatic translation from the workspce representation to output characters is given in the following table.
§7.1.2 Format A Transliteration Table
Input |
Constituent Symbol |
Output |
(Workspace or rule) |
||
A…Z |
A…Z |
A…Z |
. , |
. , |
. , |
0…9 |
*0…*9 |
0…9 |
+ - = * |
+ *- *= * |
+ - = * |
$ / ( ) |
*$ */ *( *) |
$ / ( ) |
blank |
\- |
blank |
end of line |
*. |
end of line |
§7.1.2.1 The transliteration leaves the letters A … Z and the period and comma unchanged. A blank and an end of line are represented in the workspace as - and *. respectively. The ten digits and the remaining eight special characters are represented in the workspace with a prefixed asterisk that can be read as "real." Thus the right-half = *1 would mean to insert into the workspace a real one, that is, a constituent whose symbol would be printed out as 1. The rule for output transliteration is even simpler: The characters - and *. become space and end of line; all double characters lose their asterisks; all other characters are unchanged.
§7.1.2.2 It is only in constituent symbols that this transliteration is needed. The previously described conventions for names still apply: Rule names, subrule names, subscript names, and value names are up to 12 characters in length and are composed from the letters, numbers, and, in medial position, periods and hyphens. Asterisks never appear in names.
§7.1.2.3 It is worth noting that when a constituent symbol in the workspace containing double characters is expanded by executing the routing instruction *En, each double character remains intact. Thus the workspace constituent symbol
AB*1*2*+*3*4
would become when expanded
A + B + *1 + *2 + *+ + *3 + *4
In addition, when any workspace constituents with subscripts are expanded or compressed, the Subscripts are lost. If it is desired to keep the subscripts, they can be saved before expanding or compressing by duplicating the constitu- ents or carrying over the subscripts to other constituents using /$*j in the right-half.
§7.1.2.4 The COMIT programmer obtains an additional bonus from the transliteration scheme. An extra alphabet *A … *Z is defined for use in the workspace and in the rules. These characters cannot be introduced into the workspace from input data in format A, C, or T. If they appear in the workspace, they must have been inserted from the right-half of a rule. These double characters have a special important use. They can be used as markers in the workspace for later unambiguous reference in a left-half, since they cannot possibly be con- fused with the input data that is being processed.
§7.1.2.5 Normally COMIT makes no distinction between the minus sign (an 11 punch on the card) and a hyphen (a 4 and 8 punch on the card). This avoids con- fusion since these characters print the same. Some key punches and printers, however, have been changed to print an apostrophe instead of a hyphen. In order to use the apostrophe as an extra character, one merely has to use a COMSET card with the word APOSTROPHE on it and a blank in column one. This card should be inserted between the COM card and the first rule. The apostrophe is represented as *' in a constituent symbol.
§7.1.2.6 As an extra convenience to COMIT programmers who deal extensively with numbers in constituent symbols, a special abbreviated notation is avail able that reduces the number of asterisks that have to be written in the left-half and the right-half. In a left-half or right-half constituent symbol, it is only the left-most character, if it is a number, that could be confused with the mention of a relative constituent number. Therefore the requirement of writing asterisks in front of "real" numbers in a rule is relaxed except in the case of a number that is the left-most character of a constituent symbol. Numbers in the workspace still retain their asterisks, and the full notation is also still permissible in the rule. Thus the following pairs are synonymous when written as constituent symbols in a rule, and either may be used:
*l*2*3 *123 A*1*2*3 A123 *A*l*2*3 *A123 *1*2*3A *123A *1*2*3*A *123*A *(*1*2*3*) *(123*) *1*2*+*2*3*=*3*5 *12*+23*=35
(Editor’s note: This feature is not implemented in COMIT Revived.)
7.2 Output of Symbols
§7.2.1 Format A Output With the transliteration preliminaries out of the way, it becomes easy to describe how to program for output.
§7.2.1.1 Suppose the programmer wishes to have the output printer print his name at the top of his pages of printed output. He can use a rule like the following:
* $ = -JOHN-DOE*. // *WAM1 *
The abbreviation *WAM1 in the routing actually causes the writing. The first character to be written out is a - character, which, when it is written out in column one, is interpreted by the printer as a code that causes the printer to move the paper to the next line before printing. The last character to be written out is a *. double character, which is the code for end of line. When the whole constituent is written out it disappears from the workspace. On paper it will appear as
JOHN DOE
since the second - is written as a space.
§7.2.1.2 Suppose that a program has produced a lot of text in the workspace as follows:
- T + H + I + S + - + O + U + T + P + U + T + - + T + E + X + T + - + H + A + S + - + B + E + E + N + - + P + R + O + D + U + C + E + D + - + B + Y + - + A + - + C + O + M + I + T + - + P + R + O + G + R + A + M + - + A + N + D + - + I + T + - + I + S + - + D + E + S + I + R + E + D + - + T + O + - + W + R + I + T + E + - + I + T + - + O + U + T + - + I + N + - + L + l + N + E + S + - + O + F + - + A + B + O + U + T + - + F + O + R + + Y + - + F + I + V + E + - + C + H + A + R + A + C + T + E + R + S + - I + N + - + L + E + N + G + T + H + + -
The following rule will form the text into lines averaging about 45 characters in length.
* $43 + $ + - = 1 + 2 + 3 + *. // *WAM1 2 4 / * $ = 1 + *. // *WAM1 2 *
The $43 will find the first 43 characters. The $ will find all the rest of the characters up to the next - between words. The right-half adds an end-of-line symbol. The write instruction writes out the first, second and fourth constitu- ents mentioned in the right half. The - is left in the workSpace so that it will be at the beginning of the next line where it will direct the printer to move the paper before printing that line. On its first two applications, the rule will produce the following output:
THIS OUTPUT TEXT HAS BEEN PRODUCED BY A COMIT PROGRAM AND IT IS DESIRED TO WRITE IT OUT IN
It is noted that in format A writing all of the + signs between constituents have been deleted. Subscripts will also be lost, and symbols will be trans- literated as has been explained.
§7.2.1.3 It is not necessary, however, for the COMIT programmer to explicitly divide his text up into lines by counting characters. The following rule would put out the entire text, with COMIT automatically breaking it into lines of convenient length as will be explained in the next section.
* $ = 1 + *. // *WAM1 2 *
§7.2.1.4 We can summarize here some of the output instructions before going on to a more detailed description of how they work and the kinds of flexibility they afford.
*WAM1 2 │││─┬─ │││ │ │││ └── refer to right-half relative constituent numbers ││M means monitor output tape ││I means output tape for punching ││L means on-line printer ││P means on-line card punch │A means format A, treated here │S means format S, treated shortly *W means write output
(Editor’s note: In COMIT Revived, only format A is implemented; format S was very tied to the IBM 704 batch environment. Channel M is standard error, all others go to standard output.)
§7.2.1.5 In the above illustration, the relative constituent numbers used refer to workspace constituents by their right-half relative constituent numbers. The numbers may be used in any order, and the material is written out in the order indicated. The channel letters M, l, L, and P are the four that are most frequently used for output, although other tape units or output devices may occasionally be referred to by other COMIT channel letters. The usual way to pro- duce printed output is to write on channel M, which designates the magnetic tape that COMIT uses as a monitor output tape for compiler and interpreter error comments. Use of channel M is convenient, for then all printed output is together. Channel I designates the output tape used to prepare punched cards and would be used if the programmer desired some of his results in the form of punched cards. Channels L and P, for the on-line printer and punch, are less often used because their extensive use slows the computer down unduly. Usually it is preferable to write the material quickly on tape and then later have it printed or punched from the tape while the computer goes on to other programs.
§7.2.2 Line Length A single format A write instruction might refer to any number of workspace constituents and could result in sending to the output as little as a single character or as much as a dozen or more pages of text. In case the material sent to the output does not make up a full line, it may re- quire the execution of a write instruction several times before a line is completed. Since most of the output equipment operates on a line-by-line basis, rather than character by character, COMIT keeps output material of less than a line length in an area of storage called an output buffer. When the line is complete, the contents of the buffer are written out on the appropriate output device. The function of the special workspace double character *. (end of line) is to cause COMIT to print out as a line the current contents of the output buffer.
7.3 Output of Constituents
§7.3.1 Format S Output We have seen how in the case of format A output, plus signs and subscripts are removed and constituent symbols are transliterated in a scheme that allows any character to be printed at any point on the page. Thus the programmer achieves output flexibility. There are times, however, when the programmer would rather print out or punch out material from the workspace in workSpace format, untransliterated, complete with plus signs and all subscripts. For this, format S output is provided. Format S output is a convenient way of looking at what is going on in a program during debugging. It also has a use in storing workspace material temporarily on cards or tape for later reading by a COMIT program. For example, the following rule will print out a copy of the contents of the workspace and leave the workspace undisturbed:
* $ = 1 + l // *WSM2 *
If the workspace contained
A + B/.3, SUB VALl VAL2 + C + *.
the rule would print
A + B/.3, SUB VALl VALZ + C + *.
whereas, if format A output had been specified instead of format S, the output would have been
ABC
§7.3.1.1 Since format S output does not involve transliteration, any *. double characters that may occur in constituent symbols are not interpreted as end-of-line indicators. The length of line is entirely under the control of the format S bell and margin settings. The standard values are the same as for format A, namely, 108 and 120 for printing (channels M and L) and 60 and 72 for punching (channels I and P). However, format A and S do differ on channels other than M, L, I, and P; format A standard settings on other channels are 108 and 120, appropriate for printing, and format S settings are 60 and 72, appropriate for punching.
(Editor’s note: COMIT Revived does not support Format S. The above is included for its historical interest only.)
7.4 Some Output Subroutines
§7.4.0.0 As an illustration of the use of the output instructions, several typical output subroutines will now be described.
§7.4.1 Output of Text Suppose one has a program that generates sentences. The sentences may be of any length and may take up one or more lines. It is convenient to use a shelf to accumulate the words of a sentence as they are generated; then, when the sentence is finished, control is to be sent to a closed subroutine WRITE that will write out the sentence. For this example we will assume that the normal bell and margin settings can be used, for no word will be more than 12 characters. The subroutine is
WRITE $0 = - + 1 + *. // *A23 2, *WAM1 2 3 +
It is to be noted that a space character is put in column one to cause the printer to move to the next line before printing the first line of the sentence; and a *. is provided to end the last line. Each sentence will begin a new line but will automatically continue onto as many lines as needed.
§7.4.2 Centering Captions Suppose now that the workspace contains a short caption of a few words. The caption is expanded so that each constituent is a letter. It is desired to have a closed Subroutine CENTER that will write out this caption centered at the top of the next page. Two solutions are presented here. They both require that shelf 23 have on it half as many spaces as there are characters in a full line. The following is a closed subroutine to do that
little job-- put 60 spaces on shelf 23.
60-SPACES $ = ------------ * * $ = 1 + 1 + 1 + 1 + 1 * * $ // *El, *Q23 1 +
§7.4.2.1 The first solution to our subroutine CENTER is as follows:
CENTER $ = - + 1 * * - + $2 = // *N23 1, *023 2 / * - + $ = *1 + 1 + 2 + *. // *A23 2, *WAMl 2 3 4 +
In this routine, one space is taken off the shelf every time two characters of the caption are added to the right end. In the last rule, everything is brought from the shelf and written out with a leading *1 to cause the paper to feed to the top of the next page.
§7.4.2.2 The second solution works on the same principle, but uses the compress instruction:
CENTER $2 = // *Kl, *Q23 1 / * $0 = // *A23 1 * * $ + $60 + $0 = *1 + 2 + *. // *WAM1 2 3 +
This works because the number of leading spaces needed to center a caption of n characters is half the line length minus half the number of characters in the caption, in this case 60 - n/2. The $60 counts out 60 constituents: one for each two characters in the caption and enough leading spaces to make up 60.
§7.4.3 Aligning Center Words Our next example is the print routine for a KWIC program (key word in context), a special index or concordance program. The input to such a program is a text to be studied. The program searches the text for occurrences of certain specified key words. Then it prints out each occurrence found, together with its surrounding context, for study. The closed print subroutine shown below writes out one IDS-character line consisting of 50 characters of the preceding context, the key word starting in character position 51, and with the rest of the line filled with following context. The routine assumes that the search part of the program has left at least 50 characters of preceding context on shelf 23 and the key word and its following context, at least 55 characters, on shelf 24.
KWIC-PRINT $ = // *A23 1 * * $ + $50 + $0 = - + 2 // *WAM1 2, *A24 1 * * $55 = 1 + *. // *WAM1 2 +
The second rule of this routine sends the first 50 characters of the line to the output buffer. The third rule follows it with the rest of the line and an end-of-line character.
§7.4.4 Printing Columns The last example is a routine COLUMNS that will print out tabular material. Assume that the material to be printed in columns con- sists of words, compressed one word per constituent. Four columns are desired, each of 15-character width. The material has already been gathered onto shelf 23 in such a way that the first four constituents go on the first line, the second four go on the second line, and so on. The problem is to provide the proper number of spaces between items so that the words will start uniformly in columns 1, 16, 31, and 46.
COLUMNS $ = // *N23 1 * * $1 = 1 + --------------- // *EL 2, *Q24 1 COLUMNS * $0 = // *A24 1 * * $ + $-- + $14 + $ + $-- + $14 + $ + $-- + $14 + $ + $-- + $ + - = - - + 2 + 3 + 5 + 6 + 8 + 9 + 11 + 12 + *. - // *WAMl 2 3 4 5 6 7 8 9 10 /
The first two rules serve to insert more than enough spaces between each two items to fill up the space between a word and the one in the next column. It is worth noting that after the expand instruction operates, the expanded string of characters and blanks is left with the relative constituent number one. The shelf instruction then puts it all on the shelf. The fourth rule is hyphenated onto three cards for clarity. It merely counts out the number of characters for each column and throws away the extra spaces. The result is written out a line at a time.
7.5 Summary
§7.5.0.0 Transliteration from input to workspace: letters, period, and comma are unchanged; blank becomes -; end of line symbol *. added at right end; num- bers and remaining punctuation characters prefixed with * and become double characters.
§7.5.0.1 Transliteration from workspace to output: - becomes a blank character; *. signals the end of the line; the * is removed from all other double characters.
§7.5.0.2 Output instructions (ns refers to right-half relative constituent numbers)
*WAMns write in format A on the monitor tape (printed output) *WAIns punched output tape *WALns on-line printer *WAPns on-line punch *WSMns S on the monitor tape (printed output) *WSIns punched output tape *WSLns on-line printer *WSPns on-line punch
For channels M and L, the bell is set at 108 and the margin at 120. For channels I and P, the bell is set at 60 and the margin at 72. For other channels, format A bell and margin are 108 and 120, format S bell and margin are 60 and 72.
INPUT SUBSCRIPT CONVERSION
§8.0.0.0 The facilities for data input in COMIT are extremely flexible, offer- ing the programmer many options. In this chapter, we will first describe the standard input facilities, without describing any of the special options. These standard facilities should be adequate for many applications. Second, there will be a detailed discussion of the various special input options available. Then the instructions for converting subscripts to symbols and symbols to subscripts in the workspace will be described. They are described at this point because they share some of the conventions used in input.
8.1 Standard Input Facilities
§8.1.0.0 Data to be read into the workspace is usually in the form of a deck of punched cards, but it may be material that has been written out onto tape or other storage medium by a program, or it may result from typing on a typewriter console. In any case, the data that is to be read is organized into records and files. A record corresponds to a line of typing and may consist of the information from one card, or it may represent a string of up to 3072 char- acters. A file consists of a group of records terminating in an end-of-file indication. The end-of-file indication may be the physical end of a deck of cards, it may be a card with a special punched pattern that varies from one computer installation to another, or it may be a distinctive mark on tape. Whatever the end-of-file indication, COMIT reacts to it in the same way: The read instruction fails, and control goes to the next rule.
(Editor’s note: In COMIT Revived, a record is a text line terminated by newline. The 3072 length limit is not enforced.)
§8.1.1 Format C Input One of the easiest ways of providing a COMIT program with data is to punch the data on cards and place these cards right after the END card of the program. Suppose, as an example, the data consists of the two cards:
SOME TEXT $ MORE TEXT.
The following rules to read this data could be incorporated in the program:
READ $ = // *RCK1 ** * DONE
The first time the first rule is executed, it will read the first data card into the workspace. The workspace will then contain
S + O + M + E + - + T + E + X + T + - + *$ + *.
When the card is read, each character goes into the workspace as a single con- stituent, transliterated according to the scheme of chapter 7.
§8.1.1.1 The next time the rule is executed, the workspace would contain
M + O + R + E + - + T + E + X + T + . + *.
§8.1.1.2 Then the next time the rule is executed, it would encounter an end-of- file indication. The behavior of any read instruction on reading an end-of-file is the same. A null constituent is introduced into the workspace, and the rule fails in the routing, control going immediately to the next rule. The rest of the routing and the go-to are not executed.
§8.1.1.3 The read instructions are similar to the write instructions. They can be compactly listed as follows:
§8.1 Standard Input Facilities 140
*RCK1 ││││ ││││ │││└────── refers to a right-half relative constituent number ││K means regular input, following the END card of the program ││R means on-line card reader ││ or other characters for other tape units, etc. │C means format C (cards) │A means format A (a character per constituent) │T means format T (text) │S means format S (subscripts) *R means read input
(Editor’s note: in COMIT Revived, K and R are both standard input. Only formats C and T are implemented.)
§8.1.1.4 In the above list, the number refers to a right-half relative constit- uent number, and it specifies where in the workspace the material is to be put that has been read. The channel letters K and R designate which input device is to be read. If K is used, the input cards can be stacked behind the END card of the COMIT program and read in from the same tape unit.
§8.1.2 Format A Input Format A input is exactly like format C input except that it reads only one character each time it is executed, instead of a card or record.
§8.1.3 Format T Input Format T input is an input format especially adapted for large quantities of text. It is fast. It reads a card or record just as format C does, but instead of bringing in each character as a separate constituent, it brings in words as separate constituents according to the following simple scheme: Unbroken strings of letters are compressed into single constituents. All blanks are deleted. Numbers and punctuation are brought in one character per constituent. The same transliteration takes place as in formats C and A. Many flexible options are available for defining how characters are to be grouped together.
§8.1.3.1 As an example, suppose we had the same two data cards as before:
SOME TEXT $ MORE TEXT.
The following rules would read this data in format T:
READ $ = // *RTK1 ** * DONE
The first two times the rule READ is executed, these cards would be read into the workspace as:
SOME + TEXT + *$ + *.
and
MORE + TEXT + . + *.
The third time it is executed, it would put a null constituent into the workspace; the rule would fail, control going to the next rule.
§8.1.3.2 In section 6.2.2 Input Control of Program, a subroutine WORD was assumed, which would read the cards following the END card, obtain the next word and return control with the word in the workspace, or return with the workspace empty if an end-of-file is met. The closed subroutine for this is as follows:
WORD $ = // *N23 1 * * $-*. = + * $ = // *RTK1, *823 1, *N23 1 + * +
§8.1.3.3 In this routine, the first rule brings the next item off the shelf. There are three cases: If this item is a constituent and not a *. end-of-record mark, it is assumed to be a word and control is returned with the word in the workspace. If the item is a *. or if there was nothing on the shelf, the next record is read in format T, put on shelf 23, and the first word is put into the workspace and control is returned. If an end-of-file is encountered, the read instruction puts a null constituent into the workspace and the rule fails.
8.2 Input Options
(Editor’s note: Much of what was in this section was closely tied to the IBM 704 environment and has been omitted. What remains is not implemented in COMIT Revived, and probably will not be; it is included for historical interest only. COMSET cards were facility to pass directives to COMIT before the program source)
§8.2.4 Format T Options Format T is especially rich in options. The available options operate by handling differently characters that belong to different categories. There are five categories, and each character is assigned to one category; but a character may be reassigned by a COMSET card. The five categories and the characters initially assigned to them are
LETTERS |
A … Z |
(normally compressed) |
NUMBERS |
0 … 9 |
(normally expanded) |
PUNCTUATION |
. , ( ) = + - ' / $ * |
(normally expanded) |
BLANK |
the blank |
(normally deleted) |
SPECIAL |
none |
To reassign a character to another category, a COMSET card is used that mentions the name of the category and the character or characters that are to be changed. to it. Thus to change the blank and the comma to the category called NUMBERS, one would use
NUMBERS BLANK ,
or to change the asterisk to the SPECIAL category, one would use
SPECIAL *
§8.2.4.1 During a format T read operation, unbroken strings of characters assigned to the LETTERS category are compressed into single constituents. This method of compressing words into single constituents is much faster than doing it by a COMIT program after format C input.
§8.2.4.2 The treatment of unbroken strings of characters of other categories depends on which options are in force, and options can be changed by COMSET cards. Strings of characters assigned to the NUMBERS category will be sent to the workspace either in expanded form with one NUMBER character per constituent or in compressed form with the string of consecutive NUMBER characters as one constituent. To change from the normal expanded-form option to the compressed-form option, the following COMSET can be used:
K NUMBERS
To change back to the normal option, the following COMSET can be used:
E NUMBERS
Unbroken strings of characters belonging to the PUNCTUATION category are handled similarly. The relevant COMSET cards are
K PUNCTUATION E PUNCTUATION
The expanded form is the normal option.
§8.2.4.3 There are four format T Options concerned with the treatment of blanks and characters assigned to the BLANK category. The COMSETS for them follow. The normal option
NO BLANKS
deletes characters belonging to this category. The COMSET
ONE BLANK
retains as a separate constituent only the first of a string of one or more BLANK characters. The COMSET
E BLANKS
retains each BLANK character as a separate constituent. Finally, the COMSET
K BLANKS
retains a string of one or more BLANK characters compressed into a single constituent.
§8.2.4.4 Normally, no characters are assigned to the SPECIAL category. The assignment of a character such as the asterisk to the SPECIAL category allows the handling of text punched according to the format of Newman, Swanson, and Knowlton. footnote[S. M. Newman, R. W. Swanson, and K. C. Knowlton, "A Notation System for Transliterating Technical and Scientific Texts for Use in Data Processing Systems," in Advances in Documentation and Library Science, ed. Jesse H. Shera (New York: Wiley, 1957-), vol. 3, Information Retrieval and Machine Translation, pt. 1 (1960), pp. 345-376.] The processing associated with SPECIAL characters takes precedence over any other format T options. It is discussed below.
§8.2.4.5 One of the ways of punching text on cards is to fill the first 72 columns of each card with text, breaking words without hyphens by running directly from column 72 of one card to column one of the next. The advantage of this method of punching is that hyphenation problems are eliminated and fewer cards are needed for a given text. The proper COMSET for reading such cards would be CHANNEL K IN MARGIN 72 FILL LOGICAL
§8.2.4.6 But when cards are punched full of text, it is difficult to make corrections that involve changing the number of characters. The first function of the SPECIAL character option is to implement a solution to this problem. Suppose we assign the asterisk to the SPECIAL category. Then, whenever the asterisk is followed by one or more spaces or the end of the physical record, it is deleted, along with the string of spaces. When a correction involves eliminating one or more characters, the new card is punched and an asterisk punched in column 72, or earlier, immediately after the shorter corrected text. Or if the correction involves adding characters, an extra card must be inserted with a few characters on it followed immediately by an asterisk and blanks. In either case the effect is as if the text were again continuous, for the asterisk and the string of blanks following it are deleted. The text after these deletions is then further processed. Specifically, a string of letters preceding the deleted characters, for example, is compressed with a string of letters following the deleted characters.
§8.2.4.7 The other aspect of the SPECIAL character option allows the special abbreviations and notations proposed by Newman, Swanson, and Knowlton that make C possible the complete encoding of printed material, including the indication of capitalization, italics, Greek letters, and so forth. For example, a single asterisk before a letter designates an uppercase letter and is not compressed with the letter. The symbols ( and ) mean, respectively, begin and end a string of capitalized words; C means colon; *= and *$ mean begin and end italics; Y means the next letter represents a lowercase Greek letter, and so on.
§8.2.4.8 The SPECIAL character option of format T makes it possible to create such markers as single constituents in the following way: A string of two or more SPECIAL characters is compressed as a constituent with the following LETTER, NUMBER, or PUNCTUATION character. A string of two or more SPECIAL characters followed by a BLANK or a *. is compressed into a single constituent after the last special character and any BLANK characters have been deleted. A single SPECIAL character is compressed with a following PUNCTUATION character. Other single special characters become single constituents except those followed by a BLANK or *., which are deleted as explained above.
8.3 Subscript Conversion
§8.3.0.0 At this point we introduce four very handy right-half subscript instructions. They are related to input and output because they involve transliteration and the same question of legality of subscripts that figures in format S input.
§8.3.1 Numerical Subscript to Symbol If the workspace contains
A/.286 + B/.100
and the following rule is executed:
* A + B = 1 + 2/*CNS1, .3, SUB *
the workspace will be left with
A/.286 + *2*8*6/.3, SUB
The abbreviation *CNSl means to convert the numerical subscript to a symbol from the constituent having a left-half relative constituent number 1, As is clear from the example, the numerical subscript is transliterated into the workspace notation for a symbol, and this symbol replaces the number 2 constituent in the workspace. If there is no such subscript, no operation takes place and there is no error message.
§8.3.2 Symbol to Numerical Subscript The reverse operation can also be per- formed. Suppose the workspace contains
*2*8*6 + A/.100
and the following rule is executed:
* $1 + $1 = 1 + 2/*CSN1, .I2 *
the workspace will be left with:
*2*8*6 + A/.288
The abbreviation *CSN1 means to convert the symbol to a numerical subscript from the constituent having the left-half relative constituent number 1. The symbol that is to be converted to a numerical subscript must be such that it will be a legal numerical subscript, that is, it must be from *8 to *3*2*7*6*7. It may have leading zeros: That is *0*0*0*0*6 and *0*0*0*1*2*3 would be legal.
§8.3.3 Logical Subscripts to Symbols Suppose the following constituents have been placed in the workspace by a right-half expression or by a format S input operation:
L/.5, A 1 2, B 1 2 + M
and the following rule is executed:
* $1 + $1 = 1 + 2/*CLS1, S V *
The workspace will be left with:
L/.5, A 1 2, B 1 2 + B/S V, 2, 1 + A/2, 1
The abbreviation *CLS2 means to convert the logical subscripts into symbols from the constituent having the relative left-half constituent number i. As can be seen from the example, each logical subscript becomes a constituent. Subscript values, if there are any, become subscript names on the new constituents, and thus these value names must also be found among the subscript names and rule names that exist in the program, or there will be an interpreter error. Any numerical subscript is not converted. This string of constituents replaces the currently evolving workspace constituent. If the *CLS instruction is followed by other subscript instructions, they operate on the first constituent of the string. If there are no logical subscripts in the workspace to be converted, no operation takes place and there is no error message.
§8.3.3.1 Note that the constituents formed from the subscripts in the above example are not in the same order that they were in before being placed in the workspace. As was explained in section 5.2.5, the order of subscripts on a con- stituent and the order of values in a logical subscript are not significant, so that A/S, T is the same as A/T, 8 when written in a rule. Actually, COMIT, for speed of search, keeps subscripts and values in the workspace in a standard order depending on the internal numerical code for their names. This order can be discovered by noting the order of subscripts and values as printed by a COMDUMP or with format S output. The order of the constituents obtained by converting two or more subscripts is the same as the internal order of the subscripts.
§8.3.4 Symbol to Logical Subscripts If the workspace contains
A-*l-*2,-B-*3-*4,.*2*5 + C/.5, SUB
and the following rule is executed:
* $1 + $1 = 1 + 2/*CSL1, D *
the workspace will be left with
A-*l-*2,-B-*3-*4,.*2*5 + C/.25, SUB, D, B 4 3, A 2 1
The abbreviation *CSL1 means to convert the symbol into logical (and numerical) subscripts from the constituent having the left-half relative constituent number 1. The symbol to be converted must be the proper workspace transliteration of a legal subscript expression. The numerical subscript must be within the correct range, and the logical subscript names and value names must be chosen from among those actually used in the program. The converted subscripts are merged with the other subscripts and values of the constituent.
8.4 Summary
§8.4.0.0 The following table summarizes the four convert instructions. In each case the constituent with left-half relative constituent number n remains unchanged.
- *CNSn
-
convert the numerical subscript from the workspace constituent with the left-half relative constituent number n to a symbol replacing the current evolving workspace constituent. If there is no such subscript, do nothing.
- *CSNn
-
convert the symbol of the workspace constituent with the left-half relative constituent number n to a numerical subscript and place it on the current evolving workspace constituent. (The number must be in the range from 0 to 32767 inclusive.)
- *CLSn
-
convert the logical subscripts of the workspace constituent with the left-half relative constituent number n to constituent symbols replacing the current evolving workspace constituent. If there are no such subscripts, do nothing. Convert the values of each of the logical subscripts to subscript names on the appropriate constituents. (The value names must be legal as subscript name in the program.) Additional subscript expressions after the *CLS operate on the first of these constituents.
- *CSLn
-
convert the symbol of the workspace constituent with the left-half relative constituent number n to logical and numerical subscripts and values, and place them on the current evolving workspace constituent. (The symbol must be the properly transliterated representation of a legal numerical and logical subscript expression that could be read in format S without error.) Conversion proceeds through the symbol until either the end of the symbol or a *+ is reached.
§8.4.0.1 Summary of input instructions (n refers to a right-half relative constituent number):
*RCK__n__ read in format C from the input tape *RAK__n__ A *RTK__n__ T *RSK__n__ S *RCR__n__ C from the on-line reader *RAR__n__ A *RTR__n__ T *RSR__n__ S
§8.4.0.2 Summary of input processing steps, in their logical order and the relevant COMSET cards:
For format A, send the record to the input buffer, from which one-character constituents go to the workspace, one for each read instruction executed.
For format C, send the record to the workSpace, one character per constituent.
For format T, execute the SPECIAL character options below and then send the record to the workspace.
If any characters have been assigned to the SPECIAL category:
SPECIAL *
Assign the * to the SPECIAL category. The normal option is that no characters are assigned to this category.
- One SPECIAL before LETTERS or NUMBERS:
- The SPECIAL becomes a constituent.
-
One SPECIAL before PUNCTUATION: The SPECIAL and one PUNCTUATION become a constituent.
- One SPECIAL before one or more BLANKs:
-
Delete the SPECIAL and string of BLANKS.
- One SPECIAL before end-of-record (*.):
-
Delete the SPECIAL.
- String of SPECIALS before LETTERS, NUMBERS, or PUNCTUATION:
-
The string of SPECIALS and the following character become a constituent.
- String of SPECIALS before one or more BLANKs:
-
Delete the last SPECIAL and string of BLANKs; the rest of the SPECIALS bei come a constituent.
- String of SPECIALS before end-of-record (*.):
-
Delete the last SPECIAL. The rest of the SPECIALS become a constituent.
Then the remaining options in format T: Compress all strings of LETTERS in single constituents.
E NUMBERS
Expand NUMBERS into separate constituents. This is the normal option.
K NUMBERS
Compress all strings of NUMBERS into single constituents.
E PUNCTUATION
Expand PUNCTUATION into separate constituents. This is the normal option.
K PUNCTUATION
Compress all strings of PUNCTUATION into single constituents.
NO BLANK
Delete all BLANK characters. This is the normal option.
ONE BLANK
In each string of BLANK characters, delete all but one.
E BLANKS
Expand BLANKS into separate constituents.
K BLANKS
Compress all strings of BLANKS into siigle constituents.
§8.4.0.3 Examples of some COMSET cards changing the assignment of characters to classes are
LETTERS 3 5 NUMBERS A C E PUNCTUATION BLANK BLANKS $ +
8.5 Problems for Chapter 8
-
What would be the best format T options to read cards containing 10 columns of numbers where it is desired to ignore sequence numbers? Assume normal options are in force. Give the COMSETS to change to the proper options for channel Z. (Answer)
-
It is desired to read cards with text, compress words into single constituents, and delete all other characters. What are the proper COMSETs for format T reading? (Answer)
-
The input tape has text on it. It is desired to use format T to read this text into the workspace one constituent per character, but abbreviated by re- moving all vowels, A, E, I, 0, U. Give the COMSETs. (Answer)
-
A word-count program has counted words of a text and left the results on shelf 4. For each different word found in the text, there is a constituent with the word as a symbol and a numerical subscript for the observed frequency of that word. Example: THE/.1089. It is desired to print the results with the numbers in columns 2 through 5 and the words starting in column 10. Write the routine. (Answer)
-
It is desired to read text in format T and then to convert each word to a subscript name. Write a test to determine whether a proposed word in the workspace (one constituent) can safely be converted to a subscript name. (Answer)
-
Write a closed subroutine NUMBER that will return with a new line number or page number as a constituent symbol in the workspace. Assume that the current number is kept on shelf 25 as a numerical subscript initialized as *N/.1. (Answer)
Appendix
Answers to problems for chapter 2
A: (Back)
* O + R = 1 + U + 2 *
B: (Back)
* O + U + R = 1 + 3 /
C: (Back)
* B = 0 /
D: (Back)
* B = C /
E: (Back)
* X + B = 2 + 1 *
F: (Back)
* B = X + 1 *
G: (Back)
* X + B = 2 + 1 /
H: (Back)
LOOP A = B * * A + X = LOOP
an alternative solution:
* A + X = B + 2 * * A + B = B + 2 /
Answers to problems for chapter 4
A: (Back)
* $0 + A + $0 = YES
B: (Back)
* $0 + $0 = EMPTY
C: (Back)
* $9 $2 + $- A = FIND
D: (Back)
* $0 + $2 + $-A = FIND
E: (Back)
* $4 + $1 = 1 *
F: (Back)
* $1 + $0 = 0
G: (Back)
* $-Z + $0 = 1 + Z *
H: (Back)
* L + $ + L + $1 = 1 + 2 + 4 + 3 *
I: (Back)
* $1 + 1 = 1 + 2 + X *
J: (Back)
* A + A + $ + $-A = l + 4 *
K: (Back)
* $0 = // *A23 1 *
L: (Back)
* $ + $0 = // *A23 2 *
M: (Back)
* A + $0 + B = // *N23 2 *
N: (Back)
* $ = 1 + 1 // *023 1, *024 2 *
O: (Back)
* $0 = // *A22 1, *A23 1, *Q22 1 *
P: (Back)
* $0 + $0 + $ = // *A22 1, *A23 2, *Q22 2, *Q23 1 *
Q: (Back)
* $0 = // *N23 1, *X23 *
R: (Back)
* $0 + $0 + $ = // *N23 1, *N23 2, *X23 *
S: (Back)
* $1 = // *S23 1 / * $ = // *A23 1 *
T: (Back)
* // *X23 * * $1 = // *S23 1 /
U: (Back)
Solution 1
* $ = // *A12 1 * * $1 + $ = 1 + 2 + NOT // *Q12 1 2 ** * $ = EMPTY *
Solution 2
* // *X12 * * $O + $0 = EMPTY ** * $ = 1 + NOT // *Q12 1 *
Solution 3
* // *Xl2 * * $1 = // *X12 ** * $ = EMPTY ** * $ = NOT *
Answers to problems for chapter 5
A: (Back)
* A/.GO = 1/.D1 LOOP
B: (Back)
* A/.G36, .L90 = LOOP
C: (Back)
First solution
* A/.L37 = LOOP * A/.G89 = LOOP
Second solution
* A/.G36, .L9O = ** * LOOP
D: (Back)
* $1 + $1 + $1 = SUM/.*1, .I.*2, .I.*3 *
E: (Back)
* $1 = 1/.1 * * $1 + $1 = 1 + 2/.*1, .I1 // *Q23 1 / * $0 = // *A23 1 *
F: (Back)
* $0 = A/.0 * * $0 + $1/.L5 + $ = 2/.I1 + 3 + 3 / * $1= 0 *
G: (Back)
* $1 + $ + $1/.G.*1 = // *Q23 1 2 / * $0 + $1 = // *A23 1, *S23 2 *
H: (Back)
* $ = A/.51 + B/.55 * * $1/.G1 + $1 = 1/.D1 + 2/.D1 + $0 // *A*1 3, *Q*2 3 / * $ = 0 *
I: (Back)
PUT $0 = // *N20 1 * * $0 + $1/.L50 + $ = 2/.I1 + 3 // *Q*1 2, *S20 1 + * $1 = // *S20 1 FULL FETCH $0 = // *N20 1 * * $0 + $1/.G20 = 2 + 2/.D1 // *A*1 1, *S20 2 + * $1 = // *S20 1 EMPTY
J: (Back)
* GEORGE/MARRIED YES = YES * NO
K: (Back)
* GEORGE/EATS- = YES * NO
L: (Back)
* $1/A + $ + $1/A + $ + $1/A = YES
M: (Back)
* $0 + $1 + $1/FURNITURE TABLE DESK = YES
N: (Back)
* $1 + 1 = YES
O: (Back)
SEARCH $1 + $ = // *A22 2 * * $1 $ + 1 + $1 // *022 2, *023 3 4 / * $1 $ = // *022 2 +
P: (Back)
* $1 1/Q- C *
Q: (Back) Because there will then be no overlap of values and the rule will replace the last subscript value by all values except this last value.
R: (Back) Put on the subscript Q in the workspace a value Z that is never removed and that has no other use in the program than to assure an overlap of values.
S: (Back)
* $1 = 1/Q*C, Q- C, Q*C *
T: When trying to add the last value, or if all values occur originally on Q and one tries to add C, values would be replaced rather than merged. Q would end up with only the value C. To prevent this, define a value Z (by mention- ing it somewhere in the program) that will never be added to the workspace subscript.
U: (Back)
* $1 + $1/SQUARES*1 = FIT
V: (Back)
PLAY $1 + $1 l/SQUARES*C + 2 * * $1 + $1 2/SQUARES*1 +
Answers to problems for chapter 6
A: (Back)
$0 - *A/NUMB EVEN * $1 + $1 = 1/NUMB*C + 2 // *Q23 2 /
B: (Back)
$0 = *A/SUB*D * $0 $1/SUB A = 0 YES $1 0 = NO
C: (Back)
* $0 = *A/SUB*D, SUB*C * * $0 + $1/SUB- = 0 YES * $1 = 0 NO
D: (Back)
* $0 = *A/MAYBE*D * * $0 + $1/MAYBE A SET * $0 + $1/MAYBE B SET * $0 + $1/MAYBE C SET * TRAP SET $0 + $1 = 2/MAYBE A B C, MAYBE*C, MAYBE- D, MAYBE*C // *D1, *A23 1 MAYBE
E: (Back)
A 1 YES 2 YES 3 YES 4 NO 5 NO
F: (Back)
START WORD+WRITE WORD *+CONSONANT * *+VOWEL CONSONANT 1 $ + $0 = l + B + 2 = 1 + C + 3 = 1 + D + 4 = 1 + F + 5 = l + G + VOWEL 1 $ + $0 = l + A + 2 = 1 + E + 3 = 1 + I + 4 = 1 + O + 5 = 1 + U + WRITE $ = - + 1 // *WAM1 2 START
G: (Back)
* // TEST 1 * TEST 1 $ = A // TEST 2 2 = B // TEST 3 3 = C // TEST 4 4 = D // TEST 5 5 = E // TEST 6 6 DONE * $ = 1 + 1 // *Q1 1 STRPR1+OUT OUT $ = // *Q2 1 PRINT+TEST (INSERT STRPR1 AND PRINT HERE) DONE *
H: (Back)
First solution
QUEUE $1 = // *L1 A * EMPTY -A ONE = // *X21 ** TWO = // *X22 ** THREE = // *X23 ** FOUR = // *X24 ** FIVE = // *X25 ** * ERROR * $ = // *S*1 1 +
Second solution
QUEUE $1 = // *L1 A * EMPTY -A ONE = -/.21 ** TWO = -/.22 ** THREE = -/.23 ** FOUR = -/.24 ** FIVE = -/.25 ** * ERROR * $1 + $ = // *Q*1 2, *A20 1
Answers to problems for chapter 8
A: (Back)
CHANNEL Z |N MARGIN 72 K NUMBERS
B: (Back)
BLANK 0 1 2 3 BLANK . , ( ) -
C: (Back)
BLANK A E I O NUMBERS BLANK C D F G H J K L M N P Q R S T V W X Y Z
D: (Back)
PRINT $ = // *N4 1 * * $1 = ---- + -/*CNSl + 1 // *E1 2 ** * ** * $ + $5 + $1 + $0 = 2 + ---- + 3 + *. // *WAM1 2 3 4 PRINT
The routine leaves a blank in column 1 for carriage control.
E: (Back)
$0 = A/$*D // *D- * $1 = 1 + B/$*D // *D1 * $1 + $1 + $1 = 3 + l/*CLS2 * $1 + $ + 1 + $ = 1 YES $1 + $ = 1 NO
F: (Back)
NUMBER $0 = // *N25 1 * * $1 = 1/*CNS1 + 1/.I1 // *S25 2 +
G: (Back)
COM RENUMBER APOSTROPHE LETTERS 0 1 2 3 4 5 6 7 8 9 BLANK LETTERS . , ( ) = + - / $ * ' IN MARGIN 72 FILL * $ = *N/.1 // *S25 1 * AGAIN NUMBER+SPACES