SYNOPSIS
cupl [-f fieldwidth] [-v debuglevel] [-w linewidth]
OPTIONS
The -w option sets the line width (default 80).
The -f option sets the field width (default 20).
The -v option enables debugging output. At level 1, the parse tree is prettyprinted. At level 2, definition/reference counts for each variable and label are printed after each run. At level 3, an execution trace is displayed as the parse tree is printed. At level 4, each token intern and cons-cell allocation during parsing is also dumped. A suffix of y enables parser debugging messages.
DESCRIPTION
CUPL was an early (1966) teaching language implemented as a batch compiler on the IBM/360 at Cornell University. It was descended from an earlier (1962) experimental language called CORC (CORnell Compiler), which was in turn derived loosely from Algol-58. Statements made without qualification about CUPL below also apply to CORC.
CUPL is documented in "CUPL: The Cornell University Programming Language", by R.J. Walker, a manual first printed in November 1966. This implementation is based on the July 1967 second printing.
CORC is documented in "An Instruction Manual For CORC", by R.W. Conway, W.L. Maxwell, R.J. Walker. This implementation tracks the 2nd edition of September 1963.
The purpose of this implementation is to preserve a CUPL/CORC implementation for the edification of historians and students of programming-language design. CUPL and CORC were representative of a significant class of teaching languages in their period, and study of their design casts a clear light on the preoccupations of their time.
Introduction to the Languages
The source distribution includes, in the file 'cupl.doc', a transcription of all the relevant parts of the CUPL manual (the bulk of the text is a general tutorial on scientific programming). Another file, 'corc.doc', similarly excerpts the CORC manual.
CUPL has only one scalar type, a long floating-point real corresponding to C double (round-off rules coerce scalars to integer in contexts like subscripting). It supports vector and matrix aggregates, and has operations specialized for linear-algebra calculations. There is no function abstraction and all variables are global; program chunking is achieved through BLOCK or BEGIN blocks which resemble parameterless subroutines.
CUPL rather resembles early BASICs, minus BASIC’s string facility. It is oriented towards scientific calculation and linear algebra, and would be nearly impossible (or, at any rate, extremely painful) to use for anything else.
The programming-support features of CUPL and CORC resembled those of the better-known WATFOR and WATFIV compilers, incorporating elaborate error-correction and trace output features using a runtime monitor.
The only documented incompatibility between the CUPL and CORC languages is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this goes to the beginning of the block; in CORC, it goes to the end of the block (which in CUPL is written GO TO <block> END). The interpreter switches to the CORC interpretation whenever it detects a CORC-specific word (such as NOTE) during lexing.
The CORC statement TITLE and the triple iteration construct have no counterparts in CUPL.
CUPL
We reproduce here a nearly exact transcription of Appendix A of the Walker manual, "CUPL: The Cornell University Programming Language".
Tags of the form [See m-n: …] are not part of the original document; they are references (by section-page number in the original manual) to notes which follow the appendix transcription. These notes are also excerpts from the manual.
There is one correction in the text. The original manual listed both LN and LOG as built-in functions and wrote "LOG(a)" is "natural log of a". We believe this is incorrect; we have changed the "LOG(a)" in the original to LN(a) and inserted a new "LOG(a)" entry. This implementation’s LOG function is, accordingly, log_10() and not log_e().
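As a minimal sketch of the LN/LOG distinction described above, the two functions can be modelled in Python as follows. The names cupl_ln and cupl_log are hypothetical and exist only for this illustration; they are not taken from the interpreter's source.

```python
import math


def cupl_ln(a: float) -> float:
    """Natural logarithm, as the corrected LN(a) entry describes."""
    return math.log(a)


def cupl_log(a: float) -> float:
    """Base-10 logarithm, as the inserted LOG(a) entry describes."""
    return math.log10(a)


if __name__ == "__main__":
    # The two functions disagree for every argument except 1.
    print(cupl_ln(1000.0), cupl_log(1000.0))
```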
There are a few typographical changes to fit it into the ASCII character set. The differences:
- ^[-+]nnn is used to render exponent superscripts.
- subscripts are simply appended to their metavariables.
- "x" between digits is used to render the multiplication sign.
- lines of dashes below headings indicate underscores.
- page breaks in the original are represented by form feeds here.
Otherwise, the appendix A transcription is exact, even down to hyphen breaks and spacing. ^L represents a page break. In the following notes, hyphen breaks and exact spacing are not preserved, but the original text is, with the following additional typographical changes:
- |a| is used to render the absolute-value operation.
- <= and < are used to render non-ASCII symbols.
In the original, a couple of instances of |x * 10**n| and |y * 10**n| were actually set as |10^nx| and |10^ny|, where ^n represents a superscript. This is excessively hard to read in ASCII.
The combination of Appendix A and the notes includes essentially all the manual’s documentation of the CUPL language itself. We have not transcribed appendix D, "Error Considerations and Actions", nor appendix B, "Functions", because the former depends on the parsing machinery of the original compiler and the latter documents range restrictions and precision constraints for the special functions (which are not duplicated in our implementation).
The following .cupl files, included with this distribution, are also transcribed from the manual. We include every non-pathological program example. The following list maps programs to original page numbers:
cubic.cupl -- 7-7 (boxed)
fancyquad.cupl -- 2-8 (coding form example)
poly11.cupl -- 7-9 (boxed)
power.cupl -- 3-4 (exercise 5b)
prime.cupl -- 12-5 (boxed)
quadratic.cupl -- 2-2 (boxed)
random.cupl -- 10-2 (boxed)
rise.cupl -- 3-15 (exercise 7)
simplequad.cupl -- 2-1 (boxed)
squares.cupl -- 5-7 (output example)
sum.cupl -- 3-4 (exercise 5a)
We have supplied leading comments for most of these; they are otherwise unaltered.
Appendix A Summary of CUPL ELEMENTS OF THE LANGUAGE Characters ---------- Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Digits: 0 1 2 3 4 5 6 7 8 9 Special Characters: + - * / ( ) . , ' = Numbers ------- Normal decimal usage: e.g. 3, 1.725, -.06 . "Scientific" notation: -1.2E-3 for -1.2 x 10^3 . Truncated to 9 significant figures by the system. Range: Absolute values from 10^-78 to 10^76, and 0 . Variables and Labels -------------------- a. Consist of 1 to 8 letters or digits, beginning with a letter; no blanks or special characters. b. Must not be one of the following "reserved words": ABS DOT IF OR THEN ALL ELSE INV PERFORM TIMES ALLOCATE END LE POSMAX TO AND EXP LET POSMIN TRC ATAN FLOOR LN RAND TRN BLOCK FOR LOG READ WATCH BY GE LT SGM WHILE COMMENT GO MAX SIN WRITE COS GT MIN SQRT DET IDN NE STOP c. Must be unique--same name cannot be used for both a label and a variable. d. Variables have automatic initial value of zero. e. Variables can be scalar, vector, or matrix. [See 8-3 and 9-8] If A is a matrix variable: A refers to the entire matrix. A(I,J) refers to the element at the intersection of the ith row and the jth column A(*,J) refers to the jth column [see 9-2] A(I,*) refers to the ith row [see 9-2] ^L A-2 If V is a vector variable: V refers to the entire vector V(I) refers to the ith component. [See 11-3 for subscript round-off rules] f. A vector is a 1-column matrix g. IDN is the identity matrix of any appropriate size. Arithmetic Operators -------------------- a. +, -, /, * for multiplication, ** for exponentiation b. Normal precedence: **, * and /, + and - Parentheses from inner to outer. Sequence of + and - from left to right. c. Types of operand: +, -, /, *, ** for scalars +, -, *, and numerator of / for vectors and matrices. Spacing ------- No spaces, or splitting at end of line, in any number, variable, label, reserved word, or **. Spaes allowable anywhere else. 
Functions --------- Form Action Type of Argument ---- ------ ---------------- ABS(a) absolute value of a any expression ATAN(a) arctangent of a, in numerical valued radians expression COS(a) cosine of a, a in numerical valued radians expression EXP(e) e raised to a power numerical valued expression FLOOR(a) greatest integer not numerical valued exceeding a expression LN(a) natural log of a numerical valued expression LOG(a) log to base 10 of a numerical valued expression SQRT(a) positive square root numerical valued of a expression SIN(a) sine of a, a in numerical valued radians expression MAX(a,b,...) maximum value of all any expressions elements of all arguments ^L A-3 MIN(a,b,...) minimum value of all any expressions elements of all arguments RAND(a) next pseudo-random any expression number, or array of numbers, in a sequence in which a was the last DET(a) determinant of a square array valued expression DOT(a,b) dot product of a vector-valued expres- and b sions of equal dimensions INV(a) inverse of a square array valued expression POSMAX(a) row position of maxi- array valued expression mum element of a POSMIN(a) row position of mini- array valued expression mum element of a SGM(a) sigma of a (sum of array valued expression all elements) TRC(a) trace of a (sum of array valued expression elements on prin- cipal diagonal) TRN(a) transpose of a array valued expression Relations --------- Symbol With scalar expressions With array expressions ------ ----------------------- ---------------------- = equals all corresponding ele- ments equal NE not equal to at least one pair of corr. ele. not equal LE less than or equal to all corr. ele. less than or equal GE greater than or equal all corr. ele. greater to than or equal LT less than all corr. ele. less than or equal; at least one less than GT greater than all corr. ele. 
greater than or equal; at least one greater than [See 11-4 for round-off rules applying to relations] STATEMENTS The following symbols are used in the statement descriptions: v1, v2, ... variables (scalar, vector, or matrix except as noted) ^L A-4 r1, r2, ... relations slabel1, slabel2, ... statement labels blabel1, blabel2, ... block labels e1, e2, ... arithmetic expressions (a meaningful and conformable combination of numbers, variables, functions, and arithmetic operators) Statements should begin in column 10 of the programming form. If continued onto more than one line, the second and aubsequent lines should begin in column 15. Columns 73 to 80 must not be used. Any statement may be given a label--beginning in column 1 of the form. Assignment Statement -------------------- LET v1 = e1 Sequence Control Statements --------------------------- GO TO slabel1 GO TO blabel1 Used only inside block 'blabel'; causes skip to end of block. IF e1 r1 e2 THEN s1 ELSE s2 where s1 and s2 are any type of statement except IF or PERFORM. Either the THEN phrase or the ELSE phrase may be omitted, but one or both must be given. Compound conditions may be used: IF e1 r1 e2 AND e3 r2 e4 AND ... THEN s1 ELSE s2 IF e1 r1 e2 OR e3 r2 e4 OR ... THEN s1 ELSE s2 but AND and OR phrases may not be mixed in the same statement. --- STOP Iteration Control Statements ---------------------------- A 'block' consists of a sequence of statements preceded by blabel1 BLOCK and followed by blabel1 END ^L A-5 A block may be located anywhere in the program; it is executed only by a PERFORM statement calling it by name. Blocks may be nested but not overlapped. A block may contain any kind of statent, including PERFORM, except for a PERFORM referring to the block itself. PERFORM blabel1 PERFORM blabel1 e1 TIMES where e1 has integer value. PERFORM blabel1 WHILE e1 r1 e2 Compound conditions may be used: WHILE e1 r1 e2 AND e3 r2 e4 AND ... WHILE e1 r1 e2 OR e3 r2 e4 OR ... 
but AND and OR phrases may not be mixed in the same statement. PERFORM blabel1 FOR v1 = e1, e2, ... FOR sv1 = e1 TO e2 BY e3 where sv1 is a scalar vari- able The order of the TO and BY phrases can be reversed; the BY phrase can be omitted if e3 = 1. [See 8-3 for more] Communication Statements ------------------------ READ v1, v2, ... [See 5-2 and 9-2 for description] WRITE v1, v2, ... , 'title message', ... , /v3, /v4 Three types of elements may appear in the list after WRITE: 1. Variable. Prints: name of scalar and current value; name of vector and current values of components; name of each row vector of matrix and current value of components. 2. Variable preceded by a / . Current values only are printed. ^L A-6 3. A message enclosed in single quotes. The exact image will appear as a title on the output. Any characters except the quote may be used in such a message. A message cannot continue onto a second line on the programming form--a separate item in the same or another WRITE state- ment must be used. [See 5-3 and 8-10] for more formatting details] [WRITE ALL See 4-12 and 8-3 for a description] COMMENT A comment line can be inserted at any time by writing COMMENT in the label field (columns 1-7) of the programming form. Such a line will appear in the program listing, but has no effect on execution. Dimensioning of Vectors and Matrices ------------------------------------ ALLOCATE mv1(e1, e2), vv2(e3), ... where mv1 is a matrix variable and vv2 is a vector variable, and e1, e2 and e3 have integer values. When space is initially allocated to an array, the values of all the elements are zero. If space is later changed by another allocation the values of those elements common to the old and the new alloca- tions are unchanged; the values of new elements are zero. [See also 9-1] Tracing Changes in Value During Execution ----------------------------------------- WATCH v1, v2, ... where v1 and v2 are scalar variables. 
This will cause the system to monitor the values of each of the variables listed and print the new value each time one of the listed variables is assigned a value by a LET or READ statement. This operation is temporary and is automatically discontinued for a particular variable after 10 such assignments. [See 8-3] DATA Data to be read by the execution of the READ statements is provided on the same form, after the last statement of the program. The first data line is indicated by writing *DATA in columns 1 to 5. Data items may be entered on this line beginning in column 7 and in columns 1 tto 72 of any following lines ^L A-7 Data items are separated by commas. An item may be either a number, n1, or an expression of the form v1 = n1 . The latter form is for checking purposes--it must correspond to the variable v1 in the associated READ statement. Data items are read in sequence as the program is executed. To an array with q elements appearing in a READ statement, there must correspond q successive items in the data list. If the array is a matrix, the items must be ordered by rows. --- Other quotes: From 4-12: --- Another statement that can be used for checking purposes is WRITE ALL This will cause the values of all of the variables in the program to be printed. There is no WATCH ALL statement. --- From 5-2: --- If READ v occurs in the program and the corresponding entry on the datalist is w=n, where w is not the same as v, an error message is given. The value n is assigned to v, no change is made in w, and the program continues. Thus the inclusion of items of the type v=n in the data list provides checks against the accidental omission of data or inclusion of extra numbers. If the total data list is too short, the machine will give an error message and will give the value 1 for each of the missing numbers. If the list is too long the extra entries will be ignored but no error message will be given. --- From 5-3: --- Numbers are given to 9 significant figures. 
A number n /= 0 in the range .0001 <= |n| < 100000 is printed in the usual decimal form, e.g. -327.512736, 0.0243472150 . Zero is printed simply as 0. All other numbers appear in the form mEp with 1 <= |m| < 10, e.g. -2.31562007E+04, 5.00000000E-17 . A line of output is divided into six "fields", each 20 characters long. Each variable name or value occupies one field. The decimal point of a number always comes at the seventh position in a field. Each WRITE statement starts a new line, but within a given WRITE statement fields are used consecutively, new lines being started as necessary. The one exception occurs when the name of a variable would come at the end of a line and its value on the next line; in this case, the last field is left blank and the name starts the next line. A field may be purposely skipped by simply omitting a variable name; for example, WRITE ,A,,,/B will put A and its value in fields 2 and 3, and the value of B in 6, leaving fields 1, 4 and 5 vacant. The statement WRITE will cause a whole line to be skipped. --- From 8-3: --- The only restrictions on the use of a subscripted variable are in PERFORM k FOR v = --- and WATCH v,... Here v must be a non-subscripted variable. Also, WRITE ALL will print only the values of the non-subscripted variables. --- From 8-10: --- 1. A new line is started for each vector and each row of a matrix. 2. Te values of the elements of a vector or a row of a matrix are put successively in fields 2 through 6, repeating as necessary. 3. In field 1 of the first line used for a vector or for a row of a matrix is put the name of the vector or a symbol for the row of the matrix, unless these are suppressed by a slash before the array in the WRITE list. The slash does not change the spacing described in 2 above. 4. A variable, subscripted or not, appearing in the WRITE list immediately after an array, starts a new line. There is one exception to these rules. 
If M is a matrix with only one column, to save space it will be printed as if it were a vector, that is, in the form M = m11 m21 m31 etc. --- From 9-1: --- Vectors as matrices ------------------- Consideration of relations involving matrices and vectors can be simplified by regarding a vector as a 1-column matrix. This convention is adopted in CUPL, and so, for example ALLOCATE X(7) and ALLOCATE X(7, 1) have precisely the same meaning. After either of these allocations the variables X(2) and X(2, 1) are meaningful and have the same value; X(M, N) is meaningful only if N has the value 1. --- From 9-2: --- For any matrix M, N(*, J) denotes the 1-column matrix (a vector) which is the J-th column of M. Similarly, M(I, *) denotes the 1-row matrix (not a vector) which is the I-th row of M. If M is m x n, then M(*, J) and M(I, *) are m x 1 and 1 x m, and they can be used as matrices of these sizes in any statement except ALLOCATE. For example: READ M(*, 3) will read data into the third column of M, leaving the rest of M unchanged. --- From 9-8: --- So much space is needed to compute INV(A) or DET(A) that the size of A is limited to 40x40 in these expressions. --- From 11-3: --- Automatic Integer Round-off --------------------------- a. The value of a subscript is rounded to the nearest integer. b. If the round-off involves a change of greater than 10^-9 (approximately) an error message is given. --- From 11-4: --- Automatic Relative Round-off for x r y -------------------------------------- a. If both x and y are zero the condition is applied as it stands. b. If either x or y is not zero: (i) Both x and y are multiplied by 10**n, where n is chosen so that the larger of |x * 10**n|, |y * 10**n| lies between .1 and 1. (ii) x * 10**n and y * 10**n are truncated to 14 decimal places (iii) The specified condition is interpreted on the resulting numbers. ---
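The relative round-off rule quoted from 11-4 can be sketched in Python as follows. This is an illustration of the rule as transcribed, not the interpreter's actual code: the function names, the RELATIONS table, and the use of the decimal module are all assumptions made for this example.

```python
import operator
from decimal import ROUND_DOWN, Decimal

# CUPL relation symbols mapped to Python comparisons (illustrative table).
RELATIONS = {
    "=": operator.eq, "NE": operator.ne,
    "LE": operator.le, "GE": operator.ge,
    "LT": operator.lt, "GT": operator.gt,
}


def relative_roundoff(x, y):
    """Scale and truncate x and y as rule 11-4 (b) describes."""
    dx, dy = Decimal(x), Decimal(y)
    if dx == 0 and dy == 0:
        return dx, dy                     # rule (a): compare as it stands
    larger = max(abs(dx), abs(dy))
    # Pick n so that larger * 10**n falls in the interval [.1, 1).
    n = -(larger.adjusted() + 1)
    places = Decimal("1e-14")             # keep 14 decimal places
    # scaleb rounds to the default 28-digit context, far finer than
    # the 14 places kept; ROUND_DOWN truncates toward zero.
    return tuple(d.scaleb(n).quantize(places, rounding=ROUND_DOWN)
                 for d in (dx, dy))


def compare(x, relation, y):
    """Evaluate `x relation y` on the rounded-off operands."""
    sx, sy = relative_roundoff(x, y)
    return RELATIONS[relation](sx, sy)


if __name__ == "__main__":
    nearly = 2.5 + 1e-15                  # differs from 2.5 in the last bits
    print(nearly == 2.5, compare(2.5, "=", nearly))
```

Because the rule truncates rather than rounds, two values that straddle a 14th-place boundary (for example 0.1 + 0.2 and 0.3 as binary doubles) still compare unequal under this sketch.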
CORC
We include here a transcription of appendix F of our reference document, "An Instruction Manual For CORC", R.W. Conway, W.L. Maxwell, R.J. Walker.
There are a few typographical changes to fit it into the ASCII character set. The differences:
- ^[-+]nnn is used to render exponent superscripts.
- The square root radical sign surrounding b is rendered as b^-2.
- |a| is used to render the absolute-value operation.
- "x" between digits is used to render the multiplication sign.
- lines of dashes below headings indicate underscores.
- page breaks in the original are represented by ^L here.
Error corrections:
- Under "Sequence Control Statements", the IF keyword in the three if-statement templates was erroneously typed as "If".
- Under "Iteration Control Statements", the first AND keyword in the compound-AND example was incorrectly lowercased. In item 1 of the BEGIN explanation, the word "statement" was incorrectly uppercased.
Notes:
- Tags [See *1], [See *2], etc., are not part of the original; they reference footnotes following the transcript.
- Tags such as [See 5-8] are not part of the original; they reference other quotes from the text, given below by page number.
Otherwise, the appendix F transcription is exact, even down to hyphen breaks and spacing (allowing for the fact that the original typewriter spacing was somewhat irregular…).
The combination of Appendix F and the notes includes essentially all the manual’s documentation of the CORC language itself. Most of the text is tutorials and exercises.
This implementation preserves the CORC-62 distinction between LOG and LN, contrary to the transcript below which describes CORC-63 (which identifies both with the natural-log function). CORC-62 also lacked the INT function, allowed only non-compound logical expressions in REPEAT…UNTIL, and allowed an alternate spelling "TIME" of "TIMES".
The following .corc files, included with this distribution, are also transcribed from the manual. We include every complete program example. The following list maps programs to original page numbers:
simplecorc.corc -- 4-6
gasbill.corc -- 4-9
hearts.corc -- 4-10
sumsquares.corc -- 5-4
powercorc.corc -- 5-6
factorial.corc -- 5-6
quadcorc.corc -- 5-9
title.corc -- 7-3 (note: this one uses continuations)
We have supplied leading comments and test data for these programs; they are otherwise unaltered.
APPENDIX F Summary of the CORC Language Acceptable Characters --------------------- Letters: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Digits: 0 1 2 3 4 5 6 7 8 9 Special Characters: + - / ( ) * $ , . = Numbers ------- Normal decimal usage -- + sign may be omitted; decimal point may be omitted from an integer. Only 11 significant figures will be considered (8 on Burroughs 220) Output: 8 significant figures. Acceptable range: Absolute values from 10^-308 to 10^+308 (10^-49 to 10^+50 on Burroughs 220), and 0. Scientific notation: 1.2345 x 10^6 may be written 1.2345*10$(6). Variables and Labels -------------------- a. 1 to 8 letters or digits; no blanks or special characters. b. First character must be a letter. c. Must not be one of the following "reserved" words: ABS DECREASE GO LET NOTE STOP AND ELSE GTR LN OR THEN ATAN END IF LOG RAND TIMES BEGIN EQUAL INC LSS READ TITLE BY EXP INCREASE MAX REPEAT TO COS FOR INT MIN SIN UNTIL DEC GEQ LEQ NEQ SQRT WRITE d. Statement labels must be unique -- particular label appears only once in label field. Block label used for only one set of BEGIN-END statements. e. Variables must be listed in Dictionary. If no initial value is given zero is assumed. [See *1] f. One or two subscripts, enclosed in parentheses. May be any expression (including subscripted variables) whose value at execution is a positive integer not greater than the maximum declared for this variable in Dictionary. [See D-2] In Dictionary and Input Data subscripts given as integers in subscript field; parentheses are not used. [See *1] ^L Arithmetic Operators -------------------- + Additions - Subtraction / Division * Multiplication (must be expressed, no implicit multiplication) $ Exponentiation; a^b written as a $(b) Rules of Precedence ------------------- a. Expressions in parentheses first, from inner to outer. b. $ before *, /, + or - c. * or / before + or - d. 
Sequence of + and - from left to right Functions --------- Argument any expression, a: ABS(a) |a| EXP(a) e^a SIN(a) sin a, a in radians COS(a) cos a, a in radians ATAN(a) arctan a, a in radians INT(a) [a], greatest integer less than or equal to a Arguments two or more expressions, a, b, ... f: MAX(a, b, ... , f) value equal to greatest of expressions listed MIN(a, b, ... , f) value equal to least of expressions listed Argument any positive expression, b: LN(b) or LOG(b) log b, natural logarithm of b e Argument any non-negative expression, b: SQRT(b) + b^-2 Argument a variable, v: RAND(v) next in a sequence of pseudo-random numbers. See Chapter 7 Relations (used only in IF and REPEAT ... UNTIL statements) --------- EQL = NEQ != LSS < LEQ <= GTR > GEQ >= ^L Card Numbers ------------ Strictly increasing sequence of 4 digit numbers, beginning with 0010. Initially right digit should be zero on all cards to leave room for later additions. Program Arrangement [See *1] ------------------- Arrange and number programming forms in the following order: 1. Preliminary Description 2. Dictionary [See 2-10] List all variables to be used Specify initial values if not zero Specify maximum value for each subscript. 3. Program Statements One statement per line; for continuation place C in column 21 of following line and begin in column 42. Do not break in middle of variable, label, or number. THEN, ELSE, OR and AND phrases of IF statement on following lines, beginning in col 42, but C in col 21 not required. 4. Input Data Variables, Subscripts and values of data to be called by READ statements in program. 
CORC Statements --------------- The following symbols are used in the statement descriptions: v, w, x variables r, s, t relations a, c statement labels b block label e, f, g, h, j, k arithmetic expressions (any meaningful combination of numbers, variables, functions and arithmetic operators) Computation Statements ---------------------- LET v = e INCREASE v BY e or INC v BY e DECREASE v BY e or DEC v BY e ^L Sequence Control Statements --------------------------- GO TO a Statement with label a to be executed next. GO TO b Used only inside block b; causes skip to END of block b. IF e r f Go to statement a if condition e r f is THEN GO TO a satisfied; otherwise go to statement c. THEN GO TO c IF e r f Go to statement a if all of the conditions AND g s h listed are satisfied; otherwise go to ... statement c. THEN GO TO a THEN GO TO c IF e r f Go to statement a if all of the conditions OR g s h listed is satisfied; otherwise go to state- ... ment c. THEN GO TO a AND and OR phrases cannot be mixed in THEN GO TO c the same IF statement. STOP Last statement in execution of program; not necessarily written last on Program State- ment sheet. Iteration Control Statements ---------------------------- b BEGIN Define limits of a block; b in label field, BEGIN-END. b END statement field. 1. Block may be entered (executed) only by REPEAT statement. 2. Block may be located anywhere in program. 3. Blocks may be nested, but not overlapped. 4. Block b may contain any type of CORC statement, in- cluding REPEAT, but not REPEAT b, ... , a REPEAT Statement referring to itself. REPEAT b e TIMES Value of e a non-negative integer. REPEAT b UNTIL e r f Continue repetition of block b until condition e r f is satisfied. REPEAT b UNTIL e r f AND g s h AND ... Continue repetition of block b until all conditions listed are satisfied. (Not available on Burroughs 220) REPEAT b UNTIL e r f OR g s h OR ... Continue repetition of block b until any one of the conditions listed is satisfied. 
(Not available on Burroughs 220) ^L REPEAT b FOR v = e, f, g, ..., ..., (h, j, k), ... Repeat block b once for each expression on list, with value of expression assigned to variable v. Three expressions on list enclosed in parentheses mean from h to k in steps of k. [See 5-8] Communication Statements ------------------------ READ v, w, x, ... Read an Input Data card for each variable on list; variables on cards read must agree with variables on list. WRITE v, w, x, ... Print variable and current value, three to a line. Each WRITE statement starts a new line. TITLE message Print "message" in computational results when this statement is encountered in execution of program. NOTE message "Message" will appear in copy of program only, not in execution. Used for program notes only. --- [*1] This implementation of CORC does not support or allow a Dictionary section. Instead, variable initializations must be done via CUPL-style DATA and ALLOCATE statements. [*2] Ignore this section. Program statements are free-format, with continuations not supported (though the example program test/title.corc shows the syntax, it will break cupl). Data for read statements is accepted in CUPL format following the keyword *DATA. For completeness, however, the Dictionary feature is documented here. From 2-10: --- Dictionary of Variables ----------------------- In addition to a set of statements CORC requires a pro- gram to contain another part known as the Dictionary. The Diction- ary of a program is merely a list of all the variables used in the program. along with, if desired, the initial assigned values of the variables. If no initial value is specified the computer assigns the initial value zero. --- From 2-11: --- In the above example the Dictionary might look like this: A 1 B -1 C -6 ROOT X1 X2 --- The above example is the simplecorc.corc program. The CORC Dictionary is equivalent to a CUPL DATA section, but also allowed the programmer to dimension array variables. 
The example form on 2-12 makes it clear that the Dictionary was distinguished from the program proper by being in a different lower range of card line numbers. From 5-8: --- Three expressions on a list, enclosed in parentheses, are in- terpreted in the following way: 1. The first expression gives the initial value for the vari- able. 2. The second expression gives the difference between con- secutive values. 3. The third expression indicates where to stop -- the final value for the variable is less than or equal to the value of the third expression. [... examples omitted ...] More than one such "triple" may be used on a list, and "triples" may be intermixed with separate expressions; --- From 6-5: --- A particular variable either has no subscripts, one sub- script, or two subscripts and this use must be consistent through- out a program. A variable cannot appear as X(I) in one statement and X(I, J) or just X in another statement of the same program. The nature of a variable (whether it is to be subscripted or not) must be indicated when the variable is listed in the CORC Diction- ary. This is done by giving the maximum value of any subscripts that will be used in columns 21-25 of the Dictionary form. If no subscripts will be used these columns will be left blank in the line for that variable. If one subscript is to be used, the maxi- mum value that that subscript will take on anywhere in the pro- gram must be given in columns 21-23; columns 24-25 are left blank. (Note that this is the maximum value of the subscript, and not the maximum value of the variable.) If two subscripts are to be used the maximum value of the first is given in columns 21-23 and the maximum value of the second in columns 24-25. For ex- ample, if the Dicitonary [sic] looks like the following: SCALAR VECTOR 45 MATRIX 100 2 ARRAY 3 32 then SCALAR is a simple variable that will not have any sub- scripts anywhere in the program. 
VECTOR will have one subscript everwhere it appears in the program [...] MATRIX [...] will appear each time with two subscripts [...] ARRAY will also always have two subscripts; [...] --- From D-2: --- Automatic Integer Round-off --------------------------- a. The value of a subscript is rounded off to the nearest integer. b. If the round-off involves a change of greater than 10^-9 (approximately; the number is subject to some variation) an error message is given. --- From D-3: --- Automatic Relative Round-off for x r y -------------------------------------- a. If both x and y are zero the condition is applied as it stands. b. If either x or y is not zero: (i) Both x and y are multiplied by MAX(|x|, |y|); (ii) The results are rounded off to the nearest integer if this involves a change of less than 10^-9 (10^-7 for the Burroughs 220), but not otherwise; (iii) The specified condition is interpreted for the resulting numbers. ---
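The "triple" form of REPEAT ... FOR quoted from 5-8 above can be sketched in Python as follows. The sketch follows the 5-8 wording (first value, difference, stopping limit). The function name expand_for_list and the representation of a triple as a Python tuple are hypothetical. The quoted text only covers limits approached from below, so rejecting a non-positive step is an assumption of this example rather than documented CORC behavior.

```python
def expand_for_list(items):
    """Expand a REPEAT ... FOR value list into the values v takes.

    Each item is either a single number or a (first, step, limit)
    tuple standing for a parenthesized CORC triple.
    """
    values = []
    for item in items:
        if not isinstance(item, tuple):
            values.append(item)           # separate expression: used as is
            continue
        first, step, limit = item
        if step <= 0:
            # Assumption: 5-8 only describes stopping at "less than or
            # equal to" the third expression, so only positive steps
            # are handled here.
            raise ValueError("step must be positive in this sketch")
        i = 0
        # Compute first + i*step directly to avoid accumulating
        # floating-point error over many iterations.
        while first + i * step <= limit:
            values.append(first + i * step)
            i += 1
    return values


if __name__ == "__main__":
    # Models REPEAT b FOR v = 1, (2, 2, 9), 20
    print(expand_for_list([1, (2, 2, 9), 20]))
```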
Differences from the original CUPL
The most obvious differences are also the most trivial. CUPL was first implemented on an IBM/360 Model 30; CORC on Burroughs 220 and CDC 1604 machines. Both used a small capital-letters-only character set, SIXBIT, and followed the archaic IBM practice of using a slashed-O for alphabetic O and plain 0 for zero. Original CUPL/CORC listings thus look rather odd to the modern eye.
The original CUPL was a batch system with a fixed-field card format: labels in columns 1-8, statements in 10-72, statement continuations beginning in column 15 (CORC’s format differed only in detail from this). In CUPL, data for the program was supplied following a special *DATA label in the same deck as the program; CORC did not require this marker (it is not clear from the CORC documentation how end-of-program was recognized).
On modern output devices, slashed-0 tends to be used, if at all, for zero. We have not tried to preserve IBM’s reversal. Nor have we tried to enforce the columnation requirements, and we don’t implement the continuation convention (new CUPL is free-format, with newlines ignored). We do preserve much of the visual appearance of CUPL listings by insisting on all caps and tab-indenting statements. We also preserve the *DATA mechanism for supplying initializations.
More significant differences arise from differences between the word size and floating-point format in CUPL’s original host and those of a typical modern C implementation. The 360 had a 32-bit word; original CUPL scalars ranged from 1e76 to 1e-78 with nine decimal digits of precision. As for CORC: the CDC 1604 was documented as having a much wider range, 1e308 to 1e-308 with 11 digits of precision; the Burroughs 220 supported 1e-49 to 1e50 with 8 digits of precision.
On IEEE 754 platforms, C floats are 32 bits, with a range of roughly 1e+38 to 1e-38 and about 7 digits of precision; doubles are 64 bits, with a range of roughly 1e308 to 1e-308 and 15 to 16 digits of precision. This implementation uses doubles to emulate CUPL/CORC scalars.
We know from the documentation that the original CUPL compiler ran in 64K of core. The present implementation is easily twice that size. However, given the cycle speeds of the 1960s, it certainly runs a good deal faster than original CUPL, even with interpretation overhead.
We don’t implement original CUPL’s error-correction facilities. Though clever, they would make the parser forbiddingly complex, and are anyway much less important in an interactive environment.
There are many limits in original CUPL/CORC that we do not enforce. There is no limit on the length of variable names short of the lexer’s very long token buffer length. There is no hard limit on the number of statements in a program. There is no hard limit on the size of arrays.
While the format of number output does not exactly conform to the original CUPL/CORC rules, it is sufficiently ugly to please any but the pickiest. We implement all of 5.2 except the fixing of the decimal point at position 7 in each field. Instead we simply use printf(3)'s %f and %e at field-width precision.
Also, by default, we wrap after four 20-char fields rather than 6, so as to fit on an 80-column line. Command-line options to change the line and field widths are available.
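The field-based output described above might be sketched like this. It is a guess at the shape of the logic rather than the distributed code: the function names, the fixed precision of 6, and the thresholds for switching from %f to %e are assumptions, as is the idea that the number of fields per line is simply the line width divided by the field width. It uses C99 snprintf for safety.

```c
#include <math.h>
#include <stdio.h>

/* Assumed rule: as many whole fields as fit on one output line. */
int fields_per_line(int linewidth, int fieldwidth)
{
    return (fieldwidth > 0) ? linewidth / fieldwidth : 0;
}

/*
 * Hypothetical formatter: right-justify v in a field of the given
 * width.  Very large or very small magnitudes fall back to %e; the
 * cut-off values here are illustrative, not taken from the manual.
 * Returns what snprintf returns (the formatted length).
 */
int format_field(char *buf, size_t size, double v, int fieldwidth)
{
    double mag = fabs(v);

    if (mag != 0.0 && (mag >= 1e9 || mag < 1e-4))
        return snprintf(buf, size, "%*.6e", fieldwidth, v);
    return snprintf(buf, size, "%*.6f", fieldwidth, v);
}
```

An output routine would call format_field() once per value and emit a newline after every fields_per_line() values.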
Unix Implementation Notes
The CUPL/CORC implementation is built around YACC and LEX. The rest is ANSI C.
The YACC grammar just builds a parse tree, which is passed to interpret() for interpretation. This method requires every program to be small enough for its entire tree to be held in memory, but it has the advantage that front and back end are very well separated.
One hack that greatly simplifies the grammar productions is that the lexer actually returns parse tree nodes, even for atoms like identifiers, strings, and numbers. In fact, the lexical analyzer even does label and variable name resolution with the same simple piece of code; each IDENTIFIER token is looked up in the identifier list when it’s recognized, so the parse tree early becomes a DAG. (The -v1 option causes the compiler to dump its parse tree for inspection.)
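The identifier lookup described above can be illustrated with a small interning routine. This is a sketch under assumptions, not the actual lexer code: the node layout and the name intern_identifier are hypothetical, and a plain linked list stands in for whatever the real identifier list looks like. The point is that every occurrence of a name yields the same node pointer, which is what turns the tree into a DAG.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree node for an identifier. */
typedef struct node {
    char *name;          /* identifier text */
    struct node *next;   /* next entry in the identifier list */
} node;

static node *idlist = NULL;   /* head of the identifier list */

/*
 * Return the unique node for name, creating it on first sight.
 * Returns NULL only if memory allocation fails.
 */
node *intern_identifier(const char *name)
{
    node *n;

    for (n = idlist; n != NULL; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n;                 /* seen before: share the node */

    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    n->next = idlist;                 /* push onto the list */
    idlist = n;
    return n;
}
```

A lexer action for IDENTIFIER would hand the returned pointer to the grammar as the token's semantic value.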
Most of the smarts are in interpret() and its sub-functions. Because array variables can be re-allocated, the internals have to use a dynamic vector/array type with its own indexing machinery. The code to manipulate this type lives in monitor.c.
Note that much of this machinery is quite generic and could be re-used for other languages with little change.
The implementation trades away some possible efficiencies for simplicity. Most importantly, each value has an attached malloc object to hold its elements, even when there is only one such element (as for scalars) which could reasonably be represented by a static field.
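A value type of the kind described, where even a scalar carries a malloc'd element buffer, might look like the following. The struct layout and function names are hypothetical and are not taken from monitor.c; 1-based subscripts and row-major storage are likewise assumptions.

```c
#include <stdlib.h>

/* Hypothetical value: a scalar is simply a 1 x 1 instance. */
typedef struct {
    int rows, cols;
    double *elements;    /* rows * cols doubles, row-major */
} value;

/* Allocate zeroed storage; returns 0 on success, -1 on failure. */
int value_alloc(value *v, int rows, int cols)
{
    double *p;

    if (rows < 1 || cols < 1)
        return -1;
    p = calloc((size_t)rows * (size_t)cols, sizeof *p);
    if (p == NULL)
        return -1;
    v->rows = rows;
    v->cols = cols;
    v->elements = p;
    return 0;
}

/* Address of element (i, j), 1-based; NULL if out of range. */
double *value_at(value *v, int i, int j)
{
    if (i < 1 || i > v->rows || j < 1 || j > v->cols)
        return NULL;
    return &v->elements[(size_t)(i - 1) * (size_t)v->cols + (size_t)(j - 1)];
}

/* Release the element buffer and reset the dimensions. */
void value_free(value *v)
{
    free(v->elements);
    v->elements = NULL;
    v->rows = v->cols = 0;
}
```

Re-allocating an array then amounts to value_free() followed by value_alloc() with the new dimensions. The cost noted above is visible here: a scalar pays for a heap allocation it does not strictly need.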
There are some comments in the code which discuss the possibility of a back end that would emit C. This would be easy to do if there were any serious corpus of CUPL/CORC code demanding to be translated. The compiler back end would emit code shaped like the parse tree, which would then link monitor.c as runtime support.
The only nontrivial difference between CUPL and CORC is the interpretation of GO TO <label> when <label> is associated with a block. In CUPL, this is a go to beginning of block; in CORC, it’s go to end of block (which in CUPL is GO TO <block> END). The interpreter sets a flag when it sees any of the appropriate CORC-specific keywords (NOTE, BEGIN, DEC, DECREASE, EQL, GEQ, GTR, INC, INCREASE, INT, LEQ, LSS, NEQ, REPEAT, TITLE, UNTIL, $) during lexing, and execute() modifies its behavior appropriately.
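The flag mechanism can be pictured roughly as below. Only the keyword list comes from the text; the names corc_mode, note_keyword, block_span and goto_target are invented for this sketch, and the real lexer and execute() are certainly organized differently.

```c
#include <string.h>

/* CORC-specific keywords, as listed in the text. */
static const char *corc_keywords[] = {
    "NOTE", "BEGIN", "DEC", "DECREASE", "EQL", "GEQ", "GTR", "INC",
    "INCREASE", "INT", "LEQ", "LSS", "NEQ", "REPEAT", "TITLE", "UNTIL", "$"
};

static int corc_mode = 0;   /* set once any CORC keyword is seen */

/* Hypothetical lexer hook: call for every keyword-like token. */
void note_keyword(const char *word)
{
    size_t i;

    for (i = 0; i < sizeof corc_keywords / sizeof corc_keywords[0]; i++)
        if (strcmp(word, corc_keywords[i]) == 0)
            corc_mode = 1;
}

/* Hypothetical block record: positions of its first and END statements. */
typedef struct {
    int first;
    int end;
} block_span;

/* Where GO TO <block label> lands: start in CUPL, end in CORC. */
int goto_target(const block_span *b)
{
    return corc_mode ? b->end : b->first;
}
```

Because the flag is sticky, a single CORC keyword anywhere in the program switches the GO TO behavior for the whole run.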
AUTHOR
Eric S. Raymond <esr@snark.thyrsus.com>.