Preface The Z-machine was created on a coffee table in Pittsburgh in 1979. It is an imaginary computer whose programs are adventure games, and is well-adapted to its task, implementing complex games remarkably compactly. They were still perhaps 100K long and the Z-machine seems to have made the first usage of virtual memory on a microcomputer. Further ahead of its time was the ability to efficiently save and restore the entire execution state (something we would do well to rediscover as parallel processing takes over). The design's cardinal principle is that any game is 100% portable to different computers: that is, any legal program exactly determines its behaviour. This portability is largely made possible by a willingness to constrain maximum as well as minimum levels of performance (for instance, dynamic memory allocation is impossible) and by a very primitive operating-system interface (so file-naming issues hardly arise). The strategy is the opposite extreme to that of the C language, which sacrifices predictable behaviour for performance: for instance, a programmer never knows how many bits will make up an int or whether char will be signed. But this is not a historical or theoretical paper, because the Z-machine is widely used in practice to play Infocom and Inform-produced games. It is a standards document which aims to exactly describe the correct behaviour, and is a variorum description in that it describes every different Version of the machine. (However, the Version 6 standard will remain provisional until we have more experience with it.) Why do we need a "standards" document? Since the end of the 1980s, interpreters have been in the public domain which almost properly implement the Z-machine. Good portable source code for these has twice been published. Each interpreter was then ported to many different machines, where its behaviour was subtly altered, usually because the porter noticed a missing feature and added it, or had to guess something. The ports have grown elaborate and corrections are now difficult to propagate. The casual user who downloads an interpreter cannot be sure how accurate it will be: a "new" interpreter (with a beautiful new user interface) may be built on a partly-repaired core which is five years old. One reason for a standard, then, is to increase the pressure to return to a good common release. Players will know what to ask for (can you get interpreter 1.1 for the Mac?) and porters will be aware if the core has changed. More fundamentally, the problem has changed. Until 1993 there were only about 130 story files known, variant forms of 35 games and a few oddments, all produced by the same compiler's code generator. An interpreter could safely be incomplete. For instance, the not opcode was unnecessary since it never occurred in any game. Today there is a quite large base of Inform users and many more games will be in circulation: and designers of these new games want to know what they can depend on. There is also pressure for future extension of the format, so a game itself will need to know what kind of interpreter is running it. 2 Preface So what is "standard"? To call itself "Standard", an interpreter should (as far as anyone knows) obey this document exactly for every Version of the Z-machine it claims to interpret. (There's no problem with a standard interpreter which interprets Version 5 only, for instance.) Each edition of this document will be given a Revision number (from 1.0 upwards), somewhat like the JFIF identification number used by the JPEG standard. A standard interpreter should communicate the revision number it obeys in three ways: (a) To someone downloading it from an FTP site or bulletin board: by including it in its filename. (b) To the player: for instance by means of an "information" option on a menu, or in an initialisation sequence. (c) To the game: by writing it into bytes in the header which were always left zero before this standard was devised (see §11). A game compiled with Inform library 5/12 or later prints the revision number in its banner (if this isn't 0.0). Few arbitrary choices have been made in writing this document. On the few points where Infocom's own shipped interpreters disagree it has usually been possible to decide which was "correct". Elsewhere, minimum levels of performance have been invented where necessary. (For example, a minimum call-stack size is needed for programmers to be sure of what level of recursion is safe. The call-stack size currently used by Zip has been taken as the standard.) Existing interpreters are close to the standard already. Most "difficult" features (colours, fonts, sound effects, pictures, etc.) are optional, so that a port only needs to set some header bit to indicate that it can't oblige. The big exception is timed input (in which an interrupt routine is run every few tenths of a second while the player is deciding what to type). Some ports can't manage this for operating-system reasons, others can but don't because it's too much trouble. In Infocom's specification the feature is mandatory, but many ports of Zip ignore it. In this document it is optional and a new header bit has been allocated: see §11. The very few paragraphs which actually extend the Infocom format, such as the one describing this header bit, are marked ***. Terminology It is assumed that the reader is familiar with terms like `object', `tree', `attribute', `property', `local and global variable'. (See Chapter I of the Inform Designer's Manual for explanation of these.) So far, eight Versions of the Z-machine exist, and the first byte of any "story file" (that is: any game program in the Infocom format) gives the Version number it must be interpreted under. The opcode names used in this document are those used by Inform 5.4 and later. The names are extended from those chosen by Mark Howell for his disassembler Txd and were agreed on between him and the author as a standard set. We hope this will provide interpreter writers and others with a common lexicon, and it would be helpful if future interpreter sources use these names internally. 3 Preface Hexadecimal numbers are written with an initial dollar, as in $ff, while binary numbers are written with a double-dollar as in $$11011, according to Inform conventions. The bits in a byte are numbered 0 to 7, 0 being the least significant and the top bit, 7, the most. Where are all the grammar tables? The Z-machine has some lexical acuity but it doesn't contain a full parser: it's like a computer without an operating system. A game program has to contain its own parser and the tables this uses are not part of the formal Z-machine specification. (The Infocom games have similar parsing table formats since all the Versions 1 to 5 games used a parser which slowly evolved from the `Zork I' parser.) Inform's parsing table format is documented in the Inform Technical Manual. For the usual format of Infocom's parsing tables, see the C source code to Mark Howell's utility "Infodump". Acknowledgements There is an obvious resemblance between an unreadable script and a secret code; similar methods can be employed to break both. But the differences must not be overlooked. The code is deliberately designed to baffle the investigator; the script is only puzzling by accident. - John Chadwick, The Decipherment of Linear B The Z-machine was originally devised by Joel Berez and Marc Blank in 1979. Marc Blank made most of the Version 4 extensions, and Version 5 was created by Dave Lebling (with contributions from others including Brian Moriarty, Duncan Blanchard and Linde Dynneson). Version 6 was largely the work of Tim Anderson and Dave Lebling. In the reverse direction, decipherment is mostly due to the InfoTaskForce (David Beazley, George Janczuk, Peter Lisle, Russell Hoare and Chris Tham), Matthias Pfaller, Mike Threepoint, Mark Howell and Paul David Doherty. (Only a few of the pieces in the jigsaw were placed by myself.) I gratefully acknowledge the help of Paul David Doherty and Mark Howell, who each drafts of this paper and sent back detailed corrections; also, of Stefan Jokisch and Marnix Klooster who have put a great deal of work into the fine detail of the specification; and of all those who commented on the circulated draft, whose comments were mainly presentational but no less important for that. Mistakes and misunderstandings remain my own. Graham Nelson St Anne's College, Oxford 15 November 1995 4 1 The memory map 1.1 The memory map of the Z-machine is an array of bytes with "byte addresses" running from 0 upwards. This is divided into three regions: "dynamic", "static" and "high". Dynamic memory begins from byte address $00000 and runs up to the byte before the byte address stored in the word at $0e in the header. (Dynamic memory must contain at least 64 bytes.) Static memory follows immediately on. Its extent is not defined in the header (or anywhere else), though it must end by the last byte of the story file or by byte address $0ffff (whichever is lower). High memory begins at the "high memory mark" (the byte address stored in the word at $04 in the header) and continues to the end of the story file. The bottom of high memory may overlap with the top of static memory (but not with dynamic memory). 1.1.1 Dynamic memory can be read or written to (either directly, using loadb, loadw, storeb and storew, or indirectly with opcodes such as insert_obj and remove_obj). 1.1.1.1 By tradition, the first 64 bytes are known as the "header". The contents of this are given later but note that games are not permitted to alter many bits inside it. 1.1.1.2 It is legal for games to alter any of the tables stored in dynamic memory above the header, provided they leave the tables in legal states. 1.1.2 Static memory can be read using the opcodes loadb and loadw. It is illegal for a game to attempt to write to static memory. 1.1.3 Except for its (possible) overlap with static memory, high memory cannot be directly accessed at all by a game program. It contains routines, which can be called, and strings, which can be printed using print_paddr. 1.1.4 The maximum permitted length of a story file depends on the Version, as follows: V1-3 V4-5 V6 V7 V8 128 256 576 320 512 1.2 There are three kinds of address in the Z-machine, all of which can be stored in a 2-byte number: byte addresses, word addresses and packed addresses. 1.2.1 A byte address specifies a byte in memory in the range 0 up to the last byte of static memory. 1.2.2 A word address specifies an even address in the bottom 128K of memory (by giving the address divided by 2). (Word addresses are used only in the abbreviations table.) 1.2.3 *** A packed address specifies where a routine or string begins in high memory. Given a packed address P, the formula to obtain the corresponding byte address B is: 2P Versions 1, 2 and 3 4P Versions 4 and 5 4(P + Ro) Versions 6 and 7, for routine calls 4(P + So) Versions 6 and 7, for print_paddr 8P Version 8 5 1 The memory map An example memory map of a small game Start Contains Dynamic 00000 header 00040 abbreviation strings 00042 abbreviation table 00102 property defaults 00140 objects 002f0 object descriptions and properties 006e3 global variables 008c3 arrays Static 00b48 grammar table 010a7 actions table 01153 preactions table 01201 adjectives table 0124d dictionary High 01a0a Z-code 05d56 static strings 06ae6 end of file Ro and So are the routine and strings offsets (specified in the header as words at $28 and $2a, respectively). Remarks. Inform never compiles any overlap between static and high memory (it places all data tables in dynamic memory). However, many Infocom games group tables of static data just above the high memory mark, before routines begin; some, such as `Nord 'n' Bert...', interleave static data between routines, so that static memory actually overlaps code; and a few, such as `Seastalker' release 15, even contain routines placed below the high memory mark. (The original idea behind the high memory mark was that everything below it should be stored in the interpreter's RAM, while what was above could reasonably be kept in "virtual memory", i.e., loaded off disc as needed.) Note that the total of dynamic plus static memory must not exceed 64K. (In fact, 64K minus 2 bytes.) This is the most serious limitation on the Z-machine (though it has not yet been reached by anyone). Throughout the specification, Versions 7 and 8 are identical to Version 5 except as stated at 1.1.4 and 1.2.3 above. 6 2 Numbers and arithmetic 2.1 In the Z-machine, numbers are usually stored in 2 bytes (in the form most-significant-byte first, then least-significant) and hold any value in the range $0000 to $ffff (0 to 65535 decimal). 2.2 These values are sometimes regarded as signed, in the range -32768 to 32767. In effect -n is stored as 65536 - n and so the top bit is the sign bit. 2.2.1 The operations of numerical comparison, multiplication, addition, subtraction and printing of numbers are signed; bitwise operations, division and remainder-after-division are unsigned. (In particular, since comparison is signed, it is unsafe to compare two addresses using simply jl and jg.) 2.3 Arithmetic errors: 2.3.1 It is illegal to divide by 0 (or to ask for remainder after division by 0) and an interpreter should halt with an error message if this occurs. 2.3.2 Formally it has never been specified what the result of an out-of-range calculation should be. The author suggests that the result should be reduced modulo $10000. 2.4 The Z-machine needs a random number generator which at any time has one of two states, "random" and "predictable". When the game starts or restarts the state becomes "random". Ideally the generator should not produce identical sequences after each restart. 2.4.1 When "random", it must be capable of generating a uniformly random integer in the range 1<=x<=n, for any value 1<=n<=32767. Any method can be used for this (for instance, using the host computer's clock time in milliseconds). The uniformity of randomness should be optimised for low values of n (say, up to 100 or so) and it is especially important to avoid regular patterns appearing in remainders after division (most crudely, being alternately odd and even). 2.4.2 The generator is switched into "predictable" state with a seed value. On any two occasions when the same seed is sown, identical sequences of values must result (for an indefinite period) until the generator is switched back into "random" mode. The generator should cope well with very low seed values, such as 10, and should not depend on the seed containing many non-zero bits. 2.4.3 The interpreter is permitted to switch between these states on request of the player. (This is useful for testing purposes.) Remarks. It is dangerous to rely on the ANSI C random number routines, as some implementations of these are very poor. This has made some games (in particular, `Balances') unwinnable on some Unix ports of Zip. The author suggests the following algorithm: 1.In "random" mode, the generator uses the host computer's clock to obtain a random sequence of bits. 7 2 Numbers and arithmetic 2.In "predictable" mode, the generator should store the seed value S. If S < 1000 it should then internally generate 1, 2, 3, ... , S, 1, 2, 3, ... , S, 1, ... so that random n produces the next entry in this sequence modulo n. If S >= 1000 then S is used as a seed in a standard seeded random-number generator. (The rising sequence is useful for testing, since it will produce all possible values in sequence. On the other hand, a seeded but fairly random generator is useful for testing entire scripts.) 3 How text is encoded and printed This technique is similar to the five-bit Baudot code, which was used by early Teletypes before ASCII was invented. - Marc S. Blank and S. W. Galley, How to Fit a Large Program Into a Small Machine 3.1 A string of encoded text is stored as a sequence of 2-byte words. Each of these is divided into three 5-bit `Z-characters', plus 1 bit left over, arranged as --first byte------- --second byte--- 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 bit --first-- --second--- --third-- The bit is set only on the last 2-byte word of the text, and so marks the end. 3.2 There are three `alphabets', A0 (lower case), A1 (upper case) and A2 (punctuation) and during printing one of these is current at any given time. Initially A0 is current. The meaning of a Z-character may depend on which alphabet is current. 3.2.1 In Versions 1 and 2, the current alphabet can be any of the three. The Z-characters 2 and 3 are called `shift' characters and change the alphabet for the next character only. The new alphabet depends on what the current one is: from A0 from A1 from A2 Z-char 2 A1 A2 A0 Z-char 3 A2 A0 A1 Z-characters 4 and 5 permanently change alphabet, according to the same table, and are called `shift lock' characters. 3.2.2 In Versions 3 and later, the current alphabet is always A0 unless changed for 1 character only: Z-characters 4 and 5 are shift characters. Thus 4 means "the next character is in A1" and 5 means "the next is in A2". There are no shift lock characters. 3.2.3 An indefinite sequence of shift or shift lock characters is legal (but prints nothing). 8 3 How text is encoded and printed 3.3 In Versions 3 and later, Z-characters 1, 2 and 3 represent abbreviations, sometimes also called `synonyms' (for traditional reasons): the next Z-character indicates which abbreviation string to print. If z is the first Z-character (1, 2 or 3) and x the subsequent one, then the interpreter must look up entry 32(z-1) + x in the abbreviations table and print the string at that word address. In Version 2, Z-character 1 has this effect (but 2 and 3 do not, so there are only 32 abbreviations). 3.3.1 Abbreviation string-printing follows all the rules of this section except that an abbreviation string must not itself use abbreviations and must not end with an incomplete multi-Z-character construction (see §3.6.1 below). 3.4 Z-character 6 from A2 means that the two subsequent Z-characters specify a ten-bit character code: the next Z-character gives the top 5 bits and the one after the bottom 5. As detailed below, this is printed using an extended form of the ASCII standard (for seven-bit character codes). 3.4.1 Some Inform users require unusual accented characters (in one case, Chinese characters). Inform is able to produce `ASCII' values which use the full 10 bits (using the @@ string escape). Game designers may want to be able to modify the interpreter to print something suitable when a value of 256 or above is found, and interpreter writers are asked to make this easy. 3.4.1.1 The author wishes to reserve the `ASCII' values 768 to 1023 for future specification. (One idea would be that such a code causes a routine in Z-code to be called. This would allow much greater flexibility in variable printing.) 3.4.2 ASCII "control codes" in the range 0 to 31 are illegal (i.e. should not be printed in any story file) except as follows: 3.4.2.1 Character 0 (ASCII "null") is legal but prints nothing. 3.4.2.2 Character 13 ("carriage return") prints a newline. 3.4.2.3 In Version 6, character 9 ("tab") at the start of a screen line should print a paragraph indentation suitable for the font being used: if it is printed in the middle of a screen line, it should be a space. Character 11 ("cursor up") should be printed as a suitable gap between two sentences (in the same way that typographers normally place larger spaces after the full stops ending sentences than after words or commas). 3.4.3 Character codes between 32 ("space") and 126 ("tilde") are legal and are printed from the standard ASCII character set: 0123456789abcdef0123456789abcdef -------------------------------- $20 !"#$%&'()*+,-./0123456789:;<=>? $40 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_ $60 `abcdefghijklmnopqrstuvwxyz{!}~ -------------------------------- In particular code $23 (35 decimal) is a hash mark, not a pound sign. (Code $7c (124 decimal) is a vertical stroke which is shown as ! here for typesetting reasons.) Character 127 ("delete") is illegal. 9 3 How text is encoded and printed 3.4.4 The `ASCII' values between 128 and 154 are at present undefined: undefined character values should be printed as question marks. The range 155 to 251 is reserved for European accented characters, but 220 to 251 are undefined. Accented characters should either be printed from a suitable font (they are all taken from the `ISO Latin 1' standard set) or transliterated into plain text as in the following table: 155: a-umlaut ae 191: a-circumflex a 156: o-unlaut oe 192: e-circumflex e 157: u-umlaut ue 193: i-circumflex i 158: A-umlaut Ae 194: o-circumflex o 159: O-umlaut Oe 195: u-circumflex u 160: U-umlaut Ue 196: A-circumflex A 161: sz-ligature ss 197: E-circumflex E 162: quotation << or " 198: I-circumflex I 163: marks >> or " 199: O-circumflex O 164: e-umlaut e 200: U-circumflex U 165: i-umlaut i 201: a-ring a 166: y-umlaut y 202: A-ring A 167: E-umlaut E 203: o-slash o 168: I-umlaut I 204: O-slash O 169: a-acute a 205: a-tilde a 170: e-acute e 206: n-tilde n 171: i-acute i 207: o-tilde o 172: o-acute o 208: A-tilde A 173: u-acute u 209: N-tilde N 174: y-acute y 210: O-tilde O 175: A-acute A 211: ae-ligature ae 176: E-acute E 212: AE-ligature AE 177: I-acute I 213: c-cedilla c 178: O-acute O 214: C-cedilla C 179: U-acute U 215: Icelandic thorn th 180: Y-acute Y 216: Icelandic eth th 181: a-grave a 217: Icelandic Thorn Th 182: e-grave e 218: Icelandic Eth Th 183: i-grave i 219: pound symbol L 184: o-grave o 185: u-grave u 186: A-grave A 187: E-grave E 188: I-grave I 189: O-grave O 190: U-grave U *** The values from 164 onward are defined for the first time in this standard. (Note that all these values are the same as the keyboard input character codes for the same letters.) 10 3 How text is encoded and printed 3.4.5 The `ASCII' values 252 to 255 are illegal, not undefined. 3.5 The remaining Z-characters translate directly into printed characters: 3.5.1 The Z-character 0 is printed as a space. 3.5.2 In Version 1, Z-character 1 is printed as a new-line. 3.5.3 In Versions 2 and later, Z-characters in the range 6 to 31 depend on the current alphabet. Except for character 6 in A2, they are printed as: Z-char 6789abcdef0123456789abcdef current -------------------------- A0 abcdefghijklmnopqrstuvwxyz A1 ABCDEFGHIJKLMNOPQRSTUVWXYZ A2 ^0123456789.,!?_#'"/\-:() -------------------------- (Character 7 in A2, written here as a circumflex ^, is a new-line.) 3.5.4 Version 1 has the same table except that A2 (needing no new-line) accommodates the < character as well. It has the form: 6789abcdef0123456789abcdef -------------------------- A2 0123456789.,!?_#'"/\<-:() -------------------------- 3.5.5 In Versions 5 and later, a game may replace the above table by providing its own "character set table". It does this by giving the byte address of such a table in the word at $34 in the header. (If this byte address is 0, then the default table above is used.) 3.5.5.1 The character set table consists of 78 bytes arranged as 3 blocks of 26 ASCII values, translating Z-characters 6 to 31 for alphabets A0, A1 and A2. Z-characters 6 and 7 of A2, however, are still translated as escape and newline codes (as above). 3.6 Since the end-bit only comes up once every three Z-characters, a string may have to be `padded out' with null values. This is conventionally achieved with a sequence of 5's, though a sequence of (for example) 4's would work equally well. 3.6.1 It is legal for the string to end while a multi-Z-character construction is incomplete: for instance, after only the top half of an ASCII value has been given. The partial construction is simply ignored. (This can happen in printing dictionary words which have been guillotined to the dictionary resolution of 6 or 9 Z-characters.) 3.7 When encrypting text for a dictionary word: A1 may not be used; nor may abbreviations; the pad character, if needed, must be 5; and the total string length must be 6 Z-characters (in Versions 1 to 3) or 9 (Versions 4 and later). For example, "i" is encrypted as 14, 5, 5, 5, 5, 5, 5, 5, 5 $48a5 $14a5 $94a5 11 3 How text is encoded and printed Remarks. In practice the text compression factor is not really very good: for instance, 155000 characters of text squashes into 99000 bytes. (Text usually accounts for about 75% of a story file.) Encoding does at least encrypt the text so that casual browsers can't read it. Well-chosen abbreviations will reduce total story file size by 10% or so. The German translation of `Zork I' uses a character set table for accented letters and is illegible on interpreters (like ITF) which do not implement this feature. (`Shogun' also needs the character set table.) It is helpful for an interpreter to filter out any ASCII control characters other than those explicitly legalised above, as this makes run-time crashes of the "printing random text" kind much less severe for the terminal. The continental European quotation marks << and >> should have spacing which looks sensible either in French style <> or in German style >>Danke!<<. Further accented characters may be allocated codes later. Other graphical or unusual characters are best handled by creating a new font (see §16 for an example font). 4 How instructions are encoded We do but teach bloody instructions Which, being taught, return to plague th' inventor - Shakespeare, Macbeth 4.1 A single Z-machine instruction consists of the following sections (and in the order shown): Opcode 1 or 2 bytes (Types of operands) 1 or 2 bytes: 4 or 8 2-bit fields Operands Between 0 and 8 of these: each 1 or 2 bytes (Store variable) 1 byte (Branch offset) 1 or 2 bytes (Text to print) An encoded string (of unlimited length) Bracketed sections are not present in all opcodes. (A few opcodes take both "store" and "branch".) 4.2 There are four `types' of operand. These are often specified by a number stored in 2 binary digits: $$00 Large constant (0 to 65535) 2 bytes $$01 Small constant (0 to 255) 1 byte $$10 Variable 1 byte $$11 Omitted altogether 0 bytes 12 4 How instructions are encoded 4.2.1 Large constants, like all 2-byte words of data in the Z-machine, are stored with most significant byte first (e.g. $2478 is stored as $24 followed by $78). A `large constant' may in fact be a small number. 4.2.2 Variable number $00 refers to the top of the stack, $01 to $0f mean the local variables of the current routine and $10 to $ff mean the global variables. It is illegal to refer to local variables which do not exist for the current routine (there may even be none). 4.2.3 The type `Variable' really means "variable by value". Some instructions take as an operand a "variable by reference": for instance, inc has one operand, the reference number of a variable to increment. This operand usually has type `Small constant' (and Inform automatically assembles a line like @inc turns by writing the operand turns as a small constant with value the reference number of the variable turns). 4.3 Each instruction has a form (long, short, extended or variable) and an operand count (0OP, 1OP, 2OP or VAR). If the top two bits of the opcode are $$11 the form is variable; if $$10, the form is short. If the opcode is 190 ($BE in hexadecimal) and the version is 5 or later, the form is "extended". Otherwise, the form is "long". 4.3.1 In short form, bits 4 and 5 of the opcode byte give an operand type as above. If this is $11 then the operand count is 0OP; otherwise, 1OP. In either case the opcode number is given in the bottom 4 bits. 4.3.2 In long form the operand count is always 2OP. The opcode number is given in the bottom 5 bits. 4.3.3 In variable form, if bit 5 is 0 then the count is 2OP; if it is 1, then the count is VAR. The opcode number is given in the bottom 5 bits. 4.3.4 In extended form, the operand count is VAR. The opcode number is given in a second opcode byte. 4.4 Next, the types of the operands are specified. 4.4.1 In short form, bits 4 and 5 of the opcode give the type. 4.4.2 In long form, bit 6 of the opcode gives the type of the first operand, bit 5 of the second. A value of 0 means a small constant and 1 means a variable. (If a 2OP instruction needs a large constant as operand, then it should be assembled in variable rather than long form.) 4.4.3 In variable or extended forms, a byte of 4 operand types is given next. This contains 4 2-bit fields: bits 6 and 7 are the first field, bits 0 and 1 the fourth. The values are operand types as above. Once one type has been given as `omitted', all subsequent ones must be. Example: $$00101111 means large constant followed by variable (and no third or fourth opcode). 4.4.3.1 In the special case of the "double variable" VAR opcodes call_vs2 and call_vn2 (opcode numbers 12 and 26), a second byte of types is given, containing the types for the next four operands. 13 4 How instructions are encoded 4.5 The operands are given next. Operand counts of 0OP, 1OP or 2OP require 0, 1 or 2 operands to be given, respectively. If the count is VAR, there must be as many operands as there were types other than `omitted'. 4.5.1 Note that only call_vs2 and call_vn2 can have more than 4 operands, and no instruction can have more than 8. 4.6 "Store" instructions return a value: e.g., mul multiplies its two operands together. Such instructions must be followed by a single byte giving the variable number of where to put the result. 4.7 Instructions which test a condition are called "branch" instructions. The branch information is stored in one or two bytes, indicating what to do with the result of the test. If bit 7 of the first byte is 0, a branch occurs when the condition was false; if 1, then branch is on true. If bit 6 is set, then the branch occupies 1 byte only, and the "offset" is in the range 0 to 63, given in the bottom 6 bits. If bit 6 is clear, then the offset is a signed 14-bit number given in bits 0 to 5 of the first byte followed by all 8 of the second. 4.7.1 An offset of 0 means "return false from the current routine", and 1 means "return true from the current routine". 4.7.2 Otherwise, a branch moves execution to the instruction at address Address after branch data + Offset - 2. 4.8 Two opcodes, print and print_ret, are followed by a text string. This is stored according to the usual rules: in particular execution continues after the last 2-byte word of text (the one with top bit set). Remarks. Some opcodes have type VAR only because the available codes for the other types had run out; print_char, for instance. Others, especially call, need the flexibility to have between 1 and 4 operands. The Inform assembler can assemble branches in either form, but the compiler always writes 2-byte branch data and never uses offset values of 0 or 1. (The computation involved in achieving these optimisations outweighs the slight gain.) The disassembler Txd numbers locals from 0 to 14 and globals from 0 to 239 in its output (corresponding to variable numbers 1 to 15, and 16 to 255, respectively). The branch formula is sensible because in the natural implementation, the program counter is at the address after the branch data when the branch takes place: thus it can be regarded as PC = PC + Offset - 2. If the rule were simply "add the offset" then, since the offset couldn't be 0 or 1 (because of the return-false and return-true values), we would never be able to skip past a 1-byte instruction (say, a 0OP like quit), or specify the branch "don't branch at all" (sometimes useful to ignore the result of the test altogether). Subtracting 2 means that the only effects we can't achieve are PC = PC - 1 and PC = PC - 2 14 4 How instructions are encoded and we would never want these anyway, since they would put the program counter somewhere back inside the same instruction, with horrid consequences. On disassembly Briefly, the first byte of an instruction can be decoded using the following table: $00 -- $1f long 2OP small constant, small constant $20 -- $3f long 2OP small constant, variable $40 -- $5f long 2OP variable, small constant $60 -- $7f long 2OP variable, variable $80 -- $8f short 1OP large constant $90 -- $9f short 1OP small constant $a0 -- $af short 1OP variable $b0 -- $bf short 0OP except $be extended opcode given in next byte $c0 -- $df variable 2OP (operand types in next byte) $e0 -- $ff variable VAR (operand types in next byte(s)) Here is an example disassembly: @inc_chk c 0 label; 05 02 00 d4 long form; count 2OP; opcode number 5; operands: 02 small constant (referring to variable c) 00 small constant 0 branch if true: 1-byte offset, 20 (since label is 18 bytes forward from here). @print "Hello.^"; b2 11 aa 46 34 16 45 9c a5 short form; count 0OP. literal string, Z-chars: 4 13 10 17 17 20 5 18 5 7 5 5. @mul 1000 c sp; d6 1f 03 e8 02 00 variable form; count 2OP; opcode number 22; operands: 03 e8 long constant (1000 decimal) 02 variable c store result to stack pointer (var number 00). @call_1n Message; 8f 01 56 short form; count 1OP; opcode number 15; operand: 01 56 long constant (packed address of routine) .label; 15 5 How routines are encoded 5.1 A routine is required to begin at an address in memory which can be represented by a packed address (for instance, in Version 5 it must occur at a byte address which is divisible by 4). 5.2 A routine begins with one byte indicating the number of local variables it has (between 0 and 15 inclusive). 5.2.1 In Versions 1 to 4, that number of 2-byte words follows, giving initial values for these local variables. In Versions 5 and later, the initial values are all zero. 5.3 Execution of instructions begins from the byte after this header information. There is no formal `end-marker' for a routine (it is simply assumed that execution eventually results in a return taking place). 5.4 In Version 6, there is a "main" routine (whose packed address is stored in the word at $06 in the header) called when the game starts up. It is illegal to return from this routine. 5.5 In all other Versions, the word at $06 contains the byte address of the first instruction to execute. The Z-machine starts in an environment with no local variables from which, again, a return is illegal. Remarks. Note that it is permissible for a routine to be in dynamic memory. Marnix Klooster suggests this might be used for compiling code at run time! In Versions 3 and 4, Inform always stores 0 as the initial values for local variables. Inform's "main" routine is required not to have local variables and has to be the first defined routine. This ensures it is in the bottom 64K of memory, as it must be (in Versions other than 6). 6 The game state: storage and routine calls 6.1 The "state of play" is defined as the following: the contents of dynamic memory; the contents of the stack; the value of the program counter (PC), and the "routine call state" (that is, the chain of routines which have called each other in sequence, and the values of their local variables). Note that the routine call state, the stack and the PC must be stored outside the Z-machine memory map, in the interpreter's private memory. 6.1.1 The entire state of play must be stored when the game is saved. 6.1.1.1 The format of a saved game file is not specified. 16 6 The game state: storage and routine calls 6.1.1.2 An internal saved game for "undo" purposes (if there is one) is not part of the state of play. This is important: if a saved game file also contained the internal saved game at the time of saving, it would be impossible to undo the act of restoration. It also prevents internal saved games from growing larger and larger as they include their predecessors. 6.1.2 On a "restore" or "undo" (which restores a game saved into internal memory), the entire state of play is written back except for one bit: bit 0 of `Flags 2' in the header, the flag revealing whether the game is being transcribed to printer. 6.1.2.1 Before a "restore", an interpreter should check that the file to be used has been saved from the same game currently being played. (See remark below.) 6.1.2.2 After a "restore" or "undo", an interpreter should reset the header values marked Rst in the header table of §11. (It should not be assumed that the game was saved by the same interpreter.) 6.1.3 A "restart" is similar: the entire state is restored from the original story file; but the transcription bit is preserved; and the interpreter should reset the Rst parts of the header. 6.1.4 In Versions 5 and later, an interpreter unable to save the game state into internal memory (for "undo" purposes) must clear bit 4 of `Flags 2' in the header. 6.2 Global variables (variable numbers $10 to $ff) are stored in a table in the Z-machine's dynamic memory, at a byte address given in word 6 of the header. The table consists of 240 2-byte words and the initial values of the global variables are the values initially contained in the table. (It is legal for a program to alter the table's contents directly in play, though not for it to change the table's address.) 6.3 Writing to the stack pointer (variable number $00) pushes a value onto the stack; reading from it pulls a value off. Stack entries are 2-byte words as usual. 6.3.1 The stack is considered as empty at the start of each routine: it is illegal to pull values from it unless values have first been pushed on. 6.3.2 The stack is left empty at the end of each routine: when a return occurs, any values pushed during the routine are thrown away. 6.3.3 Stack size has not previously been specified. The author proposes the present capacity of Zip as a future minimum standard: let the `usage'of a routine call be 4 plus the number of local variables it has. During a game the total of the usages for each routine in the recursive chain of routines being called, plus the game's own stack usage, must never reach 1024. 6.4 Routine calls occur in the following circumstances: when one of the call... opcodes is executed; in Versions 4 and later, when timed keyboard input is being monitored; in Versions 5 and later, when a sound effect finishes; in Version 6, when the game begins (to call the "main" routine); in Version 6, when a "newline interrupt" occurs. 6.4.1 A routine call may have any number of arguments, from 0 to 3 (in Versions 1 to 4) or 0 to 7 (Versions 5 and later). All routines return a value (though sometimes this value is thrown away afterward: for example by opcodes in the form call_vn*). 17 6 The game state: storage and routine calls 6.4.2 Routine calls preserve local variables and the stack (except when the return value is stored in a local variable or onto the top of the stack). 6.4.3 A routine call to packed address 0 is legal: it does nothing and returns false (0). Otherwise it is illegal to call a packed address where no routine is present. 6.4.4 When a routine is called, its local variables are created with initial values taken from the routine header (Versions 1 to 4) or with initial value 0 (Versions 5 and later). Next, the arguments are written into the local variables (argument 1 into local 1 and so on). 6.4.4.1 It is legal for there to be more arguments than local variables (any spare arguments are thrown away) or for there to be fewer. 6.4.5 The return value of a routine can be any Z-machine number. Returning `false' means returning 0; returning `true' means returning 1. 6.5 A "stack frame" is an index to the routine call state (that is, the call-stack of return addresses from routines currently running, and values of local variables within them). This index is a Z-machine number. The interpreter must be able to produce the current value and to set a value further down the call-stack than the current one, effectively throwing away its recent history (see catch and throw). 6.6 In Version 6, the Z-machine understands a third kind of stack: a "user stack", which is a table of words in dynamic memory. The first word in this table always holds the number of spare slots on the stack (so the initial value is the capacity of the stack). The Z-machine makes no check on stack under-flow (i.e., pulling more values than were pushed) which would over-run the length of the table if the program allowed it to happen. Remarks. Most interpreters store the whole of dynamic memory to disc as part of their saved game files, which can make them as much as 45K or so long. A player making a serious attack on a game may end up wasting a whole megabyte, more than convenient without a hard disc. Bryan Scattergood's Psion interpreter ingeniously avoids this by only saving bytes of dynamic memory which are different from the initial state of the game. It is unspecified how an interpreter should decide whether a saved game file belongs to the game currently being played. It is normal to insist that the release numbers, serial codes and checksums all match. The Pinfocom interpreter deliberately checks only the release number, so that saved games can be exchanged between different editions of `Seastalker' (presumably compiled to handle the sonarscope differently). The stack is stored in the interpreter's own memory, not anywhere in the Z-machine. The game program has no direct access to the stack memory or stack pointer; on some implementations the game's main stack is also used to store the routine call state (i.e. the game stack and the call-stack are the same) but this need not be true. The stack size specification guarantees in particular that if the game itself never uses more than 32 stack entries at once then it can have a recursive depth of at least 90 routine calls. The author believes that old Infocom games will all run with a stack size of 512 words. Note that the "state of play" does not include numerous input/output settings (the current window, cursor position, splitness or otherwise, which streams are selected, etc.): neither does it include the state of the random-number generator. (Games with elaborate status lines must redraw them after a restore has taken place.) 18 6 The game state: storage and routine calls Zip provides "undo", but the ITF interpreter currently does not (and save_undo returns 0, unfortunately). This is probably its greatest failing. Some Infocom-written interpreters will only provide "undo" to a game which has bit 4 of `Flags 2' set: but Inform 5.5 doesn't set this bit, so modern interpreters should be more generous. 7 Output streams and file handling 7.1 At any given time text is being output through a selection of "output streams" (possibly none, possibly several at once). 7.1.1 Two output streams are common to all Versions: number 1 (the screen) and 2 (the game transcript, usually printed to a printer or a file). 7.1.2 Versions 3 and later supply these and two other output streams, numbered 3 (Z-machine memory) and 4 (a script file of the player's whole commands and of individual keypresses as read by read_char). 7.1.2.1 Output stream 3 writes to a table in dynamic memory. When the stream is selected, the table may have any contents (even the initial `size' word will be ignored by the interpreter). While the stream is selected, the table's contents are unspecified (and a game cannot safely read or write to it). When the stream is deselected, the initial word of the table holds the number of characters printed and subsequent bytes hold those characters. Similarly, in Version 6, the total width of printing (in units) will then be stored in the word at $30 in the header. (It is the programmer's responsibility to make the table large enough: the interpreter performs no overflow checking.) 7.1.2.2 Output stream 3 is unusual in that, while it is selected, no text is sent to any other output streams which are selected. (However, they remain selected.) 7.1.2.2.1 Newlines are written to output stream 3 as ASCII 13. Any character 10 codes printed should be converted to 13. 7.1.2.3 Output stream 4 is unusual in that, when it is selected, the only text printed to it is that of the player's commands and keypresses (as read by read_char). (Each command is written, in one go, when it has been finished: a command which has been timed-out, or has been terminated by a code in the terminating character codes table, is not written. Mistypes and uses of `delete' are not written.) 7.2 On output streams 1 and 2 (only), text printing may be "buffered" in that new-lines are automatically printed to ensure that no word (of length less than the width of the screen) spreads across two lines. (This process is sometimes called "word-wrapping".) 7.2.1 In Versions 1 to 3, buffering is always on. In Versions 4 and later it is on by default (at the start of a game) and a game can switch it on or off using the buffer_mode opcode. 19 7 Output streams and file handling 7.2.2 In Version 6, each of the eight windows has its own "buffering flag". In other Versions, the buffer_mode applies only to the lower window. Output should never be buffered on the upper window. 7.3 In Versions 1 and 2, output stream 1 is always selected and stream 2 can be selected or deselected by the game, by setting or clearing bit 0 of `Flags 2'. 7.4 In Versions 3 and later, all four output streams can be selected or deselected using the output_stream opcode. In addition, stream 2 can be selected or deselected by setting or clearing bit 0 of `Flags 2'. 7.5 Character codes in the range 256 to 767 can only be printed on the screen. The author encourages interpreter-writers to make it easy for game designers to modify the interpreter to print suitable substitutes on the other streams. (For instance, if 500 represents a Chinese dragon character, this routine might print "dragon" on the other streams.) Failing this, good practice would be to print a question mark on the other streams. 7.6 In Versions 5 and later, the Z-machine has the ability to load and save files (using optional operands with the save and restore opcodes). 7.6.1 *** Filenames have the following format (approximately the MS-DOS 8.3 rule): one to eight alphanumeric characters, a full stop and zero to three alphanumeric characters (the "file extension"). 7.6.1.1 The interpreter must convert all filenames to upper case before use. If no full stop is given, ".AUX" should be appended. 7.6.1.2 Games should avoid the extensions ".INF", ".H", ".Z" followed by a number or ".SAV": otherwise they may be in danger of erasing their own object code, source code or saved game files. 7.6.2 *** Saved files are not associated with any particular session of a game. They are not part of the "state of play". 7.6.3 *** A game may depend on having up to 32 auxiliary files (with different names). 7.6.4 File-handling errors such as "disc corrupt" and "disc full" should be reported directly to the player by the interpreter. The error "file not found" should only cause a failure return code from restore. Remarks. The Inform Designer's Manual advises games always to switch buffering off when printing to the upper window. This is wise since the ITF interpreter does not behave correctly on this point. An ambiguous point about output stream 4 is whether it should contain the answers to interpreter questions like "what file name should your saved game have?": it can actually be quite useful to be able to include such answers in test script files. (When running a long script, I often save the game at several places during it, in order to save time in re-running passages.) Ideally, an interpreter should be able to write time delays (for timed input) into stream 4 (i.e., to a script file). In practice this is formidably hard to implement. 20 7 Output streams and file handling A typical auxiliary file might be one containing the player's preferred choices. This would be created when he first changed any of the default settings, and loaded (if present) whenever the game started up. 8 The screen model 8.1 Text may be printed in any font of the interpreter's choice, variable- or fixed-pitch: except that when bit 1 of `Flags 2' in the header is set, or when the text style has been set to Fixed Pitch, then a fixed-pitch font must be used. 8.1.1 In Versions 5 and later, the height and width of the current font (in units (see below)) should be written to bytes $26 and $27 of the header, respectively. The width of a font is defined as the width of its `0' character. 8.1.2 An interpreter should ideally provide 4 fonts, with ID numbers as follows: 1: the normal font 2: a picture font 3: a character graphics font 4: a Courier-style font with fixed pitch (In addition, font ID 0 means "the previous font".) Ideally all text styles should be available for each font (for instance, Courier bold should be obtainable) except that font 3 need only be available in Roman and Reverse Video. Each font should provide characters for character codes 32 to 126 (plus character codes for any accented characters with codes greater than 127 which are being implemented as single accented letters on-screen). 8.1.3 A game must not use fonts other than 1 unless allowed to by the interpreter: see the set_font opcode for how to give or refuse permission. (`Beyond Zork' produces different character graphics according to whether or not font 3 is available: see §16 for the full story.) This permission may, at the interpreter's whim, depend on which window is active. 8.1.3.1 It is legal for a game to change font at any time, including halfway through the printing of a word. 8.1.4 The specification of the "picture font" is unknown (conjecturally, it was intended to provide pictures before Version 6 was properly developed). Interpreters need not implement it. 8.1.5 The specification of the character graphics font is given in §16. 8.1.5.1 In Version 5 (only), an interpreter which cannot provide the character graphics font should clear bit 3 of `Flags 2' in the header. 21 8 The screen model 8.2 In Versions 1 to 3, a status line should be printed by the interpreter, as follows. In Version 3, it must set bit 4 of `Flags 1' in the header if it is unable to produce a status line. 8.2.1 In Versions 1 and 2, all games are "score games". In Version 3, if bit 1 of `Flags 1' is set then the game is a "score game"; otherwise, a "time game". 8.2.2 The short name of the object whose number is in the first global variable should be printed on the left hand side of the line. 8.2.2.1 Whenever the status line is being printed the first global must contain a valid object number. (It would be useful if interpreters could protect themselves in case the game accidentally violates this requirement.) 8.2.2.2 If the object's short name exceeds the available room on the status line, the author suggests that an interpreter should break it at the last space and append an ellipsis "...". There is no guaranteed maximum length for location names but an interpreter should expect names of length up to at least 49 characters. 8.2.3 If there is room, the right hand side of the status line should display: 8.2.3.1 For "score games": the score and number of turns, held in the values of the second and third global variables respectively. The score may be assumed to be in the range -99 to 999 inclusive, and the turn number in the range 0 to 9999. 8.2.3.2 For "time games": the time, in the form hours:minutes (held in the second and third globals). The time may be given on a 24-hour clock or the number of hours may be reduced modulo 12 (but if so, "AM" or "PM" should be appended). Either way the player should be able to see the difference between 4am and 4pm, for example. The hours global may be assumed to be in the range 0 to 23 and the minutes global in the range 0 to 59. 8.2.4 The status line is updated in exactly two circumstances: when a show_status opcode is executed, and just before the keyboard is read by read. (It is not displayed when the game begins.) 8.3 Under Versions 5 and later, text printing has a current foreground and background colour. (In Version 6, each window has its own pair.) 8.3.1 The following codes are used to refer to colours: -1 = the colour of the pixel under the mouse arrow (if any) 0 = the current setting of this colour 1 = the default setting of this colour 2 = black 3 = red 4 = green 5 = yellow 6 = blue 7 = magenta 8 = cyan 9 = white (These are loosely based on the IBM PC colour-scheme.) 8.3.2 If the interpreter cannot produce colours, it should clear bit 0 of `Flags 1' in the header. 8.3.3 If the interpreter can produce colours, it should set bit 0 of `Flags 1' in the header, and write its default background and foreground colours into bytes $2c and $2d of the header. 22 8 The screen model 8.3.4 If a game wishes to use colours, it should have bit 6 in `Flags 2' set in its story file. (However, an interpreter should not rule out the use of colours just because this has not been done.) 8.4 The screen should ideally be at least 60 characters wide by 14 lines deep. (Old Apple II interpreters had a 40 character width and some modern laptop ones have a 9 line height, but implementors should seek to avoid these extremes if possible.) The interpreter may change the exact dimensions whenever it likes but must write the current height (in lines) and width (in characters) into bytes $20 and $21 in the header. 8.4.1 The interpreter should use the screen height for calculating when to pause and print "[MORE]". A screen height of 255 lines means "infinite height", in which case the interpreter should never stop printing for a "[MORE]" prompt. (In case, say, the screen is actually a teletype printer, or has very good "scrollback".) 8.4.2 Screen dimensions are measured in notional "units". In Versions 1 to 4, one unit is simply the height or width of one character. In Version 5 and later, the interpreter is free to implement units as anything from character sizes down to individual pixels. 8.4.3 In Version 5 and later, the screen's width and height in units should be written to the words at $22 and $24. 8.5 The screen model for Versions 1 and 2 is as follows: 8.5.1 The screen can only be printed to (like a teletype) and there is no control of the cursor. 8.5.2 At the start of a game, the screen should be cleared and the text cursor placed at the bottom left (so that text scrolls upwards as the game gets under way). 8.6 The screen model for Version 3 is as follows: 8.6.1 The screen is divided into a lower and an upper window and at any given time one of these is selected. (Initially it is the lower window.) The game uses the set_window opcode to select one of the two. Each window has its own cursor position at which text is printed. Operations in the upper window do not move the cursor of the lower. Whenever the upper window is selected, its cursor position is reset to the top left. Selecting, or re-sizing, the upper window does not change the screen's appearance. 8.6.1.1 The upper window has variable height (of n lines) and the same width as the screen. This should be displayed on the n lines of the screen below the top one (which continues to hold the status line). Initially the upper window has height 0. When the lower window is selected, the game can split off an upper window of any chosen size by using the split_window opcode. 8.6.1.1.1 Printing onto the upper window overlays whatever text is already there. Printing is normally buffered (unless the game has turned this off), just as in the lower window. 8.6.1.2 An interpreter need not provide the upper window at all. If it is going to do so, it should set bit 5 of `Flags 1' in the header to signal this to the game. It is only legal for a game to use set_window or split_window if this bit has been set. 23 8 The screen model 8.6.1.3 Following a "restore" of the game, the interpreter should automatically collapse the upper window to size 0. 8.6.2 When text reaches the bottom right of the lower window, it should be scrolled upwards. The upper window should never be scrolled: it is legal for a character to be printed on the bottom right position of the upper window (but the position of the cursor after this operation is undefined: the author suggests that it stay put). 8.6.3 At the start of a game, the screen should be cleared and the text cursor placed at the bottom left (so that text scrolls upwards as the game gets under way). 8.7 The screen model for Versions 4 and later, except Version 6, is as follows: 8.7.1 Text can be printed in five different styles (modelled on the VT100 design of terminal). These are: Roman (the default), Bold, Italic, Reverse Video (usually printed with foreground and background colours reversed) and Fixed Pitch. The specification does not require the interpreter to be able to display more than one of these at once (e.g. to combine italic and bold), and most interpreters can't. If the interpreter is going to allow certain combinations, then note that changing back to Roman should turn off all the text styles currently active. 8.7.1.1 An interpreter need not provide Bold or Italic (even for font 1) and is free to interpret them broadly. (For example, rendering bold-face by changing the colour, or rendering italic with underlining.) 8.7.1.2 It is legal to change text style at any point, including in the middle of a word being printed. 8.7.2 There are two "windows", called "upper" and "lower": at any given time one of these two is selected. (Initially it is the lower window.) The game uses the set_window opcode to select one of the two. Each window has its own cursor position at which text is printed. Operations in the upper window do not move the cursor of the lower. Whenever the upper window is selected, its cursor position is reset to the top left. 8.7.2.1 The upper window has variable height (of n lines) and the same width as the screen. (It is usual for interpreters to print the upper window on the top n lines of the screen, overlaying any text which is already there, having been printed in the lower window some time ago.) Initially the upper window has height 0. When the lower window is selected, the game can split off an upper window of any chosen size by using the split_window opcode. 8.7.2.1.1 It is unclear exactly what split_window should do if the upper window is currently selected. The author suggests that it should work as usual, leaving the cursor where it is if the cursor is still inside the new upper window, and otherwise moving the cursor back to the top left. (This is analogous to the Version 6 practice.) 8.7.2.2 In Version 4, the lower window's cursor is always on the bottom screen line. In Version 5 it can be at any line which is not underneath the upper window. If a split takes place which would cause the upper window to swallow the lower window's cursor position, the interpreter should move the lower window's cursor down to the line just below the upper window's new size. 24 8 The screen model 8.7.2.3 When the upper window is selected, its cursor position can be moved with set_cursor. This position is given in characters in the form (row, column), with (1; 1) at the top left. The opcode has no effect when the lower window is selected. It is illegal to move the cursor outside the current size of the upper window. 8.7.2.4 An interpreter should use a fixed-pitch font when printing on the upper window. 8.7.2.5 In Version 4, text buffering should work in the upper window exactly as it does in the lower one (i.e., it must be turned off by the game if it is not required). In Versions 5 and later, text buffering is never active in the upper window (even if a game begins printing there without having turned it off). 8.7.3 Clearing regions of the screen: 8.7.3.1 When text reaches the bottom right of the lower window, it should be scrolled upwards. (When the text style is Reverse Video the new blank line should not have reversed colours.) The upper window should never be scrolled: it is legal for a character to be printed on the bottom right position of the upper window (but the position of the cursor after this operation is undefined: the author suggests that it stay put). 8.7.3.2 Using the opcode erase_window, the specified window can be cleared to background colour. (Even if the text style is Reverse Video the new blank space should not have reversed colours.) 8.7.3.2.1 In Versions 5 and later, the cursor for the window being erased should be moved to the top left. In Version 4, the lower window's cursor moves to its bottom left, while the upper window's cursor moves to top left. 8.7.3.3 Erasing window 1 clears the whole screen, collapses the upper window to height 0 and moves the cursor of the lower screen to bottom left (in Version 4) or top left (in Versions 5 and later). The same operation should happen at the start of a game. 8.7.3.4 Using erase_line in the upper window should erase the current line from the cursor position to the right-hand edge, clearing it to background colour. (Even if the text style is Reverse Video the new blank space should not have reversed colours.) 8.8 The screen model for Version 6 is as follows: 8.8.1 The display is an array of pixels. Coordinates are usually given (in units) in the form (y; x), with (1; 1) in the top left. 8.8.2 If the interpreter thinks the status line should be redrawn (e.g. because a menu window has been clicked over it), it may set bit 2 of `Flags 2'. The game is expected to notice, take action and clear the bit. (However, a more efficient interpreter would cache the status line and handle redraws itself.) 8.8.3 There are eight "windows", numbered 0 to 7. The code -3 is used as a window number to mean "the currently selected window". This selection can be changed with the set_window opcode. Windows are invisible and usually lie on top of each other. When something is printed in a window, it appears on the screen, but subsequent movements of the window do not move what was printed and there is no sense in which characters `belong' to any particular window once printed. Each window has a position (in units), a size (in units), a cursor position within it (in units, relative to its own origin), a number of flags called "attributes" and a number of variables called "properties". 25 8 The screen model 8.8.3.1 There are four attributes, numbered as follows: 1: character wrapping 2: scrolling 3: text copied to output stream 2 (the transcript, if selected) 4: buffered printing Each can be turned on or off, using the window_style opcode. 8.8.3.1.1 Character wrapping takes place (if set) when printing a character would push beyond the right-hand edge of the window: if set, then the character is printed on the left of the next line. If it is clear, then text is printed along the line but clipped to the window size. 8.8.3.2 There are 16 properties, numbered as follows: 0 y coordinate 6 left margin size 12 font number 1 x coordinate 7 right margin size 13 font size 2 y size 8 newline interrupt routine 14 attributes 3 x size 9 interrupt countdown 15 line count 4 y cursor 10 text style 5 x cursor 11 colour data Each property is a standard Z-machine number and is readable with get_wind_prop and writeable with put_wind_prop. However, a game should only use put_wind_prop to set the newline interrupt routine and interrupt countdown: everything else is either set by the interpreter (such as the line count) or set using specialised opcodes (such as set_font). 8.8.3.2.1 If a window has character wrapping, then text is clipped to stay inside the left and right margins. After a new-line, the cursor moves to the left margin on the next line. Margins can be set with set_margins but this should only be done just after a newline or just after the window has been selected. (These values are margin sizes in pixels, and are by default 0.) 8.8.3.2.2 If the interrupt countdown is set to a non-zero value (which by default it is not), then the line count is decremented on each new-line, and when it hits zero the routine whose packed address is stored in the "newline interrupt routine" property is called before text printing resumes. (This routine may, for example, meddle with margins to roll text around a crinkly-shaped picture.) The interrupt routine should not attempt to print anything. 8.8.3.2.3 The text style is set just as in Version 4, using set_text_style (which sets that for the current window). The property holds the operand of that instruction (e.g. 4 for italic). 8.8.3.2.4 The foreground colour is stored in the upper byte of the colour data property, the background colour in the lower byte. 8.8.3.2.5 The font height (in pixels) is stored in the upper byte of the font size property, the font width (in pixels) in the lower byte. 26 8 The screen model 8.8.3.2.6 The interpreter may use the line count to see when it should print "[MORE]". 8.8.3.3 All eight windows begin at (1; 1). Window 0 occupies the whole screen and is initially selected. Window 1 is as wide as the screen but has zero height. Windows 2 to 7 have zero width and height. All eight windows begin with buffered printing on, and the other attributes off. 8.8.3.4 A window can be moved with move_window and resized with window_size. If the window size is reduced so that its cursor lies outside it, the cursor should be reset to the left margin on the top line. 8.8.3.5 Each window remembers its own cursor position (relative to its own coordinates, so that the position (1; 1) is at its top left). These can be changed using set_cursor (and it is legal to move the cursor for an unselected window). It is illegal to move the cursor outside the current window. 8.8.3.6 Each window can be scrolled vertically (up or down) any number of pixels, using the scroll_window opcode. 8.8.4 To some extent windows 0 and 1 mimic the behaviour of the lower and upper windows in the Version 4 screen model: 8.8.4.1 The split_screen opcode tiles windows 0 and 1 so that window 1 has the given height and is placed at the top left, while window 0 is moved to be just below it and has its height shortened by the height of window 1. (If this makes a negative amount, the height becomes 0.) Finally, window 0 is selected. 8.8.4.2 An "unsplit" (that is, a split_screen 0) takes place when the entire screen is cleared with erase_window -1, if a "split" has previously occurred (meaning that windows 0 and 1 have been set up as above). 8.8.5 Screen clearing operations: 8.8.5.1 Erasing a picture is like drawing it (see below), except that the space where it would appear is painted over with background colour instead. 8.8.5.2 The current line can be erased using erase_line, either all the way to the right margin or by any positive number of pixels in that direction. The space is painted over with background colour (even if the current text style is Reverse Video). 8.8.5.3 Each window can be erased using erase_window, erasing to background colour (even if the current text style is Reverse Video). 8.8.5.3.1 Erasing window number -1 erases the entire screen and unsplits windows 0 and 1 (see above). 8.8.5.3.2 Erasing window -2 erases the entire screen (without changing any window attributes or cursor positions). 8.8.6 Pictures may accompany the game. They are not stored in the story file (or the Z-machine) itself, and the interpreter is simply expected to know where to find them. Infocom supplied files of pictures in different formats on different machines. The exact format of such files is not specified here. 8.8.6.1 Pictures are numbered from 1 upwards (not necessarily contiguously). They can be "drawn" or "erased" (using draw_picture and erase_picture). Before attempting to do so, a game may ask the interpreter about the picture (using picture_data): 27 8 The screen model this allows the interpreter to signal that the picture in question is unavailable, or to specify its height and width. 8.8.6.2 The game may, if it wishes, use the picture_table opcode to give the interpreter advance warning that a group of pictures will soon be needed (for instance, a collection of icons making up a control panel). The interpreter may want to load these pictures off disc and into a memory cache. Remarks. See §16 for comment on how `Beyond Zork' uses fonts. Some interpreters print the status line when they begin running a Version 3 game, but this is incorrect. (It means that a small game printing text and then quitting cannot be run unless it includes an object.) The author's preferred status line formats are: Hall of Mists 80/733 Lincoln Memorial 12:03 PM Thus the score/turns block always fits in 3+1+4 = 8 characters and the time in 2+1+2+1+2 = 8 characters. (Games needing more exotic time lines, for example, should not be written in Version 3.) The only existing Version 3 game to use an upper window is `Seastalker' (for its sonarscope display). Some ports of ITF apply buffering (i.e. word-wrapping) and scrolling to the upper window, with unfortunate consequences. This is why the standard Inform status line is one character short of the width of the screen. The original Infocom files seldom use erase_window, except with window -1 (for instance `Trinity' only uses it in this form). ITF does not implement it in any other case. The Version 5 re-releases of older games make use of consecutive set_text_style instructions to attempt to combine boldface reverse video (in the hints system). None of Infocom's Version 4 or 5 files use erase_line at all, and ITF implements it badly (with unpredictable behaviour in Reverse Video text style). (It's interesting to note that the Version-5 edition of `Zork I' - one of the earliest Version 5 files - blanks out lines by looking up the screen width and printing that many spaces.) Note that a minor bug in Zip writes bytes $22 to $25 in the header as four values, giving the screen dimensions in the form left, right, top, bottom: provided units are characters (i.e. provided the font width and height are both 1) then since "left" and "top" are both 0, this bug has no effect. Some details of the known IBM graphics files are given in Paul David Doherty's "Infocom Fact Sheet". See also Mark Howell's program "pix2gif", which extracts pictures to GIF files. (This is one of his "Infocom toolkit" programs.) 9 Sound effects 9.1 Some games, from Version 3 onward, have sound effects attached. These are not stored in the story files (or the Z-machine) itself, and the interpreter is simply expected 28 9 Sound effects to know where to find them. Infocom implemented sound effects differently on different machines. 9.1.1 In Version 6, the interpreter should set bit 5 of `Flags 1' if it can provide sound effects. 9.1.2 In Version 5 and later, a game should have bit 7 of `Flags 2' set in its story file if it wants to use sound effects. The interpreter should then clear this bit if it cannot oblige. 9.2 Sound effects are numbered upwards from 1. Number 1 is a high-pitched beep, number 2 a low-pitched one and effects from 3 upward are supplied by the interpreter somehow for the particular game in question. 9.3 A sound effect can be played at any volume level from 1 to 8 (8 being loudest of these). The volume level -1 should be implemented as "loudest possible". 9.4 Sound effects take place in the background, while normal operation of the Z-machine is going on. Control is via the sound_effect opcode, allowing the game to prepare, start, stop or finish with an effect. 9.4.1 The game may (but need not) "prepare" a sound effect before use. This would indicate to the interpreter that the game intends to use the effect soon: an interpreter might act on this information by loading the sampled sound off disc and into a memory cache. 9.4.2 A sound effect can then be "stopped" or "started". Only one sound effect is playing at any given time, and starting a new sound effect automatically stops any current one. 9.4.3 In Versions 5 and later, a sound effect may repeat any specified number of times, or repeat forever (until stopped). 9.4.4 Eventually, though, if it has not been stopped, it may end by itself. A routine (specified at start time) can then be called. The intention is that this routine may implement effects such as fading in and out, by replaying the sound effect at a different volume. (A game should not place any important code in such a routine.) 9.4.5 The game should explicitly "finish with" any sound effect which is not likely to occur again for a while: the interpreter can then throw it out of memory. Remarks. The safest way an Inform program can try to produce a bleep is by executing @sound_effect 1. Some ports of Zip believe that the first operand of this is the number of bleeps to make (so that @sound_effect 2 bleeps twice), but this is incorrect. Only two Infocom games provided sound effects: `The Lurking Horror' and `Sherlock'. Their story files only contain the following usages of sound_effect: sound_effect 1 sound_effect 2 sound_effect number 2 volume (in TLH) sound_effect number 2 volume/repeats function (in Sherlock) sound_effect 0 3 sound_effect 0 4 29 9 Sound effects The format of Infocom's shipped sound effects files has been documented by Stefan Jokisch and his notes are available from ftp.gmd.de. 10 Input streams and devices 10.1 In Versions 1 and 2, the player's commands can only be drawn from the keyboard. 10.2 In Versions 3 and later, the player's keypresses are drawn from the current "input stream". There are two input streams: numbered 0 (the keyboard) and 1 (a file containing commands). Other inputs (mouse clicks or menu selections), if available, are also implemented as keypresses (see below). 10.2.1 The format of a file containing commands must be the same as that written in output stream 4. 10.2.2 The game can change the current input stream itself, using the opcode input_stream. It has no way of finding out which input stream is currently in use. An interpreter is free to change the input stream whenever it likes (e.g. at the player's request) or, indeed, to run the entire game under input stream 1 (for testing purposes). 10.2.3 When input stream 1 is first selected, the interpreter may use any method of choosing a file name for the file of commands. (Good practice is to use the same conventions as when choosing a filename for output to stream 4.) 10.2.4 When the the current stream is stream 1, the interpreter should not hold up long passages of text (by printing "[MORE]" and waiting for a keypress, for instance). 10.3 Mouse support is optional but can be provided in Versions 5 and later. 10.3.1 In a game which wishes to use the mouse, bit 5 of `Flags 2' in the header should be set in the story file, and word $36 of the header should be the byte address of the mouse data table in dynamic memory. 10.3.1.1 If the interpreter cannot offer mouse support, then it should clear bit 5 of `Flags 2' to signal this to the game. 10.3.2 The mouse data table has the format: Word 0: Length of the table (in words) Word 1: Mouse x coordinate Word 2: Mouse y coordinate (The table length is usually 2.) These coordinates should be updated regularly by the interpreter. 10.3.3 The mouse is presumed to have between 0 and 16 buttons. The state of these buttons can be read by the read_mouse opcode in Version 6. Otherwise, mouse clicks are treated as keyboard input codes (see below). 30 10 Input streams and devices 10.3.4 In Version 6, the mouse can either be free or constrained to one of the 8 windows: if so, clicks outside the `mouse window' must be ignored, and the interpreter is at liberty to confine the mouse's movement to the boundary of its window. 10.4 Menu support can optionally be provided in Version 6. 10.4.1 In a game which wishes to use menus, bit 8 of `Flags 2' in the header should be set in the story file. 10.4.1.1 If the interpreter cannot offer menu support, then it should clear bit 8 of `Flags 2' to signal this to the game. 10.4.2 Menus are numbered from 0 upwards. 0, 1 and 2 are reserved for the interpreter to manage (this system has only been implemented on the Macintosh, wherein 0 is the Apple menu, 1 the File menu and 2 the Edit menu). Menus numbered 3 and upwards can be created or removed with the make_menu opcode. 10.4.3 Menu selection is reported to the game as a keypress (see below). Details of what selection has been made are read with read_mouse. 10.5 Whole commands are read from the input stream using the read opcode. (Note that this has two different internal names in Inform, sread for Versions 1 to 4 and aread subsequently.) 10.5.1 In Versions 1 to 3, the interpreter must redisplay the status line before it begins accepting input. 10.5.2 Commands are normally terminated by a new-line (a carriage return or a line feed as appropriate for the machine's keyboard or file format). 10.5.2.1 In Versions 5 and later, the game may provide a "terminating characters table" by giving its byte address in the word at $2e in the header. This table is a zero-terminated list of input character codes which cause aread to finish the command (in addition to new-line). Only function key codes are permitted: these are defined as those between 129 and 154 inclusive, together with 252, 253 and 254. The special value 255 means "any function key code is terminating". 10.5.3 *** In Versions 4 and later, an interpreter should ideally be able to time input and to call a (game) routine at periodic intervals: see the aread opcode. If it is able to do this, it should set bit 7 of `Flags 1' in the header. 10.6 In Versions 4 and later, individual characters can be read from the current input stream, using read_char. Again, the interpreter should ideally be able to time input and to call a (game) routine at periodic intervals. If it is able to do this, it should set bit 7 of `Flags 1' in the header. 10.7 For input purposes the character set is as follows: 31 10 Input streams and devices 0-9 ---- 10 New-line (ends input of a command) 11-12 ---- 13 New-line (ends input of a command) 14-26 ---- 27 Escape 28-31 ---- 32-126 Standard ASCII (see 3.4.3) 127-128 ---- 129-154 Function key codes (see below) 155-251 Accented letter codes (see below) 252-254 Function key codes (see below) 255- ---- The codes marked ---- should never be read. (Of course an interpreter may well want to use other ASCII codes for its own line-editor when the player is typing a command: 127 for "delete", for instance. The table means only that these codes should not be passed to the game.) Note that an interpreter can return either 10 or 13 as "new-line". (The recommended choice is 10.) 10.7.1 The "escape" code is optional: an interpreter need not provide an escape key. (The Inform library clears and quits menus if Escape is pressed.) 10.7.2 The first block of function key codes is as follows: 129: cursor up 130: cursor down 131: cursor left 132: cursor right 133: f1 134: f2 .... 144: f12 145: keypad 0 146: keypad 1 .... 154: keypad 9 10.7.3 The input codes 155 to 251 refer to European accented letters: see the table in §3.4.4. 10.7.4 In Version 6, mouse clicks and menu selections are reported as the function key codes: 252: menu click 253: mouse double-click 254: mouse single-click In Versions 5 and later (except 6), menus are unavailable, and it is legal for an interpreter to translate both forms of mouse-click as code 254. This is the recommended practice but a game should not depend on it. 10.7.5 All the codes not marked as ---- should be passed to read_char. Function key codes and the code for "escape" should not be entered by read into the input buffer (they have no specified appearance on screen), but accented letter codes should. Remarks. Menus in `Beyond Zork' define cursor up and cursor down as terminating characters, and make use of read in the upper window. Ideally, an interpreter should be able to read time delays (for timed input) from stream 1 (i.e., from a script file). In practice this is formidably hard to implement. The `Beyond Zork' story file is capable of receiving both mouse-click codes (253 and 254), listing both in its terminating characters table and treating them equally. 32 11 The format of the header 11.1 The header table summarises those locations in the Z-machine's header which an interpreter must deal with. (For much fuller details, see Appendix A.) "Hex" means the address, in hexadecimal; "V" the earliest Version to which the rule is applicable; "Dyn" means that the byte or bit may legally be changed by the game during play; "Int" means that the interpreter may change it; "Rst" means that the interpreter must set it correctly after loading the game, after a restore or after a restart. 11.1.1 It is illegal for a game to alter those fields not marked as "Dyn". An interpreter is therefore free to store values of such fields in its own variables. 11.1.2 The state of the transcription bit (bit 0 of Flags 2) is only changed by the game (see §7.3, §7.4), but the interpreter ensures that its value survives a restart or restore. 11.1.3 Infocom used the interpreter numbers: 1 DECSystem-20 5 Atari ST 9 Apple IIc 2 Apple IIe 6 IBM PC 10 Apple IIgs 3 Macintosh 7 Commodore 128 11 Tandy Color 4 Amiga 8 Commodore 64 (The DECSystem-20 was Infocom's own in-house mainframe.) An interpreter should choose the interpreter number most suitable for the machine it will run on. (The main consideration is that the behaviour of `Beyond Zork' actually depends on the interpreter number.) 11.1.4 *** The use of bit 7 in `Flags 1' to signal whether timed input is available is new in this document: see the preface. 11.1.5 *** If an interpreter obeys Revision n.m of this document perfectly, as far as anyone knows, then byte $32 should be written with n and byte $33 with m. If it is an earlier (non-standard) interpreter, it should leave these bytes as 0. 11.1.6 The file length stored at $1a is actually divided by a constant, depending on the Version, to make it fit into a header word. This constant is 2 for Versions 1 to 3, 4 for Versions 4 to 5 or 8 for Versions 6 and later. Remarks. See the "Infocom fact sheet" for numbers and letters of the known interpreters shipped by Infocom. Interpreter versions are conventionally the upper case letters in sequence (A, B, C, ...). At present most ports of Zip use interpreter number 6, and most of ITF use number 2. The unusual behaviour of `Beyond Zork' concerns its character graphics: see the remarks to §16. 33 Hex V Dyn Int Rst Contents 0 1 Version number (1 to 6) 1 3 Flags 1: .3 Bit 1 Status line type: 0=score/turns, 1=hours:mins .3 * * 4 Status line not available? .3 * * 5 Screen-splitting available? .3 * * 6 Is a variable-pitch font the default? 4 Flags 1: .5 * * Bit 0 Colours available? .6 * * 1 Picture displaying available? .4 * * 2 Boldface available? .4 * * 3 Italic available? .4 * * 4 Fixed-space font available? .6 * * 5 Sound effects available? .4 * * 7 Timed keyboard input available? 4 1 Base of high memory (byte address) 6 1 Initial value of program counter (byte address) 6 Packed address of initial "main" routine 8 1 Location of dictionary (byte address) A 1 Location of object table (byte address) C 1 Location of global variables table (byte address) E 1 Base of static memory (byte address) 10 1 Flags 2: .1 * * * Bit 0 Set when transcripting is on .3 * 1 Game sets to force printing in fixed-pitch font .6 * * 2 Int sets to request status line redraw: game clears when it complies with this. .5 * * 3 If set, game wants to use pictures .5 * * 4 If set, game wants to use the UNDO opcodes .5 * * 5 If set, game wants to use a mouse .5 6 If set, game wants to use colours .5 * * 7 If set, game wants to use sound effects .6 * * 8 If set, game wants to use menus (For bits 3,4,5,7 and 8, Int clears again if it cannot provide the requested effect.) 18 2 Location of abbreviations table (byte address) 1A 3+ Length of file (see note) 1C 3+ Checksum of file 1E 4 * * Interpreter number 1F 4 * * Interpreter version (single ASCII character) Some early Version 3 files do not contain length and checksum data, hence the notation 3+. 34 Hex V Dyn Int Rst Contents 20 4 * * Screen height (lines): 255 means "infinite" 21 4 * * Screen width (characters) 22 5 * * Screen width in units 24 5 * * Screen height in units 26 5 * * Font height in units 27 5 * * Font width in units (defined as width of a '0') 28 6 Routines offset (divided by 8) 2A 6 Static strings offset (divided by 8) 2C 5 * * Default background colour 2D 5 * * Default foreground colour 2E 5 Address of terminating characters table (bytes) 30 6 * Total width in pixels of text sent to output stream 3 32 1 * * Standard revision number 34 5 Character set table address (bytes), or 0 for default 36 5 Mouse data table address (bytes) 12 The object table 12.1 The object table is held in dynamic memory and its byte address is stored in the word at $0a in the header. (Recall that objects have flags attached called attributes, numbered from 0 upward, and variables attached called properties, numbered from 1 upward. An object need not provide every property.) 12.2 The table begins with a block known as the property defaults table. This contains 31 words in Versions 1 to 3 and 63 in Versions 4 and later. When the game attempts to read the value of property n for an object which does not provide property n, the n-th entry in this table is the resulting value. 12.3 Next is the object tree. Objects are numbered consecutively from 1 upward, with object number 0 being used to mean "nothing" (though there is formally no such object). The table consists of a list of entries, one for each object. 12.3.1 In Versions 1 to 3, there are at most 255 objects, each having a 9-byte entry as follows: ---32 bits in 4 bytes--- ---3 bytes------------------ ---2 bytes-- parent, sibling and child must all hold valid object numbers. The properties pointer is the byte address of the list of properties attached to the object. Attributes 0 to 31 are flags (at any given time, they are either on (1) or off (0)) and are stored topmost bit first: e.g., attribute 0 is stored in bit 7 of the first byte, attribute 31 is stored in bit 0 of the fourth. 35 12 The object table 12.3.2 In Version 4 and later, there are at most 65535 objects, each having a 14-byte entry as follows: ---48 bits in 6 bytes--- ---3 words, i.e. 6 bytes---- ---2 bytes-- 12.4 Each object has its own property table. Each of these can be anywhere in dynamic memory (indeed, a game can legally change an object's properties table address in play, provided the new address points to another valid properties table). 12.4.1 In Versions 1 to 3, a property table has header: -----byte---- --some even number of bytes--- where the text-length is the number of 2-byte words making up the text, which is stored in the usual format. (This means that an object's short name is limited to 765 Z-characters.) After the header, the properties are listed in descending numerical order. (This order is essential and is not a matter of convention.) Each property is stored as a block ---between 1 and 8 bytes-- where the size byte is arranged as 32 times the number of data bytes minus one, plus the property number. A property list is terminated by a size byte of 0. (It is otherwise illegal for a size byte to be a multiple of 32.) 12.4.2 In Versions 4 and later, a property block instead has the form --1 or 2 bytes--- --between 1 and 64 bytes-- The property number occupies the bottom 6 bits of the first size byte. 12.4.2.1 If the top bit of the size byte is set, then there is a second size byte. The bottom six bits contain the property data length (counting in bytes), minus 1, and the top two bits must be $$10. 12.4.2.2 Otherwise, if bit 6 of the size byte is set then the length is 2, and if it is clear then the length is 1. 12.5 It is the game's responsibility to keep the object tree well-founded: the interpreter is not required to check. "Well-founded" means the following: (a) An object with a sibling also has a parent. (b) An object is the parent of exactly those objects in the sibling list of its child. (c) Each object can be given a level n, such that parentless objects have level 0 and all children of a level n object have level n + 1. Remarks. The largest valid object number is not directly stored anywhere in the Z-machine. Utility programs like "Infodump" deduce this number by assuming that, initially, the object entries end where the first property table begins. The reason why the second property size byte needs to have top bits set to $$10 is that the size field must be parsable either forwards or backwards - the Z-machine needs to be able to reconstruct the length of a property given only the address of the first byte of its data. (There 36 12 The object table are very many (e.g. 2000) property entries in a story file, so optimising size into one byte most of the time is worthwhile.) In fact only the top bit of the second byte needs to be set, so it would be extremely easy to modify an interpreter to allow up to 128 bytes of property data. Infocom seem not to have noticed, or not to have needed this. Inform can only construct well-founded object trees as the initial game state, but it is easy to compile sequences of code like "move red box to blue box" followed by "move blue box to red box" which leave the object tree in an ill-founded state. (The Inform library protects the standard object-movement verbs against this.) 13 The dictionary and lexical analysis 13.1 The dictionary table is held in static memory and its byte address is stored in the word at $08 in the header. 13.2 The table begins with a short header: n entry-length number-of-entries byte ------n bytes----------------- byte 2-byte word The keyboard input codes are "word-separators": typically (and under Inform mandatorily) these are the ASCII codes for full stop, comma and double-quote. Note that a space character (32) should never be a word-separator. The "entry length" is the length of each word's entry in the dictionary table. (It must be at least 4 in Versions 1 to 3, and at least 6 in later Versions.) 13.2.1 Note that control codes such as the ASCII for "tab" are never given in the word-separators table: they aren't legal keyboard input codes (an interpreter might sensibly convert a tab to a space). 13.3 In Versions 1 to 3, each word has an entry in the form ------- 4 bytes ------ (entry length-4) bytes The interpreter ignores the bytes of data (presumably the game's parser will use them). The encoded text contains 6 Z-characters (it is always padded out with Z-character 5's to make up 4 bytes: see "How strings are encoded"). The text may include spaces or other word-separators (though, if so, the interpreter will never match any text to the dictionary word in question: surprisingly, this can be useful and is a trick used in Inform 5/12). 13.4 In Versions 4 and later, the encoded text has 6 bytes and always contains 9 Z-characters. 37 13 The dictionary and lexical analysis 13.5 The word entries follow immediately after the dictionary header and must be given in numerical order of the encoded text (when the encoded text is regarded as a 32 or 48-bit binary number with most-significant byte first). It must not contain two entries with the same encoded text. 13.6 Lexical analysis takes place in two circumstances: on request of a tokenise opcode (in which case it can use any dictionary table it likes, in the format above) and during acceptance of a game command (in which case the standard dictionary is used). 13.6.1 First, the text is broken up into words. Spaces divide up words and are otherwise ignored. Word separators also divide words, but each one of them is considered a word in its own right. Thus, the erratically-spaced text "fred,go fishing" is divided into four words: fred / , / go / fishing 13.6.2 Each word is then encoded as a Z-machine string in dictionary form, and searched for in the dictionary. 13.6.3 A "parse table" is then written, recording the number of words, the length and position of each word and the dictionary address of each word which is recognised. For the format, see the sread opcode. Remarks. Usually (under Inform, mandatorily) there are three bytes of data in the word entries, so that dictionary entry lengths are 7 and 9 in the early and late Z-machine, respectively. It is essential that dictionary entries are in numerical order of the bytes of encrypted text so that interpreters can search the dictionary efficiently (e.g. by a binary-chop algorithm). Because the letters in A0 are in alphabetical order, because the bits are ordered in the right way and because the pad character 5 is less than the values for the letters, the numerical ordering corresponds to normal English alphabetical order for ordinary words. (For instance "an" comes before "anaconda".) The Infocom games do contain words whose initial character is not a letter (words such as "#record"). 14 Complete table of opcodes 14.1 This table contains all 117 opcodes and, taken with the dictionary in §15, describes exactly what each should do. In addition, it lists which opcodes are actually used in the known Infocom story files, and documents the Inform assembly language syntax. Reading the opcode tables The two columns "St" and "Br" (store and branch) mark whether an instruction stores a result in a variable, and whether it must provide a label to jump to, respectively. 38 14 Complete table of opcodes The "Opcode" is written TYPE:Decimal where the TYPE is the operand count (2OP, 1OP, 0OP or VAR) or else EXT for two-byte opcodes (where the first byte is (decimal) 190). The decimal number is the lowest possible decimal opcode value (by convention, 256 is added for extended opcodes). The hex number is the opcode number within each TYPE. The "V" column gives the Version information. If nothing is specified, the opcode is as stated from Version 1 onwards. Otherwise, it exists only from the version quoted onwards. Before this time, its use is illegal. Some opcodes change their meanings as the Version increases, and these have more than one line of specification. Others become illegal again, and these are marked [illegal]. In a few cases, the Version is given as "3/4" or some such. The first number is the Version number whose specification the opcode belongs to, and the second is the earliest Version in which the opcode is known actually to be used in an Infocom-produced story file. A dash means that it is seems never to have been used (in any Version). The table explicitly marks opcodes which do not exist in any version of the Z-machine as ------: in addition, none of the extended set of codes from $1d to $ff were ever used. Inform assembly language An Inform line beginning with an @ is sent directly to the assembler. In the syntax below, and must be variables (or sp, the stack pointer);