SMITH# - manual

A SMITH# computer is just an array of cells (currently 32-bit signed integers). There are no registers to modify, and there is no distinction between cells and immediates: there are only cells. Therefore, you could very well say that SMITH# is the ultimate virtual machine (were it not for the presence of Java2K).

Instructions

Instructions typically take up at least two cells: one cell whose number identifies the instruction, and a following cell that holds the first argument. Because instructions are the same as numbers, you could write code by just writing the respective numbers.

Note: Because of this, there are very many different ways of writing the same statement. For example, the instruction "stop" has the numeric ID of 0, and an alias ".". This in itself gives you three different ways of encoding the stop instruction:

stop
.
0

You can also use the EVAL statement to provide difficult calculations that boil down to the number 0, thereby declaring the instruction "stop". As you can also use the instruction aliases, here are just some of the possible combinations:

eval(=-=)
?(>=->=)
?(stop)
?(stop+cite-(stop+cite))
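
The arithmetic behind these combinations can be checked by hand. The following Python sketch is not part of SMITH#; it only assumes the numeric values from the opcode overview at the end of this manual (stop = 0, cite = 1, "=" = 0, ">=" = 5). The names IDS and encodings are made up for this illustration.

```python
# Numeric IDs as listed in the opcode overview of this manual.
# The names IDS and encodings are hypothetical, used only in this sketch.
IDS = {"stop": 0, ".": 0, "=": 0, "cite": 1, ">=": 5}

# Each entry spells out one of the EVAL expressions by hand.
encodings = {
    "eval(=-=)": IDS["="] - IDS["="],
    "?(>=->=)": IDS[">="] - IDS[">="],
    "?(stop)": IDS["stop"],
    "?(stop+cite-(stop+cite))":
        IDS["stop"] + IDS["cite"] - (IDS["stop"] + IDS["cite"]),
}

for text, value in encodings.items():
    print(text, "->", value)   # every expression boils down to 0
```

Every line prints 0, which is the numeric ID of stop.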

A complete list of instructions is provided at the end of this document.

The SMITH# Syntax

A SMITH# sourcecode consists of a consecutive list of statements. Each statement has the following syntax:

STATEMENT = [LABEL] WS OPCODE [WS OPERAND { WS ',' WS OPERAND }] WS.

Whitespaces

Whitespaces (labelled "WS" in the syntax) are BLANK, TAB, and NEWLINE. You can have any number of Whitespaces when the syntax says so. Note that the syntax does NOT allow whitespaces in EVAL expressions described below.

Comments

Everything starting with a ';' to the next newline is ignored.

Labels

Every statement can have a label. Since a single line can hold more than one statement, a single line can have more than one label; but a single instruction can only have a single label. Labels are case-insensitive, and of the typical variety (a char, followed by either chars or digits).

It is important to understand that labels are visible only to the compiler; the final code is just an array of integer cells.

The STOP instruction

The STOP instruction halts execution. As already noted above, the instruction has the numeric ID 0, and an alias "."

The CITE instruction

The only way to specify data is by citing it. The cite instruction has the following form:

cite <data>

This instruction will fill two cells: with the number 1, and with the number specified by <data>. If you use a string as <data>, you will get many cells. For example,

cite "Help"

has exactly the same effect as

cite 'H' 
cite 'e' 
cite 'l' 
cite 'p' 
cite 0

which, if you are in a bad mood, you could write as

cite'H'"101?(")'l'"'p'1 0

As you can see from the latter, the alternative syntax for cite is " (quotation mark). The ID of the cite instruction is 1. Thus, the following statements are all equivalent, and each creates two cells: 1,0

1 0
cite 0
" 0
" stop
" eval(stop)
" ?(stop)
cite =
cite eval(cite-<>)

I think you agree this gives your creativity a great chance for funny expressions.
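
To make the string form of cite concrete, here is a small Python sketch of the expansion described above: one "1, character" pair per character, followed by the pair for "cite 0". The function name cite_string is made up for this illustration and is not part of SMITH#.

```python
def cite_string(text):
    """Return the cells that cite "<text>" would produce (sketch)."""
    cells = []
    for ch in text:
        cells += [1, ord(ch)]   # 1 is the ID of cite, then the character code
    cells += [1, 0]             # the trailing "cite 0"
    return cells

print(cite_string("Help"))
# [1, 72, 1, 101, 1, 108, 1, 112, 1, 0]
```

Note that the 101 in the "bad mood" variant above is simply the character code of 'e'.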

Another thing you should note: when you refer to a label, the compiler will replace it by the associated cell number plus one. Why is that? Well, take a look at this perfectly valid code extract:

a: cite 3
b: cite 2
   add  a,b

Surely, what you want to do is add 3 to 2; however, as the cite instruction actually uses up two cells, without the "plus one" you would end up adding one cite instruction to another (which incidentally would result in a copy instruction, since 1 + 1 = 2).
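
The "plus one" rule can be sketched in Python as follows. This is only an illustration under two assumptions that the manual does not state explicitly: the program is loaded starting at cell 0, and add is stored as its opcode followed by its two operand cells. The function name assemble is made up.

```python
CITE, ADD = 1, 4   # numeric IDs from the opcode overview

def assemble():
    """Hand-assemble the three statements above (sketch)."""
    cells = []
    labels = {}

    labels["a"] = len(cells)   # a: cite 3
    cells += [CITE, 3]
    labels["b"] = len(cells)   # b: cite 2
    cells += [CITE, 2]
    # add a,b : a label reference becomes "label cell plus one",
    # so the operands point at the data cells, not at the cite opcodes
    cells += [ADD, labels["a"] + 1, labels["b"] + 1]
    return cells, labels

cells, labels = assemble()
print(cells)   # [1, 3, 1, 2, 4, 1, 3]
```

The operands of add are 1 and 3, and those cells hold 3 and 2, which is what you wanted to add in the first place.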

The COPY instruction

This instruction copies a certain number of cells from one index to another. Behaviour for overlapping indexes is undefined. Here is the syntax:

copy <from>,<to>,<count>

This instruction will copy <count> cells from cell <from> to cell <to>. The copy instruction has the numeric ID 2, and the alias ":=". Here is an example:

a:  cite 0
b:  cite 1
c:  cite 54
    copy    c,a,b

The instruction in line 4 will copy one cell from c to a (the count is read from b, which holds 1). Because a, b and c are labels, it will actually copy the number 54 over the number 0, thus overwriting it. This is the only way to create code in SMITH#. Note that copy is about the only instruction that can reasonably use indirect addressing, which is written with the ^ operator. For example, in the following code:

a:  cite eval(d)
b:  cite 1
c:  cite stop
    copy c,^a,b
d:  add  a,b

Line 4 will insert the instruction "stop" as the very next instruction. If you have already copied part of the code, indirect addresses can always point to the corrected location. There is no need for PC-relative coding!!!
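
Both copy examples can be traced with a few lines of Python. This is a sketch under assumptions: all three operands are treated as cell indexes (so the count is read from a cell), the program starts at cell 0, and the ^ operator is modelled by a made-up dst_indirect flag, because the manual does not say how ^ is encoded in a cell. The name copy_cells is hypothetical as well.

```python
def copy_cells(cells, src, dst, count_cell, dst_indirect=False):
    """Sketch of copy <from>,<to>,<count>; all operands are cell indexes."""
    if dst_indirect:            # models the ^ operator: follow the pointer
        dst = cells[dst]
    count = cells[count_cell]   # the count is read from a cell, too
    cells[dst:dst + count] = cells[src:src + count]

# First example: a: cite 0 / b: cite 1 / c: cite 54 / copy c,a,b
# Label references are "cell plus one": a -> 1, b -> 3, c -> 5.
first = [1, 0, 1, 1, 1, 54]
copy_cells(first, src=5, dst=1, count_cell=3)
print(first)        # [1, 54, 1, 1, 1, 54]

# Second example: a holds eval(d), the cell where "add a,b" starts (10 here).
# Cells 6..9 stand for the copy instruction itself; the operand shown for ^a
# is a placeholder, since the real encoding of ^ is not documented here.
second = [1, 10, 1, 1, 1, 0, 2, 5, 1, 3, 4, 1, 3]
copy_cells(second, src=5, dst=1, count_cell=3, dst_indirect=True)
print(second[10])   # 0, so the add opcode has been replaced by stop
```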

The NORM instruction

The NORM instruction compares a cell with the number zero, and stores a scaled result back into that cell. The syntax is

NORM <code>,<cell>,<scale>

This instruction first evaluates the expression "contents of <cell> <code> zero". If the expression evaluates to false, the number zero will be stored in <cell>. If it evaluates to true, the number in the cell <scale> will be stored in <cell>. As usual, all codes have both names and numeric IDs. Here are the possible codes:

code  value    code  value
=     0        <>    1
<     2        >     3
<=    4        >=    5

And here is an example, taken straight from the Hello, World Sample:

   cite "Hello, World"
a: cite 0
b: eval 1
   copy    1,a,b
   norm    =,a,b

The copy instruction will place the character 'H' into cell a. The norm instruction compares it with zero. If it were zero, then the contents of b (that is: 1) would be copied to the variable a. Because it is not, the variable a is filled with 0.
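
The rule for NORM fits into a few lines of Python. This is a sketch, not the actual implementation: it assumes the comparison code is passed as its plain numeric value (0 to 5), while <cell> and <scale> are cell indexes. The names COMPARE and norm are made up.

```python
import operator

# Comparison codes and their numeric values, as listed for NORM above.
COMPARE = {
    0: operator.eq, 1: operator.ne,
    2: operator.lt, 3: operator.gt,
    4: operator.le, 5: operator.ge,
}

def norm(cells, code, cell, scale):
    """Sketch of NORM <code>,<cell>,<scale>."""
    if COMPARE[code](cells[cell], 0):
        cells[cell] = cells[scale]   # true: store the scale value
    else:
        cells[cell] = 0              # false: store zero

cells = [ord("H"), 1]      # cell 0 plays the role of a, cell 1 of b
norm(cells, 0, 0, 1)       # norm =,a,b
print(cells[0])            # 0, because 'H' is not equal to zero
```

Had cell 0 held the terminating zero of the string instead, the same call would have stored 1 in it.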

The EVAL Pseudoinstruction

EVAL is best thought of as a pseudo instruction used only by the preprocessor. It must always evaluate to a single number (the expression is not evaluated at runtime, but at compile time). EVAL expressions are enclosed in parentheses, must not contain any whitespace, and can contain the *, +, - and / operations. * and / have higher priority than + and -. Note that there is no numeric ID for the EVAL instruction, because it is evaluated at compile time. However, there is an alias if you become tired of writing all those EVALs, namely "?".

One important thing to note is that labels referred to in EVAL expressions are the actual label cells, not label cell plus one (see above). Take a look at the following two code lines:

a: cite eval(a)
   cite a

If a refers to the cell 32, then you will actually have created

   cite 32
   cite 33
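
For the curious, here is how such an expression could be evaluated at compile time. The Python sketch below follows the EXP0 to EXP3 rules from the syntax overview; it is not the real SMITH# preprocessor. It handles decimal numbers and names only (not symbol aliases such as "=" or "<<"), and it assumes that "/" is integer division. The names evaluate and symbols are made up.

```python
import re

TOKEN = re.compile(r"\d+|[A-Za-z][A-Za-z0-9]*|[-+*/()]")

def evaluate(expr, symbols):
    """Evaluate an EVAL expression (sketch); symbols maps upper-case names to numbers."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        raise ValueError("unexpected character (whitespace is not allowed)")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def exp0():                      # ['+'|'-'] EXP1 { ('+'|'-') EXP1 }
        sign = -1 if peek() == "-" else 1
        if peek() in ("+", "-"):
            take()
        value = sign * exp1()
        while peek() in ("+", "-"):
            if take() == "+":
                value += exp1()
            else:
                value -= exp1()
        return value

    def exp1():                      # EXP2 { ('*'|'/') EXP2 }
        value = exp2()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= exp2()
            else:
                value //= exp2()     # assumption: integer division
        return value

    def exp2():                      # DECINT | IDENTIFIER | '(' EXP0 ')'
        tok = take()
        if tok == "(":
            value = exp0()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok.isdigit():
            return int(tok)
        if tok.upper() not in symbols:
            raise ValueError("unknown name: " + tok)
        return symbols[tok.upper()]  # names are not case-sensitive

    result = exp0()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result

symbols = {"A": 32, "STOP": 0, "CITE": 1, "OUTPUT": 8}
print(evaluate("a", symbols))                       # 32, the label cell itself
print(evaluate("stop+cite-(stop+cite)", symbols))   # 0
print(evaluate("OUTPUT/2", symbols))                # 4
```

With a at cell 32, "cite eval(a)" stores 32, whereas a plain "cite a" stores 33, because only references outside of EVAL get the "plus one" treatment.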

Arithmetic operations

As part of my quest to make SMITH more readable, I have added 4 (four!!! one more than three, one less than five!!!) arithmetic operations. Originally, I wanted to use "rad", "rot", "abs", and "arctan", but I stuck with the more conventional set of

ADD or +
SUB or -
MUL or *
DIV or /

Remember that typing

+1,A

will not add 1 to cell A, but rather the contents of cell 1 to the contents of cell A.
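
In other words, both operands are cell indexes. A minimal Python sketch of that behaviour follows; it assumes the result is stored in the second operand, and it ignores 32-bit overflow. The function name add and the cell layout are made up.

```python
def add(cells, src, dst):
    """Sketch of ADD <src>,<dst>: both operands are cell indexes."""
    cells[dst] = cells[dst] + cells[src]

cells = [0, 7, 0, 0, 0, 35]
A = 5                 # pretend the label A refers to cell 5
add(cells, 1, A)      # +1,A
print(cells[A])       # 42: the contents of cell 1 (7) were added, not the number 1
```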

Logical operations

SMITH# has NAND as its only logical operator. As you know, NAND is sufficient to implement the more common AND, OR and NOT operations; and it is much more economical to use one logical operator instead of three. After all, one command is surely easier to remember than three commands, right?

Well, to evaluate "B=(B NAND A)", the syntax is

NAND A,B

The alternate syntax for NAND is "~&" (easy to remember, NOT AND).
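
In case you have forgotten how NOT, AND and OR fall out of NAND, here is a reminder in Python. It is a sketch that assumes NAND works bitwise on the integer cells; the manual does not say whether it is bitwise or purely logical. The function names are made up.

```python
def nand(a, b):
    """Bitwise NAND (assumed semantics)."""
    return ~(a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

print(and_(12, 10))   # 8
print(or_(12, 10))    # 14
print(not_(0))        # -1, i.e. all bits set
```

So AND costs two NANDs and OR costs three.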

Doing I/O

There are two instructions for doing I/O, and they are pretty self-explanatory. Unfortunately, in the current version of SMITH#, only the output instruction works.

OUTPUT or <<
INPUT or >>

Both opcodes take one argument: the address of the cell to write out (OUTPUT) or to read into (INPUT).

Syntax overview

The following syntax is more or less taken directly from the implementation (or vice versa, the implementation was done from it). Which came first, the egg or the chicken?

Item        Definition
SYNTAX      {STATEMENT}.
STATEMENT   [LABEL] WS OPCODE [WS OPERAND { WS ',' WS OPERAND }] WS.
LABEL       IDENTIFIER ':'.
WS          { ' ' | '\t' }.
OPCODE      DECINT | OPCODEID | EVAL | NORMCODE.
OPCODEID    'COPY' | 'STOP' | 'NORM' | 'ADD2' | 'ADD3' | ... .
OPERAND     OPCODE | CITE.
CITE        'CITE' ( OPCODE | '"' {ASCII} '"' | ''' ASCII ''' ).
NORMCODE    '=' | '<>' | '<' | '>' | '>=' | '<='.
EVAL        'EVAL' WS '(' WS EXP0 WS ')'.
EXP0        ['+'|'-'] EXP1 { ('+'|'-') EXP1}.
EXP1        EXP2 { ('*'|'/') EXP2}.
EXP2        EXP3 | ( '(' EXP0 ')' ).
EXP3        DECINT | IDENTIFIER.
IDENTIFIER  CHAR { ( CHAR | DECDIGIT ) }.
DECINT      DECDIGIT { DECDIGIT }.
DECDIGIT    '0' .. '9'.

Opcode overview

And here is the list of available opcodes:

opcode  alias  NORM code  value
STOP    .      =          0
CITE    "      <>         1
COPY    :=     <          2
NORM    @      >          3
ADD     +      <=         4
SUB     -      >=         5
MUL     *                 6
DIV     /                 7
OUTPUT  <<                8
INPUT   >>                9
NAND    &~                10

For example, to add two variables, you could write ADD, or +, or <=, or 4, or any EVAL code that results in the number 4, such as

eval(OUTPUT/2), eval(<<>=<=).