Notes on a really minimal Tierra-style program (Sept. 2001):

I may never have time to implement this, but in case anyone wants to tackle it:

Tierra uses 32 opcodes, which can be encoded in 5 bits. If we only have 16 opcodes, operations can fit in 4 bits, or two ops per byte. Assuming a 32K "soup", 64K instructions could be stored. This allows a fairly large simulation to fit into a small space. The current target is a system that can run on a Commodore 64, a 6502/6510A CPU machine with 64K total RAM.

This is not expected to be practical, as a 6502 machine is deathly slow by modern standards. (Organism instructions / second is unlikely to get beyond double digits.) It may not be possible at all unless the native operating system can be overwritten (see below).

On the other hand, in modern machines the entire system should fit into the L2 or even L1 cache. (The AMD Athlon, for example, has 128K of L1 cache.) This should help offset any slowdown on modern machines for using 16-bit and 8-bit operations, for which they are not optimized.

Another possible target is a Palm OS device. They have strong limits on stack space and global variables, but this implementation should fit well within those limits. Also, the CPU may be just barely fast enough to make the simulation practical.

Other considerations make the Palm platform attractive. Most recent models can display 4-bit grayscale images, allowing a natural way to display extracts from the soup to the screen - opcodes naturally map to gray levels.

Example opcode set:

Addressing instructions:

Bit patternNameStack EffectDescription
0100adr(-- "addr just after complementary template")Reads "template" of no-ops immediately following, pushes addr after complementary template, if found]
0101jmp(--)Sets IP to addr after complementary template]

Arithmetic instructions:

0010sub(x y -- x-y)
0011add(x y -- x+y)
0110dec(x -- x-1)
0111inc(x -- x+1)

Stack operations:

1011swap(x y -- y x)
1110dup(x -- x x)
1010over(x y - x y x)
1111rot(x y z -- y z x)
1100drop(x --)

Branch instructions:

1101ifz(bool -- )If bool == 0, exec following instruction, else skip

"Metabolic" instructions:

1001maldiv(size -- "addr of newly allocated child") OR (--)If no child already, allocates; if child, divide - pushes addr of child onto stack if child allocated
1000copy(source destination offset -- source destination offset)Copies source+offset to destination+offset, does not consume arguments

Notes on opcode numbering:

Alternative opcode sets:

It would be nice to include a "shl" or "zero" (push zero onto stack) to build arbitrary numbers in concert w/inc and dec, but it's hard to use one without the other. We might consider dropping "inc" and "dec", and inserting "one" (push 1 onto stack) and "shl". We could then implement inc and dec with "one add" and "one sub".

Even then there is no operation to move data from the soup to the stack or vice versa, so it is doubtful that the language is Turing-complete. Although desirable from a theoretical point of view, this does not appear to be strictly necessary for interesting evolution to occur.

CPU struct:

u16ipinstruction pointer
u16meaddress of organism (offset into alloc array)
u16kidaddress of child if any (offset into alloc array)
u8spstack pointer

Total: 24 bytes

Implementation notes:

Expected data size totals:

32K for soup (64K total slots, two per byte)
24K for cpu array (1024 CPUs, 23 bytes/cpu) (may need to be trimmed)
4K for allocation tracking
60K total data (plus a few state variables, list head ptrs and such)

Assuming 64K total memory, this leaves 4K for code, variables, and stack. Forth is the most likely high-level language to implement this in such a small space. A highly optimizing C compiler might work. However, it seems probable that some of the system, if not all, will need to be written in assembly for the C-64 target.

Cutting the above values in half, however, gives a total of 30K and leaves ample space for code and stack. This will further limit the usefulness of the system but may be the only way to get it done without having to write a custom OS and/or Forth implementation.

Example organism, 54 instructions:

 0   0       (Template 1)
 1   0
 2   adr
        stack: ("start addr + 2")
 3   1       (Template -1)
 4   1
 5   dec     (account for template size)
 6   dec
        stack: ("start")
 7   dup
        stack: ("start" "start")
 8   adr
 9   1       (Template 2)
10   0
11   0
12   inc     (include end dummy)
        stack: ("start" "start" "end addr+1")
13   swap
14   sub
        stack: ("start" "size of org")
15   swap
        stack: ("size" "start")
16   1       (Template 3)
17   1
18   1
19   over     (preserve size and start for next allocation)
        stack: ("size" "start" "size")
20   dup
        stack: ("size" "start" "size" "size")
21   maldiv  (allocate child)
        stack: ("size" "start" "size" "child address")
22   swap
        stack: ("size" "start" "child" "size")
23   0       (Template 4)
24   1
25   0
26   dec     (keep track of what's copied)
        stack: ("size" "start" "child" "offset")
27   copy    (copy start+offset to child+offset)
28   dup
        stack: ("size" "start" "child" "offset" "offset")
29   ifz
        stack: ("size" "start" "child" "offset")
30   jmp     (exit copy loop)
31   1       (Template 5)
32   1
33   0
34   jmp     (to top of copy loop)
35   1       (Template -4)
36   0
37   1
38   (dummy)
39   0       (Template -5)
40   0
41   1
42   drop    (drop offset (now zero))
43   drop    (drop child addr)
        stack: ("size" "start")
44   maldiv  (divide, give child its own CPU)
45   jmp     (jump to outer allocate loop)
46   0       (Template -3)
47   0
48   0
49   (dummy)
50   0       (Template -2)
51   1
52   1
53   (dummy) (To separate creatures)

This organism executes 457 instructions to create the first child, then 439 for each subsequent copy. By thinning and reusing templates, it would be possible to reduce the size of the organism to 44 ops, possibly less.

Note, this organism only uses 14 of the opcodes (it doesn't use "add" and "rot").

Possible names: