INTRO TO TIERRA AND MINEV This document is intended to provide a quick introduction to the basic concepts and ideas involved in Tierra and Minev. (See the README.1st file to get instructions on how to operate Minev.) I assume you have some knowledge of programming and hopefully at least a glancing brush with assembly language. There are references at the end for those interested in more details and more recent results. I. Evolution and how to make it work for you A. The three ingredients of evolution. Tierra (and thus Minev) creates an environment where virtual organisms can reproduce, mutate, and die. Given just these three conditions, evolution takes off from there. Many "genetic algorithms" and other forms of artifical evolution have been tried over the years, with interesting and varied results, but Tierra is almost unique in that the organisms are not constrained to select from a finite pool of traits. They are capable of inventing entirely new ways of making a living. B. How Minev does it. i. Minevian programs. Organisms in Minev are small assembly-language programs along with a "CPU" to run them. Minevian creatures execute in a preemptive multitasking environment. (That is to say, each organism gets a turn to execute a few instructions, then the OS gives the next cell a chance.) This simulates (poorly) a true parallel architecture. The creatures live in a "soup" which holds 32767 (32K-1) opcodes. Cells can "own" at most two chunks of the soup. Only the owner of a chunk of the soup can write to it. Anyone can read and execute any part of the soup, however, and if a chunk of soup isn't owned by anyone, anybody can write to it. A cell owns its own code, and can allocate space for a child. The CPUs the creatures use have two address registers (AX and BX), two arithmetic registers (CX and DX), a stack (five deep), an instruction pointer which indicates the address in the "soup" where the next opcode will be read, and a 'flag' register that is set when a program makes an error. CPU exceptions are not immediately lethal to an organism. If an opcode produces an error, a penalty is taken (see "Death", below) and execution proceeds to the next opcode. Minevian opcodes are five bits long, so there are only 32 opcodes. None of the opcode take operands. For example, PUSHA puts a copy of the contents of the AX register onto the stack. There is a separate PUSHB instruction to push register BX onto the stack. ii. Special features of the Minevian opcode set A full list of the opcodes and their functions appears in the README file. This is intended to give a quick into to some non-obvious features of the Minevian "language". As mentioned before, the opcode set is extremely small. This is intentional, as Prof. Ray wanted to have the number of opcodes be of the same order of magnitude as the number of amino acids in terrestrial proteins. Even so, the Minevian language is Turing- complete. This means, essentially, that anything you can do in any other assembler language you can do in Minevian. (You'd never *want* to, but that's beside the point.) Probably the single most unusual feature of Minevian operation is address-by-template. Programs do not have to jump to specific addresses in Minev. Instead, jump locations are marked by 'templates' composed of no-ops. (There are two no-ops in the Minevian language, NOP1 and NOP0.) For example, the sequence JMP NOP1 NOP1 NOP1 NOP0 cause the CPU to search the nearby soup for the sequence NOP0 NOP0 NOP0 NOP1, and continue executing after this complementary template. This is analogous to how many biological processes work, where molecules or membranes present surfaces that match complementary surfaces on other molecules or membranes. iii. Death In the original Tierra, organisms were placed on a queue at birth. When the soup started to fill up too far, a "reaper" function would go in and kill off cells until the soup wasn't so crowded. I couldn't afford the memory to keep such a list around, so I handled things slightly differently. At birth, cells are given a random "lifespan", which is the maximum number of instructions they can execute before they die. If a cell raises a CPU exception (by, e.g. trying to write to soup it doesn't have write priveleges to) a penalty is subtracted from the lifespan (e.g, 10 instructions). In this manner, flawed creatures will tend to die off more rapidly than viable ones. iv. Reproduction. A typical program will determine how much size to allocate, allocate that amount of space, copy its own code into that space, and then "divide". When a cell divides, it loses the ability to write to the child's soup, but the child is allocated a CPU, given ownership of the soup it's in, and begins executing. It is important to note that no 'fitness function' is applied to organisms besides the brute fact of whether or not they reproduce. An organism could theoretically arise that computed the value of pi, but if it didn't also reproduce it would go extinct in short order. 'Species' that are better at reproducing will tend to stick around, and ones that are poor at reproducing will die off. v. Mutation If Minevan programs always copied themselves perfectly, there wouldn't be any evolution because nothing would ever change. Since this is an evolution simulator, it's a good thing that mutation can occur. There are three major means by which mutations occur: "Transcription errors", "CPU failures", and sloppy reproduction. Minevian organisms sometimes make mistakes when copying an instruction. At a low rate (e.g. 1/2000 of the time) a random bit willl be flipped in the output, causing the child to differ from the parent, analogous to a "transcription error" in biological DNA. Also, the CPUs fail at a low rate. Sometimes (e.g. 1/10000 of the time) the ADD opcode (which adds DX to CX) might add twice or not at all. Other opcodes have similar failure modes. Finally, Minevian organisms are sloppy reproducers, and often modify their own code or incorporate the code of another organism into their own. For example, most programs blindly assume that a soup allocation will succeed. If it doesn't, the program may overwrite itself or do something else equally embarassing. II. Results The following are some genomes that have appeared in Minev over the course of development. All the following genomes, and several others, are in the "genomes" directory included in this distribution. A. The ancestor organism. Assuming you've read this far, you're interested. So let's see how a Minevian organism works. This creature is a port of one of the first organisms Prof. Ray ever wrote. It wasn't designed to be evolvable - in fact, it was designed to help debug Tierra! ; Genome: gen82.gen ; Size: 82 nop1 ; Template marking start of creature nop1 nop1 nop1 adrb ; Search back for complementary template, result to AX nop0 ; (Complement to start template) nop0 nop0 nop0 zeroc ; Clear CX register lowc ; Flip low bit of CX (sets CX to 1) shl ; Shift CX right (CX = 2) shl ; Shift CX right (CX = 4) pushc ; Push CX onto stack popd ; Pop stack into DX (now DX = CX) pusha ; Push AX (address of "adrb") onto stack popc ; Pop stack into CX (now CX = AX) sub ; Subtract DX from CX, result into CX (CX holds start of org.) pushc ; Push CX onto stack popb ; Pop stack into BX (BX holds start of creature) pushc ; Push CX onto stack popd ; Pop into DX (DX = BX = CX = start of creature) adrf ; Search forward for end of creature nop0 ; Complement to end of creature marker nop1 nop0 nop1 pusha ; Push AX (end of creature) onto stack popc ; Pop into CX (CX = AX = end of creature) inc ; Increment Cx to include dummy statement at end sub ; Subtract DX from CX, CX = size of creature nop1 ; Reproduction loop start template nop0 nop0 nop1 mal ; Allocate daughter cell (address of child to AX) call ; Call copy loop nop1 ; Copy loop start complement nop1 nop0 nop0 div ; Divide, give child CPU jmpb ; Jump to start of reproduction loop nop0 ; Complement of repro. loop start template nop1 nop1 nop0 ifz ; Dummy statement to separate templates (never executed) nop0 ; Copy loop entry template nop0 nop1 nop1 pushc ; Push size onto stack (to be restored later) nop1 ; Copy loop start template nop0 nop0 nop0 dec ; Decrement CX (CX = CX - 1) movii ; Copy soup at BX+CX to soup at AX+CX ifz ; If CX = 0, execute next instruction, else skip it jmpf ; Jump out of copy loop (only if CX = 0) nop0 ; Copy loop exit complement nop0 nop0 nop1 jmp ; Jump to beginning of copy loop nop0 ; Copy loop beginning complement nop1 nop1 nop1 ifz ; Dummy to separate templates (never executed) nop1 ; Copy loop exit template nop1 nop1 nop0 popc ; Restore saved size to CX from stack ret ; Return to reproduction loop nop1 ; End of creature template nop0 nop1 nop0 ifz ; Dummy statement to separate creatures If we analyze this program, we see that it takes 35 instructions to start up (figure out where it begins and ends, and from that determine its size). Then it enters an infinite loop for reproduction, where it allocates space for a child, calls the "copy subroutine", divides, and then repeats. This takes 8 instructions to execute. The copy procedure first saves the size of the organism to the stack (5 instructions), then proceeds to copy itself into the space for the child (8 instructions per instruction copied). (Note that it copies itself from bottom to top.) When it finishes, it exits the loop, restores the size, and returns back to the reproduction loop (6 instructions). So, it takes 35+8+5+(8*81)+6 (702) instructions to copy itself the first time, and then 8+(8*81)+6 (662) instructions thereafter. This is not a particularly efficient algorithm, and you can probably already see ways to improve it. (Before you read on, I suggest trying to come up with smaller, faster programs.) B. Parasites The first new kind of organism that arises is almost always a parasite. Since an organism can read and execute another cell's code, programs arise that execute part of the code of another nearby organism. This means that the parasite doesn't have to include that code in its own genome, and thus is smaller than a program that includes all of its own code. Thus, the parasite can copy itself faster and has a competitive advantage. It is at a disadvantage in that if there isn't a host within search range, it will fail to reproduce at all. You tend to see 'predator/prey' cycles where parasites grow in number at the expense of the hosts, until there are so few hosts that the parasites die off, and then the hosts rebound, and then the *parasites* rebound, etc. Here's a parasite of size 53. Although it was not intended, only a couple mutations are needed to change a host to a parasite (the mutations are marked with '*'s). ; Genome: gen53.gen ; Size: 53 nop1 ; Template marking start of creature nop1 nop1 nop1 adrb ; Search back for complementary template, result to AX nop0 ; (Complement to start template) nop0 nop0 nop0 zeroc ; Clear CX register lowc ; Flip low bit of CX (sets CX to 1) shl ; Shift CX right (CX = 2) shl ; Shift CX right (CX = 4) pushc ; Push CX onto stack popd ; Pop stack into DX (now DX = CX) pusha ; Push AX (address of "adrb") onto stack popc ; Pop stack into CX (now CX = AX) sub ; Subtract DX from CX, result into CX (CX holds start of org.) pushc ; Push CX onto stack popb ; Pop stack into BX (BX holds start of creature) pushc ; Push CX onto stack popd ; Pop into DX (DX = BX = CX = start of creature) adrf ; Fails (no template) *adrf ; MUTATION: now searches for end of creature nop1 ; Truncated template nop0 nop1 pusha ; Push AX (end of creature) onto stack popc ; Pop into CX (CX = AX = end of creature) inc ; Increment Cx to include dummy statement at end sub ; Subtract DX from CX, CX = size of creature nop1 ; Reproduction loop start template nop0 nop0 nop1 mal ; Allocate daughter cell (address of child to AX) call ; Call copy loop nop1 ; Copy loop start complement nop1 nop0 nop0 div ; Divide, give child CPU jmpb ; Jump to start of reproduction loop nop0 ; Complement of repro. loop start template nop1 nop1 nop0 ifz ; Dummy statement to separate templates (never executed) *nop0 ; MUTATION: not used nop0 ; < New end of creature marker nop1 ; < nop0 ; < nop1 ; Never used If you follow the algorithm through, you see that it takes 465 instructions+1 exception penalty (typically 10 instructions) to copy the first time, and 430 thereafter. This is a big improvement, and parasites are quite sucessful reproducers. C. Optimization i. Template reduction Programs very quickly hit on various forms of optimization that improve their ability to reproduce. One obvious optimization is to reduce the size of the templates where possible. Not only does this mean less no-ops to execute, but less no-ops to *copy*, which is even more important, as we saw in the ancestor above. Here's an example from an organism of size 45. Note how small the templates are: ; Genome: 45-492-1.gen ; Size: 45 call ; Pushes current address onto stack; see note below popc ; Load CX, BX, and DX with address of creature start pushc popb pushc popd adrf ; Find end of creature nop1 nop0 nop1 pusha ; Copy end to CX popc sub ; Get size of creature nop1 ; Reproduction loop template nop0 nop0 mal ; Allocate space for child call ; Call copy subroutine nop1 nop0 div ; Make daughter jmpb ; Jump to top of copy loop nop0 ; Note double use of template - used as both nop1 ; reproduction loop complement and nop1 ; copy loop entry pushc ; Save size onto stack nop1 ; Junk nop1 ; Copy loop template nop0 nop0 dec ; Keep track of instructions copied movii ; Copy an instruction ifz ; Check for end of copy loop jmpf ; Exit copy loop nop1 ; Copy loop exit complement jmp ; Jump to top of copy loop (not done yet) nop0 ; Copy loop complement nop1 nop1 popc ; Restore saved size ret ; Return to repro loop nop1 ; Template marking end of creature nop0 nop1 nop0 This one takes 240 instructions to copy itself, and is therefore 2.76 times as efficient as the 82-line ancestor. Note that the copy loop, the most expensive part of the creature, has been improved to only execute 5 instructions per copy, instead of 8 in the ancestor. One feature of this organism requires special explanation. In Minev, the "call" opcode has two distinct phases. First, it pushes the current IP onto the stack (which happens to be the address of the "call" opcode) and then it jumps to the complement of the template that follows it. Even if one phase fails the other may succeed. That is, even if the stack is full and the IP cannot be pushed onto the stack, the jump may succeed; or the IP may be pushed, but the jump may fail because there isn't a template. Many organisms hit upon this as a wonderful way to quickly find their own beginning. They execute a 'call' at the start of the program and then pop the address into an appropriate register, thus finding their start in two steps. (Well, there is a CPU execption raised, so they do suffer a penalty, but usually the benefits outweigh the drawbacks.) I didn't plan this when I first implemented the 'call' opcode, but the organisms 'discovered' it nonetheless. Cute, eh? :-> If you really don't like it, go ahead and change how 'call' works. You've got the source, right? ii. Template reuse ("cooperation" or "sociality") Another optimization has to do with the way Minev allocates memory. Usually, when a cell allocates space for a child, it requests the slot immediately after the parent. Over time, you tend to get 'clots' of organisms of the same type all crammed together. In this case, some organisms use the end template of the previous cell to mark their own beginning. They 'assume' that the organism immediately prior to themselves is the same 'species' as they are. This allows them to be smaller than they would otherwise be, but if their "assumption" is invalid, lord knows what might happpen... Here's an example, from an organism of size 66: ; Genome: 66-22-55.gen ; Size: 66 adrb ; Find start of creature (see the end of this one) nop1 ; Matches end of previous organism pusha ; Copy address into CX popc nop1 ; Junk pushc ; Copy CX to BX and DX popb pushc popd adrf ; Find end fo creature nop0 nop1 nop0 nop1 pusha ; Put end of cell in DX popc sub ; Find size of organism nop1 ; Repro loop template nop0 nop0 nop1 mal ; Allocate space for child mal ; Fails call ; Call copy loop nop0 nop0 div ; Give child its own CPU jmpb ; Jump to top of repro loop nop0 ; Repro loop complement nop1 ; Note: call to copy loop jumps to here... nop1 nop0 ifz ; Junk (not used) nop0 nop0 nop1 nop1 pushc ; Save size onto stack nop1 ; Copy loop template nop0 nop0 nop0 dec ; Copy instructions movii ifz ; Check for copy done jmpf ; Exit copy loop nop0 nop0 nop0 nop1 jmp ; Keep copying nop0 ; Copy loop complement nop1 nop1 nop1 ifz nop1 ; Copy loop exit template nop1 nop1 nop0 popc ; Retore saved size ret ; Return to repro loop nop1 ; End of creature nop0 nop1 nop0 ; Following creature finds this no-op... iii. Loop unrolling One of the cutest of the optimizations observed makes use of the idea of "unrolling the loop". (Many modern compilers, e.g. gcc, implement this idea when optimizing code). In any loop, you have the part of the code that does useful work, and the 'overhead' of keeping track of the loop count and jumping to the top of the loop or exiting. You can sometimes make the loop run faster if you duplicate the useful part multiple times. For example, dig this: ; Genome: 54-61-2.gen ; Size: 54 call ; Get address of start of creature popc ; Put start into CX nop1 ; Junk pushc ; Copy start into BX and DX popb pushc popd adrf ; Find end of creature nop0 ; End-of-creature complement nop1 nop0 popc ; Junk (stack empty at this point) pusha ; Copy end address into CX popc sub ; Get size of creature nop1 ; Repro loop template nop0 nop0 nop1 mal ; Allocate child call ; Call copy loop nop1 ; Copy loop complement nop0 nop0 div ; Make daughter cell jmpb ; Jump to top of repro loop nop0 ; Repro loop complement nop1 ; (Also copy loop complement nop1 nop0 pushc ; Save size of creature on stack nop1 ; Copy loop top nop0 nop0 mal ; Junk (memory already allocated here) dec ; Multiple copies of the 'useful' copy code movii dec movii dec movii ifz ; Check for copy done jmpf ; If done, exit loop nop0 ; Exit complement jmpb ; Jump to top of copy loop nop0 ; Copy loop complement nop1 nop1 ifz ; Junk (CX always zero here) popc ; Restore saved size ret ; Return to repro loop nop1 ; Template marking end of genome nop0 nop1 Note how the 'dec movii' pair is duplicated three times. Note also that the size of the creature, 54, is an exact multiple of three. Look at the copy loop for a moment. It executes 10 instructions to copy three opcodes. So, it averages 3 1/3 instructions per copy - much better than the eight its ancestor took. This little beauty takes 195 instructions to copy itself, making it 3.39 times more efficient than the 82-line ancestor above. This took a run of almost three billion instructions to evolve. iv. Using the 'ifl' instruction In Minev 1.0, the 'ifl' instruction was hopelessly broken and could never do anything useful. After I fixed it, in the very first test run the organisms figured out a way to use it. For example, take a look at this copy loop: nop1 ; Copy loop template nop0 nop0 nop0 dec ; Copy instructions movii ifl ; Check to see if the 'movii' failed jmpf ; Exit copy loop nop0 nop0 nop0 nop1 jmp ; Keep copying nop0 ; Copy loop complement nop1 nop1 nop1 Now, to exit the loop, it doesn't check to see if CX is zero. Instead it checks to see if the 'movii' failed. This still works, usually. As soon as it tries to write to memory it doesn't own (before the child, almost always) the 'movii' will fail, and the program exits the copy loop. It has gone through the loop one extra time, and paid an exception penalty, but otherwise it works fine. Why would a program bother with this? Why is this an advantage? It seems like this would be a *dis*advantage, and it would be, in a *perfect* Minevian world. But Minev has limits, and things go wrong. Consider what happens when a program tries to allocate a child and fails. The old, 'ifz' organism will blindly try to copy itself anyway, wasting hundreds of instructions and incurring an exception penalty on every 'movii'. The new, 'ifl' organism will notice the problem on the first 'movii' and break out of the copy loop far earlier. If you enable the statistic recording option, you will see that in a mature soup, less than 10% of the allocations succeed. Organisms tend to 'clot' near each other, and fragmentation of the soup becomes noticeable before too long as well. In this sort of environment, having the 'ifl' mutation is a powerful advantage. However, it doesn't show up in every run, useful as it is. It only shows up when the 'Hamming distance' between 'ifz' and 'ifl' is 1 - that is, when the opcode numbers of the two instructions differ by only one bit. Otherwise, it's too 'hard' for an organism to get there. It would have to mutate that 'ifz' instruction into something else first, and that's immediately fatal. (In the default opcode ordering, BTW, the Hamming distance is 4 - as far as it can be. I'd suggest using the 'd' command or changing the default order in 'minev.c' if you want to see this in action.) v. The current record Here are the smallest organisms that have yet evolved in Minev. Note that they use many of the above optimizations, including template reuse, template reduction, and a special jump method: ; Genome: 32-188-1.gen ; Size: 32 call ; Get address of start popc ; Copy start into CX, BX, and DX pushc popb pushc popd call ; Save addresses to return to call adrf ; Find end of creature nop1 ; End of creature complement nop0 call ; Save address to return to pusha ; Move address of creature end to CX popc div ; Fails ('junk code') sub ; Get size of creature mal ; Allocate space for child nop0 ; Copy loop template nop0 dec ; Decrement CX, copy inst movii ifz ; Check if loop exit jmpf nop0 ; loop exit complement jmp ; jump to top of loop nop1 ; Used twice, as both complement to copy loop and as exit nop1 div ; Give child its own CPU popc ; clear nearest address ret ; jump to 2nd call (find end, recompute size) nop0 ; Template marking end of creature nop1 Here's a variation from a few billion instructions later: ;Genome: 32-552-1.gen ; Size: 32 call ; Push address of top of creature onto stack popc ; Load CX, BX, and DX with start of creature pushc popb pushc popd call ; Load address to return to adrf ; Find end of creature nop1 nop0 dup ; Push address onto stack again pusha ; Load CX with end of creature popc swap ; Flip top two entries on stack (equal, so no effect) div ; Fails dup ; Push address for the third time sub ; Get size of creature mal ; Allocate child nop0 ; Top of copy loop template nop0 dec ; Keep track of instructions copied movii ; Copy an instruction ifz ; Check for loop exit jmpf ; Jump out of copy loop nop0 ; Copy loop exit complement call ; Jump to top of copy loop nop1 ; Double use - exit of copy loop, complement of top nop1 div ; Make kid ret ; Jump to first 'call' nop0 ; End of creature template nop1 Even though these organisms include many 'junk' opcodes and cause multiple CPU exceptions over the course of their execution, they are actually up to 3.8 times as efficient as the original ancestor! (The first takes as few as 174 instructions to copy itself, plus 2 exception penalties.) The second genome has a truly bizarre algorithm that's not as efficient as the first. But it works. It's worth following through it yourself to see how it manages to copy itself. It's *strange*. But it works, and that's all evolution cares about. vi. The smallest possible. So far as I can tell, you can't have a self-contained (i.e. not a parasite and not 'social') Minevian organism that's smaller than 22 instructions, and that includes taking advantage of some funky features of the Minevian opcode set. There are many possible (millions, actually) organisms of size 22, but here's an example: ; Genome: gen22.gen ; Size: 22 call ; Use trick to read start address div ; Fails first time, works subseqently popb ; Get start address pushb ; Copy address to DX popd adrf ; Get address of the end of the creature nop0 pusha ; Move address to CX popc *inc ; Increment CX (include dummy statement at end of program) sub ; Get size of creature (CX = CX - DX) pushb ; Save address of start of creature - once to return pushb ; there, once to simulate "call" mal ; Allocate space for child nop0 ; Copy loop start template dec ; Keep track of instructions copied movii ; Copy from parent to child ifz ; If we're done... ret ; ...jump to the 'div' statement above jmpb ; ...else jump to top of copy loop nop1 ; Copy loop start complement (also marks creature's end) *ifz ; Dummy statement to separate creatures This sucker takes as little as 102 instructions to copy itself! The first time through, it takes 103 instructions and causes two CPU exceptions, but it's still pretty good. Note that you could make it even more efficient by using the 'loop unrolling' trick described above, but then it would be 24 instructions. If you can come up with a smaller (self-contained) organism, let me know, I'd be interested. (Note: If you remove the instructions marked with a '*' above, you get a 20-line creature that takes only 93 instructions to copy itself, but if the statement after the last 'nop1' is a no-op, the creature will fail. That's just about the limit in terms of size, though.) REFERENCES Ray, T.S. 1991. An Approach to the Synthesis of Life. _Artificial_Life_II_, 371-408. C. Langton, C. Taylor, J. D. Farmer, & S. Rasmussen (eds.), Addison-Wesley: Redwood City, CA. Ray, T.S. 1994. An Evolutionary Approach to Synthetic Biology: Zen and the Art of Creating Life. _Artificial_Life_1(1/2)_: 195-226 Ray, T.S. Thearling, K., Evolving Multi-cellular Artificial Life, _Proceedings_of_Artificial_Life_IV_, R. Brooks and P. Maes (eds.), MIT Press: Cambridge, July 1994. For more information about artificial life in general and Tierra in particular, try http://alife.santafe.edu/, the Santa Fe Institute.