Overview of the Prion Library

A few points:

  • If you are new to the field of GP, please utilize the List of GP Resources linked on the Prion main page before delving into the Prion library. I am assuming that you know the basics already.
  • I will flag some headings as "Class". This is to indicate that these objects have a literal class coded for them in the library.
  • For other headings, I will utilize the term "Concept", these headings address underlying assumptions or implied constructs that are made by the library, but are not actual classes in the code (i.e. you cannot instantiate a sequence, it is a concept, but you can instantiate a sequencer, which is a class)
  • I will present a few classses that are included as "Helpers". These classes are intended to make working with the library more intuitive, but you are not obligated to utilize them if they are not needed in your implementation
  • I have included a diagram of what a typical flow would look like for a full implementation at the bottom of this page
  • I have a link for the full code for the examples shown at the bottom of this page
  • For advanced concepts and implementations, you can search the Research articles with the tag "PRION"
  • All of the code presented below can be copied directly into a python file and will run as advertized as long as the file prion.py is in the same directory

The Processor (Class):

The core of the "machine", It is a simulated stack processor. This design choice was made for the following reasons:
  • In certain areas, stack-based machines demonstrate superior performance (in most of the remainder they seem to function no better or worse)
  • Stacks are an intuitive data structure even to a novice
  • Code to manipulate stacks is easy to generate in multiple languages
As the registers of this virtual processor are stacks, we need to be careful about implementation decisions. I do want provide maximum capabilities and flexibility from a library point of view, but we need to think very carefully about the constraints that we place on or processor (and also assume that these constraints are not so rigid that they cannot be easily overridden by the implementer if needed). It may be a tad pedantic at this point but there are some basic and very important points that require some dwelling upon upfront:
  • We can assume that we will get an array of objects referred to as inputs
  • Inputs can be indexed by integer
  • When inputs are addressed by index, they yield a single, valid value
  • We will have a set of at least one (sometimes more) stack objects which will serve as registers for the processor
  • The element left on the top of each register after sequence evaluation (definition of sequence to be made shortly) will be considered the "output" of that sequence
  • Registers respond as expected to "push" and "pop" operations
  • Registers will always contain a relative zero (even if popped empty)

The Codon (Concept):

The codon is not a class in the Prion framework, but it is the building block on which the sequence (a program generated for the virtual stack processor) is based. While we want to keep our constraints minimal from the perspective of the framework, we do have to make some assumptions about the nature of a codon in order to build a consistent framework. At a bare minimum we will assume the following about codons:
  • They are stored as a list (implying the sequence will be a list of lists
  • A codon must contain at least one element to be considered valid (an opcode)
  • If a codon contains two elements, the first will be an opcode and the second must be a valid stack index
  • Greater than two elements are also possible but if the opcode cannot utilize them it will disregard them

The Genome (Class):

The Genome class is going to be what you will ultimately be using to initialize almost every other object in the system. I am keeping it inherently simple. I am only making the following assumptions to build the base class:
  • It has a type of ERC (Ephemoral Random Constant)
  • It has an intrinsic idea of bounds for whatever the type of ERC is requested
  • These bounds can sensibly be arranged so that xmin <= xmax (in the default case [0. 1]
  • It is capable of generating a single line of LGP code for our stack processor (a codon) and you have at least three available types [INPUT, ERC, FUNCTION]
  • You can ask for a specific class of codon, or simply call the basic codon() function to get one of them randomly (with equal weight)
The default genome contains the following functional primatives:
  • INP: Push an Input value on a designated stack
  • PUSH: Push an ERC value on a designated stack
  • POP: Remove a value from the TOP of a designated stack register
  • ADD: Add the top 2 values on a designated stack and push the result back onto that stack
  • SUB: Subtract the top - 1 value on a designated stack from the top and push the result back onto that stack
  • MULT: Multiply the top 2 values on a designated stack and push the result back onto that stack
  • DIV: Divide the top value by the top - 1 value on a designated stack and push the result back onto that stack (if top - 1 is zero, rotate top and top - 1, but do not divide)
  • IF: Store top three elements, if top is greater than top - 1, push top. Otherwise, push top - 2
A default genome can be created within the library fairly easily. It will assume that you want to utilize the default function set above unless you specify a different set. Also, it will assume that the ERC type is an integer that should fall between [0, 1]. This assumption is made so that even without specifying an ERC type and range, the default genome can still inherently solve logic problems. Furthermore, it makes the system more fault tolerant as the e_codon() function should return a valid value in all cases (even if it is never called/used). The only constraint on the initialization of the default genome is that you must specify the input and output width. So for a problem which takes 3 inputs (an x, y and z for instance) and returns a single output, you would create it as as such:
#!/usr/local/bin/python

execfile("prion.py")
Gen = Genome(3, 1)

The Sequence (Concept):

A valid codon follows the three template types. All codons are lists of one or more elements as demonstrated above. A list of valid codons of one or greater length for the specific genome is referred to as a sequence. It follows simple rules:
  • Any sub-sequence of a valid sequence is also a valid sequence.
  • Two valid sequences (of the same genome) can be cut at any point in their codon lists, joined together and produce a valid sequence
  • A Null list in *not* a valid sequence (if a Null sequence is required for whatever reason, simply create a sequence with 1 NOP instruction)
  • Once a sequence has been traversed, the program is over
  • A codon must contain at least two elements [Opcode, StackTarget], the only exceptions are NOP and INT (which do not affect the stack registers)
  • Input and ERC codons must contain at least three elements [Opcode, StackTarget, Value]
Within the confines of the Prion library, the sequence is coded as a list of lists [[]]. Here is an example of Prion machine code to calculate: x^2 + xy + y^2
It has been run through a decoder for ease of reading. In this case we will assume that we are running a single stack register in out virtual processor and that the values for x and y are stored at input positions 0 and 1 respectively and our inputs in this particular case are [2, 5]. I have added comments inline to reflect what the stack is looking at internally.
INP  [0, 0] # Push x on stack | [0]              -> [2, 0]
INP  [0, 0] # Push x on stack | [2, 0]           -> [2, 2, 0]
MULT [0]    # Multiply stack  | [2, 2, 0]        -> [4, 0]
INP  [0, 0] # Push x on stack | [4, 0]           -> [2, 4, 0]
INP  [0, 1] # Push y on stack | [2, 4, 0]        -> [5, 2, 4, 0]
MULT [0]    # Multiply stack  | [5, 2, 4, 0]     -> [10, 4, 0]
INP  [0, 1] # Push y on stack | [10, 4, 0]       -> [5, 10, 4, 0]
INP  [0, 1] # Push y on stack | [5, 10, 4, 0]    -> [5, 5, 10, 4, 0]
MULT [0]    # Multiply stack  | [5, 5, 10, 4, 0] -> [25, 10, 4, 0]
ADD  [0]    # Add stack       | [25, 10, 4, 0]   -> [35, 4, 0]
ADD  [0]    # Add stack       | [35, 4, 0]       -> [39, 0] = 39

Here is the code to evaluate the sequence that corresponds to the decoded program above
#!/usr/local/bin/python

execfile("./prion.py")

# for x^2 + xy + y^2
# Using PRION_OPCODE* variables for convenience
S1 = [[PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_ADD, 0],
      [PRION_OPCODE_ADD, 0]]

Inp = [2, 5]
Gen = Genome(2, 1)
Prc = Processor(Gen)

print(str(S1))
print("Result: " + str(Prc.evaluate(S1, Inp)[0]))
The output of the program above is what you would expect, but I will include it here for reference:
$ ./test.py
[[1, 0, 0], [1, 0, 0], [6, 0], [1, 0, 0], [1, 0, 1], [6, 0], [1, 0, 1], [1, 0, 1], [6, 0], [4, 0], [4, 0]]
Result: 39
$
I will refer to this sequence as S1 (for sequence 1) for the remainder of the examples below. Since some of our examples will require multiple sequences, let;s define another simple equation of two variable (meaning two inputs: [x, y]) and one output. Since this equation has effectively the same genome as S1 it can be evaluated on the same processor. For S2, we will simulate the following equation: x^3 + x^2 + (x * y) and for our concrete example we will utilize the same inputs [2. 5] (assume x is still index 0 and y is still index 1).
INP  [0, 0] # Push x on stack | [0]             -> [2, 0]
INP  [0, 0] # Push x on stack | [2, 0]          -> [2, 2, 0]
INP  [0, 0] # Push x on stack | [2, 2, 0]       -> [2, 2, 2, 0]
MULT [0]    # Multiply stack  | [2, 2, 2, 0]    -> [4, 2, 0]
MULT [0]    # Multiply stack  | [4, 2, 0]       -> [8, 0]
INP  [0, 0] # Push x on stack | [8, 0]          -> [2, 8, 0]
INP  [0, 0] # Push x on stack | [2, 8, 0]       -> [2, 2, 8, 0]
MULT [0]    # Multiply stack  | [2, 2, 8, 0]    -> [4, 8, 0]
INP  [0, 0] # Push x on stack | [4, 8, 0]       -> [2, 4, 8, 0]
INP  [0, 1] # Push y on stack | [2, 4, 8, 0]    -> [5, 2, 4, 8, 0]
MULT [0]    # Multiply stack  | [5, 2, 4, 8, 0] -> [10, 4, 8, 0]
ADD  [0]    # Add stack       | [10, 4, 8, 0]   -> [14, 8, 0]
ADD  [0]    # Add stack       | [14, 8, 0]      -> [22, 0] = 22
Here is the code to evaluate the sequence that corresponds to the decode program above
#!/usr/local/bin/python

execfile("./prion.py")

# for x^3 + x^2 + xy
# Using PRION_OPCODE* variables for convenience
S2 = [[PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_ADD, 0],
      [PRION_OPCODE_ADD, 0]]

Inp = [2, 5]
Gen = Genome(2, 1)
Prc = Processor(Gen)

print(str(S2))
print("Result: " + str(Prc.evaluate(S2, Inp)[0]))

The result is, as expected: 22

Now given the nature of what we have defined to be a sequence in Prion, it stands to reason that any sub-sequence of codons is a sequence we can assume that these programs could be sub-sequences of larger programs. And since it does not mater to the next codon what a previous codon has done, we could just as easily concatenate these sequences and it would still create a valid Prion program. When we concatenate them (I'll stop torturing you with code and keep it in the file) and run against the same inputs we receive the value of 22 (The value of S2 which is now at the "tail" end of the sequence. If you flip them around you should get the answer to S1. This is because these programs are intelligently designed, they have no introns (this will not be the case with randomly generated sequences). The processor stops evaluating when the sequence ends. Since these sequence have been designed, they will finish up with a valid answer every time (they know what they are solving after all), however the underlying initial sequence solution will be on the next layer below on the register stack(s) in the processor.

If we take a single-point crossover (let's say at cut-point 6) and took the first part of S1 and the second part of S2 would could theoretically generate a program that "solves" a completely different problem, This is a simple single-point crossover operation. The result of this crossover yields a sequence which for the same inputs evaluates to a final answer of 34. Definitely something has changed. The primary responsibility for that genetic operation will fall on the Splicer class, but before we go there, let's talk about the object that will be generating our sequences in practice, the Sequencer.

Regarding the basic functionality of the sequencer is fairly straightforward. It is mostly a convenience class you can very easily create your own sequence simply by calling the codon() function contained with the Genome that you are currently utilizing. Here is a random sequence of 64 codons generated with the Genome and a simple loop (I use the same genome as above, which by default when asked for an ERC it will give 1 or 0).

Using the same genome and (x, y), I received a result of 1 for the following randomly generated sequence of 64 codons below:
[[2, 0, 1], [3, 0], [2, 0, 1], [2, 0, 1], [2, 0, 0], [3, 0], [6, 0], 
[2, 0, 1], [3, 0], [2, 0, 0], [8, 0], [2, 0, 0], [5, 0], [2, 0, 1], 
[2, 0, 0], [7, 0], [2, 0, 0], [7, 0], [2, 0, 0], [8, 0], [4, 0], [3, 0], 
[5, 0], [2, 0, 1], [2, 0, 1], [6, 0], [8, 0], [2, 0, 1], [8, 0], [6, 0], 
[2, 0, 0], [2, 0, 1], [4, 0], [2, 0, 1], [2, 0, 1], [2, 0, 1], [5, 0], 
[2, 0, 0], [2, 0, 0], [8, 0], [2, 0, 1], [2, 0, 0], [8, 0], [2, 0, 1], 
[2, 0, 0], [5, 0], [2, 0, 1], [2, 0, 1], [4, 0], [2, 0, 0], [2, 0, 0], 
[3, 0], [2, 0, 0], [2, 0, 0], [5, 0], [7, 0], [2, 0, 0], [3, 0], 
[2, 0, 0], [2, 0, 1], [2, 0, 1], [5, 0], [2, 0, 0], [2, 0, 1]]

Now an astute reader might notice why this is fairly quickly, Since we are not setting ERC bounds, the default genome will return either 1 or zero when an ERC is requested. Also we will notice that the final instruction is a PUSH of an ERC value (1 was chosen randomly in this case) unto the solution stack.
It should be apparent that any sequence with one output can end with the codon [2, 0, 1] and the overall return will always be a 1. This isn't necessarily a design flaw of the system, just a consequence of this particular random assortment of codons generated by equal chance in the genome, Optimizing generation of programs can be troublesome (better off evolving new ones than blindly trying to pre-optimize random ones), but for the sake of the sequencer overview we will attempt to change the default behaviour a bit.

The Sequencer (Class):

The Sequencer is the class that is tasked with creating valid linear sequences for processing by the the selected Processor class in the Prion library. By default, its operation is no more complicated than what is contained in the copde sample file associted with this page:
#!/usr/local/bin/python

execfile("prion.py")

L   = 64
Gen = Genome(2, 1)
S4  = [Gen.codon() for x in range(L)]

The real difference between this naive approach and utilizing the Sequencer class is that the Sequencer class is somewhat tunable with respect to the weights of the types of codons it will request from the genome. Given that there are three basic types (recall Input, ERC and Function), you can pass the Sequencer class a vector of 3 integers to help it make choices that may be better weighted to your particular problem class. Let's say as a for instance you have good reason to believe that your answer can be found purely through the manipulation of inputs. It would be fairly trivial to instruct the base Sequencer class to favor inputs and functions while ignoring ERCs altogether. This is accomplished by explicitly setting the "weights" parameter in the __init__ function of the Sequencer (by default set to [1, 1, 1] to give an equal chance to any of the three).
To set your own weights, you can pass your own vector of values. These values will be used as integers when fill the "wheel" of the Sequencer. This wheel is spun every time a codon request needs to be made. The number passed in the corresponding index of the vector indicates how many entries you wish to put on the wheel for each type [ERC, INP, FUNC] in that order. So we would expect that code such as the following would generate, on average, no ERCs and roughly 4 times as many function codons as input codons:
#!/usr/local/bin/python

execfile("prion.py")


Gen = Genome(2, 1)
W   = [0, 1, 4]
Seq = Sequencer(Gen, weights=W)

S5 = Seq.sequence(100)

# How many ERC codons (should be none)
erc_count = 0
for codon in S5:
        if codon[0] == PRION_OPCODE_PUSH:
                erc_count += 1
# How many INP codons
inp_count = 0
for codon in S5:
        if codon[0] == PRION_OPCODE_INP:
                inp_count += 1
print(str(S5))
print("weights   = " + str(W))
print("erc_count = " + str(erc_count))
print("inp_count = " + str(inp_count))
print("s_length  = " + str(len(S5)))

It will be roughly as expected when the program is run. Although you will not be guaranteed exactly 4 times the functions (the selection is ultimate1y random after all). You should find that at the very least there will be no ERC operations, and more often than not, there will be a significantly higher number of function codons than inputs
$ ./test.py
[[4, 0], [1, 0, 0], [1, 0, 0], [5, 0], [1, 0, 0], [5, 0], [5, 0], [1, 0, 0], [1, 0, 0], [7, 0], [4, 0], [3, 0], 
[7, 0], [6, 0], [1, 0, 0], [4, 0], [1, 0, 0], [3, 0], [1, 0, 0], [8, 0], [5, 0], [1, 0, 0], [4, 0], [5, 0], 
[1, 0, 0], [8, 0], [1, 0, 1], [5, 0], [1, 0, 1], [5, 0], [8, 0], [6, 0], [1, 0, 0], [4, 0], [3, 0], 
[1, 0, 1], [6, 0], [6, 0], [5, 0], [1, 0, 0], [3, 0], [6, 0], [7, 0], [3, 0], [4, 0], [8, 0], [7, 0], [7, 0], 
[6, 0], [8, 0], [4, 0], [4, 0], [5, 0], [4, 0], [6, 0], [5, 0], [7, 0], [1, 0, 0], [4, 0], [7, 0], [8, 0], 
[7, 0], [7, 0], [8, 0], [1, 0, 1], [1, 0, 0], [1, 0, 1], [4, 0], [1, 0, 1], [5, 0], [4, 0], [6, 0], [3, 0], [4, 0], 
[5, 0], [8, 0], [8, 0], [4, 0], [1, 0, 1], [1, 0, 1], [8, 0], [3, 0], [1, 0, 0], [8, 0], [5, 0], [4, 0], [4, 0], 
[6, 0], [6, 0], [1, 0, 1], [6, 0], [3, 0], [3, 0], [1, 0, 0], [3, 0], [4, 0], [1, 0, 1], [4, 0], [4, 0], [1, 0, 1]]
weights   = [0, 1, 4]
erc_count = 0
inp_count = 27
s_length  = 100
$


It is here that I must offer a healthly word of caution. Do not assume that you know in advance what the sequences should or should not contain. In most cases, all of the initially generated sequences will be incredibly awful at solving the problems they are given. Don't rely on the sequencer to generate even remotely useful programs. It is simply a point at which we can begin evolution. You cannot generally know in advance what types of codons will be required in your ultimate solution (except for the most trivial problems, which you would not expend the power required to utilize GP techniques to solve in the first place).

The Splicer (Class) and the Mutator (Class):

Note: As these are both considered types of reproduction operations (From a biological perspective, crossover is sexual and mutation in asexual), I'm covering them both in the same section
Although our initial list of genome sourced sequences will likely be inadequate to solve the underlying problem presented by our data, they will not all be equally terrible. Some will be fitter than others. The basic approach we will use to get better sequences will involve analogs of biological processes. Since all sequences from compatible genomes (they are the same "species") can be cut and recombined to form other valid sequences, we could take a cross section of "not so terrible" sequences and either recombine of mutate them into sequences that may more closely approximate a true solution.

One of the operations that can be used for this is called crossover. This is represented by the Splicer class in the Prion framework. At is core, the base Splicer will take two sequences (preferably of the length, but it can accommodate sequences of different length if needed), pick a "cut-point" within the bounds of both, and generate a new sequence which is a hybrid of the original two that were passed to it. It practice the code will look like this:
#!/usr/local/bin/python

execfile("./prion.py")

Gen = Genome(2, 1)
Seq = Sequencer(Gen)
Spl = Splicer(Gen)

S1  = Seq.sequence(50)
S2  = Seq.sequence(50)

S3  = Spl.splice(S1, S2)
After this executes, two random 50 codon sequences will be generated (S1 and S2), and a third (S3) which should begin with the codons of S1 and end with the codons of S2. The cut-point is chosen at random. This is a basic single-point crossover operation.

The Mutator class is different. The Mutator will take a single sequence and based on it's internal mutation rate (p), it will iterate through the sequence and replace codons if it gets a return of less than p on a random roll or 100. In most models, you will want the mutation rate to be very low (the default Mutator will mutate only 5% of the codons it encounters on average). However, for some models (i.e. lambda-based), you may wish to set it higher. This can be done by setting p to a different value [1, 100] when the mutate function is called as such:
#!/usr/local/bin/python

execfile("./prion.py")

Gen = Genome(2, 1)
Seq = Sequencer(Gen)
Mut = Mutator(Gen)

S1  = Seq.sequence(500)
S2  = Mut.mutate(S1, p=50) # will mutate 50% on average 

The Decoder (Helper Class):

The Decoder will take a valid Prion sequence and render it in a slightly more human-readable format. The output of the decoder class is similar to the decoded examples listed above, albeit without the comments and stack statuses. So using our x^2 + xy + y^2 example above:
  
#!/usr/local/bin/python

execfile("./prion.py")

# for x^2 + xy + y^2
# Using PRION_OPCODE* variables for convenience
S1 = [[PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 0],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_INP, 0, 1],
      [PRION_OPCODE_MULT, 0],
      [PRION_OPCODE_ADD, 0],
      [PRION_OPCODE_ADD, 0]]

Gen = Genome(2, 1)
Dec = Decoder(Gen)
print(Dec.decode(S1))
Our decoder would yield the following
$ ./test.py
INP     [0, 0]
INP     [0, 0]
MULT    [0]
INP     [0, 0]
INP     [0, 1]
MULT    [0]
INP     [0, 1]
INP     [0, 1]
MULT    [0]
ADD     [0]
ADD     [0]

$

The Fitness (Helper Class):

the Fitness clas is listed as a helper in this sense because it is impossible to know in advance what fitness actually means to the user of the Prion library. It performs a Euclidian distance calculation between two array arguments of the same size and normalizes the result so that it is a numerical value betwee (0.0, 1.0].
A value of 1.0 indicates a perfect match. Values closer to 1.0 than other values can be considered "fitter" in the broadest sense (although this may not be true in all contexts). Here is an example (please keep in mind that there are many more ways of measuring distance than Euclidian):
#!/usr/local/bin/python

execfile("./prion.py")

Fit = Fitness()

Ideal = [2, 5]
print("Fitness of [0, 0] against Ideal")
print(Fit.fitness([0, 0], Ideal))
print("Fitness of [2, 5] against Ideal")
print(Fit.fitness([2, 5], Ideal))
With the output of:
$ ./test.py
Fitness of [0, 0] against Ideal
0.0333333333333
Fitness of [2, 5] against Ideal
1.0
$