This essay is heavily based on the relevant sections of "An Introduction to Genetic Algorithms" by Melanie Mitchell. It contains a great deal of mathematics; a decent high-school background in statistics, algebra, and logic is advised.
Schemata and Other Terminology

The pioneer of genetic algorithms (GA), John Holland, was the first person to fully describe GAs mathematically. GAs are designed to search for, discover, emphasize and breed good solutions (or "building blocks") to a problem in a highly parallel fashion. Holland invented the idea of schemas (or schemata) to formally conceptualize the notion of "building blocks."
Each schema is a template for a bit string of length l. A schema is composed of ones and zeros (the defined bits) and asterisks ('*') that act as wild cards within the string. Therefore, the schema:

H = 1 * * * 0 *
represents all bit strings of length 6 that begin with '1' and have a '0' at the 5th bit position. Examples of bit strings that satisfy this are 111001, 100001, 111100, 111101, etc. These are called instances of the schema. The H in the equation stands for hyperplane: each schema corresponds to a hyperplane in the l-dimensional space of bit strings. To illustrate this principle, take the schema H = * * 0. Its strings have length 3, so its instances can be pictured as one face of the three-dimensional unit cube - the four corners 000, 010, 100, and 110.
Other traits of a schema are its defined bits and its defining length. The defined bits are the bits that are actually specified (not wild cards) - in the schema above there are 2 defined bits. The defining length is the distance between the two outermost defined bits - in the schema above, the defining length is 4.
Sets and Subsets
Processing

Remembering that any bit string of length l is a member of 2^l different schemata, look at the example Mitchell uses. The length-2 bit string '11' is an instance of **, 1*, *1, and 11. Now, in a population of n strings it is easy to see that you will have anywhere between 2^l and n × 2^l different schemata.
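This enumeration is easy to verify mechanically. Here is a small sketch (the function name is my own) that lists every schema a given bit string is an instance of, by letting each position either keep its bit or become a wild card:

```python
from itertools import product

def schemata(bitstring):
    """Return the set of all schemata the given bit string is an instance
    of: each position either keeps its value or becomes the wild card '*'."""
    options = [(bit, '*') for bit in bitstring]
    return {''.join(choice) for choice in product(*options)}

# Mitchell's example: the length-2 string '11' is an instance of
# 2^2 = 4 schemata.
print(sorted(schemata('11')))  # ['**', '*1', '1*', '11']
```

In general `schemata(s)` returns exactly 2^l sets of templates for a string of length l, matching the count quoted above.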
It can therefore be said that while the GA explicitly evaluates n strings during any given generation, it is implicitly evaluating a much larger number of schemata. Think of half the population as being instances of the schema '1 * * ... *' and the other half as instances of '0 * * ... *'. After a given generation has been run, you will have an estimate of the average fitness of those schemata (it will be quite an erroneous estimate, since the strings actually evaluated are only a tiny portion of each schema's instances). These average-fitness estimates are obviously not stored explicitly, since the schemata themselves are never explicitly evaluated. Where this idea helps is in looking at the increase or decrease of a given schema in a population - the GA behaves as though it were storing these implicitly calculated values.
The Dynamics of a Schema
Now, the expected number of instances of a schema at t+1 is equivalent to the number of offspring its instances produce. Therefore, in order to calculate further we need to specify the selection process. Assume that we use a fitness-proportionate selection method (the method used in the GA example); the expected number of offspring of a string is then equal to the fitness of the string divided by the average fitness of the generation, or:
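In the standard notation (following Mitchell's book), with f(x) the fitness of string x and f̄(t) the average fitness of the population at generation t, this expectation is:

```latex
E(\text{offspring}(x)) = \frac{f(x)}{\bar{f}(t)}
```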
So, letting x ∈ H denote that x is an instance of H, we have the equation:
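Writing m(H,t) for the number of instances of H in the population at generation t, the standard form of this equation (as in Mitchell's presentation) is:

```latex
E(m(H, t+1)) = \sum_{x \in H} \frac{f(x)}{\bar{f}(t)}
```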
Now, using the definition of û(H,t) - the observed average fitness of the instances of H at generation t - we come up with the final equation:
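Since û(H,t) = (Σ_{x∈H} f(x)) / m(H,t), substituting into the previous sum gives the standard result (equation 1.1 in this essay's numbering):

```latex
E(m(H, t+1)) = \frac{\hat{u}(H,t)}{\bar{f}(t)}\, m(H,t) \tag{1.1}
```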
There we have our basic formula for tracking the dynamics of a schema in a genetic algorithm. Genetic algorithms aren't this simple, though, since they also simulate the effects of mutation and crossover. These two operators have both constructive and destructive effects - but for the moment, concentrate only on the destructive ones.
Compensating for Destructive Effects
Given Sc(H) as the probability of schema H surviving a single-point crossover, d(H) as its defining length, and l as the length of the bit string, you can create the equation:
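With pc the probability that crossover is applied, the standard lower bound (as given by Holland and Mitchell) is:

```latex
S_c(H) \geq 1 - p_c \left( \frac{d(H)}{l-1} \right)
```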
A breakdown of this equation is necessary. The term d(H)/(l-1) is the fraction of the possible crossover points that fall within the schema's defining length. For example, if we have a defining length of 3 in a string of length 5, then 75% of the crossover points can disrupt schema H. We multiply that by the probability that crossover occurs at all, pc. This gives us an upper bound on the probability that the schema will be destroyed. We want a lower bound on survival, though, so the upper bound is subtracted from 1 (remember that complementary probabilities sum to 1).
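As a quick numeric check of this breakdown, here is a small sketch (the function name is my own) that computes the lower bound on crossover survival:

```python
def crossover_survival_lower_bound(d, l, pc):
    """Lower bound on the probability that a schema with defining length d,
    in a string of length l, survives single-point crossover applied with
    probability pc: 1 - pc * d/(l-1)."""
    return 1 - pc * d / (l - 1)

# The example from the text: d(H) = 3, l = 5, so 75% of the crossover
# points fall inside the schema; with pc = 1.0 the bound is 0.25.
print(crossover_survival_lower_bound(3, 5, 1.0))  # 0.25
```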
Now that crossover has been addressed, we must look at mutation. Let pm be the probability that a bit will be mutated, and let Sm(H) be the probability that the schema H survives mutation. In order to survive, the schema's defined bits must remain intact; therefore Sm(H) is equal to the probability that a bit will NOT be mutated (1-pm), raised to the power of the number of defined bits in the schema, or:
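In symbols, this is the standard mutation-survival term:

```latex
S_m(H) = (1 - p_m)^{o(H)}
```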
where o(H) is the number of defined bits (or order) of the schema. From these two calculations you can see that the shorter and lower-order the schema, the greater its chance of survival. By combining these effects with equation 1.1, you get the Schema Theorem first derived by Holland:
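The Schema Theorem, in the standard form given in Mitchell's book (equation 1.2 in this essay's numbering), combines equation 1.1 with both survival terms:

```latex
E(m(H, t+1)) \geq \frac{\hat{u}(H,t)}{\bar{f}(t)}\, m(H,t)
  \left[ 1 - p_c\, \frac{d(H)}{l-1} \right] (1 - p_m)^{o(H)} \tag{1.2}
```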
Hopefully, you now fully understand the mathematics behind genetic algorithms. For an analysis of what equation 1.2 implies, see the relevant section in Melanie Mitchell's book An Introduction to Genetic Algorithms. Note that this equation deals only with the destructive effects of crossover and mutation; in fact, crossover is regarded as one of the chief sources of the GA's power, and mutation is often seen as an insurance policy against loss of diversity. It is the GA's ability to implicitly evaluate many schemata at once that makes it so powerful.
Mathematical Models

The Schema Theorem only shows how schemata dynamically change, and how short, low-order schemata whose fitness remains above the population average receive exponentially increasing numbers of samples. It cannot make more direct predictions about population composition, the distribution of fitness, and other statistics of the genetic algorithm itself.
We are now going to look at the simplest mathematical model of a GA, as derived by Michael Vose and Gunar Liepins. It models a simple genetic algorithm like the one used in the sample. The algorithm is as follows:
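The simple GA being modelled can be sketched in a few lines of Python. This is a minimal illustration, not Vose and Liepins' own code: the fitness function (counting ones, offset so all weights are positive) and the helper names are my own assumptions.

```python
import random

def one_generation(pop, fitness, pc=0.7, pm=0.001):
    """One generation of a simple GA: fitness-proportionate selection,
    single-point crossover with probability pc, per-bit mutation rate pm."""
    weights = [fitness(x) for x in pop]
    new_pop = []
    while len(new_pop) < len(pop):
        # Roulette-wheel (fitness-proportionate) selection of two parents.
        a, b = random.choices(pop, weights=weights, k=2)
        if random.random() < pc:
            c = random.randrange(1, len(a))  # single crossover point
            a, b = a[:c] + b[c:], b[:c] + a[c:]
        # Flip each bit independently with probability pm.
        def mutate(s):
            return ''.join(
                ('1' if bit == '0' else '0') if random.random() < pm else bit
                for bit in s)
        new_pop.extend([mutate(a), mutate(b)])
    return new_pop[:len(pop)]

random.seed(0)
pop = [''.join(random.choice('01') for _ in range(8)) for _ in range(20)]
next_pop = one_generation(pop, fitness=lambda s: 1 + s.count('1'))
```

Running `one_generation` repeatedly drives the population toward fitter strings, which is exactly the process the Vose-Liepins model describes in expectation.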
Now that p(t) and s(t) can be defined in terms of each other, we must find an operator G that acts like a genetic algorithm - meaning that if G were applied to s(t), it would mimic the effects of running the genetic algorithm through one generation:
Now, use the idea that the expected composition of the population at t+1 is determined by the probabilities of the strings being selected to mate. Also, let's define x ~ y to mean 'x differs from y by a scalar factor'. We now have:
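One plausible reconstruction of this relationship, following Mitchell's presentation of the Vose-Liepins formulation with F the diagonal matrix of fitnesses, is that selection probabilities are proportional to fitness-weighted proportions, and (with selection alone) the expected next population is exactly the selection vector:

```latex
s(t) \sim F\, p(t), \qquad
E(p(t+1)) = s(t) \quad\Longrightarrow\quad
E(s(t+1)) \sim F\, s(t)
```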
The last equation is the type of relationship we are looking for - yet due to statistical complications, and the absence of crossover and mutation from the equation, we have to complicate matters further (a lot further!). Plus, we still haven't figured G into the equation. The way Vose and Liepins continued was by defining G as the composition of the fitness matrix F and a recombination operator (we'll call it ß). ß mimics both crossover and mutation, hence the term 'recombination.'
The question now is how to define ß. Vose and Liepins approached it by finding ri,j(k), the probability that string k will be produced by a recombination event between strings i and j. Once ri,j(k) is known, we can see that:
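In symbols, this is the standard expression (with p_k the proportion of string k and s_i the selection probabilities):

```latex
E(p_k(t+1)) = \sum_{i,j} s_i(t)\, s_j(t)\, r_{i,j}(k)
```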
This equation shows that the expected value of the proportion of string k in the next generation is equal to the "probability that it will be produced by each given pair of parents, times those parents' probabilities of being selected, summed over all possible pairs of parents."
We still have to define ri,j(k) and ß, though. Vose and Liepins first defined a simpler matrix M whose entry at position i,j gives the probability ri,j(0) that string 0 (all bits 0) would be produced through a recombination of i and j.
ri,j(0) is equal to the sum of two terms: the probability that crossover does not occur between i and j and the selected offspring is mutated to all zeros, and the probability that crossover does occur and the selected offspring is mutated to all zeros. The probability that crossover occurs between i and j is pc (thus the probability that it doesn't is 1-pc). Likewise for mutation: each bit is mutated with probability pm and left alone with probability 1-pm. Let's define |i| as the number of ones in the bit string i of length l. Therefore, the probability that i will be mutated to all zeros is:
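In symbols (every one must flip, every zero must stay):

```latex
p_m^{|i|}\, (1 - p_m)^{\,l - |i|}
```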
that is, the probability that all of the |i| ones are mutated, multiplied by the probability that none of the (l-|i|) zeros are mutated. With this in mind, the first term in the expression for ri,j(0) equates to:
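The standard form of this first (no-crossover) term, averaging over which of the two parents' copies survives, is:

```latex
(1 - p_c)\, \frac{1}{2} \left[ p_m^{|i|} (1-p_m)^{\,l-|i|}
  + p_m^{|j|} (1-p_m)^{\,l-|j|} \right]
```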
|j| is defined the same way as |i|, only for string j. Note that the factor of 1/2 is included since only one of the two children survives the mating process. Now, for the second term, let h and k denote the two offspring produced by a single-point crossover at point c (counted from the right-hand side, as in binary notation). There are therefore l-1 possible crossover points, each having probability 1/(l-1) of being chosen. Therefore, the second term can be written as:
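Summing over the l-1 equally likely crossover points, the second (crossover) term reads:

```latex
\frac{p_c}{2(l-1)} \sum_{c=1}^{l-1}
  \left[ p_m^{|h|} (1-p_m)^{\,l-|h|} + p_m^{|k|} (1-p_m)^{\,l-|k|} \right]
```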
All that is missing now is a definition of h and k. So let i1 be the substring of i consisting of the l-c bits to the left of point c, and i2 be the substring of i to the right of c. Then |h| = |i| - |i2| + |j2| and |k| = |j| - |j2| + |i2|. Now, with a bit of notational adjustment, you can simplify the equation down a little:
Now we can finally define ri,j(0). To simplify the equation, let's define η as pm/(1-pm). After a little algebraic manipulation we get the final equation:
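Factoring (1-pm)^l out of every term and writing each mutation probability as (1-pm)^l η^|·| gives the standard closed form:

```latex
r_{i,j}(0) = \frac{(1-p_m)^l}{2}
  \left[ (1-p_c)\left( \eta^{|i|} + \eta^{|j|} \right)
  + \frac{p_c}{l-1} \sum_{c=1}^{l-1}
    \left( \eta^{\,|i|-|i_2|+|j_2|} + \eta^{\,|j|-|j_2|+|i_2|} \right) \right]
```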
We now finally have ri,j(0); we need to use it to come up with ß and finally G. By the use of logical operators and permutations, and a lot of manipulation, ß can be expressed in terms of M.
Defining |v|, for a vector v, as the sum of its elements, we can define Gp:
Now, although G and Gp act on different representations of the population, they can be translated into one another through a simple transformation.
Finally, we are at the end of this long essay! Hopefully you've now got a much better understanding of the mathematical theory behind genetic algorithms. Note that the math we looked at applies only to the simplest of genetic algorithms (and one with an infinite population); more complicated models, and explanations much better than my own, can be found in Melanie Mitchell's book An Introduction to Genetic Algorithms.