Generation 5: Theory & Research


Genetic Algorithms - Sam Hsiung

Contribution concerning Genetic Algorithms (this is just a test sample) [hsiung@cs.uchicago.edu] submitted on August 29, 1998 at 22:39:10:

Introduction to Genetic Algorithms

Symbolic AI vs Genetic Algorithms

Most symbolic AI systems are very static. Most of them can usually only solve one given specific problem, since their architecture was designed for whatever that specific problem was in the first place. Thus, if the given problem were somehow to be changed, these systems could have a hard time adapting to them, since the algorithm that would originally arrive to the solution may be either incorrect or less efficient. Genetic algorithms (or GA) were created to combat these problems. They are basically algorithms based on natural biological evolution. The architecture of systems that implement genetic algorithms (or GA) are more able to adapt to a wide range of problems. A GA functions by generating a large set of possible solutions to a given problem. It then evaluates each of those solutions, and decides on a "fitness level" (you may recall the phrase: "survival of the fitness") for each solution set. These solutions then breed new solutions. The parent solutions that were more "fit" are more likely to reproduce, while those that were less "fit" are more unlikely to do so. In essence, solutions are evolved over time.

General Algorithm for Genetic Algorithms

Create a Random Initial State

An initial population is created from a random selection of solutions (which are analagous to chromosomes). This is unlike the situation for Symbolic AI systems, where the initial state in a problem is already given instead.

Evaluate Fitness

A value for fitness is assigned to each solution (chromosome) depending on how close it actually is to solving the problem (thus arriving to the answer of the desired problem). (These "solutions" are not to be confused with "answers" to the problem, think of them as possible characteristics that the system would employ in order to reach the answer.)

Reproduce (& Children Mutate)

Those chromosomes with a higher fitness value are more likely to reproduce offspring (which can mutate after reproduction). The offspring is a product of the father and mother, whose composition consists of a combination of genes from them (this process is known as "crossing over".

Next Generation

If the new generation contains a solution that produces an output that is close enough or equal to the desired answer then the problem has been solved. If this is not the case, then the new generation will go through the same process as their parents did. This will continue until a solution is reached.


A GA Example

Let us consider a diophantine (only integer solutions) equation: a+2b+3c+4d=30, where a,b,c,d are positive integers. Using a genetic algorithm, all that is needed is a little time to reach a solution (a,b,c,d). Of course you could ask, why not just use a brute force method (plug in every possible value for a,b,c,d given the constraints 1 =< a,b,c,d =< 30)? The architecture of GA systems allow for a solution to be reached quicker since "better" solutions have a better chance of surviving and procreating, as opposed to randomly throwing out solutions and seeing which ones work.

Let's start from the beginning. First we will choose 5 random initial solution sets, with constraints 1 =< a,b,c,d =< 30. (Note that we can choose smaller constraints for b,c,d, but for the sake of simplicity we shall use 30)


Chromosome
(a,b,c,d)
1
(1,28,15,3)
2
(14,9,2,4)
3
(13,5,7,3)
4
(23,8,16,19)
5
(9,13,5,2)

Table 1: 1st Generation Chromosomes and Their Contents



To calculate the fitness values, plug each solution set into the expression a+2b+3c+4d. Then, calculate the absolute value of the difference of each expression with 30, this will be our fitness value.


Chromosome
Fitness Value
1
|114-30|=84
2
|54-30|=24
3
|58-30|=28
4
|163-30|=133
5
|58-30|=28

Table 2: Fitness Values of 1st Generation Chromosomes (Solution sets)



Since values that are lower are closer to the desired answer (30), these values are more desirable. In this case, higher fitness values are not desirable, while lower ones are. In order to create a system where chromosomes with more desirable fitness values are more likely to be chosen as parents, we must first calculate the percentages that each chromosome has of being picked. One solution is to take the sum of the multiplicative inverses of the fitness values (0.135266), and calculate the percentages from there. (Note: ALL simulations were created using a random number generator)


Chromosome
Likelihood
1
(1/84)/0.135266 = 8.80%
2
(1/24)/0.135266 = 30.8%
3
(1/28)/0.135266 = 26.4%
4
(1/133)/0.135266 = 5.56%
5
(1/28)/0.135266 = 26.4%

Table 3: Parent Selection by Percentages



In order to pick our 5 pairs of parents (each of which will have 1 offspring, and thus 5 new solutions sets total), imagine that we had a 10000 sided die, and on 880 of those sides, chromosome 1 was labeled, and on 3080 of those sides, chromosome 2 was labeled, and on 2640 of those sides, chromosome 3 was labeled, and on 556 of those sides, chromosome 4 was labeled, and on 2640 of those sides, chromosome 5 was labeled. To choose our first pair we roll the die twice, and take those chromosomes to be our first two parents. Continuing in this fashion we have the following parents:


Father Chromosome
Mother Chromosome
3
1
5
2
3
5
2
5
5
3

Table 4: Simulated Selection of Parents



The offspring of each of these parents contains the genetic information of both father and mother. How this can be determined is very arbitrary. However for this case, we could use something called a "cross-over". Let us say a mother has the solution set a1,b1,c1,d1, and a father has the solution set a2,b2,c2,d2, then there can be six possible cross overs (| = dividing line):


Father Chromosome
Mother Chromosome
Offspring Chromosome
a1 | b1,c1,d1
a2 | b2,c2,d2
a1,b2,c2,d2 or a2,b1,c1,d1
a1,b1 | c1,d1
a2,b2 | c2,d2
a1,b1,c2,d2 or a2,b2,c1,d1
a1,b1,c1 | d1
a2,b2,c2 | d2
a1,b1,c1,d2 or a2,b2,c2,d1

Table 5: Generalization of Cross-overs Between Parents



There are many other ways in which parents can trade genetic information to create an offspring, crossing over is just one way. Where the dividing line would be located is completely arbitrary, and so is whether or not the father or mother will contribute to the left or right of the dividing line. Now let's apply this to our offspring:


Father Chromosome
Mother Chromosome
Offspring Chromosome
(13 | 5,7,3)
(1 | 28,15,3)
(13,28,15,3)
(9,13 | 5,2)
(14,9 | 2,4)
(9,13,2,4)
(13,5,7 | 3)
(9,13,5 | 2)
(13,5,7,2)
(14 | 9,2,4)
(9 | 13,5,2)
(14,13,5,2)
(13,5 | 7, 3)
(9,13 | 5, 2)
(13,5,5,2)

Table 6: Simulated Crossovers from Parent Chromosomes



Now we can calculate the fitness values for the new generation of offsprings.


Offspring Chromosome
Fitness Value
(13,28,15,3)
|126-30|=96
(9,13,2,4)
|57-30|=27
(13,5,7,2)
|57-30|=22
(14,13,5,2)
|63-30|=33
(13,5,5,2)
|46-30|=16

Table 7: Fitness Values of Offspring Chromosomes



The average fitness value for the offspring chromosomes were 38.8, while the average fitness value for the parent chromosomes were 59.4. Of course, the next generation (the offspring) are supposed to mutate, that is, for example we can change one of the values in the ordered quadruple of each chromosome to some random integer between 1 and 30. Progressing at this rate, one chromosome should eventually reach a fitness level of 0 eventually, that is when a solution is found. If you tried and simulated this yourself, depending on the mutations you may actually get a fitness average that is higher, but on the long-run, the fitness levels will decrease. For systems where the population is larger (say 50, instead of 5), the fitness levels should more steadily and stabily approach the desired level (0).



[ Generation 5: Theory & Research ]