|
The Strategy Behind Fitness |
| Research > Genetic Algorithms > Articles > Strategy Behind Fitness |
One of the beauties of the Genetic Algorithm is that the general algorithm is the same regardless of its application. The only piece of the GA that is tailored for the given problem is the fitness function. But since the fitness function is the only connection to the given problem, the programmer must be quite careful in designing the function so that the GA can optimize it correctly. Also, so far I have only discussed the Genetic Algorithm itself without going into any examples. In this article, I will talk about some theoretical problems that you could use a Genetic Algorithm to solve, and how one could design good fitness function for them. Hopefully things start to fit together for you once you see some concrete examples. Anatomy of a Genome and Its Fitness There are two main parts of a fitness function: decoding and evaluation. How you choose to translate your genome into one particular solution to your problem is called the encoding scheme. The part where you take your translated solution and measure how "good" it is we'll call the evaluation process. When you're done evaluating, you give the genome a number called it's fitness. Exemplar Gratis: A Multi-Variable Equation
x=10.100111111001110111 (2.6235 in base 10) It's easy enough to represent this genome as an array of bits in your program, as long as you keep track of where the decimal point is and where the x values stop and the y values start. Both the decoding and the evaluation process for this kind of genome are simple - just convert the values to base 10 and plug them into the f(x,y) and return the value for the fitness. Optimizing Evaluation Alternative: Marker-Based Encoding This is more of a cutting-edge technique for encoding genomes. Let me introduce the idea using an example from mathematics: graphs. A graph is a structure with vertices and edges (ie points with connections). Here is an example of what a simple graph might look like:
Suppose your job is to make a graph. You don't want to specify beforehand how many vertices, how many edges, and how those vertices are connected - all of that needs to be randomly generated in the genome. So how do we code up a graph into a list of integers? Consider this scheme:
To decode this genome, you need to pass through it twice. The first pass, look for start and stop markers and count how many vertices you have (call it n). Name each vertex starting at 0. The second pass is where you fill in each edge. Take the edges and evaluate it mod n and put an edge there. Take a look at how this random sequence of integers might be decoded. Note that n=3. The second line are the names of each vertex. 34 49 55
23 81 44 92
66 19 02 99
13 23 11 82
88 82 15
So when you evaluate each number mod 3, you get this: 19 mod 3 = 1 82 mod 3 = 1 In the end, it codes for this graph:
Now it's important to choose the length of your genomes and frequency of start/stop markers carefully. If you have a long genome with frequent starts and stops, then expect to have lots of vertices, for example. It has been shown that this encoding scheme works reasonably well for coding things like neural networks. The idea came from looking at how real genomes work: they have start and stop codons within the mRNA which function the same way. The advantages to this approach are also that you can have entire sections of unused genome, so it allows for a lot of flexibility. One mutation or mating can create or remove a whole new vertex, as well as shift how the whole structure works. Conclusion |
| Page last updated: May 14, 2006 |


