Population Dynamics
Research > Genetic Algorithms > Articles > Population Dynamics

The past few articles have discussed the crucial elements involved in a Genetic Algorithm. This article talks about some features that are optional for GAs, but they are often employed today. Most GAs that I have done have not used species or subpopulations because I have not needed them, but they are certainly quite help in a practical setting. Speciation and Subpopulations are great for employing creativity in order to find multiple solutions to a problem, as well as refining the search space of a given problem.

Search Space Revisited
I mentioned search space before, and I would like to revisit its technical meaning to lead into what subpopulations are good for. The search space is the total number of possible genomes you could have in a population. Since the population is "searching" for the best solution, the amount of genomes that a population could go through without finding the right one is can be seen as looking through a space of genomes. This is usually easy to measure when your encoding scheme is with binary digits because then just take the number of bits you use for each genome and raise 2 to that power.

Given that the calculation involves raising exponents, you can see that search space is typically enormous. Don't let that worry you. This is one of beauties of Genetic Algorithms: they can take enormous amounts of search space and still find something that would take us humans years to find. Sometimes, however, the search space is simply too big. The key reason for estimating the search space is to be able to push the GA's ability as far as it can go.

I first stumbled upon search space issues when I was doing a project for my Computer Science class called ASGA: A Simple Genetic Algorithm. The problem that the algorithm was simple: guessing a word where each genome was a string of letters. I found that the algorithm wouldn't converge almost at all when the word was fifteen letters or more, but if there were fourteen letters or less, the whole population would be done in seconds. Now think about the search space for that problem. Every letter you add to the genome multiplies the search space by 26 (since there are 26 letters of the alphabet). What I found fascinating was that, for a 70-genome population, the breaking point was 14 to the 26th power. Why is that?


Both are huge search spaces, what is the breaking point?

Let's look at search space in terms of the two-variable optimization problem from the fitness article. Recall that we are looking for (one of) the highest value in a a given domain of a very complicated function. The graph of the function is as follows.


Graph of

Here, each genome is 40 binary digits long which encodes for two numbers between 0 and 3. Estimating the search space for this problem is simple: 2 to the 40th power.

It's pretty easy to see that the more bits you add to the genome, the more precise answer you're looking for. So there are two ways you can change the search space in this genome. You can change the number of bits to get a more or less precise answer, or you can limit the overall domain (maybe only look for an answer between 2 and 3). As a combination of the two ideas, you can first make your genomes not very precise, but over a large domain. When you come up with an answer, you can limit the domain, but make them more precise. That way, the search space to the algorithm never changes, but you can progressively zoom into a good solution.


Limiting the search space by refining the domain.
Left: the same graph as above with domain 0 to 3.
Right: same graph only zoomed in with domain 1.5 to 2.25
Notice how it's much easier to find the highest point on the zoomed in graph


Speciation and Subpopulations
Sometimes when you know that you'll have many different correct solutions in your population and you don't want them to cancel each other out, you group them together into "species" so that only similar genomes mate with each other. Another approach to this is to take multiple populations and evolve them separately so that they come up with different "ideas", then merge them together and see which idea ends up the best. Let's go into the standard ways of doing this.

Speciation
Using the concept of species is another one of those natural ideas which has found its way into how Genetic Algorithms are done these days. In real life, certain kinds of animals find their niche and end up secluding themselves in order to survive. If similar animals from the Sahara began mating with ones from the Arctic, would the children be any better? The offspring probably couldn't survive either environment. On the other hand, wouldn't it make sense for animals who are near each other to mate so that they can gradually migrate toward each other? This concept is known as genetic drift - species eventually migrate from one area to another while adapting along the way mostly by mating but also by mutating.

Speciation can be an important idea to use in a GA if you're trying to keep your population diverse. This is done by taking "similar" genomes and separating them out so that they're likely to mate only within their species. It's sort of like introducing geography into the population. The tough part which requires a keen understanding of your problem is how you define "similar" and "not-similar". Some search spaces don't lend themselves to this; those are usually the spaces which only have one optimal solution. But if you can determine somehow that a set of points "good" for different reasons than another set of points, then you should speciate your population. Take a look at our favorite 2-variable spikey graph.


The six yellow and red dots represent genomes. Look at the location of each species.

Above there are six genomes labelled by their location. Notice that if a yellow and a red genome mated, which is very likely considering they are both very fit, their child might end up somewhere in that valley between them. The population would grow a lot slower because there are two groups that have great ideas, but they're cancelling each other out.

So now a tough question lends itself: how do you determine a species? Sometimes there isn't a niche going on and every genome is pretty evenly spaced. And also remember that comparing each genome to another in some way is already a quadratic operation - a phrase which makes us Computer Scientists cringe! A satisfactory answer, unfortunately, has not quite been found as far as I've seen. I have read a few articles which use this idea and managed to get it to work using some graduate-level statistical analysis techniques, but those only applied to their specfic problem.

Another issue to address is how to implement this. Your selection operator will need to be tailor-made for your speciation, which can be quite hairy if you're using an off-the-shelf library.

Subpopulations
Also derived from species and genetic drift, this is quite a common technique used in Genetic Algorithm research. One of the issues in attacking a problem of gargantuan search space is that sometimes you want to just increase the population size. If your population has hundreds of genomes then it should have hundreds of good ideas, right? Wrong. Increasing the population size only dilutes the good genomes and doesn't make them good enough to propagate in the first few generations. You also want to be able to look at the same search space and possibly come up with different ideas, which means you need small populations.

Since any two initial populations can evolve into different solutions, it would make sense to take multiple populations, evolve them over a number of generations, then merge them together to get a whole new population. This new population will be bursting with creative, refined ideas that can be evaluated against each other to get the best genome. Though I have not done much with subpopulations myself, I have read about their success in the literature.

Really the only extra operator that needs to be hashed out is the Merging Operator. There are a number of ways you can combine two populations, the easiest of which is to take the two and add them together, which doubles the size of the overall population. This is not reccomended for the reason mentioned above where the good genomes get diluted. The other way of merging is similar to the killing operator: intersection. Take only the best fit from both populations. According to the literature, this seems to be the best known approach.

Speciating and subpopulating a genetic algorithm are stimulating ideas that have proven useful in the practical world. There seems no main reason against using these techniques if you have the opportunity to do so. My only reason for not exploring this branch of GAs was that I was getting great results without them and that it would have been an enormous undertaking to implement either of speciation or subpopulations in the end. According to what I have read, it is my opinion that they are probably worthwhile. The only warning that I would have for those actually developing these techniques is to make sure that you have your population working before you speciate and such. In other words, speciation and subpopulations ought not to be relied upon, they only enhance what you already have.



News | Research | Personal Profile | Photo Album
Contact Me

Page last updated: May 24, 2006