Galapagos
A Genetic Algorithm Framework in Java
Home

...by Andy Meneely
in his Senior Project at Calvin College.

Development Updates

04/27

  • Taking Remy's comments and a few of my own critiques, I've done yet another revision of the tutorials.
  • I decided to get rid of the section where they navigate through a custom Java driver. Why do they need to learn about a custom driver? That section has actually moved to Tutorial 3, which the students won't have time to do. But it fits really nicely into Lab 3, in case you were worried.
  • I put a band-aid solution on poor code in the framework with the SurvivalOperator class. A larger refactoring would be nice, but I didn't want to break everything two days before the big lab.
  • Hope the lab goes well....

04/13

  • The last few weeks can be summed up in rewriting Tutorials 1 and 2 for my expert test with Remy. I like the format of these labs. Lab 2 was totally rewritten with a whole new example and more hand-holding. I'm convinced this is good, though.
  • Tomorrow I'm testing this all out with Remy.
  • I made a bunch of edits to the framework as well - all of which are committed and released.

03/23

  • I worked primarily on the examples today
  • The IntGenome example was changed significantly, and much commenting was done to help out the process.
  • I've also been working on getting those reporting mechanisms better. It's still console output, but it's much better this time.
  • In the writing, I've tried to make things more clear. I've separated out the Eclipse stuff into sidebars (which I love - THANK YOU FRENS!)
  • Todo
    • Change the other examples to be more readable
    • Apply many of these changes to the other labs.
  • The expectation for this lab are these
      • The average student will finish the first lab
      • The great students will finish the second lab

03/22

  • I got rid of the "Dummy Genome" everywhere. The whole concept seemed a little smelly to begin with, anyway.
  • I changed the GenomeFactory interface to only one method, createGenome(List<Integer> data), to alleviate the confusion. The createRandomGenome method will be implemented by just putting a random list of integers into the createGenome method. This change will hopefully make things a lot less confusing, and will cut down on that evil duplicate code.
  • Actually, I lied... the GenomeFactory interface must also support the method genomeSize() so that the framework can know how many integers to generate.
  • Moved MersenneTwister into the framework to cut down on the JAR files being used. I think that's legal...
  • Made the text output for the Runner to be a little more descriptive.
  • TODO
    • Change around the examples to make them a bit more user-friendly
    • Come up with some airtight examples and exercises.

03/20

  • I changed the parameters to the Mutate operators to be a little more specific than "intensity".
  • Todo
    • Rewrite and revise the tutorials in a very massive way
    • Clean up the examples, making them more accessible
    • Write an introduction for the tutorials so that the users will know more of what to expect

02/23

  • I've been up to the wee hours of the morning writing a bazillion tutorials and other documentation. I recommend you take a glance at it.

02/19

  • Finished the Operators Sheet.
  • Tweaked some Javadoc.
  • Set up the overall structure for the tutorials, and examples

02/18

  • Lots of smaller changes to the documentation: navigation bar, more work on the Operators document, officially posted the date for the pilot lab, etc.
  • Wrote the transitivity thing.
  • To do this weekend
    • Finish Operators sheet
    • Finish Examples Guide
    • Finish the Getting Started and First Algorithm tutorials.
  • To do this week
    • Write all the tutorials
    • Finish off the examples already in place
    • Write a few new examples

02/15

  • Did tons of documentation work. I'm getting there....

02/14

  • Made some small changes to the code, mostly just testing and polishing
  • Worked a lot on documentation, but there's a lot more work to be done there

02/10

  • Finished the javadoc, now working on documentation.

01/30

  • Javadoc'd every class within the galapagos.framework package - more to come.

01/14

  • Wrote another utility method in SuperBean called missingProperties() which returns a list of all properties which are null. Used in error checking for GalapagosRunner
  • Fixed a lot of things.
  • Finished List
    • Runner is working, though some testing will surely find some bugs
    • Integration Testing in terms of JUnit is fine, but I prefer some human testing at this point in terms of integration
  • Revised List of "Tasks Ahead of Me"
    • Documentation
      • JavaDoc for everything
      • Parameters and Properties Quick Reference
      • Tutorial
    • Testing throughout the code which I may have missed.
    • Begin writing Ant code for the buildfile - issues have already arisen regarding that.
  • "Back Burner"
    • XML Schema - with the missingProperties() method, we can check for missing properties easier than to enforce a schema. If I see a very, very good reason to use a schema instead (other than slightly more descriptive errors).
  • Code Metrics
    • 3,729 lines of code, 43 packages, 101 classes
    • Lines of code per method: 3.532(stddev 4.017), maximum 19
    • 89 unit tests (nearly all have multiple assertions)

01/07

  • Major refactoring: move Factory away from a property of any operator and get it from Population
  • PopulationPref was made to serve as holder for the preferences when the runner generates a StandardPopulation

01/03

  • I did some reorganization of Comparator and Factory. This was motivated by adding Population to the runner.
  • I had to make a few decisions about the runner
    • Since, GAMersenneTwister, is not a JavaBean (since it follows a singleton pattern, so it has no public constructor), I decided that all algorithms run from the Runner use GAMersenneTwister - the best random generator available anyway.
    • ReportStatus will be Report_End until the reporting mechanism evolves more.
    • PopSize, then, is now an attribute of GalapagosAlgorithm. The setter fires up the StandardPopulation.generateRandomPopulation(..). Future refactoring will involve taking two constructors for the algorithms, one that will generate a StandardPopulation and another that will simply accept a Population.

01/01

  • Selection operators and all of the scalers are supported.
  • My fancy equals(Object other) from SuperBean method broke, so I'll need to fix it.
  • Speciation is supported, so the OperatorSet is done
  • Idea: for noSpeciation and noScaling, allow the user to not specify that operator and default to it. That is, after the XML parsing, check for a null and add the appropriate operator
  • Issues that drive me nuts:
    • Still can't get PipedInputStream to work (piping the XSLT Transformer's OutputStream to the XMLDecoder's InputStream). I think this might be because the entire file needs to be buffered, which means that a Stream object is inappropriate for XMLDecoder.
    • I want to separate galapagosTransform.xsl to separate files for readability, so using <xsl:include> is what I should do. But, the base URI of this XSLT is not the document's location, but the location of the project. I need to access the document's location so that I can resolve a relative link without knowledge of the XSLT file's absolute location. This should be easy, but...
    • I can't decide on how to use the Report mechanism well. I might export an XML file and use an XSLT to create an HTML report, but maybe that's beating around the bush. I need to research JasperReports and other reporting libraries.
  • Major tasks ahead of me
    • Once the parser is done (which is coming close), I need to write the runner with all of the error-checking bells and whistles.
    • Write a schema to enforce the runner files.
    • A truckload of documentation
    • More thorough integration testing the entire algorithm
    • Write tutorials
  • A few code metrics
    • 3,555 lines of code, 41 packages, 96 classes
    • Lines of code per method: 3.45 (stddev 3.9), maximum 19
    • 81 unit tests (about 100 assertions)

12/29

  • Implemented an equals(Object) method in SuperBean, which uses introspection to check all of the properties (very handy for future java bean stuff).
  • Runner now supports Mate, Mutate, and Survival, and some Selection operators (scalers are the current issue).
  • Much more to do, but most of it is busywork now that I have the code patterns all figured out.
  • Idea: Documentation must include the XML configuration structure
  • MAJOR REFACTOR: rewrote the runner's unit tests so that TestCase classes don't extend each other. This is because all of the child classes re-run the parent's test methods (slows down the unit testing execution time).

12/24

  • Beanified every operator, did a little refactoring.
  • Implemented the Mate Operators for the runner

12/23

  • After many iterations and many different decisions, I am now implementing the runner by using XSLT to transform the Galapagos XML configuration format to the format that Java's XMLEncoder/Decoder uses.
  • Began a SuperBean, which all JavaBeans will be derived from, that has the method hasAllProperties. This method uses the Introspector to check all of the properties of the bean against null and (in the future) will do a deep check of all other SuperBeans.
  • In working on the transformation process, I have come across a sticky problem: piping an Outputstream to an Inputstream. So far I'm simply dumping the information to an intermediate XML file (which may be the final solution for pedagogical reasons)

11/8

  • Started the Galapagos Runner with the parser. Currently I'm using the (very clunky) DOM from org.w3c, but I'm debating on going to JDOM for my parsing needs. JAXB is also on my horizon.
  • Didn't commit the code yet since I might change libraries.
  • Working on XML Schema for the input XML file for the runner.

11/7

  • In Roulette, created a "base case" for speed-up reasons. If there are only two genomes left unmated, then grab those two. This bug was discovered when the Window scaler was placed in and the runSelection method of Roulette slowed down a ton (i.e. since one genome had 1 as fitness and the other had 3000, it's pointless to keep randomly picking because there's only two left).
  • Incorporated fitness scaling
    • Every QuantifiableGenome now has a getScaledFitness() function, with an appropriate exception message when the fitness has not been scaled.
    • LogScale and SqrtScale both have a multiplier parameter to scale the output
    • WindowScale subtracts the lowest fitness from the population
    • Idea: make an exercise that requires a combination of two scalers (e.g. window and logarithmic)
  • Idea: make a genetic programming example, then have some exercises be different variations on it. Need to do some research on the "textbook" examples of genetic programming.
  • Idea: do a simple Traveling Salesperson example.

10/16

  • In Genome, changed toString() to report(), since toString() really shouldn't be used there.
  • TODO: Write fitness scaling into Roulette. Scalers: Log, Sqrt, Windowing, Ranking
  • Idea: Once fitness scaling is in place, write a random generator whose pdf is transformed from a uniform distribution to an exponential, quadratic, etc. Then apply this to a selection operator which would rank all genomes and select with bias towards the top.

 

10/13

  • Created nullCheck methods for when the operators are loaded as JavaBeans to ensure that all parameters are loaded properly before the algorithm is run
  • IterativeAlgorithm is now used fully in the IntGenome example, so the Main class no longer extends GalapagosAlgorithm
  • Made Genome's toString() method abstract, forcing the user to write a useful toString() method for reporting purposes.
  • Started on GAReport, a more sophisticated reporting mechanism
  • Researched a few new ideas for Pet Problems. Will consult more formal sources soon.
  • Designed and began to implement the XML schema for the Runner. Brainstorm below:

TODO

  • Move functionality of ReportStatus away from an enumeration to GAReport which collects all possible data then reports only what you want, inputing the OutputStream as a parameter
  • Add Fitness Scaling to the Roulette selection operator
  • Idea: create a tutorial which asks the user to create a GA for solving general integer solutions to Diophantine Equations (e.g. x+3y+5z = 30). This would make for a great example instead of a continuous-space example like SineGenome
  • Change the abstract toString() method to another, more descriptive method for reporting later on.
  • Change SineGenome's toString() value to something meaningful (i.e. the x-,y-, and z-values)

10/5

  • Added the GNU License to all files
  • Made GAOperatorLoadingException more descriptive
  • Extensive testing done on SineGenome example
  • Finished the IterativeAlgorithm
  • Added to the Parameters file
  • Created the SourceForge account.

9/29

  • Created a generateRandomPopulation method in the framework
  • The GenomeFactory is now a part of the Population, so this means that the population is in charge of creating new genomes, keeping a list of genomes, and knowing how to compare genomes.
  • CompGenome and SineGenome were refactored considerably
  • The property GenomeSize was delegated to the user to define when they implement GenomeFactory.
  • Pulled up GenomeFactory into the abstract class MateOperator from its children
  • Began Tutorials section
  • Began Operators, Properties, and Parameters quick reference guide.
  • Created Official Documentation
  • Document Type Definition of the runner configuration input file is underway.

9/24

  • Fixed a selection bug in keeping track of unmated genomes
  • Made all operators abstract classes instead of interfaces, moved around common fields
  • Changed the random number generator to a parameter everywhere
  • Added a numChildrenPerFamily parameter, allowing for multiple children per family
  • Created an iterative algorithm
  • Began pet problem "SineWave", which is the optimization of the continuous function


Task List:

  • Incorporate an Iterative Algorithm into the framework
  • Fix number formatting fix in SineGenome
  • Change GenomeFactory to a parameter, not a property
  • Move Population out of OperatorSet
  • Move generateRandomPopulation into the framework

9/19
Initial Entry
I have been working on this project for about 3 months now and I have the framework up on its feet with 50 unit tests, two pet problems, and over 1000 effective lines of code. Most importantly, it all works beautifully! ...time to break it again.