Hello World: Hands-On Introduction to Genetic Programming


The idea of mimicking the principles of Darwinian evolution in computing has been around at least since the 1950s, long enough, in fact, to have grown into a field of its own: evolutionary computing (EC). In this tutorial, we'll learn the basic principles of EC and its offspring, genetic programming (GP), on a "toy problem" of symbolic regression. We'll also learn how to use OpenBeagle, a generic object-oriented EC framework written in C++.

The Fittest Program Survives

EC can be regarded as a very general kind of optimization, where the solution to a given problem is selected from an evolving population of candidate solutions, or individuals, represented by their genomes. The selection is based on a fitness criterion, which in the simplest case is just a function operating on genomes.

The computation starts by choosing a random bunch of individuals—generation zero. Generation n+1 is the result of applying evolution operators to the individuals of generation n. The most commonly used operators are mutation (a random modification of a single individual's genome) and crossover (a random mixing of the genomes of two individuals). The individuals that produce "offspring" are chosen based on their fitness. The process ends when a stopping criterion is met (for example, after a predefined number of generations).
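To make the loop concrete, here is a self-contained toy version in plain C++, not tied to any framework: the "genome" is just a single number and the population evolves toward the maximum of f(x) = −(x − 3)², so a run should converge toward x = 3. All names are placeholders of my own:

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    using Genome = double;                                       // a trivial genome: one number

    double fitness(Genome g) { return -(g - 3.0) * (g - 3.0); }  // peak at g = 3

    int main()
    {
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> init(-10.0, 10.0);
        std::normal_distribution<double> noise(0.0, 0.1);
        auto lessFit = [](Genome a, Genome b) { return fitness(a) < fitness(b); };

        std::vector<Genome> pop(50);                             // generation zero: random individuals
        for (Genome& g : pop) g = init(rng);
        std::uniform_int_distribution<std::size_t> pick(0, pop.size() - 1);

        for (int gen = 0; gen < 100; ++gen) {                    // stopping criterion: 100 generations
            std::vector<Genome> next;
            while (next.size() < pop.size()) {
                // Tournament selection: the fitter of two random individuals.
                Genome a = std::max(pop[pick(rng)], pop[pick(rng)], lessFit);
                Genome b = std::max(pop[pick(rng)], pop[pick(rng)], lessFit);
                Genome child = 0.5 * (a + b);                    // "crossover": blend the parents
                child += noise(rng);                             // mutation: small random change
                next.push_back(child);
            }
            pop = next;                                          // generation n+1
        }

        std::cout << "best individual: "
                  << *std::max_element(pop.begin(), pop.end(), lessFit) << "\n";
    }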

GP takes these ideas one step further by performing the search in the space of programs (algorithms). A program's genome is usually represented as a tree of primitives, such as variables, arithmetical and logical operators, loops, conditionals, function calls, and so forth.

The very nature of EC and GP enables one to tackle problems without having the slightest idea of what the solution should look like. Indeed, the paradigm has been successfully applied to a broad range of problems, producing results that have even been patented as new inventions. On the other hand, success is never guaranteed.

Example

Let's demonstrate the principles outlined above on the classical "toy example" of symbolic regression. Among the several genetic programming tools available, I have chosen to use the OpenBeagle framework. Written in standard C++, OpenBeagle supports virtually any kind of EC through subclassing and polymorphism. Combined with the portability and speed of C++, this makes it a good choice for many projects. It's also beautifully designed and a real pleasure to use. Detailed instructions for downloading and installing the package are available at http://beagle.gel.ulaval.ca. The example below is largely adapted from the OpenBeagle documentation.

Let's say we are given some one-dimensional data samples (x_i, y_i), i = 1, …, N, and we would like to find a formula y = f(x) that best fits the data. Suppose we decide to constrain the formula to consist only of arithmetic operations: addition, subtraction, multiplication, and division. Then our genome trees will consist of these four primitives as intermediate nodes, together with the leaf node "x" (the function's argument). In OpenBeagle, we need to define an object of type GP::PrimitiveSet. See Listing 1a.
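For orientation, a primitive-set definition along the lines of OpenBeagle's bundled symbolic regression example might look roughly like the following sketch; the class names (GP::Add, GP::Subtract, GP::Multiply, GP::Divide, GP::TokenT) are taken from the 3.x releases and may differ in other versions, and makePrimitiveSet is my own helper:

    #include "beagle/GP.hpp"

    using namespace Beagle;

    GP::PrimitiveSet::Handle makePrimitiveSet()
    {
        GP::PrimitiveSet::Handle lSet = new GP::PrimitiveSet;
        lSet->insert(new GP::Add);                 // +
        lSet->insert(new GP::Subtract);            // -
        lSet->insert(new GP::Multiply);            // *
        lSet->insert(new GP::Divide);              // protected division
        lSet->insert(new GP::TokenT<Double>("X")); // the function's argument
        return lSet;
    }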

The GP::System object is a kind of container holding the information about the evolutionary system. It contains objects of type Beagle::Component. (Note that all the framework objects are reference counted, so references to them are in fact smart pointers.) We define a component to hold our initial data, which will be used later on. See Listing 1b.
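A minimal data-holding component might be sketched roughly as follows; the class name InitialData and its members are placeholders, and the Beagle::Component constructor and the addComponent call are assumptions based on the 3.x API:

    #include <vector>
    #include "beagle/Beagle.hpp"

    // Hypothetical component holding the training samples.
    class InitialData : public Beagle::Component {
    public:
        InitialData() : Beagle::Component("InitialData") {}
        std::vector<double> mX;   // sample points x_i
        std::vector<double> mY;   // target values y_i
    };

    // Registered with the evolutionary system, for example:
    //   lSystem->addComponent(new InitialData);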

I have used the formula y_i = −1 − sin(2x_i) for this example. So, in fact, we will be approximating a trigonometric function by a rational one.
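One way to generate such samples (the sampling range and the number of points are arbitrary choices for this sketch):

    #include <cmath>

    // Fill the hypothetical InitialData component with 20 samples of
    // y = -1 - sin(2x) taken on the interval [-1, 1].
    void fillSamples(InitialData& ioData)
    {
        for (int i = 0; i < 20; ++i) {
            double x = -1.0 + 2.0 * i / 19.0;
            ioData.mX.push_back(x);
            ioData.mY.push_back(-1.0 - std::sin(2.0 * x));
        }
    }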

The next step is to define the fitness function, which measures how close an individual is to a perfect match. In our case, a good choice is to calculate the deviation of the function f encoded by the individual from the data,

D = sqrt( (1/N) Σ_i (f(x_i) − y_i)² ),

and then set the fitness to be the normalized value (the algorithm maximizes the fitness, so a perfect match has the maximal fitness):

fitness = 1 / (1 + D).

In OpenBeagle, the fitness calculation is encapsulated by objects of type GP::EvaluationOp. Ours would be coded as shown in Listing 2a.
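A rough sketch of such an operator, modeled on the symbolic regression example shipped with OpenBeagle 3.x, might look as follows; the evaluate() signature and the setValue()/run() calls come from that example and may differ between versions, and for simplicity this sketch receives the sample vectors directly rather than reading them from the InitialData component:

    #include <cmath>
    #include <vector>
    #include "beagle/GP.hpp"

    using namespace Beagle;

    class SymbRegEvalOp : public GP::EvaluationOp {
    public:
        SymbRegEvalOp(const std::vector<double>& inX, const std::vector<double>& inY)
            : GP::EvaluationOp("SymbRegEvalOp"), mX(inX), mY(inY) {}

        virtual Fitness::Handle evaluate(GP::Individual& inIndividual,
                                         GP::Context& ioContext)
        {
            double lSquaredError = 0.0;
            for (unsigned int i = 0; i < mX.size(); ++i) {
                setValue("X", Double(mX[i]), ioContext);      // bind the argument x_i
                Double lResult;
                inIndividual.run(lResult, ioContext);         // evaluate the GP tree
                double lError = lResult.getWrappedValue() - mY[i];
                lSquaredError += lError * lError;
            }
            double lD = std::sqrt(lSquaredError / mX.size()); // the deviation D
            return new FitnessSimple(1.0 / (1.0 + lD));       // normalized fitness
        }

    private:
        std::vector<double> mX, mY;
    };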

Having defined these essential components of a GP system, we now only need to combine everything. There are two additional objects we need to be familiar with. The first is GP::Vivarium, which encapsulates all the individuals of all the generations throughout the whole evolution process, as well as statistical data. For example, it has a member of type Beagle::HallOfFame that holds the best individual. The second is GP::Evolver, which controls the entire process.

The evolver is responsible for selecting the initial population, applying the evolution operators, and evaluating the individuals at each generation, until a termination criterion is met. The default implementation contains a predefined set of mutation and crossover operators. The main() function is shown in Listing 2b.
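Wiring the pieces together might look roughly like this, following the structure of the OpenBeagle 3.x demos; makePrimitiveSet and SymbRegEvalOp are the hypothetical helpers from the earlier sketches, and the Evolver and Vivarium calls are assumptions based on that release:

    #include <cmath>
    #include <vector>
    #include "beagle/GP.hpp"

    using namespace Beagle;

    int main(int argc, char** argv)
    {
        try {
            // The training samples y_i = -1 - sin(2 x_i), as before.
            std::vector<double> lX, lY;
            for (int i = 0; i < 20; ++i) {
                lX.push_back(-1.0 + 2.0 * i / 19.0);
                lY.push_back(-1.0 - std::sin(2.0 * lX.back()));
            }

            GP::System::Handle       lSystem   = new GP::System(makePrimitiveSet());
            GP::EvaluationOp::Handle lEvalOp   = new SymbRegEvalOp(lX, lY);
            GP::Evolver::Handle      lEvolver  = new GP::Evolver(lEvalOp);
            GP::Vivarium::Handle     lVivarium = new GP::Vivarium;

            lEvolver->initialize(lSystem, argc, argv);  // parse the command line, set up operators
            lEvolver->evolve(lVivarium);                // run the evolution
            // The best individual ends up in the vivarium's hall of fame;
            // beagle.log and beagle.obm.gz are written along the way.
        } catch (Exception& inException) {
            inException.terminate();
        }
        return 0;
    }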

Compile and run the program. Two XML files will be produced: beagle.log and beagle.obm.gz. Exploring these files is highly recommended, as it provides many insights into the architecture and the inner workings of the framework.

Everything in OpenBeagle can be customized and extended. With almost no effort, I added extra primitives to the system and extended the InitialData component to write the X and Y arrays to the log. I then used this information to visually explore the best individual of a run, as depicted in Figures 1a and 1b.

All the source code used in this example, along with instructions, can be downloaded from http://xrds.acm.org/code.cfm. Again, there's much more to OpenBeagle than presented here, so I encourage you to investigate!

Resources & Further Reading

Wikipedia
Evolutionary computing entry
Genetic programming entry

ACM Special Interest Group on Genetic and Evolutionary Computation
www.sigevo.org

ACM SIG EVO Newsletter
www.sigevolution.org

A Field Guide to Genetic Programming
http://dces.essex.ac.uk/staff/rpoli/gp-field-guide

Professor John Koza: a pioneer of modern GP
www.genetic-programming.com/johnkoza.html

Essentials of Metaheuristics: free book
http://cs.gmu.edu/~sean/book/metaheuristics

Genetic-Programming.org
www.genetic-programming.org

Frameworks

PyEvolve
http://pyevolve.sourceforge.net

ECJ (Java)
http://cs.gmu.edu/~eclab/projects/ecj/

Gaul
http://gaul.sourceforge.net

Footnotes

DOI: http://doi.acm.org/10.1145/1836543.1836558

Figures

Figure 1. The best individual is shown, including: (a) its complete genotype and (b) how well it approximates the initial data.

Listing 1. (a) First, we define the primitive set in OpenBeagle. (b) Next, we define a component to hold the initial data.

Listing 2. Sample code is provided for (a) the evaluation operator and (b) the main program in OpenBeagle.


©2010 ACM  1528-4972/10/0900  $10.00

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.



Pointers

Code for Hands-On Introduction to Genetic Programming

Pick up the code for this article here:

http://xrds.acm.org/helloworld/2010/08/genetic-programming.cfm