Magazine: Hello world Hands-on introduction to genetic programming
FREE CONTENT FEATURE
Hands-on introduction to genetic programming
Full text also available in the ACM Digital Library as PDF | HTML | Digital Edition
The idea to mimic the principles of Darwinian evolution in computing has been around at least since the 1950s, so long, in fact, that it has grown into the field called evolutionary computing (EC). In this tutorial, we'll learn the basic principles of EC and its offspring, genetic programming (GP), on a "toy problem" of symbolic regression. We'll also learn how to use OpenBeagle, a generic C++ object-oriented EC framework.
EC can be regarded as a very general kind of optimization, where the solution to a given problem is selected from an evolving population of candidate solutions, or individuals, represented by their genomes. The selection is based on certain fitness criteria, which can just be a function operating on genomes.
The computation starts by choosing a random bunch of individualsgeneration zero. Generation n+1 is the result of applying evolution operators to the individuals of generation n. The most used operators are mutation (random modification of a single individual's genome) and crossover (random mixing of genomes of two individuals). The individuals that produce "offspring" are chosen based on their fitness. The process ends when a certain stopping criteria are met (for example, some predefined number of generations).
GP takes these ideas one step further by performing the search in the space of programs (algorithms). A program's genome is usually represented as a tree of primitives, such as variables, arithmetical and logical operators, loops, conditionals, function calls, and so forth.
The very nature of EC and GP enables one to tackle problems without having the slightest idea how the solution should look. Indeed, this paradigm has been successfully applied in a broad range of applications, producing results that have even been patented as new inventions. On the other hand, success is never guaranteed.
Let's demonstrate the principles outlined above on the classical "toy example" of symbolic regression. Among the several genetic programming tools available, I have chosen to use the OpenBeagle framework. Written in standard C++, OpenBeagle supports virtually any kind of EC through subclassing and polymorphism. Combined with portability and speed of C++, this approach is a good choice for many projects. It's also beautifully designed and a real pleasure to use. There are detailed instructions for downloading and installing the package at http://beagle.gel.ulaval.ca. The example below was largely taken from OpenBeagle documentation.
Let's say we are given some one-dimensional data samples
and we would like to find a formula y = f(x)
which best fits the data. Suppose we decide to constrain the
formula to consist only of arithmetic operations: addition,
subtraction, multiplication, division. Then our genome trees will
consist of these four primitives as intermediate nodes, together
with the leaf node "x" (the function's argument). In OpenBeagle,
we need to define an object of the type
GP::PrimitiveSet. See Listing
GP::System object is a kind of
container holding the information about the evolutionary system.
It contains objects of type
Beagle::Component. (Note that all the
framework objects have reference counting, and so the references
are in fact smart pointers.) We define a component to hold our
initial data, which will be used later on. See
I have used the formula yi = −1 −sin(2xi) for this example. So in fact we will be approximating a trigonometric function by a rational one.
The next step is to define the fitness function which measures how close an individual/is to a perfect match. In our case, a good choice will be to calculate the deviation
and then set the fitness to be the normalized value (the algorithm maximizes the fitness, so a perfect match would have the maximal fitness)
In OpenBeagle, the fitness calculation is encapsulated by
objects of type
GP::EvaluationOp. Ours would
be coded as shown in Listing 2a.
Having defined these essential components of a GP system, now
we only need to combine everything. There are two additional
objects we need to be familiar with. The first is
GP::Vivarium, which encapsulates all the
individuals of all the generations throughout the whole evolution
process, as well as statistical data. For example, it has a
member of type
Beagle::HallOfFame that holds
the best individual. Finally, the entire process is controlled by
It is responsible for selecting the initial population and applying the evolution operators, evaluating the individuals at each generation, until a termination criteria is met. The default implementation contains a pre-defined set of mutation and crossover operators. The main() function is shown in Listing 2b.
Compile and run the program. Two XML files will be produced:
beagle.obm.gz. Exploring these files is
highly recommended, as it provides many insights into the
architecture and the inner workings of the framework.
Everything in OpenBeagle can be customized and extended. With
almost no effort, I have added additional primitives to the
system and extended the
to write the X and Y arrays to the log. Then I used this
information to visually explore the best individual of a run, as
depicted in Figures 1a and
All the source code used in this example, along with instructions, can be downloaded from http://xrds.acm.org/code.cfm. Again, there's much more to OpenBeagle than presented here, so I encourage you to investigate!
Evolutionary computing entry Genetic programming entry
ACM Special Interest Group on Genetic and Evolutionary
ACM SIG EVO Newsletter
A Field Guide to Genetic Programming
Professor John Koza: a pioneer of modern GP
Essentials of Metaheuristics: free book
©2010 ACM 1528-4972/10/0900 $10.00
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.
To comment you must create or log in with your ACM account.
Pick up the code for this article here: