## Magazine: **Hello world** Hands-on introduction to genetic programming

**FREE CONTENT FEATURE**

## Hands-on introduction to genetic programming

Full text also available in the ACM Digital Library as PDF | HTML | Digital Edition

The idea to mimic the principles of Darwinian evolution in
computing has been around at least since the 1950s, so long, in
fact, that it has grown into the field called *evolutionary
computing* (EC). In this tutorial, we'll learn the basic
principles of EC and its offspring, genetic programming (GP), on
a "toy problem" of *symbolic regression.* We'll also learn
how to use **OpenBeagle**, a generic C++ object-oriented EC
framework.

EC can be regarded as a very general kind of optimization,
where the solution to a given problem is selected from an
evolving population of candidate solutions, or
*individuals*, represented by their *genomes*. The
selection is based on certain *fitness criteria*, which can
just be a function operating on genomes.

The computation starts by choosing a random bunch of
individuals—generation zero. Generation *n*+1 is the
result of applying *evolution operators* to the individuals
of generation *n*. The most used operators are
*mutation* (random modification of a single individual's
genome) and *crossover* (random mixing of genomes of two
individuals). The individuals that produce "offspring" are chosen
based on their fitness. The process ends when a certain stopping
criteria are met (for example, some predefined number of
generations).

GP takes these ideas one step further by performing the search
in the space of *programs* (*algorithms*). A program's
genome is usually represented as a tree of *primitives*,
such as variables, arithmetical and logical operators, loops,
conditionals, function calls, and so forth.

The very nature of EC and GP enables one to tackle problems without having the slightest idea how the solution should look. Indeed, this paradigm has been successfully applied in a broad range of applications, producing results that have even been patented as new inventions. On the other hand, success is never guaranteed.

Let's demonstrate the principles outlined above on the classical "toy example" of symbolic regression. Among the several genetic programming tools available, I have chosen to use the OpenBeagle framework. Written in standard C++, OpenBeagle supports virtually any kind of EC through subclassing and polymorphism. Combined with portability and speed of C++, this approach is a good choice for many projects. It's also beautifully designed and a real pleasure to use. There are detailed instructions for downloading and installing the package at http://beagle.gel.ulaval.ca. The example below was largely taken from OpenBeagle documentation.

Let's say we are given some one-dimensional data samples
and we would like to find a formula *y = f(x)*
which best fits the data. Suppose we decide to constrain the
formula to consist only of arithmetic operations: addition,
subtraction, multiplication, division. Then our genome trees will
consist of these four primitives as intermediate nodes, together
with the leaf node "x" (the function's argument). In OpenBeagle,
we need to define an object of the type

. See __GP::PrimitiveSet__**Listing
1a**.

The

object is a kind of
container holding the information about the evolutionary system.
It contains objects of type
__GP::System__

. (Note that all the
framework objects have reference counting, and so the references
are in fact smart pointers.) We define a component to hold our
initial data, which will be used later on. See
__Beagle::Component__**Listing1b**.

I have used the formula *y*_{i} = −1
−sin(2*x*_{i}) for this example. So in
fact we will be approximating a trigonometric function by a
rational one.

The next step is to define the fitness function which measures how close an individual/is to a perfect match. In our case, a good choice will be to calculate the deviation

and then set the fitness to be the normalized value (the algorithm maximizes the fitness, so a perfect match would have the maximal fitness)

In OpenBeagle, the fitness calculation is encapsulated by
objects of type

. Ours would
be coded as shown in __GP::EvaluationOp__**Listing 2a**.

Having defined these essential components of a GP system, now
we only need to combine everything. There are two additional
objects we need to be familiar with. The first is

, which encapsulates all the
individuals of all the generations throughout the whole evolution
process, as well as statistical data. For example, it has a
member of type __GP::Vivarium__

that holds
the best individual. Finally, the entire process is controlled by
a __Beagle::HallOfFame__

.__GP::Evolver__

It is responsible for selecting the initial population and
applying the evolution operators, evaluating the individuals at
each generation, until a termination criteria is met. The default
implementation contains a pre-defined set of mutation and
crossover operators. The main() function is shown in
**Listing 2b**.

Compile and run the program. Two XML files will be produced:

and
__beagle.log__

. Exploring these files is
highly recommended, as it provides many insights into the
architecture and the inner workings of the framework.__beagle.obm.gz__

Everything in OpenBeagle can be customized and extended. With
almost no effort, I have added additional primitives to the
system and extended the

component
to write the X and Y arrays to the log. Then I used this
information to visually explore the best individual of a run, as
depicted in __InitialData__**Figures 1a** and
**1b**.

All the source code used in this example, along with instructions, can be downloaded from http://xrds.acm.org/code.cfm. Again, there's much more to OpenBeagle than presented here, so I encourage you to investigate!

**Wikipedia**

Evolutionary computing entry Genetic programming entry

**ACM Special Interest Group on Genetic and Evolutionary
Computation**

www.sigevo.org

**ACM SIG EVO Newsletter**

www.sigevolution.org

**A Field Guide to Genetic Programming**

http://dces.essex.ac.uk/staff/rpoli/gp-field-guide

**Professor John Koza:** a pioneer of modern GP

www.genetic-programming.com/johnkoza.html

**Essentials of Metaheuristics:** free book

http://cs.gmu.edu/~sean/book/metaheuristics

**Genetic-Programming.org**

www.genetic-programming.org

**PyEvolve**

http://pyevolve.sourceforge.net

**ECJ (Java)**

http://cs.gmu.edu/~eclab/projects/ecj/

**Gaul**

http://gaul.sourceforge.net

DOI: http://doi.acm.org/10.1145/1836543.1836558

Figure 1. The best individual is shown, including: (a)
its complete genotype and (b) how well it approximates the
initial data.

Listing 1. (a) First, we define the
primitive set in OpenBeagle. (b) Next, we define a component to
hold the initial data.

Listing 2. Sample code is provided for (a)
the evaluation operator and (b) the main program in
OpenBeagle.

**©2010
ACM 1528-4972/10/0900 $10.00**

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2010 ACM, Inc.

To comment you must create or log in with your ACM account.

### Pointers

Pick up the code for this article here:

http://xrds.acm.org/helloworld/2010/08/genetic-programming.cfm

## Comments

There are no comments at this time.