## Bacterial computing

Full text also available in the ACM Digital Library as PDF | HTML | Digital Edition

A multidisciplinary group of students from Missouri Western State University and Davidson College (under our mentorship) recently designed and constructed a proof-of-concept experiment to solve a Hamiltonian Path Problem inside live bacteria. The students worked within a new field called synthetic biology, which seeks to apply engineering principles (including standardization, modularity, and abstraction), mathematical modeling, and molecular biology techniques to build biological machines to perform functions with applications in medicine, the environment, energy, and technology.

A Hamiltonian path is a sequence of directed edges, in a directed graph, that start at one node and end at another node while visiting all nodes exactly once. A directed graph is a set of nodes with directed edges between some pairs of nodes. A Hamiltonian Path Problem (HPP) asks, for a given directed graph and specified beginning and ending nodes, does there exist a Hamiltonian path in the graph? For example, the graph in Figure 1 has a unique Hamiltonian path beginning at node 1 and ending at node 5.

In 1994, Leonard Adleman (who put the A in RSA cryptography) published a seminal paper on DNA computing. The article [1] described Adleman's in vitro experiment in which the edges of the graph of Figure 1 were represented by strands of DNA with the property that each strand could only attach itself to certain other strands.

"A single E. coli bacterium will replicate itself approximately every 30 minutes. Growing a bacterium overnight results in approximately a billion identical, independent biological-processors."

DNA is composed of four nucleotides, represented by A, T, G, and C. In double-stranded DNA, an A on one strand always pairs with a T on the other strand, and similarly, a G on one strand always pairs with a C on the other. These A-T and G-C interactions are called Watson-Crick base pairing.

Adleman constructed single-stranded segments of DNA so that as these segments of DNA "found each other" in the test tube, compatible segments assembled by base pairing to form longer and longer pieces of double-stranded DNA. At the end of the binding step, Adleman checked segments of the correct length for representation of each node and found some segments that corresponded to the unique Hamiltonian path of the directed graph in Figure 1. Adelman's breakthrough experiment demonstrated the application of biological molecules to solve a computational problem.

"Synthetic biology is a natural fit for multidisciplinary research for undergraduate students. The iGEM community provides a supportive environment for teams wishing to engage in research."

In trying to solve the problem in a new way, the students took a different approach. The team, composed of students from two campuses, as well as faculty from the mathematics and biology departments, designed and constructed a proof-of-concept experiment to solve a HPP inside live bacteria.

The team participated in the International Genetically Engineered Machines (iGEM) competition. iGEM began in 2004 with five teams from the U.S. In 2010, there were more than 130 teams from North America, Latin America, Europe, Asia, and Africa that shared information and resources in an effort to advance the field of synthetic biology. The iGEM community specifically targets undergraduate students because of their creativity and enthusiasm. Each year iGEM students design, build, and test synthetic biology projects with varied applications and present them at the annual iGEM Jamboree held at Massachusetts Institute of Technology.

Hurdles to Creating a Bacterial Computer

Designing and constructing a bacterial computer to solve any math problem presents several distinct challenges. First, how will the components of the problem be encoded into the DNA of a bacterium? Second, how will the information be manipulated (a necessary component for computation)? Finally, how will the results of the computation be "displayed"?

The students addressed each of these challenges to solve the HPP. The used in this design used living E. coli to find the solution to the HPP, and is shown in Figure 2.

Our first design consideration was to determine how the problem could be encoded into DNA. DNA is endowed with a natural directionality, known as 5' to 3', which we exploited to represent the directed edges of our graph. DNA components must be present in the correct 5' to 3' orientation and spatial order to make a functional gene capable of producing a protein. A gene functions when there is a promoter, a ribosome binding site (RBS), and a coding sequence for a particular gene. A transcriptional terminator may be included at the 3' end to prevent the gene expression machinery from continuing farther down the DNA.

The critical step for coding an HPP was splitting the genes that produce red fluorescent protein (RFP) and green fluorescent protein (GFP) into two pieces, and inserting a short segment of DNA between the two pieces. Fluorescent protein genes are standard reporters in the synthetic biology community because they cause cells to glow, allowing researchers to monitor the results of an experiment in real time, without isolating the DNA. These reporters and others are made available to the iGEM community through the Registry of Standard Biological Parts [3].

Finding appropriate locations to split these genes was a significant bioinformatics challenge. The insertion location had to be chosen so that the split gene was still functional, but a portion of RFP combined with a portion of GFP would result in a non-functional protein. To encode the HPP in a bacterial cell, we split the RFP into two pieces, RFP1 and RFP2, and the GFP into two pieces, GFP1 and GFP2. The three edges of our graph were built by taking the second portion (the 3' piece) of the source node followed by the first half of the destination node. More specifically, edge A consisted of RFP2 and GFP1, edge B consisted of GFP2 and the transcriptional terminator, and edge C consisted of RFP2 and the transcriptional terminator. Our control construct ABC and one of our test constructs, BAC, are shown in Figure 3.

"Many other iGEM projects have components of computation, simulation, and mathematical modeling. The results have application in medicine, the environment, and biofuels, as well as other benefits from data analysis and model building, to the testing of experimental designs."

Our second design challenge was to determine how the DNA information would be manipulated within the bacterial cells. In nature, Salmonella typhimurium has the ability to invert a segment of its DNA to switch from one gene to another so its extracellular surface is covered by a different protein. The cell's mechanism, called the Hin-hix inverter, was reconstituted by a previous iGEM team from Missouri Western State University and Davidson College for use in E. coli to solve the Burnt Pancake Problem [2]. Each segment that we want to flip is flanked by a 26-basepair hix site on each side. In the presence of Hin proteins, the DNA is temporarily cut at a pair of hix sites, inverted, and reattached. To simplify our notation we used a prime (') to indicate that a segment of DNA was flipped from its original 5' to 3' orientation. In Figure 3, the yellow triangles represent the hix sites. Figure 4 shows one series of successive inversions that converts the test construct in Figure 3b to the solution construct shown in Figure 3a.

"Undergraduate students worked within a new field called synthetic biology, which seeks to apply engineering principles, mathematical modeling, and molecular biology techniques to build biological machines to perform functions with applications in medicine, the environment, energy, and technology."

Once the problem was encoded in DNA we benefited from the greatest advantage of bacterial computing. A single E. coli bacterium will replicate itself approximately every 30 minutes. Growing a bacterium overnight results in 2^30 (approximately a billion), identical, independent biological-processors. When exposed to the Hin protein each E. coli cell flips some segment of its DNA flanked by the hix sites. Even if this process is random (which is still an open question) the probability is essentially 1 that at least one in a billion biological processors will find the solution.

How the Bacteria Responds

The final design challenge was how to know which cells have solved the HPP.

A bacterium with Edge A in the first position in 5' to 3' orientation will express red fluorescent protein because the two halves of the RFP gene have been successfully united and thus appear red under ultraviolet light. If in addition Edge B occurs in the second position, the two pieces of the GFP will be reunited and the bacterium will additionally glow green. The presence of both red and green fluorescent protein indicates a solution to our HPP, a combination that appears yellow under ultraviolet light.

We created several beginning constructs for our HPP problem. One of our constructs, corresponding to the schematic in Figure 3b, began with no fluorescence. But individual clones gained fluorescence after Hin-mediated flipping of the DNA.

Figure 5 shows photographs of plates from the lab. Figure 5a shows uncolored bacteria with no Hin introduced, where as Figure 5b shows a similar plate that began with all bacteria uncolored, but changed to colored after the introduction of Hin. The bacteria that glowed red are those that managed to flip their DNA at the hix sites ending up with Edge A in the first position and in 5' to 3' orientation. Those that glowed yellow flipped from their original positions of Edge B first and Edge A second, to a final orientation of Edge A in the first position and Edge B in the second position, both pointing in 5' to 3' direction.

Synthetic Biology's Potential

Synthetic biology is a natural fit for multidisciplinary research for undergraduate students. The iGEM community provides a supportive environment for teams wishing to engage in research.

The applications of synthetic biology are widely varied, and projects can be tailored to individual interests. The projects carried out by our team have a deliberate focus on mathematics by applying biology to solve math problems, but many other iGEM projects have components of computation, simulation, and mathematical modeling. The results have application in medicine, the environment, and biofuels, as well as other benefits from data analysis and model building, to the testing of experimental designs. Readers can check out more at iGEM's web site: http://2010.igem.org.

References

1. Adleman, L. 1994. M. Molecular computation of solutions to combinatorial problems. Science, 266. 1021–1024.

2. Haynes K. A., Broderick M. L., Brown A. D., et al. 2008. Engineering bacteria to solve the burnt pancake problem. Journal Biol En., 2, No. 8.

3. Registry of Standard Biological Parts: http://partsregistry.org/Main_Page.

Authors

Jeff Poet is an associate professor of mathematics at Missouri Western State University and 2010 winner of the Mathematical Association of America Missouri Section Distinguished University Teaching of Mathematics Award.

Malcolm Campbell is professor of biology at Davidson College in Davidson, North Carolina, co-founder of the Genome Consortium for Active Teaching (GCAT), and co-author with Laurie Heyer of the premier undergraduate textbook on genomics, now in its second edition.

Laurie Heyer, an associate professor of mathematics at Davidson College, lectures widely on bioinformatics, and in 2009 was selected as the Matthews Award recipient in recognition of her unique service to the College.

Todd Eckdahl is professor of biology at Missouri Western State University and has won the James V. Mehl Faculty Scholarship Award, a Board of Governors Distinguished Professor Award, and a State of Missouri Governor's Award for Excellence in Teaching.

Footnotes

Together, the four authors have collaborated since 2006 to co-mentor undergraduate teams of synthetic biology researchers. The students have won awards at the annual iGEM competition for their NSF-funded research in bacterial computing.

DOI: http://doi.acm.org/10.1145/1836543.1836550

Figures

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

There are no comments at this time.

### Pointers

ACM Special Interest Group on Genetic and Evolutionary Computation

Operates an annual Genetic and Evolutionary Computation Conference (GECCO)
http://www.sigevo.org/

ACM Special Interest Group on Programming Languages

Explores programming language concepts and tools focusing on design, implementation, and efficient use
http://www.sigplan.org/

Journal of Biological Engineering

Online journal that publishes articles on all aspects of biological engineering
http://www. jbiolengorg/

### Jargon

ATP

Adenosine triphosphate: a chemical that exists within living cells to facilitate energy transfer. The human body contains on average 250 grams of ATP

GA

Genetic Algorithm: a subset of evolutionary algorithms, structured in a manner directly analogous to biological evolution, with iterated rounds of breeding, mutation, and selection

GFP

Green Fluorescent Protein: a gene used as an actuator in synthetic biology, analogous to an LED in a digital circuit

HPP

Hamiltonian Path Problem: given a directed graph and distinguished start and end nodes, does there exist a path from start to end that visits all nodes exactly once?

iGEM

International Genetically Engineered Machines competition: a synthetic biology competition for undergraduate students around the world