Working with Different Genetic Algorithm Representations in Python

4 years ago   •   15 min read

By Ahmed Fawzy Gad

Depending on the nature of the problem being optimized, the genetic algorithm (GA) supports two different gene representations: binary and decimal. In the binary GA, each gene takes only one of two values, 0 or 1. This makes it easier to manage than the decimal GA, because its gene values are limited. The decimal GA can use different formats like float or integer, and limited or unlimited ranges.

This tutorial discusses how the PyGAD library supports the two GA representations, binary and decimal. The outline of this tutorial is as follows:

  • Getting started with PyGAD
  • Controlling the gene's range in the initial population
  • Gene type (int or float)
  • Avoiding exceeding the initial range
  • Continuous and discrete gene ranges
  • Custom values for each gene
  • Customizing some genes while randomizing others
  • The binary genetic algorithm
  • User-defined initial population

You can also run the code for this tutorial for free on Gradient.

Let's get started.


Getting Started with PyGAD

PyGAD is a Python library for implementing the genetic algorithm. To install it and get started, check out the tutorial 5 Genetic Algorithm Applications Using PyGAD. As the name implies, that tutorial shows how to develop five different applications using the library. You can run the code for free on Gradient.

In Building a Game-Playing Agent for CoinTex Using the Genetic Algorithm, PyGAD was used to build an agent that plays a game called CoinTex.

PyGAD is also documented at Read the Docs.

Before starting this tutorial, make sure you have at least PyGAD version 2.6.0 installed. Quote the requirement so the shell does not interpret >= as output redirection.

pip install "pygad>=2.6.0"

The next section discusses how to use PyGAD to customize the range of values for the genes.

Control the Gene's Range in the Initial Population

As we've discussed, the GA has two representations for its genes:

  1. Binary
  2. Decimal

For the binary GA, each gene has only two values: 0 or 1. On the other hand, the decimal representation can use any decimal value for the gene. This section discusses how to limit the range of gene values.

In some problems it may be useful for a user to limit the range of valid gene values. For example, say that each gene value must be between 5 and 15 without any exceptions. How can we do that?

PyGAD supports two parameters to handle this scenario: init_range_low and init_range_high. The parameter init_range_low specifies the lower limit of the range from which the initial population is created, while init_range_high specifies the upper limit. Note that init_range_high is exclusive. So, if init_range_low=5 and init_range_high=15, then the possible gene values are from 5 up to, but not including, 15. This example is shown in the code snippet below.

Aside from these arguments, we must also specify the fitness function, fitness_func, and the number of generations, num_generations.

Although they're optional, the next three parameters must be specified when the initial population is created randomly:

  • num_parents_mating: Number of solutions selected as parents for mating. Set to 2.
  • sol_per_pop: Number of solutions in the population. Set to 3.
  • num_genes: Number of genes in each solution. Set to 4.

import pygad

def fitness_function(solution, solution_idx):
    return sum(solution)

ga_instance = pygad.GA(num_generations=1,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=5,
                       init_range_high=15)

Once the instance is created, the random initial population is stored in the initial_population attribute. The lines below print it along with its shape. Because the population has 3 solutions, where each solution has 4 genes, the shape of the population is (3, 4). Note that each gene has a value from 5 (inclusive) up to 15 (exclusive). By default, the type of the genes is float.

print(ga_instance.initial_population)
print(ga_instance.initial_population.shape)
[[14.02138539 10.13561641 13.77733116 5]
 [13.28398269 14.13789428 12.6097329   7.51336248]
 [ 9.42208693  6.97035939 14.54414418  6.54276097]]

(3, 4)
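To see what the half-open range means without depending on PyGAD's internals, here is a minimal standard-library sketch. The helper make_initial_population is hypothetical and not part of PyGAD; it draws every gene uniformly from [init_range_low, init_range_high), matching the behavior described above, but it is not PyGAD's actual code.

```python
import random


def make_initial_population(sol_per_pop, num_genes, low, high, rng=random):
    """Hypothetical helper (not part of PyGAD): a sol_per_pop x num_genes
    nested list of floats, each drawn uniformly from [low, high)."""
    # rng.random() is in [0.0, 1.0), so low + (high - low) * r can equal low
    # but stays below high: the lower bound is inclusive, the upper exclusive.
    return [[low + (high - low) * rng.random() for _ in range(num_genes)]
            for _ in range(sol_per_pop)]


rng = random.Random(0)  # seeded so the run is reproducible
population = make_initial_population(3, 4, 5, 15, rng)
print(len(population), len(population[0]))  # 3 4, the (3, 4) shape
for solution in population:
    print(solution)
```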

The following code sets init_range_low to 1, and init_range_high to 3, to see how the range of genes changes.

ga_instance = pygad.GA(num_generations=1,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=1,
                       init_range_high=3)

print(ga_instance.initial_population)

As given below, all the genes in the randomly created population are between 1 and 3. Note that a gene can have the value 1, but it can never have the value 3.

[[1.00631559 2.91140666 1.30055502 2.10605866]
 [2.23160212 2.32108812 1.90731624 1]
 [2.23293791 1.9496456  1.25106388 2.46866602]]

Note that the init_range_low and init_range_high parameters only limit the range of genes in the initial population. What happens when the solutions evolve over a number of generations? The genes may then exceed the initial range.

To test this, the num_generations parameter is set to 10 and the run() method is called to evolve the solutions through the 10 generations.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=1,
                       init_range_high=3)

ga_instance.run()

After the run() method completes, the next code prints the following:

  • The initial population using the initial_population attribute.
  • The final population using the population attribute.
print(ga_instance.initial_population)
print(ga_instance.population)
[[1.08808272 1.16951518 1.30742402 1.40566555]
 [2.88777068 2.49699173 2.47277427 2.36010308]
 [1.94598736 2.10177613 1.57860387 1.45981019]]

[[3.7134492  1.9735615  3.39366783 2.21956642]
 [3.7134492  2.49699173 2.47277427 2.36010308]
 [2.94450144 1.9735615  3.39366783 2.36010308]]

In the initial population, all the genes are between 1 and 3. In the final population, some genes exceeded the range, like the first and third genes in the first solution. How can we force the genes of every population to stay within the range? This is discussed in the Avoid Exceeding the Initial Range section.

Something else to note is that the type of the gene is floating-point. Some problems may only work with integer values. The next section discusses how to specify the type of genes using the gene_type parameter.

Gene Type (int or float)

By default, PyGAD assigns random floating-point values to the initial population. In case the user wants the values to be integers, the gene_type parameter is available for this purpose. It is supported in PyGAD 2.6.0 and higher.

It supports 2 values:

  1. float: It is the default value. This means the genes are floating-point numbers.
  2. int: The genes are converted from floating-point numbers to integers.

The next code sets the gene_type parameter to int to force the random initial population to have integer genes.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=1,
                       init_range_high=3,
                       
                       gene_type=int)

print(ga_instance.initial_population)

The random initial population is printed below. Note that the gene range is from 1 (inclusive) to 3 (exclusive), so 1 and 2 are the only possible integers. Thus, the population only has the values 1 and 2.

[[1 1 2 1]
 [1 2 2 1]
 [1 2 1 2]]

When the range changes to 5 to 10 (exclusive), the possible gene values are 5, 6, 7, 8, and 9.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=5,
                       init_range_high=10,

                       gene_type=int)

print(ga_instance.initial_population)
[[5 9 7 8]
 [5 7 9 8]
 [5 5 6 7]]
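One way to get integer genes that is consistent with both outputs above is to draw a float from [init_range_low, init_range_high) and truncate it. The sketch below assumes truncation with Python's int(), which for positive ranges never reaches init_range_high; the helper name sample_int_gene is hypothetical, and PyGAD's exact conversion may differ.

```python
import random


def sample_int_gene(low, high, rng):
    """Hypothetical helper: draw a float from [low, high), then truncate with int().

    Truncation is an assumption here; for positive ranges it acts like floor,
    so high itself can never be produced.
    """
    return int(low + (high - low) * rng.random())


rng = random.Random(1)
values_1_3 = {sample_int_gene(1, 3, rng) for _ in range(1000)}
values_5_10 = {sample_int_gene(5, 10, rng) for _ in range(1000)}
print(sorted(values_1_3))   # only 1 and 2
print(sorted(values_5_10))  # only 5 through 9
```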

Note that setting the gene_type parameter to either int or float does not prevent the genes from exceeding the range specified using the init_range_low and init_range_high parameters. This is discussed in the next section.

Avoid Exceeding the Initial Range

The genes of the randomly created initial population fall within the range specified by the 2 parameters init_range_low and init_range_high. But this does not guarantee that the genes always stay within this range, because the mutation operation changes the gene values.

By default, PyGAD uses the random mutation type, which adds a random value to some of the genes of each offspring. These random changes can push gene values outside the initial range. Depending on the problem being solved, exceeding the range may or may not be an issue.
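The drift can be reproduced with a small simulation. The sketch below is a simplified, hypothetical model of additive random mutation. It repeatedly adds a value drawn from -1.0 to 1.0, the default values of PyGAD's random_mutation_min_val and random_mutation_max_val, to a single gene that starts in the middle of the range [1, 3). It ignores selection and crossover, so it only illustrates why additive changes are not bounded by the initial range.

```python
import random


def additive_random_mutation(gene, rng, min_val=-1.0, max_val=1.0):
    """Simplified model: add a random value from [min_val, max_val] to the gene."""
    return gene + rng.uniform(min_val, max_val)


rng = random.Random(42)  # seeded so the run is reproducible
low, high = 1.0, 3.0     # the initial range [low, high)
gene = 2.0               # start in the middle of the range
history = [gene]
for generation in range(1000):
    gene = additive_random_mutation(gene, rng)
    history.append(gene)

# Nothing in the update pulls the gene back, so it eventually leaves [low, high).
left_range = any(not (low <= g < high) for g in history)
print("left the initial range:", left_range)
print("min:", min(history), "max:", max(history))
```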

If the problem must have its genes within a range, then there are different options to force all genes in all generations to be within the range. These options are summarized as follows:

  1. Do not use random mutation.
  2. Disable the mutation operation.
  3. Use the mutation_by_replacement parameter. This is the most practical option.

Let's discuss each of these options.

Do Not Use random Mutation

The type of mutation operation is specified using the mutation_type parameter. The supported types are:

  1. Random: mutation_type="random"
  2. Swap: mutation_type="swap"
  3. Inversion: mutation_type="inversion"
  4. Scramble: mutation_type="scramble"

Out of those 4 types, only the random mutation can change gene values so that they fall outside the range. So, one way to keep the genes within the initial range is to use a mutation type other than random.
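The claim that swap, inversion, and scramble never create new values can be checked with textbook versions of the three operators. These functions are illustrative sketches, not PyGAD's implementations; each returns a reordered copy of the solution, so the multiset of gene values is unchanged.

```python
import random


def swap_mutation(solution, rng):
    """Swap two randomly chosen genes (illustrative sketch)."""
    mutated = list(solution)
    i, j = rng.sample(range(len(mutated)), 2)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated


def inversion_mutation(solution, rng):
    """Reverse a randomly chosen, non-empty slice of genes (illustrative sketch)."""
    mutated = list(solution)
    i, j = sorted(rng.sample(range(len(mutated) + 1), 2))
    mutated[i:j] = reversed(mutated[i:j])
    return mutated


def scramble_mutation(solution, rng):
    """Shuffle a randomly chosen, non-empty slice of genes (illustrative sketch)."""
    mutated = list(solution)
    i, j = sorted(rng.sample(range(len(mutated) + 1), 2))
    segment = mutated[i:j]
    rng.shuffle(segment)
    mutated[i:j] = segment
    return mutated


rng = random.Random(7)
solution = [6, 9, 8, 7]
results = {}
for mutate in (swap_mutation, inversion_mutation, scramble_mutation):
    mutated = mutate(solution, rng)
    results[mutate.__name__] = mutated
    # Only the order can change; the set of values stays the same.
    print(mutate.__name__, mutated)
```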

The next code uses the swap mutation. Even after the run() method executes, the gene values are still inside the initial range.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=5,
                       init_range_high=10,
                       
                       mutation_type="swap",

                       gene_type=int)

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[6 9 8 7]
 [5 5 8 8]
 [9 8 5 6]]

[[8 9 6 9]
 [8 9 6 9]
 [8 9 6 9]]

This option might not be feasible in many situations, because the other types only change the order of the existing gene values; they never introduce new values into the genes.

Disable the Mutation Operation

PyGAD can disable the mutation operation by setting the mutation_type parameter to None. Although this keeps the gene values within the initial range, it disables one of the primary operators for evolving the solutions.

The next code disables the mutation operation. After 10 generations, the genes are still within the specified range.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=5,
                       init_range_high=10,
                       
                       mutation_type=None,

                       gene_type=int)

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[7 9 5 9]
 [5 6 6 8]
 [8 5 6 6]]

[[7 9 5 9]
 [7 9 6 6]
 [7 9 5 6]]
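With mutation disabled, new solutions come only from crossover. PyGAD's default crossover_type is "single_point". In a textbook single-point crossover, sketched below and not taken from PyGAD's code, every child gene is copied from the same position of one parent. No value outside the parents' values can appear, so the genes stay within the initial range.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Textbook single-point crossover (illustrative, not PyGAD's code).

    Genes before the cut point come from parent_a, the rest from parent_b.
    """
    point = rng.randint(1, len(parent_a) - 1)  # cut strictly inside the chromosome
    return parent_a[:point] + parent_b[point:]


rng = random.Random(3)
parent_a = [7, 9, 5, 9]
parent_b = [5, 6, 6, 8]
child = single_point_crossover(parent_a, parent_b, rng)
# Each gene is copied from the same position of one parent, so no new values appear.
print(child)
```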

Use the mutation_by_replacement Parameter

The previous 2 options keep the genes in the initial range by sacrificing either the random mutation or the mutation operation altogether. The most practical option, which still uses the random mutation while keeping the genes within the specified range, is the boolean parameter mutation_by_replacement.

Normally, random mutation generates a random value. This value is then added to the current gene value. Assume there is a gene with value 2.5 and the specified range is 1 to 3, exclusive. If the random value is 0.7, then adding it to the current gene value results in 2.5+0.7=3.2 which is outside the range.

When the mutation_by_replacement parameter is True, the random value replaces the gene value instead of being added to it. So, when the random value is 0.7, the new gene value will be 0.7. If the gene_type is set to int, the result will be 1.0.

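The user can control the range from which the random value is generated using the 2 parameters random_mutation_min_val and random_mutation_max_val, which specify the lower and upper limits, respectively. Their defaults are -1.0 and 1.0, which is why a random value such as 0.7 can fall outside a range like 1 to 3.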

To keep the genes within the range, each of these parameters must satisfy the following condition:

init_range_low <= param <= init_range_high

In practice, set random_mutation_min_val=init_range_low and random_mutation_max_val=init_range_high.
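The difference between adding and replacing can be shown with a tiny sketch. mutate_gene is a hypothetical helper. It first reproduces the worked example from the text (gene 2.5, random value 0.7). It then repeats replacement mutation with random values drawn from the same [1, 3) range used for initialization. It models the idea only, not PyGAD's internal mutation code.

```python
import random


def mutate_gene(gene, random_value, by_replacement):
    """Hypothetical helper: additive mutation, or replacement when by_replacement is True."""
    return random_value if by_replacement else gene + random_value


# The worked example from the text: gene 2.5, random value 0.7, range [1, 3).
print(mutate_gene(2.5, 0.7, by_replacement=False))  # about 3.2, outside the range
print(mutate_gene(2.5, 0.7, by_replacement=True))   # 0.7

# With the random range equal to the init range, replacement stays in [1, 3).
rng = random.Random(5)
low, high = 1.0, 3.0
gene = 2.5
for _ in range(1000):
    random_value = low + (high - low) * rng.random()  # in [low, high)
    gene = mutate_gene(gene, random_value, by_replacement=True)
    assert low <= gene < high
print(gene)
```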

The next code gives an example of using the 3 parameters discussed in this subsection (mutation_by_replacement, random_mutation_min_val, and random_mutation_max_val).

ga_instance = pygad.GA(num_generations=1000,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=5,
                       init_range_high=10,

                       random_mutation_min_val=5,
                       random_mutation_max_val=10,

                       mutation_by_replacement=True,

                       gene_type=int)

ga_instance.run()

For any number of generations, the genes will not exceed the range. The next code prints the initial and final populations; even after 1,000 generations, no gene in the final population is outside the range.

print(ga_instance.initial_population)
print(ga_instance.population)
[[5 8 8 5]
 [9 8 8 9]
 [5 9 8 9]]

[[9 9 9 9]
 [9 9 9 9]
 [9 9 8 9]]

By combining mutation_by_replacement with suitable ranges and gene_type=int, it is possible to make the GA work only with binary genes (i.e. genes with values 0 and 1). This is done as follows:

  • Set init_range_low=random_mutation_min_val=0.
  • Set init_range_high=random_mutation_max_val=2.
  • Set mutation_by_replacement=True.
  • Set gene_type=int.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=0,
                       init_range_high=2,

                       random_mutation_min_val=0,
                       random_mutation_max_val=2,

                       mutation_by_replacement=True,

                       gene_type=int)

ga_instance.run()

Here are the initial and final populations where all genes are either 0 or 1.

print(ga_instance.initial_population)
print(ga_instance.population)
[[1 1 0 1]
 [0 1 0 0]
 [0 1 1 1]]

[[0 1 0 1]
 [0 1 0 0]
 [0 1 1 0]]

Note that this is not the only way to support the binary GA. The gene_space parameter can also support it. This parameter is introduced in the next section.

Continuous and Discrete Gene Ranges

The previous discussion assumes that the range from which the genes are sampled is continuous. So, if the range is from 1 to 5, then any value from 1 up to, but not including, 5 is acceptable (or 1, 2, 3, and 4 for integer genes). What if some values within a range are not permitted, or the values do not follow a continuous range (e.g. -2, 18, 43, and 78)? For that purpose, PyGAD supports a parameter named gene_space to specify the space of gene values.

The gene_space parameter allows the user to list all the possible gene values. It accepts a list or tuple containing all the possible gene values.

The next code uses the gene_space parameter to list the possible values for all the genes. As a result, all the genes are sampled from the listed 4 values.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[-2, 18, 43, 78])

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[78 43 78 -2]
 [18 -2 78 78]
 [43 43 18 -2]]

[[-2 -2 18 78]
 [18 -2 78 78]
 [-2 -2 18 78]]
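Conceptually, a global gene_space means every gene is drawn from one shared list. The sketch below shows that idea for the same 4 values; sample_from_global_space is a hypothetical helper built on random.choice, not PyGAD's implementation.

```python
import random


def sample_from_global_space(sol_per_pop, num_genes, gene_space, rng):
    """Hypothetical helper: every gene of every solution is drawn from the same list."""
    return [[rng.choice(gene_space) for _ in range(num_genes)]
            for _ in range(sol_per_pop)]


rng = random.Random(11)
population = sample_from_global_space(3, 4, [-2, 18, 43, 78], rng)
for solution in population:
    print(solution)
```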

Note that all the genes are sampled from the same values. In other words, the values in the gene_space parameter are global to all genes. What if each gene has distinct values? The next section discusses how to specify custom values for each gene.

Custom Values for Each Gene

When the gene_space parameter is assigned a non-nested list/tuple, the values in this list/tuple are used to sample the values of all genes. However, some genes may have their own distinct values. The gene_space parameter can also accept the values of each gene separately. This is done by creating a nested list/tuple where each item holds the possible values for its corresponding gene.

Assume there are 4 genes and each gene has its own value space. A list of the possible values for each gene is prepared as given below. Note that no gene's values follow a sequence, and the genes may have different numbers of values.

  1. Gene 1: [-4, 2]
  2. Gene 2: [0, 5, 7, 22, 84]
  3. Gene 3: [-8, -3, 0, 4]
  4. Gene 4: [1, 6, 16, 18]

All of the 4 lists are added as items in the gene_space parameter as given below.

gene_space = [[-4, 2], 
              [0, 5, 7, 22, 84],
              [-8, -3, 0, 4],
              [1, 6, 16, 18] ]
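With a nested gene_space, gene i is drawn only from the i-th inner list. The following sketch of that per-gene sampling is illustrative; sample_solution is a hypothetical name, not a PyGAD function.

```python
import random

gene_space = [[-4, 2],
              [0, 5, 7, 22, 84],
              [-8, -3, 0, 4],
              [1, 6, 16, 18]]


def sample_solution(gene_space, rng):
    """Hypothetical helper: draw gene i from gene_space[i] only."""
    return [rng.choice(space) for space in gene_space]


rng = random.Random(2)
population = [sample_solution(gene_space, rng) for _ in range(3)]
for solution in population:
    print(solution)
```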

The next code creates an instance of the pygad.GA class that uses the gene_space parameter. The printed initial and final populations show how each gene is sampled from its own space. For example, the first gene of every solution is either -4 or 2.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[[-4, 2], 
                                   [0, 5, 7, 22, 84], 
                                   [-8, -3, 0, 4], 
                                   [1, 6, 16, 18] ])

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[-4. 84.  0. 18.]
 [ 2.  7.  4.  1.]
 [ 2.  0. -8.  6.]]

[[-4.  7.  4.  1.]
 [ 2.  7.  4.  1.]
 [-4.  7.  4.  6.]]

The values of the previous 4 genes did not follow a sequence. However, the values of some genes may follow a sequence. The values of the 4 genes are listed below. The values of the first gene run from 0 to 5 (exclusive) and the values of the second gene run from 16 to 27 (exclusive). The values of the third and fourth genes are the same as before.

  1. Gene 1: [0, 1, 2, 3, 4]
  2. Gene 2: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
  3. Gene 3: [-8, -3, 0, 4]
  4. Gene 4: [1, 6, 16, 18]

The new value of the gene_space parameter is given below.

gene_space = [ [0, 1, 2, 3, 4],
               [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
               [-8, -3, 0, 4],
               [1, 6, 16, 18] ]

What if a gene has a sequence of, for example, 1,000 values? Do we have to list its individual elements? Fortunately, PyGAD allows the space of a single gene to be specified using the range() function. If the value space of the first gene starts from 0 up to but not including 5, then it can be modeled using range(0, 5), or simply range(5). For the second gene, which starts from 16 up to but not including 27, the value space is represented using range(16, 27).

After using the range() function, the new value of the gene_space parameter is given below.

gene_space = [ range(5), range(16, 27), [-8, -3, 0, 4], [1, 6, 16, 18] ]
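Python's range() is a lazy, sequence-like object, so it can stand in for a long list without storing every element. The short sketch below checks the endpoints of range(16, 27) and shows that random.choice accepts a range directly. It demonstrates Python behavior only and makes no claim about how PyGAD samples internally.

```python
import random

gene_space = [range(5), range(16, 27), [-8, -3, 0, 4], [1, 6, 16, 18]]

# range(16, 27) stops before 27, so it holds the 11 values 16..26.
print(list(gene_space[1]))
print(len(gene_space[1]))        # 11
print(range(0, 5) == range(5))   # True: both describe 0, 1, 2, 3, 4

# random.choice accepts any sequence, including a range, so the same
# per-gene sampling works for lists and ranges alike.
rng = random.Random(4)
solution = [rng.choice(space) for space in gene_space]
print(solution)
```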

Here is the code that uses the updated gene_space.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[range(5), 
                                   range(16, 27), 
                                   [-8, -3, 0, 4], 
                                   [1, 6, 16, 18] ])

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[ 0. 19. -8. 18.]
 [ 2. 26.  4.  6.]
 [ 3. 18. -3.  1.]]

[[ 3. 25.  0.  6.]
 [ 0. 26.  4.  18.]
 [ 3. 22.  0.  6.]]

It is possible to fix a gene to a single value. This is done by assigning its item in the gene_space parameter to that single value. Here is an example in which the first gene is set to 4 and the third gene to 5. These 2 genes never change their values.

gene_space = [4, 
              range(16, 27), 
              5, 
              [1, 6, 16, 18] ]

Here is the code that uses the last gene_space value. In both the initial and final populations, the first and third genes never change.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[4, 
                                   range(16, 27), 
                                   5, 
                                   [1, 6, 16, 18] ])

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[ 4. 21.  5. 16.]
 [ 4. 18.  5. 16.]
 [ 4. 24.  5. 16.]]

[[ 4. 18.  5.  1.]
 [ 4. 18.  5. 16.]
 [ 4. 18.  5. 18.]]

Customize Some Genes while Randomizing Others

According to the previous discussion of the gene_space parameter, each gene has its own gene space specified either by hardcoding the individual values or using the range() function.

In some cases, the user might need to restrict some genes to specific values while randomizing the others. For example, the chromosome may have a gene that must be either -1 or 1, while the other genes can have any random value. How can we do that?

To randomize a gene, assign None to its item in the gene_space parameter. The next line assigns the list [-1, 1] to the first gene and None to the remaining 3 genes, so the last 3 genes will have random values.

gene_space = [[-1, 1], None, None, None]

The next code uses the last gene_space value. Note how the first gene is sampled from the list [-1, 1] while the other genes have random values.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[[-1, 1], None, None, None])
ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[ 1.          0.28682682  1.39230915  1.12768838]
 [-1.         -1.05781089  1.71296713  2.56994039]
 [ 1.          3.78611876 -3.80634854  2.15975074]]

[[-1.         -1.05781089  1.88097581  2.56994039]
 [-1.         -1.05781089  1.71296713  2.56994039]
 [-1.         -1.05781089  1.3061504   2.56994039]]

Note that the random genes are initialized from values within the range specified by the 2 parameters init_range_low and init_range_high. If the mutation type is random, then the random value added to the gene is sampled from the range specified by the 2 parameters random_mutation_min_val and random_mutation_max_val. Moreover, the type of the random value is determined by the gene_type parameter. Finally, if mutation_by_replacement is set to True, then the random value replaces the gene instead of being added to it. Note that these parameters only affect the genes whose space is set to None.
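The different kinds of gene_space items can be summarized in one hypothetical sampling helper. The sketch makes three assumptions. A None item is drawn uniformly from [init_range_low, init_range_high), here -4 to 4, PyGAD's default initial range. A single number fixes the gene. A list, tuple, or range is sampled with random.choice. It mirrors the behavior described in this tutorial but is not PyGAD's code.

```python
import random


def sample_gene(space, init_range_low, init_range_high, rng):
    """Hypothetical helper: sample one gene value from its gene_space item.

    - None: random float in [init_range_low, init_range_high)
    - a single int or float: the gene is fixed to that value
    - a list, tuple, or range: one of its values, chosen at random
    """
    if space is None:
        return init_range_low + (init_range_high - init_range_low) * rng.random()
    if isinstance(space, (int, float)):
        return space
    return rng.choice(space)


rng = random.Random(8)
# First gene restricted to -1 or 1, the other three randomized in [-4, 4).
gene_space = [[-1, 1], None, None, None]
population = [[sample_gene(s, -4, 4, rng) for s in gene_space] for _ in range(3)]
for solution in population:
    print(solution)

# Single numbers fix the first and third genes to 4 and 5.
fixed_space = [4, range(16, 27), 5, [1, 6, 16, 18]]
fixed_solution = [sample_gene(s, -4, 4, rng) for s in fixed_space]
print(fixed_solution)
```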

The next code forces the initial values of the random genes to be from 10 (inclusive) to 20 (exclusive). The random mutation range is from 30 (inclusive) to 40 (exclusive). The gene_type is set to int.

ga_instance = pygad.GA(num_generations=1000,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       init_range_low=10,
                       init_range_high=20,

                       random_mutation_min_val=30,
                       random_mutation_max_val=40,
                       
                       gene_space=[[-1, 1], None, None, None],

                       gene_type=int)

ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[ 1. 16. 14. 10.]
 [-1. 12. 16. 14.]
 [-1. 17. 19. 13.]]

[[-1. 12. 16. 48.]
 [-1. 15. 26. 14.]
 [-1. 12. 16. 14.]]

Binary Genetic Algorithm

In the Use the mutation_by_replacement Parameter section, we supported the binary genetic algorithm in PyGAD by setting the following parameters:

  • init_range_low=random_mutation_min_val=0.
  • init_range_high=random_mutation_max_val=2.
  • mutation_by_replacement=True.
  • gene_type=int.

It is also possible to support the binary GA using the gene_space parameter. This is done by setting this parameter to the global space [0, 1], which means every gene is either 0 or 1.

The next code sets the gene_space parameter to [0, 1].

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,

                       gene_space=[0, 1])
ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[1 1 1 0]
 [0 1 0 0]
 [1 1 1 0]]

[[0 1 1 0]
 [0 1 0 1]
 [0 1 0 0]]

User-Defined Initial Population

Sometimes the user might want to start with a custom initial population without any randomization. PyGAD supports a parameter named initial_population that allows the user to specify a custom initial population.

The next code assigns a nested list to the initial_population parameter, which holds 3 solutions with 4 genes each. In this case, the num_genes and sol_per_pop parameters are not needed, as they are deduced from the value assigned to the initial_population parameter.

ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       fitness_func=fitness_function,

                       initial_population=[[34, 32, 24, -2],
                                           [3, 7, 2, 7],
                                           [-2, -4, -6, 1]])
ga_instance.run()

print(ga_instance.initial_population)
print(ga_instance.population)
[[34 32 24 -2]
 [ 3  7  2  7]
 [-2 -4 -6  1]]

[[3 7 2 6]
 [3 7 2 7]
 [3 7 2 7]]
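Deducing sol_per_pop and num_genes from a user-defined population amounts to reading the shape of a rectangular nested list. population_shape below is a hypothetical helper, not a PyGAD function; it also rejects ragged lists, since every solution needs the same number of genes.

```python
def population_shape(initial_population):
    """Hypothetical helper: return (sol_per_pop, num_genes) for a rectangular nested list."""
    sol_per_pop = len(initial_population)
    if sol_per_pop == 0:
        raise ValueError("the initial population is empty")
    num_genes = len(initial_population[0])
    # Every solution must have the same number of genes.
    if any(len(solution) != num_genes for solution in initial_population):
        raise ValueError("every solution must have the same number of genes")
    return sol_per_pop, num_genes


initial_population = [[34, 32, 24, -2],
                      [3, 7, 2, 7],
                      [-2, -4, -6, 1]]
print(population_shape(initial_population))  # (3, 4)
```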

Conclusion

This tutorial used the PyGAD library to work with both the binary and decimal representations of the genetic algorithm. The tutorial discussed the different parameters in PyGAD to allow the user to control how the initial population is created in addition to controlling the mutation operation.

Using the gene_type parameter, the gene values can be either floats or integers. The mutation_by_replacement parameter, together with random_mutation_min_val and random_mutation_max_val, keeps the genes in their initial range. The initial_population parameter accepts a user-defined initial population.

The gene_space parameter helps when the gene values do not follow a sequence. In this case, the discrete gene values are fed as a list. This parameter also accepts a custom value space for each gene. Moreover, it allows some genes to be sampled from a defined space and others to be selected randomly.
