Introduction
Depending on the nature of the problem being optimized, the genetic algorithm (GA) supports two different gene representations: binary and decimal. In the binary GA, each gene has only two possible values: 0 and 1. This makes it easier to manage, because its gene values are restricted compared to the decimal GA, whose genes can use different formats (float or integer) and limited or unlimited ranges.
This tutorial discusses how the PyGAD library supports the two GA representations, binary and decimal. The outline of this tutorial is as follows:
- Getting started with PyGAD
- Controlling the gene's range in the initial population
- Gene type (int or float)
- Avoiding exceeding the initial range
- Continuous and discrete gene ranges
- Custom values for each gene
- Customizing some genes while randomizing others
- The binary genetic algorithm
- User-defined initial population
Prerequisites: Getting Started with PyGAD
PyGAD is a Python library for implementing the genetic algorithm. To install it and get started, check out the tutorial 5 Genetic Algorithm Applications Using PyGAD. As the name implies, we'll show you how to develop five different applications using the library. You can run the code for free on Gradient.
In Building a Game-Playing Agent for CoinTex Using the Genetic Algorithm, PyGAD was used to build an agent that plays a game called CoinTex.
PyGAD is also documented at Read the Docs.
Before starting this tutorial, make sure you have at least PyGAD version 2.6.0 installed. Quote the requirement so that the shell does not treat > as an output redirection:
pip install "pygad>=2.6.0"
The next section discusses how to use PyGAD to customize the range of values for the genes.
Control the Gene's Range in the Initial Population
As we've discussed, the GA has two representations for its genes:
- Binary
- Decimal
For the binary GA, each gene has only two values: 0 or 1. On the other hand, the decimal representation can use any decimal value for the gene. This section discusses how to limit the range of gene values.
In some problems it may be useful for a user to limit the range of valid gene values. For example, say that each gene value must be between 5 and 15 without any exceptions. How can we do that?
PyGAD supports two parameters to handle this scenario: init_range_low and init_range_high. The parameter init_range_low specifies the lower limit of the range from which the initial population is created, while init_range_high specifies the upper limit. Note that init_range_high is exclusive. So, if init_range_low=5 and init_range_high=15, then the possible gene values are from 5 up to, but not including, 15. This example is shown in the code snippet below.
Aside from these arguments, we must also specify the fitness function, fitness_func, and the number of generations, num_generations.
The next three parameters must also be specified when the initial population is created randomly (sol_per_pop and num_genes can be omitted only when a user-defined initial population is supplied, as shown in the last section):
- num_parents_mating: Number of parents to mate. It is set to 2.
- sol_per_pop: Set to 3, which means the population has 3 solutions.
- num_genes: Each solution has 4 genes.
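import pygad
def fitness_function(solution, solution_idx):
    # PyGAD 2.x signature. PyGAD 3.0 and later pass the GA instance as an extra first argument:
    # def fitness_function(ga_instance, solution, solution_idx)
    return sum(solution)
ga_instance = pygad.GA(num_generations=1,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=5,
                       init_range_high=15)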
Once the instance is created, the random initial population is prepared in the initial_population attribute. The lines below print it and its shape. Because the population has 3 solutions, where each solution has 4 genes, the shape of the population is (3, 4). Note that each gene has a value between 5 and 15. By default, the type of the genes is float.
print(ga_instance.initial_population)
print(ga_instance.initial_population.shape)
[[14.02138539 10.13561641 13.77733116 5]
[13.28398269 14.13789428 12.6097329 7.51336248]
[ 9.42208693 6.97035939 14.54414418 6.54276097]]
(3, 4)
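The sampling rule above can be sketched in plain Python without PyGAD. The helper make_initial_population is hypothetical (it is not part of PyGAD's API) and only mirrors the parameter names; it draws each gene uniformly from the half-open interval [init_range_low, init_range_high).

```python
import random

def make_initial_population(sol_per_pop, num_genes,
                            init_range_low, init_range_high, rng=random):
    """Hypothetical helper: a sol_per_pop x num_genes list of float genes."""
    width = init_range_high - init_range_low
    # rng.random() lies in [0.0, 1.0), so each gene starts at init_range_low
    # and (up to floating-point rounding) stays below init_range_high.
    return [[init_range_low + width * rng.random() for _ in range(num_genes)]
            for _ in range(sol_per_pop)]

population = make_initial_population(3, 4, 5, 15, rng=random.Random(0))
print(len(population), len(population[0]))  # 3 4, matching the shape (3, 4)
```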
The following code sets init_range_low to 1 and init_range_high to 3 to see how the range of genes changes.
ga_instance = pygad.GA(num_generations=1,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=1,
                       init_range_high=3)
print(ga_instance.initial_population)
As given below, the randomly created population has all the genes between 1 and 3. Note that it is possible to have a gene with value 1 but it is impossible to have a value 3.
[[1.00631559 2.91140666 1.30055502 2.10605866]
[2.23160212 2.32108812 1.90731624 1]
[2.23293791 1.9496456 1.25106388 2.46866602]]
Note that the init_range_low and init_range_high parameters only limit the range of genes in the initial population. What happens when the solutions evolve over a number of generations? The genes may then exceed the initial range.
To test this, the num_generations parameter is set to 10 and the run() method is called to evolve the solutions through the 10 generations.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=1,
                       init_range_high=3)
ga_instance.run()
After the run() method completes, the next code prints the following:
- The initial population using the initial_population attribute.
- The final population using the population attribute.
print(ga_instance.initial_population)
print(ga_instance.population)
[[1.08808272 1.16951518 1.30742402 1.40566555]
[2.88777068 2.49699173 2.47277427 2.36010308]
[1.94598736 2.10177613 1.57860387 1.45981019]]
[[3.7134492 1.9735615 3.39366783 2.21956642]
[3.7134492 2.49699173 2.47277427 2.36010308]
[2.94450144 1.9735615 3.39366783 2.36010308]]
For the initial population, all the genes are between 1 and 3. For the final population, some genes exceeded the range, such as the first and third genes in the first solution. How can we force the genes of every population to stay within the range? This is discussed in the Avoid Exceeding the Initial Range section.
Something else to note is that the type of the genes is floating-point. Some problems may only work with integer values. The next section discusses how to specify the type of genes using the gene_type parameter.
Gene Type (int or float)
By default, PyGAD assigns random floating-point values to the initial population. If the user wants the values to be integers, the gene_type parameter is available for this purpose. It is supported in PyGAD 2.6.0 and higher.
It supports 2 values:
- float: The default value. This means the genes are floating-point numbers.
- int: The genes are converted from floating-point numbers to integers.
The next code sets the gene_type parameter to int to force the random initial population to have integer genes.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=1,
                       init_range_high=3,
                       gene_type=int)
print(ga_instance.initial_population)
The random initial population is printed below. Note that the gene range is from 1 (inclusive) to 3 (exclusive), so 1 and 2 are the only possible integers. Thus, the population only has the values 1 and 2.
[[1 1 2 1]
[1 2 2 1]
[1 2 1 2]]
When the range changes to 5 to 10 (exclusive), the possible gene values are 5, 6, 7, 8, and 9.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=5,
                       init_range_high=10,
                       gene_type=int)
print(ga_instance.initial_population)
[[5 9 7 8]
[5 7 9 8]
[5 5 6 7]]
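For integer genes, the set of reachable values is easy to enumerate. The sketch below draws integers directly with Python's random.randrange, whose upper bound is exclusive just like init_range_high. It illustrates the half-open range only; it is not PyGAD's internal code, and make_int_population is a hypothetical name.

```python
import random

def make_int_population(sol_per_pop, num_genes, low, high, rng=random):
    """Hypothetical helper: integer genes drawn from {low, ..., high - 1}."""
    return [[rng.randrange(low, high) for _ in range(num_genes)]
            for _ in range(sol_per_pop)]

population = make_int_population(3, 4, 5, 10, rng=random.Random(42))
possible = list(range(5, 10))
print(possible)  # [5, 6, 7, 8, 9]
print(all(gene in possible for solution in population for gene in solution))  # True
```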
Note that setting the gene_type parameter to either int or float does not prevent the genes from exceeding the range specified using the init_range_low and init_range_high parameters. This is discussed in the next section.
Avoid Exceeding the Initial Range
The randomly created initial population has its genes within the range specified by the 2 parameters init_range_low and init_range_high. But this does not guarantee that the genes always stay within this range, because the gene values change due to the mutation operation.
By default, PyGAD applies random mutation, which makes random changes to some randomly selected genes (how many is controlled by the mutation_percent_genes parameter). These changes can push gene values outside the initial range. Depending on the problem being solved, exceeding the range may or may not be an issue.
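The sketch below shows why additive random mutation can leave the initial range. random_mutation_additive is a hypothetical, simplified helper, not PyGAD's implementation: it picks a percentage of the genes and adds to each one a value drawn uniformly between random_mutation_min_val and random_mutation_max_val. Nothing clips the result.

```python
import random

def random_mutation_additive(solution, mutation_percent_genes,
                             random_mutation_min_val, random_mutation_max_val,
                             rng=random):
    """Hypothetical sketch: add a random value to some genes (no clipping)."""
    mutated = list(solution)
    # Number of genes to mutate; always at least one.
    num_to_mutate = max(1, round(len(mutated) * mutation_percent_genes / 100))
    for idx in rng.sample(range(len(mutated)), num_to_mutate):
        mutated[idx] += rng.uniform(random_mutation_min_val, random_mutation_max_val)
    return mutated

# All genes start inside [1, 3); every added value is between 0.5 and 1.0.
child = random_mutation_additive([2.9, 2.9, 2.9, 2.9], 100, 0.5, 1.0,
                                 rng=random.Random(1))
print(all(gene >= 3 for gene in child))  # True: every gene left the initial range
```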
If the problem requires its genes to stay within a range, there are different options to force all genes in all generations to be within the range. These options are summarized as follows:
- Do not use random mutation.
- Disable the mutation operation.
- Use the mutation_by_replacement parameter. This is the most practical option.
Let's discuss each of these options.
Do Not Use random Mutation
The type of mutation operation is specified using the mutation_type parameter. The supported types are:
- Random: mutation_type="random"
- Swap: mutation_type="swap"
- Inversion: mutation_type="inversion"
- Scramble: mutation_type="scramble"
Out of these 4 types, only random mutation may change the gene values outside the range. So, one way to force the genes to stay within the initial range is to use a mutation type other than random.
The next code uses swap mutation. Even after the run() method executes, the gene values are still inside the initial range.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=5,
                       init_range_high=10,
                       mutation_type="swap",
                       gene_type=int)
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[6 9 8 7]
[5 5 8 8]
[9 8 5 6]]
[[8 9 6 9]
[8 9 6 9]
[8 9 6 9]]
This option might not be feasible in many situations, because the other mutation types only change the order of the existing gene values within a solution. They never introduce new values into the genes.
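This is easy to check in code. The three hypothetical helpers below are simplified sketches of swap, inversion, and scramble mutation as they are commonly defined (not copies of PyGAD's implementation). Each one only rearranges the genes of a solution, so the multiset of values never changes and no value can leave the initial range.

```python
import random
from collections import Counter

def swap_mutation(solution, rng=random):
    s = list(solution)
    i, j = rng.sample(range(len(s)), 2)   # two distinct positions
    s[i], s[j] = s[j], s[i]
    return s

def inversion_mutation(solution, rng=random):
    s = list(solution)
    i, j = sorted(rng.sample(range(len(s) + 1), 2))
    s[i:j] = s[i:j][::-1]                 # reverse one slice
    return s

def scramble_mutation(solution, rng=random):
    s = list(solution)
    i, j = sorted(rng.sample(range(len(s) + 1), 2))
    segment = s[i:j]
    rng.shuffle(segment)                  # shuffle one slice
    s[i:j] = segment
    return s

rng = random.Random(0)
parent = [6, 9, 8, 7]
for mutate in (swap_mutation, inversion_mutation, scramble_mutation):
    child = mutate(parent, rng)
    # Same values, possibly in a different order.
    print(mutate.__name__, child, Counter(child) == Counter(parent))
```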
Disable the Mutation Operation
PyGAD can disable the mutation operation by setting the mutation_type parameter to None. Although this keeps the gene values within the initial range, it disables one of the primary operators for evolving the solutions.
The next code disables the mutation operation. After 10 generations, the genes are still within the specified range.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=5,
                       init_range_high=10,
                       mutation_type=None,
                       gene_type=int)
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[7 9 5 9]
[5 6 6 8]
[8 5 6 6]]
[[7 9 5 9]
[7 9 6 6]
[7 9 5 6]]
Use the mutation_by_replacement Parameter
The previous 2 options keep the genes in the initial range by sacrificing either random mutation or mutation altogether. The most practical option, which supports random mutation while still keeping the genes within the specified range, is the boolean parameter mutation_by_replacement.
Normally, random mutation generates a random value, which is then added to the current gene value. Assume there is a gene with value 2.5 and the specified range is 1 to 3, exclusive. If the random value is 0.7, then adding it to the current gene value results in 2.5+0.7=3.2, which is outside the range.
When the mutation_by_replacement parameter is True, the random value replaces the gene value instead of being added to it. So, when the random value is 0.7, the new gene value will be 0.7. If gene_type is set to int, the result will be 1.0. Note that 0.7 is still outside the range 1 to 3, which is why the range of the random value must also be controlled.
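The arithmetic above fits in a few lines. mutate_gene is a hypothetical helper, not PyGAD's API. For integer genes it assumes the new value is rounded to the nearest integer, which is consistent with the 1.0 quoted above.

```python
def mutate_gene(gene, random_value, mutation_by_replacement=False, gene_type=float):
    """Hypothetical sketch of one random-mutation step on a single gene."""
    if mutation_by_replacement:
        new_value = random_value           # the random value replaces the gene
    else:
        new_value = gene + random_value    # the random value is added to the gene
    if gene_type is int:
        new_value = float(round(new_value))  # assumption: round to nearest integer
    return new_value

print(f"{mutate_gene(2.5, 0.7):.1f}")                                      # 3.2
print(mutate_gene(2.5, 0.7, mutation_by_replacement=True))                 # 0.7
print(mutate_gene(2.5, 0.7, mutation_by_replacement=True, gene_type=int))  # 1.0
```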
The user can control the range from which the random value is generated using the 2 parameters random_mutation_min_val and random_mutation_max_val, which specify the lower and upper limits, respectively.
To keep the genes within the range, each of these parameters must satisfy the following condition:
init_range_low <= param <= init_range_high
For the best experience, set random_mutation_min_val=init_range_low and random_mutation_max_val=init_range_high.
The next code gives an example of using the 3 parameters discussed in this subsection (mutation_by_replacement, random_mutation_min_val, and random_mutation_max_val).
ga_instance = pygad.GA(num_generations=1000,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=5,
                       init_range_high=10,
                       random_mutation_min_val=5,
                       random_mutation_max_val=10,
                       mutation_by_replacement=True,
                       gene_type=int)
ga_instance.run()
For any number of generations, the genes will not exceed the range. The next code prints the initial and final populations; note that the genes in the final population stay within the range.
print(ga_instance.initial_population)
print(ga_instance.population)
[[5 8 8 5]
[9 8 8 9]
[5 9 8 9]]
[[9 9 9 9]
[9 9 9 9]
[9 9 8 9]]
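Why replacement keeps the genes in range can be sketched as follows. Both helpers are hypothetical and are not PyGAD code: range_is_safe encodes the condition stated above, and replacement_mutation draws integer replacements with random.randrange, so its upper limit is exclusive. After 1,000 rounds of mutation, every gene is still inside [5, 10).

```python
import random

def range_is_safe(init_range_low, init_range_high,
                  random_mutation_min_val, random_mutation_max_val):
    """The condition from the text: both mutation limits lie in the initial range."""
    return all(init_range_low <= v <= init_range_high
               for v in (random_mutation_min_val, random_mutation_max_val))

def replacement_mutation(solution, min_val, max_val, rng=random):
    """Hypothetical sketch: replace one randomly chosen integer gene."""
    s = list(solution)
    idx = rng.randrange(len(s))
    s[idx] = rng.randrange(min_val, max_val)  # new value in {min_val, ..., max_val - 1}
    return s

print(range_is_safe(5, 10, 5, 10))  # True
rng = random.Random(7)
solution = [5, 8, 8, 5]
for _ in range(1000):
    solution = replacement_mutation(solution, 5, 10, rng)
print(all(5 <= gene < 10 for gene in solution))  # True
```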
Using these 3 parameters (mutation_by_replacement, random_mutation_min_val, and random_mutation_max_val) together with gene_type, it is possible to make the GA work only with binary genes (i.e. genes with values 0 and 1). This is done as follows:
- Set init_range_low=random_mutation_min_val=0.
- Set init_range_high=random_mutation_max_val=2.
- Set mutation_by_replacement=True.
- Set gene_type=int.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=0,
                       init_range_high=2,
                       random_mutation_min_val=0,
                       random_mutation_max_val=2,
                       mutation_by_replacement=True,
                       gene_type=int)
ga_instance.run()
Here are the initial and final populations where all genes are either 0 or 1.
print(ga_instance.initial_population)
print(ga_instance.population)
[[1 1 0 1]
[0 1 0 0]
[0 1 1 1]]
[[0 1 0 1]
[0 1 0 0]
[0 1 1 0]]
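One consequence is worth noting. When a binary gene is replaced by a random value from {0, 1}, the new value equals the old one about half the time, so this is not the same as a classic bit-flip mutation. The hypothetical sketch below (not PyGAD code) estimates that rate.

```python
import random

def replace_with_random_bit(bit, rng=random):
    # random_mutation_min_val=0, random_mutation_max_val=2, integer genes:
    # the replacement is 0 or 1, regardless of the current value of `bit`.
    return rng.randrange(0, 2)

def bit_flip(bit):
    # Classic bit-flip mutation, for comparison: always changes the gene.
    return 1 - bit

rng = random.Random(0)
trials = 10_000
unchanged = sum(replace_with_random_bit(1, rng) == 1 for _ in range(trials))
print(unchanged / trials)        # close to 0.5
print(bit_flip(1), bit_flip(0))  # 0 1
```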
Note that this is not the only way to support the binary GA. It is also possible by using the gene_space parameter, which is introduced in the next section.
Continuous and Discrete Gene Ranges
The previous discussion assumes that the range from which the genes are sampled is continuous. So, if the range is from 1 to 5, then every value within it is acceptable (for integer genes: 1, 2, 3, and 4). What if some values within a range are not permitted, or the values do not follow a continuous range (e.g. -2, 18, 43, and 78)? For that purpose, PyGAD supports a parameter named gene_space to specify the space of gene values.
The gene_space parameter allows the user to list all the possible gene values. It accepts a list or tuple in which all the possible gene values are listed.
The next code uses the gene_space parameter to list the possible values for all the genes. As a result, all the genes are sampled from the listed 4 values.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[-2, 18, 43, 78])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[78 43 78 -2]
[18 -2 78 78]
[43 43 18 -2]]
[[-2 -2 18 78]
[18 -2 78 78]
[-2 -2 18 78]]
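When gene_space is a flat list, every gene is drawn from the same set of values. Below is a minimal pure-Python sketch of that rule with the hypothetical helpers sample_population and mutate_from_space. Here mutation picks a replacement from the same space; that is an assumption made for the sketch so that values stay inside the space.

```python
import random

def sample_population(gene_space, sol_per_pop, num_genes, rng=random):
    """Hypothetical sketch: every gene is drawn from the same global space."""
    return [[rng.choice(gene_space) for _ in range(num_genes)]
            for _ in range(sol_per_pop)]

def mutate_from_space(solution, gene_space, rng=random):
    """Hypothetical sketch: replace one gene with another value from the space."""
    s = list(solution)
    s[rng.randrange(len(s))] = rng.choice(gene_space)
    return s

gene_space = [-2, 18, 43, 78]
rng = random.Random(0)
population = sample_population(gene_space, 3, 4, rng)
population = [mutate_from_space(s, gene_space, rng) for s in population]
print(all(gene in gene_space for s in population for gene in s))  # True
```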
Note that all the genes are sampled from the same values. In other words, the values in the gene_space parameter are global to all genes. What if each gene has distinct values? The next section discusses how to specify custom values for each gene.
Custom Values for Each Gene
When the gene_space parameter is assigned a non-nested list/tuple, the values in this list/tuple are used to sample the values of all genes. However, some genes may have their own distinct values. To support this, the gene_space parameter also accepts the values of each gene separately, as a nested list/tuple in which each item holds the possible values of its corresponding gene.
Assume there are 4 genes and each gene has its own value space. The possible values for each gene are listed below. Note that the values of no gene follow a sequence, and the genes may have different numbers of values.
- Gene 1: [-4, 2]
- Gene 2: [0, 5, 7, 22, 84]
- Gene 3: [-8, -3, 0, 4]
- Gene 4: [1, 6, 16, 18]
All of the 4 lists are added as items in the gene_space parameter as given below.
gene_space = [[-4, 2],
              [0, 5, 7, 22, 84],
              [-8, -3, 0, 4],
              [1, 6, 16, 18]]
The next code creates an instance of the pygad.GA class that uses the gene_space parameter. The printed initial and final populations show how each gene is sampled from its own space. For example, the first gene of every solution is either -4 or 2.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[[-4, 2],
                                   [0, 5, 7, 22, 84],
                                   [-8, -3, 0, 4],
                                   [1, 6, 16, 18]])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[-4. 84. 0. 18.]
[ 2. 7. 4. 1.]
[ 2. 0. -8. 6.]]
[[-4. 7. 4. 1.]
[ 2. 7. 4. 1.]
[-4. 7. 4. 6.]]
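With a nested gene_space, the i-th entry is the value space of the i-th gene only. A minimal sketch of that rule follows; the helper sample_solution is hypothetical, not PyGAD's API.

```python
import random

gene_space = [[-4, 2],
              [0, 5, 7, 22, 84],
              [-8, -3, 0, 4],
              [1, 6, 16, 18]]

def sample_solution(gene_space, rng=random):
    """Hypothetical sketch: gene i is drawn only from gene_space[i]."""
    return [rng.choice(space) for space in gene_space]

rng = random.Random(0)
population = [sample_solution(gene_space, rng) for _ in range(3)]
for solution in population:
    print(solution)
# The first gene of every solution is either -4 or 2.
print(all(s[0] in (-4, 2) for s in population))  # True
```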
The values of the previous 4 genes did not follow a sequence. However, the values of some genes may follow a sequence. The values of the 4 genes are listed below. The values of the first gene go from 0 to 5 (exclusive), and the values of the second gene go from 16 to 27 (exclusive). The values of the third and fourth genes are the same as before.
- Gene 1: [0, 1, 2, 3, 4]
- Gene 2: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]
- Gene 3: [-8, -3, 0, 4]
- Gene 4: [1, 6, 16, 18]
The new value of the gene_space parameter is given below.
gene_space = [[0, 1, 2, 3, 4],
              [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
              [-8, -3, 0, 4],
              [1, 6, 16, 18]]
What if a gene has a sequence of, for example, 1,000 values? Do we have to list its individual elements? Fortunately, PyGAD allows the space of a single gene to be specified using the range() function. If the value space of the first gene starts from 0 up to but not including 5, then it can be modeled using range(0, 5), or equivalently range(5). The second gene goes from 16 up to and including 26 (that is, up to but not including 27), so its value space is represented using range(16, 27).
After using the range() function, the new value of the gene_space parameter is given below.
gene_space = [ range(5), range(16, 27), [-8, -3, 0, 4], [1, 6, 16, 18] ]
Here is the code that uses the updated gene_space.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[range(5),
                                   range(16, 27),
                                   [-8, -3, 0, 4],
                                   [1, 6, 16, 18]])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[ 0. 19. -8. 18.]
[ 2. 26. 4. 6.]
[ 3. 18. -3. 1.]]
[[ 3. 25. 0. 6.]
[ 0. 26. 4. 18.]
[ 3. 22. 0. 6.]]
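A quick check in plain Python confirms that the range() form and the explicit lists describe the same values, and that a range can be sampled like a list. This uses only built-ins; it does not exercise PyGAD itself.

```python
import random

explicit = [[0, 1, 2, 3, 4],
            [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26],
            [-8, -3, 0, 4],
            [1, 6, 16, 18]]
compact = [range(5), range(16, 27), [-8, -3, 0, 4], [1, 6, 16, 18]]

# range(5) == range(0, 5): the stop value is excluded in both cases.
print(all(list(a) == list(b) for a, b in zip(explicit, compact)))  # True

# A range is a sequence, so random.choice can draw from it directly,
# and it stores only start/stop/step instead of every value.
print(random.choice(range(16, 27)) in explicit[1])  # True
print(len(range(1000)))  # 1000
```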
It is possible to fix a gene to a single value by assigning that value to the gene's item in the gene_space parameter. Here is an example in which the first gene is set to 4 and the third gene to 5. These 2 genes never change their values.
gene_space = [4,
              range(16, 27),
              5,
              [1, 6, 16, 18]]
Here is the code that uses the last gene_space value. In both the initial and final populations, the first and third genes never change.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[4,
                                   range(16, 27),
                                   5,
                                   [1, 6, 16, 18]])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[ 4. 21. 5. 16.]
[ 4. 18. 5. 16.]
[ 4. 24. 5. 16.]]
[[ 4. 18. 5. 1.]
[ 4. 18. 5. 16.]
[ 4. 18. 5. 18.]]
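One way to think about a scalar entry in gene_space is as a one-element space. The hypothetical normalize_space helper below makes that explicit (it is an illustration, not how PyGAD stores the parameter). A gene whose space has a single value can only ever take that value.

```python
import numbers
import random

def normalize_space(entry):
    """Hypothetical helper: treat a scalar entry as a one-element space."""
    if isinstance(entry, numbers.Number):
        return [entry]
    return entry

gene_space = [4, range(16, 27), 5, [1, 6, 16, 18]]
spaces = [normalize_space(entry) for entry in gene_space]

rng = random.Random(0)
population = [[rng.choice(space) for space in spaces] for _ in range(3)]
print(all(s[0] == 4 and s[2] == 5 for s in population))  # True
```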
Customize Some Genes while Randomizing Others
According to the previous discussion of the gene_space parameter, each gene has its own gene space, specified either by hardcoding the individual values or by using the range() function.
In some cases, the user might need to restrict some genes to specific values while other genes are randomized. For example, the chromosome may have a gene that must be either -1 or 1, while the other genes can have any random value. How can we do that?
For a gene to be randomized, assign None to its item in the gene_space parameter. This means the value of this gene will be randomized. The next line assigns the list [-1, 1] to the first gene and None to the remaining 3 genes. The last 3 genes will have random values.
gene_space = [[-1, 1], None, None, None]
The next code uses the last gene_space value. Note how the first gene is sampled from the list [-1, 1] while the other genes have random values.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[[-1, 1], None, None, None])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[ 1. 0.28682682 1.39230915 1.12768838]
[-1. -1.05781089 1.71296713 2.56994039]
[ 1. 3.78611876 -3.80634854 2.15975074]]
[[-1. -1.05781089 1.88097581 2.56994039]
[-1. -1.05781089 1.71296713 2.56994039]
[-1. -1.05781089 1.3061504 2.56994039]]
Note that the random genes are initialized with values sampled from the range specified by the 2 parameters init_range_low and init_range_high. If the mutation type is random, then the random value added to the gene is sampled from the range specified by the 2 parameters random_mutation_min_val and random_mutation_max_val. Moreover, the type of the random value is determined by the gene_type parameter. Finally, if mutation_by_replacement is set to True, then the random value replaces the gene instead of being added to it. Note that these parameters only affect the genes whose space is set to None.
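The rules above for None entries can be summarized in a short sketch. sample_gene and mutate_gene are hypothetical helpers, not PyGAD's API. The default ranges used here (-4 to 4 for initialization, -1 to 1 for mutation) are assumptions for illustration, and re-drawing a constrained gene from its own space is likewise an assumption of this sketch. Only genes whose space is None use the numeric ranges.

```python
import random

def sample_gene(space, init_range_low=-4, init_range_high=4, rng=random):
    """Hypothetical sketch: None means 'random value from the init range'."""
    if space is None:
        return rng.uniform(init_range_low, init_range_high)
    return rng.choice(space)

def mutate_gene(value, space, random_mutation_min_val=-1, random_mutation_max_val=1,
                mutation_by_replacement=False, rng=random):
    """Hypothetical sketch: mutation ranges only apply to genes with space None."""
    if space is not None:
        return rng.choice(space)  # assumption: constrained genes stay in their space
    random_value = rng.uniform(random_mutation_min_val, random_mutation_max_val)
    return random_value if mutation_by_replacement else value + random_value

gene_space = [[-1, 1], None, None, None]
rng = random.Random(0)
solution = [sample_gene(space, rng=rng) for space in gene_space]
solution = [mutate_gene(v, space, rng=rng) for v, space in zip(solution, gene_space)]
print(solution[0] in (-1, 1))  # True
```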
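The next code forces the initial value of the random genes to be from 10 up to, but not including, 20. The random mutation range is from 30 to 40 (exclusive), and gene_type is set to int. Because mutation_by_replacement is not set (it defaults to False), the mutation value is added to the gene, so values outside the initial range (such as 48) can appear in the final population.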
ga_instance = pygad.GA(num_generations=1000,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       init_range_low=10,
                       init_range_high=20,
                       random_mutation_min_val=30,
                       random_mutation_max_val=40,
                       gene_space=[[-1, 1], None, None, None],
                       gene_type=int)
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[ 1. 16. 14. 10.]
[-1. 12. 16. 14.]
[-1. 17. 19. 13.]]
[[-1. 12. 16. 48.]
[-1. 15. 26. 14.]
[-1. 12. 16. 14.]]
Binary Genetic Algorithm
In the Use the mutation_by_replacement Parameter section, the binary genetic algorithm was supported by using the following parameters:
- init_range_low=random_mutation_min_val=0
- init_range_high=random_mutation_max_val=2
- mutation_by_replacement=True
- gene_type=int
It is also possible to support the binary GA by using the gene_space parameter. This is done by setting this parameter to the global space [0, 1], which means all genes have their values either 0 or 1.
The next code sets the gene_space parameter to [0, 1]. This forces the values of all genes to be either 0 or 1.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       sol_per_pop=3,
                       num_genes=4,
                       fitness_func=fitness_function,
                       gene_space=[0, 1])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[1 1 1 0]
[0 1 0 0]
[1 1 1 0]]
[[0 1 1 0]
[0 1 0 1]
[0 1 0 0]]
User-Defined Initial Population
Sometimes the user might want to start with a custom initial population without any randomization. PyGAD supports a parameter named initial_population that allows the user to specify a custom initial population.
The next code assigns a nested list to the initial_population parameter, with 3 solutions where each solution has 4 genes. In this case, the num_genes and sol_per_pop parameters are not needed, as they are deduced from the value assigned to the initial_population parameter.
ga_instance = pygad.GA(num_generations=10,
                       num_parents_mating=2,
                       fitness_func=fitness_function,
                       initial_population=[[34, 32, 24, -2],
                                           [3, 7, 2, 7],
                                           [-2, -4, -6, 1]])
ga_instance.run()
print(ga_instance.initial_population)
print(ga_instance.population)
[[34 32 24 -2]
[ 3 7 2 7]
[-2 -4 -6 1]]
[[3 7 2 6]
[3 7 2 7]
[3 7 2 7]]
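The shape deduction described above amounts to reading the lengths of the nested list. Here is a hypothetical infer_shape helper (not PyGAD's API) that also rejects empty or ragged populations.

```python
def infer_shape(initial_population):
    """Hypothetical sketch: deduce (sol_per_pop, num_genes) from a nested list."""
    sol_per_pop = len(initial_population)
    if sol_per_pop == 0:
        raise ValueError("the initial population must contain at least one solution")
    num_genes = len(initial_population[0])
    if any(len(solution) != num_genes for solution in initial_population):
        raise ValueError("every solution must have the same number of genes")
    return sol_per_pop, num_genes

print(infer_shape([[34, 32, 24, -2],
                   [3, 7, 2, 7],
                   [-2, -4, -6, 1]]))  # (3, 4)
```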
Conclusion
This tutorial used the PyGAD library to work with both the binary and decimal representations of the genetic algorithm. The tutorial discussed the different parameters in PyGAD to allow the user to control how the initial population is created in addition to controlling the mutation operation.
Using the gene_type parameter, the gene values can be either floats or integers. The mutation_by_replacement parameter, together with random_mutation_min_val and random_mutation_max_val, is used to keep the genes in their initial range. The initial_population parameter accepts a user-defined initial population.
The gene_space parameter helps when the gene values do not follow a sequence. In this case, the discrete gene values are fed as a list. This parameter also accepts a custom value space for each gene. Moreover, it allows some genes to be sampled from a defined space and others to be selected randomly.