In two previous tutorials we saw an introduction to Cython, a language that mainly adds static data types to the variables used in Python. This boosts the performance of Python scripts, resulting in dramatic speed increases. For example, when applied to NumPy arrays, Cython completed the sum of 1 billion numbers 1250 times faster than Python.

This tutorial builds upon what we discussed previously to speed up the execution of a project that implements the genetic algorithm (GA) in Python. The base project is available on GitHub. We'll inspect the code and follow the instructions discussed in the previous two tutorials to make as many changes as possible to boost performance, and then run the generations in significantly less time compared to Python.

We'll begin by downloading the GitHub project. Then we'll look at cythonizing each part of the genetic algorithm: the fitness function, mating pool, crossover, and mutation. We'll also see how to implement different NumPy functions at C speed, and will conclude the post with the final implementation of the full code and a comparison of its performance with Python.

Note that you do not need to know the genetic algorithm to complete this tutorial; we will go over each part of it, and all you need to do is cythonize the Python code, regardless of whether it is the genetic algorithm or something else. If you do want more details about how the genetic algorithm works, see my other posts on LinkedIn (with implementations on GitHub).

Let's get started.

# Downloading and Using the GitHub Project

The Python implementation of the genetic algorithm is available at this GitHub page. The project has two files. The first is the *ga.py* file, which implements the genetic algorithm operations, including:

- Fitness function calculation using the `cal_pop_fitness()` function
- Mating pool using the `select_mating_pool()` function
- Crossover using the `crossover()` function (single-point crossover is implemented)
- Mutation using the `mutation()` function (just a single gene has its value updated)

The second file is named *Example_GeneticAlgorithm.py*. We look at a basic example of optimizing the following equation, where **x** is a random input vector with 6 elements:

`y = w1*x1 + w2*x2 + w3*x3 + w4*x4 + w5*x5 + w6*x6`

The *Example_GeneticAlgorithm.py* script prepares the initial population and loops through the generations. In each generation, the functions listed above in *ga.py* are called.

Throughout this tutorial, we are going to inspect the implementation of both the *ga.py* and *Example_GeneticAlgorithm.py* scripts and see what we can change to reduce the computational time. By just running the project and removing all print statements (which are very time consuming), the Python code takes around 1.46 seconds to go through 10,000 generations (run on Core i7-6500U CPU @ 2.5 GHz, with 16 GB DDR3 RAM).

Let's start with the *ga.py* file.

# Cythonizing Functions Inside ga.py

Inside the *ga.py* file, the first function is `cal_pop_fitness()`. This calculates the fitness value for each individual in the population. It is the first step in the GA.

## Fitness Function

The `cal_pop_fitness()` function accepts two arguments: a vector with 6 values (**x1** to **x6** in the equation above), and the population for which the fitness values will be calculated. The population consists of individuals, and the length of each individual is 6 (because there are 6 weights, **w1** to **w6**, for the 6 inputs **x1** to **x6**). If, for example, there are 8 individuals, then the size of the array holding the population is 8 x 6. In other words, it is a 2D array (or a matrix).

The function calculates a fitness value for each individual by summing the products between each of the 6 weights for each individual and the 6 equation inputs. The function then returns the fitness values for all individuals as a vector.

```
def cal_pop_fitness(equation_inputs, pop):
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness
```
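To see what this computes, here is a quick NumPy check on a toy population (the values below are hypothetical, chosen only for illustration):

```python
import numpy

# Toy example: 2 individuals, each with 6 weights.
equation_inputs = numpy.array([4.0, -2.0, 3.5, 5.0, -11.0, -4.7])
pop = numpy.array([[1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
                   [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

# Broadcasting multiplies each row by equation_inputs; axis=1 sums per row.
fitness = numpy.sum(pop * equation_inputs, axis=1)

print(fitness)  # one fitness value per individual
```

The first individual's fitness is simply the sum of the coefficients (all weights are 1), and the second is twice the first coefficient.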

How can we Cythonize this? According to the four tips stated in the previous tutorial about using Cython and NumPy, the first step is to process the NumPy array inside a function, which is already the case here. After defining the function, all we need to do is define the data types of the arguments, the return data type, and the data types of the local variables defined within the function (optionally, we can also disable unnecessary features such as bounds checking). Here is the new function after making these changes:

```
import numpy
cimport numpy
import cython

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=1] cal_pop_fitness(numpy.ndarray[numpy.double_t, ndim=1] equation_inputs, numpy.ndarray[numpy.double_t, ndim=2] pop):
    cdef numpy.ndarray[numpy.double_t, ndim=1] fitness
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness
```

Outside the function, Cython is used to call several decorators which disable three features: wrap around (as we no longer use negative indices), checking for None values, and bounds checking. Note that we only disabled bounds checking because we are sure that no index will exceed the bounds.

Generally, we can define a function in Cython in three ways:

- `def`: defines a function that works at Python speed, and thus is a bit slow. A `def` function can be called from inside or outside the Cython/Python script.
- `cdef`: can only be defined within a Cython script, and cannot be called from a Python script. It works faster than a function defined using `def`.
- `cpdef`: gives the advantages of both `def` and `cdef`. The function can be defined only inside a Cython script, but can be called from either a Cython or a Python script. `cpdef` is as fast as `cdef`.

Because we may use all of the functions defined inside the Cython script from a Python script, we will use the `cpdef` keyword for defining all functions.
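As a quick side-by-side illustration of the three kinds, here is a minimal Cython sketch (the function names are hypothetical):

```
def py_func(x):            # callable from Python and Cython; Python-speed calls
    return x + 1

cdef int c_func(int x):    # callable only from Cython; fastest calls
    return x + 1

cpdef int both_func(int x):  # callable from both; C-speed when called from Cython
    return x + 1
```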

Exactly after `cpdef`, the return data type of the function is set to `numpy.ndarray[numpy.double_t, ndim=1]`. This means the function will return a variable of type `numpy.ndarray`. The type of the elements inside the array is also set to double using `numpy.double_t`. Finally, the number of dimensions is set to 1 using the `ndim` argument, because a 1D array (vector) is returned. Note that if there is a mismatch between the number of dimensions defined in the return type and the actual data returned, an exception will be thrown.

Next, the data types of the two arguments are defined. Both are `numpy.ndarray`, and the element type is `double`. The first argument has one dimension, while the second argument has two.

Now the function header is completely defined. Inside the function there is a single local variable, the *fitness* vector. It is defined in the same way as the first function argument. Finally, the 1D array is returned.

At this point the `cal_pop_fitness()` function is Cythonized; it is not as readable as Python, but it is now faster.

## Mating Pool

The next function, `select_mating_pool()`, is implemented in Python as follows:

```
def select_mating_pool(pop, fitness, num_parents):
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))
        max_fitness_idx = max_fitness_idx[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = -99999999999
    return parents
```

The Cython version is below. You can easily understand the Cython function, as it does not differ much from the Python version. This function returns the mating pool, which consists of more than one individual; as a result, the returned array is 2D, and `ndim` is set to 2 in the return data type. The local variables in the function are each defined using the `cdef` keyword. Note that slicing and indexing for the NumPy arrays is done the same as in Python. Looping through the array also uses indexing, which is the faster way to access elements.

```
import numpy
cimport numpy
import cython

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] select_mating_pool(numpy.ndarray[numpy.double_t, ndim=2] pop, numpy.ndarray[numpy.double_t, ndim=1] fitness, int num_parents):
    cdef numpy.ndarray[numpy.double_t, ndim=2] parents
    cdef int parent_num, max_fitness_idx, min_val
    min_val = -999999
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = min_val
    return parents
```
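To sanity-check the selection logic, here is the pure-Python version of the function run on a tiny, hypothetical population of three 2-gene individuals:

```python
import numpy

def select_mating_pool(pop, fitness, num_parents):
    # Pick the num_parents fittest individuals, one per iteration.
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        # Index of the current maximum fitness.
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        # Exclude this individual from being selected again.
        fitness[max_fitness_idx] = -99999999999
    return parents

pop = numpy.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
fitness = numpy.array([0.5, 2.0, 1.0])
parents = select_mating_pool(pop, fitness, 2)
print(parents)  # rows ordered by descending fitness: [3, 4] then [5, 6]
```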

## Crossover

The next function is `crossover()`, defined below in Python.

```
def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    crossover_point = numpy.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring
```
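A quick demonstration of the single-point crossover on two hypothetical 4-gene parents makes the behavior concrete:

```python
import numpy

def crossover(parents, offspring_size):
    # Single-point crossover at the midpoint of each individual.
    offspring = numpy.empty(offspring_size)
    crossover_point = numpy.uint8(offspring_size[1] / 2)
    for k in range(offspring_size[0]):
        parent1_idx = k % parents.shape[0]
        parent2_idx = (k + 1) % parents.shape[0]
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

parents = numpy.array([[1.0, 1.0, 1.0, 1.0],
                       [2.0, 2.0, 2.0, 2.0]])
child = crossover(parents, (1, 4))
print(child)  # first half from parent 0, second half from parent 1
```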

The Cython version is as follows. Note that the `wraparound()` decorator is set to `True` because negative indexing is required here. Note also that the type of the `offspring_size` argument is `tuple`, so you must provide this argument as such. Any mismatch will cause an error.

Because the `crossover_point` local variable is defined as an integer variable, we use `numpy.uint8()` to enforce this and prevent any errors. The remaining part of the function is exactly the same as in Python. Note that there are still some changes to be made later, where we'll replace some time-consuming operations with others that take less time.

```
import numpy
cimport numpy
import cython

@cython.wraparound(True)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] crossover(numpy.ndarray[numpy.double_t, ndim=2] parents, tuple offspring_size):
    cdef numpy.ndarray[numpy.double_t, ndim=2] offspring
    cdef int k, parent1_idx, parent2_idx
    cdef numpy.int_t crossover_point
    offspring = numpy.empty(offspring_size)
    crossover_point = numpy.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring
```

## Mutation

The last function in the *ga.py* file is `mutation()`, shown here in Python:

```
def mutation(offspring_crossover, num_mutations=1):
    mutations_counter = numpy.uint8(offspring_crossover.shape[1] / num_mutations)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            random_value = numpy.random.uniform(-1.0, 1.0, 1)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover
```
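To see which genes get perturbed, here is a small demonstration of the same logic on a zeroed, hypothetical offspring array (the scalar form of `numpy.random.uniform()` is used, since a size-1 array is not needed here):

```python
import numpy

def mutation(offspring_crossover, num_mutations=1):
    # Add a random value in [-1, 1) to evenly spaced genes of each individual.
    mutations_counter = offspring_crossover.shape[1] // num_mutations
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            random_value = numpy.random.uniform(-1.0, 1.0)
            offspring_crossover[idx, gene_idx] += random_value
            gene_idx += mutations_counter
    return offspring_crossover

before = numpy.zeros((2, 6))
after = mutation(before.copy(), num_mutations=2)
print(after)  # only columns 2 and 5 of each row are perturbed
```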

The cythonized version is below. It follows the steps we've seen before: disabling unused features, using `cpdef` rather than `def`, and declaring the data types for the arguments, return values, and local variables. Because negative indexing is not required, it is disabled for this function.

```
import numpy
cimport numpy
import cython

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] mutation(numpy.ndarray[numpy.double_t, ndim=2] offspring_crossover, int num_mutations=1):
    cdef int idx, mutation_num, gene_idx
    cdef double random_value
    cdef Py_ssize_t mutations_counter
    mutations_counter = numpy.uint8(offspring_crossover.shape[1] / num_mutations)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            # Omit the size argument so uniform() returns a scalar that fits the double type.
            random_value = numpy.random.uniform(-1.0, 1.0)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover
```

We've finished cythonizing *ga.py*! The new full code is listed below. Just save this code into a file named *ga.pyx* and we will build it in the *Building the .pyx Files* section using the *setup.py* file.

```
import numpy
cimport numpy
import time
import cython

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=1] cal_pop_fitness(numpy.ndarray[numpy.double_t, ndim=1] equation_inputs, numpy.ndarray[numpy.double_t, ndim=2] pop):
    cdef numpy.ndarray[numpy.double_t, ndim=1] fitness
    fitness = numpy.sum(pop*equation_inputs, axis=1)
    return fitness

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] select_mating_pool(numpy.ndarray[numpy.double_t, ndim=2] pop, numpy.ndarray[numpy.double_t, ndim=1] fitness, int num_parents):
    cdef numpy.ndarray[numpy.double_t, ndim=2] parents
    cdef int parent_num, max_fitness_idx, min_val
    min_val = -999999
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = min_val
    return parents

@cython.wraparound(True)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] crossover(numpy.ndarray[numpy.double_t, ndim=2] parents, tuple offspring_size):
    cdef numpy.ndarray[numpy.double_t, ndim=2] offspring
    cdef int k, parent1_idx, parent2_idx
    cdef numpy.int_t crossover_point
    offspring = numpy.empty(offspring_size)
    crossover_point = numpy.uint8(offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] mutation(numpy.ndarray[numpy.double_t, ndim=2] offspring_crossover, int num_mutations=1):
    cdef int idx, mutation_num, gene_idx
    cdef double random_value
    cdef Py_ssize_t mutations_counter
    mutations_counter = numpy.uint8(offspring_crossover.shape[1] / num_mutations)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            random_value = numpy.random.uniform(-1.0, 1.0)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover
```

The second file, *Example_GeneticAlgorithm.py*, calls the functions defined in the *ga.py* file. Let's finish cythonizing this second file before we get our GA running.

# Cythonizing Example_GeneticAlgorithm.py

The Python implementation of the *Example_GeneticAlgorithm.py* file is as follows. The `time` module is imported so we can compare the performance of Python with that of Cython.

```
import numpy
import ga
import time

equation_inputs = [4,-2,3.5,5,-11,-4.7]
num_weights = len(equation_inputs)
sol_per_pop = 8
num_parents_mating = 4
pop_size = (sol_per_pop,num_weights)
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
best_outputs = []
num_generations = 10000
t1 = time.time()
for generation in range(num_generations):
    fitness = ga.cal_pop_fitness(equation_inputs, new_population)
    best_outputs.append(numpy.max(numpy.sum(new_population*equation_inputs, axis=1)))
    parents = ga.select_mating_pool(new_population, fitness, num_parents_mating)
    offspring_crossover = ga.crossover(parents, offspring_size=(pop_size[0]-parents.shape[0], num_weights))
    offspring_mutation = ga.mutation(offspring_crossover, num_mutations=2)
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation
t2 = time.time()
t = t2-t1
print("Total Time %.20f" % t)
```
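As a sanity check of the whole loop, here is a self-contained toy run in pure Python/NumPy (a hypothetical 100 generations, scalar `numpy.random.uniform()` in mutation). Because the parents, including the best individual, are copied unchanged into the next population, the best fitness can never decrease:

```python
import numpy

def cal_pop_fitness(equation_inputs, pop):
    return numpy.sum(pop * equation_inputs, axis=1)

def select_mating_pool(pop, fitness, num_parents):
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = numpy.where(fitness == numpy.max(fitness))[0][0]
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = -99999999999
    return parents

def crossover(parents, offspring_size):
    offspring = numpy.empty(offspring_size)
    crossover_point = offspring_size[1] // 2
    for k in range(offspring_size[0]):
        p1, p2 = k % parents.shape[0], (k + 1) % parents.shape[0]
        offspring[k, :crossover_point] = parents[p1, :crossover_point]
        offspring[k, crossover_point:] = parents[p2, crossover_point:]
    return offspring

def mutation(offspring, num_mutations=1):
    counter = offspring.shape[1] // num_mutations
    for idx in range(offspring.shape[0]):
        gene_idx = counter - 1
        for _ in range(num_mutations):
            offspring[idx, gene_idx] += numpy.random.uniform(-1.0, 1.0)
            gene_idx += counter
    return offspring

equation_inputs = numpy.array([4, -2, 3.5, 5, -11, -4.7])
new_population = numpy.random.uniform(-4.0, 4.0, (8, 6))
initial_best = numpy.max(cal_pop_fitness(equation_inputs, new_population))
for generation in range(100):
    fitness = cal_pop_fitness(equation_inputs, new_population)
    parents = select_mating_pool(new_population, fitness, 4)
    offspring = mutation(crossover(parents, (4, 6)), num_mutations=2)
    new_population[:4, :] = parents
    new_population[4:, :] = offspring
final_best = numpy.max(cal_pop_fitness(equation_inputs, new_population))
print(initial_best, final_best)  # final_best >= initial_best
```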

The cythonized code is listed below. The *ga* module is imported as a regular Python module. All you have to do is declare the data types of all variables used. Just take care to match the passed variables with the types accepted by the functions edited previously.

```
import ga
import numpy
cimport numpy
import time

cdef numpy.ndarray equation_inputs, parents, new_population, fitness, offspring_crossover, offspring_mutation
cdef int num_weights, sol_per_pop, num_parents_mating, num_generations
cdef tuple pop_size
cdef double t1, t2, t

equation_inputs = numpy.array([4,-2,3.5,5,-11,-4.7])
num_weights = equation_inputs.shape[0]
sol_per_pop = 8
num_parents_mating = 4
pop_size = (sol_per_pop, num_weights)
new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
num_generations = 10000
t1 = time.time()
for generation in range(num_generations):
    fitness = ga.cal_pop_fitness(equation_inputs, new_population)
    parents = ga.select_mating_pool(new_population, fitness, num_parents_mating)
    offspring_crossover = ga.crossover(parents, offspring_size=(pop_size[0]-parents.shape[0], num_weights))
    offspring_mutation = ga.mutation(offspring_crossover, num_mutations=2)
    new_population[0:parents.shape[0], :] = parents
    new_population[parents.shape[0]:, :] = offspring_mutation
t2 = time.time()
t = t2-t1
print("Total Time %.20f" % t)
```

Here we are only able to assign the `numpy.ndarray` data type to the NumPy variables, and nothing more. We cannot specify the number of dimensions or the elements' data type, because these features are not supported by Cython for variables declared at the script level. If the code is wrapped into a function, however, then we can define everything and speed up the processing. We will do exactly this further on.

For now, just save the Cython code into a file named *Example_GeneticAlgorithm.pyx*, which will be built along with the *ga.pyx* file.

# Building The .pyx Files

The next step is to build the *.pyx* files to generate the *.pyd*/*.so* files to be imported in the project. The *setup.py* file used for this purpose is listed below. Because there are two *.pyx* files to be built, the `cythonize()` function is not given an explicit name, but is instead asked to build all files with the *.pyx* extension.

```
import distutils.core
import Cython.Build
import numpy

distutils.core.setup(
    ext_modules = Cython.Build.cythonize("*.pyx"),
    include_dirs=[numpy.get_include()]
)
```

In order to build the files, issue the command below from the command line.

`python setup.py build_ext --inplace`

After the command completes successfully, we can just import the *Example_GeneticAlgorithm.pyx* file using the following command. This will run the code automatically.

`import Example_GeneticAlgorithm`

The Cython code takes **0.945** seconds to complete. Compare this to **1.46** seconds for the Python code; Cython is **1.55** times faster (note that the code is running on a machine with a Core i7-6500U CPU @ 2.5 GHz and 16 GB DDR3 RAM).

To further reduce the time we can make a simple edit: use a function to wrap the contents of the *Example_GeneticAlgorithm.pyx* file.

# Evolving Generations Within a Function vs. Script Body

Let's create a function named `optimize()` within *Example_GeneticAlgorithm.pyx*, and put the contents of this file within our new function:

```
import ga
import numpy
cimport numpy
import time
import cython

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef optimize():
    cdef numpy.ndarray equation_inputs, parents, new_population, fitness, offspring_crossover, offspring_mutation
    cdef int num_weights, sol_per_pop, num_parents_mating, num_generations
    cdef list pop_size
    cdef double t1, t2, t

    equation_inputs = numpy.array([4,-2,3.5,5,-11,-4.7])
    num_weights = equation_inputs.shape[0]
    sol_per_pop = 8
    num_parents_mating = 4
    pop_size = [sol_per_pop, num_weights]
    # Creating the initial population.
    new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
    num_generations = 10000
    t1 = time.time()
    for generation in range(num_generations):
        fitness = ga.cal_pop_fitness(equation_inputs, new_population)
        parents = ga.select_mating_pool(new_population, fitness, num_parents_mating)
        offspring_crossover = ga.crossover(parents, offspring_size=(pop_size[0]-parents.shape[0], num_weights))
        offspring_mutation = ga.mutation(offspring_crossover, num_mutations=2)
        new_population[0:parents.shape[0], :] = parents
        new_population[parents.shape[0]:, :] = offspring_mutation
    t2 = time.time()
    t = t2-t1
    print("Total Time %.20f" % t)
    print(ga.cal_pop_fitness(equation_inputs, new_population))
```

In order to call the `optimize()` function, just rebuild the Cython *.pyx* files and issue the following Python commands from the command line:

```
import Example_GeneticAlgorithm
Example_GeneticAlgorithm.optimize()
```

This now takes **0.944** rather than **0.945** seconds; almost no change at all. One reason is the overhead of calling into the external *ga* module for each function needed. Instead, we'll save that function-call overhead by copying and pasting the `optimize()` function inside the *ga.pyx* file. Because the functions are part of the same file, there is less overhead in calling them.

Because the `optimize()` function is now part of the *ga.pyx* file, we no longer need the *Example_GeneticAlgorithm.pyx* file. You can edit the *setup.py* file to specify that just the *ga.pyx* file is to be built.

The commands below are used to call the `optimize()` function. The time is now **0.9** seconds rather than **0.944**, and thus the Cython code is now **1.62** times faster than Python.

```
import ga
ga.optimize()
```

Now all of the code has been Cythonized, but still more can be done to improve the speed. Let's see how we can use C functions rather than Python functions; this will give the most drastic speed increase yet.

# Implement Python Features in C Speed

Python makes many things easier for the programmer, and this is one of its benefits. But this convenience comes at a cost in execution time in some cases. In this section, we are going to inspect some functions that are available in Python but slow, and see how to implement them to run at C speed.

## Implementing NumPy sum() in C Speed

Inside the `cal_pop_fitness()` function, the sum of products between each individual and the equation inputs was calculated using the `numpy.sum()` function. We can implement this function manually using two `for` loops, as in the code below. Note that the loops run at C speed. For this reason, the variable `fitness` is declared as a `numpy.ndarray` type and initialized as a zeros array using `numpy.zeros()`. The results of calculating the fitness values are saved in this variable.

```
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef cal_pop_fitness(numpy.ndarray[numpy.double_t, ndim=1] equation_inputs, numpy.ndarray[numpy.double_t, ndim=2] pop):
    cdef numpy.ndarray[numpy.double_t, ndim=1] fitness
    fitness = numpy.zeros(pop.shape[0])
    # fitness = numpy.sum(pop*equation_inputs, axis=1) # slower than looping.
    for i in range(pop.shape[0]):
        for j in range(pop.shape[1]):
            fitness[i] += pop[i, j]*equation_inputs[j]
    return fitness
```
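We can verify in plain Python that the manual double loop computes exactly the same result as `numpy.sum(pop*equation_inputs, axis=1)` (the population values below are random, hypothetical data):

```python
import numpy

equation_inputs = numpy.array([4.0, -2.0, 3.5, 5.0, -11.0, -4.7])
pop = numpy.random.uniform(-4.0, 4.0, (8, 6))

# Manual sum of products, mirroring the C-speed loops above.
fitness = numpy.zeros(pop.shape[0])
for i in range(pop.shape[0]):
    for j in range(pop.shape[1]):
        fitness[i] += pop[i, j] * equation_inputs[j]

# Same result as the vectorized version, up to floating-point rounding.
assert numpy.allclose(fitness, numpy.sum(pop * equation_inputs, axis=1))
```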

After making this edit, we can build the *.pyx* file and see how much faster the new code is. The new code takes just **0.8** seconds; implementing the `numpy.sum()` function using loops saved **0.1** seconds (**100** milliseconds). Let's think about something else to optimize.

Inside the `select_mating_pool()` function, the index of the maximum element in the fitness array was returned using this line:

`max_fitness_idx = numpy.where(fitness == numpy.max(fitness))[0][0]`

We can edit the function to implement this line at C speed using the loop below. By doing so, the execution time is now **0.44** seconds rather than **0.8** seconds. Compared to Python, Cython is now **3.32** times faster.

```
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] select_mating_pool(numpy.ndarray[numpy.double_t, ndim=2] pop, numpy.ndarray[numpy.double_t, ndim=1] fitness, int num_parents):
    cdef numpy.ndarray[numpy.double_t, ndim=2] parents
    cdef int parent_num, max_fitness_idx, min_val, a
    min_val = -999999
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = 0
        # numpy.where(fitness == numpy.max(fitness))
        for a in range(1, fitness.shape[0]):
            if fitness[a] > fitness[max_fitness_idx]:
                max_fitness_idx = a
        parents[parent_num, :] = pop[max_fitness_idx, :]
        fitness[max_fitness_idx] = min_val
    return parents
```
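A quick check in plain Python confirms that the linear scan finds the same index as the `numpy.where()` expression it replaces (the fitness values below are hypothetical, with a deliberate tie to show that both pick the first occurrence):

```python
import numpy

fitness = numpy.array([0.3, 2.5, -1.0, 2.5, 0.7])

# Linear scan, mirroring the C-speed loop in select_mating_pool().
max_fitness_idx = 0
for a in range(1, fitness.shape[0]):
    if fitness[a] > fitness[max_fitness_idx]:
        max_fitness_idx = a

# Matches numpy.where() on the maximum; first occurrence wins in both.
assert max_fitness_idx == numpy.where(fitness == numpy.max(fitness))[0][0]
print(max_fitness_idx)  # 1
```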

## NumPy Array Slicing in C Speed

Slicing just returns part of an array as another array. We can implement it with explicit loops for `parents` and `pop` in the new function listed below. By doing that, Cython takes just **0.427** seconds rather than **0.44** seconds.

```
@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] select_mating_pool(numpy.ndarray[numpy.double_t, ndim=2] pop, numpy.ndarray[numpy.double_t, ndim=1] fitness, int num_parents):
    cdef numpy.ndarray[numpy.double_t, ndim=2] parents
    cdef int parent_num, max_fitness_idx, min_val, a
    min_val = -999999
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = 0
        # numpy.where(fitness == numpy.max(fitness))
        for a in range(1, fitness.shape[0]):
            if fitness[a] > fitness[max_fitness_idx]:
                max_fitness_idx = a
        # parents[parent_num, :] = pop[max_fitness_idx, :] # slower than looping by 20 ms
        for a in range(parents.shape[1]):
            parents[parent_num, a] = pop[max_fitness_idx, a]
        fitness[max_fitness_idx] = min_val
    return parents
```

Because slicing is also used in the `crossover()` function, we can edit it to implement array slicing using loops that run at C speed. The new function is below, and takes **0.344** seconds rather than **0.427**. These changes might seem minor, but across thousands of generations they add up to make a huge impact. At this point, this implementation runs **4.24** times faster than Python.

The value assigned to the `crossover_point` variable was previously converted using `numpy.uint8()`. Now it is converted in C style using `(int)`.

```
@cython.wraparound(True)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] crossover(numpy.ndarray[numpy.double_t, ndim=2] parents, tuple offspring_size):
    cdef numpy.ndarray[numpy.double_t, ndim=2] offspring
    cdef int k, parent1_idx, parent2_idx, m
    cdef numpy.int_t crossover_point
    offspring = numpy.empty(offspring_size)
    crossover_point = (int) (offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        for m in range(crossover_point):
            offspring[k, m] = parents[parent1_idx, m]
        for m in range(crossover_point, parents.shape[1]):
            offspring[k, m] = parents[parent2_idx, m]
        # The next 2 lines use slicing, which is slower than the loops above that run at C speed.
        # offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        # offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring
```
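A small pure-Python check, on hypothetical 4-gene parents, confirms that the two element-by-element loops produce the same offspring as the two slice assignments they replace:

```python
import numpy

parents = numpy.array([[1.0, 1.0, 1.0, 1.0],
                       [2.0, 2.0, 2.0, 2.0]])
offspring = numpy.empty((1, 4))
crossover_point = 2

# First half from parent 0, equivalent to offspring[0, :crossover_point] = ...
for m in range(crossover_point):
    offspring[0, m] = parents[0, m]
# Second half from parent 1, equivalent to offspring[0, crossover_point:] = ...
for m in range(crossover_point, parents.shape[1]):
    offspring[0, m] = parents[1, m]

print(offspring)  # [[1. 1. 2. 2.]]
```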

## Generating Random Values in C

The `mutation()` function uses the `numpy.random.uniform()` function to return a random double value that is added to a gene:

`random_value = numpy.random.uniform(-1.0, 1.0, 1)`

We can avoid using this function and generate the random number using the `rand()` function that is available in C's `stdlib` library. The implementation of the `mutation()` function thus becomes:

```
from libc.stdlib cimport rand, RAND_MAX

cdef double DOUBLE_RAND_MAX = RAND_MAX # a double variable holding the maximum random integer in C

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] mutation(numpy.ndarray[numpy.double_t, ndim=2] offspring_crossover, int num_mutations=1):
    cdef int idx, mutation_num, gene_idx
    cdef double random_value, rand_double
    cdef Py_ssize_t mutations_counter
    mutations_counter = (int) (offspring_crossover.shape[1] / num_mutations) # using numpy.uint8() is slower than using (int)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            # random_value = numpy.random.uniform(-1.0, 1.0)
            rand_double = rand()/DOUBLE_RAND_MAX
            random_value = rand_double * (1.0 - (-1.0)) + (-1.0)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover
```

First, the `rand()` function is imported from `stdlib` so that we have access to it from C. `rand()` returns an integer value between 0 and `RAND_MAX`, which is a constant (its value is at least 32767). Since we want the random number to fall in the range 0 to 1, we need to divide the returned random value by the maximum possible random integer. We do this by copying `RAND_MAX` into a double variable named `DOUBLE_RAND_MAX` and dividing the random number by this value. The scaled random value is then available in the `rand_double` variable. This is then scaled so that it falls in the range -1 to 1, and saved in the `random_value` variable.
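The scaling arithmetic itself can be checked in plain Python. Here `RAND_MAX` is hard-coded to 32767 (the minimum value the C standard guarantees; the actual constant is platform-dependent), and `scale_to_range()` is a hypothetical helper mirroring the two lines of the Cython code:

```python
RAND_MAX = 32767  # minimum value of C's RAND_MAX; platform-dependent in practice

def scale_to_range(rand_int, low=-1.0, high=1.0):
    # Same arithmetic as the Cython code: scale to [0, 1], then to [low, high].
    rand_double = rand_int / RAND_MAX
    return rand_double * (high - low) + low

print(scale_to_range(0))         # -1.0
print(scale_to_range(RAND_MAX))  # 1.0
```

The endpoints of `rand()`'s output map exactly to the endpoints of the target range, and the midpoint maps close to 0.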

By generating the random value using the C `rand()` function, Cython now takes just **0.08** seconds (80 milliseconds) to run, compared to **0.344** seconds earlier. This is the biggest difference yet. Now the code runs **18.25** times faster than Python.

Now that we've made all of our edits, the full *ga.pyx* file looks like this:

```
import numpy
cimport numpy
import time
import cython
from libc.stdlib cimport rand, RAND_MAX

cdef double DOUBLE_RAND_MAX = RAND_MAX # a double variable holding the maximum random integer in C

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef cal_pop_fitness(numpy.ndarray[numpy.double_t, ndim=1] equation_inputs, numpy.ndarray[numpy.double_t, ndim=2] pop):
    cdef numpy.ndarray[numpy.double_t, ndim=1] fitness
    cdef int i, j
    fitness = numpy.zeros(pop.shape[0])
    # fitness = numpy.sum(pop*equation_inputs, axis=1) # slower than looping.
    for i in range(pop.shape[0]):
        for j in range(pop.shape[1]):
            fitness[i] += pop[i, j]*equation_inputs[j]
    return fitness

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] select_mating_pool(numpy.ndarray[numpy.double_t, ndim=2] pop, numpy.ndarray[numpy.double_t, ndim=1] fitness, int num_parents):
    cdef numpy.ndarray[numpy.double_t, ndim=2] parents
    cdef int parent_num, max_fitness_idx, min_val, a
    min_val = -999999
    parents = numpy.empty((num_parents, pop.shape[1]))
    for parent_num in range(num_parents):
        max_fitness_idx = 0
        # numpy.where(fitness == numpy.max(fitness)) # slower than looping by 250 ms.
        for a in range(1, fitness.shape[0]):
            if fitness[a] > fitness[max_fitness_idx]:
                max_fitness_idx = a
        # parents[parent_num, :] = pop[max_fitness_idx, :] # slower than looping.
        for a in range(parents.shape[1]):
            parents[parent_num, a] = pop[max_fitness_idx, a]
        fitness[max_fitness_idx] = min_val
    return parents

@cython.wraparound(True)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] crossover(numpy.ndarray[numpy.double_t, ndim=2] parents, tuple offspring_size):
    cdef numpy.ndarray[numpy.double_t, ndim=2] offspring
    cdef int k, parent1_idx, parent2_idx, m
    cdef numpy.int_t crossover_point
    offspring = numpy.empty(offspring_size)
    crossover_point = (int) (offspring_size[1]/2)
    for k in range(offspring_size[0]):
        parent1_idx = k%parents.shape[0]
        parent2_idx = (k+1)%parents.shape[0]
        for m in range(crossover_point):
            offspring[k, m] = parents[parent1_idx, m]
        for m in range(crossover_point, parents.shape[1]):
            offspring[k, m] = parents[parent2_idx, m]
        # The next 2 lines use slicing, which is slower than the loops above that run at C speed.
        # offspring[k, 0:crossover_point] = parents[parent1_idx, 0:crossover_point]
        # offspring[k, crossover_point:] = parents[parent2_idx, crossover_point:]
    return offspring

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef numpy.ndarray[numpy.double_t, ndim=2] mutation(numpy.ndarray[numpy.double_t, ndim=2] offspring_crossover, int num_mutations=1):
    cdef int idx, mutation_num, gene_idx
    cdef double random_value, rand_double
    cdef Py_ssize_t mutations_counter
    mutations_counter = (int) (offspring_crossover.shape[1] / num_mutations) # using numpy.uint8() is slower than using (int)
    for idx in range(offspring_crossover.shape[0]):
        gene_idx = mutations_counter - 1
        for mutation_num in range(num_mutations):
            # random_value = numpy.random.uniform(-1.0, 1.0)
            rand_double = rand()/DOUBLE_RAND_MAX
            random_value = rand_double * (1.0 - (-1.0)) + (-1.0)
            offspring_crossover[idx, gene_idx] = offspring_crossover[idx, gene_idx] + random_value
            gene_idx = gene_idx + mutations_counter
    return offspring_crossover

@cython.wraparound(False)
@cython.nonecheck(False)
@cython.boundscheck(False)
cpdef optimize():
    cdef numpy.ndarray equation_inputs, parents, new_population, fitness, offspring_crossover, offspring_mutation
    cdef int num_weights, sol_per_pop, num_parents_mating, num_generations
    cdef list pop_size
    cdef double t1, t2, t

    equation_inputs = numpy.array([4,-2,3.5,5,-11,-4.7])
    num_weights = equation_inputs.shape[0]
    sol_per_pop = 8
    num_parents_mating = 4
    pop_size = [sol_per_pop, num_weights]
    # Creating the initial population.
    new_population = numpy.random.uniform(low=-4.0, high=4.0, size=pop_size)
    num_generations = 10000
    t1 = time.time()
    for generation in range(num_generations):
        fitness = cal_pop_fitness(equation_inputs, new_population)
        parents = select_mating_pool(new_population, fitness, num_parents_mating)
        offspring_crossover = crossover(parents, offspring_size=(pop_size[0]-parents.shape[0], num_weights))
        offspring_mutation = mutation(offspring_crossover, num_mutations=2)
        new_population[0:parents.shape[0], :] = parents
        new_population[parents.shape[0]:, :] = offspring_mutation
    t2 = time.time()
    t = t2-t1
    print("Total Time %.20f" % t)
    print(cal_pop_fitness(equation_inputs, new_population))
```

# Conclusion

This tutorial used Cython to reduce the execution time of a genetic algorithm implemented in Python with NumPy. We've brought our computational time down from 1.46 seconds to a mere 0.08 seconds, resulting in an 18x speed increase. As a result, we can run 1 million generations in less than 10 seconds with Cython, compared to roughly 146 seconds in Python.

This same methodology can be used for any code written in Python; inspect it line by line, identify bottlenecks, and reduce computation time by implementing the tricks we saw here. You don't necessarily need to know C, but knowledge of C will clearly help you implement faster workarounds. Even without a deep understanding of C, simple tricks like defining variable types can make a big difference when running long or computationally expensive code.