Intro
Building machine learning models can be a simple process when we have the right data to train on. An example would be a situation where we need to train a model to predict whether a customer will churn. If we had a sufficiently well-made dataset, all we would have to do is plug that data into a model built with a library like Brain.js or TensorFlow, and use a back-propagation algorithm to optimize it. With that, we would have a model that takes our input data and returns a prediction of whether or not a customer will stop buying a product.
But what if we don't have good data, and gathering data is beyond our capabilities? Then we have to define our target goal, or what we think an optimal answer would be, and that's where algorithms like genetic algorithms and reinforcement learning come into play. In these paradigms, we define a goal, and our model interacts with its environment, learning from its previous experiences until it is able to solve its defined task.
Series Recap
So far, we have gone through the basics of the genetic algorithm, its components, and the concepts behind its functioning that enable it to find optimal solutions to both simple and hard problems. We did so by using one to guess a phrase: we compared how many letters each instance got correct, and used the instances closest to being correct to create child instances, or offspring, that carry information about our problem space into the next generation. You can check out that tutorial here. In the second article, we used those learnings to optimize a deep neural network. We applied the genetic algorithm to a simple optimization problem, adding 2 numbers passed into the network, and we built our own custom neural network so we could manipulate the values of its weights and biases directly, instead of going through an already-made library like TensorFlow.
The goal of this article
For today's article, we are going to go over building the environment that our model will interact with, making changes to that environment, and reading its state.
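As a preview, here is a minimal sketch of what that environment could look like in code. The board is assumed to be a flat array of nine cells, and createBoard, mark, and emptySpaces are illustrative names rather than library functions:

```javascript
// The board "environment": nine cells holding 'X', 'O', or null (empty).
function createBoard() {
  return new Array(9).fill(null);
}

// Change the environment: mark a cell, but only if it is empty.
function mark(board, index, symbol) {
  if (board[index] !== null) return false; // occupied: the move is rejected
  board[index] = symbol;
  return true;
}

// Read the state of the environment: indexes of all empty cells.
function emptySpaces(board) {
  return board
    .map((cell, i) => (cell === null ? i : -1))
    .filter(i => i !== -1);
}
```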
Setting up our Player
We need to have a player object capable of looking at the environment and making decisions based on the state of the environment.
A Naive player
Here is the Player class we will use in this tutorial:
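Below is a minimal sketch of such a naive player, reusing the emptySpaces helper from the environment sketch above; the exact class shape is an assumption based on the behavior described in this section:

```javascript
// A naive Player: picks a random empty space and can check for a win.
class Player {
  constructor(symbol) {
    this.symbol = symbol; // 'X' or 'O'
  }

  // Look at the environment and randomly pick one of the empty spaces.
  move(board) {
    const options = emptySpaces(board);
    if (options.length === 0) return false; // board is full
    const choice = options[Math.floor(Math.random() * options.length)];
    board[choice] = this.symbol;
    return true;
  }

  // Declare a win: three of our marks in a row, column, or diagonal.
  win(board) {
    const lines = [
      [0, 1, 2], [3, 4, 5], [6, 7, 8], // horizontal
      [0, 3, 6], [1, 4, 7], [2, 5, 8], // vertical
      [0, 4, 8], [2, 4, 6],            // diagonal
    ];
    return lines.some(line => line.every(i => board[i] === this.symbol));
  }
}
```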
Our naive player is able to look at the environment, view empty spaces, and randomly pick one of those spaces to draw either an X or an O on. It does not take advantage of our genetic algorithm model; it isn't able to learn how to play the game and just chooses random empty spaces to play. We will later give our player the ability to make more educated choices once we set up our genetic algorithm.
Our player class is able to look at the board and declare whether it won after a move by calling the win() function. Wins can be vertical, horizontal, or diagonal.
Design for our model
Designing a way of solving this problem can be a bit tricky. We need to figure out how to represent the 3 different states on our board. These 3 states are:
- A space occupied by our agent
- A space occupied by an opponent
- An empty space
If we had 2 states, we could conveniently represent one as 1 and the other as 0. But three states leave us with two choices: represent the values in the range -1 to 1 (-1, 0, 1), or use an array of length 18, where indexes 0 to 8 mark the spaces our agent occupies and indexes 9 to 17 mark the spaces occupied by our opponent. Let's try the second option for this tutorial. You can see how this array is formed below.
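As a sketch, and assuming the board representation from earlier, the encoding could look like this (encodeBoard is an illustrative name):

```javascript
// Encode the 9-cell board into a length-18 input vector:
// indexes 0-8 flag our agent's marks, indexes 9-17 the opponent's marks.
function encodeBoard(board, agentSymbol) {
  const inputs = new Array(18).fill(0);
  board.forEach((cell, i) => {
    if (cell === agentSymbol) inputs[i] = 1;   // occupied by our agent
    else if (cell !== null) inputs[i + 9] = 1; // occupied by the opponent
    // empty spaces stay 0 in both halves
  });
  return inputs;
}
```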
We opt to use ReLU for our hidden layers because, as usual, it outperformed sigmoid (to nobody's surprise).
Our model takes the 18-element board encoding as input, passes it through ReLU hidden layers, and emits 9 softmax outputs, one for each position on the board.
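Here is a rough sketch of a forward pass through such a model; the weight and bias names (w1, b1, w2, b2) and the single hidden layer are assumptions for illustration:

```javascript
// ReLU for the hidden layer, softmax for the output layer.
const relu = v => v.map(x => Math.max(0, x));
const softmax = v => {
  const max = Math.max(...v);
  const exps = v.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
};

// One fully connected layer: weights[j][i] and biases[j].
const dense = (inputs, weights, biases) =>
  weights.map((row, j) => row.reduce((acc, w, i) => acc + w * inputs[i], biases[j]));

// 18 inputs -> ReLU hidden layer -> 9 softmax outputs (one per board cell).
function predict(model, inputs) {
  const hidden = relu(dense(inputs, model.w1, model.b1));
  return softmax(dense(hidden, model.w2, model.b2));
}
```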
The new version of our genetic algorithm code
For this article, you should have already gone through part 1 and part 2 of this series and understand the basics. We are going to use the same code template and go over the changes in the codebase.
The Code
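The full listing builds on the template from parts 1 and 2. A skeleton of its shape, with randomModel and breed standing in for the model-creation, crossover, and mutation code from part 2, might look like this:

```javascript
// Skeleton of the updated genetic algorithm loop.
function solve({ populationSize = 100, generations = 1000 } = {}) {
  // randomModel() creates a network with random weights and biases (part 2).
  let population = Array.from({ length: populationSize }, () => randomModel());

  for (let gen = 0; gen < generations; gen++) {
    // Fitness comes from round-robin games between siblings (see below).
    const fitnesses = computeFitnesses(population);
    console.log(`generation ${gen}: best fitness ${Math.max(...fitnesses)}`);

    // breed() selects the fittest models and applies crossover and mutation.
    population = breed(population, fitnesses, populationSize);
  }
  return population;
}
```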
Now that we've looked at the code, let's go through the functions and see what is new in our codebase.
Fitness function
Our fitness function for making an agent learn to play is very simple. Our objective is to reduce the number of times we lose. If this were gradient descent, our cost function would be an equation that tells us how many times an agent loses a game. Knowing this, we can derive our fitness function as the ratio of games not lost to total games played.
For now, we consider draws and wins equally desirable outcomes and optimize for either of them, as this minimizes losses.
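Expressed as code, that ratio is just:

```javascript
// Fitness = games not lost / total games played.
// Draws and wins both count as "not lost", so both raise fitness equally.
function fitness(losses, totalGames) {
  return (totalGames - losses) / totalGames;
}
```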
To measure the fitness of our agent, we need to actually test it against an opponent, and there is no better opponent than the other agents we create. Each agent competes with its siblings, playing both as the first player and as the second player, because in a tic-tac-toe game a player can move either first or second.
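A sketch of that round-robin, assuming playAGame (covered next) returns 0 when the first player loses, 1 when the second player loses, and -1 for a draw:

```javascript
// Every agent plays every sibling twice: once moving first, once second.
function computeFitnesses(population) {
  const losses = new Array(population.length).fill(0);

  for (let a = 0; a < population.length; a++) {
    for (let b = 0; b < population.length; b++) {
      if (a === b) continue;
      const loser = playAGame(population[a], population[b]); // a moves first
      if (loser === 0) losses[a]++;
      if (loser === 1) losses[b]++;
    }
  }

  // Each agent played every sibling twice: once as first player, once as second.
  const gamesPerAgent = 2 * (population.length - 1);
  return losses.map(l => fitness(l, gamesPerAgent));
}
```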
PlayAGame Function
This function takes in 2 players and simulates an entire game between our agent and its opponent. We do this to see if our agent is better than its opponent. We are keener on preventing our agent from losing, so a draw is okay, but we could optimize for a win by scoring wins as 1 and draws as half (0.5). (Try this out after you finish this tutorial to see how well it works!)
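A sketch of such a function, using the createBoard helper from earlier and the model-driven Player we complete later in this article; the return convention (0 = first player lost, 1 = second player lost, -1 = draw) is an assumption:

```javascript
// Simulate a full game between two models.
function playAGame(firstModel, secondModel) {
  const board = createBoard();
  const players = [new Player('X', firstModel), new Player('O', secondModel)];

  for (let turn = 0; turn < 9; turn++) {
    const current = turn % 2;
    // An illegal move (an occupied box) immediately loses the game.
    if (!players[current].move(board)) return current;
    if (players[current].win(board)) return 1 - current; // the other player lost
  }
  return -1; // board is full with no winner: a draw
}
```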
How do we make the agent not play in an occupied box?
An intuition we want to give our agent is that it can't play a box already occupied by it or its opponent. To train this behavior in, we simply declare the game lost whenever an agent makes such a move.
The move function checks whether our chosen box is occupied; if it is, it automatically returns zero, and the game counts against us (we basically discourage that too).
Completing our player Agent
We need to make some changes to our Player class. Currently, it doesn't think before it plays; it just selects a random available space and marks that location. That involves no intuition and no strategy for winning. We need to make our player take advantage of the model we pass into it. Take a look at the code below.
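Here is a sketch of the updated class, reusing encodeBoard and predict from the earlier sketches; only the constructor, move, and the new think function change:

```javascript
// The updated Player thinks with its model instead of playing randomly.
class Player {
  constructor(symbol, model) {
    this.symbol = symbol; // 'X' or 'O'
    this.model = model;   // neural network evolved by our genetic algorithm
  }

  // Encode the board as a 1x18 input, run the model,
  // and pick the position with the highest softmax output.
  think(board) {
    const inputs = encodeBoard(board, this.symbol);
    const outputs = predict(this.model, inputs); // 9 probabilities
    return outputs.indexOf(Math.max(...outputs));
  }

  // Place our mark on the chosen location; choosing an occupied box fails.
  move(board) {
    const choice = this.think(board);
    if (board[choice] !== null) return false; // illegal move: game is forfeited
    board[choice] = this.symbol;
    return true;
  }

  // win(board) stays exactly as in the naive player.
}
```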
Think function
Our think function takes advantage of our model by taking the state of the environment and creating a single 1×18 matrix, which we pass into the machine learning model to generate an output. (Note: our model uses a softmax activation function; we select the index of the max value as the position for the player to mark.)
Move Function
The move is obtained from the think function: think maps the model's answer to a location on the board, and the move function places the player's mark on that location, thereby occupying it.
Running our code
To run the code, all we need to do is call the solve function, as in the snippet below:
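For example (the option names mirror the skeleton sketched earlier and are assumptions):

```javascript
// Evolve a population of tic-tac-toe agents.
const population = solve({ populationSize: 100, generations: 1000 });
```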
We tested the codebase with population sizes of 100, 200, and 1000, and found that it trains most efficiently with a population size of 100. It optimizes pretty quickly, and we were able to get a top fitness of 84%. That isn't bad, but it means that our best model only won or drew 84 percent of its games.
Here is an example output:
Conclusion
Following along with this tutorial series, we created a machine learning model that optimizes the values of its weights with a genetic algorithm. We started by building an understanding of how genetic algorithms work by evolving a phrase to match a known phrase; by doing this, we were able to see how genetic algorithms find answers to problems. In part 2, we used one to update the weights of a neural network. In this tutorial, we finally applied all of that knowledge and built a model that solves our tic-tac-toe problem using neural networks and genetic algorithms. We can apply this method to different problem spaces, from chess to wall breaker to predicting stock prices.