
Automatic Hyperparameter Optimization With Keras Tuner

Learn how to use the search algorithms of Keras Tuner to automatically find the best hyperparameters for TensorFlow models.

2 years ago • 9 min read

By Samuel Ozechi


Hyperparameters are configurations that determine the structure of machine learning models and control their learning process. They shouldn't be confused with the model's parameters (such as weights and biases), whose optimal values are learned during training.

Hyperparameters are adjustable configurations that are set before training and tuned to optimize model performance. They are top-level settings whose values influence the values the model parameters end up learning. The two main types are model hyperparameters (such as the number of layers and units per layer), which determine the structure of the model, and algorithm hyperparameters (such as the optimization algorithm and learning rate), which influence and control the learning process.

Some standard hyperparameters for training neural nets include:

1. Number of hidden layers

2. Number of units for hidden layers

3. The dropout rate - the fraction of nodes randomly dropped out during training. Randomly dropping nodes lets a single model simulate a large number of different network architectures

4. Activation function (ReLU, Sigmoid, Tanh) - defines the output of a node given an input or set of inputs

5. Optimization algorithm (Stochastic Gradient Descent, Adam, RMSprop, etc.) - the method used to update the model parameters so as to minimize the loss function, as evaluated on the training set

6. Loss function - measures how far the model's predictions are from the expected outcomes

7. Learning rate - controls how much to change the model in response to the estimated error each time the model weights are updated

8. Number of training iterations (epochs) - the number of times the learning algorithm works through the entire training dataset

9. Batch size - the number of training samples to work through before the model's internal parameters are updated
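To make the split between hyperparameters and parameters concrete, here is a minimal pure-Python sketch (not Keras code) that fits a toy linear model y = w * x + b with mini-batch stochastic gradient descent. The learning rate, number of epochs and batch size are fixed before training, while w and b are learned. The function name train_linear_model and the toy data are illustrative assumptions.

```python
import math
import random

def train_linear_model(xs, ys, learning_rate, epochs, batch_size, seed=0):
    """Fit y = w * x + b with mini-batch SGD on the mean squared error.

    learning_rate, epochs and batch_size are hyperparameters chosen before
    training; w and b are parameters learned during training.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0                 # parameters start from arbitrary values
    indices = list(range(len(xs)))
    updates = 0
    for _ in range(epochs):         # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # gradients of the mean squared error over this mini-batch
            errors = [w * xs[i] + b - ys[i] for i in batch]
            grad_w = sum(2 * e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # the learning rate scales the size of each update
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
            updates += 1
    return w, b, updates

# toy data generated exactly from y = 2x + 1
xs = [i / 10 for i in range(10)]
ys = [2 * x + 1 for x in xs]
w, b, updates = train_linear_model(xs, ys, learning_rate=0.1, epochs=500, batch_size=4)
print(round(w, 3), round(b, 3), updates)

# parameter updates per epoch = ceil(number of samples / batch size)
print(math.ceil(37_500 / 32))  # 1172
```

The same arithmetic applies to Keras: with the 37,500-image training split used later in this tutorial and Keras's default batch size of 32, one epoch corresponds to 1,172 weight updates.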

When building machine learning models, hyperparameters are set to guide the training process. Depending on the performance of the model after initial training, these values are repeatedly adjusted to improve the model, until a combination of values that produces the best results is chosen. The process of adjusting hyperparameters to obtain the right set of values that optimizes the performance of machine learning models is known as Hyperparameter Tuning.

Tuning hyperparameters can be challenging in deep learning. This is mainly because many configurations need to be set correctly, improving performance takes several trials of re-adjusting these values, and sub-optimal values lead to poor results. In practice, these values are usually set and fine-tuned based on general principles for specific problems (e.g. using the softmax activation function for multiclass classification), prior experience from building models (e.g. progressively halving the number of units in successive hidden layers), domain knowledge, and the size of the input data (e.g. building simpler networks for smaller datasets).

Even with this understanding, it is still difficult to come up with perfect values for these hyperparameters. Practitioners often determine the best hyperparameters by trial and error: they initialize the values based on their understanding of the problem, intuitively adjust them over several training trials according to the model's performance, and finally choose the values that give the best performance.

Manually fine-tuning hyperparameters this way is often laborious, time-consuming, sub-optimal, and an inefficient use of computing resources. An alternative approach is to use scalable hyperparameter search algorithms such as Bayesian optimization, random search and Hyperband. Keras Tuner is a scalable hyperparameter optimization framework for Keras that provides these algorithms built in for tuning deep learning models. It also provides a tuner for Scikit-Learn models.
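Random search is the simplest of these algorithms: sample hyperparameter values from a search space, evaluate each combination, and keep the best one. The sketch below shows that loop in plain Python. toy_objective is a hypothetical stand-in for training a model and returning its validation accuracy (a real trial would build and fit a network), and the value ranges are borrowed from the ones used later in this tutorial.

```python
import math
import random

def toy_objective(learning_rate, units):
    """Hypothetical stand-in for 'train a model, return validation accuracy'.

    Made-up function that peaks at learning_rate=1e-3 and units=64.
    """
    lr_penalty = 0.1 * (math.log10(learning_rate) + 3) ** 2
    units_penalty = ((units - 64) / 100) ** 2
    return 1.0 - lr_penalty - units_penalty

def random_search(objective, n_trials=50, seed=0):
    """Sample n_trials random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_score, best_params = -math.inf, None
    for _ in range(n_trials):
        params = {
            # log-scale sample between 1e-4 and 1e-2
            "learning_rate": 10 ** rng.uniform(-4, -2),
            # one value from the grid 32, 64, ..., 256
            "units": rng.choice(range(32, 257, 32)),
        }
        score = objective(**params)
        if score > best_score:  # keep the best trial seen so far
            best_score, best_params = score, params
    return best_score, best_params

score, params = random_search(toy_objective)
print(params, round(score, 3))
```

Bayesian optimization and Hyperband keep this trial-and-evaluate loop but spend trials more carefully: Bayesian optimization uses the results of earlier trials to pick promising values, and Hyperband stops weak trials early.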

In this article, we will learn how to use various functions of the Keras Tuner to perform an automatic search for optimal hyperparameters. The task is to use the Keras Tuner to obtain optimal hyperparameters for building a model that accurately classifies the images of the CIFAR-10 dataset.


1. Setup.

Using Keras Tuner requires installing the TensorFlow and Keras Tuner packages and importing the libraries needed to build our model.
KerasTuner requires Python 3.6+ and TensorFlow 2.0+. These come pre-installed on Gradient Machines.

# install required packages
pip install tensorflow
pip install keras_tuner

# import required packages
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense, Flatten, Convolution2D, BatchNormalization
from tensorflow.keras.layers import ReLU, MaxPool2D, AvgPool2D, GlobalAvgPool2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model  # plot_model also needs the pydot and graphviz packages
import keras_tuner as kt
from sklearn.model_selection import train_test_split

2. Load and Prepare the Dataset.

We will load the CIFAR-10 dataset, which contains 50,000 training and 10,000 test images across 10 object classes. You can read more about the dataset here. We also normalize the image pixel values to the [0, 1] range so that all inputs share the same scale, which simplifies training, and hold out part of the training data as a validation set.

A preprocessed version of the dataset is available in the keras.datasets module for easy access.

2.1 Load the dataset and normalize the image pixel values.

# load the CIFAR-10 dataset from keras
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# normalize the image pixel values from [0, 255] to [0, 1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# split the training data into train (75%) and validation (25%) sets
# train_test_split returns (train_x, val_x, train_y, val_y) in that order
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25)
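train_test_split returns the split arrays in the order (train inputs, validation inputs, train labels, validation labels), which is easy to mix up. The pure-Python sketch below uses split_train_val, a hypothetical helper rather than a scikit-learn function, to mimic that return order and the rounding of the validation size (scikit-learn rounds a fractional test_size up). It shows the resulting sizes for 50,000 CIFAR-10 training images and what dividing by 255 does to pixel values.

```python
import math
import random

def split_train_val(samples, labels, val_fraction=0.25, seed=0):
    """Shuffle paired samples and labels, then hold out a validation set.

    Hypothetical helper that mimics the return order of scikit-learn's
    train_test_split: (train_x, val_x, train_y, val_y).
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_val = math.ceil(len(samples) * val_fraction)  # validation size, rounded up
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return ([samples[i] for i in train_idx], [samples[i] for i in val_idx],
            [labels[i] for i in train_idx], [labels[i] for i in val_idx])

# stand-ins for the 50,000 CIFAR-10 training images and their labels
samples = list(range(50_000))
labels = [s % 10 for s in samples]
train_x, val_x, train_y, val_y = split_train_val(samples, labels)
print(len(train_x), len(val_x))  # 37500 12500

# dividing 8-bit pixel values by 255 maps them into [0, 1]
print([p / 255.0 for p in (0, 51, 255)])  # [0.0, 0.2, 1.0]
```

Because samples and labels are indexed with the same shuffled positions, each image stays paired with its label in both splits.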

3. Building a Hypermodel.

Now that we have set up our environment and prepared the input data, we can build our model for hyperparameter tuning. This is done by using Keras Tuner to define a search model (known as a hypermodel), which is then passed to a tuner for hyperparameter tuning.

Hypermodels can be defined by creating a custom model-building function, by using the built-in models, or by subclassing the HyperModel class for advanced use cases.
We will be using the first two approaches to create search models for tuning our hyperparameters automatically.

3.a. Using a Custom Model.

To use a custom model, we will define a model-building function that specifies the layers we need, tailors the search space for finding the best hyperparameter values, and sets default values for the hyperparameters when we are not tuning them.

3.a.1 Define A Model-Building Function.

The function takes an argument (hp), an instance of Keras Tuner's HyperParameters class, which is used to define the search space for each hyperparameter value. The function also compiles and returns the hypermodel. We will use the Keras functional API to build our model.

# function to build a hypermodel
# takes an argument (hp) from which to sample hyperparameters
def build_model(hp):

  inputs = Input(shape = (32, 32, 3)) #input layer
  x = inputs

  # iterate a number of conv blocks from min_value to max_value
  # tune the number of filters
  # choose an optimal value from min_value to max_value
  for i in range(hp.Int('conv_blocks',min_value = 3, max_value = 5, default=3)): # hp.Int samples integer values (3, 4 or 5)
    filters = hp.Int('filters_' + str(i),min_value = 32,max_value = 256, step=32) 

    for _ in range(2):
      # define the conv, BatchNorm and activation layers for each block
      x = Convolution2D(filters, kernel_size=(3, 3), padding= 'same')(x)
      x = BatchNormalization()(x)
      x = ReLU()(x)

    # choose an optimal pooling type
    if hp.Choice('pooling_' + str(i), ['avg', 'max']) == 'max': # hp.Choice chooses from a list of values
        x = MaxPool2D()(x)
    else:
        x = AvgPool2D()(x)

  x = GlobalAvgPool2D()(x) # apply global average pooling

  # Tune the number of units in the Dense layer
  # Choose an optimal value from min_value to max_value
  x = Dense(hp.Int('Dense units',min_value = 30, max_value = 100, step=10, default=50), activation='relu')(x)
  outputs = Dense(10, activation= 'softmax')(x) # output layer
  
  # define the model
  model = Model(inputs, outputs)

  # Tune the learning rate for the optimizer
  # Choose an optimal value from min_value to max_value (sampled on a log scale)
  model.compile(optimizer= Adam(hp.Float('learning_rate', min_value = 1e-4, max_value =1e-2, sampling='log')), 
                loss= 'sparse_categorical_crossentropy', metrics = ['accuracy'])
  return model

Understanding the code.

Line 3: We define a model-building function (build_model) that takes an argument (hp), an instance of Keras Tuner's HyperParameters class, which we use to define the search space for the hyperparameter values.

Lines 5-6: We define our input layer and assign it to a variable (x).

Line 11: We define a search space for the number of convolution blocks. The hp.Int function creates an integer search space that includes both min_value and max_value, so the tuner searches over 3, 4 and 5 convolution blocks for the value that maximizes validation accuracy.

Line 12: We define a search space for the number of filters in each block. Each block i gets its own hyperparameter (filters_0, filters_1, ...), and a step of 32 means the candidate values are 32, 64, 96, ..., 256.

Lines 14-24: Each block applies two rounds of convolution, batch normalization and ReLU activation, with the sampled number of filters passed to its convolution layers, followed by a pooling layer. The hp.Choice function defines a categorical search space, so the tuner selects either average or max pooling for each block.

Lines 26-31: We apply global average pooling and a dense layer whose number of units is searched from 30 to 100 in steps of 10. We also define the output layer with a softmax activation over the 10 classes.

Lines 34-40: Finally, we define the model using the input and output layers, compile it, and return the built hypermodel.

For compiling the model, we define a learning rate search space with the hp.Float function, covering 0.0001 to 0.01. With sampling='log', values are drawn on a logarithmic scale, so each order of magnitude in that range is equally likely to be explored.
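To get a feel for how large this search space is, the sketch below enumerates the discrete candidate values defined in build_model with plain Python ranges and checks what log-scale sampling of the learning rate means. It mirrors the semantics described above; it is not Keras Tuner's own sampling code.

```python
import math
import random

# Discrete candidate values from build_model. Like hp.Int, these ranges
# include both min_value and max_value, spaced by step.
conv_blocks = list(range(3, 5 + 1))         # [3, 4, 5]
filters = list(range(32, 256 + 1, 32))      # 8 values: 32, 64, ..., 256
dense_units = list(range(30, 100 + 1, 10))  # 8 values: 30, 40, ..., 100
pooling = ["avg", "max"]                    # like hp.Choice

# filters_i and pooling_i only apply to blocks that exist, so count the
# architectures separately for each possible number of blocks.
per_block = len(filters) * len(pooling)
n_architectures = sum(per_block ** n for n in conv_blocks) * len(dense_units)
print(n_architectures)  # 8945664

# sampling='log' draws uniformly in log10 space between the bounds, so
# 1e-4..1e-3 and 1e-3..1e-2 are equally likely to be sampled.
rng = random.Random(0)
low, high = math.log10(1e-4), math.log10(1e-2)
lrs = [10 ** rng.uniform(low, high) for _ in range(10_000)]
share_low_decade = sum(lr < 1e-3 for lr in lrs) / len(lrs)
print(share_low_decade)  # close to 0.5
```

Nearly nine million discrete architectures, before even counting the continuous learning rate, is why an exhaustive grid search is impractical here and a sampling-based search algorithm is used instead.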

3.a.2 Initializing The Search Algorithm (Tuner).

After building the hypermodel, we can initialize our search algorithm. Keras Tuner provides several built-in search algorithms, such as Bayesian Optimization, Hyperband and Random Search.

We will use the Hyperband search algorithm for our example. The tuner takes parameters such as the hypermodel, an objective metric for evaluating models, the maximum number of epochs to train a single model (max_epochs), the number of times to iterate over the full Hyperband algorithm (hyperband_iterations), a directory for saving the tuning results, and a project_name.

# initialize tuner to run the model.
# using the Hyperband search algorithm
tuner = kt.Hyperband(
    hypermodel = build_model,
    objective='val_accuracy',
    max_epochs=30,
    hyperband_iterations=2,
    directory="Keras_tuner_dir",
    project_name="Keras_tuner_Demo")
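Hyperband speeds up the search by training many configurations for a few epochs and continuing only the most promising ones (successive halving), repeated over several brackets that trade the number of configurations against the epochs each one gets. The sketch below computes that schedule as described in the original Hyperband paper, for max_epochs=30 and a reduction factor of 3 (the default factor of kt.Hyperband). Keras Tuner's internal bookkeeping differs in details such as integer epoch counts, so treat the numbers as an illustration of the idea rather than the exact trials it will run.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def hyperband_brackets(max_epochs=30, factor=3):
    """Successive-halving schedule from the original Hyperband paper.

    Returns one list per bracket; each entry is (num_configs, epochs_each)
    for one round. Keras Tuner's implementation differs in details.
    """
    # s_max = floor(log_factor(max_epochs)), computed with integers
    s_max = 0
    while factor ** (s_max + 1) <= max_epochs:
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # early brackets start many short trials, later ones few long trials
        n = ceil_div((s_max + 1) * factor ** s, s + 1)
        rounds = []
        for i in range(s + 1):
            n_i = n // factor ** i                     # keep the top 1/factor each round
            epochs_i = max_epochs / factor ** (s - i)  # survivors train factor times longer
            rounds.append((n_i, epochs_i))
        brackets.append(rounds)
    return brackets

brackets = hyperband_brackets(max_epochs=30, factor=3)
for rounds in brackets:
    print([(n, round(e, 2)) for n, e in rounds])
```

In the first bracket, 27 configurations start with about one epoch each and only one survives to train for the full 30 epochs; the last bracket trains 4 configurations for 30 epochs straight away.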

3.b. Using Built-in Models.

Keras Tuner provides tunable built-in models such as HyperResNet and HyperXception, which search through different configurations of the ResNet and Xception architectures respectively. Defining a tuner with a built-in model is similar to using the model-building function. Here we assign it to a separate variable (resnet_tuner) so that it does not replace the Hyperband tuner used in the rest of this tutorial.

# Initialize a random search tuner
# using the ResNet architecture
# and the Random Search algorithm
resnet_tuner = kt.tuners.RandomSearch(
  kt.applications.HyperResNet(input_shape=(32, 32, 3), classes=10),
  objective='val_accuracy',
  max_trials=30)

4. Run the Search for Optimal Hyperparameters.

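We can then use our tuner to search for the optimal hyperparameters within the defined search space. The search method takes the same arguments as a Keras model's fit method. With Hyperband, the tuner itself decides how many epochs each trial trains for (up to max_epochs).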

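# Run the search, validating each trial on the held-out validation set
# so the test set stays untouched until the final evaluation
tuner.search(x_train, y_train,
             validation_data=(x_val, y_val),
             epochs=30,
             callbacks=[tf.keras.callbacks.EarlyStopping(patience=2)])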

5. Get and Display the Optimal Hyperparameters and Model.

The best hyperparameters found within the defined search space can be retrieved with the tuner's get_best_hyperparameters method, and the best model with the get_best_models method. Both return lists sorted from best to worst, so index 0 is the top result.

# Get the optimal hyperparameters
best_hps= tuner.get_best_hyperparameters(1)[0]

# get the best model
best_model = tuner.get_best_models(1)[0]
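Conceptually, this ranking sorts the completed trials by the objective and returns them best first; for val_accuracy, higher is better. The sketch below models that ranking in plain Python with hypothetical trial records and made-up scores (not results from this tutorial); best_trials is an illustrative helper, not part of Keras Tuner.

```python
# Hypothetical completed trials with made-up validation accuracies.
trials = [
    {"trial_id": "00", "hyperparameters": {"conv_blocks": 3, "learning_rate": 1e-2}, "val_accuracy": 0.71},
    {"trial_id": "01", "hyperparameters": {"conv_blocks": 5, "learning_rate": 3e-4}, "val_accuracy": 0.83},
    {"trial_id": "02", "hyperparameters": {"conv_blocks": 4, "learning_rate": 1e-3}, "val_accuracy": 0.79},
]

def best_trials(trials, objective="val_accuracy", direction="max", num_trials=1):
    """Return the top num_trials trials, best first, ranked by the objective."""
    ranked = sorted(trials, key=lambda t: t[objective], reverse=(direction == "max"))
    return ranked[:num_trials]

best = best_trials(trials)[0]  # analogous to get_best_hyperparameters(1)[0]
print(best["trial_id"], best["hyperparameters"])
```

For a loss-style objective the direction flips to "min", so the lowest value ranks first.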

We can also print the best hyperparameters. For our custom hypermodel, this can be done as follows:

nblocks = best_hps.get('conv_blocks')
print(f'Number of conv blocks: {nblocks}')
for hyparam in [f'filters_{i}' for i in range(nblocks)] + [f'pooling_{i}' for i in range(nblocks)] + ['Dense units'] + ['learning_rate']:
    print(f'{hyparam}: {best_hps.get(hyparam)}')

This displays the optimal values for the number of convolution blocks, filters and units for the convolution and dense layers, choices of pooling layer and the learning rate.

We can also view the summary and structure of the optimal model using the appropriate Keras functions.

# display model structure
plot_model(best_model, 'best_model.png', show_shapes=True)

# show model summary
best_model.summary()

6. Training The Model.

Finally, we will build a model using the optimal hyperparameters before calling the fit function for training the model.

# Build the model with the optimal hyperparameters
# train the model.
model = tuner.hypermodel.build(best_hps)
model.fit(x_train, y_train, 
          validation_data= (x_val,y_val), 
          epochs= 25,
           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5)])

Here we train the model for up to 25 epochs and add an EarlyStopping callback that stops training once the validation loss has not improved for 5 consecutive epochs.
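The patience argument decides when training stops. The following simplified model of EarlyStopping's default behavior (monitoring val_loss, where lower is better) counts consecutive epochs without improvement and stops once that count reaches patience. epochs_until_early_stop and the loss values are illustrative, not output from this tutorial.

```python
def epochs_until_early_stop(val_losses, patience, min_delta=0.0):
    """Return how many epochs run before patience-based early stopping triggers.

    Simplified model of EarlyStopping(monitor='val_loss'): training stops once
    `patience` consecutive epochs fail to beat the best loss by more than min_delta.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch       # stop after this epoch
    return len(val_losses)         # never triggered: all epochs ran

# made-up validation losses: improvement stalls after epoch 3, then resumes
losses = [1.00, 0.80, 0.70, 0.72, 0.75, 0.60, 0.55]
print(epochs_until_early_stop(losses, patience=2))
print(epochs_until_early_stop(losses, patience=5))
```

With patience=2 training stops after epoch 5 and never sees the later improvement, while patience=5 rides out the plateau. By default, EarlyStopping keeps the weights from the last epoch it ran; passing restore_best_weights=True restores the weights from the best epoch instead.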

7. Evaluate The Model.

We can evaluate the model on the test set. We will be evaluating the model using the loss and accuracy score of the model. You can try out other metrics as applicable.

# evaluate the result
eval_result = model.evaluate(x_test, y_test)
print(f"test loss: {eval_result[0]}, test accuracy: {eval_result[1]}")
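For integer labels such as CIFAR-10's class indices, the two numbers returned by evaluate come from the sparse categorical cross-entropy loss and the accuracy metric set at compile time. The sketch below computes both by hand for a couple of made-up softmax outputs. sparse_categorical_metrics is a hypothetical helper, and it omits the numerical safeguards Keras uses to avoid log(0).

```python
import math

def sparse_categorical_metrics(probabilities, labels):
    """Mean sparse categorical cross-entropy and accuracy for integer labels.

    probabilities: one softmax output (class probabilities) per sample.
    labels: the integer class index of each sample.
    """
    total_loss, correct = 0.0, 0
    for probs, label in zip(probabilities, labels):
        # low probability on the true class means high loss
        total_loss += -math.log(probs[label])
        predicted = max(range(len(probs)), key=lambda k: probs[k])  # argmax
        correct += int(predicted == label)
    return total_loss / len(labels), correct / len(labels)

# made-up softmax outputs for two samples over three classes
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.3, 0.6]]
labels = [0, 1]
loss, accuracy = sparse_categorical_metrics(probs, labels)
print(round(loss, 4), accuracy)
```

The second sample is misclassified (class 2 is predicted instead of class 1), so accuracy is 0.5, and its low probability on the true class dominates the loss.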

Summary.

Hyperparameters are key determinants for the performance of machine learning models and tuning them with a trial and error approach is inefficient. Keras Tuner applies search algorithms to automatically find the best hyperparameters in a defined search space.

In this article, we used Keras Tuner to determine the best hyperparameters for a multiclass classification task. We defined a search space in a hypermodel, using both a custom model-building function and a built-in model, and then used the provided search algorithms to automatically explore combinations of values and find an optimal set of hyperparameters for our model.

You can check out the Keras Tuner guides for more on visualizing the tuning process in TensorBoard, distributing the hyperparameter search, tailoring the search space, and subclassing the Tuner class for advanced use cases.
