This tutorial is part of a 7-part series on Dimension Reduction:
- Understanding Dimension Reduction with Principal Component Analysis (PCA)
- Diving Deeper into Dimension Reduction with Independent Components Analysis (ICA)
- Multidimensional Scaling (MDS)
(This post assumes you have a working knowledge of neural networks. A notebook with the code is available in the GitHub repo.)
An autoencoder is a neural network whose primary purpose is to learn the underlying manifold, or feature space, of a dataset. It does this by trying to reconstruct its inputs at its outputs. Unlike other non-linear dimension reduction methods, autoencoders do not strive to preserve a single property such as distance (MDS) or topology (LLE). An autoencoder generally consists of two parts: an encoder, which transforms the input into a hidden code, and a decoder, which reconstructs the input from the hidden code. A simple example of an autoencoder is the neural network shown in the diagram below.
One might wonder: "What is the use of an autoencoder if the output is the same as the input? How does feature learning or dimension reduction happen if the end result is the same as the input?"
The assumption behind autoencoders is that the transformation input --> hidden --> input will help us learn important properties of the dataset. Which properties the network learns depends on the restrictions placed on it.
Types of Autoencoders
Let's discuss a few popular types of autoencoders.
1. Regularized Autoencoders: These autoencoders add regularization terms to their loss functions to achieve the desired properties. The regularization, not the size of the code, is what constrains the network. For this reason the hidden code can even be larger than the input.
1.1 Sparse Autoencoders: A sparse autoencoder adds a sparsity penalty on the hidden layer. This regularization forces the network to activate only a few hidden units for each data sample. A hidden unit j counts as activated if its output is close to 1, and as deactivated if its output is close to 0. A deactivated unit therefore passes (almost) nothing to the next layer. This restriction forces the network to condense the data and store only its most important features. The loss function of a sparse autoencoder can be written as
$$L(W, b) = J(W, b) + \text{regularization term}$$
where $J(W, b)$ is the usual reconstruction cost.
The middle layer represents the hidden layer. The green and red nodes represent the deactivated and activated nodes respectively.
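The text leaves the regularization term generic. One common choice, assumed here, is a KL-divergence penalty. It pushes the average activation of each sigmoid hidden unit over a batch toward a small target value `rho`. The following is a minimal NumPy sketch, not the author's implementation. The names `rho`, `beta`, `kl_sparsity_penalty` and `sparse_ae_loss` are illustrative assumptions.

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, beta=3.0, eps=1e-8):
    """KL-divergence sparsity penalty for sigmoid hidden activations.

    hidden: array (n_samples, n_hidden) with values in (0, 1).
    rho:    assumed target average activation per hidden unit.
    beta:   assumed weight of the penalty in the total loss.
    """
    # average activation of each hidden unit over the batch
    rho_hat = np.clip(hidden.mean(axis=0), eps, 1 - eps)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

def sparse_ae_loss(x, x_hat, hidden, rho=0.05, beta=3.0):
    # J(W, b): squared reconstruction error, averaged over samples
    reconstruction = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    # L(W, b) = J(W, b) + regularization term
    return reconstruction + kl_sparsity_penalty(hidden, rho, beta)
```

The penalty is zero when every unit's average activation equals `rho`. It grows as units become active more often than the target, which is what drives most units toward the deactivated state.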
1.2 Denoising Autoencoders: In a denoising autoencoder, random noise is deliberately added to the input, and the network is trained to reconstruct the original, uncorrupted input. The decoder function thereby learns to resist small changes in the input. This pretraining results in a robust neural network that is, up to a certain extent, immune to noise in the input.
Here, noise sampled from the standard normal distribution is used as the noising function to produce the corrupted input.
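The key point of the denoising setup is that the network receives the corrupted input but is scored against the clean one. A minimal NumPy sketch follows. `autoencoder` stands for any callable that maps inputs to reconstructions, and `noise_std` is an assumed scale parameter. With `noise_std=1.0` the noise is exactly standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(x, noise_std=0.5, rng=rng):
    """Add Gaussian noise: standard normal samples scaled by noise_std."""
    return x + noise_std * rng.standard_normal(x.shape)

def denoising_loss(autoencoder, x, noise_std=0.5, rng=rng):
    # the network sees the corrupted input ...
    x_noisy = corrupt(x, noise_std, rng)
    x_hat = autoencoder(x_noisy)
    # ... but the reconstruction is compared with the clean input
    return np.mean((x - x_hat) ** 2)
```

An identity mapping, which simply copies its input, is penalized by roughly `noise_std ** 2` here. Plain reconstruction would give it zero loss, which is why the denoising objective discourages copying.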
1.3 Contractive Autoencoders: Instead of adding noise to the input, contractive autoencoders add a penalty on large values of the derivative (the Jacobian) of the feature extraction function f(x) with respect to the input. When this derivative is small, insignificant changes in the input produce only negligible changes in the features. In contractive autoencoders the feature extraction function (the encoder) is robust, while in denoising autoencoders the decoder function is robust.
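As an illustration, assume a single-layer sigmoid encoder $h = \sigma(Wx + b)$. The contractive loss is then $\lVert x - g(f(x)) \rVert^2 + \lambda \lVert J_f(x) \rVert_F^2$, and the Jacobian penalty has a closed form. The NumPy sketch below computes that closed form and checks it against finite differences. The sigmoid encoder, the weight shapes and all function names are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    # f(x) = sigmoid(W x + b); W has shape (n_hidden, n_input)
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian df/dx for a sigmoid encoder.

    For h = sigmoid(Wx + b), dh_j/dx_i = h_j (1 - h_j) W_ji, so
    ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
    """
    h = encode(x, W, b)
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

def numerical_jacobian(x, W, b, eps=1e-6):
    # central finite differences; column i holds df/dx_i
    cols = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((encode(x + dx, W, b) - encode(x - dx, W, b)) / (2 * eps))
    return np.stack(cols, axis=1)
```

In training, `lambda * contractive_penalty(...)` would be added to the reconstruction error. Making it small flattens the encoder around the training points.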
2. Variational Autoencoders: Variational autoencoders are based on non-linear latent variable models. In a latent variable model, we assume that the observable variables x are generated from hidden (latent) variables y, which capture important properties of the data. A variational autoencoder consists of two neural networks. The first learns the distribution of the latent variables, and the second generates the observables from a random sample drawn from that distribution. Apart from minimizing the reconstruction loss, these autoencoders also minimize the difference (measured by the KL divergence) between the assumed distribution of the latent variables and the distribution produced by the encoder. They are very popular for generating images.
A good choice for the latent variable distribution is a Gaussian distribution. As shown in the image above, the encoder outputs the parameters (mean and variance) of the assumed Gaussian. Next, a random sample is drawn from this Gaussian, and the decoder reconstructs the input from that sample.
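This NumPy sketch shows the two Gaussian-specific pieces: drawing the latent sample and the KL term against a standard normal prior. It assumes the common convention in which the encoder outputs a mean vector `mu` and a log-variance vector `log_var` for each sample. Sampling is written as `mu + sigma * eps` (the reparameterization trick), which is what lets gradients flow through the sampling step in a real implementation. The function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, log_var, rng=rng):
    """Draw y ~ N(mu, diag(sigma^2)) as y = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
```

The total VAE loss would be the decoder's reconstruction error on `sample_latent(mu, log_var)` plus `kl_to_standard_normal(mu, log_var)`. The KL term is zero only when the encoder outputs exactly the standard normal prior.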
3. Undercomplete Autoencoders: In an undercomplete autoencoder, the hidden layer is smaller than the input layer. By reducing the size of the hidden layer, we force the network to learn the most important features of the dataset. Once training is over, the decoder is discarded and the encoder is used to transform a data sample into the feature subspace. If the decoder is linear and the loss function is MSE (mean squared error), the learned feature subspace is the same as the one found by PCA. For the network to learn something useful, the size of the hidden code should not be close to, or greater than, the input size. Also, a network with too much capacity (deep and highly non-linear) may not learn anything useful. Dimension reduction methods assume that the dimension of the data is artificially inflated and that its intrinsic dimension is much lower. As we increase the number of layers in an autoencoder, the size of the hidden layer will have to decrease. If it becomes smaller than the intrinsic dimension of the data, information will be lost. Moreover, because such a deep network is highly non-linear, the decoder could simply learn to map each hidden code back to a specific training input (memorizing the data) instead of learning useful features.
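The PCA connection can be checked numerically. The minimal NumPy sketch below assumes a purely linear encoder and a purely linear decoder, trained by plain gradient descent on the MSE loss. The toy data, code size, learning rate and iteration count are all illustrative choices. After training, the subspace the decoder reconstructs into is compared with the span of the top principal components. Only the subspace is expected to match: the individual weight vectors are generally not the principal components themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 5 features with very different variances, centered
n, d, k = 1000, 5, 2
X = rng.standard_normal((n, d)) * np.array([5.0, 3.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)

# linear autoencoder: code H = X A (encoder), reconstruction H B (decoder)
A = 0.1 * rng.standard_normal((d, k))
B = 0.1 * rng.standard_normal((k, d))
lr = 0.005
for _ in range(5000):
    H = X @ A
    R = H @ B - X                       # reconstruction residual
    # gradients of the mean (over samples) squared reconstruction error
    grad_B = 2 * H.T @ R / n
    grad_A = 2 * X.T @ (R @ B.T) / n
    A -= lr * grad_A
    B -= lr * grad_B

# PCA subspace: top-k eigenvectors of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)
pca_basis = eigvecs[:, ::-1][:, :k]     # eigh sorts ascending, so reverse

# orthonormal basis of the subspace spanned by the decoder's outputs
ae_basis, _ = np.linalg.qr(B.T)

# cosines of the principal angles between the two subspaces (1.0 = identical)
cosines = np.linalg.svd(pca_basis.T @ ae_basis, compute_uv=False)
print(cosines)
```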
(Figure: a multilayer encoder and decoder.) A simple autoencoder is shown below.
The loss function of an undercomplete autoencoder is given by:
$$L(x, g(f(x))) = \lVert x - g(f(x)) \rVert^2$$
where f is the encoder and g is the decoder.
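The NumPy sketch below spells out f, g and this loss for a one-hidden-layer undercomplete autoencoder. It uses a 784-dimensional input and a 32-dimensional code, matching the MNIST example. It assumes a ReLU encoder and a linear decoder, and the weights are random placeholders rather than trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE_INPUT, SIZE_HIDDEN = 784, 32

# randomly initialized weights; training would learn these
W_enc = rng.standard_normal((SIZE_HIDDEN, SIZE_INPUT)) * 0.01
b_enc = np.zeros(SIZE_HIDDEN)
W_dec = rng.standard_normal((SIZE_INPUT, SIZE_HIDDEN)) * 0.01
b_dec = np.zeros(SIZE_INPUT)

def f(x):
    """Encoder: 784-dim input -> 32-dim code (ReLU)."""
    return np.maximum(0.0, x @ W_enc.T + b_enc)

def g(h):
    """Decoder: 32-dim code -> 784-dim reconstruction (linear)."""
    return h @ W_dec.T + b_dec

def reconstruction_loss(x):
    """L(x, g(f(x))) = ||x - g(f(x))||^2, averaged over the batch."""
    return np.mean(np.sum((x - g(f(x))) ** 2, axis=1))

batch = rng.standard_normal((8, SIZE_INPUT))
codes = f(batch)                        # the reduced representation
print(codes.shape, reconstruction_loss(batch))
```

After training, only `f` is kept for dimension reduction. Each 784-pixel image is then represented by its 32 code values.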
Since this post is about dimension reduction using autoencoders, we will implement an undercomplete autoencoder on PySpark.
There are a few open-source deep learning libraries for Spark, e.g. BigDL from Intel, TensorFlowOnSpark from Yahoo and spark-deep-learning from Databricks.
We will be using Intel's BigDL.
Step 1. Install BigDL
If you have already installed Spark, run
pip install --user bigdl --no-deps
otherwise, run
pip install --user bigdl
In the latter case, pip will install pyspark along with bigdl.
Step 2. Necessary imports
```python
# Jupyter magic: render plots inside the notebook
%matplotlib inline
import numpy as np
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow

# some imports from bigdl
from bigdl.nn.layer import *
from bigdl.nn.criterion import *
from bigdl.optim.optimizer import *
from bigdl.util.common import *
from bigdl.dataset.transformer import *

from pyspark import SparkContext

# create (or reuse) a local SparkContext configured for BigDL
sc = SparkContext.getOrCreate(
    conf=create_spark_conf()
    .setMaster("local")
    .set("spark.driver.memory", "2g"))

# function to initialize the bigdl library
init_engine()
```
Step 3. Load and prepare the data
```python
# bigdl provides a nice function for
# downloading and reading the MNIST dataset
from bigdl.dataset import mnist

mnist_path = "mnist"
images_train, labels_train = mnist.read_data_sets(mnist_path, "train")

# mean and standard deviation of all pixel values in the training set
mean = np.mean(images_train)
std = np.std(images_train)

# parallelize, then center and scale the training images
rdd_images = (sc.parallelize(images_train)
              .map(lambda features: (features - mean) / std))
print("total number of images:", rdd_images.count())
```
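The pipeline standardizes every pixel with a single global mean and standard deviation. The model is therefore trained to reproduce standardized values, not raw pixel intensities. The small NumPy sketch below packages the forward and inverse transforms, so reconstructions can be mapped back to pixel units before they are compared with the raw images. The helper names are hypothetical. The random array only stands in for the MNIST images, which are assumed to be 28x28x1 arrays with values 0 to 255.

```python
import numpy as np

def fit_standardizer(images):
    """Global mean and std over all pixels of the training images."""
    return float(np.mean(images)), float(np.std(images))

def standardize(x, mean, std):
    return (x - mean) / std

def destandardize(z, mean, std):
    """Map standardized values (e.g. reconstructions) back to pixel units."""
    return z * std + mean

# stand-in for images_train: random "pixels" in [0, 255]
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(20, 28, 28, 1)).astype(np.float64)
m, s = fit_standardizer(fake_images)
z = standardize(fake_images, m, s)
print(round(z.mean(), 6), round(z.std(), 6))
```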
Step 4. Create the function for the model
```python
# parameters for training
BATCH_SIZE = 100
NUM_EPOCHS = 2

# network parameters
SIZE_HIDDEN = 32   # size of the hidden code
SIZE_INPUT = 784   # shape of the input data (28 * 28 pixels)

# function for creating an autoencoder
def get_autoencoder(hidden_size, input_size):
    # initialize a sequential container
    module = Sequential()
    # encoder layers
    module.add(Linear(input_size, hidden_size))
    module.add(ReLU())
    # decoder layers; the output is left linear (no Sigmoid) because the
    # targets are standardized pixels, which can be negative or exceed 1
    module.add(Linear(hidden_size, input_size))
    return module
```
Step 5. Set up the deep learning graph
```python
undercomplete_ae = get_autoencoder(SIZE_HIDDEN, SIZE_INPUT)

# transform the dataset from rdd(ndarray) to rdd(Sample).
# A Sample represents one record in the dataset and consists of two
# tensors: a features tensor and a label tensor.
# In our autoencoder the features and the label are the same.
train_data = rdd_images.map(
    lambda x: Sample.from_ndarray(x.reshape(28 * 28), x.reshape(28 * 28)))

# create an optimizer
optimizer = Optimizer(
    model=undercomplete_ae,
    training_rdd=train_data,
    criterion=MSECriterion(),
    optim_method=Adam(),
    end_trigger=MaxEpoch(NUM_EPOCHS),
    batch_size=BATCH_SIZE)

# write training summaries
app_name = "undercomplete_autoencoder-" + dt.datetime.now().strftime("%Y%m%d-%H%M%S")
train_summary = TrainSummary(log_dir="/tmp/bigdl_summary", app_name=app_name)
optimizer.set_train_summary(train_summary)
print("logs will be saved to /tmp/bigdl_summary/" + app_name)
```
Step 6. Train the model
```python
# run the training process
trained_UAE = optimizer.optimize()
```
Step 7. Model performance on test data
```python
# let's check our model's performance on the test data
(images, labels) = mnist.read_data_sets(mnist_path, "test")
rdd_test = (sc.parallelize(images)
            .map(lambda features: ((features - mean) / std).reshape(28 * 28))
            .map(lambda features: Sample.from_ndarray(features, features)))
examples = trained_UAE.predict(rdd_test).take(10)

# top row: original test images, bottom row: reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(10):
    a[0][i].imshow(np.reshape(images[i], (28, 28)), cmap="gray")
    a[1][i].imshow(np.reshape(examples[i], (28, 28)), cmap="gray")
    a[0][i].axis("off")
    a[1][i].axis("off")
```
As we can see from the image, the reconstructions are very close to the original inputs.
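For dimension reduction, the decoder is discarded once training is over and only the encoder is applied. The NumPy sketch below does this with exported weights. It assumes that `trained_UAE.get_weights()` returns the parameters in layer order as `[W1, b1, W2, b2]`, with each BigDL `Linear` weight shaped `(output_size, input_size)`. Please verify this on your BigDL version. The helper `encode_with_weights` is hypothetical.

```python
import numpy as np

def encode_with_weights(x, weights):
    """Apply only the encoder (Linear + ReLU) using exported weights.

    Assumes weights == [W1, b1, W2, b2] with W1 of shape
    (hidden_size, input_size); the decoder weights W2, b2 are ignored.
    """
    W1, b1 = weights[0], weights[1]
    x = np.asarray(x, dtype=np.float64)
    x = x.reshape(x.shape[0], -1)       # flatten each 28x28x1 image to 784
    return np.maximum(0.0, x @ W1.T + b1)

# hypothetical usage with the trained model from the steps above:
# weights = trained_UAE.get_weights()
# codes = encode_with_weights((images - mean) / std, weights)  # shape (N, 32)
```

The resulting 32-dimensional codes can then be used in place of the 784 raw pixels, for example as input features for a classifier or for visualization.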
Conclusion: In this post, we discussed how autoencoders can be used for dimension reduction. We started with the different types of autoencoders and their purposes, and then implemented an undercomplete autoencoder using Intel's BigDL and PySpark. For more tutorials on BigDL, visit the BigDL tutorials.
This post concludes our series of posts on dimension reduction.