This tutorial is from a 7 part series on Dimension Reduction:
  1. Understanding Dimension Reduction with Principal Component Analysis (PCA)
  2. Diving Deeper into Dimension Reduction with Independent Components Analysis (ICA)
  3. Multi-Dimension Scaling (MDS)
  4. LLE
  5. t-SNE
  6. IsoMap
  7. Autoencoders

(A jupyter notebook with math and code(spark) is available on github repo )

Isomap stands for isometric mapping. Isomap is a non-linear dimensionality reduction method based on the spectral theory which tries to preserve the geodesic distances in the lower dimension. Isomap starts by creating a neighborhood network. After that, it uses graph distance to the approximate geodesic distance between all pairs of points. And then, through eigenvalue decomposition of the geodesic distance matrix, it finds the low dimensional embedding of the dataset. In non-linear manifolds, the Euclidean metric for distance holds good if and only if neighborhood structure can be approximated as linear. If neighborhood contains holes, then Euclidean distances can be highly misleading. In contrast to this, if we measure the distance between two points by following the manifold, we will have a better approximation of how far or near two points are. Let's understand this with an extremely simple 2-D example. Suppose our data lies on a circular manifold in a 2-D structure like in the image below.

Why geodesic distances are better than the Euclidean distances in nonlinear manifolds?

We will reduce the data to 1-D using the Euclidean distances and approximate geodesic distances. Now, if we look at the 1-D mapping based on the Euclidean metric, we see that for points which are far apart(a & b) have been mapped poorly. Only the points which can be approximated to lie on a linear manifold(c & d) give satisfactory results. On the other hand, see the mapping with geodesic distances, it nicely approximates the close points as neighbors and far away points as distant.
The geodesic distances between two points in the image are approximated by graph distance between the two points. Thus, Euclidean distances should not be used for approximating the distance between two points in non-linear manifolds while geodesic distances can be used.

Isomap uses the above principle to create a similarity matrix for eigenvalue decomposition. Unlike other non-linear dimensionality reduction like LLE & LPP which only use local information, isomap uses the local information to create a global similarity matrix. The isomap algorithm uses euclidean metrics to prepare the neighborhood graph. Then, it approximates the geodesic distance between two points by measuring shortest path between these points using graph distance. Thus, it approximates both global as well as the local structure of the dataset in the low dimensional embedding.

Let's have a basic understanding of few concepts which we need to implement the Isomap algorithm.
Pregel API - Pregel is a distributed programming model developed by Google for processing large-scale graphs. It is the inspiration behind the Apache giraph project and GraphX library of spark. Pregel is basically a message-passing interface based on the idea that a vertex's state should depend on its neighbors. A pregel computation takes as input a graph and a set of vertex states. At every iteration which is called superstep, it processes messages received at a vertex and updates the vertex state. After that, it decides which of its neighbors should receive the message at next superstep and what should be the message. Thus, messages are passed along edges and computation happens only at the vertices. The graph is not passed across the network only messages. Computation stops at maximum iterations or when no messages are left to pass. Let's understand it using a simple example. Suppose, We need to find the degree of each vertex for the graph given below. Image shown below represents a single iteration of pregel model.

At initialization, every vertex's degree is 0. We can send an empty message as an initial message to start the computation. At the end of superstep 1, Each vertex sends message 1 through each of its edges. At next superstep, each vertex sums the messages received and update its degree.

Classical MDS - Isomap is closely related to the original multidimensional scaling algorithm proposed by the Torgerson and Gower. In fact, it is an extension of the classical multidimensional scaling. The Classical multidimensional algorithm gives a closed form solution to the dimensionality reduction problem. Classical MDS uses the euclidean distances as the similarity metric while isomap uses geodesic distances. Steps of classical MDS are

  1. Create a matrix of squared dissimilarities Δ2(X) from the given X.
  2. Obtain the matrix B by double centring the dissimilarity matrix B = −0.5*(J*Δ2*J)
  3. Compute the eigenvalue decomposition of matrix B, BΔ=QΛQ‘
  4. Choose the K eigenvectors having K highest eigenvalues.

Steps of IsoMap:
Isomap differs from classical MDS in initial few steps only. Instead of using euclidean metric for dissimilarity, it uses graph distances. Steps of the Isomap algorithm are:

  1. Neighbourhood graph: Create a neighborhood graph and adjacency matrix from the dataset.

  2. Dissimilarity Matrix: After neighborhood search, we will use spark's graphX library for calculating the geodesic distances between the points. While creating our neighborhood network, we have to make sure that the resulting graph is a single connected component. If not, then our similarity matrix will remain incomplete and results will be incoherent. We need to iterate over the different values of neighborhood selection parameter to get the fully connected graph. As of now, spark does not have the shortest path function for the weighted graph. We will have to implement it. The code below presents a shortest path algorithm using pregel like api of graphX. The code is from the shortest path function for unweighted graphs of graphX lib. Functions addMaps and sendMessage have been modified to support weighted graphs.

    def ShortestPath(Verts: RDD[(VertexId, imMap[Long, Double])], 
              Edges: RDD[Edge[Double]], landmarks: Seq[Long] = Seq()): 
                                                     Graph[imMap[Long,Double],Double] = {
    
     
         val g = Graph(Verts, Edges)
    
         type SPMap = Map[VertexId, Double]
    
    
         def makeMap(x: (VertexId, Double)*) = Map(x: _*)
    
         def incrementMap(spmap1: SPMap, spmap2: SPMap, d: Double): SPMap = {
                 spmap1.map { case (k, v) => 
                     if (v + d < spmap2.getOrElse(k, Double.MaxValue)) k -> (v + d)
                     else -1L -> 0.0
     
                 }
     
             }
    
         def addMaps(spmap1: SPMap, spmap2: SPMap): SPMap = {
             (spmap1.keySet ++ spmap2.keySet).map {
                  k => k -> math.min(spmap1.
                   getOrElse(k, Double.MaxValue), 
                   spmap2.getOrElse(k, 
                   Double.MaxValue))
                  }(collection.breakOut)
              }
     
         var spGraph: Graph[imMap[Long,Double],Double]  = null
     
         if (landmarks.isEmpty){
             spGraph = g.mapVertices { (vid, attr) => makeMap(vid -> 0)}
         }
         else{
             spGraph = g.mapVertices { (vid, attr) => 
                      if (landmarks.contains(vid)) makeMap(vid -> 0) else makeMap()}
             }                                      
     
         val initialMessage = makeMap()
    
         def vertexProgram(id: VertexId, attr: SPMap, msg: SPMap): SPMap = {
                 addMaps(attr, msg)
         }
    
         def sendMessage(edge: EdgeTriplet[SPMap, Double]): 
                Iterator[(VertexId, SPMap)] = {
    
             val newAttr = incrementMap(edge.srcAttr, edge.dstAttr, edge.attr) - (-1)
    
             if (!newAttr.isEmpty) Iterator((edge.dstId, newAttr))
         else Iterator.empty
    
         }
    
         val h = Pregel(spGraph, initialMessage)(vertexProgram, sendMessage, addMaps)
    
         return(h)
     }
    

Eigenvalue decomposition: Before eigenvalue decomposition, we have to square the distance and double center the squared similarities matrix. After eigenvalue decomposition select the first K eigenvectors with K highest eigenvalues.

The plot of a subset of MNIST dataset after isomap dimension reduction.

What we have implemented is the vanilla version isomap. It requires a lot of time and computing power. It has two bottlenecks first Calculation of dissimilarity matrix requires O(N2) operations where N is the number of the samples and the second calculation of pairwise graph distances. If N is huge, which is true generally in case of big datasets, it becomes impractical. Solution to this problem is Landmark Isomap. Landmark isomap is based on landmark MDS. Landmark MDS selects a group of points termed as Landmarks and implements classical MDS on them. Based on the mapping obtained from classical MDS, remaining points are mapped in the low dimensional embedding using distance-based triangulation.
Steps for Landmark classical scaling

  1. Selects landmarks points Xlandmarks
  2. Apply classical MDS on landmarks points and obtain low dimensional emebedding Lk
  3. calculate δu where δui is mean of ith row of dissimilarity matrix of landmark points.
  4. Given a vector xa calculate δa where δai is the squared distance between the point xa and the landmark point i
  5. Low dimensional embedding for the xa is given by ya=0.5*L-1ka−δu) where L-1k is the penrose moore inverse of the Lk
    Selection of landmark points can be random or through a specific method. For obtaining a K-dimensional embedding at least K+1 landmark points are needed. For reasons related to the stability of the algorithm, number of landmark points chosen should be more than strict minimum. The accuracy of isometric mapping in landmark isomap does not suffer much due to approximation in the algorithm.

Drawbacks of Isomaps: Isomap performs poorly when manifold is not well sampled and contains holes. As mentioned earlier neighborhood graph creation is tricky and slightly wrong parameters can produce bad results.

Conclusion: In this article, we discussed another manifold learning algorithm IsoMap(Isometric Mapping).
In the beginning of the post, we talked about what is isometric mapping and how it is different from other dimension reduction algorithms. Then, we had a brief discussion on pregel API. Later on, we implemented an isomap algorithm in scala using spark's GraphX library. For people wishing to go deeper in distributed graph processing GraphX is a good starting point.
Next post in this series will be on Autoencoders