Isomap: use the geodesic distances on the manifold between all pairs of points.
Three step algorithm:
Use k-NN or an \varepsilon-radius rule to construct the neighborhood graph.
Use Floyd's algorithm to compute the shortest paths between every pair of nodes; these approximate the geodesic distances.
Minimize a cost function (classical MDS applied to the geodesic distance matrix) to construct the d-dimensional embedding (see the sketch after this list).
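A minimal NumPy/SciPy sketch of these three steps. The function name isomap and the parameter defaults are illustrative, and it assumes the neighborhood graph is connected (otherwise some geodesic distances come out infinite):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, n_neighbors=10, d=2):
    # Step 1: k-NN neighborhood graph with Euclidean edge lengths.
    G = kneighbors_graph(X, n_neighbors, mode="distance")
    # Step 2: Floyd-Warshall all-pairs shortest paths approximate the geodesics.
    D = shortest_path(G, method="FW", directed=False)
    # Step 3: classical MDS on the geodesic distance matrix.
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * H @ (D ** 2) @ H              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:d]      # keep the top-d eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```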
LLE: locally linear embedding, recovers global nonlinear structure from locally linear fits.
Main idea: represent each data point as a weighted sum of its neighbouring points.
Input: D-dimensional data.
Output: d-dimensional data, with d < D.
Three step algorithm:
Find the neighbour points of each data point.
Minimize a reconstruction error function with normalization constraints (each point's weights sum to 1) to determine the neighbour weights.
Find the embedded coordinates by minimizing a global reconstruction error function under constraints (see the sketch after this list).
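A minimal NumPy sketch of the three steps. The function name lle and the trace-based regularization reg are illustrative choices, not from the source:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle(X, n_neighbors=10, d=2, reg=1e-3):
    n = X.shape[0]
    # Step 1: find the neighbour points (drop each point's self-match).
    idx = (NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
           .kneighbors(X, return_distance=False)[:, 1:])
    # Step 2: per-point reconstruction weights, constrained to sum to 1.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # neighbours centred on x_i
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)  # stabilize the solve
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx[i]] = w / w.sum()           # normalization constraint
    # Step 3: global embedding from the bottom eigenvectors of
    # M = (I - W)^T (I - W), discarding the constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    return eigvecs[:, 1:d + 1]
```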
Main idea: relies on a random walk representation over the data points.
Algorithm consists of three steps
a nearest neighbor graph
similarity weights on the edges
a transition probability matrix (row-normalized similarity weights)
properties of the random walk
if we start from point i_0, the distribution over the points that we end up at after t steps is given by the i_0-th row of P^t
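A minimal NumPy sketch of these steps, assuming a Gaussian kernel for the similarity weights; the kernel choice, sigma, n_neighbors, and the function name are illustrative:

```python
import numpy as np

def random_walk_matrix(X, sigma=1.0, n_neighbors=10):
    # Nearest neighbor graph: keep each point's n_neighbors closest points.
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    keep = np.argsort(D2, axis=1)[:, :n_neighbors + 1]    # includes self
    mask = np.zeros(D2.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    # Similarity weights on the edges (Gaussian kernel), symmetrized.
    W = np.where(mask | mask.T, np.exp(-D2 / (2 * sigma**2)), 0.0)
    # Transition probability matrix: row-normalize the weights.
    return W / W.sum(axis=1, keepdims=True)

# Distribution after t steps starting from point i0: row i0 of P^t, e.g.
#   P = random_walk_matrix(X)
#   p_t = np.linalg.matrix_power(P, t)[i0]
```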
Main idea: define k centroids, one for each cluster. These centroids should be placed carefully, because different initial locations lead to different results.
Algorithm consists of the following steps (a minimal sketch follows the list):
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
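A minimal NumPy sketch of this iteration. The seeding rule (sampling k data points) is one common choice rather than the only one, and for simplicity the sketch assumes no cluster becomes empty:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: place k initial centroids on randomly chosen data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each object to the group with the closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate the centroid positions (assumes no empty cluster).
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```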
Although it can be proved that the procedure always terminates, the k-means algorithm does not necessarily find the optimal configuration, i.e. the global minimum of the objective function. The algorithm is also significantly sensitive to the initial randomly selected cluster centres. The k-means algorithm can be run multiple times to reduce this effect.
K-means is a simple algorithm that has been adapted to many problem domains. As we are going to see, it is a good candidate for extension to work with fuzzy feature vectors.