Medoid

Medoids are representative objects of a data set, or of a cluster within a data set, whose sum of dissimilarities to all the objects in the cluster is minimal.[1] Medoids are similar in concept to means or centroids, but medoids are always restricted to be members of the data set. Medoids are most commonly used on data for which a mean or centroid cannot be defined, such as graphs. They are also used in contexts where the centroid is not representative of the data set, such as images, 3-D trajectories and gene expression[2] (where the data are sparse but the mean need not be, whereas the medoid, being a data point, is itself sparse). Medoids are also of interest when a representative is sought under a distance other than squared Euclidean distance (for instance, in movie ratings).
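The definition can be made concrete with a brute-force sketch in Python that assumes only a user-supplied dissimilarity function. The function names `medoid` and `manhattan` and the small rating matrix are hypothetical, chosen for illustration; the Manhattan (L1) distance stands in for a distance other than squared Euclidean distance.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def medoid(points: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return a point of `points` with minimal total dissimilarity to all points.

    Ties are broken by returning the first such point (Python's `min` keeps
    the earliest minimum).
    """
    if not points:
        raise ValueError("the medoid of an empty set is undefined")
    # Brute force: O(n^2) dissimilarity evaluations.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def manhattan(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    # L1 distance; any dissimilarity function could be passed instead.
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical ratings of three movies by four users (one row per user).
ratings = [(5, 1, 4), (4, 2, 4), (1, 5, 2), (5, 2, 5)]
print(medoid(ratings, manhattan))  # → (4, 2, 4)
```

Here the second row has the smallest total Manhattan distance (12) to the rows, while the mean row (3.75, 2.5, 3.75) is not a valid rating vector at all. The quadratic brute-force search is only meant for small sets.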

For some data sets there may be more than one medoid, as with medians. A common application of the medoid is the k-medoids clustering algorithm, which is similar to the k-means algorithm but works when a mean or centroid is not definable. The algorithm proceeds roughly as follows. First, a set of k medoids is chosen at random from the data. Second, the distances from the medoids to the other points are computed. Third, each point is assigned to the cluster of the medoid it is most similar to. Fourth, the medoids are iteratively updated to reduce the total dissimilarity, with points reassigned after each update, until the medoid set no longer changes.
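The steps of k-medoids can be sketched in Python as follows, assuming the simple alternating ("Voronoi iteration") variant rather than the swap-based PAM algorithm. The function `k_medoids`, its parameters `max_iter` and `seed`, and the toy one-dimensional data are hypothetical choices for illustration.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def k_medoids(
    points: Sequence[T],
    k: int,
    dist: Callable[[T, T], float],
    max_iter: int = 100,
    seed: int = 0,
) -> tuple[list[int], list[int]]:
    """Alternating k-medoids sketch; returns (medoid indices, cluster labels).

    Not PAM: each medoid is only re-picked from within its own cluster,
    so it may stop at a worse local optimum than a swap-based method.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(points)")
    # Step 1: choose k distinct data points at random as the initial medoids.
    medoids = random.Random(seed).sample(range(n), k)
    # Step 2: compute all pairwise dissimilarities once.
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

    def assign(meds: list[int]) -> list[int]:
        # Step 3: label each point with the index of its closest medoid.
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    labels = assign(medoids)
    for _ in range(max_iter):
        # Step 4: in each cluster, pick the member with the smallest
        # total dissimilarity to the other members as the new medoid.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # can happen if distinct points have zero dissimilarity
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break  # converged: the medoid set no longer changes
        medoids = new_medoids
        labels = assign(medoids)
    return medoids, labels


points = [1, 2, 3, 10, 11, 12]
meds, labels = k_medoids(points, 2, lambda a, b: abs(a - b))
print(sorted(points[m] for m in meds))  # → [2, 11]
```

The update step only reads the dissimilarity matrix `d`, so the same code applies to graphs or any other data for which a mean is undefined. Swap-based methods such as PAM consider more candidate medoids per iteration and usually reach lower-cost solutions, at a higher cost per step.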

Note that a medoid is not equivalent to a median, a geometric median, or a centroid. A median is only defined on one-dimensional data, and it minimizes the total dissimilarity to the other points only for metrics induced by a norm (such as the Manhattan distance or Euclidean distance). A geometric median is defined in any dimension, but unlike a medoid, it is not necessarily a point from within the original data set.
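The difference can be checked numerically with a small Python sketch; the helper names `medoid` and `geometric_median` are hypothetical. In one dimension, the conventional median of an even number of values is the mean of the two middle values, which need not be a data point. In two dimensions, the geometric median is approximated here with Weiszfeld's algorithm, an iterative method chosen for illustration, not the only way to compute it.

```python
import math
import statistics


def medoid(points, dist):
    # Data point with the smallest total dissimilarity to all points.
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


def geometric_median(points, iters=1000, eps=1e-12):
    """Approximate the 2-D geometric median with Weiszfeld's algorithm."""
    # Start from the centroid (coordinate-wise mean).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:
                # Simplification: stop if an iterate hits a data point
                # (a full implementation needs a special case here).
                return (x, y)
            # Weight each point by the inverse of its distance.
            num_x += px / d
            num_y += py / d
            denom += 1 / d
        x, y = num_x / denom, num_y / denom
    return (x, y)


# 1-D: the median lies between data points; the medoid is one of them.
values = [1, 2, 10, 11]
print(statistics.median(values))                # → 6.0
print(medoid(values, lambda a, b: abs(a - b)))  # → 2 (10 ties with it)

# 2-D: a 3-4-5 right triangle.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
gm = geometric_median(triangle)
print(medoid(triangle, math.dist))  # → (0.0, 0.0), total distance 7
print(gm, sum(math.dist(gm, p) for p in triangle))  # interior point, total below 7
```

Every point in the interval [2, 10] gives the same sum of absolute deviations (18), so the median 6.0 is optimal without being a data point, while both 2 and 10 are medoids. For the triangle, the geometric median (here the Fermat point) has a total distance of about 6.77, below the medoid's 7.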

  1. Struyf, Anja; Hubert, Mia; Rousseeuw, Peter (1997). "Clustering in an Object-Oriented Environment". Journal of Statistical Software. 1 (4): 1–30.
  2. van der Laan, Mark J.; Pollard, Katherine S.; Bryan, Jennifer (2003). "A New Partitioning Around Medoids Algorithm". Journal of Statistical Computation and Simulation. 73 (8). Taylor & Francis Group: 575–584. doi:10.1080/0094965031000136012. S2CID 17437463.