Manifold regularization

Figure: Manifold regularization can classify data when labeled data (black and white circles) are sparse, by taking advantage of unlabeled data (gray circles). Without many labeled data points, supervised learning algorithms can learn only very simple decision boundaries (top panel). Manifold regularization can draw a decision boundary between the natural classes of the unlabeled data, under the assumption that nearby points probably belong to the same class, so the decision boundary should avoid areas containing many unlabeled points. This is one version of semi-supervised learning.

In machine learning, manifold regularization is a technique for using the shape of a dataset to constrain the functions that can be learned on that dataset. In many machine learning problems, the data to be learned do not cover the entire input space. For example, a facial recognition system may not need to classify every possible image, but only the subset of images that contain faces. The technique of manifold learning assumes that the relevant subset of data comes from a manifold, a mathematical structure with useful properties. The technique also assumes that the function to be learned is smooth: data with different labels are not likely to be close together, so the labeling function should not change quickly in areas where there are likely to be many data points. Because of this assumption, a manifold regularization algorithm can use unlabeled data to inform where the learned function is allowed to change quickly and where it is not, using an extension of the technique of Tikhonov regularization. Manifold regularization algorithms can extend supervised learning algorithms in semi-supervised and transductive learning settings, where unlabeled data are available. The technique has been used in applications including medical imaging, geographical imaging, and object recognition.
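As a concrete illustration, the sketch below implements Laplacian Regularized Least Squares (LapRLS), one standard manifold regularization algorithm: a graph Laplacian built from both labeled and unlabeled points penalizes functions that vary quickly where the data are dense, on top of an ordinary Tikhonov (kernel ridge) penalty. This is a minimal sketch under assumed settings; the function names (laprls_fit, laprls_predict), the k-nearest-neighbor graph construction, and the hyperparameter values are illustrative choices, not details taken from the article.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and the rows of B.
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def laprls_fit(X_lab, y_lab, X_unlab, gamma_A=1e-2, gamma_I=1e-1,
               kernel_gamma=1.0, n_neighbors=5):
    # Laplacian Regularized Least Squares: kernel ridge regression plus a
    # manifold (graph Laplacian) penalty estimated from ALL points.
    # Labels y_lab are +/-1; unlabeled points carry no label term.
    X = np.vstack([X_lab, X_unlab])           # all points, labeled first
    l, n = len(X_lab), len(X)

    # k-nearest-neighbor adjacency over all points; the resulting Laplacian
    # encodes where the data are dense, i.e. the estimated manifold.
    D2 = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    W = np.zeros((n, n))
    nn = np.argsort(D2, axis=1)[:, 1:n_neighbors + 1]
    for i, js in enumerate(nn):
        W[i, js] = W[js, i] = 1.0              # symmetric 0/1 adjacency
    L = np.diag(W.sum(1)) - W                  # unnormalized graph Laplacian

    K = rbf_kernel(X, X, kernel_gamma)         # kernel matrix on all points
    J = np.diag([1.0] * l + [0.0] * (n - l))   # selects the labeled points
    y = np.concatenate([y_lab, np.zeros(n - l)])

    # Closed-form minimizer of the LapRLS objective
    #   (1/l) * sum_i (y_i - f(x_i))^2 + gamma_A * ||f||_K^2
    #     + (gamma_I / n^2) * f^T L f
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return X, alpha

def laprls_predict(X_train, alpha, X_test, kernel_gamma=1.0):
    # Evaluate the learned function; take the sign for classification.
    return rbf_kernel(X_test, X_train, kernel_gamma) @ alpha

Setting gamma_I to zero recovers plain kernel ridge regression, which uses only the labeled points; a positive gamma_I lets the unlabeled points pull the decision boundary away from dense regions of the data.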