Self-supervised learning

Self-supervised learning (SSL) is a paradigm in machine learning where a model is trained on a task using the data itself to generate supervisory signals, rather than relying on externally-provided labels. In the context of neural networks, self-supervised learning aims to leverage inherent structures or relationships within the input data to create meaningful training signals. SSL tasks are designed so that solving them requires capturing essential features or relationships in the data. The input data is typically augmented or transformed in a way that creates pairs of related samples, where one sample serves as the input, and the other is used to formulate the supervisory signal. This augmentation can involve introducing noise, cropping, rotation, or other transformations. Self-supervised learning more closely imitates the way humans learn to classify objects.[1]

During SSL, the model learns in two steps. First, the task is solved based on an auxiliary or pretext classification task using pseudo-labels, which help to initialize the model parameters.[2][3] Next, the actual task is performed with supervised or unsupervised learning.[4][5][6]

Self-supervised learning has produced promising results in recent years, and has found practical application in fields such as audio processing, and is being used by Facebook and others for speech recognition.[7]

  1. ^ Bouchard, Louis (25 November 2020). "What is Self-Supervised Learning? | Will machines ever be able to learn like humans?". Medium. Retrieved 9 June 2021.
  2. ^ Doersch, Carl; Zisserman, Andrew (October 2017). "Multi-task Self-Supervised Visual Learning". 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 2070–2079. arXiv:1708.07860. doi:10.1109/iccv.2017.226. ISBN 978-1-5386-1032-9. S2CID 473729.
  3. ^ Beyer, Lucas; Zhai, Xiaohua; Oliver, Avital; Kolesnikov, Alexander (October 2019). "S4L: Self-Supervised Semi-Supervised Learning". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 1476–1485. arXiv:1905.03670. doi:10.1109/iccv.2019.00156. ISBN 978-1-7281-4803-8. S2CID 167209887.
  4. ^ Doersch, Carl; Gupta, Abhinav; Efros, Alexei A. (December 2015). "Unsupervised Visual Representation Learning by Context Prediction". 2015 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 1422–1430. arXiv:1505.05192. doi:10.1109/iccv.2015.167. ISBN 978-1-4673-8391-2. S2CID 9062671.
  5. ^ Zheng, Xin; Wang, Yong; Wang, Guoyou; Liu, Jianguo (April 2018). "Fast and robust segmentation of white blood cell images by self-supervised learning". Micron. 107: 55–71. doi:10.1016/j.micron.2018.01.010. ISSN 0968-4328. PMID 29425969. S2CID 3796689.
  6. ^ Gidaris, Spyros; Bursuc, Andrei; Komodakis, Nikos; Perez, Patrick Perez; Cord, Matthieu (October 2019). "Boosting Few-Shot Visual Learning with Self-Supervision". 2019 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 8058–8067. arXiv:1906.05186. doi:10.1109/iccv.2019.00815. ISBN 978-1-7281-4803-8. S2CID 186206588.
  7. ^ "Wav2vec: State-of-the-art speech recognition through self-supervision". ai.facebook.com. Retrieved 9 June 2021.