Contrastive Language-Image Pre-training

CLIP
Developer(s): OpenAI
Initial release: January 5, 2021
Repository: https://github.com/OpenAI/CLIP
Written in: Python
License: MIT License
Website: openai.com/research/clip

Contrastive Language-Image Pre-training (CLIP) is a technique for jointly training a pair of neural network models, an image encoder and a text encoder, so that matching images and captions are mapped to nearby points in a shared embedding space; training uses a contrastive objective over large collections of image-text pairs.[1] This method has enabled broad applications across multiple domains, including cross-modal retrieval,[2] text-to-image generation,[3] aesthetic ranking,[4] and image captioning.[5]
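A minimal sketch of the symmetric contrastive loss described above, written in Python with PyTorch. It assumes the image and text encoders have already produced batch-aligned embedding matrices; the names image_features, text_features, and logit_scale are illustrative and not taken from the original CLIP codebase.

import torch
import torch.nn.functional as F

def contrastive_loss(image_features, text_features, logit_scale):
    # image_features, text_features: (batch, dim) embeddings from the two encoders.
    # logit_scale: a learned temperature that scales the similarities.

    # L2-normalize so dot products become cosine similarities.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares image i with caption j.
    logits_per_image = logit_scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()

    # The matching caption for image i is at index i, so the targets are the diagonal.
    labels = torch.arange(image_features.shape[0], device=image_features.device)

    # Cross-entropy in both directions (image-to-text and text-to-image), averaged.
    loss_i = F.cross_entropy(logits_per_image, labels)
    loss_t = F.cross_entropy(logits_per_text, labels)
    return (loss_i + loss_t) / 2

In this sketch each image is classified against all captions in the batch and vice versa, which is what pushes corresponding image-text pairs together and non-corresponding pairs apart in the shared embedding space.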

  1. ^ Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya (2021-07-01). Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning. PMLR. pp. 8748–8763.
  2. ^ Hendriksen, Mariya; Bleeker, Maurits; Vakulenko, Svitlana; van Noord, Nanne; Kuiper, Ernst; de Rijke, Maarten (2021). "Extending CLIP for Category-to-image Retrieval in E-commerce". arXiv:2112.11294 [cs.CV].
  3. ^ "Stable Diffusion Repository on GitHub". CompVis - Machine Vision and Learning Research Group, LMU Munich. 17 September 2022. Archived from the original on January 18, 2023. Retrieved 17 September 2022.
  4. ^ LAION AI (2024-09-06). "LAION-AI/aesthetic-predictor". Retrieved 2024-09-08.
  5. ^ Mokady, Ron; Hertz, Amir; Bermano, Amit H. (2021). "ClipCap: CLIP Prefix for Image Captioning". arXiv:2111.09734 [cs.CV].