State of the art embeddings are based on the learned hidden layer representation of dedicated sentence transformer models. BERT pioneered an approach involving the use of a dedicated [CLS] token prepended to the beginning of each sentence inputted into the model; the final hidden state vector of this token encodes information about the sentence and can be fine-tuned for use in sentence classification tasks. In practice however, BERT's sentence embedding with the [CLS] token achieves poor performance, often worse than simply averaging non-contextual word embeddings. SBERT later achieved superior sentence embedding performance[8] by fine tuning BERT's [CLS] token embeddings through the usage of a siamese neural network architecture on the SNLI dataset.
Other approaches are loosely based on the idea of distributional semantics applied to sentences. Skip-Thought trains an encoder-decoder structure for the task of neighboring sentences predictions. Though this has been shown to achieve worse performance than approaches such as InferSent or SBERT.
An alternative direction is to aggregate word embeddings, such as those returned by Word2vec, into sentence embeddings. The most straightforward approach is to simply compute the average of word vectors, known as continuous bag-of-words (CBOW).[9] However, more elaborate solutions based on word vector quantization have also been proposed. One such approach is the vector of locally aggregated word embeddings (VLAWE),[10] which demonstrated performance improvements in downstream text classification tasks.
^Wu, Ledell; Fisch, Adam; Chopra, Sumit; Adams, Keith; Bordes, Antoine; Weston, Jason (2017). "StarSpace: Embed All the Things!". arXiv:1709.03856 [cs.CL].
^Sanjeev Arora, Yingyu Liang, and Tengyu Ma. "A simple but tough-to-beat baseline for sentence embeddings.", 2016; openreview:SyK00v5xx.
^Trifan, Mircea; Ionescu, Bogdan; Gadea, Cristian; Ionescu, Dan (2015). "A graph digital signal processing method for semantic analysis". 2015 IEEE 10th Jubilee International Symposium on Applied Computational Intelligence and Informatics. pp. 187–192. doi:10.1109/SACI.2015.7208196. ISBN978-1-4799-9911-8. S2CID17099431.
^Basile, Pierpaolo; Caputo, Annalina; Semeraro, Giovanni (2012). "A Study on Compositional Semantics of Words in Distributional Spaces". 2012 IEEE Sixth International Conference on Semantic Computing. pp. 154–161. doi:10.1109/ICSC.2012.55. ISBN978-1-4673-4433-3. S2CID552921.