Natural language generation

Natural language generation (NLG) is a software process that produces natural language output. A widely cited survey of NLG methods describes NLG as "the subfield of artificial intelligence and computational linguistics that is concerned with the construction of computer systems that can produce understandable texts in English or other human languages from some underlying non-linguistic representation of information".^[1]

While it is widely agreed that the output of any NLG process is text, there is some disagreement about whether the inputs of an NLG system need to be non-linguistic.^[2] Common applications of NLG methods include the production of various reports, for example weather^[3] and patient reports;^[4] image captions;^[5] and chatbots like ChatGPT.

Automated NLG can be compared to the process humans use when they turn ideas into writing or speech. Psycholinguists prefer the term language production for this process, which can also be described in mathematical terms, or modeled in a computer for psychological research. NLG systems can also be compared to translators of artificial computer languages, such as decompilers or transpilers, which also produce human-readable code generated from an intermediate representation. Human languages tend to be considerably more complex and allow for much more ambiguity and variety of expression than programming languages, which makes NLG more challenging.

NLG may be viewed as complementary to natural-language understanding (NLU): whereas in natural-language understanding, the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a representation into words. The practical considerations in building NLU vs. NLG systems are not symmetrical. NLU needs to deal with ambiguous or erroneous user input, whereas the ideas the system wants to express through NLG are generally known precisely. NLG needs to choose a specific, self-consistent textual representation from many potential representations, whereas NLU generally tries to produce a single, normalized representation of the idea expressed.^[6]

NLG has existed since ELIZA was developed in the mid-1960s, but the methods were first used commercially in the 1990s.^[7] NLG techniques range from simple template-based systems, like a mail merge that generates form letters, to systems that have a complex understanding of human grammar. NLG can also be accomplished by training a statistical model using machine learning, typically on a large corpus of human-written texts.^[8]
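The template-based end of this range can be illustrated with a minimal sketch in Python. It assumes a single fixed sentence template filled from structured records, in the manner of a mail merge; the template wording, the field names (city, condition, high) and the sample records are hypothetical and chosen only for illustration, not taken from any particular NLG system.

```python
from string import Template

# Hypothetical sentence template; the field names are assumptions for this sketch.
FORECAST = Template(
    "In $city, expect $condition with a high of $high degrees Celsius."
)


def generate_report(record: dict) -> str:
    """Fill the fixed template with the values of one structured record."""
    return FORECAST.substitute(record)


# Made-up input records standing in for non-linguistic source data.
records = [
    {"city": "Aberdeen", "condition": "light rain", "high": 12},
    {"city": "Valletta", "condition": "clear skies", "high": 27},
]

reports = [generate_report(r) for r in records]
for line in reports:
    print(line)
```

A system of this kind makes no linguistic decisions of its own: every output sentence has the same structure, and only the slot values vary. Grammar-based and statistical approaches differ in that they choose wording and sentence structure during generation.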

[1] Reiter E, Dale R (March 1997). "Building applied natural language generation systems". Natural Language Engineering. 3 (1): 57–87. doi:10.1017/S1351324997001502. ISSN 1469-8110. S2CID 8460470.
[2] Gatt A, Krahmer E (2018). "Survey of the state of the art in natural language generation: Core tasks, applications and evaluation". Journal of Artificial Intelligence Research. 61 (61): 65–170. arXiv:1703.09902. doi:10.1613/jair.5477. S2CID 16946362.
[3] Cite error: the named reference "fog" was invoked but never defined.
[4] Cite error: the named reference "portet" was invoked but never defined.
[5] Farhadi A, Hejrati M, Sadeghi MA, Young P, Rashtchian C, Hockenmaier J, Forsyth D (2010-09-05). Every picture tells a story: Generating sentences from images (PDF). European conference on computer vision. Berlin, Heidelberg: Springer. pp. 15–29. doi:10.1007/978-3-642-15561-1_2.
[6] Cite error: the named reference "Ehud" was invoked but never defined.
[7] Reiter E (2021-03-21). History of NLG. Archived from the original on 2021-12-12.
[8] Perera R, Nand P (2017). "Recent Advances in Natural Language Generation: A Survey and Classification of the Empirical Literature". Computing and Informatics. 36 (1): 1–32. doi:10.4149/cai_2017_1_1. hdl:10292/10691.
