Language identification in the limit

Language identification in the limit is a formal model for inductive inference of formal languages, mainly by computers (see machine learning and induction of regular languages). It was introduced by E. Mark Gold in a technical report[1] and a journal article[2] with the same title.

In this model, a teacher provides to a learner some presentation (i.e. a sequence of strings) of some formal language. The learning is seen as an infinite process. Each time the learner reads an element of the presentation, it should provide a representation (e.g. a formal grammar) for the language.

Gold defines that a learner can identify in the limit a class of languages if, given any presentation of any language in the class, the learner will produce only a finite number of wrong representations, and then stick with the correct representation. However, the learner need not be able to announce its correctness; and the teacher might present a counterexample to any representation arbitrarily long after.

Gold defined two types of presentations:

  • Text (positive information): an enumeration of all strings the language consists of.
  • Complete presentation (positive and negative information): an enumeration of all possible strings, each with a label indicating if the string belongs to the language or not.
  1. ^ Gold, E. Mark (1964). Language identification in the limit (RAND Research Memorandum RM-4136-PR). RAND Corporation.
  2. ^ Gold, E. Mark (May 1967). "Language identification in the limit" (PDF). Information and Control. 10 (5): 447–474. doi:10.1016/S0019-9958(67)91165-5.