Language resource

In linguistics and language technology, a language resource is a "[composition] of linguistic material used in the construction, improvement and/or evaluation of language processing applications, (...) in language and language-mediated research studies and applications."[1]

According to Bird & Simons (2003),[2] this includes

  1. data, i.e. "any information that documents or describes a language, such as a published monograph, a computer data file, or even a shoebox full of handwritten index cards. The information could range in content from unanalyzed sound recordings to fully transcribed and annotated texts to a complete descriptive grammar",[2]
  2. tools, i.e., "computational resources that facilitate creating, viewing, querying, or otherwise using language data",[2] and
  3. advice, i.e., "any information about what data sources are reliable, what tools are appropriate in a given situation, what practices to follow when creating new data". The latter aspect is usually referred to as "best practices" or "(community) standards".[2]

In a narrower sense, language resource is specifically applied to resources that are available in digital form, and then, "encompassing (a) data sets (textual, multimodal/multimedia and lexical data, grammars, language models, etc.) in machine readable form, and (b) tools/technologies/services used for their processing and management".[1]

  1. ^ a b LD4LT (2020), The Metashare Ontology as Created by the LD4LT Community Group, W3C Community Group Linked Data for Language Technology (LD4LT), Development branch, version of Mar 10, 2020
  2. ^ a b c d Bird, Steven; Simons, Gary (2003-11-01). "Extending Dublin Core Metadata to Support the Description and Discovery of Language Resources". Computers and the Humanities. 37 (4): 375–388. arXiv:cs/0308022. Bibcode:2003cs........8022B. doi:10.1023/A:1025720518994. ISSN 1572-8412. S2CID 5969663.