Korpusomat

Korpusomat is a tool for creating and searching electronic language corpora, developed at the Institute of Computer Science of the Polish Academy of Sciences.

Korpusomat is a fourth-generation corpus tool.[1][2] Because it is a web application, users do not need to store data sets on their own computers. A corpus is created either by uploading text files from a local drive (in any language[2] and format[3]) or by specifying websites from which texts are to be downloaded.[4] The corpus is then annotated automatically on several levels: morphosyntactic annotation, named-entity recognition (e.g. geographical names or people), and partial syntactic annotation, which also allows dependency trees to be visualized.[2][5][6] The finished corpus can be edited, shared with other users, and searched.[2][5][7] A number of functions also provide statistical summaries of the collected texts.[2][5]

  1. ^ Laurence Anthony (2013), "A critical look at software tools in corpus linguistics" (PDF), Linguistic Research, vol. 30, no. 2, pp. 141–161
  2. ^ a b c d e Karol Saputa; Aleksandra Tomaszewska; Natalia Zawadzka-Paluektau; Witold Kieraś; Łukasz Kobyliński (2023), "Korpusomat.eu: A multilingual platform for building and analysing linguistic corpora" (PDF), International Conference on Computational Science, Springer Nature Switzerland, pp. 230–237
  3. ^ The full list of supported formats is available at: https://tika.apache.org/1.17/formats.html
  4. ^ "Tworzenie korpusu — Korpusomat EU 0.1 - dokumentacja".
  5. ^ a b c Witold Kieraś; Łukasz Kobyliński (2021), "Korpusomat – stan obecny i przyszłość projektu", Język Polski, 101 (2): 49–58, doi:10.31286/JP.101.2.4
  6. ^ "Korpusomat". CLARIN (Common Language Resources & Technology Infrastructure). Retrieved 2023-10-09.
  7. ^ Andrason, Alexander; Gębka-Wolak, Małgorzata; Moroz, Andrzej (2022). "The rise of the WZIĄĆ (TAKE) Serial Verb Construction in Polish" (PDF). Stellenbosch Papers in Linguistics Plus. 65: 11–36.