dc.contributor.author | Vihman, Virve-Anneli | |
dc.contributor.author | Pilvik, Maarja-Liisa | |
dc.contributor.author | Mandel, Aive | |
dc.contributor.author | Kängsepp, Annika | |
dc.contributor.author | Aigro, Mari | |
dc.contributor.author | Koreinik, Kadri | |
dc.contributor.author | Praakli, Kristiina | |
dc.contributor.author | Lindström, Liina | |
dc.coverage.spatial | Estonia | en |
dc.date.accessioned | 2024-02-22T09:37:20Z | |
dc.date.available | 2024-02-22T09:37:20Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | https://datadoi.ee/handle/33/596 | |
dc.identifier.uri | https://doi.org/10.23673/re-455 | |
dc.description.abstract | The Estonian Teen Language Corpus (Eesti teismeliste keele korpus) is a corpus of spoken and written language data collected from Estonian teenagers (ages 9-18) between 2019 and 2023. The corpus consists of four types of files. Spoken language data is represented by .eaf and .tsv files (spoken_eaf.zip, spoken_tsv.zip), which contain transcriptions of recordings of teenagers' spontaneous speech; for each recording, one participant recorded a conversation between themselves and one or more other people. Transcriptions are annotated on different linguistic tiers, including words, morphology, language, etc. (see teke_spoken_metadata.txt). Version 1.0 of the corpus contains transcriptions of 116 conversations, most around one hour in length. The corpus can be used to address various linguistic research questions, as well as to train language technology applications (e.g. speech recognition, dialogue systems).
Written language data is made up of online chats between two teenagers (ages 10-17). Chats are represented by .tsv and .html files (chat_html.zip, chat_tsv.zip). Version 1.0 of the corpus includes 110 chats. Annotation includes language tags and abbreviations. All personal information has been anonymised.
The Estonian Teen Language Corpus is the product of several consecutive projects, which are described further here: https://teismelistekeel.ee/. | en |
dc.description.abstract | To access the corpus, please write to Virve Vihman ([email protected]). | |
dc.format | TSV | en |
dc.format | HTML | en |
dc.format | EAF | en |
dc.format | TXT | en |
dc.format | CSV | en |
dc.language.iso | et | en |
dc.publisher | Institute of Estonian and General Linguistics, University of Tartu | en |
dc.rights | info:eu-repo/semantics/restrictedAccess | en |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | speech corpus | en |
dc.subject | chat corpus | en |
dc.subject | internet speech | en |
dc.subject | transcriptions | en |
dc.subject | morphological analysis | en |
dc.subject | teenager language | en |
dc.title | Estonian Teen Language Corpus | en |
dc.type | info:eu-repo/semantics/dataset | en |
dc.relation.iscitedby | 10.1515/lingvan-2021-0152 | en |