dc.contributor.author: Vihman, Virve-Anneli
dc.contributor.author: Pilvik, Maarja-Liisa
dc.contributor.author: Mandel, Aive
dc.contributor.author: Kängsepp, Annika
dc.contributor.author: Aigro, Mari
dc.contributor.author: Koreinik, Kadri
dc.contributor.author: Praakli, Kristiina
dc.contributor.author: Lindström, Liina
dc.coverage.spatial: Estonia [en]
dc.date.accessioned: 2024-02-22T09:37:20Z
dc.date.available: 2024-02-22T09:37:20Z
dc.date.issued: 2023
dc.identifier.uri: https://datadoi.ee/handle/33/596
dc.identifier.uri: https://doi.org/10.23673/re-455
dc.description.abstract: Estonian Teen Language Corpus (Eesti teismeliste keele korpus) is a corpus of spoken and written language data collected from Estonian teenagers (ages 9-18) between 2019 and 2023. The corpus consists of four types of files. Spoken language data is represented by .eaf and .tsv files (spoken_eaf.zip, spoken_tsv.zip), which contain transcriptions of recordings of teenagers' spontaneous speech, where one participant recorded a conversation between themselves and one or more other people. Transcriptions are annotated on different linguistic tiers, including words, morphology, language, etc. (see teke_spoken_metadata.txt). Version 1.0 of the corpus contains transcriptions of 116 conversations, most around one hour in length. The corpus can be used for addressing various linguistic research questions, as well as for training language technology applications (e.g. speech recognition, dialogue systems). Written language data is made up of online chats between two teenagers (ages 10-17). Chats are represented by .tsv and .html files (chat_html.zip, chat_tsv.zip). Version 1.0 of the corpus includes 110 chats. Annotation includes language tags and abbreviations. All personal information has been anonymised. Estonian Teen Language Corpus is a product of several consecutive projects, which are further described here: https://teismelistekeel.ee/. [en]
dc.description.abstract: To access the corpus, please write to Virve Vihman ([email protected]).
dc.format: TSV [en]
dc.format: HTML [en]
dc.format: EAF [en]
dc.format: TXT [en]
dc.format: CSV [en]
dc.language.iso: et [en]
dc.publisher: Institute of Estonian and General Linguistics, University of Tartu [en]
dc.rights: info:eu-repo/semantics/restrictedAccess [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: speech corpus [en]
dc.subject: chat corpus [en]
dc.subject: internet speech [en]
dc.subject: transcriptions [en]
dc.subject: morphological analysis [en]
dc.subject: teenager language [en]
dc.title: Estonian Teen Language Corpus [en]
dc.type: info:eu-repo/semantics/dataset [en]
dc.relation.iscitedby: 10.1515/lingvan-2021-0152 [en]


Except where otherwise noted, this item's license is described as info:eu-repo/semantics/restrictedAccess