Article

Building a Turkish UCCA dataset

Details

Citation

Bölücü N & Can B (2025) Building a Turkish UCCA dataset. Can Buglalilar B (Supervisor) Natural Language Processing, 31 (1), pp. 111-149. https://doi.org/10.1017/nlp.2024.36

Abstract
Semantic parsing takes natural language text and maps it to a logical form that can be processed and understood by machines. It is utilised by many applications in natural language processing (NLP), particularly in tasks relevant to natural language understanding (NLU). Due to the widespread use of semantic parsing in NLP, many semantic representation schemes with different forms have been proposed; Universal Conceptual Cognitive Annotation (UCCA) is one of them. UCCA is a cross-lingual semantic annotation framework that allows easy annotation without requiring substantial linguistic knowledge. UCCA-annotated datasets have been released so far for English, French, German, Russian, and Hebrew. In this paper, we present a UCCA-annotated Turkish dataset of 400 sentences drawn from the METU-Sabanci Turkish Treebank. We provide the UCCA annotation specifications defined for the Turkish language so that the dataset can be extended further. We followed a semi-automatic annotation approach, in which an external semantic parser produces the initial annotation of the dataset, which is then manually revised by two annotators. We used the same semantic parser model to evaluate the dataset with zero-shot and few-shot learning, demonstrating that even a small sample set from the target language in the training data has a notable impact on the performance of the parser (15.6% and 2.5% gain over zero-shot for labelled and unlabelled results, respectively).

Keywords
Universal Conceptual Cognitive Annotation; UCCA; Semantic representation; METU-Sabanci Turkish Treebank; dataset

Journal
Natural Language Processing: Volume 31, Issue 1

Status: Published
Contributor: Dr Burcu Can Buglalilar
Funders: University of Stirling
Publication date: 31/01/2025
Publication date online: 31/08/2024
Date accepted by journal: 04/05/2024
URL: http://hdl.handle.net/1893/37466
Publisher: Cambridge University Press (CUP)
ISSN: 2977-0424
eISSN: 2977-0424

People (1)

Dr Burcu Can Buglalilar

Lecturer in Computing Science, Computing Science
