Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks

dc.contributor.author Demir, Şeniz
dc.date.accessioned 2023-10-18T12:06:13Z
dc.date.available 2023-10-18T12:06:13Z
dc.date.issued 2023
dc.department Faculty of Engineering, Department of Electrical and Electronics Engineering en_US
dc.description TUBITAK-ARDEB [117E977] en_US
dc.description This work is supported by TUBITAK-ARDEB under the grant number 117E977. en_US
dc.description.PublishedMonth December en_US
dc.description.WoSDocumentType article
dc.description.WoSIndexDate 2023 en_US
dc.description.WoSInternationalCollaboration Without international collaboration - NO en_US
dc.description.WoSPublishedMonth April en_US
dc.description.WoSYOKperiod YÖK - 2022-23 en_US
dc.description.abstract End-to-end data-driven approaches have led to the rapid development of language generation and dialogue systems. Despite requiring large amounts of well-organized data, these approaches jointly learn multiple components of the traditional generation pipeline without costly human intervention. End-to-end approaches also enable the use of loosely aligned parallel datasets in system development by relaxing the degree of semantic correspondence between training data representations and text spans. However, their potential in Turkish language generation has not yet been fully exploited. In this work, we apply sequence-to-sequence (Seq2Seq) neural models to Turkish data-to-text generation, where input data given in the form of a meaning representation is verbalized. We explore encoder-decoder architectures with an attention mechanism in unidirectional, bidirectional, and stacked recurrent neural network (RNN) models. Our models generate one-sentence biographies and dining-venue descriptions using a crowdsourced dataset in which all field-value pairs that appear in meaning representations are fully captured in reference sentences. To support this work, we also explore the performance of our models on a more challenging dataset, where the content of a meaning representation is too large to fit into a single sentence and hence content selection and surface realization must be learned jointly. This dataset was retrieved by coupling the introductory sentences of person-related Turkish Wikipedia articles with their contained infobox tables. Our empirical experiments on both datasets demonstrate that Seq2Seq models are capable of generating coherent and fluent biographies and venue descriptions from field-value pairs. We argue that the wealth of knowledge residing in our datasets and the insights obtained from this study hold the potential to give rise to new end-to-end generation approaches for Turkish and other morphologically rich languages. en_US
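The encoder-decoder-with-attention architecture named in the abstract can be sketched as a single forward pass. The following NumPy snippet is a minimal illustration, not the paper's implementation: the vanilla-RNN cells, the dot-product (Luong-style) attention, the hidden size, and the tiny vocabularies are all assumptions made for the example, and the random weights stand in for trained parameters.

```python
# Illustrative Seq2Seq forward pass with dot-product attention (NumPy only).
# Sizes and the vocabulary are hypothetical; weights are random, not trained.
import numpy as np

rng = np.random.default_rng(0)
H = 8                   # hidden size (assumed)
V_IN, V_OUT = 10, 12    # input / output vocabulary sizes (assumed)

E_in  = rng.normal(0, 0.1, (V_IN, H))       # input token embeddings
E_out = rng.normal(0, 0.1, (V_OUT, H))      # output token embeddings
W_enc = rng.normal(0, 0.1, (2 * H, H))      # encoder recurrence weights
W_dec = rng.normal(0, 0.1, (2 * H, H))      # decoder recurrence weights
W_out = rng.normal(0, 0.1, (2 * H, V_OUT))  # [state; context] -> vocab logits

def rnn_step(x, h, W):
    """One vanilla RNN step: h' = tanh(W [x; h])."""
    return np.tanh(np.concatenate([x, h]) @ W)

def encode(src_ids):
    """Run the encoder over a token-id sequence; return all hidden states."""
    h, states = np.zeros(H), []
    for t in src_ids:
        h = rnn_step(E_in[t], h, W_enc)
        states.append(h)
    return np.stack(states)                 # shape (src_len, H)

def attend(query, states):
    """Dot-product attention: softmax scores over encoder states, context."""
    scores = states @ query                 # (src_len,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ states, w                    # context vector, attention weights

def decode_step(prev_id, h, states):
    """One decoder step: recurrence, attention, output distribution."""
    h = rnn_step(E_out[prev_id], h, W_dec)
    ctx, w = attend(h, states)
    logits = np.concatenate([h, ctx]) @ W_out
    p = np.exp(logits - logits.max())
    return h, p / p.sum(), w

# Greedily decode a 3-token output from a 4-token input (ids are arbitrary).
states = encode([1, 4, 2, 7])
h, prev = np.zeros(H), 0                    # 0 = hypothetical <BOS> id
out = []
for _ in range(3):
    h, probs, w = decode_step(prev, h, states)
    prev = int(probs.argmax())
    out.append(prev)
```

The paper's actual models use unidirectional, bidirectional, and stacked RNN variants; this sketch keeps a single unidirectional layer only to make the attention flow visible.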
dc.description.woscitationindex Science Citation Index Expanded en_US
dc.identifier.citation Demir, S. (2023). Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(2), 1-27. en_US
dc.identifier.doi 10.1145/3543826
dc.identifier.issn 2375-4699
dc.identifier.issn 2375-4702
dc.identifier.issue 2 en_US
dc.identifier.scopus 2-s2.0-85152906599
dc.identifier.scopusquality Q2
dc.identifier.uri https://hdl.handle.net/20.500.11779/1985
dc.identifier.uri https://doi.org/10.1145/3543826
dc.identifier.volume 22 en_US
dc.identifier.wos WOS:000963394900006
dc.identifier.wosquality Q4
dc.institutionauthor Demir, Şeniz
dc.language.iso en en_US
dc.publisher Association for Computing Machinery en_US
dc.relation.journal ACM Transactions on Asian and Low-Resource Language Information Processing en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject State-of-the-art en_US
dc.subject Sequence-to-sequence model en_US
dc.subject Turkish en_US
dc.subject Wikipedia en_US
dc.subject Natural-language generation en_US
dc.subject Data-to-text generation en_US
dc.title Turkish Data-to-Text Generation Using Sequence-to-Sequence Neural Networks en_US
dc.type Article en_US
