Language Modeling for Turkish Text and Speech Processing

Date

2018

Authors

Arısoy, Ebru

Publisher

Springer

Abstract

This chapter presents an overview of language modeling, followed by a discussion of the challenges in Turkish language modeling. Sub-lexical units are commonly used to reduce the high out-of-vocabulary (OOV) rates of morphologically rich languages. These units are obtained either by morphological analysis or by unsupervised statistical techniques. For Turkish, morphological analysis yields word segmentations in both lexical and surface forms, which can be used as sub-lexical language modeling units. Discriminative language models, which outperform generative models on various tasks, allow for easy integration of morphological and syntactic features into language modeling. The chapter reviews both generative and discriminative approaches to Turkish language modeling.

Keywords

Language modeling

Citation

Arısoy, E., & Saraçlar, M. (2018). Language modeling for Turkish text and speech processing. In Turkish Natural Language Processing (pp. 69-92). Springer.

WoS Q

N/A

Scopus Q

N/A

Source

Turkish Natural Language Processing

Start Page

69

End Page

92