SIGTYP Lecture


Morphologically-rich languages challenge neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter approach has shown significant benefits for translating morphologically-rich languages, although practical applications are still limited due to increased requirements in terms of model capacity. In this talk, we present an overview of recent approaches to NMT developed for translating morphologically-rich languages and open challenges related to their future deployment.

Aug 13, 2021 10:00 AM — 12:00 PM
Duygu Ataman
Duygu Ataman
Assistant Professor of Computer Science