THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
Department of Computer Science
Department of Electrical and Electronic Engineering
Human Language Technology Center
Word Sense Disambiguation vs. Statistical Machine Translation:
PhD Qualifying Examination
Date : 21 November 2005 (Monday)
Time : 12:00-14:00
Venue : Rm 4480 (Lifts 25-26)
Dr. Dekai Wu (Supervisor)
Dr. Dit-Yan Yeung (Chairperson)
Dr. Brian Mak
Dr. Pascale Fung (EEE)
We propose to empirically demonstrate that dedicated word sense disambiguation (WSD) systems are useful to statistical machine translation (SMT), and directly investigate the related issues raised by the WSD vs. SMT debate.
WSD, the task of resolving sense ambiguity to identify the right translation of a word, is one of the major challenges faced by language translation systems. For example, if the English word "drug" translates into French as either "drogue" (a narcotic) or "médicament" (a medicine), then an English-French machine translation system needs to disambiguate every use of "drug" in order to translate it correctly.
Considerable effort has been put into designing and evaluating dedicated WSD models, in particular through the Senseval series of workshops. This is partly motivated by the often unstated assumption that any full translation system, to achieve full performance, will sooner or later have to incorporate individual WSD components. However, in most machine translation architectures, in particular SMT, the WSD problem is typically not explicitly addressed. This paradoxical situation has encouraged speculation that SMT models are already very good at WSD and that current WSD systems have nothing to offer state-of-the-art SMT.
We propose to directly address these issues by conducting an empirical investigation of the WSD vs. SMT debate. A critical survey of both the WSD and SMT literature shows that current SMT systems already benefit from some WSD insights. However, it remains unclear whether the new state-of-the-art WSD models can actually help improve translation quality.
We will first introduce the HKUST WSD system, which achieves the best known performance on the Senseval-3 Chinese lexical sample task, among other desirable properties for our study. Then we will present empirical results suggesting that while typical SMT models cannot disambiguate word translations as well as dedicated WSD systems, simple methods for incorporating WSD predictions do not help translation quality. Based on error analysis, we will suggest new directions to incorporate WSD predictions in SMT.
Marine Carpuat is a PhD student in computer science at HKUST, where she is a member of the Human Language Technology Center. Her research interests include natural language processing and statistical machine translation. She received an MPhil in electrical engineering from HKUST and graduated from the French Grande Ecole d'ingénieurs Supélec in 2002.
For enquiries, please call 2358-7008 or visit our website at http://www.cs.ust.hk/~hltc/seminars.html.
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Hong Kong
http://www.cs.ust.hk/~hltc
Last updated: 2005.11.21 Dekai Wu