
HLTC Special Seminar Series

Department of Computer Science
Department of Electrical and Electronic Engineering

Human Language Technology Center

Word Sense Disambiguation vs. Statistical Machine Translation:
PhD Qualifying Examination


Date :     21 September 2005 (Wednesday)
Time :     12:00-14:00
Venue :   Rm 4480 (Lifts 25-26)

Committee :
Dr. Dekai Wu (Supervisor)
Dr. Brian Mak (Chairperson)
Dr. Dit-Yan Yeung
Dr. Pascale Fung (EEE)


In this survey, we review word sense disambiguation (WSD) and statistical machine translation (SMT) literature in light of the recent WSD vs. SMT debate.

WSD, the task of resolving sense ambiguity to identify the right translation of a word, is one of the major challenges faced by language translation systems. If the English word "drug" translates into French as either "drogue" (a narcotic) or "médicament" (a medicine), then an English-French MT system needs to disambiguate every use of "drug" in order to produce the correct translation.

Considerable effort has been put into designing and evaluating dedicated WSD models, in particular through the Senseval series of workshops. This is partly motivated by the often unstated assumption that any full translation system, to achieve full performance, will sooner or later have to incorporate individual WSD components.

However, in most machine translation architectures, and in SMT in particular, the WSD problem is typically not addressed explicitly; instead, the translation engine already implicitly factors many contextual features into lexical choice.

In this context, an energetically debated question at conferences over the past year is whether even the new state-of-the-art WSD models actually have anything to offer full-scale SMT systems.

We will show that dedicated WSD has led to several useful insights for SMT, and explain how typical SMT models perform WSD. Finally, we will discuss the main challenges for the integration of state-of-the-art dedicated WSD models into current SMT architectures.


Marine Carpuat is a PhD student in computer science at HKUST, where she is a member of the Human Language Technology Center. Her research interests include natural language processing and statistical machine translation. She received an MPhil in electrical engineering from HKUST and graduated from the French Grande Ecole d'ingénieurs Supélec in 2002.

*** All are Welcome ***

For enquiries, please call 2358-7008 or visit our website at

Human Language Technology Center
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Hong Kong
+852 2358-8831
Last updated: 2005.09.21 Dekai Wu