THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering
Human Language Technology Center
Binarizing Synchronous Grammars for Machine Translation
University of Pennsylvania
Joint work with Hao ZHANG (Rochester), Dan GILDEA (Rochester), and Kevin KNIGHT (USC/ISI)
Date : 30 July 2007 (Monday)
Time : 16:00-17:30
Venue : Rm 1505 (Lifts 25-26)
Syntax-based MT systems have seen promising improvements in translation quality, but are often computationally too expensive to be practical. In this talk, I will address this problem of efficiency in systems based on synchronous grammars and tree transducers. In this model, the theoretical complexity of decoding is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. To alleviate this problem, we develop a theory of binarization for synchronous grammars which factors big rules into smaller, binary rules, that permit efficient decoding and are in fact equivalent to Inversion Transduction Grammars (ITG). We also present a linear-time algorithm for synchronous binarization. In our large-scale experiments, we found that the syntactic re-orderings between Chinese and English are almost always binarizable and the resulting rule set significantly improves the speed and accuracy of ISI's syntax-based MT system, which is currently the state-of-the-art.
Besides efficiency, this novel technique of binarization also provides a tool to study the formal and empirical properties of synchronous grammars. For example, we are now able to address the famous question by Dekai Wu that whether there are non-ITG re-orderings between fixed-word-order languages such as English and Chinese. Our empirical results on hand-aligned English-Chinese parallel text confirmed that the pro-ITG conjecture is largely correct (99.7% cases are binarizable), while we also provide, for the first-time, "real-examples" of non-binarizable reordering verified by native speakers. If time permitting, I would also present results on freer-word-order languages.
- Liang Huang, Hao Zhang, Daniel Gildea, and Kevin Knight (2007). Binarization of Synchronous Context-Free Grammars. Under review by Computational Linguistics. http://www.cis.upenn.edu/~lhuang3/bin-scfg-single.pdf
- Hao Zhang, Liang Huang, Daniel Gildea, and Kevin Knight (2006). Synchronous Binarization for Machine Translation. In Proceedings of the HLT-NAACL 2006, Brooklyn, NY. http://www.cis.upenn.edu/~lhuang3/bin-final.pdf
Liang Huang is finishing his fourth year in the PhD program at the University of Pennsylvania, under Aravind Joshi. He is mainly interested in the theoretical aspects of computational linguistics, in particular, efficient algorithms in parsing and machine translation, and formal properties of synchronous grammars. He also works on applying computational linguistics to structural biology.
For enquiries, please call 2358-7008 or visit our website at http://www.cs.ust.hk/~hltc/seminars.html.
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Hong Kong
http://www.cs.ust.hk/~hltcLast updated: 2007.07.25 Dekai Wu