Dekai Wu (De Kai)
Professor of Computer Science and Engineering, HKUST
Distinguished Research Scholar, ICSI, Berkeley
Human Language Technology Center
Department of Computer Science and Engineering
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Kowloon, Hong Kong
tel +852 2358-6989 · fax +852 2358-1477 · room 3556 (lifts 25-30)
lab +852 2358-8831 · room 2602 (lifts 27-30)
dekai@cs.ust.hk · http://www.cs.ust.hk/~dekai
COVID-19 project: #UniversalMasking #masks4all #wearamask (videos, articles, and interactive simulation)
I received my PhD in Computer Science from the University of California at Berkeley, and was a postdoctoral fellow at the University of Toronto (Ontario, Canada) prior to joining HKUST in its founding year, 1992. Other degrees include an Executive MBA from Kellogg (Northwestern University) and HKUST in 2002, and a BS in Computer Engineering from the University of California at San Diego (Revelle College departmental award, cum laude, Phi Beta Kappa) in 1984. I have been a visiting researcher at Columbia University in 1995-96, Bell Laboratories in 1995, and the Technische Universität München (Munich, Germany) during 1986-87.
In December 2011, I was selected by the Association for Computational Linguistics as one of only 17 scientists worldwide to be awarded the honor of founding ACL Fellow, with a citation for "significant contributions to machine translation and the development of inversion transduction grammar" which pioneered the integration of syntactic and semantic models into statistical machine translation paradigms.
I have served as Associate Editor of Computer Speech and Language, AI Journal, and ACM Transactions on Speech and Language Processing, and on the Editorial Boards of Computational Linguistics, Machine Translation, and Journal of Natural Language Engineering. I also served as Chair of IWSLT 2019, Area Co-Chair for ICLR 2020, ACL 2019, EMNLP 2016, NAACL HLT 2016, IJCAI 2015, ACL 2013, and IJCNLP 2013, Chair of IWSLT 2012, Co-Chair for EMNLP-2004, and the Organizing Committee of ACL-2000 and WVLC-5 (SIGDAT 1997), as well as the Executive Committee of the Association for Computational Linguistics (ACL). In 2007 I initiated the ongoing annual SSST workshop series on Syntax, Semantics, and Structure in Statistical Translation.
The South China Morning Post has a nice overview of my accomplishments, activities, and approaches. In 2015, Debrett's named me one of the 100 most influential figures of Hong Kong in the area of science.
In 2019, Google named me as one of eight inaugural members of its AI Ethics council, ATEAC (Advanced Technology External Advisory Council).
Research interests
Artificial intelligence; human language and music technology; computational linguistics; natural language processing; machine translation; machine learning; neural networks; data science; cognitive models of human language and communication; multilingual computing; language modeling; speech recognition; language acquisition; dialog systems; information retrieval; knowledge management; evolution of language and music; computational musicology; computer music.
For machine learning of the relationships between musical languages, we received the ICMC Best Presentation Award from the International Computer Music Association in 2015.
For machine learning of the relationships between written and spoken languages, milestone successes pioneered by my statistical machine translation (SMT) research group include:
- the first unstructured SMT models on very different languages
- 1993— Chinese/English alignment
- 1994— Chinese/English statistical machine translation
- 1994— Chinese/English phrase/collocation translation learning
- the first syntactic and tree-structured SMT models
- 1995— inversion transduction grammar (ITG; any synchronous context-free grammar that is binary, ternary, or inverting)
- 1995— bracketing ITG (BTG or BITG; a minimal biparsing sketch follows this list)
- 1995— stochastic ITG parameter estimation (EM training)
- 1996— phrasal SMT (segmental ITG)
- 1997— projection of monolingual constraints (bilingual constraint transfer; coercion)
- 1998— SMT with linguistic ITG transduction rules
- 2005— comparable corpora mining BITG
- 2009— linear inversion transduction grammar (LITG)
- 2009— linear transduction grammar (LTG)
- 2011— preterminalized linear inversion transduction grammar (PLITG)
- 2012— rule chunking for unsupervised transduction grammar induction
- 2013— rule segmentation for unsupervised transduction grammar induction
- the first semantic SMT models
- 2005— word sense disambiguation for SMT (WSD for SMT)
- 2007— phrase sense disambiguation for SMT (PSD)
- 2007— semantic role labeling for SMT training (SRL for SMT)
- 2009— semantic role labeling for SMT decoding (SRL for SMT)
- 2013— training SMT against purely semantic frame based objective criteria (MEANT)
- the first semantic MT evaluation models
- 2010— human semantic MT evaluation with SRL-for-MTE (HMEANT)
- 2012— automatic semantic MT evaluation with SRL-for-MTE (MEANT)
- 2014— cross-lingual automatic semantic MT evaluation without human references (XMEANT)
- some of this is surveyed in my 2010 chapters
- "Alignment" in CRC Press' Handbook of Natural Language Processing
- "Lexical Semantics for Statistical Machine Translation" in DARPA's Handbook of Natural Language Processing and Machine Translation
Our active research projects are internationally funded by the US DARPA BOLT, GALE, and LORELEI programs, the European Union EU-BRIDGE project, and the Hong Kong RGC.
Human Language Technology Center (HLTC)
Activities
- Collective Responsibility and Accountability in an AI Era - 10th Organization Artifacts and Practices Workshop (OAP 2020). Jun 2020, Berkeley. [Co-Chair]
- Eighth International Conference on Learning Representations (ICLR 2020). Apr 2020, Addis Ababa, Ethiopia. [Area Chair]
- 16th International Workshop on Spoken Language Translation (IWSLT 2019). Nov 2019, Hong Kong. [Workshop Chair]
- 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Jul 2018, Melbourne, Australia. [Area Chair]
- 27th International Conference on Computational Linguistics (COLING 2018). Aug 2018, Santa Fe, New Mexico. [Area Chair]
- IJCAI-16, 25th International Joint Conference on Artificial Intelligence. Jul 2016, New York. [Senior Program Committee]
- NAACL HLT 2016, 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics. Jun 2016, San Diego, California. [Area Chair]
- IJCAI-15, 24th International Joint Conference on Artificial Intelligence. Jul 2015, Buenos Aires, Argentina. [Area Chair]
- SSST-9, Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation (NAACL HLT 2015 Workshop), Jun 2015, Denver, Colorado
- SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (EMNLP 2014 Workshop), Oct 2014, Doha, Qatar
- 51st Annual Meeting of the Association for Computational Linguistics. Aug 2013, Sofia, Bulgaria. [Area Chair]
- 6th International Joint Conference on Natural Language Processing. Oct 2013, Nagoya, Japan. [Area Chair]
- SSST-7, Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (NAACL-HLT 2013 Workshop), Jun 2013, Atlanta, Georgia
- IWSLT 2012, International Workshop on Spoken Language Translation, 6-7 Dec 2012, Hong Kong
- SSST-6, Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (ACL 2012 Workshop), Jul 2012, Jeju, South Korea
- SSST-5, Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation (ACL HLT 2011 Workshop), 23 Jun 2011, Portland, Oregon
- SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (COLING 2010 Workshop), 28 Aug 2010, Beijing
- SSST-3, Third Workshop on Syntax and Structure in Statistical Translation (NAACL HLT 2009 Workshop), 5 Jun 2009, Boulder, Colorado
- SSST-2, Second Workshop on Syntax and Structure in Statistical Translation (ACL-08: HLT Workshop), 20 Jun 2008, Columbus, Ohio
- SSST-1, Syntax and Structure in Statistical Translation (NAACL-HLT 2007 Workshop), 26 Apr 2007, Rochester, New York
- CLSP Workshop 2005, Translation by Parsing, Jul-Aug 2005, Johns Hopkins University, Center for Language and Speech Processing
- EMNLP-2004, 2004 Conference on Empirical Methods in Natural Language Processing (at ACL-04), 25-26 Jul 2004, Barcelona, Spain
- ACL-2000, 38th Annual Meeting of the Association for Computational Linguistics, 1-8 Oct 2000, Hong Kong
- WVLC-5, Fifth Workshop on Very Large Corpora, 18/20 Aug 1997, Tsinghua/HKUST
Teaching
- COMP5221 (Natural Language Processing), Spring 2024
- COMP1944 (Artificial Intelligence Ethics), Spring 2024
- COMP4221 (Introduction to Natural Language Processing), Spring 2023
- COMP1944 (Artificial Intelligence Ethics), Fall 2022
- COMP1944 (Artificial Intelligence Ethics), Spring 2022
- COMP1944 (Artificial Intelligence Ethics), Fall 2021
- COMP4901M (Artificial Intelligence Ethics), Spring 2021
- COMP4221 (Introduction to Natural Language Processing), Spring 2020
- COMP4901M (Artificial Intelligence Ethics), Spring 2020
- COMP4221 (Introduction to Natural Language Processing), Spring 2019
- COMP3211 (Fundamentals of Artificial Intelligence), Spring 2019
- COMP5221 (Natural Language Processing), Spring 2018
- COMP4221 (Introduction to Natural Language Processing), Spring 2018
- COMP4911 (IT Entrepreneurship), Spring 2018
- COMP4221 (Introduction to Natural Language Processing), Spring 2017
- COMP4911 (IT Entrepreneurship), Fall 2016
- COMP3211 (Fundamentals of Artificial Intelligence), Spring 2016
- COMP4221 (Introduction to Natural Language Processing), Fall 2015
- COMP4211 (Machine Learning), Spring 2015
- COMP2012H (Honors Object Oriented Programming and Data Structures), Fall 2014
- COMP2012H (OOP and Data Structures, Honors Study Track), Spring 2014
- COMP5221 (Natural Language Processing), Fall 2013
- COMP4221 (Introduction to Natural Language Processing), Fall 2013
- COMP4211 (Machine Learning), Spring 2013
- COMP3031 (Introduction to Programming Languages), Fall 2012
- COMP3211 (Fundamentals of Artificial Intelligence), Spring 2012
- COMP3031 (Introduction to Programming Languages), Fall 2011
- COMP300H (Introduction to Natural Language Processing), Spring 2011
- COMP221 (Fundamentals of Artificial Intelligence), Fall 2010
- COMP300H (Introduction to Natural Language Processing), Spring 2010
- COMP221 (Fundamentals of Artificial Intelligence), Fall 2009
- CSIT523 (Knowledge Management), Summer 2009
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2009
- COMP526 (Natural Language Processing), Fall 2008
- CSIT600G (Knowledge Management), Summer 2008
- COMP151H (Object-Oriented Programming, Honors Study Track), Spring 2008
- COMP251 (Introduction to Programming Languages), Fall 2007
- CSIT600G (Knowledge Management), Summer 2007
- COMP151 (Object-Oriented Programming), Spring 2007
- COMP526 (Natural Language Processing), Fall 2006
- COMP251 (Introduction to Programming Languages), Fall 2006
- COMP621N (Advanced Topics in AI), Spring 2006
- COMP151 (Object-Oriented Programming), Spring 2006
- CSIT600G (Knowledge Management), Fall 2005
- COMP621M (Advanced Topics in AI: Structural Statistical Machine Translation), Fall 2005
- COMP271 (Design and Analysis of Algorithms), Spring 2005
- COMP151 (Object-Oriented Programming), Fall 2004
- COMP621J (Advanced Topics in AI: Statistical Machine Translation), Spring 2004
- COMP526 (Natural Language Processing), Fall 2003
- COMP621H (Advanced Topics in AI: Machine Translation), Fall 2003
- COMP151 (Object-Oriented Programming), Spring 2003
- COMP171 (Data Structures and Algorithms), Fall 2002
Current/recent research students
- Markus SAERS (Sweden) Postdoc (PhD 2011 co-advised with Uppsala Universitet)
- Marine CARPUAT (France) PhD 2008
- Karteek ADDANKI (India) PhD 2016
- Jackie LO Chi Kiu (HK) PhD 2018, MPhil 2009
- Meriem BELOUCIF (Algeria) PhD 2018
- Yuchen YAN (China) PhD
- Serkan KUMYOL (Turkey) PhD
- Tyler BARTH (USA) MPhil
- Ken LEE Wing Kuen (HK) MPhil 2005
Selected publications
- Yuchen YAN, Dekai WU, and Serkan KUMYOL.
"Efficient Bilingual Generalization from Neural Transduction Grammar Induction".
16th International Workshop on Spoken Language Translation (IWSLT 2019). Hong Kong: Nov 2019.
We introduce (1) a novel neural network structure for bilingual modeling of sentence pairs that allows efficient capturing of bilingual relationships via biconstituent composition, (2) the concept of neural network biparsing, which applies not only to machine translation (MT) but also to a variety of other bilingual research areas, and (3) the concept of a biparsing-backpropagation training loop, which we hypothesize can efficiently learn complex biparse tree patterns. Our work is distinguished from the sequential attention-based models more traditionally found in neural machine translation (NMT) in three aspects. First, our model enforces compositional constraints. Second, our model has a smaller search space in terms of discovering bilingual relationships from bilingual sentence pairs. Third, our model produces explicit biparse trees, which enable transparent error analysis during evaluation and external tree constraints during training.
- Meriem BELOUCIF and Dekai WU.
"SRL for low resource languages isn't needed for semantic SMT".
21st Annual Conference of the European Association for Machine Translation (EAMT 2018). Alicante: May 2018.
Previous attempts at injecting semantic frame biases into SMT training for low resource languages failed because either (a) no semantic parser is available for the low resource input language; or (b) the output English language semantic parses excise relevant parts of the alignment space too aggressively. We present the first semantic SMT model to succeed in significantly improving translation quality across many low resource input languages for which no automatic SRL is available — consistently and across all common MT metrics. The results we report are the best by far to date for this type of approach; our analyses suggest that in general, easier approaches toward including semantics in training SMT models may be more feasible than generally assumed, even for low resource languages where semantic parsers remain scarce. While recent proposals to use the crosslingual evaluation metric XMEANT during inversion transduction grammar (ITG) induction are inapplicable to low resource languages that lack semantic parsers, we break the bottleneck via a vastly improved method of biasing ITG induction toward learning more semantically correct alignments using the monolingual semantic evaluation metric MEANT. Unlike XMEANT, MEANT requires only a readily-available English (output language) semantic parser. The advances we report here exploit the novel realization that MEANT represents an excellent way to semantically bias expectation-maximization induction even for low resource languages. We test our systems on challenging languages including Amharic, Uyghur, Tigrinya and Oromo. Results show that our model influences the learning towards more semantically correct alignments, leading to better translation quality than both the standard ITG and GIZA++ based SMT training models on different datasets.
- Markus SAERS and Dekai WU.
"Handling Ties Correctly and Efficiently in Viterbi Training Using the Viterbi Semiring".
12th International Conference on Language and Automata Theory and Applications (LATA 2018). Tel Aviv, Israel: Apr 2018.
The handling of ties between equiprobable derivations during Viterbi training is often glossed over in research papers; whether they are broken randomly when they occur, decided on an ad hoc basis by the algorithm or implementation, or whether all equiprobable derivations are enumerated with the counts uniformly distributed among them, is left to the reader's imagination. The first hurts rarely occurring rules, which run the risk of being randomly eliminated; the second suffers from algorithmic biases; and the last is correct but potentially very inefficient. We show that it is possible to Viterbi train correctly without enumerating all equiprobable best derivations. The method is analogous to expectation maximization, given that the automatic differentiation view is chosen over the reverse value/outside probability view, as the latter calculates the wrong quantity for reestimation under the Viterbi semiring. To get the automatic differentiation to work we devise an unbiased subderivative for the max function.
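The key trick is easy to illustrate. Here is a minimal sketch of an unbiased subderivative for max (an illustration of the idea, not the paper's implementation): when k inputs tie for the maximum, each receives derivative 1/k, so reestimation counts are split evenly among tied best derivations without enumerating them.

```python
# Under the Viterbi semiring (max, *), reestimation counts are the
# derivatives of the best-derivation score w.r.t. rule probabilities.
# An unbiased subderivative of max splits the count evenly among ties.

def max_with_subderivative(values, eps=1e-12):
    """Return (max value, subderivative w.r.t. each input)."""
    m = max(values)
    ties = [i for i, v in enumerate(values) if abs(v - m) < eps]
    grad = [1.0 / len(ties) if i in ties else 0.0
            for i in range(len(values))]
    return m, grad

# Two derivations tie at 0.2: each contributes half a count.
best, grad = max_with_subderivative([0.2, 0.1, 0.2])
print(best, grad)  # 0.2 [0.5, 0.0, 0.5]
```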
- Meriem BELOUCIF and Dekai WU.
"Semantically Driven Inversion Transduction Grammar Induction for Early Stage Training of Spoken Language Translation".
Sixth IEEE Workshop on Spoken Language Technology (SLT 2016). San Diego: Dec 2016.
We propose an approach in which we inject a crosslingual semantic frame based objective function directly into inversion transduction grammar (ITG) induction in order to semantically train spoken language translation systems. This approach represents a follow-up of our recent work on improving machine translation quality by tuning loglinear mixture weights using a semantic frame based objective function in the late, final stage of statistical machine translation training. In contrast, our new approach injects a semantic frame based objective function back into earlier stages of the training pipeline, during the actual learning of the translation model, biasing learning toward semantically more accurate alignments. Our work is motivated by the fact that ITG alignments have empirically been shown to fully cover crosslingual semantic frame alternations. We show that injecting a crosslingual semantic based objective function for driving ITG induction further sharpens the ITG constraints, leading to better performance than either the conventional ITG or the traditional GIZA++ based approaches.
- Meriem BELOUCIF, Markus SAERS and Dekai WU.
"Improving word alignment for low resource languages using English monolingual SRL".
Sixth Workshop on Hybrid Approaches to Translation (HyTra-6). Osaka, Japan: Dec 2016.
We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research which has shown that injecting a semantic frame based objective function while training SMT models improves the translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads towards a semantically driven alignment which is more efficient than simply tuning loglinear mixture weights against a semantic frame based evaluation metric in the final stage of statistical machine translation training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which is essential when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to a better alignment compared to either conventional ITG or traditional GIZA++ based approaches.
- Meriem BELOUCIF and Dekai WU.
"Injecting a Semantic Objective Function into Early Stage Learning of Spoken Language Translation".
Oriental COCOSDA 2016, 19th International Conference of the Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA 2016). Bali, Indonesia: Oct 2016.
We describe a new approach for semantically training spoken language translation systems, in which we inject a crosslingual semantic frame based objective function directly into inversion transduction grammar (ITG) induction. This represents an ambitious jump from recent work on improving translation adequacy by using a semantic frame based objective function to drive the tuning of loglinear mixture weights in the final stage of statistical machine translation training. In contrast, our new approach propagates a semantic frame based objective function back into much earlier stages of the pipeline, during the actual learning of the translation model, biasing learning toward semantically more accurate alignments. This approach is motivated by the fact that ITG alignments have empirically been shown to fully cover crosslingual semantic frame alternations, even though they rule out an overwhelming majority of the space of possible alignments. We show that directly driving ITG induction with a crosslingual semantic based objective function not only helps to further sharpen the ITG constraints, but still avoids excising relevant portions of the search space, and leads to better performance than either conventional ITG or GIZA++ based approaches.
- Meriem BELOUCIF and Dekai WU.
"Driving inversion transduction grammar induction with semantic evaluation".
5th Joint Conference on Lexical and Computational Semantics (*SEM 2016), at ACL 2016, 55-63. Berlin: Aug 2016.
We describe a new technique for improving statistical machine translation training by adopting scores from a recent crosslingual semantic frame based evaluation metric, XMEANT, as outside probabilities in expectation-maximization based ITG (inversion transduction grammars) alignment. Our new approach strongly biases early-stage SMT learning towards semantically valid alignments. Unlike previous attempts that have proposed using semantic frame based evaluation metrics as the objective function for late-stage tuning of less than a dozen loglinear mixture weights, our approach instead applies the semantic metric at one of the earliest stages of SMT training, where it may impact millions of model parameters. The choice of XMEANT is motivated by empirical studies that have shown ITG constraints to cover almost all crosslingual semantic frame alternations, which resemble the crosslingual semantic frame matching measured by XMEANT. Our experiments purposely restrict training data to small amounts to show the technique's utility in the absence of a huge corpus, to study the effects of semantic generalizations while avoiding overreliance on memorization. Results show that directly driving ITG training with the crosslingual semantic frame based objective function not only helps to further sharpen the ITG constraints, but still avoids excising relevant portions of the search space, and leads to better performance than either conventional ITG or GIZA++ based approaches.
- Dekai WU. "Generalizing Transduction
Grammars to Model Continuous Valued Musical Events". 17th
International Society for Music Information Retrieval Conference (ISMIR 2016). New York: Aug 2016.
We describe a generalization of stochastic transduction grammars to be able to model continuous values, the first models to natively handle continuous-valued musical events such as microtones while still gaining the advantages of STGs for describing complex structural, hierarchically compositional inter-part relationships. Music transduction modeling based on linguistic or grammatical models has commonly approximated continuous valued features like pitch by quantizing them into discrete symbols, which represent 'clean' notes on a scale. The sacrifice is worthwhile for modeling the learning of musical improvisation and accompaniment where musical sequences interact hierarchically at many overlapping levels of granularity; previous work has shown in flamenco and hip hop how discrete STGs allow each part to influence decisions made by other parts while also satisfying contextual preferences across multiple dimensions. We extend the modeling machinery toward the many musical genres where contextual relationships between continuous values influence improvisational and accompaniment decisions. Illustrating with the 'bent notes' prevalent in blues, we show how continuous STGs can be generalized from conventional discrete STGs, which have until now only been able to handle symbolic events, thereby allowing musical signals to remain finely represented as continuous values without crude quantization into discrete symbols, while still retaining the ability to model probabilistic structural relations between multiple musical languages. We exemplify this new approach in learning blues note biases via a new polynomial time algorithm for expectation-maximization training of continuous SITGs (stochastic inversion transduction grammars), a specific subclass of STGs that has already proven useful in numerous applications in both music and language.
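The contrast between discrete quantization and continuous emission can be illustrated in a few lines. The sketch below is a toy illustration under assumed values (the Gaussian emission centered at 3.5 semitones and its parameters are hypothetical, not the paper's trained model): a discrete STG terminal must round a 'bent' blues third to the nearest semitone, while a continuous STG terminal can score the raw microtonal pitch with a density.

```python
# Toy contrast: quantized vs. continuous scoring of a microtonal pitch.

import math

def gaussian_pdf(x, mu, sigma):
    return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
            / (sigma * math.sqrt(2 * math.pi)))

pitch = 3.4  # a 'blue' third, in semitones above the tonic

# Discrete terminal: quantize, losing the microtonal information.
quantized = round(pitch)               # -> 3 (a plain minor third)

# Continuous terminal: emission density over the un-quantized value.
density = gaussian_pdf(pitch, mu=3.5, sigma=0.25)

print(quantized, density)
```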
- Meriem BELOUCIF and Dekai WU.
"A semantically confidence-weighted ITG induction algorithm".
3rd International Workshop on Semantic Machine Learning (SML 2016), at IJCAI 2016. New York: Jul 2016.
We propose a new algorithm to induce inversion transduction grammars, in which a crosslingual semantic frame based objective function is injected as confidence weighting in the early stages of statistical machine translation training. Unlike recent work on improving translation adequacy that uses a monolingual semantic frame based objective function to drive the tuning of loglinear mixture weights in the late stages of statistical machine translation training, our bilingual approach incorporates the semantic objective during the actual learning of the translation model's structure. Our approach assigns higher confidence to training examples in which the semantic frames in the input language more closely match the semantic frames of the output language, as predicted automatically by XMEANT, the crosslingual semantic frame based machine translation evaluation metric. We chose to apply this approach to induce inversion transduction grammars (ITGs), since ITG alignments prune a large majority of the space of possible alignments, while at the same time empirically fully covering all the crosslingual semantic frame alternations of the type we are using for confidence weighting. Results show that boosting semantically compatible training examples in ITG induction improves the translation performance compared to either traditional GIZA++ alignment or conventional ITG alignment based approaches for phrase based statistical machine translation.
- Dekai WU. "How Blue Can You Get? Learning Structural Relationships for Microtones via Continuous Stochastic Transduction Grammars". Seventh International Conference on Computational Creativity (ICCC 2016). Paris: Jun 2016.
We describe a new approach to probabilistic modeling of structural inter-part relationships between continuous-valued musical events such as microtones, through a novel class of continuous stochastic transduction grammars. Linguistic and grammar oriented models for music commonly approximate features like pitch using discrete symbols to represent ‘clean’ notes on scales. In many musical genres, however, contextual relationships between continuous values are essential to improvisational and accompaniment decisions—as with the ‘bent notes’ that blues rely heavily upon. In this paper, we study how stochastic transduction grammars or STGs, which have until now only been able to handle discrete symbols, can be generalized to model continuous valued features for such applications. STGs are interesting for modeling the learning of musical improvisation and accompaniment where parallel musical sequences interact hierarchically (compositionally) at many overlapping levels of granularity. Each part influences decisions made by other parts while at the same time satisfying contextual preferences across multiple dimensions; applications to flamenco and hip hop have recently been shown using discrete STGs. We propose to use a formulation of continuous STGs in which musical signals are finely represented as continuous values without crude quantization into discrete symbols, yet still retaining the ability to model probabilistic structural relations between multiple musical languages. We instantiate this approach for the specific class of stochastic inversion transduction grammars (SITGs), which has proven useful in many applications, via a polynomial time algorithm for expectation-maximization training of continuous SITGs.
- Dekai WU. "Learning Musical Creativity via Stochastic Transduction Grammars: Combination, Exploration and Transformation". Fourth International Workshop on Musical Metacreation (MUME 2016)). Paris: Jun 2016.
We discuss how Boden’s creative processes of combination, exploration, and transformation naturally emerge in models that learn musical improvisation via stochastic transduction grammar induction. Unlike a conventional monolingual grammar, a transduction grammar represents complex transformative relationships between one representation language and another. For musical improvisation, a transduction grammar both provides a large (typically infinite) space of possible hierarchical combinations, and defines a combinatorial space to explore. A stochastic transduction grammar (STG) allows controlled randomness in the combination and exploration. We have been developing STG based models in recent work on learning musical improvisation for hip hop, flamenco, and blues. Inducing an STG simultaneously (a) identifies chunks that will become candidates for recombination as well as patterns of combination, (b) constructs new spaces for exploration in improvisation and composition, and (c) learns transformations from one representation to another.
- Markus SAERS and Dekai WU. "Learning Translations
for Tagged Words: Extending the Translation Lexicon of an ITG for
Low Resource Languages". Workshop on Multilingual and
Cross-lingual Methods in NLP (at NAACL HLT 2016), 55-64. San Diego: Jun 2016.
We tackle the challenge of learning part-of-speech classified translations as part of an inversion transduction grammar, by learning translations for English words with known part-of-speech tags, both from existing translation lexica and from parallel corpora. When translating from a low resource language into English, we can expect to have rich resources for English, such as treebanks, and small amounts of bilingual resources, such as translation lexica and parallel corpora. We solve the problem of integrating these heterogeneous resources into a single model using stochastic inversion transduction grammars, which we augment with wildcards to handle unknown translations.
- Dekai WU and Karteek ADDANKI. "Freestyle: A Rap Battle Bot that Learns to Improvise". 16th International
Society for Music Information Retrieval Conference (ISMIR
2015). Málaga, Spain: Oct 2015.
We demonstrate a rap battle bot that autonomously learns to freestyle creatively in real time, via a fast new hybrid compositional improvisation model integrating symbolic transduction grammar induction with novel bilingual recursive neural networks. Although rap and hip hop represent one of music's most influential recent developments, surprisingly little music technology research has addressed them. In true rap battling (the genre's most difficult form), an appropriate output response must be improvised whenever challenged by some input line of lyrics. As with many musical improvisation tasks, modeling the creative process is complex because it requires compositionality: improvising a good response not only requires making salient associations with the challenge at many overlapping levels of granularity, but simultaneously satisfying contextual preferences across a wide variety of dimensions. Our new real-time system accomplishes this via an efficient, recursive, neurally guided stochastic grammar-based transducer. The neural network is a newly enhanced version of our recently developed TRAAM (transduction recursive auto-associative memory) model. We demonstrate strong connections between music and language processing and learning: the freestyle learning capability arises from exactly the same compositional structure learning model that autonomously learns semantic interpretation / translation models as well as other music improvisation models.
- Meriem BELOUCIF, Markus SAERS, and Dekai WU. "Improving
Semantic SMT via Soft Semantic Role Label Constraints on ITG Alignments". Machine Translation Summit XV (MT Summit
2015). Miami: Oct 2015.
We show that applying semantic role label constraints to bracketing ITG alignment to train MT systems improves the quality of MT output in comparison to the conventional BITG and GIZA alignments. Moreover, we show that applying soft constraints to SRL-constrained BITG alignment leads to a better translation system compared to using hard constraints, which appear too harsh to produce meaningful biparses. We leverage previous work demonstrating that BITG alignments are able to fully cover cross-lingual semantic frame alternations, by using semantic role labeling to further narrow BITG constraints, in a soft fashion that avoids losing relevant portions of the search space. SRL-based evaluation metrics like MEANT have shown that tuning towards preserving the shallow semantic structure across translations robustly improves translation performance. Our approach brings the same intuition into the training phase. We show that our new alignment outperforms both conventional Moses and BITG alignment baselines in terms of the adequacy-oriented MEANT scores, while still producing comparable results in terms of edit distance metrics.
- Dekai WU and Karteek ADDANKI. "Neural Versus
Symbolic Rap Battle Bots". 41st International Computer Music
Conference (ICMC 2015). Denton, Texas: Sep 2015.
We contrast two opposing approaches to building bots that autonomously learn to rap battle: a symbolic probabilistic approach based on induction of stochastic transduction grammars, versus a neural network approach based on backpropagation through unconventional transduction recursive auto-associative memory (TRAAM) models. Rap battling is modeled as a quasi-translation problem, in which an appropriate output response must be improvised given any input challenge line of lyrics. Both approaches attempt to tackle the difficult problem of compositionality: for any challenge line, constructing a good response requires making salient associations while satisfying contextual preferences at many different, overlapping levels of granularity between the challenge and response lines. The contextual preferences include fluency, partial metrical or syntactic parallelism, and rhyming at various points across the lines. During both the learning and improvisation stages, the symbolic approach attempts to explicitly enumerate as many hypotheses as possible, whereas the neural approach attempts to evolve vector representations that better implicitly generalize over soft regions or neighborhoods of hypotheses. The brute force symbolic approach is more precise, but quickly generates combinatorial numbers of hypotheses when searching for generalizations. The distributed vector based neural approach can more easily confuse hypotheses, but maintains a constant level of complexity while retaining its implicit generalization bias. We contrast both the theoretical formulation and experimental outputs of the two approaches.
- Chi-kiu LO, Philipp C. DOWLING, and Dekai WU.
"Improving evaluation and optimization of MT systems against MEANT".
10th Workshop on Statistical Machine Translation (at EMNLP 2015), 434-441. Lisbon: Sep 2015.
We show that, consistent with MEANT-tuned systems that translate into Chinese, MEANT-tuned MT systems that translate into English also outperform BLEU-tuned systems across commonly used MT evaluation metrics, even in BLEU. The result is achieved by significantly improving MEANT's sentence-level ranking correlation with human preferences through incorporating a more accurate distributional semantic model for lexical similarity and a novel backoff algorithm for evaluating MT output that the automatic semantic parser fails to parse. Our surprising results, in which MEANT-tuned systems attain higher BLEU scores than BLEU-tuned systems, suggest that MEANT is a more accurate objective function for guiding the development of MT systems towards producing more adequate translations.
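The core idea behind the MEANT family is a weighted f-score over semantic frames, with each aligned role pair contributing its filler's lexical similarity. The toy sketch below illustrates only that shape; the role weights, similarity function, and frame representation are placeholders, not the published metric.

```python
# Toy MEANT-style frame score: weighted f-score over aligned role fillers.

def frame_fscore(ref_roles, hyp_roles, filler_sim, w_pred=1.0, w_arg=0.5):
    """ref_roles/hyp_roles: dicts mapping role label -> filler string."""
    weight = lambda r: w_pred if r == 'PRED' else w_arg
    matched = sum(weight(r) * filler_sim(ref_roles[r], hyp_roles[r])
                  for r in ref_roles if r in hyp_roles)
    if matched == 0.0:
        return 0.0
    precision = matched / sum(weight(r) for r in hyp_roles)
    recall = matched / sum(weight(r) for r in ref_roles)
    return 2 * precision * recall / (precision + recall)

sim = lambda a, b: 1.0 if a == b else 0.5   # stand-in lexical similarity
ref = {'PRED': 'bought', 'ARG0': 'the boy', 'ARG1': 'a book'}
hyp = {'PRED': 'bought', 'ARG0': 'the boy', 'ARG1': 'one book'}
print(frame_fscore(ref, hyp, sim))          # 0.875
```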
- Dekai WU. "Compositional bilingual
artificial neural networks for predicting hypermetrical structure
among interacting flamenco parts". 2015 Biennial Meeting of
the Society for Music Perception and Cognition (SMPC 2015). Nashville, Tennessee: Aug 2015.
Particularly in traditional, improvisational genres such as flamenco or jazz, much of music can be seen as parallel musical sequences that interact such that each part influences decisions made by other parts. We recently suggested leveraging the formal language theory notion of transduction grammars, in stochastic forms, to model each part as a musical language (Wu 2013). The advantage is that stochastic transductions can be exploited to model the probabilistic, ambiguous, complex structural relationships between interacting parts. Transduction grammar induction techniques can then be used to model unsupervised learning of musical accompaniment and improvisation. We explore an alternative approach carrying many of the same properties, but instead using artificial neural networks to learn compositional distributed vector representations that implicitly encode structural relationships between associated portions of two different musical parts. As with symbolic transduction grammars, these structural association patterns can range from concrete to abstract patterns, and from short to long patterns. Unlike symbolic transduction grammars, a single vector encodes a “soft” set of multiple similar hypotheses in the same neighborhood, because similar vectors tend to be learned for association patterns that are similar—cutting down the combinatorial growth of hypotheses inherent in the symbolic approaches. Since conventional neural networks have difficulty representing compositional structures, we propose to use a bilingual generalization of Pollack’s (1990) recursive auto-associative memory. Whereas Pollack’s RAAM can be seen as a neural approximation of a single compositional language model, our TRAAM (Transduction RAAM) approach is a neural approximation of a bilingual compositional transduction model—a relation between two probabilistically structured musical languages. We discuss empirical analyses of the learning behavior of our new neural approach on the hypermetrical structure prediction problem in flamenco, where meter changes can be rapidly influenced by multiple parts.
- Dekai WU and Karteek ADDANKI. "Learning to Rap Battle with Bilingual Recursive Neural Networks". 24th International Joint Conference on Artificial Intelligence (IJCAI-15). Buenos Aires: Jul 2015.
We describe an unconventional line of attack in our quest to teach machines how to rap battle by improvising lyrics on the fly, in which a novel recursive bilingual neural network implicitly learns soft, context-sensitive generalizations over the structural relationships between associated parts of challenge and response raps, while avoiding the exponential complexity costs that symbolic models would require. Our recursive bilingual neural network learns the feature vectors simultaneously using context from both the challenge and the response such that challenge-response association patterns with similar structure tend to have similar vectors. Improvisation is modeled as a quasi-translation learning problem and our recursive bilingual neural network is trained to improvise fluent and rhyming responses to hip hop lyrical challenges. The soft structural relationships learned by our recursive bilingual neural network are used to improve the probabilistic responses generated by our improvisational response component.
- Meriem BELOUCIF, Chi-kiu LO, and Dekai WU. "Improving MEANT
Based Semantically Tuned SMT". 11th International Workshop
on Spoken Language Translation (IWSLT 2014), 34-41. Lake Tahoe, California: Dec 2014.
We discuss various improvements to our MEANT tuned system, previously presented at IWSLT 2013. In our 2014 system, we incorporate this year's improved version of MEANT, improved Chinese word segmentation, Chinese named entity recognition and dedicated proper name translation, and number expression handling. This results in a significant performance jump compared to last year's system. We also ran preliminary experiments on tuning to IMEANT, our new ITG based variant of MEANT. The performance of tuning to IMEANT is comparable to tuning on MEANT (differences are statistically insignificant). We are presently investigating if tuning on IMEANT can produce even better results, since IMEANT was actually shown to correlate with human adequacy judgment more closely than MEANT. Finally, we ran experiments applying our new architectural improvements to a contrastive system tuned to BLEU. We observed a slightly higher jump in comparison to last year, possibly due to mismatches of MEANT's similarity models to our new entity handling.
- Dekai WU, Chi-kiu LO, Meriem BELOUCIF and Markus SAERS. "Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars". Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (at EMNLP 2014), 22-33. Doha: Oct 2014.
We introduce an inversion transduction grammar based restructuring of the MEANT automatic semantic frame based MT evaluation metric, which, by leveraging ITG language biases, is able to further improve upon MEANT's already-high correlation with human adequacy judgments. The new metric, called IMEANT, uses bracketing ITGs to biparse the reference and machine translations, but subject to obeying the semantic frames in both. The resulting improvements support the presumption that ITGs, which constrain the allowable permutations between compositional segments across the reference and MT output, score the phrasal similarity of the semantic role fillers more accurately than the simple word alignment heuristics (bag-of-word alignment or maximum alignment) used in previous versions of MEANT. The approach successfully integrates (1) the previously demonstrated extremely high coverage of cross-lingual semantic frame alternations by ITGs, with (2) the high accuracy of evaluating MT via weighted f-scores on the degree of semantic frame preservation.
- Karteek ADDANKI and Dekai WU. "Transduction Recursive Auto-Associative Memory: Learning Bilingual
Compositional Distributed Vector Representations of Inversion Transduction
Grammars". Proceedings of SSST-8, Eighth
Workshop on Syntax, Semantics and Structure
in Statistical Translation (at EMNLP 2014), 112-121. Doha: Oct 2014.
We introduce TRAAM, or Transduction RAAM, a fully bilingual generalization of Pollack's (1990) monolingual Recursive Auto-Associative Memory neural network model, in which each distributed vector represents a bilingual constituent—i.e., an instance of a transduction rule, which specifies a relation between two monolingual constituents and how their subconstituents should be permuted. Bilingual terminals are special cases of bilingual constituents, where a vector represents either (1) a bilingual token—a token-to-token or "word-to-word" translation rule—or (2) a bilingual segment—a segment-to-segment or "phrase-to-phrase" translation rule. TRAAMs have properties that appear attractive for bilingual grammar induction and statistical machine translation applications. Training of TRAAM drives both the autoencoder weights and the vector representations to evolve, such that similar bilingual constituents tend to have more similar vectors.
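A minimal RAAM-style forward pass shows the mechanics that TRAAM generalizes: an autoencoder composes two child constituent vectors into a parent vector and is trained to reconstruct the children from the parent. The numpy sketch below is a simplified illustration; the dimensions, initialization, and single-composition architecture are assumptions of mine, not the paper's model.

```python
# RAAM-style compose/reconstruct sketch (simplified, illustrative only).

import numpy as np

rng = np.random.default_rng(0)
D = 8                                   # constituent vector width
W_enc = rng.normal(scale=0.1, size=(D, 2 * D))
W_dec = rng.normal(scale=0.1, size=(2 * D, D))

def compose(left, right):
    """Encode two child constituent vectors into one parent vector."""
    return np.tanh(W_enc @ np.concatenate([left, right]))

def reconstruct(parent):
    """Decode a parent vector back into its two children."""
    out = np.tanh(W_dec @ parent)
    return out[:D], out[D:]

left, right = rng.normal(size=D), rng.normal(size=D)
parent = compose(left, right)           # vector for the larger constituent
l_hat, r_hat = reconstruct(parent)
loss = np.sum((l_hat - left) ** 2 + (r_hat - right) ** 2)
print(parent.shape, loss)               # training would backprop this loss
```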
- Markus SAERS and Dekai WU. "Ternary Segmentation for Improving Search in Top-down Induction of
Segmental ITGs". Proceedings of SSST-8, Eighth
Workshop on Syntax, Semantics and Structure
in Statistical Translation (at EMNLP 2014), 48-57. Doha: Oct 2014.
We show that there are situations where iteratively segmenting sentence pairs top-down will fail to reach valid segments and propose a method for alleviating the problem. Error analysis has indicated that, due to the enormity of the search space, it is often impossible to get to a desired embedded segment purely through binary segmentation that divides existing segmental rules in half – the strategy typically employed by existing search strategies – as it requires two steps. We propose a new method to hypothesize ternary segmentations in a single step, making the embedded segments immediately discoverable.
- Chi-kiu LO and Dekai WU. "BiMEANT: Integrating cross-lingual and monolingual semantic frame
similarities in the MEANT semantic MT evaluation metric". In Laurent BESACIER and Adrian-Horia DEDIU and Carlos MARTÍN-VIDE (editors),
Second International
Conference on Statistical Language and Speech Processing (SLSP
2014), 82-93. Grenoble, France: Oct 2014.
Lecture Notes in
Artificial Intelligence 7978, 2014. Heidelberg:
Springer-Verlag.
We present experimental results showing that integrating cross-lingual semantic frame similarity into the semantic frame based automatic MT evaluation metric MEANT improves its correlation with human judgment on evaluating translation adequacy. Recent work shows that MEANT more accurately reflects translation adequacy than other automatic MT evaluation metrics such as BLEU or TER, and that moreover, optimizing SMT systems against MEANT robustly improves translation quality across different output languages. However, in some cases the human reference translation employs different scoping strategies from the input sentence and thus standard monolingual MEANT, which only assesses translation quality via the semantic frame similarity between the reference and machine translations, fails to fairly and accurately reward the adequacy of the machine translation. To address this issue we propose a new bilingual metric, BiMEANT, that correlates with human judgment more closely than MEANT by incorporating new cross-lingual semantic frame similarity assessments into MEANT.
- Dekai WU, Chi-kiu LO, and Markus SAERS.
"Lexical Access Preference and Constraint Strategies for Improving Multiword Expression Association within Semantic MT Evaluation".
4th Workshop on Cognitive Aspects of the Lexicon (CogALex), at Coling 2014, 144-153. Dublin, Ireland: Aug 2014.
We examine lexical access preferences and constraints in computing multiword expression associations from the standpoint of a high-impact extrinsic task-based performance measure, namely semantic machine translation evaluation. In automated MT evaluation metrics, machine translations are compared against human reference translations, which are almost never worded exactly the same way except in the most trivial of cases. Because of this, one of the most important factors in correctly predicting semantic translation adequacy is the accuracy of recognizing alternative lexical realizations of the same multiword expressions in semantic role fillers. Our results comparing bag-of-words, maximum alignment, and inversion transduction grammars indicate that cognitively motivated ITGs provide superior lexical access characteristics for multiword expression associations, leading to state-of-the-art improvements in correlation with human adequacy judgments.
- Chi-kiu LO, Meriem BELOUCIF, Markus SAERS, and Dekai WU.
"XMEANT: Better semantic MT evaluation without reference translations".
52nd Annual Meeting of the Association for Computational
Linguistics (ACL
2014), 765-771. Baltimore, Maryland: Jun 2014.
We introduce XMEANT, a new cross-lingual version of the semantic frame based MT evaluation metric MEANT, which can correlate even more closely with human adequacy judgments than monolingual MEANT and eliminates the need for expensive human references. Previous work established that MEANT reflects translation adequacy with state-of-the-art accuracy, and optimizing MT systems against MEANT robustly improves translation quality. However, to go beyond tuning weights in the loglinear SMT model, a cross-lingual objective function that can deeply integrate semantic frame criteria into the MT training pipeline is needed. We show that cross-lingual XMEANT outperforms monolingual MEANT by (1) replacing the monolingual context vector model in MEANT with simple translation probabilities, and (2) incorporating bracketing ITG constraints.
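XMEANT's crosslingual twist can be sketched in a few lines: where monolingual MEANT compares fillers with a context vector model, XMEANT scores filler pairs directly across languages from word translation probabilities. The following toy function and lexicon are hypothetical stand-ins for illustration, not the published metric.

```python
# Toy cross-lingual filler similarity from word translation probabilities.

def xlingual_sim(src_filler, hyp_filler, p_trans):
    """Average best translation probability between filler word bags."""
    scores = [max(p_trans.get((f, e), 0.0) for e in hyp_filler)
              for f in src_filler]
    return sum(scores) / len(scores)

# Hypothetical Spanish-English translation probabilities:
p = {('el', 'the'): 0.6, ('perro', 'dog'): 0.8}
print(xlingual_sim(['el', 'perro'], ['the', 'dog'], p))  # 0.7
```

A score like this can drop into the same weighted f-score shape sketched earlier, replacing the monolingual similarity function.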
- Chi-kiu LO and Dekai WU.
"On the reliability
and inter-annotator agreement of human semantic MT evaluation via HMEANT".
Ninth International Conference on Language Resources and
Evaluation (LREC-2014). Reykjavik, Iceland: May 2014.
We present analyses showing that HMEANT is a reliable, accurate and fine-grained semantic frame based human MT evaluation metric with high inter-annotator agreement (IAA) and correlation with human adequacy judgments, despite only requiring minimal training of about 15 minutes for lay annotators. Previous work shows that the IAA on the semantic role labeling (SRL) subtask within HMEANT is over 70%. In this paper we focus on (1) the IAA on the semantic role alignment task and (2) the overall IAA of HMEANT. Our results show that the IAA on the alignment task of HMEANT is over 90% when humans align SRL output from the same SRL annotator, which shows that the instructions on the alignment task are sufficiently precise, although the overall IAA where humans align SRL output from different SRL annotators falls to only 61% due to the pipeline effect of disagreement across the two annotation tasks. We show that aligning the semantic roles with an automatic algorithm instead of manually not only helps maintain the overall IAA of HMEANT at 70%, but also provides a finer-grained assessment of the phrasal similarity of the semantic role fillers. This suggests that HMEANT equipped with automatic alignment is reliable and accurate for humans to evaluate MT adequacy, while achieving higher correlation with human adequacy judgments than HTER.
- Karteek ADDANKI and Dekai WU.
"Evaluating Improvised Hip Hop Lyrics---Challenges and Observations".
Ninth International Conference on Language Resources and Evaluation (LREC-2014). Reykjavik, Iceland: May 2014.
We investigate novel challenges involved in comparing model performance on the task of improvising responses to hip hop lyrics and discuss observations regarding inter-evaluator agreement on judging improvisation quality. We believe the analysis serves as a first step toward designing robust evaluation strategies for improvisation tasks, a relatively neglected area to date. Unlike most natural language processing tasks, improvisation tasks suffer from a high degree of subjectivity, making it difficult to design discriminative evaluation strategies to drive model development. We propose a simple strategy with fluency and rhyming as the criteria for evaluating the quality of generated responses, which we apply to both our inversion transduction grammar based FREESTYLE hip hop challenge-response improvisation system, as well as various contrastive systems. We report inter-evaluator agreement for both English and French hip hop lyrics, and analyze correlation with challenge length. We also compare the extent of agreement in evaluating fluency with that of rhyming, and quantify the difference in agreement with and without precise definitions of evaluation criteria.
- Dekai WU. "Music and language learning via stochastic transduction grammar induction". The Syntax of Mind: Language, Music, Art. Vienna: Apr 2014.
[forthcoming]
- Dekai WU. "The Magic Number 4: Evolutionary Pressures on Semantic Frame Structure". 10th International Conference on the Evolution of Language (Evolang X). Vienna: Apr 2014.
We propose that the "magic number 4" puzzle of argument structure in semantic frames across most human languages is explained by selection pressures given inherent computational efficiency properties naturally arising from fundamental combinatorial mathematics of compositionality. Irrespective of language, school, or theoretical bias, linguists have long observed that what are now generally called "semantic frames" empirically bear a maximum limit of four core arguments per frame, for unknown reasons. We explain how this limit would automatically emerge as a consequence of evolutionary preference for the formal equivalence class of inversion transductions as an optimal balance between expressivity and fast tractable polynomial-time language learning and interpretation/transduction between different representation languages.
- Dekai WU. "Learning Music and
Language with Stochastic Transduction Grammars". EvoMus Workshop on The Evolution of Music and Language in a Comparative Perspective. Vienna: Apr 2014.
We discuss how our computational models for learning probabilistic structural relationships between pairs of compositional languages reflect fundamental cognitive capacities that underlie both human language and music processing. Formally, a single language can be described by a (probabilistically weighted) set of patterns; likewise, a bilingual/bimodal transduction can be described by a set of structurally related pattern pairs. The probabilistic rules in stochastic transduction grammars, which we pioneered and are widely used in statistical machine translation, associate compositional patterns between two representation languages—a generalized version of Saussurean signs. Our cognitively motivated transduction grammar induction methods learn by bootstrapping progressively more complex classes of transductions, from finite-state to linear to inversion transductions. Linguistically, inversion transduction grammars explain via combinatorial efficiency principles why natural languages evolved universally to impose a long-unexplained empirical limit on the number of core arguments in semantic frames to the “magic number 4”. Musically, our work demonstrates that the same transduction grammar induction processes model the learning and use of compositional relationships between parallel musical representation languages—languages for example like hypermetrically structured rhythm parts in flamenco, or lines and rhymes in hip hop.
- Chi-kiu LO, Meriem BELOUCIF, and Dekai WU. "Improving machine
translation into Chinese by tuning against Chinese MEANT". 10th International Workshop on Spoken Language Translation (IWSLT 2013). Heidelberg, Germany: Dec 2013.
We present the first ever results showing that Chinese MT output is significantly improved by tuning an MT system against a semantic frame based objective function, MEANT, rather than an n-gram based objective function, BLEU, as measured across commonly used metrics and different test sets. Recent work showed that by preserving the meaning of the translations as captured by semantic frames in the training process, MT systems for translating into English on both formal and informal genres are constrained to produce more adequate translations by making more accurate choices on lexical output and reordering rules. In this paper we describe our experiments in the IWSLT 2013 TED talk MT tasks on tuning MT systems against MEANT for translating into Chinese and English respectively. We show that the Chinese translation output benefits more from tuning an MT system against MEANT than the English translation output does, due to the ambiguous nature of word boundaries in Chinese. Our encouraging results show that using MEANT is a promising alternative to BLEU in both evaluating and tuning MT systems to drive the progress of MT research across different languages.
- Chi-kiu LO and Dekai WU. "Human Semantic MT Evaluation with HMEANT for IWSLT 2013". 10th International Workshop on Spoken Language Translation (IWSLT 2013). Heidelberg, Germany: Dec 2013.
We present the results of large-scale human semantic MT evaluation with HMEANT on the IWSLT 2013 German-English MT and SLT tracks and show that HMEANT evaluates the performance of the MT systems differently compared to BLEU and TER. Together with the references, all the translations are annotated by native English speakers in both the semantic role labeling stage and the role filler alignment stage of HMEANT. We obtain high inter-annotator agreement and low annotation time costs, which indicates that it is feasible to run a large-scale human semantic MT evaluation campaign using HMEANT. Our results also show that HMEANT is a robust and reliable semantic MT evaluation metric for running large-scale evaluation campaigns: it is inexpensive and simple, yet maintains the semantic representational transparency needed to provide a perspective on the performance of state-of-the-art MT systems that differs from BLEU and TER.
- Markus SAERS and Dekai WU. "Learning Bilingual
Categories in Unsupervised Inversion Transduction Grammar
Induction". 13th International Conference on Parsing
Technologies (IWPT 2013). Nara, Japan: Nov 2013. [Short version
entitled "Unsupervised Learning of Bilingual
Categories in Inversion Transduction Grammar
Induction" at IWSLT 2013.]
We present the first known experiments incorporating unsupervised bilingual nonterminal category learning within end-to-end fully unsupervised transduction grammar induction using matched training and testing models. Despite steady recent progress, such induction experiments until now have not allowed for learning differentiated nonterminal categories. We divide the learning into two stages: (1) a bootstrap stage that generates a large set of categorized short transduction rule hypotheses, and (2) a minimum conditional description length stage that simultaneously prunes away less useful short rule hypotheses, while also iteratively segmenting full sentence pairs into useful longer categorized transduction rules. We show that the second stage works better when the rule hypotheses have categories than when they do not, and that the proposed conditional description length approach combines the rules hypothesized by the two stages better than a mixture model does. We also show that the compact model learned during the second stage can be further improved by combining the result of different iterations in a mixture model. In total, we see a jump in BLEU score, from 17.53 for a standalone minimum description length baseline with no category learning, to 20.93 when incorporating category induction on a Chinese–English translation task.
- Dekai WU. "Simultaneous Unsupervised
Learning of Flamenco Metrical Structure, Hypermetrical Structure,
and Multipart Structural Relations". 14th International
Society for Music Information Retrieval Conference (ISMIR
2013), 155-160. Curitiba, Brazil: Nov 2013.
We show how a new unsupervised approach to learning musical relationships can exploit Bayesian MAP induction of stochastic transduction grammars to overcome the challenges of learning complex relationships between multiple rhythmic parts that previously lay outside the scope of general computational approaches to music structure learning. A good illustrative genre is flamenco, which employs not only regular but also irregular hypermetrical structures that rapidly switch between 3/4 and 6/8 mediocompas blocks. Moreover, typical flamenco idioms employ heavy syncopation and sudden, misleading off-beat accents and patterns, while often eliding the downbeat accents that humans as well as existing meter-finding algorithms rely on, thus creating a high degree of listener “surprise” that makes not only the structural relations, but even the metrical structure itself, elusive to learn. Flamenco musicians rely on both complex regular hypermetrical knowledge as well as irregular real-time clues to recognize when to switch meters and patterns. Our new approach envisions this as an integrated problem of learning a bilingual transduction, i.e., a structural relation between two languages—where there are different musical languages of, say, flamenco percussion versus zapateado footwork or palmas hand clapping. We apply minimum description length criteria to induce transduction grammars that simultaneously learn (1) the multiple metrical structures, (2) the hypermetrical structure that stochastically governs meter switching, and (3) the probabilistic transduction relationship between patterns of different rhythmic languages that enables musicians to predict when to switch meters and how to select patterns depending on what fellow musicians are generating.
- Dekai WU, Karteek ADDANKI, Markus SAERS, and Meriem BELOUCIF. "Learning
to Freestyle: Hip Hop Challenge-Response Induction via Transduction
Rule Segmentation". 2013 Conference on Empirical Methods in
Natural Language Processing (EMNLP 2013), 102-112. Seattle: Oct 2013.
We attack an inexplicably under-explored genre of spoken language—lyrics in music—via completely unsupervised induction of an SMT-style stochastic transduction grammar for hip hop lyrics, yielding a fully-automatically learned challenge-response system that produces rhyming lyrics given an input. Unlike previous efforts, we choose the domain of hip hop lyrics, which is particularly unstructured and noisy. A novel feature of our approach is that it is completely unsupervised and requires no a priori linguistic or phonetic knowledge. In spite of the level of difficulty of the challenge, the model nevertheless produces fluent output as judged by human evaluators, and performs significantly better than widely used phrase-based SMT models on the same task.
- Markus SAERS and Dekai WU. "Bayesian
Induction of Bracketing Inversion Transduction
Grammars". 6th International Joint Conference on Natural
Language Processing (IJCNLP 2013), 1158-1166. Nagoya, Japan: Oct 2013.
We present a novel approach to learning phrasal inversion transduction grammars via Bayesian MAP (maximum a posteriori) or information-theoretic MDL (minimum description length) model optimization so as to incorporate simultaneously the choices of model structure as well as parameters. In comparison to most current SMT approaches, the model learns phrase translation lexicons that (a) do not require enormous amounts of run-time memory, (b) contain significantly less redundancy, and (c) provide an obvious basis for generalization to abstract translation schemas. Model structure choice is biased by a description length prior, while parameter choice is driven by data likelihood biased by a parameter prior. The search over possible model structures is made feasible by a novel top-down rule segmenting heuristic which efficiently incorporates estimates of the posterior probabilities. Since the priors reward model parsimony, the learned grammar is very concise and still performs significantly better than the maximum likelihood driven bottom-up rule chunking baseline.
- Markus SAERS, Karteek ADDANKI, and Dekai WU. "Segmenting
vs. Chunking Rules: Unsupervised ITG Induction via Minimum Conditional
Description Length". Recent Advances in Natural Language
Processing (RANLP 2013). Hissar, Bulgaria: Sep 2013.
We present an unsupervised learning model that induces phrasal inversion transduction grammars by introducing a minimum conditional description length (CDL) principle to drive search over a space defined by two opposing extreme types of ITGs. Our approach attacks the difficulty of acquiring more complex longer rules when inducing inversion transduction grammars via unsupervised bottom-up chunking, by augmenting its model search with top-down segmentation that minimizes CDL, resulting in significant translation accuracy gains. Chunked rules tend to be relatively short; long rules are hard to learn through chunking, as the smaller parts of the long rules may not necessarily be good translations themselves. Our objective criterion is a conditional adaptation of the notion of description length, conditioned on a fixed preexisting model, in this case the initial chunked ITG. The notion of minimum CDL (MCDL) facilitates a novel strategy for avoiding the pitfalls of premature pruning in chunking approaches, by incrementally splitting an ITG with reference to a second ITG that conditions this search.
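To make the description-length trade-off concrete, here is a schematic sketch (ours; the encoding costs are invented placeholders, not the paper's actual coding scheme) of the accept/reject test that drives such a segmentation search:

import math

def model_dl(rules):
    # crude model code length: a fixed per-symbol cost to write the rules down
    return sum(4.0 * len(rule) for rule in rules)

def data_dl(corpus_logprob):
    # code length of the data under the model, in bits
    return -corpus_logprob / math.log(2)

def accept_split(old_rules, old_logprob, new_rules, new_logprob):
    # a candidate segmentation is kept only if it shrinks the combined
    # description length: shorter rules usually shrink the model term but
    # loosen the data fit, so likelihood is traded off against model size
    old_total = model_dl(old_rules) + data_dl(old_logprob)
    new_total = model_dl(new_rules) + data_dl(new_logprob)
    return new_total < old_total

The conditional variant simply holds a preexisting model's code length fixed and charges only for what the new rules add beyond it.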
- Dekai WU, Karteek ADDANKI, and Markus SAERS. "Modeling Hip
Hop Challenge-Response Lyrics as Machine
Translation". Machine Translation Summit XIV (MT Summit
2013). Nice, France: Sep 2013.
We cast the problem of hip hop lyric generation as a translation problem, automatically learn a machine translation system that accepts hip hop lyric challenges and improvises rhyming responses, and show that improving the training data by learning an unsupervised rhyme detection scheme further improves performance. Our approach using unsupervised induction of stochastic transduction grammars is the first to apply the learning algorithms of SMT to the woefully under-explored genre of lyrics in music. A novel feature of our model is that it is completely unsupervised and does not make use of any a priori linguistic or phonetic information. Unlike the handful of previous approaches to modeling lyrics, we choose the domain of hip hop lyrics, which is particularly noisy and unstructured. In order to cope with the noisy nature of the data in this domain, we compare the effect of two data selection schemes on the quality of the responses generated, and show the superiority of selection via a dedicated rhyme scheme detector that is also acquired through unsupervised learning. We also propose two strategies to mitigate the effect on our model's performance of disfluencies, which are common in hip hop lyrics. Despite the particularly noisy and unstructured nature of the domain, our model produces fluent and rhyming responses compared to a standard phrase-based SMT baseline in human evaluations.
- Chi-kiu LO and Dekai WU. "Can informal genres be better translated by tuning on automatic semantic metrics?". Machine Translation Summit XIV (MT Summit
2013). Nice, France: Sep 2013.
Even though the informal language of spoken text and web forum genres presents great difficulties for automatic semantic role labeling, we show that, surprisingly, tuning statistical machine translation against the SRL-based objective function, MEANT, nevertheless leads more robustly to adequate translations of these informal genres than tuning against BLEU or TER. The accuracy of automatic semantic parsing has been shown to degrade significantly on informal genres such as speech or tweets, compared to formal genres like newswire. In spite of this, human evaluators preferred translations from MEANT-tuned systems over the BLEU- or TER-tuned ones by a significant margin. Error analysis indicates that one of the major sources of errors in automatic shallow semantic parsing of informal genres is failure to identify the semantic frame for copula or existential senses of “be”. We show that MEANT's correlation with human adequacy judgment on informal text is improved by reconstructing the missing semantic frames for “be”. Our tuning approach is independent of the translation model architecture, so any SMT model can potentially benefit from the semantic knowledge incorporated through our approach.
- Dekai WU, Karteek ADDANKI, and Markus SAERS. "FREESTYLE: A Challenge-Response System for Hip Hop Lyrics via Unsupervised Induction of Stochastic Transduction Grammars". 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon, France: Aug 2013.
We attack an inexplicably under-explored genre of spoken language—lyrics in music—via completely unsupervised induction of an SMT-style stochastic transduction grammar for hip hop lyrics, yielding a fully-automatically learned challenge-response system that produces rhyming lyrics given an input. Unlike previous efforts, we choose the domain of hip hop lyrics, which is particularly unstructured and noisy. A novel feature of our approach is that it is completely unsupervised and requires no a priori linguistic or phonetic knowledge. In spite of the level of difficulty of the challenge, the model nevertheless produces fluent output as judged by human evaluators, and performs significantly better than widely used phrase-based SMT models on the same task.
- Chi-kiu LO, Karteek ADDANKI, Markus SAERS, and Dekai WU.
"Improving machine translation by training against an automatic semantic frame based evaluation metric".
51st Annual Meeting of the Association for Computational
Linguistics (ACL
2013), 375-381. Sofia, Bulgaria: Jul 2013.
We present the first ever results showing that tuning a machine translation system against a semantic frame based objective function, MEANT, produces more robustly adequate translations than tuning against BLEU or TER as measured across commonly used metrics and human subjective evaluation. Moreover, for informal web forum data, human evaluators preferred MEANT-tuned systems over BLEU- or TER-tuned systems by a significantly wider margin than that for formal newswire—even though automatic semantic parsing might be expected to fare worse on informal language. We argue that by preserving the meaning of the translations as captured by semantic frames right in the training process, an MT system is constrained to make more accurate choices of both lexical and reordering rules. As a result, MT systems tuned against semantic frame based MT evaluation metrics produce output that is more adequate. Training a machine translation system against a semantic frame based objective function is independent of the translation model paradigm, so any translation model can benefit from the semantic knowledge incorporated through our approach. In this paper, we show for the first time that tuning an MT system against MEANT significantly improves translation adequacy on formal as well as informal text, compared to tuning against BLEU or TER.
- Markus SAERS, Karteek ADDANKI, and Dekai WU. "Unsupervised
Transduction Grammar Induction via Minimum Description
Length". Second Workshop on Hybrid Approaches to Translation
(HyTra 2013), at ACL 2013, 67-73. Sofia, Bulgaria: Jul 2013.
We present a minimalist, unsupervised learning model that induces relatively clean phrasal inversion transduction grammars by employing the minimum description length principle to drive search over a space defined by two opposing extreme types of ITGs. In comparison to most current SMT approaches, the model learns a very parsimonious phrase translation lexicon that provides an obvious basis for generalization to abstract translation schemas. To do this, the model maintains internal consistency by avoiding use of mismatched or unrelated models, such as word alignments or probabilities from IBM models. The model introduces a novel strategy for avoiding the pitfalls of premature pruning in chunking approaches, by incrementally splitting an ITG while using a second ITG to guide this search.
- Chi-kiu LO and Dekai WU.
"MEANT at WMT 2013: A Tunable, Accurate yet Inexpensive Semantic Frame Based MT Evaluation Metric".
8th Workshop on Statistical Machine Translation (at ACL
2013), 422-428. Sofia, Bulgaria: Jul 2013.
The linguistically transparent MEANT and UMEANT metrics are tunable, simple yet highly effective, fully automatic approximations to the human HMEANT MT evaluation metric, which measures semantic frame similarity between MT output and reference translations. In this paper, we describe HKUST's submission to the WMT 2013 metrics evaluation task: MEANT and UMEANT. MEANT is optimized by tuning a small number of weights—one for each semantic role label—so as to maximize correlation with human adequacy judgment on a development set. UMEANT is an unsupervised version where weights for each semantic role label are estimated via an inexpensive unsupervised approach, as opposed to MEANT's supervised method relying on more expensive grid search. In this paper, we present a battery of experiments for optimizing MEANT on different development sets to determine the set of weights that maximize MEANT's accuracy and stability. Evaluated on test sets from the WMT 2012/2011 metrics evaluation, both MEANT and UMEANT achieve competitive correlations with human judgments using nothing more than a monolingual corpus and an automatic shallow semantic parser.
- Karteek ADDANKI and Dekai WU. "Unsupervised Rhyme Scheme Identification in Hip Hop Lyrics using Hidden Markov Models". In Adrian-Horia DEDIU, Carlos MARTÍN-VIDE, Ruslan MITKOV, and Bianca TRUTHE (editors),
First International
Conference on Statistical Language and Speech Processing (SLSP
2013), 39-50. Tarragona, Spain: Jul 2013.
Lecture Notes in
Artificial Intelligence 7978, 2013. Heidelberg:
Springer-Verlag.
We attack a woefully under-explored language genre—lyrics in music—introducing a novel hidden Markov model based method for completely unsupervised identification of rhyme schemes in hip hop lyrics, which, to the best of our knowledge, is the first such effort. Unlike previous approaches that use supervised or semi-supervised approaches for the task of rhyme scheme identification, our model does not assume any prior phonetic or labeling information whatsoever. Also, unlike previous work on rhyme scheme identification, we attack the difficult task of hip hop lyrics in which the data is more highly unstructured and noisy. A novel feature of our approach comes from the fact that we do not manually segment the verses in lyrics according to any pre-specified rhyme scheme, but instead use a number of hidden states of varying rhyme scheme lengths to automatically impose a soft segmentation. In spite of the level of difficulty of the challenge, we nevertheless were able to obtain a surprisingly high precision of 35.81% and recall of 57.25% on the task of identifying the rhyming words, giving a total f-score of 44.06%. These encouraging results were obtained in the face of highly noisy data, lack of clear stanza segmentation, and a very wide variety of rhyme schemes used in hip hop.
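The decoding step can be pictured with an ordinary Viterbi pass; the toy sketch below is ours and is deliberately simplified (fixed four-line blocks, invented schemes, an orthographic stand-in for rhyme), whereas the paper's hidden states span rhyme schemes of varying lengths:

schemes = {"AABB": (0, 0, 1, 1), "ABAB": (0, 1, 0, 1), "NONE": (0, 1, 2, 3)}

def ending_similarity(w1, w2):
    # crude orthographic proxy for rhyme: length of the shared suffix
    n = 0
    while n < min(len(w1), len(w2)) and w1[-1 - n] == w2[-1 - n]:
        n += 1
    return n

def emission_score(line_endings, pattern):
    # reward similarity where the scheme says "rhyme", penalize it elsewhere
    score = 0.0
    for i in range(len(pattern)):
        for j in range(i + 1, len(pattern)):
            sim = ending_similarity(line_endings[i], line_endings[j])
            score += sim if pattern[i] == pattern[j] else -sim
    return score

def viterbi(blocks, trans):
    # blocks: list of 4-tuples of line-final words; trans[(p, s)]: transition score
    states = list(schemes)
    best = {s: emission_score(blocks[0], schemes[s]) for s in states}
    backptrs = []
    for block in blocks[1:]:
        new_best, bp = {}, {}
        for s in states:
            p = max(states, key=lambda q: best[q] + trans[(q, s)])
            new_best[s] = best[p] + trans[(p, s)] + emission_score(block, schemes[s])
            bp[s] = p
        best = new_best
        backptrs.append(bp)
    path = [max(states, key=lambda s: best[s])]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    return path[::-1]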
- Markus SAERS, Karteek ADDANKI, and Dekai WU. "Iterative Rule
Segmentation under Minimum Description Length for Unsupervised
Transduction Grammar Induction". In Adrian-Horia DEDIU, Carlos MARTÍN-VIDE, Ruslan MITKOV, and Bianca TRUTHE (editors),
First International
Conference on Statistical Language and Speech Processing (SLSP
2013), 224-235. Tarragona, Spain: Jul 2013.
Lecture Notes in
Artificial Intelligence 7978, 2013. Heidelberg:
Springer-Verlag.
We argue that for purely incremental unsupervised learning of phrasal inversion transduction grammars, a minimum description length driven, iterative top-down rule segmentation approach that is the polar opposite of our previous bottom-up iterative rule chunking model yields significantly better translation accuracy and grammar parsimony. We still aim for unsupervised bilingual grammar induction such that training and testing are optimized upon the same exact underlying model—a basic principle of machine learning and statistical prediction that has been unduly ignored in statistical machine translation models of late, where most decoders are badly mismatched to the training assumptions. Our novel approach learns phrasal translations by recursively subsegmenting the training corpus, as opposed to our previous model—where we start with a token-based transduction grammar and iteratively build larger chunks. Moreover, the rule segmentation decisions in our approach are driven by a minimum description length objective, whereas the rule chunking decisions were driven by a maximum likelihood objective. We demonstrate empirically how this trades off maximum likelihood against model size, aiming for a more parsimonious grammar that escapes the perfect overfitting to the training data that we start out with, and gradually generalizes to previously unseen sentence translations so long as the model shrinks enough to warrant a looser fit to the training data. Experimental results show that our approach produces a significantly smaller and better model than the chunking-based approach.
- Markus SAERS, Karteek ADDANKI, and Dekai WU. "Combining
Top-down and Bottom-up Search for Unsupervised Induction of
Transduction Grammars". Proceedings of SSST-7, Seventh Workshop on Syntax and Structure
in Statistical Translation (at NAACL HLT 2013), 48-57. Atlanta: Jun 2013.
We show that combining both bottom-up rule chunking and top-down rule segmentation search strategies in purely unsupervised learning of phrasal inversion transduction grammars yields significantly better translation accuracy than either strategy alone. Previous approaches have relied only on one search strategy or the other, such as the bottom-up iterative rule chunking approach of Saers et al. (2012). The key property that lets us combine the bottom-up work of Saers et al. with our opposing iterative top-down rule segmentation strategy so effortlessly is that they also stay strictly within a pure transduction grammar framework; the integration of their efforts into our own work is therefore completely seamless. Both their approach and ours share a common aim of unsupervised bilingual grammar induction under matching models during both training and testing—instead of decoding under a completely different model architecture than what is assumed during the training phases, which violates an elementary principle of machine learning and statistics. But whereas Saers et al.'s model only incrementally builds longer transduction rules by chunking smaller transduction rules bottom-up, we introduce a complementary top-down model that incrementally builds shorter transduction rules by segmenting larger transduction rules top-down, driven by a minimum description length objective that trades off maximum likelihood against model size. We show empirically that combining the more liberal rule chunking model of Saers et al. with our more conservative rule segmentation model results in significantly better translations than either strategy in isolation.
- Markus SAERS, Karteek ADDANKI, and Dekai WU. "From Finite-State to Inversion Transductions: Toward Unsupervised Bilingual Grammar Induction". 24th International Conference on Computational
Linguistics (Coling-2012), 2325-2340. Mumbai: Dec 2012.
We report a wide range of comparative experiments establishing for the first time contrastive foundations for a completely unsupervised approach to bilingual grammar induction that is cognitively oriented toward early category formation and phrasal chunking in the bootstrapping process up the expressiveness hierarchy from finite-state to linear to inversion transduction grammars. We show a consistent improvement in terms of cross-entropy throughout the bootstrapping process, as well as promising decoding experiments using the learned grammars. Rather than relying on external resources such as parses, POS tags or dictionaries, our method is fully unsupervised (in the way this term is typically understood in the machine translation community). This means that the bootstrapping can only rely on information gathered during the previous step, which necessitates some strategy for expanding the expressiveness of the grammars. We present principled approaches for moving from finite-state to linear transduction grammars as well as from linear to inversion transduction grammars. It is our belief that early, integrated category formation and phrasal chunking in this unsupervised bootstrapping process is better aligned to child language acquisition. Finally, we also report exploratory decoding results using some of the learned grammars. This is the first step towards an end-to-end grammar-based statistical machine translation system.
- Anand Karthik TUMULURU, Chi-kiu LO and Dekai WU.
"Accuracy and robustness in measuring the lexical similarity
of semantic role fillers for automatic semantic MT evaluation".
26th Pacific Asia Conference on Language, Information and
Computation (PACLIC-26), 574-581. Bali: Nov 2012.
We present larger-scale evidence overturning previous results, showing that among the many alternative phrasal lexical similarity measures based on word vectors, the Jaccard coefficient most increases the robustness of MEANT, the recently introduced, fully-automatic, state-of-the-art semantic MT evaluation metric. MEANT critically depends on phrasal lexical similarity scores in order to automatically determine which semantic role fillers should be aligned between reference and machine translations. The robustness experiments were conducted across various data sets following NIST MetricsMaTr protocols, showing higher Kendall correlation with human adequacy judgments than BLEU, METEOR (with and without synsets), WER, PER, TER and CDER. The Jaccard coefficient is shown to be more discriminative and robust than cosine similarity, the Min/Max metric with mutual information, Jensen–Shannon divergence, or the Dice coefficient. We also show that with Jaccard coefficient as the phrasal lexical similarity metric, individual word token scores are best aggregated into phrasal segment similarity scores using the geometric mean, rather than either the arithmetic mean or competitive linking style word alignments. Furthermore, we show empirically that a context window size of 5 captures the optimal amount of information for training the word vectors. The combined results suggest a new formulation of MEANT with significantly improved robustness across data sets.
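In code, the winning configuration reduces to something like the following sketch (ours, with contexts simplified to feature sets, and a greedy best-match alignment standing in for the paper's exact alignment step):

from math import prod   # Python 3.8+

def jaccard(ctx_a, ctx_b):
    # ctx_a, ctx_b: sets of context features gathered from a 5-word window
    if not ctx_a and not ctx_b:
        return 0.0
    return len(ctx_a & ctx_b) / len(ctx_a | ctx_b)

def phrasal_similarity(phrase_a, phrase_b, contexts):
    # score each token of one segment against its best match in the other...
    scores = [max(jaccard(contexts[w], contexts[v]) for v in phrase_b)
              for w in phrase_a]
    # ...then aggregate token scores with the geometric mean, which the
    # experiments above found better than the arithmetic mean
    return prod(scores) ** (1.0 / len(scores))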
- Chi-kiu LO and Dekai WU.
"Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics".
Proceedings of SSST-6, Sixth Workshop on Syntax and Structure
in Statistical Translation (at ACL 2012). Jeju, South Korea: Jul 2012.
We present an unsupervised approach to estimate the appropriate degree of contribution of each semantic role type for semantic translation evaluation, yielding a semantic MT evaluation metric whose correlation with human adequacy judgments is comparable to that of recent supervised approaches but without the high cost of a human-ranked training corpus. Our new unsupervised estimation approach is motivated by an analysis showing that the weights learned from supervised training are distributed in a similar fashion to the relative frequencies of the semantic roles. Empirical results show that even without a training corpus of human adequacy rankings against which to optimize correlation, using instead our relative frequency weighting scheme to approximate the importance of each semantic role type leads to a semantic MT evaluation metric that correlates with human adequacy judgments comparably to previous metrics that require far more expensive human rankings of adequacy over a training corpus. As a result, the cost of semantic MT evaluation is greatly reduced.
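The estimator itself is almost trivial; a minimal sketch under the paper's motivating observation (weights track role frequencies) might look like:

from collections import Counter

def role_weights(srl_corpus):
    # srl_corpus: iterable of sentences, each a list of role labels
    # such as ["ARG0", "PRED", "ARG1", "ARGM-TMP"]
    counts = Counter(label for sent in srl_corpus for label in sent)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

No human adequacy rankings enter anywhere: the weights come from a shallow-semantic-parsed monolingual corpus alone.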
- Ondřej BOJAR and Dekai WU.
"Towards a Predicate-Argument Evaluation for MT".
Proceedings of SSST-6, Sixth Workshop on Syntax and Structure
in Statistical Translation (at ACL 2012). Jeju, South Korea: Jul 2012.
HMEANT (Lo and Wu, 2011a) is a manual MT evaluation technique that focuses on predicate-argument structure of the sentence. We relate HMEANT to an established linguistic theory, highlighting the possibilities of reusing existing knowledge and resources for interpreting and automating HMEANT. We apply HMEANT to a new language, Czech in particular, by evaluating a set of English-to-Czech MT systems. HMEANT proves to correlate with manual rankings at the sentence level better than a range of automatic metrics. However, the main contribution of this paper is the identification of several issues of HMEANT annotation and our proposal on how to resolve them.
- Chi-kiu LO, Anand Karthik TUMULURU and Dekai WU.
"Fully Automatic Semantic MT Evaluation".
7th Workshop on Statistical Machine Translation (at NAACL 2012). Montreal: Jun 2012.
We introduce the first fully automatic, fully semantic frame based MT evaluation metric, MEANT, that outperforms all other commonly used automatic metrics in correlating with human judgment on translation adequacy. Recent work on HMEANT, which is a human metric, indicates that machine translation can be better evaluated via semantic frames than other evaluation paradigms, requiring only minimal effort from monolingual humans to annotate and align semantic frames in the reference and machine translations. We propose a surprisingly effective Occam's razor automation of HMEANT that combines standard shallow semantic parsing with a simple maximum weighted bipartite matching algorithm for aligning semantic frames. The matching criterion is based on lexical similarity scoring of the semantic role fillers through a simple context vector model which can readily be trained using any publicly available large monolingual corpus. Sentence level correlation analysis, following standard NIST MetricsMATR protocol, shows that this fully automated version of HMEANT achieves significantly higher Kendall correlation with human adequacy judgments than BLEU, NIST, METEOR, PER, CDER, WER, or TER. Furthermore, we demonstrate that performing the semantic frame alignment automatically actually tends to be just as good as performing it manually. Despite its high performance, fully automated MEANT is still able to preserve HMEANT's virtues of simplicity, representational transparency, and inexpensiveness.
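The frame-alignment step is a textbook assignment problem; assuming frame-to-frame similarities have already been computed from role-filler lexical similarity, a sketch of it (ours, using SciPy's assignment solver) is:

import numpy as np
from scipy.optimize import linear_sum_assignment

def align_frames(sim):
    # sim[i][j]: similarity between reference frame i and MT frame j
    sim = np.asarray(sim, dtype=float)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    # keep the one-to-one frame pairing with the highest total similarity
    return [(i, j, sim[i, j]) for i, j in zip(rows, cols)]

print(align_frames([[0.9, 0.1], [0.2, 0.6], [0.0, 0.3]]))
# [(0, 0, 0.9), (1, 1, 0.6)]; the third reference frame stays unaligned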
- Karteek ADDANKI, Chi-kiu LO, Markus SAERS, and Dekai WU.
"LTG vs. ITG Coverage of Cross-Lingual Verb Frame Alternations".
16th Annual Conference of the European Association for Machine Translation (EAMT-2012). Trento, Italy:
May 2012.
We show in an empirical study that not only did all cross-lingual alternations of verb frames across Chinese–English translations fall within the reordering capacity of Inversion Transduction Grammars, but more surprisingly, about 97% of the alternations were expressible by the far more restrictive Linear Transduction Grammars. Also, about 71% of the cross-lingual verb frame alternations turn out to be monotonic even for diverse language pairs such as Chinese–English. We also observe that a source verb frame alternation pattern translates into a small subset of the possible target verb frame alternation patterns, based on the construction of the source sentence and the frame set definitions. As a part of our evaluation, we also present a novel linear time algorithm to determine whether a particular syntactic alignment falls within the expressiveness of Linear Transduction Grammars. To our knowledge, this is the first study that attempts to analyze the cross-lingual alternation behavior of semantic frames and the extent of their coverage under syntax-based machine translation formalisms.
- Simon SHI, Pascale FUNG, Emmanuel PROCHASSON, Chi-kiu LO, and Dekai WU.
"Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web".
5th International Joint Conference on Natural Language
Processing (IJCNLP 2011). Chiang Mai, Thailand: Nov 2011.
We propose a content-based approach to mine parallel resources from the entire web using cross lingual information retrieval (CLIR) with search query relevance score (SQRS). Our method improves mining recall by going beyond URL matching to find parallel documents from non-parallel sites. We introduce SQRS to improve the precision of mining. Our method makes use of search engines to query for target document given each source document and therefore does not require downloading target language documents in batch mode, reducing computational cost on the local machines and bandwidth consumption. We obtained a very high mining precision (88%) on the parallel documents by the pure CLIR approach. After extracting parallel sentences from the mined documents and using them to train an SMT system, we found that the SMT performance, with 29.88 BLEU score, is comparable to that obtained with high quality manually translated parallel sentences with 29.54 BLEU score, illustrating the excellent quality of the mined parallel material.
- Markus SAERS, Dekai WU, and Chris QUIRK.
"On the Expressivity of Linear Transductions".
Machine Translation Summit XIII (MT Summit
2011). Xiamen, China: Sep 2011.
We investigate the formal expressivity properties of linear transductions, the class of transductions generated by linear transduction grammars, linear inversion transduction grammars and preterminalized linear inversion transduction grammars. While empirical results such as those in previous work are of course an ultimate test of modeling adequacy for machine translation applications, it is equally important to understand the formal theoretical properties of any such new representation. An important part of the expressivity of a transduction is the possibility to align tokens between the two languages generated. We refer to the number of different alignments that are allowed under a transduction as its weak alignment capacity. This aspect of expressivity is quantified for linear transductions using preterminalized linear inversion transduction grammars, and compared to the expressivity of finite-state transductions, inversion transductions and syntax-directed transductions.
- Markus SAERS and Dekai WU.
"Linear Transduction Grammars and Zipper Finite-State Transducers".
Recent Advances in Natural Language Processing (RANLP
2011). Hissar, Bulgaria: Sep 2011.
We examine how the recently explored class of linear transductions relates to finite-state models. Historically neglected, linear transductions are gaining interest in statistical machine translation modeling, due to recent empirical studies demonstrating that their attractive balance of generative capacity and complexity characteristics leads to improved accuracy and speed in learning alignment and translation models. Such work has until now characterized the class of linear transductions in terms of either (a) linear inversion transduction grammars (LITGs) which are linearized restrictions of inversion transduction grammars or (b) linear transduction grammars (LTGs) which are bilingualized generalizations of linear grammars. In this paper, we offer a new alternative characterization of linear transductions, as relating four finite-state languages to each other. In other words, linear transductions are finite-state in four dimensions. We introduce the devices of zipper finite-state automata (ZFSAs) and zipper finite-state transducers (ZFSTs) in order to construct the bridge between linear transductions and finite-state models.
- Markus SAERS, Dekai WU, Chi-kiu LO, and Karteek ADDANKI.
"Speech
Translation with Grammar Driven Probabilistic Phrasal Bilexica Extraction".
12th Annual Conference of the International Speech
Communication Association (Interspeech 2011). Florence, Italy:
Aug 2011.
We introduce a new type of transduction grammar that allows for learning of probabilistic phrasal bilexica, leading to a significant improvement in spoken language translation accuracy. The current state-of-the-art in statistical machine translation relies on a complicated and crude pipeline to learn probabilistic phrasal bilexica—the very core of any speech translation system. In this paper, we present a more principled approach to learning probabilistic phrasal bilexica, based on stochastic transduction grammar learning applicable to speech corpora.
- Chi-kiu LO and Dekai WU.
"SMT vs. AI redux: How semantic frames evaluate MT more accurately".
22nd International Joint Conference on Artificial Intelligence (IJCAI-11). Barcelona: Jul 2011.
We argue for an alternative paradigm in evaluating machine translation quality that is strongly empirical but more accurately reflects the utility of translations, by returning to a representational foundation based on AI oriented lexical semantics, rather than the superficial flat n-gram and string representations recently dominating the field. Driven by such metrics as BLEU and WER, current SMT frequently produces unusable translations where the semantic event structure is mistranslated: who did what to whom, when, where, why, and how? We argue that it is time for a new generation of more “intelligent” automatic and semi-automatic metrics, based clearly on getting the structure right at the lexical semantics level. We show empirically that it is possible to use simple PropBank style semantic frame representations to surpass all currently widespread metrics' correlation to human adequacy judgments, including even HTER. We also show that replacing human annotators with automatic semantic role labeling still yields much of the advantage of the approach. We combine the best of both worlds: from an SMT perspective, we provide superior yet low-cost quantitative objective functions for translation quality; and yet from an AI perspective, we regain the representational transparency and clear reflection of semantic utility of structural frame-based knowledge representations.
- Chi-kiu LO and Dekai WU.
"MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility via semantic frames".
49th Annual Meeting of the Association for Computational
Linguistics: Human Language Technologies (ACL HLT
2011). Portland, Oregon: Jun 2011.
We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, non-automatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.
- Markus SAERS and Dekai WU.
"Reestimation of Reified Rules in Semiring Parsing and Biparsing".
Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and
Structure in Statistical Translation (at ACL 2011). Portland,
Oregon: Jun 2011.
We show that reifying the rules from hyperedge weights to first-class graph nodes automatically gives us rule expectations in any kind of grammar expressible as a deductive system, without any explicit algorithm for calculating rule expectations (such as the inside-outside algorithm). This gives us expectation maximization training for any grammar class with a parsing algorithm that can be stated as a deductive system, for free. Having such a framework in place shortens the turnaround time for experimenting with new grammar classes and parsing algorithms—to implement a grammar learner, only the parse forest construction has to be implemented.
- Chi-kiu LO and Dekai WU.
"Structured vs. Flat Semantic Role Representations for Machine Translation
Evaluation".
Proceedings of SSST-5, Fifth Workshop on Syntax, Semantics and
Structure in Statistical Translation (at ACL 2011). Portland,
Oregon:
Jun 2011.
We argue that failing to capture the degree of contribution of each semantic frame in a sentence explains puzzling results in recent work on the MEANT family of semantic MT evaluation metrics, which have disturbingly indicated that dissociating semantic roles and fillers from their predicates actually improves correlation with human adequacy judgments even though, intuitively, properly segregating event frames should more accurately reflect the preservation of meaning. Our analysis finds that both properly structured and flattened representations fail to adequately account for the contribution of each semantic frame to the overall sentence. We then show that the correlation of HMEANT, the human variant of MEANT, can be greatly improved by introducing a simple length-based weighting scheme that approximates the degree of contribution of each semantic frame to the overall sentence. The new results also show that, without flattening the structure of semantic frames, weighting the degree of each frame's contribution gives HMEANT higher correlations than the previously best-performing flattened model, as well as HTER.
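One simple instantiation of such a length-based weighting (notation ours; the paper's exact formulation may differ) weights each frame by the fraction of the sentence its role fillers span, and scores the sentence as the weighted mean of per-frame f-scores:

w_f = \frac{|\mathrm{span}(f)|}{\sum_{f' \in s} |\mathrm{span}(f')|}, \qquad \mathrm{score}(s) = \sum_{f \in s} w_f \cdot F_1(f)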
- Markus SAERS and Dekai WU.
"Principled Induction of Phrasal Bilexica".
15th Annual Conference of the European Association for Machine
Translation (EAMT-2011). Leuven, Belgium: May 2011.
We aim to replace the long and complicated pipeline employed to produce probabilistic phrasal bilexica with a theoretically principled, grammar-based approach. To this end, we introduce a phrasal generalization of linear transduction grammars (LTGs), and an iterative induction method that works on raw corpora. Surface-based statistical machine translation (SMT) systems rely heavily on capturing the immediate context of words to be able to translate them accurately. It would be desirable to bring this power into structured SMT systems, but this is far from a trivial problem. Our immediate aim is to build a probabilistic bilexicon, which means that we would like to have a grammar where the entries constitute a natural probability distribution. Since this is not easily achievable with LTGs or linear inversion transduction grammars (LITGs), we introduce the class of preterminalized LITGs, which are equivalent to both LTGs and LITGs in terms of generative capacity, and which have the desired property of separating the lexical rules into one category whose probability distribution maps naturally to the bilexicon's. As a proof of concept, we show that phrasal bilexica, induced in this manner, can be used to improve the performance of a traditional phrase-based SMT system.
- Dekai WU. Alignment. In Nitin INDURKHYA and Fred DAMERAU (editors), CRC Handbook of Natural Language
Processing, Second Edition. 367-408. CRC Press.
2010.
In this chapter we discuss the work done on automatic alignment of parallel texts for various purposes. Fundamentally, an alignment algorithm accepts as input a bitext, and produces as output a bisegmentation relation that identifies corresponding segments between the texts. Bitext alignment fundamentally lies at the heart of all data-driven machine translation methods, and the rapid research progress on alignment since 1990 reflects the advent of statistical machine translation (SMT) and example-based machine translation (EBMT) approaches. Yet the importance of alignment extends as well to many other practical applications for translators, bilingual lexicographers, and even ordinary readers. A wide variety of techniques now exist, ranging from the most simple (counting characters or words) to the more sophisticated, sometimes involving linguistic data (lexicons) which may or may not have been automatically induced themselves. Techniques have been developed for aligning passages of various granularities: documents, paragraphs, sentences, constituents, collocations or phrases, words, and characters. Some techniques work on precisely translated parallel corpora, while others work on noisy, comparable, or non-parallel corpora. Some techniques make use of apparent morphological features, while others rely on cognates and loan-words; of particular interest is work done on languages which do not have a common writing system. Some techniques align only shallow, flat chunks, while others align compositional, hierarchical structures. The robustness and generality of different techniques have generated much discussion.
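As a flavor of the simplest end of that spectrum, here is a compact dynamic-programming sentence aligner driven only by character counts, in the spirit of Gale and Church (1993); the bead penalties and the mismatch cost below are simplified stand-ins for their Gaussian length model:

def align_by_length(src, tgt):
    # src, tgt: lists of sentence lengths in characters; returns aligned beads
    INF = float("inf")
    n, m = len(src), len(tgt)
    # (source sentences consumed, target sentences consumed, fixed penalty)
    beads = [(1, 1, 0.0), (1, 0, 4.0), (0, 1, 4.0), (2, 1, 2.0), (1, 2, 2.0)]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj, pen in beads:
                if i + di > n or j + dj > m:
                    continue
                a, b = sum(src[i:i + di]), sum(tgt[j:j + dj])
                # mismatch cost: normalized character-length difference
                c = cost[i][j] + pen + abs(a - b) / max(a + b, 1)
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        path.append(((i - di, i), (j - dj, j)))
        i, j = i - di, j - dj
    return path[::-1]   # list of ((src_from, src_to), (tgt_from, tgt_to))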
- Dekai WU, Pascale FUNG, Marine CARPUAT, Chi-kiu LO, Yongsheng YANG,
and Zhaojun WU. Lexical Semantics for Statistical Machine
Translation. In Joseph Olive, Caitlin Christianson, and John
McCary (editors), Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation.
Springer. 2010.
We present efforts toward moving statistical machine translation toward incorporating semantic modeling. The most glaring types of errors made by current systems appear to be prime targets for lexical semantics models, which have heretofore been largely absent from statistical machine translation models. Although sense disambiguation and semantic roles both appear highly relevant to translation accuracy, experience suggests that simply dropping in the existing models is unlikely to improve translation accuracy; rather, adaptations will be necessary. We discuss (1) a new Phrase Sense Disambiguation model that successfully improves statistical phrase-based translation for the first time by making three critical adaptations to traditional word sense disambiguation configurations, and (2) a series of empirical studies that illuminate more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy.
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment".
Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (at COLING 2010). Beijing: Aug 2010.
We present two contributions to grammar driven translation. First, since both Inversion Transduction Grammars and Linear Inversion Transduction Grammars have been shown to produce better alignments than the standard word alignment tool, we investigate how the trade-off between speed and end-to-end translation quality extends to the choice of grammar formalism. Second, we prove that Linear Transduction Grammars (LTGs) generate the same transductions as Linear Inversion Transduction Grammars, and present a scheme for arriving at LTGs by bilingualizing Linear Grammars. We also present a method for obtaining Inversion Transduction Grammars from Linear (Inversion) Transduction Grammars, which can speed up grammar induction from parallel corpora dramatically.
- Chi-kiu LO and Dekai WU.
"Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation".
Proceedings of SSST-4, Fourth Workshop on Syntax and Structure in Statistical Translation (at COLING 2010). Beijing: Aug 2010.
We present results of an empirical study on evaluating the utility of machine translation output, by assessing the accuracy with which human readers are able to complete the semantic role annotation templates. Unlike the widely-used lexical and n-gram based or syntactic based MT evaluation metrics, which are fluency-oriented, our results show that using semantic role labels to evaluate the utility of MT output achieves higher correlation with human judgments on adequacy. In this study, human readers were employed to identify the semantic role labels in the translation. For each role, the filler is considered an accurate translation if it expresses the same meaning as that annotated in the gold standard reference translation. Our SRL based f-score evaluation metric has a 0.41 correlation coefficient with the human judgment on adequacy, while in contrast BLEU has only a 0.25 correlation coefficient and the syntactic based MT evaluation metric STM has only a 0.32 correlation coefficient with the human judgment on adequacy. Our results strongly indicate that using semantic role labels for MT evaluation can be significantly more effective and better correlated with human judgment on adequacy than BLEU and STM.
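The metric itself is a plain f-score over role fillers; a bare-bones sketch (ours) of the computation described above:

def srl_fscore(num_correct, num_mt_roles, num_ref_roles):
    # num_correct: role fillers in the MT output judged to express the same
    # meaning as in the reference; the other arguments are the usual totals
    if num_mt_roles == 0 or num_ref_roles == 0:
        return 0.0
    p = num_correct / num_mt_roles
    r = num_correct / num_ref_roles
    return 2 * p * r / (p + r) if p + r > 0 else 0.0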
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Linear Inversion Transduction Grammar Alignments as a Second Translation Path".
Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR (at COLING 2010). Uppsala: Jul 2010.
We explore the possibility of using Stochastic Bracketing Linear Inversion Transduction Grammars for a full-scale German–English translation task, both on their own and in conjunction with alignments induced with GIZA++. The rationale for transduction grammars, the details of the system and some results are presented.
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar".
Human Language Technologies: The 2010 Annual Conference of the
North American Chapter of the Association for Computational
Linguistics (NAACL HLT 2010). Los Angeles: Jun 2010.
The class of Linear Inversion Transduction Grammars (LITGs) is briefly introduced, and used to induce a word alignment over a parallel corpus. We show that alignment via Stochastic Bracketing LITGs is considerably faster than Stochastic Bracketing ITGs, while yielding alignments superior to the widely-used heuristic of intersecting bidirectional IBM alignments. Performance is measured as the translation quality of a phrase-based machine translation system built upon the word alignments.
- Chi-kiu LO and Dekai WU.
"Evaluating Machine
Translation Utility via Semantic Role Labels".
Seventh International Conference on Language Resources and
Evaluation (LREC-2010). Malta: May 2010.
We present the methodology that underlies new metrics for semantic machine translation evaluation that we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key semantic roles. Such roles can be annotated using Propbank-style PRED and ARG labels. Recent work by Wu and Fung (2009) introduced methods based on automatic semantic role labeling into statistical machine translation, to enhance the quality of MT output. However, semantic SMT approaches have so far still only been evaluated using lexical and n-gram based SMT evaluation metrics such as BLEU, which are not aimed at evaluating the utility of MT output. Direct data analysis is still needed to understand how semantic models can be leveraged to evaluate the utility of MT output. In this paper, we discuss a new methodology for evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to match the Propbank annotation frames.
- Dekai WU.
"Toward Machine Translation with Statistics and Syntax and Semantics".
IEEE Automatic Speech Recognition and Understanding Workshop
(ASRU 2009). Merano, Italy: Dec 2009.
In this paper, we survey some central issues in the historical, current, and future landscape of statistical machine translation (SMT) research, taking as a starting point an extended three-dimensional MT model space. We posit a socio-geographical conceptual disparity hypothesis, that aims to explain why language pairs like Chinese-English have presented MT with so much more difficulty than others. The evolution from simple token-based to segment-based to tree-based syntactic SMT is sketched. For tree-based SMT, we consider language bias rationales for selecting the degree of compositional power within the hierarchy of expressiveness for transduction grammars (or synchronous grammars). This leads us to inversion transductions and the ITG model prevalent in current state-of-the-art SMT, along with the underlying ITG hypothesis, which posits a language universal. Against this backdrop, we enumerate a set of key open questions for syntactic SMT. We then consider the more recent area of semantic SMT. We list principles for successful application of sense disambiguation models to semantic SMT, and describe early directions in the use of semantic role labeling for semantic SMT.
- Anders SØGAARD and Dekai WU.
"Empirical lower bounds on translation unit error rate for the full class of inversion transduction grammars".
11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 33-36.
Empirical lower bounds studies, in which the frequency of alignment configurations that cannot be induced by a particular formalism is estimated, have been important for the development of syntax-based machine translation formalisms. The formalism that has received the most attention has been inversion transduction grammars (ITGs) (Wu, 1997). All previous work on the coverage of ITGs, however, concerns parse failure rates (PFRs) or sentence level coverage, which is not directly related to any of the evaluation measures used in machine translation. Søgaard and Kuhn (2009) induce lower bounds on translation unit error rates (TUERs) for a number of formalisms, including normal form ITGs, but not for the full class of ITGs. Many of the alignment configurations that cannot be induced by normal form ITGs can be induced by unrestricted ITGs, however. This paper estimates the difference and shows that the average reduction in lower bounds on TUER is 2.48 in absolute difference (16.01 in average parse failure rate).
- Markus SAERS, Joakim NIVRE, and Dekai WU.
"Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm".
11th International Conference on Parsing Technologies (IWPT'09). Paris: Oct 2009. 29-32.
We present a biparsing algorithm for Stochastic Bracketing Inversion Transduction Grammars that runs in O(bn³) time instead of O(n⁶). Transduction grammars learned via an EM estimation procedure based on this biparsing algorithm are evaluated directly on the translation task, by building a phrase-based statistical MT system on top of the alignments dictated by Viterbi parses under the induced bigrammars. Translation quality at different levels of pruning is compared, showing improvements over a conventional word aligner even at heavy pruning levels.
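For context, the baseline O(n⁶) comes from the standard counting argument for bracketing-ITG biparsing (Wu, 1997), not from anything specific to this paper:

% chart items are bispans [i,j; k,l] over the two sentences, and building an
% item chooses one split point in each language:
O(n^4) \text{ items} \times O(n^2) \text{ split-point pairs} = O(n^6)

As we read the abstract, the b in O(bn³) reflects the degree of pruning applied to this search space.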
- Dekai WU and David CHIANG (editors). Proceedings of SSST-3, Third Workshop on Syntax and Structure in Statistical Translation. NAACL HLT 2009. Boulder, Colorado: Jun 2009. [website]
- Markus SAERS and Dekai WU. "Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars".
Proceedings of
SSST-3, Third Workshop on Syntax and Structure in Statistical
Translation. NAACL HLT 2009: Boulder, Colorado: Jun 2009. 28-36.
We argue that learning word alignments through a compositionally-structured, joint process yields higher phrase-based translation accuracy than the conventional heuristic of intersecting conditional models. Flawed word alignments can lead to flawed phrase translations that damage translation accuracy. Yet the IBM word alignments usually used today are known to be flawed, in large part because IBM models (1) model reordering by allowing unrestricted movement of words, rather than constrained movement of compositional units, and therefore must (2) attempt to compensate via directed, asymmetric distortion and fertility models. The conventional heuristics for attempting to recover from the resulting alignment errors involve estimating two directed models in opposite directions and then intersecting their alignments – to make up for the fact that, in reality, word alignment is an inherently joint relation. A natural alternative is provided by Inversion Transduction Grammars, which estimate the joint word alignment relation directly, eliminating the need for any of the conventional heuristics. We show that this alignment ultimately produces superior translation accuracy on BLEU, NIST, and METEOR metrics over three distinct language pairs.
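The conventional heuristic being argued against fits in a few lines, which is part of both its appeal and its weakness; a sketch (ours):

def intersect_alignments(src2tgt, tgt2src):
    # src2tgt: set of (i, j) links from the source-to-target directed model;
    # tgt2src: set of (j, i) links from the reverse directed model.
    # Keeping only links both directed models agree on trades recall for
    # precision; an ITG instead estimates the joint alignment relation directly.
    return src2tgt & {(i, j) for (j, i) in tgt2src}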
- Dekai WU and Pascale FUNG. "Semantic Roles for SMT:
A Hybrid Two-Pass Model".
Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009).
Boulder, Colorado: Jun 2009.
We present results on a novel hybrid semantic SMT model that incorporates the strengths of both semantic role labeling and phrase-based statistical machine translation. The approach avoids major complexity limitations via a two-pass architecture. The first pass is performed using a conventional phrase-based SMT model. The second pass is performed by a re-ordering strategy guided by shallow semantic parsers that produce both semantic frame and role labels. Evaluation on a Wall Street Journal newswire genre test set showed the hybrid model to yield an improvement of roughly half a point in BLEU score over a strong pure phrase-based SMT baseline – to our knowledge, the first successful application of semantic role labeling to SMT.
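A minimal pipeline sketch of the two-pass idea, with hypothetical stand-ins for the decoder and the shallow semantic parsers (none of these names or data shapes come from the paper):

    def translate_two_pass(source, smt_decode, srl_source, srl_target):
        draft = smt_decode(source)          # pass 1: conventional phrase-based SMT
        src_roles = srl_source(source)      # e.g. ["ARG0", "PRED", "ARG1"]
        tgt_chunks = srl_target(draft)      # e.g. [("PRED", "ate"), ("ARG0", "the cat")]
        rank = {role: i for i, role in enumerate(src_roles)}
        # Pass 2: reorder the draft's chunks so its role sequence follows the
        # source's semantic frame ordering.
        ordered = sorted(tgt_chunks, key=lambda rc: rank.get(rc[0], len(rank)))
        return " ".join(text for _, text in ordered)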
- Dekai WU and Pascale FUNG. "Can Semantic Role Labeling
Improve SMT?".
13th Annual Conference of the European Association for Machine Translation (EAMT 2009).
Barcelona: May 2009. 218-225.
We present a series of empirical studies aimed at illuminating more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy. The experiments reported study several aspects key to success: (1) the frequencies of the types of SMT errors where semantic parsing and role labeling could help; (2) if and where semantic roles offer more accurate guidance to SMT than merely syntactic annotation; and (3) the potential quantitative impact of realistic semantic role guidance to SMT systems, in terms of BLEU and METEOR scores.
- David CHIANG and Dekai WU (editors). Proceedings of SSST-2, Second Workshop on Syntax and Structure in Statistical Translation. ACL-08: HLT, Columbus, Ohio: Jun 2008. [website]
- Marine CARPUAT and Dekai WU. "Evaluation of Context-dependent Phrasal Translation Lexicons for Statistical Machine Translation". Sixth International Conference on Language Resources and Evaluation (LREC-2008). Marrakech:
May 2008.
We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal translation lexicons are an appropriate framework to successfully incorporate Word Sense Disambiguation (WSD) modeling into SMT. However, this approach has so far only been evaluated using automatic translation quality metrics, which are important, but aggregate many different factors. A direct analysis is still needed to understand how context-dependent phrasal translation lexicons impact translation quality, and whether the additional complexity they introduce is really necessary. In this paper, we focus on the impact of context-dependent translation lexicons on lexical choice in phrase-based SMT and show that context-dependent lexicons are more useful to a phrase-based SMT system than a conventional lexicon. A typical phrase-based SMT system makes use of more and longer phrases with context modeling, including phrases that were not seen very frequently in training. Even when the segmentation is identical, the context-dependent lexicons yield translations that match references more often than conventional lexicons.
- Dekai WU. "WSD for Semantic SMT: Phrase Sense Disambiguation". Second Symposium on Innovations in Machine Translation Technologies (IMTT-2008). Tokyo: Mar 2008.
- Yihai SHEN, Chi-kiu LO, Marine CARPUAT and Dekai WU. "HKUST Statistical Machine Translation Experiments for IWSLT 2007". Fourth International Workshop on Spoken Language Translation
(IWSLT 2007). Trento:
Oct 2007. 84-88.
This paper describes experiments conducted at HKUST in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against the closed-source Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.
- Marine CARPUAT and Dekai WU. "Context-Dependent Phrasal Translation Lexicons for Statistical Machine
Translation". Machine Translation Summit XI. Copenhagen:
Sep 2007.
Most current statistical machine translation (SMT) systems make very little use of contextual information to select a translation candidate for a given input language phrase. However, despite evidence that rich context features are useful in stand-alone translation disambiguation tasks, recent studies reported that incorporating context-rich approaches from Word Sense Disambiguation (WSD) methods directly into classic word-based SMT systems, surprisingly, did not yield the expected improvements in translation quality. We argue here that, instead, it is necessary to design a context-dependent lexicon that is specifically matched to a given phrase-based SMT model, rather than simply incorporating an independently built and tested WSD module. In this approach, the baseline SMT phrasal lexicon, which uses translation probabilities that are independent of context, is augmented with a context-dependent score, defined using insights from standalone translation disambiguation evaluations. This approach reliably improves performance on both IWSLT and NIST Chinese-English test sets, producing consistent gains on all eight of the most commonly used automated evaluation metrics. We analyze the behavior of the model along a number of dimensions, including an analysis confirming that the most important context features are not available in conventional phrase-based SMT models.
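The scoring idea can be sketched as follows (a hedged toy version: static_lex, context_model, and the log-linear weights are assumed placeholders, not the paper's parameterization):

    import math

    def phrase_score(src, tgt, context, static_lex, context_model,
                     w_static=1.0, w_context=1.0):
        static_p = static_lex[(src, tgt)]             # context-independent phrase table entry
        context_p = context_model(src, tgt, context)  # WSD-style score over the full sentence
        return w_static * math.log(static_p) + w_context * math.log(context_p)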
- Marine CARPUAT and Dekai WU. "How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for
Statistical Machine Translation". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden:
Sep 2007. 43-52.
We present comparative empirical evidence arguing that a generalized phrase sense disambiguation approach better improves statistical machine translation than ordinary word sense disambiguation, along with a data analysis suggesting the reasons for this. Standalone word sense disambiguation, as exemplified by the Senseval series of evaluations, typically defines the target of disambiguation as a single word. But in order to be useful in statistical machine translation, our studies indicate that word sense disambiguation should be redefined to move beyond the particular case of single word targets, and instead to generalize to multi-word phrase targets. We investigate how and why the phrase sense disambiguation approach---in contrast to recent efforts to apply traditional word sense disambiguation to SMT---is able to yield statistically significant improvements in translation quality even under large data conditions, and consistently improve SMT across both IWSLT and NIST Chinese-English text translation tasks. We discuss architectural issues raised by this change of perspective, and consider the new model architecture necessitated by the phrase sense disambiguation approach.
- Pascale FUNG, Zhaojun WU, Yongsheng YANG and Dekai WU. "Learning Bilingual Semantic Frames:
Shallow Semantic Parsing vs. Semantic Role Projection". 11th Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden:
Sep 2007. 75-84.
To explore the potential application of semantic roles in structural machine translation, we propose to study the automatic learning of English-Chinese bilingual predicate argument structure mapping. We describe ARG_ALIGN, a new model for learning bilingual semantic frames that employs monolingual Chinese and English semantic parsers to learn bilingual semantic role mappings with 72.45% F-score, given an unannotated parallel corpus. We show that, contrary to a common preconception, our ARG_ALIGN model is superior to a semantic role projection model, SYN_ALIGN, which reaches only a 46.63% F-score by assuming semantic parallelism in bilingual sentences. We present experimental data showing that this is due to cross-lingual mismatches between argument structures in English and Chinese, which occur 17.24% of the time. This suggests that, in any potential application to enhance machine translation with semantic structural mapping, it may be preferable to employ independent automatic semantic parsers on source and target languages, rather than assuming semantic role parallelism.
- Marine CARPUAT and Dekai WU. "Improving Statistical Machine Translation using Word Sense Disambiguation". 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007). Prague:
Jun 2007. 61-72.
We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task---and moreover never hurts performance on any test set, according not only to BLEU but to all eight of the most commonly used automatic evaluation metrics. Recent work has challenged the assumption that word sense disambiguation (WSD) systems are useful for SMT. Yet SMT translation quality still obviously suffers from inaccurate lexical choice. In this paper, we address this problem by investigating a new strategy for integrating WSD into an SMT system that performs fully phrasal multi-word disambiguation. Instead of directly incorporating a Senseval-style WSD system, we redefine the WSD task to match the exact same phrasal translation disambiguation task faced by phrase-based SMT systems. Our results provide the first known empirical evidence that lexical semantics are indeed useful for SMT, despite claims to the contrary.
- Dekai WU and David CHIANG (editors). Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation. Rochester, New York: Apr 2007. [website]
- Dekai WU. "MT model space: Statistical vs.
compositional vs. example-based machine translation".
Machine Translation (2005) 19: 213-227.
Springer Online:
http://dx.doi.org/10.1007/s10590-006-9009-3. Berlin: Springer.
We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to machine translation. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.
- Dekai WU, Marine CARPUAT, and Yihai SHEN. "Inversion Transduction Grammar Coverage of Arabic-English Word Alignment for Tree-Structured Statistical Machine Translation".
IEEE/ACL 2006 Workshop on Spoken Language Technology
(SLT 2006). Aruba: Dec 2006.
We present the first known direct measurement of word alignment coverage on an Arabic-English parallel corpus using inversion transduction grammar constraints. While direct measurements have been reported for several European and Asian languages, to date no results have been available for Arabic or any Semitic language despite much recent activity on Arabic-English spoken language and text translation. Many recent syntax-based statistical MT models operate within the domain of ITG expressiveness, often for efficiency reasons, so it has become important to determine the extent to which the ITG constraint assumption holds. Our results on Arabic provide further evidence that ITG expressiveness appears largely sufficient for core MT models.
- Pascale FUNG, Zhaojun WU, Yongsheng YANG, and Dekai WU. "Automatic learning of Chinese-English semantic structure mapping".
IEEE/ACL 2006 Workshop on Spoken Language Technology
(SLT 2006). Aruba: Dec 2006.
We present twin results on Chinese semantic parsing, with application to English-Chinese cross-lingual verb frame acquisition. First, we describe two new state-of-the-art Chinese shallow semantic parsers leading to an F-score of 82.01 on simultaneous frame and argument boundary identification and labeling. Subsequently, we propose a model that applies the separate Chinese and English semantic parsers to learn cross-lingual semantic verb frame argument mappings with 89.3% accuracy. The only training data needed by this cross-lingual learning model is a pair of non-parallel monolingual Propbanks, plus an unannotated parallel corpus. We also present the first reported controlled comparison of maximum entropy and SVM approaches to shallow semantic parsing, using the Chinese data.
- Marine CARPUAT, Yihai SHEN, Xiaofeng YU, and Dekai WU. "Toward Integrating
Word Sense and Entity Disambiguation into Statistical Machine
Translation".
Third International Workshop on Spoken Language Translation
(IWSLT 2006). Kyoto:
Nov 2006.
[slides]
We describe a machine translation approach being designed at HKUST to integrate semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show how integrating the semantic modules consistently improves translation quality across several data sets. We report results on five different IWSLT 2006 speech translation tasks, representing HKUST's first participation in the IWSLT spoken language translation evaluation campaign. We translated both read and spontaneous speech transcriptions from Chinese to English, achieving reasonable performance despite the fact that our system is essentially text-based and therefore not designed and tuned to tackle the challenges of speech translation. We also find that the system achieves reasonable results on a wide range of languages, by evaluating on read speech transcriptions from Arabic, Italian, and Japanese into English.
- Xiaofeng YU, Marine CARPUAT, and Dekai WU. "Boosting for
Chinese Named Entity Recognition".
Fifth SIGHAN Workshop of the Special Interest Group for Chinese
Language Processing (SIGHAN5) at COLING/ACL 2006. Sydney, Australia:
Jul 2006.
We report an experiment in which a high-performance boosting based NER model originally designed for multiple European languages is instead applied to the Chinese named entity recognition task of the third SIGHAN Chinese language processing bakeoff. Using a simple character-based model along with a set of features that are easily obtained from the Chinese input strings, the system described employs boosting, a promising and theoretically well-founded machine learning method, to combine a set of weak classifiers into a final system. Even though we did no other Chinese-specific tuning, and used only one-third of the MSRA and CityU corpora to train the system, reasonable results are obtained. Our evaluation results show that 75.07 and 80.51 overall F-measures were obtained on MSRA and CityU test sets respectively.
- Dekai WU and Ken Wing Kuen LEE. "A grammatical approach
to understanding textual tables using two-dimensional SCFGs".
21st International Conference on Computational Linguistics and
44th Annual Meeting of the Association for Computational Linguistics
(COLING/ACL 2006). Sydney, Australia: Jul 2006.
We present an elegant and extensible model that is capable of providing semantic interpretations for an unusually wide range of textual tables in documents. Unlike the few existing table analysis models, which largely rely on relatively ad hoc heuristics, our linguistically-oriented approach is systematic and grammar based, which allows our model (1) to be concise and yet (2) to recognize a wider range of data models than others, and (3) to disambiguate to a significantly finer extent the underlying semantic interpretation of the table in terms of data models drawn from relational database theory. To accomplish this, the model introduces Viterbi parsing under two-dimensional stochastic CFGs. The cleaner grammatical approach facilitates not only greater coverage, but also grammar extension and maintenance, as well as a more direct and declarative link to semantic interpretation, for which we also introduce a new, cleaner data model. In disambiguation experiments on recognizing relevant data models of unseen web tables from different domains, a blind evaluation of the model showed 60% precision and 80% recall.
- Dekai WU. "Textual
Entailment Recognition Based on Inversion Transduction Grammars".
In Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and
Florence d'ALCHÉ-BUC (editors),
"Machine Learning Challenges, Evaluating Predictive Uncertainty,
Visual Object Classification and Recognizing Textual Entailment",
Lecture Notes in Computer Science (2006) 3944: 299-308.
Springer Online:
http://dx.doi.org/10.1007/11736790_17. Berlin: Springer.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
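Reusing the toy biparse() sketched above under the IWPT'09 entry, the structure-matching idea can be caricatured as follows (a simplification: real models also license unmatched singleton words, which this toy omits, and the threshold is an assumed tuning parameter):

    def entails(text_words, hyp_words, biparse, threshold=1e-6):
        # Treat Text/Hypothesis as a "bilingual" pair under a bracketing ITG
        # whose lexicon is bare identity matching -- no thesaurus at all.
        identity_lex = {(w, w): 1.0 for w in set(text_words) & set(hyp_words)}
        chart = biparse(text_words, hyp_words, identity_lex)
        full_span = (0, len(text_words), 0, len(hyp_words))
        score = chart[len(text_words) + len(hyp_words)].get(full_span, 0.0)
        return score >= threshold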
- Dekai WU and Pascale FUNG. "Inversion Transduction Grammar Constraints for Mining Parallel
Sentences from Quasi-Comparable Corpora".
Second International Joint Conference on
Natural Language Processing (IJCNLP-2005). Jeju, South Korea: Oct 2005.
We present a new implication of Wu's (1997) Inversion Transduction Grammar (ITG) Hypothesis, on the problem of retrieving truly parallel sentence translations from large collections of highly non-parallel documents. Our approach leverages a strong language universal constraint posited by the ITG Hypothesis, that can serve as a strong inductive bias for various language learning problems, resulting in both efficiency and accuracy gains. The task we attack is highly practical since non-parallel multilingual data exists in far greater quantities than parallel corpora, but parallel sentences are a much more useful resource. Our aim here is to mine truly parallel sentences, as opposed to comparable sentence pairs or loose translations as in most previous work. The method we introduce exploits Bracketing ITGs to produce the first known results for this problem. Experiments show that it obtains large accuracy gains on this task compared to the expected performance of state-of-the-art models that were developed for the less stringent task of mining comparable sentence pairs.
- Marine CARPUAT and Dekai WU. "Evaluating the Word
Sense Disambiguation Performance of Statistical Machine Translation".
Second International Joint Conference on
Natural Language Processing (IJCNLP-2005). Jeju, South Korea: Oct 2005.
We present the first known empirical test of an increasingly common speculative claim, by evaluating a representative Chinese-to-English SMT model directly on word sense disambiguation performance, using standard WSD evaluation methodology and datasets from the Senseval-3 Chinese lexical sample task. Much effort has been put into designing and evaluating dedicated word sense disambiguation (WSD) models, in particular with the Senseval series of workshops. At the same time, the recent improvements in the BLEU scores of statistical machine translation (SMT) suggest that SMT models are good at predicting the right translation of the words in source language sentences. Surprisingly however, the WSD accuracy of SMT models has never been evaluated and compared with that of the dedicated WSD models. We present controlled experiments showing the WSD accuracy of current typical SMT models to be significantly lower than that of all the dedicated WSD models considered. This tends to support the view that despite recent speculative claims to the contrary, current SMT models do have limitations in comparison with dedicated WSD models, and that SMT should benefit from the better predictions made by the WSD models.
- Marine CARPUAT and Dekai WU. "Word Sense Disambiguation
vs. Statistical Machine Translation". 43rd Annual Meeting of the
Association for Computational Linguistics (ACL-2005). Ann Arbor, MI:
Jun 2005.
We directly investigate a subject of much recent debate: do word sense disambiguation models help statistical machine translation quality? We present empirical results casting doubt on this common, but unproved, assumption. Using a state-of-the-art Chinese word sense disambiguation model to choose translation candidates for a typical IBM statistical MT system, we find that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone. Error analysis suggests several key factors behind this surprising finding, including inherent limitations of current statistical MT architectures.
- Dekai WU. "Recognizing
Paraphrases and Textual Entailment using Inversion Transduction Grammars".
ACL-2005 Workshop on Empirical Modeling of Semantic Equivalence and Entailment. Ann Arbor, MI: Jun 2005.
We present first results using paraphrase as well as textual entailment data to test the language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. In machine translation and alignment, the ITG Hypothesis provides a strong inductive bias, and has been shown empirically across numerous language pairs and corpora to yield both efficiency and accuracy gains for various language acquisition tasks. Monolingual paraphrase and textual entailment recognition datasets, however, potentially facilitate closer tests of certain aspects of the hypothesis than bilingual parallel corpora, which simultaneously exhibit many irrelevant dimensions of cross-lingual variation. We investigate this using simple generic Bracketing ITGs containing no language-specific linguistic knowledge. Experimental results on the MSR Paraphrase Corpus show that, even in the absence of any thesaurus to accommodate lexical variation between the paraphrases, an uninterpolated average precision of at least 76% is obtainable from the Bracketing ITG's structure matching bias alone. This is consistent with experimental results on the Pascal Recognising Textual Entailment Challenge Corpus, which show surprisingly strong results for a number of the task subsets.
- Dekai WU. "Textual
Entailment Recognition Based on Inversion Transduction Grammars".
Pattern Analysis, Statistical Modelling and Computational Learning
(PASCAL Challenges Workshop - Recognising Textual Entailment
Challenge). Southampton, UK: Apr 2005.
Also in Joaquin QUIÑONERO CANDELA, Ido DAGAN, Bernardo MAGNINI, and Florence d'ALCHÉ-BUC (editors), Machine Learning Challenges, Lecture Notes in Computer Science 3944, MLCW 2005, 2006. Heidelberg: Springer-Verlag.
The PASCAL Challenge's textual entailment recognition (RTE) task presents intriguing opportunities to test various implications of the strong language universal constraint posited by Wu's (1995, 1997) Inversion Transduction Grammar (ITG) hypothesis. The ITG Hypothesis provides a strong inductive bias, and has been repeatedly shown empirically to yield both efficiency and accuracy gains for numerous language acquisition tasks. Since the RTE challenge abstracts over many tasks, it invites meaningful analysis of the ITG Hypothesis across tasks including information retrieval, comparable documents, reading comprehension, question answering, information extraction, machine translation, and paraphrase acquisition. We investigate two new models for the RTE problem that employ simple generic Bracketing ITGs. Experimental results show that, even in the absence of any thesaurus to accommodate lexical variation between the Text and the Hypothesis strings, surprisingly strong results for a number of the task subsets are obtainable from the Bracketing ITG's structure matching bias alone.
- Pascale FUNG, LIU Yi, YANG Yongsheng, Yihai SHEN, and Dekai WU.
"A
Grammar-Based Chinese to English Speech Translation System for Portable
Devices". 8th International Conference on Spoken Language
Processing (INTERSPEECH 2004 - ICSLP). Jeju, South Korea: Oct 2004.
Portable devices such as PDA phones and smart phones are increasingly popular. Many of these devices already have voice dialing capability. The next step is to offer more powerful personal-assistant features such as speech translation. In this paper, we propose a system that can translate speech commands in Chinese into English, in real-time, on small, portable devices with limited memory and computational power. We address the various computational and platform issues of speech recognition and translation on portable devices. We propose fixed-point computation, discrete front-end speech features, bi-phone acoustic models, grammar-based speech decoding, and unambiguous inversion transduction grammars for transfer-based translation. As a result, our speech translation system requires only 500 KB of memory and a 200 MHz CPU.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Why Nitpicking
Works: Evidence for Occam's Razor in Error Correctors". 20th
International Conference on Computational Linguistics (COLING-2004).
Geneva: Aug 2004.
Empirical experience and observations have shown that when powerful and highly tunable classifiers such as maximum entropy classifiers, boosting, and SVMs are applied to language processing tasks, it is possible to achieve high accuracies, but eventually their performance tends to plateau at around the same point. To further improve performance, various error correction mechanisms have been developed, but in practice, most of them cannot be relied on to predictably improve performance on unseen data; indeed, depending upon the test set, they are as likely to degrade accuracy as to improve it. This problem is especially severe if the base classifier has already been finely tuned. In recent work, we introduced N-fold Templated Piped Correction, or NTPC (``nitpick''), an intriguing error corrector that is designed to work in these extreme operating conditions. Despite its simplicity, it consistently and robustly improves the accuracy of existing highly accurate base models. This paper investigates some of the more surprising claims made by NTPC, and presents experiments supporting an Occam's Razor argument that more complex models are damaging or unnecessary in practice.
- Weifeng SU, Marine CARPUAT, and Dekai WU. "Semi-Supervised
Training of a Kernel PCA-Based Model for Word Sense Disambiguation".
20th International Conference on Computational Linguistics
(COLING-2004). Geneva: Aug 2004.
In this paper, we introduce a new semi-supervised learning model for word sense disambiguation based on Kernel Principal Component Analysis (KPCA), with experiments showing that it can further improve accuracy over supervised KPCA models that have achieved WSD accuracy superior to the best published individual models. Although empirical results with supervised KPCA models demonstrate significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models on Senseval-2 data, we identify specific sparse data conditions under which supervised KPCA models deteriorate to essentially a most-frequent-sense predictor. We discuss the potential of KPCA for leveraging unannotated data for partially-unsupervised training to address these issues, leading to a composite model that combines both the supervised and semi-supervised models.
- Dekai WU, Weifeng SU, and Marine CARPUAT. "A Kernel PCA Method for
Superior Word Sense Disambiguation". 42nd Annual Meeting of the
Association for Computational Linguistics (ACL-2004). Barcelona: Jul
2004.
We introduce a new method for disambiguating word senses that exploits a nonlinear Kernel Principal Component Analysis (KPCA) technique to achieve accuracy superior to the best published individual models. We present empirical results demonstrating significantly better accuracy compared to the state-of-the-art achieved by either naive Bayes or maximum entropy models, on Senseval-2 data. We also contrast against another type of kernel method, the support vector machine (SVM) model, and show that our KPCA-based model outperforms the SVM-based model. It is hoped that these highly encouraging first results on KPCA for natural language processing tasks will inspire further development in these directions.
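A minimal sketch of the KPCA-based WSD idea using scikit-learn (the bag-of-words features, RBF kernel, toy data, and logistic-regression final classifier are illustrative assumptions, not the paper's exact setup):

    from sklearn.decomposition import KernelPCA
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy sense-annotated contexts for the ambiguous word "bank".
    contexts = ["deposit money in the bank", "sat on the river bank",
                "the bank approved the loan", "fishing from the bank"]
    senses = ["FINANCE", "RIVER", "FINANCE", "RIVER"]

    model = make_pipeline(
        CountVectorizer(),                        # bag-of-words context features
        KernelPCA(n_components=2, kernel="rbf"),  # nonlinear feature extraction
        LogisticRegression(),                     # sense classifier in KPCA space
    )
    model.fit(contexts, senses)
    print(model.predict(["loan from the bank"]))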
- Dekai WU and Yihai SHEN. "An Efficient
Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR
Parsing". ACL-2004 Workshop on Incremental Parsing: Bringing
Engineering and Cognition Together. Barcelona: Jul 2004.
We define a new learning task, minimum average lookahead grammar induction, with strong potential implications for incremental parsing in NLP and cognitive models. Our thesis is that a suitable learning bias for grammar induction is to minimize the degree of lookahead required, on the underlying tenet that language evolution drove grammars to be efficiently parsable in incremental fashion. The input to the task is an unannotated corpus, plus a nondeterministic constraining grammar that serves as an abstract model of environmental constraints confirming or rejecting potential parses. The constraining grammar typically allows ambiguity and is itself poorly suited for an incremental parsing model, since it gives rise to a high degree of nondeterminism in parsing. The learning task, then, is to induce a deterministic LR(k) grammar under which it is possible to incrementally construct one of the correct parses for each sentence in the corpus, such that the average degree of lookahead needed to do so is minimized. This is a significantly more difficult optimization problem than merely compiling LR(k) grammars, since k is not specified in advance. Clearly, naive approaches to this optimization can easily be computationally infeasible. However, by making combined use of GLR ancestor tables and incremental LR table construction methods, we obtain an O(n^(3+2m)) greedy approximation algorithm for this task that is quite efficient in practice.
- Marine CARPUAT, Weifeng SU, and Dekai WU. "Augmenting Ensemble
Classification for Word Sense Disambiguation with a Kernel PCA
Model". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
The HKUST word sense disambiguation systems benefit from a new nonlinear Kernel Principal Component Analysis (KPCA) based disambiguation technique. We discuss and analyze results from the Senseval-3 English, Chinese, and Multilingual Lexical Sample data sets. Among an ensemble of four different kinds of voted models, the KPCA-based model, along with the maximum entropy model, outperforms the boosting model and naive Bayes model. Interestingly, while the KPCA-based model typically achieves close or better accuracy than the maximum entropy model, nevertheless a comparison of predicted classifications shows that it has a significantly different bias. This characteristic makes it an excellent voter, as confirmed by results showing that removing the KPCA-based model from the ensemble generally degrades performance.
- Grace NGAI, Dekai WU, Marine CARPUAT, Chi-Shing WANG, and
Chi-Yung WANG. "Semantic Role
Labeling with Boosting, SVMs, Maximum Entropy, SNOW, and Decision
Lists". Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the HKPolyU-HKUST systems which were entered into the Semantic Role Labeling task in Senseval-3. Results show that these systems, which are based upon common machine learning algorithms, all manage to achieve good performance on the non-restricted Semantic Role Labeling task.
- Richard WICENTOWSKI, Grace NGAI, Dekai WU, Marine CARPUAT, Emily
THOMFORDE, and Adrian PACKEL. "Joining
forces to resolve lexical ambiguity: East meets West in Barcelona".
Third International Workshop on the Evaluation of Systems for the
Semantic Analysis of Text (Senseval-3). ACL-2004 Workshop.
Barcelona: Jul 2004.
This paper describes the component models and combination model built as a joint effort between Swarthmore College, Hong Kong Poly U, and HKUST. Though other models described elsewhere contributed to the final combination model, this paper focuses solely on the joint contributions to the ``Swat-HK'' effort.
- Dekai WU, Grace NGAI, and Marine CARPUAT. "Raising the Bar:
Stacked Conservative Error Correction Beyond Boosting". Fourth
International Conference on Language Resources and Evaluation
(LREC-2004). Lisbon: May 2004.
We introduce a conservative error correcting model, Stacked TBL, that is designed to improve the performance of even high-performing models like boosting, with little risk of accidentally degrading performance. Stacked TBL is particularly well suited for corpus-based natural language applications involving high-dimensional feature spaces, since it leverages the characteristics of the TBL paradigm that we appropriate. We consider here the task of automatically annotating named entities in text corpora. The task does pose a number of challenges for TBL, to which there are some simple yet effective solutions. We discuss the empirical behavior of Stacked TBL, and consider evidence that despite its simplicity, more complex and time-consuming variants are not generally required.
- Lufeng ZHAI, Pascale FUNG, Richard SCHWARTZ, Marine CARPUAT and
Dekai WU. "Using
N-best Lists for Named Entity Recognition from Chinese Speech".
Human Language Technology Conference of the North American Chapter of
the Association for Computational Linguistics (HLT/NAACL-2004).
Boston: May 2004.
We present the first known result for named entity recognition (NER) in realistic large-vocabulary spoken Chinese. We establish this result by applying a maximum entropy model, currently the single best known approach for textual Chinese NER, to the recognition output of the BBN LVCSR system on Chinese Broadcast News utterances. Our results support the claim that transferring NER approaches from text to spoken language is a significantly more difficult task for Chinese than for English. We propose re-segmenting the ASR hypotheses as well as applying post-classification to improve the performance. Finally, we introduce a method of using n-best hypotheses that yields a small but nevertheless useful improvement in NER accuracy. We use acoustic, phonetic, language model, NER, and other scores as confidence measures. Experimental results show an average of 6.7% relative improvement in precision and 1.7% relative improvement in F-measure.
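A toy sketch of the n-best voting idea (the NER model, the confidence function, and the majority threshold are hypothetical placeholders; the paper combines acoustic, phonetic, language model, and NER scores):

    from collections import Counter

    def nbest_ner(hypotheses, ner, confidence):
        # hypotheses: list of ASR transcription strings (the n-best list)
        # ner: function mapping a string to a set of (entity, label) pairs
        # confidence: function mapping a hypothesis to a float weight
        votes = Counter()
        for hyp in hypotheses:
            w = confidence(hyp)
            for ent in ner(hyp):
                votes[ent] += w
        total = sum(confidence(h) for h in hypotheses)
        # Keep entities supported by a weighted majority of the n-best list.
        return {ent for ent, v in votes.items() if v > total / 2}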
- Dekai WU, Grace NGAI, and Marine CARPUAT. "N-fold Templated
Piped Correction". First International Joint Conference on
Natural Language Processing (IJCNLP-2004). Hainan, China: Mar 2004.
We describe a broadly-applicable conservative error correcting model, N-fold Templated Piped Correction (NTPC), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches actually reduce accuracy more than they improve it, NTPC nevertheless comes with little risk of accidentally degrading performance. NTPC is particularly well suited for natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kind of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models like boosting. We also give evidence that the various extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.
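A toy sketch of the conservative criterion at NTPC's core (the data shapes and the single-feature templates are invented for illustration, not the paper's actual template language):

    def learn_ntpc_rules(folds):
        # folds: list of folds; each fold is a list of (features, pred, gold)
        #        where features is a dict of feature-name -> value.
        candidates = set()
        for fold in folds:
            for feats, pred, gold in fold:
                if pred != gold:                      # base model error
                    for name, value in feats.items():
                        candidates.add((name, value, gold))
        accepted = []
        for name, value, new_label in candidates:
            # Keep a rule only if it never flips a correct base prediction
            # anywhere in the n folds -- the conservative criterion.
            safe = all(not (f.get(name) == value and p == g and new_label != g)
                       for fold in folds for f, p, g in fold)
            if safe:
                accepted.append((name, value, new_label))
        return accepted

    def apply_rules(feats, pred, rules):
        for name, value, new_label in rules:
            if feats.get(name) == value:
                return new_label                      # piped: first matching rule fires
        return pred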
- Dekai WU. "The HKUST leading
question translation system". Machine Translation Summit IX. New Orleans:
Sep 2003.
Slides from the panel "Have we found the Holy Grail?" (with Ed Hovy, Elliot Macklovitch (chair), Hermann Ney, Steve Richardson, and Dekai Wu).
- Dekai WU, Grace NGAI, and Marine CARPUAT. "A stacked, voted,
stacked model for named entity recognition". Computational
Natural Language Learning (CoNLL-2003), at Human Language
Technology Conference of the North American Chapter of the Association of
Computational Linguistics (HLT/NAACL-2003). Edmonton, Canada: May
2003.
This paper investigates stacking and voting methods for combining strong classifiers like boosting, SVM, and TBL, on the named-entity recognition task. We demonstrate several effective approaches, culminating in a model that achieves error rate reductions on the development and test sets of 63.6% and 55.0% (English) and 47.0% and 51.7% (German) over the CoNLL-2003 standard baseline respectively, and 19.7% over a strong AdaBoost baseline model from CoNLL-2002.
- Dekai WU, Grace NGAI, Marine CARPUAT, Jeppe LARSEN, and Yongshen
YANG. "Boosting for named
entity recognition". Computational Natural Language Learning
(CoNLL-2002), at 19th International Conference on Computational
Linguistics (Coling-2002), 195-198. Taipei: Sep 2002.
This paper presents a system that applies boosting to the task of named-entity identification. The CoNLL-2002 shared task, for which the system is designed, is language-independent named-entity recognition. Using a set of features which are easily obtainable for almost any language, the presented system uses boosting to combine a set of weak classifiers into a final system that performs significantly better than an off-the-shelf maximum entropy classifier.
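A toy sketch of the approach: simple character-identity features fed to AdaBoost over decision stumps via scikit-learn (the features and the toy data are invented stand-ins for the paper's setup):

    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline

    def char_features(sent, i):
        return {"cur": sent[i],
                "prev": sent[i - 1] if i > 0 else "<s>",
                "next": sent[i + 1] if i < len(sent) - 1 else "</s>"}

    sent = "王小明在北京"
    labels = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
    X = [char_features(sent, i) for i in range(len(sent))]

    # AdaBoost's default weak learner is a depth-1 decision tree (a stump).
    model = make_pipeline(DictVectorizer(),
                          AdaBoostClassifier(n_estimators=50))
    model.fit(X, labels)
    print(model.predict([char_features("北京很大", 0)]))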
- Robert WILENSKY, David N CHIN, Marc LURIA, James MARTIN, James
MAYFIELD, and Dekai WU. "The Berkeley UNIX Consultant Project". In
Stephen J HEGNER, Paul McKEVITT, Peter NORVIG, and Robert WILENSKY
(editors), Intelligent Help Systems for Unix. 49-94. Dordrecht:
Kluwer. ISBN 0-7923-6641-7. May 2001.
Also in Artificial Intelligence Review 14(1-2): 43-88 (2000).
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Bracketing and
aligning words and constituents in parallel text using Stochastic
Inversion Transduction Grammars". In Jean VERONIS (editor),
Parallel Text Processing: Alignment and Use of Translation
Corpora. Dordrecht: Kluwer. ISBN 0-7923-6546-1. Aug 2000.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
- Dekai WU. "Alignment". In Robert DALE, Hermann MOISL, and Harold
SOMERS (editors), Handbook of Natural Language Processing.
415-458. New York: Marcel Dekker. ISBN 0-8247-9000-6. Jul 2000.
In this chapter we discuss the work done on automatic alignment of parallel texts for various purposes. Fundamentally, an alignment algorithm accepts as input a bitext, and produces as output a map that identifies corresponding passages between the texts. A rapidly-growing body of research on bitext alignment, beginning around 1990, attests to the importance of alignment to translators, bilingual lexicographers, adaptive machine translation systems, and even ordinary readers. A wide variety of techniques now exist, ranging from the most simple (counting characters or words) to the more sophisticated, sometimes involving linguistic data (lexicons) which may or may not have been automatically induced themselves. Techniques have been developed for aligning passages of various granularities: paragraphs, sentences, constituents, collocations, and words. Some techniques make use of apparent morphological features. Others rely on cognates and loan-words. Of particular interest is work done on languages which do not have a common writing system. The robustness and generality of different techniques have generated much discussion.
- SUI Zhifang, ZHAO Jun, and Dekai WU. "An
Information-Theory-Based Feature Type Analysis for the Modelling of
Statistical Parsing". ACL-2000. Hong Kong: Oct 2000.
The paper proposes an information-theory-based method for feature type analysis in probabilistic evaluation modelling for statistical parsing. The basic idea is that we use entropy and conditional entropy to measure whether a feature type grasps some of the information for syntactic structure prediction. Our experiment quantitatively analyzes several feature types' power for syntactic structure prediction and draws a series of interesting conclusions.
- Yanlei DIAO, Hongjun LU, and Dekai WU. "A comparative study of
classification based personal e-mail filtering". Fourth
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD
2000): 408-419. Kyoto: Apr 2000.
This paper addresses personal E-mail filtering by casting it in the framework of text classification. Modeled as semi-structured documents, E-mail messages consist of a set of fields with predefined semantics and a number of variable length free-text fields. While most work on classification either concentrates on structured data or free text, the work in this paper deals with both of them. To perform classification, a naive Bayesian classifier and a decision-tree-based classifier were designed and implemented. The design considerations and implementation issues are discussed. Using a relatively large amount of real personal E-mail data, a comprehensive comparative study was conducted using the two classifiers. The importance of different features is reported. Results of other issues related to building an effective personal E-mail classifier are presented and discussed. It is shown that both classifiers can perform filtering with reasonable accuracy. While the decision tree based classifier outperforms the Bayesian classifier when features and training size are selected optimally for both, a carefully designed naive Bayesian classifier is more robust.
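A minimal modern sketch of the same setup (the field names, folder labels, and the scikit-learn stand-in classifier are assumptions, not the paper's implementation):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def flatten(msg):
        # Prefix tokens with their field so "meeting" in the subject and in
        # the body remain distinct features, preserving field semantics.
        return " ".join(f"{field}:{tok}"
                        for field, text in msg.items()
                        for tok in text.lower().split())

    inbox = [({"subject": "weekly meeting", "body": "agenda attached"}, "work"),
             ({"subject": "cheap pills", "body": "click here now"}, "junk")]

    # analyzer=str.split keeps the "field:token" features intact.
    model = make_pipeline(CountVectorizer(analyzer=str.split), MultinomialNB())
    model.fit([flatten(m) for m, _ in inbox], [label for _, label in inbox])
    print(model.predict([flatten({"subject": "meeting moved", "body": "see agenda"})]))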
- Dekai WU, SUI Zhifang, and ZHAO Jun. "An information-based method for
selecting feature types for word prediction". Sixth European
Conference on Speech Communication and Technology (EUROSPEECH'99).
Budapest: Sep 1999.
This paper uses an information-based approach to conduct feature type selection for language modeling in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure through analyzing an English treebank corpus and taking word prediction as the object. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide reliable reference for feature type selection in language modeling.
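The selection criterion can be made concrete with a short sketch: the gain of a feature type F for predicting the next word W is H(W) - H(W|F), estimated from (word, feature-value) observations (the toy observations below are invented):

    import math
    from collections import Counter

    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())

    def information_gain(pairs):
        # pairs: list of (word, feature_value) observations from a corpus
        h_w = entropy(Counter(w for w, _ in pairs))
        by_f = {}
        for w, f in pairs:
            by_f.setdefault(f, Counter())[w] += 1
        n = len(pairs)
        # H(W|F) = sum over f of P(f) * H(W | F = f)
        h_w_given_f = sum(sum(c.values()) / n * entropy(c) for c in by_f.values())
        return h_w - h_w_given_f

    pairs = [("bank", "DET"), ("bank", "DET"), ("run", "PRON"), ("run", "DET")]
    print(information_gain(pairs))  # how much this feature type tells us about the word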
- Aboy WONG and Dekai WU. "Learning a lightweight robust
deterministic parser". Sixth European Conference on Speech
Communication and Technology (EUROSPEECH'99). Budapest: Sep 1999.
We describe a method for automatically learning a parser from labeled, bracketed corpora that results in a fast, robust, lightweight parser that is suitable for real-time dialog systems and similar applications. Unlike ordinary parsers, all grammatical knowledge is captured in the learned decision trees, so no explicit phrase-structure grammar is needed. Another characteristic of the architecture is robustness, since the input need not fit pre-specified productions. The runtime architecture is very slim and references two learned decision trees that allow the parser to operate in a "strictly deterministic" manner in Marcus' (1977) sense. Even without using specific lexical features, we have achieved respectable labeled bracket accuracies of about 81% precision and 82% recall. Processing speed is more than 500 words per CPU second. We keep the parameter space small (in comparison to other statistically learned parsers) by using only part-of-speech tags and constituent labels as features for learning the decision trees. Without any optimization, the decision trees consume only 6M of memory, making it possible to run on platforms with limited memory. The learning method is readily applicable to other languages. Preliminary experiments on a Chinese corpus (which contains about 3000 sentences from Chinese primary school text) have yielded results comparable to those for English.
- Vincent CHOW and Dekai WU. "On the use of right context in
sense-disambiguating language models". Sixth European Conference
on Speech Communication and Technology (EUROSPEECH'99). Budapest:
Sep 1999.
We investigate the utility of right-context (look-ahead information) in incremental left-to-right language models with word sense disambiguation, and discover somewhat unexpectedly that using right-context in addition to left-context (history) may actually reduce accuracy. We describe a left-to-right incremental naive-Bayes sense disambiguator, and then experimentally evaluate three apparently well-motivated extensions to take into account right-context information. The results argue that the contribution of right-context is limited, and that using it would probably necessitate sacrificing pure left-to-right processing.
- Shuwu ZHANG, Harald SINGER, Dekai WU, Yoshinori SAGISAKA. "Improving n-gram modeling using
distance-related unit association maximum entropy language modeling".
Sixth European Conference on Speech Communication and Technology
(EUROSPEECH'99). Budapest: Sep 1999.
In this paper, a distance-related unit association maximum entropy (DUAME) language model is proposed. This approach can model an event (unit subsequence) using the co-occurrence of full distance unit association (UA) features so that it is able to pursue a functional approximation to higher order N-gram with significantly less memory requirement. A smoothing strategy related to this model will also be discussed. Preliminary experimental results have shown that DUAME modeling is comparable to conventional N-gram modeling in perplexity.
- Daniel CHAN Ka-Leung and Dekai WU. "Automatically merging
lexicons that have incompatible part-of-speech categories". Joint
SIGDAT Conference on Empirical Methods in Natural Language Processing and
Very Large Corpora (EMNLP/VLC-99). Maryland: Jun 1999.
We present a new method to automatically merge lexicons that employ different incompatible POS categories. Such incompatibilities have hindered efforts to combine lexicons to maximize coverage with reasonable human effort. Given an "original lexicon", our method is able to merge lexemes from an "additional lexicon" into the original lexicon, converting lexemes from the additional lexicon with about 89% precision. This level of precision is achieved with the aid of a device we introduce called an anti-lexicon, which neatly summarizes all the essential information we need about the co-occurrence of tags and lemmas. Our model is intuitive, fast, easy to implement, and requires neither heavy computational resources nor a training corpus.
- Dekai WU, ZHAO Jun, and SUI Zhifang. "An
information-theoretic empirical analysis of dependency-based feature
types for word prediction models". Joint SIGDAT Conference on
Empirical Methods in Natural Language Processing and Very Large Corpora
(EMNLP/VLC-99). Maryland: Jun 1999.
Over the years, many proposals have been made to incorporate assorted types of feature in language models. However, discrepancies between training sets, evaluation criteria, algorithms, and hardware environments make it difficult to compare the models objectively. In this paper, we take an information theoretic approach to select feature types in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure, using a Chinese treebank and taking word prediction as the object. The experiments yield several conclusions on the predictive value of several feature types and feature type combinations for word prediction, which are expected to provide guidelines for feature type selection in language modeling.
- Aboy WONG and Dekai WU. "Are phrase structured
grammars useful in statistical parsing?". NLPRS 1999.
Beijing: Nov 1999.
In this paper, we argue that: (1) to parse accurately, a grammar is not necessary; and (2) it is possible to parse deterministically without conforming to an explicit grammar. We support these claims by presenting our parser, which is lightweight, grammar-less, deterministic, and has the highest accuracy among tag-based parsers. The speed of our parser is more than 500 words per CPU second and only 6M of memory is needed for loading the parsing model. In our architecture, the grammatical information is captured by the parsing model. Our parsing model differs from others in that extra information about how to group constituents is provided. Thus an explicit grammar is not needed in our algorithm.
- Michael CARL and Dekai WU. "Inferring maximally
invertible bi-grammars for example-based machine translation".
NLPRS 1999. Beijing: Nov 1999.
This paper discusses inference strategies of context-free bi-grammars for example-based machine translation (EBMT). The EBMT system EDGAR is discussed in detail. The notion of an invertible context-free feature bi-grammar is introduced in order to provide a means to decide upon the degree of ambiguity of the inferred bi-grammar. It is claimed that a maximally invertible bi-grammar can enhance the precision of the bilingual alignment process, reduce the complexity of the inferred grammar, and uncover inconsistencies in bi-corpora. This paper describes preliminary reflections, and thus no empirical evaluation of the method is provided.
- Dekai WU and Hongsing WONG. "Machine translation with a
stochastic grammatical channel". COLING-ACL'98. Montreal:
Aug 1998.
We introduce a stochastic grammatical channel model for machine translation, that synthesizes several desirable characteristics of both statistical and grammatical machine translation. As with the pure statistical translation model described by Wu (1996) (in which a bracketing transduction grammar models the channel), alternative hypotheses compete probabilistically, exhaustive search of the translation hypothesis space can be performed in polynomial time, and robustness heuristics arise naturally from a language-independent inversion-transduction model. However, unlike pure statistical translation models, the generated output string is guaranteed to conform to a given target grammar. The model employs only (1) a translation lexicon, (2) a context-free grammar for the target language, and (3) a bigram language model. The fact that no explicit bilingual translation rules are used makes the model easily portable to a variety of source languages. Initial experiments show that it also achieves significant speed gains over our earlier model.
- Dekai WU. "A position statement on Chinese segmentation". Presented at the Chinese Language Processing Workshop, University of Pennsylvania, Philadelphia, Jul 1998.
- Dekai WU. "Stochastic inversion
transduction grammars and bilingual parsing of parallel corpora".
Computational Linguistics 23(3):377-404, Sep 1997.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well-suited to model ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
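To make the formalism concrete, here is a toy stochastic ITG that generates bilingual sentence pairs, with each branching rule marked straight (same order in both languages) or inverted (reversed in language 2); the grammar, probabilities, and lexicon are invented for illustration:

    import random

    RULES = {
        "S": [(0.6, "straight", ["NP", "VP"]),
              (0.4, "inverted", ["NP", "VP"])],
        "NP": [(1.0, "lex", ("the cat", "le chat"))],
        "VP": [(1.0, "lex", ("sleeps", "dort"))],
    }

    def generate(sym):
        # Return a (lang1, lang2) pair of token lists derived from sym.
        p, kind, rhs = random.choices(RULES[sym],
                                      weights=[r[0] for r in RULES[sym]])[0]
        if kind == "lex":
            return rhs[0].split(), rhs[1].split()
        parts = [generate(child) for child in rhs]
        l1 = [tok for a, _ in parts for tok in a]
        ordered = parts if kind == "straight" else list(reversed(parts))
        l2 = [tok for _, b in ordered for tok in b]
        return l1, l2

    print(generate("S"))  # e.g. (['the', 'cat', 'sleeps'], ['dort', 'le', 'chat'])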
- Ciprian CHELBA, David ENGLE, Frederick JELINEK, Victor JIMENEZ, Sanjeev
KHUDANPUR, Lidia MANGU, Harry PRINTZ, Eric RISTAD, Ronald ROSENFELD,
Andreas STOLCKE, and Dekai WU. "Structure and performance of a
dependency language model". EUROSPEECH'97. Rhodes, Greece:
Sep 1997.
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar. Such a grammar expresses the relations between words by a directed graph. Because the edges of this graph may connect words that are arbitrarily far apart in a sentence, this technique can incorporate the predictive power of words that lie outside of bigram or trigram range. We have built several simple dependency models, as we call them, and tested them in a speech recognition experiment. We report experimental results for these models here, including one that has a small but statistically significant advantage (p < .02) over a bigram language model.
- Pascale FUNG, Bertram SHI, Dekai WU, LAM Wai Bun, and WONG Shuen
Kong. "Dealing with
multilinguality in a spoken language query translator".
ACL/EACL-97 Workshop on Spoken Language Translation. Madrid: Jul
1997.
Robustness is an important issue for multilingual speech interfaces for spoken language translation systems. We have studied three aspects of robustness in such a system: accent differences, mixed language input, and the use of common feature sets for HMM-based speech recognizers for English and Cantonese. The results of our preliminary experiments show that accent differences cause recognizer performance to degrade. A rather surprising finding is that for mixed language input, a straightforward implementation of a mixed language model-based speech recognizer performs less well than the concatenation of pure language recognizers. Our experimental results also show that a common feature set, parameter set, and common algorithm lead to different performance output for Cantonese and English speech recognition modules.
- Dekai WU. "A
polynomial-time algorithm for statistical machine translation".
ACL-96: 34th Annual Meeting of the Assoc. for Computational
Linguistics. Santa Cruz, CA: Jun. 1996.
We introduce a polynomial-time algorithm for statistical machine translation. This algorithm can be used in place of the expensive, slow best-first search strategies in current statistical translation architectures. The approach employs the stochastic bracketing transduction grammar (SBTG) model we recently introduced to replace earlier word alignment channel models, while retaining a bigram language model. The new algorithm in our experience yields major speed improvement with no significant loss of accuracy.
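A minimal sketch of the flavor of such a dynamic program appears below: it decodes with a toy bracketing transduction grammar plus a bigram language model, keeping one best hypothesis per pair of target boundary words for each source span. The lexicon, rule weights, and the word-for-word assumption (no insertions or deletions) are simplifications for illustration, not the published algorithm.

```python
# A minimal sketch of polynomial-time SBTG-style decoding with a bigram LM.
import math
from collections import defaultdict

def sbtg_decode(src, lex, bigram, p_straight=0.6, p_inverted=0.4):
    """For each source span, keep the best hypothesis per pair of target
    boundary words; combining two spans then needs only one bigram score
    across the seam, which is what keeps the search polynomial."""
    n = len(src)
    best = {}
    for i in range(n):                              # base case: one source word
        best[(i, i + 1)] = {(t, t): (math.log(pr), [t])
                            for (s, t), pr in lex.items() if s == src[i]}

    def combine(a, b, p_rule, store):
        for (af, al), (ap, aw) in a.items():
            for (bf, bl), (bp, bw) in b.items():
                sc = ap + bp + math.log(p_rule) + math.log(bigram[(al, bf)])
                if (af, bl) not in store or sc > store[(af, bl)][0]:
                    store[(af, bl)] = (sc, aw + bw)

    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j, store = i + span, {}
            for k in range(i + 1, j):
                combine(best[(i, k)], best[(k, j)], p_straight, store)  # [A A]
                combine(best[(k, j)], best[(i, k)], p_inverted, store)  # <A A>
            best[(i, j)] = store

    return max(best[(0, n)].values())

# Toy usage: the inverted rule lets "hua bai" come out as "white flowers".
lex = {("hua", "flowers"): 0.8, ("bai", "white"): 0.9}
bigram = defaultdict(lambda: 0.1, {("white", "flowers"): 0.5})
print(sbtg_decode(["hua", "bai"], lex, bigram))
```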
- Xuanyin XIA and Dekai WU. "Parsing Chinese with an almost-context-free grammar". EMNLP-96, Conference on Empirical Methods in Natural Language Processing. Philadelphia: May 1996.
We describe a novel parsing strategy we are employing for Chinese. We believe progress in Chinese parsing technology has been slowed by the excessive ambiguity that typically arises in pure context-free grammars. This problem has inspired a modified formalism that enhances our ability to write and maintain robust large grammars, by constraining productions with left/right contexts and/or nonterminal functions. Parsing is somewhat more expensive than for pure context-free parsing, but is still efficient by both theoretical and empirical analyses. Encouraging experimental results with our current grammar are described.
- Dekai WU and Xuanyin XIA. "Large-scale automatic extraction of
an English-Chinese lexicon". Machine Translation
9(3-4): 285-313. 1995.
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is about 6,500 English words, achieving translation precision in the 86-96% range, with alignment proceeding at paragraph, sentence, and word levels.
Specifically, we report (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus, (2) experiments supporting the usefulness of restricted lexical cues for statistical paragraph and sentence alignment, and (3) experiments that question the role of hand-derived monolingual lexicons for automatic word translation acquisition.
Using a hand-derived monolingual lexicon, the learned translation lexicon averages 2.33 Chinese translations per English entry, with a manually-filtered precision of 95.1%, and an automatically-filtered weighted precision of 86.0%. We then introduce a fully automatic two-stage statistical methodology that is able to learn translations for collocations. A statistically-learned monolingual Chinese lexicon is first used to segment the Chinese text, before applying bilingual training to produce 6,429 English entries with 2.25 Chinese translations per entry. This method improves the manually-filtered precision to 96.0% and the automatically-filtered weighted precision to 91.0%, an error rate reduction of 35.7% from using a hand-derived monolingual lexicon.
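For readers unfamiliar with this style of bilingual training, here is a minimal EM sketch in the spirit of the approach, closer to IBM Model 1 than to the paper's exact two-stage procedure; the toy bitext and the flat initialization are assumptions for illustration only.

```python
# A minimal EM sketch for learning word-translation probabilities t(c|e)
# from a sentence-aligned parallel corpus.  Model-1 style, no NULL word.
from collections import defaultdict

def learn_lexicon(bitext, iterations=10):
    t = defaultdict(lambda: 1.0)                    # flat initialization
    for _ in range(iterations):
        count = defaultdict(float)                  # expected pair counts
        total = defaultdict(float)
        for e_sent, c_sent in bitext:
            for c in c_sent:
                z = sum(t[(e, c)] for e in e_sent)  # normalize links to this c
                for e in e_sent:
                    count[(e, c)] += t[(e, c)] / z
                    total[e] += t[(e, c)] / z
        for (e, c) in count:                        # M-step: renormalize per e
            t[(e, c)] = count[(e, c)] / total[e]
    return t

# Toy usage: co-occurrence statistics pull apart the competing pairings.
bitext = [(["white", "flowers"], ["bai", "hua"]),
          (["white", "snow"], ["bai", "xue"])]
t = learn_lexicon(bitext)
print(round(t[("white", "bai")], 3))
```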
- Dekai WU. "Stochastic
inversion transduction grammars, with application to segmentation,
bracketing, and alignment of parallel corpora". IJCAI-95: 14th
Intl. Joint Conf. on Artificial Intelligence, 1328-1335. Montreal:
Aug 1995.
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with potential application to a variety of parallel corpus analysis problems. The formalism combines three tactics against the constraints that render finite-state transducers less useful: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual parsing algorithm. A convenient normal form is shown to exist, and we discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks.
- Dekai WU. "An algorithm
for simultaneously bracketing parallel texts by aligning words".
ACL-95: 33rd Annual Meeting of the Assoc. for Computational
Linguistics, 244-251. Cambridge, MA: Jun 1995.
We describe a grammarless method for simultaneously bracketing both halves of a parallel text and giving word alignments, assuming only a translation lexicon for the language pair. We introduce inversion-invariant transduction grammars which serve as generative models for parallel bilingual sentences with weak order constraints. Focusing on transduction grammars for bracketing, we formulate a normal form, and a stochastic version amenable to a maximum-likelihood bracketing algorithm. Several extensions and experiments are discussed.
- Dekai WU. "Trainable
coarse bilingual grammars for parallel text bracketing". WVLC-3:
3rd Annual Workshop on Very Large Corpora, 69-82. Cambridge, MA: Jun
1995.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe two new strategies for automatic bracketing of parallel corpora, with particular application to languages where prior grammar resources are scarce: (1) coarse bilingual grammars, and (2) unsupervised training of such grammars via EM (expectation-maximization). Both methods build upon a formalism we recently introduced called stochastic inversion transduction grammars. The first approach borrows a coarse monolingual grammar into our bilingual formalism, in order to transfer knowledge of one language's constraints to the task of bracketing the texts in both languages. The second approach generalizes the inside-outside algorithm to adjust the grammar parameters so as to improve the likelihood of a training corpus. Preliminary experiments on parallel English-Chinese text are supportive of these strategies.
- Dekai WU. "Grammarless
extraction of phrasal translation examples from parallel texts".
TMI-95, Sixth International Conference on Theoretical and
Methodological Issues in Machine Translation, v2, 354-372. Leuven,
Belgium: Jul 1995.
We describe a method for identifying subsentential phrasal translation examples in sentence-aligned parallel corpora, using only a probabilistic translation lexicon for the language pair. Our method differs from previous approaches in that (1) it is founded on a formal basis, making use of an inversion transduction grammar (ITG) formalism that we recently developed for bilingual language modeling, and (2) it requires no language-specific monolingual grammars for the source and target languages. Instead, we devise a generic, language-independent constituent-matching ITG with inherent expressiveness properties that correspond to a desirable level of matching flexibility. Bilingual parsing, in conjunction with a stochastic version of the ITG formalism, performs the phrasal translation extraction.
- Dekai WU and Cindy NG. "Using brackets to improve search for statistical machine translation". PACLIC-10, 10th Pacific Asia Conference on Language, Information and Computation. Hong Kong: Dec 1995.
We propose a method to improve search time and space complexity in statistical machine translation architectures, by employing linguistic bracketing information on the source language sentence. It is one of the advantages of the probabilistic formulation that competing translations may be compared and ranked by a principled measure, but at the same time, optimizing likelihoods over the translation space dictates heavy search costs. To make statistical architectures practical, heuristics to reduce search computation must be incorporated. An experiment applying our method to a prototype Chinese-English translation system demonstrates substantial improvement.
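The core pruning test can be stated in a few lines. Below is a hedged sketch of one natural formulation (the bracket representation and its integration into a decoder are assumptions, not taken from the paper): a chart cell is expanded only if its source span crosses no given bracket.

```python
# A minimal sketch of bracket-based pruning for translation search.
def respects_brackets(i, j, brackets):
    """A source span [i, j) is kept iff it crosses no bracket (b0, b1):
    partial overlap is what invalidates a span; nesting is fine."""
    return not any(i < b0 < j < b1 or b0 < i < b1 < j for (b0, b1) in brackets)

# With the bracket (0, 2) over a 3-word sentence, the span (1, 3) crosses
# the bracket and would be pruned from the translation search space:
print(respects_brackets(1, 3, [(0, 2)]))   # False -> pruned
print(respects_brackets(0, 2, [(0, 2)]))   # True  -> kept
```

Filtering chart cells this way shrinks the hypothesis space the dynamic program must explore, which is the source of the reported time and space savings.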
- Pascale FUNG and Dekai WU. "Coerced Markov Models for
cross-lingual lexical tag relations". TMI-95, Sixth International
Conference on Theoretical and Methodological Issues in Machine
Translation, v1, 240-255. Leuven, Belgium: Jul 1995.
We introduce the Coerced Markov Model (CMM) to model the relationship between the lexical sequence of a source language and the tag sequence of a target language, with the objective of constraining search in statistical transfer-based machine translation systems. CMMs differ from standard hidden Markov models in that state sequence assignments can take on values coerced from external sources. Given a Chinese sentence, a CMM can be used to predict the corresponding English tag sequence, thus constraining the English lexical sequence produced by a translation model. The CMM can also be used to score competing translation hypotheses in N-best models. Three fundamental problems for CMM design are discussed; their solutions lead to the training and testing stages of CMM.
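The abstract does not spell out the machinery, but one plausible reading of the coercion idea can be sketched as Viterbi decoding in which externally supplied state assignments are clamped; everything below (the model tables, the clamping interface, the toy data) is an illustrative assumption rather than the paper's definition.

```python
# A hedged sketch: Viterbi decoding where some positions' states are
# clamped ("coerced") to externally supplied values.
import math

def coerced_viterbi(tags, trans, emit, obs, coerced):
    """Score of the best tag sequence for `obs`, with positions listed in
    `coerced` (a {position: tag} dict) forced to the external assignment."""
    allowed = lambda i: [coerced[i]] if i in coerced else tags
    best = {t: math.log(emit[(t, obs[0])]) for t in allowed(0)}
    for i in range(1, len(obs)):
        best = {t: max(best[s] + math.log(trans[(s, t)]) for s in best)
                   + math.log(emit[(t, obs[i])])
                for t in allowed(i)}
    return max(best.values())

# Toy usage: clamping position 0 to "N" restricts the search.
tags = ["N", "V"]
trans = {(a, b): 0.5 for a in tags for b in tags}
emit = {("N", "book"): 0.6, ("V", "book"): 0.4,
        ("N", "flies"): 0.3, ("V", "flies"): 0.7}
print(coerced_viterbi(tags, trans, emit, ["book", "flies"], coerced={0: "N"}))
```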
- Eva FONG and Dekai WU. "Learning restricted probabilistic link grammars". IJCAI-95 Workshop on New Approaches to Learning for Natural Language Processing. Montreal: Aug 1995.
Also in Stefan WERMTER, Ellen RILOFF, Gabriele SCHELER (editors), Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing, 173-187. 1996. Berlin: Springer-Verlag.
We describe a language model employing a new headed-disjuncts formulation of Lafferty's (1992) probabilistic link grammar, together with (1) an EM training method for estimating the probabilities, and (2) a procedure for learning some simple lexicalized grammar structures. The model in its simplest form is a generalization of n-gram models, but in its general form possesses context-free expressiveness. Unlike the original experiments on probabilistic link grammars, we assume that no hand-coded grammar is initially available (as with n-gram models). We employ untyped links to concentrate the learning on lexical dependencies, and our formulation uses the lexical identities of heads to influence the structure of the parse graph. After learning, the language model consists of grammatical rules in the form of a set of simple disjuncts for each word, plus several sets of probability parameters. The formulation extends cleanly toward learning more powerful context-free grammars. Several issues relating to generalization bias, linguistic constraints, and parameter smoothing are considered. Preliminary experimental results on small artificial corpora are supportive of our approach.
- Dekai WU and Pascale FUNG. "Improving Chinese tokenization with linguistic filters on statistical lexical acquisition". ANLP-94: 4th Conference on Applied Natural Language Processing, 180-181. Stuttgart: Oct 1994.
The first step in Chinese NLP is to tokenize or segment character sequences into words, since the text contains no word delimiters. Recent heavy activity in this area has shown the biggest stumbling block to be words that are absent from the lexicon, since successful tokenizers to date have been based on dictionary lookup (e.g., Chang & Chen 1993, Chiang et al. 1992, Lin et al. 1993, Wu & Tseng 1993, Sproat et al. 1994).
We present empirical evidence for four points concerning tokenization of Chinese text:
(1) More rigorous "blind" evaluation methodology is needed to avoid inflated accuracy measurements; we introduce the nk-blind method.
(2) The extent of the unknown-word problem is far more serious than generally thought when tokenizing unrestricted texts in realistic domains.
(3) Statistical lexical acquisition is a practical means to greatly improve tokenization accuracy with unknown words, reducing error rates by as much as 32.0%.
(4) When augmenting the lexicon, linguistic constraints can provide simple, inexpensive filters yielding significantly better precision, reducing error rates by as much as 49.4%.
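The dictionary-lookup baseline that this evidence concerns can be illustrated with a minimal greedy forward maximum-matching tokenizer (a sketch of the standard baseline, not the paper's statistical acquisition method); ASCII letters stand in for Chinese characters here.

```python
# A minimal sketch of greedy forward maximum-matching tokenization, the
# classic dictionary-lookup baseline that breaks down on unknown words.
def max_match(chars, lexicon, max_len=4):
    words, i = [], 0
    while i < len(chars):
        for k in range(min(max_len, len(chars) - i), 0, -1):
            cand = chars[i:i + k]
            if k == 1 or cand in lexicon:   # fall back to a single character
                words.append(cand)
                i += k
                break
    return words

# -> ['ABC', 'D']: greedy longest-first matching misses the 'AB' + 'CD'
# reading, and the leftover 'D' surfaces as an unknown singleton.
print(max_match("ABCD", {"AB", "CD", "ABC"}))
```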
- Dekai WU and Xuanyin XIA. "Learning an English-Chinese lexicon from a parallel corpus". AMTA-94: Assoc. for Machine Translation in the Americas, 206-213. Columbia, MD: Oct 1994.
We report experiments on automatic learning of an English-Chinese translation lexicon, through statistical training on a large parallel corpus. The learned vocabulary size is non-trivial at 6,517 English words averaging 2.33 Chinese translations per entry, with a manually-filtered precision of 95.1% and a single-most-probable precision of 91.2%. We then introduce a significance filtering method that is fully automatic, yet still yields a weighted precision of 86.0%. Learning of translations is adaptive to the domain. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant corpus size with a non-toy vocabulary.
- Pascale FUNG and Dekai WU. "Statistical augmentation of a
Chinese machine-readable dictionary". WVLC-2: 2nd Annual Workshop
on Very Large Corpora, 69-85. Kyoto: Aug 1994.
Also in Susan ARMSTRONG, Kenneth W. CHURCH, Pierre ISABELLE, Sandra MANZI, Evelyne TZOUKERMANN, and David YAROWSKY (editors), Natural Language Processing Using Very Large Corpora. Dordrecht: Kluwer. ISBN 0-7923-6055-9. Nov 1999.
We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.
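One simple way to collect candidate character groups statistically is sketched below using pointwise mutual information over adjacent characters; the scoring function, frequency floor, and threshold are illustrative assumptions rather than the paper's exact statistics, and ASCII letters again stand in for Chinese characters.

```python
# A minimal sketch of statistical collection of candidate character groups:
# score adjacent character pairs by pointwise mutual information (PMI) and
# keep high-scoring, recurring pairs as lexicon candidates.
import math
from collections import Counter

def candidate_bigrams(corpus, threshold=1.0):
    chars = Counter(corpus)
    pairs = Counter(zip(corpus, corpus[1:]))
    n = len(corpus)
    out = {}
    for (a, b), c in pairs.items():
        # PMI compares the pair's joint frequency to chance co-occurrence
        pmi = math.log((c / (n - 1)) / ((chars[a] / n) * (chars[b] / n)))
        if pmi > threshold and c > 1:      # frequency floor to avoid flukes
            out[a + b] = pmi
    return out

print(candidate_bigrams("XYabXYcdXYefXY"))   # the recurring 'XY' scores high
```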
- Dekai WU. "Aligning a
parallel English-Chinese corpus statistically with lexical criteria".
ACL-94: 32nd Annual Meeting of the Assoc. for Computational
Linguistics, 80-87. Las Cruces, NM: Jun 1994.
We describe our experience with automatic alignment of sentences in parallel English-Chinese texts. Our report concerns three related topics: (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus; (2) experiments addressing the applicability of Gale & Church's (1991) length-based statistical method to the task of alignment involving a non-Indo-European language; and (3) an improved statistical method that also incorporates domain-specific lexical cues.
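The length-based backbone that the abstract builds on can be sketched as a simple dynamic program over sentence beads; the quadratic length cost, the restriction to 1-1/1-0/0-1 beads, and the skip penalty below are simplifications of Gale & Church's probabilistic formulation, and the paper's lexical cues would enter as an extra term in the cost.

```python
# A minimal sketch of length-based sentence alignment as a dynamic program.
def align(src_lens, tgt_lens, c=1.0, skip=10.0):
    """Minimal total cost of aligning two lists of sentence lengths."""
    m, n = len(src_lens), len(tgt_lens)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    cost = lambda a, b: ((b - c * a) ** 2) / max(a, 1)   # toy length model
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0 and j > 0:   # 1-1 bead: match one sentence to one
                D[i][j] = min(D[i][j],
                              D[i - 1][j - 1] + cost(src_lens[i - 1], tgt_lens[j - 1]))
            if i > 0:             # 1-0 bead: source sentence left unmatched
                D[i][j] = min(D[i][j], D[i - 1][j] + skip)
            if j > 0:             # 0-1 bead: target sentence left unmatched
                D[i][j] = min(D[i][j], D[i][j - 1] + skip)
    return D[m][n]

print(align([20, 35, 10], [22, 33, 11]))   # near-diagonal lengths align cheaply
```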
- Graeme HIRST and Dekai WU. "Not all reflexive reasoning is deductive". Behavioral and Brain Sciences 16(3): 462-463. 1993.
- Dekai WU. "Approximating maximum-entropy
ratings for evidential parsing and semantic interpretation".
IJCAI-93: 13th Intl. Joint Conf. on Artificial Intelligence,
1290-1296. Chamberry, France: Aug 1993.
We consider the problem of assigning probabilistic ratings to hypotheses in a natural language interpretation system. To facilitate integrating syntactic, semantic, and conceptual constraints, we allow a fully compositional frame representation, which permits co-indexed syntactic constituents and/or semantic entities filling multiple roles. In addition the knowledge base contains probabilistic information encoded by marginal probabilities on frames. These probabilities are used to specify typicality of real-world scenarios on one hand, and conventionality of linguistic usage patterns on the other. Because the theoretical maximum-entropy solution is infeasible in the general case, we propose an approximate method. This method's strengths are (1) its ability to rate compositional structures, and (2) its flexibility with respect to the inputs chosen by the system it is embedded in. Arbitrary sets of hypotheses from the front-end processor can be accepted, as well as arbitrary subsets of constraints heuristically chosen from the long-term knowledge base.
- Dekai WU. "Estimating
probability distributions over hypotheses with variable unification".
AAAI-93: 11th National Conf. on Artificial Intelligence,
790-795. Washington, D.C.: Jul 1993.
We analyze the difficulties in applying Bayesian belief networks to language interpretation domains, which typically involve many unification hypotheses that posit variable bindings. As an alternative, we observe that the structure of the underlying hypothesis space permits an approximate encoding of the joint distribution based on marginal rather than conditional probabilities. This suggests an implicit binding approach that circumvents the problems with explicit unification hypotheses, while still allowing hypotheses with alternative unifications to interact probabilistically. The proposed method accepts arbitrary subsets of hypotheses and marginal probability constraints, is robust, and is readily incorporated into standard unification-based and frame-based models.
- Dekai WU. "An
image-schematic system of thematic roles". PACLING-93: 1st Conf.
of the Pacific Association for Computational Linguistics, 323-332.
Vancouver: Apr 1993.
We describe a system of thematic roles and frames designed to address a number of problems in semantic representations at the lexical semantic level. Our primary objective is broad expressiveness, so that real domains can practically be encoded. However, for both empirical and computational reasons we limit the number of role types to four, allocating this structure to the strongest associations. We show how the system incorporates image-schematic semantics to encode various schematization operations relating to scales and reification.
- Andreas STOLCKE and Dekai WU. "Tree matching with recursive distributed representations". AAAI 1992 Workshop on Integrating Neural and Symbolic Processes---The Cognitive Dimension. San Jose, CA: Jul 1992. Also available as ICSI Technical Report TR-92-025.
We present an approach to the structure unification problem using distributed representations of hierarchical objects. Binary trees are encoded using the recursive auto-association method (RAAM), and a unification network is trained to perform the tree matching operation on the RAAM representations. It turns out that this restricted form of unification can be learned without hidden layers, with good generalization, if we allow the error signal from the unification task to modify both the unification network and the RAAM representations themselves.
- Dekai WU. "Active acquisition of
user models: Implications for decision-theoretic dialog planning and plan
recognition". User Modeling and User-Adapted Interaction
1(2): 149-172. 1991.
This article investigates the implications of active user model acquisition upon plan recognition, domain planning, and dialog planning in dialog architectures. A dialog system performs active user model acquisition by querying the user during the course of the dialog. Existing systems employ passive strategies that rely on inferences drawn from passive observation of the dialog. Though passive acquisition generally reduces unnecessary dialog, in some cases the system can effectively shorten the overall dialog length by selectively initiating subdialogs for acquiring information about the user.
We propose a theory identifying conditions under which the dialog system should adopt active acquisition goals. Active acquisition imposes a set of rationality requirements not met by current dialog architectures. To ensure rational dialog decisions, we propose significant extensions to plan recognition, domain planning, and dialog planning models, incorporating decision-theoretic heuristics for expected utility. The most appropriate framework for active acquisition is a multi-attribute utility model wherein plans are compared along multiple dimensions of utility. We suggest a general architectural scheme, and present an example from a preliminary implementation.
- Dekai WU. "A continuum of induction methods for learning probability distributions with generalization". Thirteenth Annual Conference of the Cognitive Science Society (CogSci 1991). Chicago. 949-953.
- Dekai WU. "Probabilistic unification-based integration of syntactic and semantic preferences for nominal compounds". Thirteenth International Conference on Computational Linguistics (COLING 1990). Helsinki. v2 413-418.
- Dekai WU. "A probabilistic approach to marker propagation". IJCAI 1989. Detroit, MI. 574-582.
- Dekai WU. "Review of Natural Language Understanding". AI Magazine 10(1): 88-90 (1989).
- Robert WILENSKY, David N CHIN, Marc LURIA, James H MARTIN, James MAYFIELD, and Dekai WU. "The Berkeley UNIX Consultant Project". Computational Linguistics 14(3): 35-84 (1988). Also available as UC Berkeley Technical Report CSD-89-520.
UC (UNIX Consultant) is an intelligent, natural-language interface that allows naive users to learn about the UNIX operating system. UC was undertaken because the task was thought to be both a fertile domain for Artificial Intelligence research and a useful application of AI work in planning, reasoning, natural language processing, and knowledge representation.
- Dekai WU. "Concretion inferences in natural language understanding". GWAI 1987. Springer-Verlag. 74-83.
- Robert WILENSKY, James MAYFIELD, Anthony ALBERT, David CHIN, Charles COX, Marc LURIA, James H MARTIN, and Dekai WU. UC---A Progress Report. UC Berkeley Technical Report CSD-87-303.