HLTC Special Seminar Series

Department of Computer Science
Department of Electrical and Electronic Engineering

Human Language Technology Center

Linguistically Informed Engineering

Nianwen XUE
University of Colorado at Boulder

Date:   27 April 2006 (Thursday)
Time:   16:00-18:00
Venue:  Rm 2578 (HLTC Conference Room, Lifts 29-30)


Corpus-based approaches have been the predominant research paradigm in Natural Language Processing for roughly the past decade. In the first part of this talk, I will describe our efforts over the past seven years to build a multi-layered, multi-dimensional Chinese corpus that we hope will fuel research in Chinese language processing. I will first discuss the syntactic annotation in the Chinese Treebank, then move on to the semantic annotation of verbs and their nominalizations in the Chinese Proposition Bank (PropBank), and finally touch on some preliminary work we have done on Chinese discourse connectives. A recurring issue in corpus annotation is the tension between linguistic principles, which favor the most intuitive and elegant representation, and engineering principles, which favor high annotation consistency; I will discuss the tradeoffs we have made in each of these annotation tasks. In the second part of the talk, I will describe some machine-learning systems that we have developed using this corpus as training and test material. I will present experimental results on Semantic Role Labeling of Chinese verbs and their nominalizations and discuss some of the challenges facing Chinese NLP.


Nianwen Xue (Ph.D., Linguistics, University of Delaware, 2001) was a Postdoctoral Fellow at the University of Pennsylvania from 2001 to 2005. He is currently a Senior Research Associate in the Center for Spoken Language Research at the University of Colorado at Boulder. He has been a key member of the Chinese Treebank Project since 1998 and is the leader of the Chinese Proposition Bank, the Chinese NomBank, and other Chinese annotation projects. He has also used these resources to develop machine-learning systems that acquire linguistic and semantic structure, including systems for Chinese word segmentation, Chinese parsing, and Chinese and English Semantic Role Labeling.

