HKUST's ideal research environment is situated in beautiful Clear Water Bay, Hong Kong.

HLTC Special Seminar Series

THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
Department of Computer Science and Engineering
Department of Electronic and Computer Engineering

Human Language Technology Center
Department of Information & Systems Management
Department of Mathematics
Department of Computer Science and Engineering
JOINT STATISTICS SEMINAR

Nonparametric Bayesian Methods in Language Modeling

Dr. Daichi MOCHIHASHI
NTT Communication Science Laboratories

Date: 23 May 2008 (Friday)
Time: 16:00-17:00
Venue: LTH (Chen Kuan Cheng Forum, Lifts 27/28)

Abstract

In this talk, I will introduce some nonparametric Bayesian approaches that have recently gained ground in natural language processing, based on Dirichlet processes, Pitman-Yor processes, and their hierarchical extensions.

In the first part of the talk, I will present what natural language processing is and why language modeling is an interesting and important problem. Nonparametric Bayesian priors prove very useful there: they make it possible to infer latent "categories" (syntactic and semantic) automatically, without human intervention. Manually defined categories require enormous effort and often describe actual phenomena in natural language inaccurately. Among the many natural language processing techniques, "n-gram" language models, i.e. Markov models of order n-1 over words, are fundamental and heavily used in speech recognition and statistical machine translation.
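As a rough illustration of the n-gram idea mentioned above, the following minimal Python sketch estimates P(word | previous n-1 words) from raw counts. It is an unsmoothed maximum-likelihood estimate, not the nonparametric Bayesian model discussed in the talk; the function names `train_ngram` and `prob`, the `<s>` padding symbol, and the toy sentence are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    # Pad the start so the first words also have a full-length context.
    padded = ["<s>"] * (n - 1) + list(tokens)
    for i in range(n - 1, len(padded)):
        context = tuple(padded[i - n + 1:i])
        counts[context][padded[i]] += 1
    return counts

def prob(counts, context, word):
    """Maximum-likelihood estimate of P(word | context); 0.0 for unseen contexts."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())

# Toy corpus (hypothetical): a bigram model is a first-order Markov model.
tokens = "the cat sat on the mat".split()
bigram = train_ngram(tokens, 2)
trigram = train_ngram(tokens, 3)
print(prob(bigram, ["the"], "cat"))
print(prob(trigram, ["on", "the"], "mat"))
```

Because unseen contexts receive probability zero here, practical n-gram models rely on smoothing, which is the role that hierarchical priors such as the Pitman-Yor process can play.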

In the second part of the talk, I will present my latest work on the "infinite-gram" language model, or "infinite Markov model", which appeared at NIPS 2007 and in which the Markov order n is integrated out nonparametrically. This amounts to introducing a very simple prior over stochastic infinite trees, other than Kingman's coalescent; it may be closely related to tailfree processes. I will present experimental results on large texts obtained with a Gibbs sampler, and discuss exchangeability and the relationship to information theory.

Biography

Daichi Mochihashi is a postdoctoral researcher at NTT Communication Science Laboratories, Kyoto, Japan (the Japanese equivalent of AT&T Labs Research). He obtained his BS from the University of Tokyo in 1998 and his PhD from Nara Institute of Science and Technology in 2005. His main interest is natural language processing, especially from a Bayesian point of view. After graduation, he was a researcher at ATR Spoken Language Communication Research Laboratories, where he conducted research on language modeling. He joined NTT in 2007 as a member of the machine learning group.

*** All are Welcome ***

For enquiries, please call 2358-7008 or visit our website at http://www.cs.ust.hk/~hltc/seminars.html.

Human Language Technology Center
The Hong Kong University of Science & Technology
HKUST, Clear Water Bay, Hong Kong
+852 2358-8831
hltc@cs.ust.hk
http://www.cs.ust.hk/~hltc
Last updated: 2008.05.16 Dekai Wu