COMP 326 (300H) - Spring 2011

Spring 2011, COMP 300H Introduction to Natural Language Processing [3-0-1:3]
Lecture 1, TTh 12:00-13:20, Rm 1511
Prof. Dekai WU, Rm 3539, 2358-6989,

Lab 1A TA: Jackie LO Chi-kiu, W 17:00-17:50, Rm 4214,

You are welcome to knock on the instructor's door at any time. The TAs' office hours are posted at


Welcome to COMP326! (This course is temporarily called COMP300H while the official course code is being added to the academic calendar.) Tutorials will begin after Week 2.

Always check the Discussion Forum for up-to-the-minute announcements.

Discussion forum is at Always read the forum before asking, posting, or emailing your question. The forum runs on modern, powerful software, replacing the old, clunky ITSC newsgroup.
Course home page is at
Tutorial info is at


Academic Calendar Description

COMP 326. Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.



To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.


All materials submitted for grading must be your own work. Do not engage in any form of copying, whether copying other people's work or allowing others to copy yours. If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course, and the incident will be reported for appropriate disciplinary action.

University policy requires that students who cheat more than once be expelled. Please review the academic integrity topic from your UST Student Orientation.

Warning: sophisticated plagiarism detection systems are in operation!


You are encouraged to collaborate in study groups. However, you must write up solutions on your own, and you must acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Any other form of collaboration will be treated as plagiarism.


The course will be graded on a curve, but no matter what the curve is, I guarantee you the following minimum grades:

If you achieve 85%, you will receive at least an A.
If you achieve 75%, at least a B.
If you achieve 65%, at least a C.
If you achieve 55%, at least a D.

Your grade will be determined by a combination of factors:

Midterm exam ~20%
Final exam ~30%
Participation ~10%
Assignments ~40%
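The weights above combine with the grade guarantee by simple arithmetic. As a rough illustration (the weights are approximate, per the tildes above; the component scores and function names below are hypothetical, not part of the course materials):

```python
# Illustrative sketch only: weights and guarantee thresholds are from the
# syllabus above; the example scores are made up.

WEIGHTS = {"midterm": 0.20, "final": 0.30, "participation": 0.10, "assignments": 0.40}
THRESHOLDS = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]  # guaranteed minimums

def course_total(scores):
    """Weighted overall score (0-100) from per-component percentages."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def guaranteed_grade(total):
    """Lowest letter grade guaranteed by the curve policy, or None below 55%."""
    for cutoff, grade in THRESHOLDS:
        if total >= cutoff:
            return grade
    return None  # below 55%: no guarantee; the outcome depends on the curve

# Example: 80 on the midterm, 90 on the final, full participation, 85 on
# assignments gives 0.2*80 + 0.3*90 + 0.1*100 + 0.4*85 = 87, guaranteeing an A.
total = course_total({"midterm": 80, "final": 90,
                      "participation": 100, "assignments": 85})
```

Note that the guarantee is one-sided: scoring below a threshold does not cap your grade, since the curve may still place you higher.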


No reading material is allowed during examinations. No make-ups will be given unless the instructor grants prior approval, or you have a medical condition on the day of the examination supported by a physician's documentation. In addition, under university regulations, absence from the final examination results in automatic failure of the course unless prior approval is obtained from the department head.

There will be one midterm worth approximately 20%, and one final exam worth approximately 30%.


Science and engineering (including software engineering!) are about communication between people. Good participation in class and/or the online forum will count for approximately 10%.


All assignments must be submitted by 23:00 on the due date. Scheme programming assignments must run under Chicken Scheme on Linux. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.

Programming assignments will account for a total of approximately 40%.


All information for tutorials is at


Date Wk Event Topic Supplemental Assignments
2011.02.08 1 Lecture Administrivia
2011.02.10 1 Lecture Interactive simulation: Testing AI via translation
2011.02.15 2 Lecture Why MT? Driving AI and CS research: scientific, engineering, and social implications
2011.02.17 2 Lecture Why Chinese MT? The world's most difficult languages and translations; Data analysis of empirical simulation results: lexical context, syntax, semantics, pragmatics, real world knowledge Rosenberg 1979
2011.02.24 3 Lecture Word sense disambiguation; Classification problems; Naive Bayes classifier models
2011.03.01 4 Lecture Source-channel (noisy channel) models; Bayesian models; Generative models; Alignment
2011.03.03 4 Lecture Expectation-maximization; Parallel corpora; SMT Model 0; EM training for Model 0 lexical translation probabilities
2011.03.08 5 Lecture Entropy, cross-entropy, and perplexity; Bigram and n-gram language models
2011.03.10 5 Lecture Upper bounding the entropy of English; Training vs. testing corpora; Maximum likelihood models; EM training of Model 0 alignment probabilities Brown et al. 1992
2011.03.15 6 Lecture IBM Model 1 with EM estimation Brown et al. 1993
2011.03.17 6 Lecture HMM/WFSA models; Unrolling HMMs
2011.03.22 7 Lecture HMM alignment model; Generative vs. algebraic interpretation as first-order Markov model with hidden states Vogel et al. 1996
2011.03.24 7 Lecture EM training for HMM/WFSA models Rabiner 1989
2011.03.29 8 Lecture Finite-state language models; EM training of POS taggers
2011.03.31 8 Lecture Review for midterm
2011.04.05 9 -- Ching Ming Festival
2011.04.06 9 Exam Midterm (Rm 3007 17:00-18:00, during Tutorial time)
2011.04.07 9 Lecture WFSTs
2011.04.12 10 Lecture IBM Model 2; IBM Model 3
2011.04.14 10 Lecture EM training for IBM Model 3; IBM Models 4 and 5
2011.04.19 11 Lecture Decoding with translation and language models; Extending HMM/WFSA models for word segmentation
2011.04.21 11 Lecture Extending WFST models for bilingual word segmentation
2011.04.26 12 Lecture Decoding with permutation models
2011.04.28 12 Lecture Stochastic context-free grammars; Dependency grammar formulations; Parsing with SCFGs
2011.05.10 14 -- The Birthday of the Buddha
2011.05.13 14 Lecture Semantics; Word sense disambiguation for SMT; Semantic role labeling for SMT (Rm 2612A 16:00-18:40)
2011.05.17 14 Lecture Syntax-directed transduction grammars; Stochastic inversion transduction grammars; Biparsing with SITGs
2011.05.18 14 Lecture MT with ITGs; Generative capacity of ITGs; Cognitive modeling with ITGs; Semantic roles and ITGs; Experimental support for ITGs (Rm 3408 17:00-18:00, during Tutorial time)
2011.05.26 15 Exam Final (Rm 3008 16:30-19:30)


Last updated: 2011.05.18