COMP 326 (300H) - Spring 2011
Spring 2011, COMP 300H Introduction to Natural Language Processing
[3-0-1:3]
Lecture 1, TTh 12:00-13:20, Rm 1511
Prof. Dekai WU, Rm 3539,
2358-6989, dekai@cs.ust.hk
Lab 1A TA: Jackie LO Chi-kiu, W 17:00-17:50, Rm 4214, jackielo@cs.ust.hk
You are welcome to knock on the instructor's door at any time. The TAs' office hours are posted at http://course.cs.ust.hk/comp300h/ta/.
ANNOUNCEMENTS
Welcome to COMP326! (This course is temporarily called COMP300H while the official course code is being added to the academic calendar.) Tutorials will begin after Week 2.
Always check the Discussion Forum for up-to-the-minute announcements.
Discussion forum is at http://comp151.cse.ust.hk/~dekai/content/?q=forum/3.
Always read the forum before asking, posting, or emailing your question. This forum runs on modern, powerful software instead of the old, clunky ITSC newsgroup.
Course home page is at http://www.cs.ust.hk/~dekai/326/.
Tutorial info is at http://course.cs.ust.hk/comp300h/ta/.
ORIENTATION
Academic Calendar Description
COMP 326. Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.
TEXT/REFERENCE BOOKS
- Handbook of Natural Language Processing (2nd edition). Nitin INDURKHYA and Fred J. DAMERAU (editors). Chapman & Hall / CRC Press, 2010. ISBN-13: 978-1420085921.
- Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (2nd edition). Daniel JURAFSKY and James H. MARTIN. Prentice Hall, 2008. ISBN-13: 978-0131873216.
- Foundations of Statistical Natural Language Processing. Christopher D. MANNING and Hinrich SCHÜTZE. MIT Press, 1999. ISBN-13: 978-0262133609.
- Natural Language Processing with Python. Steven BIRD, Ewan KLEIN and Edward LOPER. O'Reilly Media, 2009. ISBN-13: 978-0596516499.
- Artificial Intelligence: A Modern Approach (2nd Edition), by Stuart RUSSELL and Peter NORVIG. Prentice-Hall, 2003. ISBN-13: 978-0137903955.
- Structure and Interpretation of Computer Programs (2nd edition), by Harold ABELSON and Gerald Jay SUSSMAN, with Julie SUSSMAN. MIT Press, 1984. ISBN-10: 0-262-01077-1.
Full text and code are available online at no cost for the Scheme book (Structure and Interpretation of Computer Programs) at http://mitpress.mit.edu/sicp/.
HONOR POLICY
To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.
Plagiarism
All materials submitted for grading must be your own work. You are advised against being involved in any form of copying (either copying other people's work or allowing others to copy yours). If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary actions.
University policy requires that students who cheat more than once be expelled. Please review the academic integrity topic from your UST Student Orientation.
Warning: sophisticated plagiarism detection systems are in operation!
Collaboration
You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism.
GRADING
The course will be graded on a curve, but no matter what the curve is, I guarantee you the following.
If you achieve | you will receive at least |
85% | A |
75% | B |
65% | C |
55% | D |
Your grade will be determined by a combination of factors:
Midterm exam | ~20% |
Final exam | ~30% |
Participation | ~10% |
Assignments | ~40% |
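As an illustration of how the component weights and the guaranteed thresholds combine, here is a minimal sketch in Python. The weights are the approximate values from the table above; the function names and the sample scores are hypothetical, and the result is only a lower bound, since the course is graded on a curve and the final grade may be higher.

```python
# Approximate component weights from the grading table (they sum to 1.0).
WEIGHTS = {
    "midterm": 0.20,
    "final": 0.30,
    "participation": 0.10,
    "assignments": 0.40,
}

# Guaranteed minimum grades: (threshold in percent, grade), highest first.
GUARANTEES = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]


def weighted_total(scores):
    """Combine per-component percentages (0-100) into one overall percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def guaranteed_minimum(total):
    """Return the minimum guaranteed grade, or None if below every threshold."""
    for threshold, grade in GUARANTEES:
        if total >= threshold:
            return grade
    return None


# Hypothetical student scores, purely for illustration.
example = {"midterm": 80, "final": 70, "participation": 90, "assignments": 85}
total = weighted_total(example)
print(round(total, 1), guaranteed_minimum(total))
```

For these illustrative scores the weighted total is 16 + 21 + 9 + 34 = 80%, which clears the 75% threshold, so at least a B would be guaranteed.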
Examinations
No reading material is allowed during the examinations. No make-ups will be given unless prior approval is granted by the instructor, or you have a medical condition on the day of the examination that is documented by a physician. In addition, being absent from the final examination results in automatic failure of the course according to university regulations, unless prior approval is obtained from the department head.
There will be one midterm worth approximately 20%, and one final exam worth approximately 30%.
Participation
Science and engineering (including software engineering!) is about communication between people. Good participation in class and/or the online forum will count for approximately 10%.
Assignments
All assignments must be submitted by 23:00 on the due date. Scheme programming assignments must run under Chicken Scheme on Linux. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.
Programming assignments will account for a total of approximately 40%.
Tutorials
All information for tutorials is at http://course.cs.ust.hk/comp300h/ta/.
SCHEDULE
Date | Wk | Event | Topic | Supplemental | Assignments |
2011.02.08 | 1 | Lecture | Administrivia | ||
2011.02.10 | 1 | Lecture | Interactive simulation: Testing AI via translation | ||
2011.02.15 | 2 | Lecture | Why MT? Driving AI and CS research: scientific, engineering, and social implications | ||
2011.02.17 | 2 | Lecture | Why Chinese MT? The world's most difficult languages and translations; Data analysis of empirical simulation results: lexical context, syntax, semantics, pragmatics, real world knowledge | Rosenberg 1979 | |
2011.02.24 | 3 | Lecture | Word sense disambiguation; Classification problems; Naive Bayes classifier models | ||
2011.03.01 | 4 | Lecture | Source-channel (noisy channel) models; Bayesian models; Generative models; Alignment | ||
2011.03.03 | 4 | Lecture | Expectation-maximization; Parallel corpora; SMT Model 0; EM training for Model 0 lexical translation probabilities | ||
2011.03.08 | 5 | Lecture | Entropy, cross-entropy, and perplexity; Bigram and n-gram language models | ||
2011.03.10 | 5 | Lecture | Upper bounding the entropy of English; Training vs. testing corpora; Maximum likelihood models; EM training of Model 0 alignment probabilities | Brown et al. 1992 | |
2011.03.15 | 6 | Lecture | IBM Model 1 with EM estimation | Brown et al. 1993 | |
2011.03.17 | 6 | Lecture | HMM/WFSA models; Unrolling HMMs | ||
2011.03.22 | 7 | Lecture | HMM alignment model; Generative vs. algebraic interpretation as first-order Markov model with hidden states | Vogel et al. 1996 | |
2011.03.24 | 7 | Lecture | EM training for HMM/WFSA models | Rabiner 1989 | |
2011.03.29 | 8 | Lecture | Finite-state language models; EM training of POS taggers | ||
2011.03.31 | 8 | Lecture | Review for midterm | ||
2011.04.05 | 9 | -- | Ching Ming Festival | ||
2011.04.06 | 9 | Exam | Midterm (Rm 3007 17:00-18:00, during Tutorial time) | ||
2011.04.07 | 9 | Lecture | WFSTs | ||
2011.04.12 | 10 | Lecture | IBM Model 2; IBM Model 3 | ||
2011.04.14 | 10 | Lecture | EM training for IBM Model 3; IBM Models 4 and 5 | ||
2011.04.19 | 11 | Lecture | Decoding with translation and language models; Extending HMM/WFSA models for word segmentation | ||
2011.04.21 | 11 | Lecture | Extending WFST models for bilingual word segmentation | ||
2011.04.26 | 12 | Lecture | Decoding with permutation models | ||
2011.04.28 | 12 | Lecture | Stochastic context-free grammars; Dependency grammar formulations; Parsing with SCFGs | ||
2011.05.10 | 14 | -- | The Birthday of the Buddha | ||
2011.05.13 | 14 | Lecture | Semantics; Word sense disambiguation for SMT; Semantic role labeling for SMT (Rm 2612A 16:00-18:40) | ||
2011.05.17 | 14 | Lecture | Syntax-directed transduction grammars; Stochastic inversion transduction grammars; Biparsing with SITGs | ||
2011.05.18 | 14 | Lecture | MT with ITGs; Generative capacity of ITGs; Cognitive modeling with ITGs; Semantic roles and ITGs; Experimental support for ITGs (Rm 3408 17:00-18:00, during Tutorial time) | ||
2011.05.26 | 15 | Exam | Final (Rm 3008 16:30-19:30) |
BACKGROUND REVIEW
Topics
- Scheme slides
- Scheme R5RS [html, pdf]
- Chicken Scheme 3.4 manual
- Chicken Scheme 3 eggs
- COMP221 A1
- COMP221 A2
dekai@cs.ust.hk
Last updated: 2011.05.18