COMP 4221/5221 - Spring 2019
Spring 2019, COMP 4221 Introduction to Natural Language Processing [3-0-1:3]
Spring 2019, COMP 5221 Natural Language Processing [3-0-0:3]
Lecture 1, TuTh 10:30-11:50, Rm 5619 (L31-32)
Prof. Dekai WU, Rm 3556, 2358-6989, dekai@cs.ust.hk
Tut 1A W 10:30-11:20, Rm 2407, L17-18
You are welcome to knock on the instructor's door at any time. The TA's office hours are posted at http://course.cs.ust.hk/comp4221/ta/.
ANNOUNCEMENTS
Welcome to COMP4221 for UGs and COMP5221 for PGs! (The COMP4221 course was formerly called COMP300H and COMP326, and the COMP5221 course for PGs was formerly called COMP526.) Tutorials will begin in Week 2.
Always check the Discussion Forum for up-to-the-minute announcements.
The discussion forum is at http://comp151.cse.ust.hk/~dekai/content/?q=forum/3.
Always read the forum before asking, posting, or emailing your question. You must register for your account at the first lecture, tutorial, or lab.
Course home page is at http://www.cs.ust.hk/~dekai/4221/.
Tutorial info is at http://course.cs.ust.hk/comp4221/ta/.
ORIENTATION
Abbreviated Course Catalog Description
COMP 4221. Human language technology for text and spoken language. Machine learning, syntactic parsing, semantic interpretation, and context-based approaches to machine translation, text mining, and web search.
Course Description
Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.
TEXTBOOKS
- Introduction to Text Alignment: Statistical Machine Translation Models from Bitexts to Bigrammars (forthcoming), by Dekai WU. Springer.
- Artificial Intelligence: A Modern Approach (2nd Edition), by Stuart RUSSELL and Peter NORVIG. Prentice-Hall, 2003. ISBN-13: 978-0137903955.
- Structure and Interpretation of Computer Programs (2nd edition), by Harold ABELSON and Gerald Jay SUSSMAN, with Julie SUSSMAN. MIT Press, 1984. ISBN-10: 0-262-01077-1.
Full text and code are available online at no cost for the Scheme book (Structure and Interpretation of Computer Programs) at http://mitpress.mit.edu/sicp/.
HONOR POLICY
To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.
Plagiarism
All materials submitted for grading must be your own work. You are advised against being involved in any form of copying (either copying other people's work or allowing others to copy yours). If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary actions.
University policy requires that students who cheat more than once be expelled. Please review the cheating topic from your UST Student Orientation.
Warning: sophisticated plagiarism detection systems are in operation!
Collaboration
You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism.
GRADING
Course grading will be adjusted to the difficulty of assignments and exams. Moreover, I guarantee you the following.
If you achieve | you will receive at least |
85% | A |
75% | B |
65% | C |
55% | D |
Your grade will be determined by a combination of factors:
Midterm exam | ~20% |
Final exam | ~25% |
Participation | ~5% |
Assignments | ~50% |
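As a rough illustration of how the two tables above combine, the following minimal sketch computes a weighted course total and looks up the guaranteed minimum grade. It assumes the approximate weights are applied exactly as listed and that the guarantee thresholds apply to the weighted total; the actual grading may be adjusted, as noted above. The names `WEIGHTS`, `GUARANTEES`, `weighted_total`, and `guaranteed_grade` are hypothetical, as are the example scores.

```python
# Weights in percent, copied from the table above (treated as exact: an assumption).
WEIGHTS = {"midterm": 20, "final": 25, "participation": 5, "assignments": 50}

# Guaranteed minimum grades, highest threshold first.
GUARANTEES = [(85, "A"), (75, "B"), (65, "C"), (55, "D")]


def weighted_total(scores):
    """Combine per-component percentages (0-100) into one course percentage."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def guaranteed_grade(total):
    """Return the guaranteed minimum grade, or None if no guarantee applies."""
    for threshold, grade in GUARANTEES:
        if total >= threshold:
            return grade
    return None


# Hypothetical scores for one student.
example = {"midterm": 70, "final": 80, "participation": 100, "assignments": 90}
total = weighted_total(example)
print(total, guaranteed_grade(total))  # 84.0 B
```

Because the guarantee is a lower bound, the grade actually awarded may be higher than the one this sketch reports.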
Examinations
No reading material is allowed during the examinations. No make-ups will be given unless prior approval is granted by the instructor, or you have a medical condition on the day of the examination that is documented by a physician. In addition, absence from the final examination results in automatic failure of the course according to university regulations, unless prior approval is obtained from the department head.
There will be one midterm worth approximately 20%, and one final exam worth approximately 25%.
Participation
Science and engineering (including software engineering!) is about communication between people. Good participation in class and/or the online forum will count for approximately 5%.
Assignments
All assignments must be submitted by 23:00 on the due date. Scheme programming assignments must run under Chicken Scheme on Linux. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.
Programming assignments will account for a total of approximately 50%.
Tutorials
All information for tutorials is at http://course.cs.ust.hk/comp4221/ta/.
SYLLABUS
Date | Wk | Event | Topic | |
2019.01.31 | 1 | Lecture | Welcome; Introduction; Survey | |
2019.02.05 | 1 | Holiday | Lunar New Year | |
2019.02.07 | 1 | Holiday | Lunar New Year | |
2019.02.12 | 2 | Lecture | Impact of AI on ethics and society | |
2019.02.13 | 2 | Lecture | Does God play dice? Assumptions: scientific method, hypotheses, models, learning, probability; linguistic relativism and the Sapir-Whorf hypothesis; inductive bias, language bias, search bias; the great cycle of intelligence [at tutorial] | |
2019.02.14 | 2 | Lecture | Languages of the world; Administrivia (honor statement, HKUST classroom conduct) | |
2019.02.19 | 3 | Lecture | Learning to translate: engineering, social, and scientific motivations | |
2019.02.20 | 3 | Lecture | "It's all Chinese to me": linguistic complexity; challenges in modeling translation [at tutorial] | |
2019.02.21 | 3 | Lecture | Is machine translation intelligent? Interactive simulation [at tutorial] | |
2019.02.26 | 4 | Lecture | Evaluating translation quality: alignment; aligning semantic frames: Interactive exercise | |
2019.02.27 | 4 | Lecture | Evaluating translation quality: HMEANT [at tutorial] | |
2019.02.28 | 4 | Lecture | Evaluating translation quality: MEANT | |
2019.03.05 | 5 | Lecture | Evaluating translation quality: semantic role labeling (SRL), case frames, semantic frames, predicate-argument structure | |
2019.03.06 | 5 | Lecture | Automatic semantic role labeling (ASRL) [at tutorial] | |
2019.03.07 | 5 | Lecture | Implementing a feedforward neural network based part-of-speech tagger (Assignment 1, due 2019.03.15 23:59); context-independent POS tagging | |
2019.03.12 | 6 | Lecture | I/O representations for feedforward networks; context-dependent POS tagging | |
2019.03.13 | 6 | Tutorial | Basic probability theory; conditional probabilities; Bayes' theorem | |
2019.03.14 | 6 | Lecture | Example-based, instance-based, memory-based, case-based, analogy-based, lazy learning for classification; translation via nearest neighbors (NN); k-NN; weighted k-NN | |
2019.03.19 | 7 | Exam | Midterm (closed book; handwritten notebook only) | |
2019.03.21 | 7 | Lecture | Midterm review; machine translation techniques | |
2019.03.26 | 8 | Lecture | Learning vs performance components in machine learning; supervised learning; word sense disambiguation (WSD); lexical choice; example-based prediction models; nearest neighbor classifiers; similarity metrics; kNN classifiers | |
2019.03.28 | 8 | Lecture | Exploring different feedforward neural network architectures for POS tagging; model design following scientific method for machine learning in practice (Assignment 2, due 2019.04.10 23:59); AI ethics | |
2019.04.02 | 9 | Lecture | Naive Bayes classifiers for WSD and lexical choice | |
2019.04.04 | 9 | Lecture | Modern approaches to SRL; corporate responsibility in AI | |
2019.04.09 | 10 | Lecture | Implementing chunkers and shallow parsers via IOBES tagging plus a POS tagger (Assignment 3, due 2019.05.10 23:59) | |
2019.04.11 | 10 | Lecture | Chunking via IOBES representations; shallow bracketing | |
2019.04.16 | 11 | Lecture | Shallow syntactic parsing; shallow semantic parsing; language bias of IOBES representations, bags of words, and one-hot representations | |
2019.04.18 | 11 | Lecture | Introduction to word embeddings | |
2019.04.23 | 11 | Lecture | Vector space models; classic word vector approaches | |
2019.04.25 | 11 | Lecture | Learning word embeddings via prediction tasks; skip-grams; word2vec; Assignment 3 discussion | |
2019.04.30 | 12 | Lecture | Recursive autoencoders (RAEs) and recursive auto-associative memories (RAAM); learning word embeddings by making RAEs predict; AI ethics in China; utilitarian and consequentialist ethics; Asilomar AI principles | |
2019.05.02 | 12 | Lecture | Context-free grammars (CFGs); generative vs parsing models; top-down vs bottom-up parsing; dynamic programming based chart parsing; Cocke-Kasami-Younger (CKY) parsing | |
2019.05.07 | 13 | Lecture | From CFGs to ITGs (monolingual vs bilingual modeling); how bilingual conditions make grammar induction easier; the mystery of the magic number 4 in semantic frames; simple and full syntax-directed transduction grammars (SDTGs); introduction to inversion transduction grammars (ITGs); tree vs matrix constituent alignment visualizations | |
2019.05.08 | 13 | Lecture | ITG characteristics; stochastic ITGs; polynomial-time transduction and learning; resolving the mystery of the magic number 4 | |
2019.05.09 | 13 | Lecture | From RAAM to TRAAM (transduction RAAM); recursive neural network realizations of ITGs; a self-learning rap battle bot | |
2019.05.27 | | Exam | Final (closed book; handwritten notebook only); LG1 Table Tennis Room (LG1031), 08:30-11:30 | |
dekai@cs.ust.hk
Last updated: 2019.05.09