COMP4221 Introduction to Natural Language Processing, Fall 2020, HKUST

Dekai Wu

Course organization



All lectures and tutorials will be held ONLINE LIVE INTERACTIVELY at the regularly scheduled times.

You can find the recurring Zoom meetings for the lectures and tutorials in Canvas, and you are strongly recommended to join the meetings from there. Note that these Zoom meetings only admit users authenticated with their ITSC accounts. Alternatively, you must register for the lectures and tutorials at the following links; after registering, you will receive a confirmation email containing information about joining the meeting. You can only join the meetings via one of these two paths.

After you are registered, you may use the following links to join the lectures and tutorials:

If you haven’t done so, please watch this video to get your HKUST Zoom account ready as soon as possible, not just for this course but also for all other courses at HKUST:

Times and places

Lecture 1: TuTh 09:00-10:20, Rm TBA

Tutorial 1: M 16:00-16:50, Rm TBA

Office hours: TuTh 10:30-11:00. The TA's office hours are posted at


Course: is the master home page for the course.

Tutorial: contains all information for the tutorials.

Forum: is where all discussion outside class should be done. Always read the existing discussion before asking/posting/emailing your question. Note that you must register for your account at the first lecture, tutorial, or lab.


Abbreviated course catalog description

COMP 4221. Human language technology for text and spoken language. Machine learning, syntactic parsing, semantic interpretation, and context-based approaches to machine translation, text mining, and web search.

Course description

Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.

Learning objectives

At the end of the Natural Language Processing course, you will have achieved the following outcomes.

  1. General
    1. Possess solid understanding of the fundamental concepts of natural language processing
    2. Possess solid understanding of the fundamental concepts of machine translation, and grasp how it stress tests all aspects of human intelligence and language processing
  2. Transduction
    1. Know foundational input-output formulations of transduction, such as alignment, chunking, classification, dependency relations, and parsing
    2. Understand the relationship between noisy channel and loglinear models of string transduction, and their Bayesian interpretations
  3. Syntax
    1. Understand the relationship between word segmentation and phrasal lexicons, the relationship to transduction and alignment, and associated algorithms
    2. Understand the relationship between traditional grammatical formalisms and stochastic and weighted grammars
    3. Understand the strengths and weaknesses of part-of-speech models, and associated tagging algorithms
    4. Understand the various fundamental approaches to parsing, and how they deal with syntactic ambiguity
  4. Alignment
    1. Understand how bilingual models of syntax generalize upon monolingual models to improve learnability
    2. Understand the combinatorial and empirical trade-offs between various learning models of alignment and compositionality, and their associated algorithms
    3. Understand the core methods for inducing lexicons, translation lexicons, phrasal translation lexicons, as well as permutation and reordering models
  5. Decoding
    1. Understand the combinatorial and empirical trade-offs between various runtime models for translation, and their associated algorithms
    2. Understand how bilingual transduction models generalize upon monolingual parsing models
  6. Semantics
    1. Understand lexical semantics models for word sense disambiguation, their relationship to phrasal lexicons and transduction, and associated ambiguity resolution algorithms
    2. Understand lexical semantics models for semantic frames (predicate-argument structures), and associated semantic role labeling algorithms



Honor policy

To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.


All materials submitted for grading must be your own work. You are advised against being involved in any form of copying (either copying other people's work or allowing others to copy yours). If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary actions.

University policy requires that students who cheat more than once be expelled. Please review the cheating topic from your UST Student Guide.

Warning: sophisticated plagiarism detection systems are in operation!


You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Other cases will be dealt with as plagiarism.


Course grading will be adjusted to the difficulty of assignments and exams. Moreover, I guarantee you the following.

Grade guarantees
If you achieve at least the percentage shown, you will receive at least the corresponding grade:
85% A
75% B
65% C
55% D

Your grade will be determined by a combination of factors:

Grade weighting
Exams 0% (due to university coronavirus measures)
Pop quizzes ~10%
Class participation ~15%
Forum participation ~10%
Assignments ~65%


No reading material is allowed during the examinations. No make-ups will be given unless prior approval is granted by the instructor, or you have a medical condition on the day of the examination documented by a physician. In addition, being absent from the final examination results in automatic failure of the course according to university regulations, unless prior approval is obtained from the department head.


Science and engineering (including software engineering!) are about communication between people. Good participation in class will count for approximately 15%, and good participation in the online forum will count for approximately 10%.


All assignments must be submitted by 23:00 on the due date. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.

Scheme programming assignments must run under Chicken Scheme on Linux.

Programming assignments will account for a total of approximately 65%.

Required readings

Any linked material (unless labeled "Supplementary references") is required reading that you are responsible for.


We will choose from the topics in the list below.


date wk event topic
20200908 1 Lecture Welcome; introduction; survey; administrivia (honor statement, HKUST classroom conduct)
20200910 2 Lecture Does God play dice? Assumptions: scientific method, hypotheses, models, learning, probability; linguistic relativism and the Sapir-Whorf hypothesis; inductive bias, language bias, search bias; the great cycle of intelligence
20200915 2 Lecture Languages of the world
20200917 3 Lecture Learning to translate: engineering, social, and scientific motivations
Lecture Language structures thought (Sapir-Whorf hypothesis); "It's all Chinese to me": linguistic complexity; challenges in modeling translation [at tutorial]
Lecture Is machine translation intelligent? Interactive simulation; Turing test; translation based test of AI; error analysis; what's still missing in AI
Lecture Introduction to search
Lecture State spaces; the anagram problem
Lecture Conditional decomposition; Markovian approximations; memoryless property; Markov models
Lecture Character n-gram models; goal states and objective functions
Lecture Uninformed search; BFS, DFS, depth-bounded search, iterative deepening, bidirectional search
Lecture Informed search; greedy best-first search; agenda-driven search; search fringe; Dijkstra's shortest path algorithm
Lecture Implementing agenda-driven search for finding best anagrams (Assignment due TBA); physical demo of agenda-driven search for Dijkstra's algorithm; anagrams with replacement
Lecture Basic probability theory; conditional probabilities; Bayes' theorem
Lecture Chinese anagrams; word vs character n-grams; first- and second-order n-grams; Shannon
Lecture hidden Markov models, finite-state models; weighted and stochastic FSAs; parts of speech; generation vs recognition/parsing
Lecture Converting state-based to transition based FSAs for both logical and stochastic FSAs; segmental HMM/SFA/WFSAs; WFST: finite-state translation models
Lecture HMM/SFSA/WFSA decoding, evaluation, learning: unrolling in time; ways to search the lattice; formalization for Viterbi decoding and evaluation [slides]
Lecture Introduction to neural networks and their language biases
Lecture Implementing feedforward neural network n-gram models; model design following scientific method for machine learning/adaptation in practice (Assignment due TBA)
Lecture Search bias; forward algorithm for HMM/SFSA/WFSAs
Lecture Training HMM/SFSA/WFSAs: backward algorithm; completing missing data; expectations; EM (expectation maximization); Baum-Welch parameter estimation
Lecture RNN: recurrent neural networks
Lecture Implementing RNNs; RNN language models; recurrent model design (Assignment due TBA); memoization/caching; HMM alignment models; memory-bounded search; heuristics; admissibility; A* search; memory-bounded heuristic search
Lecture Revisiting knowledge representation and language bias; propositional logic; conjunctive normal form; AND/OR graphs; AND/OR hypergraphs; forward chaining
Lecture Definite clauses; definite clause grammars; Knuth's algorithm; context-free grammars (CFGs); weighted and stochastic CFGs
Lecture Probabilistic dynamic programming based chart parsing; probabilistic Cocke-Kasami-Younger (CKY) parsing; inside-outside algorithm; parsing by theorem proving; productions as inference rules; backward chaining; abductive (diagnostic or explanatory) inference vs deductive inference
Lecture From CFGs to ITGs (monolingual vs bilingual modeling); how bilingual conditions make grammar induction easier; the mystery of the magic number 4 in semantic frames; simple and full syntax-directed transduction grammars (SDTGs); tree vs matrix constituent alignment visualizations
Lecture Exploring inversion transduction grammars (ITG); ITG characteristics; stochastic ITGs; polynomial-time transduction and learning; resolving the mystery of the magic number 4
Lecture Evaluating translation quality: alignment; aligning semantic frames: Interactive exercise
Lecture Evaluating translation quality: MEANT
Lecture Evaluating translation quality: semantic role labeling (SRL), case frames, semantic frames, predicate-argument structure
Lecture Automatic semantic role labeling (ASRL)
Lecture Implementing a feedforward neural network based part-of-speech tagger (Assignment due TBA); context-independent POS tagging
Lecture I/O representations for feedforward networks; context-dependent POS tagging
Tutorial Basic probability theory; conditional probabilities; Bayes' theorem
Lecture Example-based, instance-based, memory-based, case-based, analogy-based, lazy learning for classification; translation via nearest neighbors (NN); k-NN; weighted k-NN
Lecture Learning vs performance components in machine learning; supervised learning; Word sense disambiguation; lexical choice; example-based prediction models; nearest neighbor classifiers; similarity metrics; kNN classifiers
Lecture Exploring different feedforward neural network architectures for POS tagging; model design following scientific method for machine learning in practice (Assignment due TBA)
Lecture Naive Bayes classifiers for WSD and lexical choice
Lecture Modern approaches to SRL
Lecture Implementing chunkers and shallow parsers via IOBES tagging plus a POS tagger (Assignment due TBA)
Lecture Chunking via IOBES representations; shallow bracketing
Lecture Shallow syntactic parsing; shallow semantic parsing; language bias of IOBES representations, bags of words, and one-hot representations
Lecture Introduction to word embeddings
Lecture Vector space models; classic word vector approaches
Lecture Learning word embeddings via prediction tasks; skip-grams; word2vec;
Lecture Recursive autoencoders (RAEs) and recursive auto-associative memories (RAAM); learning word embeddings by making RAEs predict
Lecture Context-free grammars (CFGs); generative vs parsing models; top-down vs bottom-up parsing; dynamic programming based chart parsing; Cocke-Kasami-Younger (CKY) parsing
Lecture Recursive autoencoders (RAE) and recursive auto-associative memory (RAAM); TRAAM (transduction RAAM); recursive neural network realizations of ITGs; a self-learning rap battle bot
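To give a flavor of the algorithms in the schedule above, here is a minimal sketch of Viterbi decoding for a discrete HMM, one of the core techniques covered in the HMM/SFSA/WFSA lectures. It is written in Python purely for illustration (course assignments use Chicken Scheme), and the tiny two-tag, three-word model is an invented toy, not course material.

```python
# Minimal Viterbi decoding for a discrete HMM (illustrative sketch).
# All states, observations, and probabilities below are invented toy values.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, state sequence) of the most probable path for obs."""
    # V[t][s] = (prob of best path ending in state s at time t, that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][prev][0] * trans_p[prev][s] * emit_p[s][obs[t]],
                 V[t - 1][prev][1])
                for prev in states)
            V[t][s] = (prob, path + [s])
    return max(V[-1].values())

# Toy POS-tagging-flavored example: two hidden tags, three possible words.
states = ("DET", "NOUN")
start_p = {"DET": 0.7, "NOUN": 0.3}
trans_p = {"DET": {"DET": 0.1, "NOUN": 0.9},
           "NOUN": {"DET": 0.4, "NOUN": 0.6}}
emit_p = {"DET": {"the": 0.8, "dog": 0.1, "barks": 0.1},
          "NOUN": {"the": 0.1, "dog": 0.5, "barks": 0.4}}

prob, path = viterbi(["the", "dog"], states, start_p, trans_p, emit_p)
```

The dynamic program unrolls the HMM in time over the observation lattice, keeping only the single best-scoring path into each state at each step, exactly the "unrolling in time" view used in the decoding lectures.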