All lectures and tutorials will be held ONLINE LIVE INTERACTIVELY at the regularly scheduled times.
You can find the recurring Zoom meetings for the lectures and tutorials in Canvas, and you are strongly encouraged to join the meetings from there. Alternatively, you must register for the lectures and tutorials at the following links; after registering, you will receive a confirmation email containing information about joining the meeting. Note that these Zoom meetings admit only authenticated users with ITSC accounts (with domain connect.ust.hk or ust.hk), and you can join the meetings only via one of the two paths above.
After you are registered, you may use the following links to join the lectures and tutorials:
If you haven’t done so, please watch this video to get your HKUST Zoom account ready as soon as possible, not just for this course but also for all other courses at HKUST:
Lecture 1: TuTh 09:00-10:20, Rm TBA
Tutorial 1: M 16:00-16:50, Rm TBA
Office hours: TuTh 10:30-11:00. The TA's office hours are posted at http://course.cs.ust.hk/comp4221/ta/.
Course: http://www.cs.ust.hk/~dekai/4221/ is the master home page for the course.
Tutorial: http://course.cs.ust.hk/comp4221/ta/ contains all information for the tutorials.
Forum: http://comp151.cse.ust.hk/~dekai/content/?q=forum/3 is where all discussion outside class should be done. Always read the forum before asking, posting, or emailing your question. Note that you must register for your forum account at the first lecture, tutorial, or lab.
COMP 4221. Human language technology for text and spoken language. Machine learning, syntactic parsing, semantic interpretation, and context-based approaches to machine translation, text mining, and web search.
Human language technology for processing text and spoken language. Fundamental machine learning, syntactic parsing, semantic interpretation, and context models, algorithms, and techniques. Applications include machine translation, web technologies, text mining, knowledge management, cognitive modeling, intelligent dialog systems, and computational linguistics.
At the end of the Natural Language Processing course, you will have achieved the following outcomes.
To receive a passing grade, you are required to sign an honor statement acknowledging that you understand and will uphold all policies on plagiarism and collaboration.
All materials submitted for grading must be your own work. Do not engage in any form of copying, whether copying other people's work or allowing others to copy yours. If you are found to be involved in an incident of plagiarism, you will receive a failing grade for the course and the incident will be reported for appropriate disciplinary action.
University policy requires that students who cheat more than once be expelled. Please review the cheating topic from your UST Student Guide.
Warning: sophisticated plagiarism detection systems are in operation!
You are encouraged to collaborate in study groups. However, you must write up solutions on your own. You must also acknowledge your collaborators in the write-up for each problem, whether or not they are classmates. Failure to follow these rules will be treated as plagiarism.
Course grading will be adjusted to the difficulty of assignments and exams. Moreover, I guarantee you the following.
If you achieve | you will receive at least |
85% | A |
75% | B |
65% | C |
55% | D |
Your grade will be determined by a combination of factors:
Exams | 0% (due to university coronavirus measures) |
Pop quizzes | ~10% |
Class participation | ~15% |
Forum participation | ~10% |
Assignments | ~65% |
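To make the arithmetic concrete, here is a purely illustrative sketch (in Scheme, since course assignments use Chicken Scheme) of how the approximate weights above combine with the guaranteed grade cut-offs. The component scores are hypothetical, and this is not the official grading formula:

;; Purely illustrative: hypothetical component scores combined with the
;; approximate weights above, then mapped to the guaranteed minimum grade.
(define weights '((quizzes . 0.10) (class . 0.15) (forum . 0.10) (assignments . 0.65)))
(define scores  '((quizzes . 82)   (class . 90)   (forum . 75)   (assignments . 88)))  ; hypothetical

(define (overall)
  (apply + (map (lambda (w) (* (cdr w) (cdr (assq (car w) scores)))) weights)))

(define (guaranteed-minimum total)
  (cond ((>= total 85) "A")
        ((>= total 75) "B")
        ((>= total 65) "C")
        ((>= total 55) "D")
        (else "no guarantee")))

(print "overall: " (overall) "; guaranteed at least: " (guaranteed-minimum (overall)))

With these hypothetical scores the overall is 86.4, which falls in the 85% band, so the guaranteed minimum grade would be an A.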
No reading material is allowed during the examinations. No make-ups will be given unless prior approval is granted by the instructor, or you have a documented medical condition (with a physician's note) on the day of the examination. In addition, absence from the final examination results in automatic failure of the course according to university regulations, unless prior approval is obtained from the department head.
Science and engineering (including software engineering!) are about communication between people. Good participation in class will count for approximately 15%, and good participation in the online forum will count for approximately 10%.
All assignments must be submitted by 23:00 on the due date. Assignments will be collected electronically using the automated CASS assignment collection system. Late assignments cannot be accepted. Sorry, in the interest of fairness, exceptions cannot be made.
Scheme programming assignments must run under Chicken Scheme on Linux.
Programming assignments will account for a total of approximately 65%.
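As a quick sanity check that your setup meets the Chicken Scheme requirement, here is a minimal sketch of a script that runs under Chicken Scheme on Linux. The file name hello-count.scm and the word-counting task are illustrative, not an actual assignment:

;; hello-count.scm -- minimal Chicken Scheme script: counts the words on each line of stdin.
(import (chicken io)        ; read-line
        (chicken string))   ; string-split

(define (count-words line)
  (length (string-split line)))

(let loop ((line (read-line)))
  (unless (eof-object? line)
    (print (count-words line) " words: " line)
    (loop (read-line))))

On a Linux machine with CHICKEN 5 installed, you can run it interpreted with csi -s hello-count.scm < some-text-file, or compile it with csc hello-count.scm -o hello-count. Make sure whatever you submit runs this way before the deadline.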
Any linked material (unless labeled "Supplementary references") is required reading that you are responsible for.
We will choose from the topics in the list below.
date | wk | event | topic | |
20200908 | 1 | Lecture | Welcome; introduction; survey; administrivia (honor statement, HKUST classroom conduct) | |
20200910 | 2 | Lecture | Does God play dice? Assumptions: scientific method, hypotheses, models, learning, probability; linguistic relativism and the Sapir-Whorf hypothesis; inductive bias, language bias, search bias; the great cycle of intelligence | |
20200915 | 2 | Lecture | Languages of the world | |
20200917 | 3 | Lecture | Learning to translate: engineering, social, and scientific motivations | |
| | Lecture | Language structures thought (Sapir-Whorf hypothesis); "It's all Chinese to me": linguistic complexity; challenges in modeling translation [at tutorial] |
| | Lecture | Is machine translation intelligent? Interactive simulation; Turing test; translation-based test of AI; error analysis; what's still missing in AI |
| | Lecture | Introduction to search |
| | Lecture | State spaces; the anagram problem |
| | Lecture | Conditional decomposition; Markovian approximations; memoryless property; Markov models |
| | Lecture | Character n-gram models; goal states and objective functions |
| | Lecture | Uninformed search: BFS, DFS, depth-bounded search, iterative deepening, bidirectional search |
| | Lecture | Informed search; greedy best-first search; agenda-driven search; search fringe; Dijkstra's shortest path algorithm |
| | Lecture | Implementing agenda-driven search for finding best anagrams (Assignment due TBA); physical demo of agenda-driven search for Dijkstra's algorithm; anagrams with replacement |
| | Lecture | Basic probability theory; conditional probabilities; Bayes' theorem |
| | Lecture | Chinese anagrams; word vs character n-grams; first- and second-order n-grams; Shannon |
| | Lecture | Hidden Markov models; finite-state models; weighted and stochastic FSAs; parts of speech; generation vs recognition/parsing |
| | Lecture | Converting state-based to transition-based FSAs for both logical and stochastic FSAs; segmental HMM/SFSA/WFSAs; WFSTs: finite-state translation models |
| | Lecture | HMM/SFSA/WFSA decoding, evaluation, learning: unrolling in time; ways to search the lattice; formalization for Viterbi decoding and evaluation [slides] |
| | Lecture | Introduction to neural networks and their language biases |
| | Lecture | Implementing feedforward neural network n-gram models; model design following the scientific method for machine learning/adaptation in practice (Assignment due TBA) |
| | Lecture | Search bias; forward algorithm for HMM/SFSA/WFSAs |
| | Lecture | Training HMM/SFSA/WFSAs: backward algorithm; completing missing data; expectations; EM (expectation maximization); Baum-Welch parameter estimation |
| | Lecture | RNNs: recurrent neural networks |
| | Lecture | Implementing RNNs; RNN language models; recurrent model design (Assignment due TBA); memoization/caching; HMM alignment models; memory-bounded search; heuristics; admissibility; A* search; memory-bounded heuristic search |
| | Lecture | Revisiting knowledge representation and language bias; propositional logic; conjunctive normal form; AND/OR graphs; AND/OR hypergraphs; forward chaining |
| | Lecture | Definite clauses; definite clause grammars; Knuth's algorithm; context-free grammars (CFGs); weighted and stochastic CFGs |
| | Lecture | Probabilistic dynamic programming based chart parsing; probabilistic Cocke-Kasami-Younger (CKY) parsing; inside-outside algorithm; parsing by theorem proving; productions as inference rules; backward chaining; abductive (diagnostic or explanatory) inference vs deductive inference |
| | Lecture | From CFGs to ITGs (monolingual vs bilingual modeling); how bilingual conditions make grammar induction easier; the mystery of the magic number 4 in semantic frames; simple and full syntax-directed transduction grammars (SDTGs); tree vs matrix constituent alignment visualizations |
| | Lecture | Exploring inversion transduction grammars (ITGs); ITG characteristics; stochastic ITGs; polynomial-time transduction and learning; resolving the mystery of the magic number 4 |
| | Lecture | Evaluating translation quality: alignment; aligning semantic frames: interactive exercise |
| | Lecture | Evaluating translation quality: MEANT |
| | Lecture | Evaluating translation quality: semantic role labeling (SRL), case frames, semantic frames, predicate-argument structure |
| | Lecture | Automatic semantic role labeling (ASRL) |
| | Lecture | Implementing a feedforward neural network based part-of-speech tagger (Assignment due TBA); context-independent POS tagging |
| | Lecture | I/O representations for feedforward networks; context-dependent POS tagging |
| | Tutorial | Basic probability theory; conditional probabilities; Bayes' theorem |
| | Lecture | Example-based, instance-based, memory-based, case-based, analogy-based, lazy learning for classification; translation via nearest neighbors (NN); k-NN; weighted k-NN |
| | Lecture | Learning vs performance components in machine learning; supervised learning; word sense disambiguation; lexical choice; example-based prediction models; nearest neighbor classifiers; similarity metrics; k-NN classifiers |
| | Lecture | Exploring different feedforward neural network architectures for POS tagging; model design following the scientific method for machine learning in practice (Assignment due TBA) |
| | Lecture | Naive Bayes classifiers for WSD and lexical choice |
| | Lecture | Modern approaches to SRL |
| | Lecture | Implementing chunkers and shallow parsers via IOBES tagging plus a POS tagger (Assignment due TBA) |
| | Lecture | Chunking via IOBES representations; shallow bracketing |
| | Lecture | Shallow syntactic parsing; shallow semantic parsing; language bias of IOBES representations, bags of words, and one-hot representations |
| | Lecture | Introduction to word embeddings |
| | Lecture | Vector space models; classic word vector approaches |
| | Lecture | Learning word embeddings via prediction tasks; skip-grams; word2vec |
| | Lecture | Recursive autoencoders (RAEs) and recursive auto-associative memories (RAAM); learning word embeddings by making RAEs predict |
| | Lecture | Context-free grammars (CFGs); generative vs parsing models; top-down vs bottom-up parsing; dynamic programming based chart parsing; Cocke-Kasami-Younger (CKY) parsing |
| | Lecture | Recursive autoencoders (RAEs) and recursive auto-associative memory (RAAM); TRAAM (transduction RAAM); recursive neural network realizations of ITGs; a self-learning rap battle bot |