Course Code: MSC-BDT5002, Spring 2020
Course Title: Knowledge Discovery and Data Mining

Lecture: Mon 7:30 - 10:20pm G010, CYT Bld


Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)



Assignment Hand-in by Email:

(all the assignments are individual ones)



Assignment 1 (Due Date : March 15th, 2020 11:59pm)

Assignment 2 (Due Date : April 7th, 2020 11:59pm)

Assignment 3 (Due Date : April 26th, 2020 11:59pm) Data Set

Assignment 4 (Due Date : May 13th, 2020 11:59pm)


Midterm Exam: March 30th, 2020, 7:30pm-10:30pm



Project (Due Date : May 22nd, 2020 11:59pm)




Project Group




Take-home exam period: May 29th to June 1st (the exam paper will be released at 8:00pm on May 29th and is due at 8:00am on June 1st)


Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, data mining has successfully integrated techniques from the fields of databases, statistics, and AI. This course provides a broad overview of the field and prepares students to conduct research and development in it.


      The Data Mining Process

      Preprocessing and Model Evaluation



      Association Studies

      Text and Web Mining

      Social Networks

      Other Applications

Marking Scheme

The coursework includes assignments, a project, a midterm exam, and a final exam. The marking scheme is as follows.

      Assignments -- 20%

      Project -- 20%

      Exams -- 60%

o   Midterm -- 20%

o   Final -- 40%

Course Material:


1.    Data Mining -- Concepts and Techniques by Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers.

2.   Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson International Edition, 2005.

3.    Data Mining by Ian Witten and Eibe Frank. (Google books)


Welcome to MSC-BD5002




Lecture Slides

Textbook



Mon, Feb 24th

Introduction to Courses and Data Mining (PPT, PDF), Preparing Data (PPT, PDF), Data Preprocessing (PDF)

HK, Chapters 1 and 2

Mon, March 2nd

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF) (Example of FP-tree)

HK, Chapter 6

TSK, Chapter 2


Mon, March 9th

Advanced Frequent Pattern Mining (PPT, PDF)

Frequent Pattern Mining over Stream (PPT, PDF)

HK, Chapter 7

TSK, Chapter 3



Mon, March 16th

Classification: Basic Concepts (PPT, PDF)

TSK, Chapter 8

HK, Chapter 4 




Ensemble Methods (PPT, PDF)

TSK, Chapter 8

HK, Chapter 4 


Mon, March 30th

Midterm Exam




Mon, April 6th

Classification: Advanced Methods (PPT, PDF)

NN-intro (PDF, thanks for the slides of Brian Thompson)

NN Example (PPT, PDF)


HK, Chapter 9

TSK, Chapter 4

HK, Chapter 10


Tue, April 20th

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 11

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is Nearest Neighbor Meaningful?", ICDT 1999 (PDF)


R. Agrawal, J. Gehrke, D. Gunopulos, P Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", SIGMOD 1998



C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering for Mining Numerical Data", SIGKDD 1999 (PDF)

Mon, April 27th

Cluster Analysis: Advanced Methods (PPT, PDF)

Mon, May 4th

Outlier Analysis (PPT, PDF), LOF Example (PDF)

TSK, Chapter 10

HK, Chapter 12

Mon, May 11th

Social Networks (PPT)

HK, Chapters 4 and 5

Mon, May 18th

Web Data Mining (PPT, PDF)

Graph Neural Networks