Course Code: MSC-BDT5002/MSC-IT 5210, Fall 2017
Course Title: Knowledge Discovery and Data Mining

L1: Thu 7:30 - 10:20pm, Room 4619

L2: Thu 3:00 - 5:50pm, Room 2306


Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)


Assignment Hand-in by Email: (for BDT students)

(for IT students)




All the assignments are individual assignments.

Assignment 1 (PDF) (due Oct 6th, 11:59pm, 2017)  Supplementary Instruction for Assignment 1

Assignment 2 (PDF) (due Nov 13th, 11:59pm, 2017) Supplementary Instruction for Assignment 2

Assignment 3 (PDF) (due Dec 1st, 11:59pm, 2017) Data Set


Midterm Exam: Oct 12th, 2017, 7:30pm-10:30pm




Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, data mining has successfully integrated techniques from the fields of databases, statistics, and AI. This course provides a broad overview of the field and prepares students to conduct research and development in it.


· The Data Mining Process

· Preprocessing and Model Evaluation

· Classification

· Clustering

· Association Studies

· Text and Web Mining

· Social Networks

· Other Applications

Marking Scheme

The course work includes assignments, a midterm exam, and a final exam. The marking scheme is as follows.

Course Material:


1. Data Mining -- Concepts and Techniques by Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers.

2. Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Pearson International Edition, 2005.

3. Data Mining by Ian Witten and Eibe Frank. (Google books)


Welcome to MSC-BDT5002/IT5210

Midterm Exam: Oct 12th, 2017, 7:30pm-10:20pm

Midterm Exam Scores:  MSC-IT


Midterm Exam Sample Answer

Assignment 1 Scores: MSC-IT



Assignment 1 Sample Answer


Assignment 2 Scores: MSC-IT


Assignment 2 Sample Answer


Final Exam Sample Answer

Final Exam Score:  MSC-IT



The Nov 2nd lecture will be given from 7:30pm to 10:30pm in LTA.

Final Exam: Dec 7th, 7:30-10:30pm, LTB (MSC-BDT), LTC (MSC-IT)

Tentative Schedule

Date

Lecture Slides

Textbook


Thur, Sept 7th

Introduction to Courses and Data Mining  (PPT, PDF), Preparing Data (PPT, PDF)

HK, Chapters 1 and 2

Thur, Sept 14th

Data Preprocessing (PDF)

HK, Chapter 3

TSK, Chapter 2

Thur, Sept 21st

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF) (Example of FP-tree)

TSK, Chapter 6

HK, Chapter 6

Thur, Sept.


Advanced Frequent Pattern Mining (PPT, PDF)

Frequent Pattern Mining over Stream (PPT, PDF)

TSK, Chapter 7

HK, Chapter 7



Thur, Oct 12th

Midterm Exam



Thur, Oct 19th

Classification: Basic Concepts (PPT, PDF) Ensemble Methods (PPT, PDF)

HK, Chapter 8

TSK, Chapter 4

Thur, Oct 26th

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 10

Thur, Nov 2nd

Cluster Analysis: Advanced Methods (PPT, PDF)

HK, Chapter 11

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is Nearest Neighbor Meaningful?", ICDT 1999 (PDF)


R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", SIGMOD 1998



C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering for Mining Numerical Data", SIGKDD 1999 (PDF)

Thur, Nov 9th

Outlier Analysis  (PPT, PDF), LOF Example (PDF)

TSK, Chapter 10

HK, Chapter 12

Thur, Nov 16th

Data Cube and OLAP (PPT, PDF)

HK, Chapters 4 and 5

Thur, Nov 23rd

Social Networks (PPT), Collaborative Filter (PDF)

Thur, Nov 30th

Web Data Mining (PPT, PDF)


Final Review (PPT, PDF)