Course Code: MSC-BDT5002/MSC-IT 5210, Fall 2017
Course Title: Knowledge Discovery and Data Mining

L1: Thu 7:30 - 10:20pm, Room 4619

L2: Thu 3:00 - 5:50pm, Room 2306


Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)


Assignment Hand-in by Email: (for BDT students)

(for IT students)




All the assignments are individual assignments.

Assignment 1 (PDF) (due Oct 6th, 2017, 11:59pm)  Supplementary Instruction for Assignment 1

Assignment 2 (PDF) (due Nov 3rd, 2017, 11:59pm)

Assignment 3 (PDF) (due Dec 1st, 2017, 11:59pm)


Midterm Exam: Oct 12th, 2017, 7:30pm-10:30pm, Venue: TBD




Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, data mining has successfully integrated techniques from the fields of databases, statistics, and AI. This course provides a broad overview of the field and prepares students to conduct research and development in it.


· The Data Mining Process

· Preprocessing and Model Evaluation

· Classification

· Clustering

· Association Studies

· Text and Web Mining

· Social Networks

· Other Applications

Marking Scheme

The course work includes assignments, a midterm exam, and a final exam. The marking scheme is as follows.

Course Material:


1. Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers.

2. Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Pearson International Edition, 2005.

3. Data Mining by Ian Witten and Eibe Frank. (Google books)


Welcome to MSC-BD5002/IT5210

Midterm Exam: Oct 12th, 2017, 7:30pm-10:20pm

Tentative Schedule



Lecture Slides

Textbook


Thur, Sept 7th

Introduction to Courses and Data Mining (PPT, PDF), Preparing Data (PPT, PDF)

HK, Chapters 1 and 2

Thur, Sept 14th

Data Preprocessing (PDF)

HK, Chapter 3

TSK, Chapter 2

Thur, Sept 21st

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF) (Example of FP-tree)

TSK, Chapter 6

HK, Chapter 6

Thur, Sept.


Advanced Frequent Pattern Mining (PPT, PDF)

Frequent Pattern Mining over Stream (PPT, PDF)

TSK, Chapter 7

HK, Chapter 7



Thur, Oct 12th

Midterm Exam



Thur, Oct 19th

Classification: Basic Concepts (PPT, PDF) Ensemble Methods (PPT, PDF)

HK, Chapter 8

TSK, Chapter 4


Thur, Oct 26th


Classification: Advanced Methods (PPT, PDF)

HK, Chapter 9

TSK, Chapter 5


Thur, Nov 2nd

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 10


Thur, Nov 9th

Cluster Analysis: Advanced Methods (PPT, PDF)

HK, Chapter 11

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is Nearest Neighbor Meaningful?", ICDT 1999 (PDF)


R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", SIGMOD 1998 (PDF)


C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering for Mining Numerical Data", SIGKDD 1999 (PDF)


Thur, Nov 16th

Outlier Analysis (PPT, PDF), LOF Example (PDF)

Midterm Review (PPT)

TSK, Chapter 10

HK, Chapter 12


Thur, Nov 23rd

Data Cube and OLAP (PPT, PDF)

HK, Chapters 4 and 5


Thur, Nov


Social Networks (PPT), Social Network Privacy (PPT, PDF)



Web Data Mining (PPT, PDF)


Final Review (PPT, PDF)