Course Code: MSC-BDT5002/MSC-IT5210, Fall 2017
Course Title: Knowledge Discovery and Data Mining

L1: Thu 7:30 - 10:20pm, Room 4619

L2: Thu 3:00 - 5:50pm, Room 2306

Instructor:

Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)

TAs:

Assignment Hand-in by Email: mscbdt5002fall17@gmail.com (for BDT students), mscit5210fall17@gmail.com (for IT students)

 

 

Assignments

All the assignments are individual assignments.

Assignment 1 (PDF) (due Oct 6th, 11:59pm, 2017), Supplementary Instruction for Assignment 1

Assignment 2 (PDF) (due Nov 13th, 11:59pm, 2017), Supplementary Instruction for Assignment 2

Assignment 3 (PDF) (due Dec 1st, 11:59pm, 2017), Data Set

 

Midterm Exam: Oct 12th, 2017, 7:30pm-10:30pm

 

Final  

 

Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, data mining has successfully integrated techniques from the fields of databases, statistics, and AI. This course provides a broad overview of the field, preparing students to conduct research and development in it.

Topics:

· The Data Mining Process
· Preprocessing and Model Evaluation
· Classification
· Clustering
· Association Studies
· Text and Web Mining
· Social Networks
· Other Applications

Marking Scheme

The coursework includes assignments, a midterm exam, and a final exam. The marking scheme is as follows.

Course Material:

Textbooks

1. Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers.

2. Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Pearson International Edition, 2005.

3. Data Mining by Ian Witten and Eibe Frank. (Google Books)
 

News:

Welcome to MSC-BDT5002/MSC-IT5210

Midterm Exam: Oct 12th, 2017, 7:30pm-10:20pm

Midterm Exam Scores: MSC-IT, MSC-BDT

Midterm Exam Sample Answer

Assignment 1 Scores: MSC-IT, MSC-BDT

 

Assignment 1 Sample Answer

 

Assignment 2 Scores: MSC-IT, MSC-BDT

Assignment 2 Sample Answer

 

Final Exam Sample Answer

Final Exam Scores: MSC-IT, MSC-BDT

 

The Nov 2nd lecture will be given from 7:30pm to 10:30pm in LTA.

Final Exam: Dec 7th, 7:30-10:30pm, LTB (MSC-BDT), LTC (MSC-IT)

Tentative Schedule

 

Date

Lecture Slides

Textbook

Video

Thur, Sept 7th

Introduction to Courses and Data Mining (PPT, PDF), Preparing Data (PPT, PDF)

HK, Chapters 1 and 2


http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170907_86354

Thur, Sept 14th

Data Preprocessing (PDF)

HK, Chapter 3

TSK, Chapter 2


http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170914_60635

Thur, Sept 21st

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF) (Example of FP-tree)

TSK, Chapter 6

HK, Chapter 6

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170921_21994

Thur, Sept 28th

Advanced Frequent Pattern Mining (PPT, PDF)

Frequent Pattern Mining over Stream (PPT, PDF)

TSK, Chapter 7

HK, Chapter 7 

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170928_95021

 

 

Thur, Oct 12th

Midterm Exam

 

 

Thur, Oct 19th

Classification: Basic Concepts (PPT, PDF) Ensemble Methods (PPT, PDF)

HK, Chapter 8

TSK, Chapter 4

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171019_34474

Thur, Oct 26th

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 10

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171026_72223

Thur, Nov 2nd

Cluster Analysis: Advanced Methods (PPT, PDF)

HK, Chapter 11

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is Nearest Neighbor Meaningful?", ICDT 1999 (PDF)

 

R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", SIGMOD 1998 (PDF)

 

C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering for Mining Numerical Data", SIGKDD 1999 (PDF)

https://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171102_43373

Thur, Nov 9th

Outlier Analysis (PPT, PDF), LOF Example (PDF)

TSK, Chapter 10

HK, Chapter 12

https://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171109_40757

Thur, Nov 16th

Data Cube and OLAP (PPT, PDF)

HK, Chapters 4 and 5

https://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171116_92333

Thur, Nov 23rd

Social Networks (PPT), Collaborative Filtering (PDF)

https://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171123_48902

Thur, Nov 30th

Web Data Mining (PPT, PDF)

 

Final Review (PPT, PDF)

 


https://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_171130_11305