Course Code: MSC-BDT5002/MSC-IT 5210, Fall 2017
Course Title: Knowledge Discovery and Data Mining

L1: Thu 7:30 - 10:20pm, Room 4619

L2: Thu 3:00 - 5:50pm, Room 2306

Instructor:

Lei Chen (send e-mail for questions regarding the class and for arranging individual meetings)

TAs:

Assignment Hand-in by Email:

mscbdt5002fall17@gmail.com (for BDT students)

mscit5210fall17@gmail.com (for IT students)

 

 

Assignments

All the assignments are individual assignments.

Assignment 1 (PDF) (due Oct 6th, 11:59pm, 2017)  Supplementary Instruction for Assignment 1

Assignment 2 (PDF) (due Nov 3rd, 11:59pm, 2017)

Assignment 3 (PDF) (due Dec 1st, 11:59pm, 2017)

 

Midterm Exam: Oct 12th, 2017, 7:30pm-10:30pm Venue: TBD

 

Final  

 

Course Description

Data mining has emerged as a major frontier field of study in recent years. Aimed at extracting useful and interesting patterns and knowledge from large data repositories such as databases and the Web, data mining has successfully integrated techniques from the fields of databases, statistics, and AI. This course provides a broad overview of the field and prepares students to conduct research and development in it.

Topics:

· The Data Mining Process

· Preprocessing and Model Evaluation

· Classification

· Clustering

· Association Studies

· Text and Web Mining

· Social Networks

· Other Applications

Marking Scheme

The course work includes assignments, a midterm exam, and a final exam. The marking scheme is as follows.

Course Material:

Textbooks

1. Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann Publishers.

2. Introduction to Data Mining, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Pearson International Edition, 2005.

3. Data Mining, by Ian Witten and Eibe Frank. (Google books)
 

News:

Welcome to MSC-BDT5002/IT5210

Midterm Exam: Oct 12th, 2017, 7:30pm-10:20pm

Tentative Schedule

 

Date

Lecture Slides

Textbook

Video

Thur, Sept 7th

Introduction to Courses and Data Mining  (PPT, PDF), Preparing Data (PPT, PDF)

HK, Chapters 1 and 2


http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170907_86354

Thur, Sept 14th

Data Preprocessing (PDF)

HK, Chapter 3

TSK, Chapter 2


http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170914_60635

Thur, Sept 21st

Mining Frequent Patterns, Associations and Correlations: Basic Concepts and Methods (PPT, PDF) (Example of FP-tree)

TSK, Chapter 6

HK, Chapter 6

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170921_21994

Thur, Sept 28th

Advanced Frequent Pattern Mining (PPT, PDF)

Frequent Pattern Mining over Stream (PPT, PDF)

TSK, Chapter 7

HK, Chapter 7 

http://rvc.ust.hk/mgmt/media.aspx?path=17FA_CSIT5210-L1_170928_95021

 

 

Thur, Oct 12th

Midterm Exam

 

 

Thur, Oct 19th

Classification: Basic Concepts (PPT, PDF) Ensemble Methods (PPT, PDF)

HK, Chapter 8

TSK, Chapter 4

 

Thur, Oct 26th

 

Classification: Advanced Methods (PPT, PDF)

HK, Chapter 9

TSK, Chapter 5

 

Thur, Nov 2nd

Cluster Analysis: Basic Concepts and Methods (PPT, PDF)

HK, Chapter 10

 

Thur, Nov 9th

Cluster Analysis: Advanced Methods (PPT, PDF)

HK, Chapter 11

K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is Nearest Neighbor Meaningful?", ICDT 1999 (PDF)

 

R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications", SIGMOD 1998 (PDF)

 

C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering for Mining Numerical Data", SIGKDD 1999 (PDF)

 

Thur, Nov 16th

Outlier Analysis (PPT, PDF), LOF Example (PDF)

Midterm Review (PPT)

TSK, Chapter 10

HK, Chapter 12

 

Thur, Nov 23rd

Data Cube and OLAP (PPT, PDF)

HK, Chapters 4 and 5

 

Thur, Nov 30th

Social Networks (PPT), Social Network Privacy (PPT, PDF)

 

 

Web Data Mining (PPT, PDF)

 

Final Review (PPT, PDF)