Class Schedule and Lecture Notes

(Tentative)

Spring 2006

 

Week of

Lectures (PPT)

 Assignments

Reading Material
WF = Witten and Frank Book;
HK = Han and Kamber Book;

Feb 1, Feb 6

Introduction to the course
Knowing your data: Input and Output
Data Preparation

Chi-squared Test and Principal Component Analysis

Fisher discriminant analysis: an intro

 

HK: Chap 1
WF: Chap 1
Paper: Mining Data, by Miriam Wasserman, Federal Reserve Bank of Boston, Regional Review, Oct 27, 2000.

Feb 13, 20

Introduction to Probability and Information Theory

Linear Regression Analysis and Linear Regression Demo

Data Mining Model Evaluation I and Data Mining Model Evaluation II

Assignment 1 is given out (Due on March 9 2006 in class) 

HK: Chap 2.1, 2.2
HK: Chap 3 (except 3.41, Wavelet Section and Natural Partitioning)

Readings on Chi-Squared Test

Readings on Principal Component Analysis

Paper on 1R by Holte

Additional Notes on Logistic Regression

Additional notes on t-test

March 1


Paper reading on PCA and Model Evaluation:

1. AUC and Measures of Performance (Charles Ling)

2. PCA in Face Recognition

3. PCA in Bioinformatics

3 Papers.ppt
 

 

 

March 1

Bayesian Classification: Bayesian intro, Part 1, and Part 2

Decision Trees (part 1 , part 2)

 

Pedro Domingos's paper:

On the Optimality of the Simple Bayesian Classifier under Zero-One Loss, Domingos & Pazzani. Machine Learning, 29, 103-130, 1997.

March 6

 

Cost-sensitive decision trees

Cost-sensitive Naive Bayes

 

Cost-sensitive trees: ICML 2004 paper

Cost-sensitive Naive Bayes: ICDM 2004 paper

March 13

Association Analysis (1) and (2)

 

 Assignment 2 is given out on March 14 (Due on March 28)

FP Growth Paper

March 20

Support Vector Machines:

http://www.cse.msu.edu/~cse802/Papers/802_SVM.ppt

(Other tutorials:  Tutorial Slides and  Tutorial Notes)

 

 

 

Bioinformatics Talk by Prof. S. Y. Kung (March 23, 2-3 pm) in Room 3301 (via lift nos. 17/18)

 

 

March 27

Clustering 

K-modes algorithm for clustering

K-Medoids Method and Improvements

EM Algorithm (slides and a note from Prof. ChengXiang Zhai's UIUC class)

 

 

Fisher's 1987 paper and a paper on Category Utility

Density-based Clustering Paper

A Gentle Tutorial on EM and HMM

Probabilistic User Behavior Models by Eren Manavoglu et al., ICDM 2003

 

April 3

Web and Text Mining

1. J. Kleinberg: Authoritative sources in a hyperlinked environment. Journal of the ACM 46 (1999).

2. A related PPT presentation: A Close Look at HITS

3. Google's PageRank Algorithm

 

April 3

CRM

Assignment 3 is given out on April 4 (Due on April 25)

1. S. H. S. Chee, J. Han, and K. Wang, "RecTree: An Efficient Collaborative Filtering Method"

2. M. Richardson and P. Domingos, "Mining Knowledge-Sharing Sites for Viral Marketing"

3. Ke Wang and Ming-Yen Su, "Item selection by 'hub-authority' profit ranking," SIGKDD 2002, Edmonton, Slides

April 10

Topic Selection For the Course Project

 Boosting, Bagging and Ensemble learning

 

A Short Introduction to Boosting

April 20

Semi-supervised Learning, Co-training: a tutorial, Applications on the Web

 

 A survey on semi-supervised learning

April 24

Data Mining in Data Streams: A tutorial on mining data streams, and PPT tutorial slides from HKU on the CVFDT algorithm

 

Mining Time-Changing Data Streams, Hulten et al., ACM KDD 2001

May 1

May 1 is a holiday. Tentative date for the Midterm Exam: Thursday, May 4 (an example midterm exam from a similar course)

 

 

May 8, 15

(student presentations: 25 min each: 20 min for presentation, 5 min for questions)

 

 

 

Other References:

 

  1. CBC: Clustering Based Text Classification Requiring Minimal Labeled Data (Microsoft Research Beijing)
  2. Bing Liu, Building Text Classifiers Using Positive and Unlabeled Examples
  3. Microsoft Working Paper on Mining Web Logs
  4. iMAP: Discovering Complex Semantic Mappings between Database Schemas, Robin Dhamankar, et al. SIGMOD 2004.
  5. F. Li and Y. Yang. A loss function analysis for classification methods in text categorization (ps.gz). The Twentieth International Conference on Machine Learning (ICML'03), pp. 472-479, 2003.
  6. Boosting Lazy Decision Trees
  7. C.X. Ling and C. Li. Data Mining for Direct Marketing - Specific Problems and Solutions. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98), pages 73-79, 1998. (PS file)
     
  8. Ke Wang, Senqiang Zhou, Jack Man Shun Yeung, Qiang Yang, Mining Customer Value: From Association Rules to Direct Marketing, In Proceedings of the IEEE International Conference on Data Engineering (ICDE 03), March 2003, Bangalore, India, pages 738-740.
  9. Qiang Yang, Jie Yin, Charles Ling and Tielin Chen, Post-processing Decision Trees to Extract Actionable Knowledge, Proceedings of the 2003 IEEE International Conference on Data Mining (ICDM 2003), Florida, USA, November 2003. IEEE Computer Society.