Topic Selection For COMP 537 Spring 2004
Requirements:
1. One or two people per group, but not three.
2. Projects must involve experiments on selected benchmark datasets.
3. Topics can be either a re-implementation of a recent
paper or a new idea. Approval is required.
4. Weeks of May 3 and 10: present your ideas using
PowerPoint, similar to a conference presentation (20 minutes for the
presentation, 10 minutes for questions).
5. One-page proposal due on April 13, and final
term paper due on May 13.
6. How to search for topics: start from the papers and links listed
under Potential Topics below.
7. After you have an idea: talk to the instructor in person, and
schedule weekly meetings with the instructor and your group member.
8. Each student must choose a paper, as part of the
related work within the selected topic, to present
during the weeks of April 19 and 26.
Marks Assignment
1. Whole project = 60% of the course marks, broken down as follows:
2. Proposal: 5%
3. Individual paper presentation: 5%
4. Final Presentation: 10%
5. Term Paper: 40%
Potential Topics
- Continue the research on cost-sensitive learning when there are missing
values and tests have costs. Consider different situations, for example,
when all tests must be done at once, or when the total test cost cannot
exceed an upper limit. Consider applying cost-sensitive learning to
different classifiers, or using it for experimental design in, for
example, bioinformatics. Consider combining strategies for predicting
missing values with strategies for performing the tests. Explore the use
of MDPs to do cost-sensitive learning. Test on UCI datasets. See the
paper I assigned you.
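The budget-limited scenario above can be made concrete with a small sketch. This is only an illustration with made-up test costs and savings, not a method from the assigned paper: greedily choose the test with the best expected reduction in misclassification cost per unit of test cost, without exceeding the budget.

```python
# A minimal sketch (hypothetical numbers) of budgeted test selection:
# rank candidate tests by benefit-to-cost ratio and take them greedily
# while the cost budget allows.

def select_tests(tests, budget):
    """tests: dict name -> (test_cost, expected_misclassification_cost_saved)."""
    chosen, spent = [], 0.0
    # Rank tests by benefit-to-cost ratio (a simple greedy heuristic).
    ranked = sorted(tests.items(), key=lambda kv: kv[1][1] / kv[1][0],
                    reverse=True)
    for name, (cost, saving) in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

# Hypothetical costs/savings, e.g., medical tests for a diagnosis task.
tests = {"blood": (10.0, 30.0), "xray": (50.0, 40.0), "ecg": (20.0, 25.0)}
print(select_tests(tests, budget=40.0))  # -> (['blood', 'ecg'], 30.0)
```

A greedy ratio heuristic is not optimal in general (the problem is knapsack-like), which is exactly the kind of gap a project could explore.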
- Explore using different measures for machine learning, such as AUC
and/or loss functions, to design new algorithms. Test on UCI datasets or
information retrieval benchmarks. (See
Flach's ICML 2002 paper.)
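For reference, AUC can be computed directly from classifier scores via the Mann-Whitney formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative. The labels and scores below are made up for illustration.

```python
# A small sketch of computing AUC from scores (Mann-Whitney statistic),
# the kind of measure this topic suggests optimizing instead of accuracy.

def auc(labels, scores):
    """AUC = P(random positive scored above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.4]
print(auc(labels, scores))  # -> 1.0 (every positive outranks every negative)
```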
- Co-learning: apply reinforcement-learning ideas and matrix algebra to
improve learning when dealing with multiple datasets that are linked.
Examples are web pages and query logs; other examples are UCI data
subsets that are interconnected. In reinforcement clustering, apply
reinforcement learning to improve clustering performance. Explore how to
learn a better classifier using co-learning. Explore the idea on web-log
mining (clustering and labeling). (See
http://www.cs.uic.edu/~liub/NSF/New-PSC.html; Zeng et al.: CBC: Clustering
Based Text Classification Requiring Minimal Labeled Data,
ICDM 2003: 443-450; and Wang et al.: ReCoM: Reinforcement Clustering of
Multi-type Interrelated Data Objects,
SIGIR 2003: 274-281.)
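The core intuition of learning over linked object types can be sketched in a few lines. This is a toy illustration with made-up pages, queries, and links, not the ReCoM algorithm itself: labels on one side of a bipartite link graph inform the other side, and are then pushed back.

```python
from collections import Counter

# A toy sketch (hypothetical data) of the reinforcement idea on interlinked
# data: web pages carry seed cluster labels, queries are labelled from
# their linked pages, and those labels are pushed back to the pages.

def majority(labels):
    # Most common label; ties broken by first appearance.
    return Counter(labels).most_common(1)[0][0]

pages = {"p1": "sports", "p2": "sports", "p3": "news"}   # seed clusters
links = [("q1", "p1"), ("q1", "p2"), ("q2", "p3")]       # query-page links

# Step 1: label each query from its linked pages.
queries = {q: majority([pages[p] for qq, p in links if qq == q])
           for q in {q for q, _ in links}}

# Step 2 (the reinforcement step): re-label pages from their linked queries.
for p in pages:
    neigh = [queries[q] for q, pp in links if pp == p]
    if neigh:
        pages[p] = majority(neigh)

print(queries["q1"], queries["q2"])  # -> sports news
```

A real system would iterate the two steps to a fixed point and use soft cluster memberships rather than hard majority votes.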
- Post-processing of classifiers to extract useful rules. While a
classifier tells you what things are, useful rules tell you how to change
from one thing to another, under certain constraints: for example, how to
change a student from grade B to grade A, based on a classifier. (See our
ICDM 2003 paper.)
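The grade-B-to-grade-A example can be phrased as a search problem. The sketch below is a hypothetical illustration, not the ICDM 2003 method: given a stand-in classifier and per-attribute change costs, enumerate candidate attribute changes and return the cheapest one that flips the prediction.

```python
from itertools import product

# A toy sketch of extracting an "actionable" change from a classifier:
# search candidate attribute values for the cheapest combination that
# the (hypothetical) classifier labels with the target class.

def classify(x):
    # Stand-in classifier: grade 'A' if enough study hours and attendance.
    return "A" if x["hours"] >= 20 and x["attendance"] >= 0.8 else "B"

def cheapest_change(x, options, costs, target="A"):
    """Try every combination of candidate attribute values; return
    (cost, changed_instance) for the cheapest one classified as `target`."""
    best = None
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        cand = dict(x)
        cand.update(zip(keys, values))
        cost = sum(costs[k] for k in keys if cand[k] != x[k])
        if classify(cand) == target and (best is None or cost < best[0]):
            best = (cost, cand)
    return best

student = {"hours": 10, "attendance": 0.9}
options = {"hours": [10, 20], "attendance": [0.7, 0.9]}
costs = {"hours": 5.0, "attendance": 2.0}
print(cheapest_change(student, options, costs))
# -> (5.0, {'hours': 20, 'attendance': 0.9})
```

Brute-force enumeration is exponential in the number of attributes; a project would need something smarter for real classifiers.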
- Handling missing values. (See the paper on
missing-value estimation methods for DNA microarrays.)
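One common approach in the microarray setting is k-nearest-neighbour imputation. The sketch below uses made-up data and a deliberately simple unweighted average; it is only in the spirit of the referenced paper, not a reproduction of it.

```python
import math

# A toy sketch of k-nearest-neighbour imputation: estimate a missing
# entry from the same column of the k rows most similar on the columns
# observed in both rows. Data below is hypothetical.

def knn_impute(data, row, col, k=2):
    """Fill data[row][col] (None) using the k most similar donor rows."""
    target = data[row]

    def dist(other):
        # Euclidean distance over columns observed in both rows.
        ds = [(target[j] - other[j]) ** 2
              for j in range(len(target))
              if j != col and target[j] is not None and other[j] is not None]
        return math.sqrt(sum(ds))

    donors = [r for r in data if r is not target and r[col] is not None]
    donors.sort(key=dist)
    return sum(r[col] for r in donors[:k]) / k

data = [[1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 3.2],
        [5.0, 5.0, 9.0]]
print(knn_impute(data, row=0, col=2))  # -> 3.1 (mean of the two closest rows)
```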