Topic Selection For COMP 537 Spring 2004
Requirements:
1. One or two people per group, but not three.
2. Projects must involve experiments on selected benchmark datasets.
3. Topics can be either a re-implementation of a recent
paper or a new idea. Approval is required.
4. Weeks of May 3 and 10: present your ideas using
PowerPoint, similar to a conference presentation (20 minutes for the
presentation, 10 minutes for questions).
5. One-page proposal due on April 13, and final
term paper due on May 13.
6. How to search for topics: start from the papers and links listed
under Potential Topics below.
7. After you have an idea: talk to the instructor in person, and
schedule weekly meetings with the instructor and your group member.
8. Each student must choose a paper, as part of the
related work within the selected topic, to present
during the weeks of April 19 and 26.
Marks Assignment
1. Whole project = 60% of the course marks, broken down as follows:
2. Proposal: 5%
3. Individual paper presentation: 5%
4. Final Presentation: 10%
5. Term Paper: 40%
Potential Topics
- Continue the research on cost-sensitive learning when there are missing
values and tests have costs. Consider different situations, for example,
when all tests must be done at once, or when the total test cost cannot
exceed an upper limit. Consider applying cost-sensitive learning to
different classifiers, or using it for experimental design in, for
example, bioinformatics. Consider combining strategies for predicting
missing values with strategies for performing the tests. Explore the use
of MDPs to do cost-sensitive learning. Test on UCI datasets. See the
paper I assigned you.
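The budget-limited scenario above can be made concrete with a small sketch. This is only an illustration with made-up test costs and savings, not a method from the assigned paper: greedily choose the test with the best expected reduction in misclassification cost per unit of test cost, without exceeding the budget.

```python
# A minimal sketch (hypothetical numbers) of budgeted test selection:
# rank candidate tests by benefit-to-cost ratio and take them greedily
# while the cost budget allows.

def select_tests(tests, budget):
    """tests: dict name -> (test_cost, expected_misclassification_cost_saved)."""
    chosen, spent = [], 0.0
    # Rank tests by benefit-to-cost ratio (a simple greedy heuristic).
    ranked = sorted(tests.items(), key=lambda kv: kv[1][1] / kv[1][0],
                    reverse=True)
    for name, (cost, saving) in ranked:
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

# Hypothetical costs/savings, e.g., medical tests for a diagnosis task.
tests = {"blood": (10.0, 30.0), "xray": (50.0, 40.0), "ecg": (20.0, 25.0)}
print(select_tests(tests, budget=40.0))  # -> (['blood', 'ecg'], 30.0)
```

A greedy ratio heuristic is not optimal in general (the problem is knapsack-like), which is exactly the kind of gap a project could explore.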
- Explore using different measures for machine learning, such as AUC
and/or loss functions, to design new algorithms. Test on UCI datasets or
information retrieval benchmarks. (See
Flach's ICML 2002 paper.)
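For reference, AUC can be computed directly from classifier scores via the Mann-Whitney formulation: the probability that a randomly chosen positive is scored above a randomly chosen negative. The labels and scores below are made up for illustration.

```python
# A small sketch of computing AUC from scores (Mann-Whitney statistic),
# the kind of measure this topic suggests optimizing instead of accuracy.

def auc(labels, scores):
    """AUC = P(random positive scored above random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.8, 0.4]
print(auc(labels, scores))  # -> 1.0 (every positive outranks every negative)
```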
- Co-learning: apply reinforcement-learning ideas and matrix algebra to
improve learning when dealing with multiple datasets that are linked.
Examples are web pages and query logs; other examples are UCI data
subsets that are interconnected. In reinforcement clustering, apply
reinforcement learning to improve clustering performance. Explore how to
learn a better classifier using co-learning. Explore the idea on web-log
mining (clustering and labeling). (See
http://www.cs.uic.edu/~liub/NSF/New-PSC.html; Zeng et al.: CBC: Clustering
Based Text Classification Requiring Minimal Labeled Data,
ICDM 2003: 443-450; and Wang et al.: ReCoM: Reinforcement Clustering of
Multi-type Interrelated Data Objects,
SIGIR 2003: 274-281.)
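The core intuition of learning over linked object types can be sketched in a few lines. This is a toy illustration with made-up pages, queries, and links, not the ReCoM algorithm itself: labels on one side of a bipartite link graph inform the other side, and are then pushed back.

```python
from collections import Counter

# A toy sketch (hypothetical data) of the reinforcement idea on interlinked
# data: web pages carry seed cluster labels, queries are labelled from
# their linked pages, and those labels are pushed back to the pages.

def majority(labels):
    # Most common label; ties broken by first appearance.
    return Counter(labels).most_common(1)[0][0]

pages = {"p1": "sports", "p2": "sports", "p3": "news"}   # seed clusters
links = [("q1", "p1"), ("q1", "p2"), ("q2", "p3")]       # query-page links

# Step 1: label each query from its linked pages.
queries = {q: majority([pages[p] for qq, p in links if qq == q])
           for q in {q for q, _ in links}}

# Step 2 (the reinforcement step): re-label pages from their linked queries.
for p in pages:
    neigh = [queries[q] for q, pp in links if pp == p]
    if neigh:
        pages[p] = majority(neigh)

print(queries["q1"], queries["q2"])  # -> sports news
```

A real system would iterate the two steps to a fixed point and use soft cluster memberships rather than hard majority votes.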
- Post-processing of classifiers to extract useful rules. While a
classifier tells you what things are, useful rules tell you how to change
from one thing to another, under certain constraints: for example, how to
change a student from grade B to grade A, based on a classifier. (See our
ICDM 2003 paper.)
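The grade-B-to-grade-A example can be phrased as a search problem. The sketch below is a hypothetical illustration, not the ICDM 2003 method: given a stand-in classifier and per-attribute change costs, enumerate candidate attribute changes and return the cheapest one that flips the prediction.

```python
from itertools import product

# A toy sketch of extracting an "actionable" change from a classifier:
# search candidate attribute values for the cheapest combination that
# the (hypothetical) classifier labels with the target class.

def classify(x):
    # Stand-in classifier: grade 'A' if enough study hours and attendance.
    return "A" if x["hours"] >= 20 and x["attendance"] >= 0.8 else "B"

def cheapest_change(x, options, costs, target="A"):
    """Try every combination of candidate attribute values; return
    (cost, changed_instance) for the cheapest one classified as `target`."""
    best = None
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        cand = dict(x)
        cand.update(zip(keys, values))
        cost = sum(costs[k] for k in keys if cand[k] != x[k])
        if classify(cand) == target and (best is None or cost < best[0]):
            best = (cost, cand)
    return best

student = {"hours": 10, "attendance": 0.9}
options = {"hours": [10, 20], "attendance": [0.7, 0.9]}
costs = {"hours": 5.0, "attendance": 2.0}
print(cheapest_change(student, options, costs))
# -> (5.0, {'hours': 20, 'attendance': 0.9})
```

Brute-force enumeration is exponential in the number of attributes; a project would need something smarter for real classifiers.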
- Handling missing values. (See the paper on
missing-value estimation methods for DNA microarrays.)
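One common approach in the microarray setting is k-nearest-neighbour imputation. The sketch below uses made-up data and a deliberately simple unweighted average; it is only in the spirit of the referenced paper, not a reproduction of it.

```python
import math

# A toy sketch of k-nearest-neighbour imputation: estimate a missing
# entry from the same column of the k rows most similar on the columns
# observed in both rows. Data below is hypothetical.

def knn_impute(data, row, col, k=2):
    """Fill data[row][col] (None) using the k most similar donor rows."""
    target = data[row]

    def dist(other):
        # Euclidean distance over columns observed in both rows.
        ds = [(target[j] - other[j]) ** 2
              for j in range(len(target))
              if j != col and target[j] is not None and other[j] is not None]
        return math.sqrt(sum(ds))

    donors = [r for r in data if r is not target and r[col] is not None]
    donors.sort(key=dist)
    return sum(r[col] for r in donors[:k]) / k

data = [[1.0, 2.0, None],
        [1.1, 2.1, 3.0],
        [0.9, 1.9, 3.2],
        [5.0, 5.0, 9.0]]
print(knn_impute(data, row=0, col=2))  # -> 3.1 (mean of the two closest rows)
```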