COMP5331: Knowledge Discovery in Databases
Instructor: Prof. Raymond Chi-Wing Wong
Office Hours: TBA
Time: Monday and Wednesday (10:30am-11:50am)
Venue: Rm 2504 (LT 25/26)
Area: DB or AI (This course counts towards only one of the two areas and
cannot be double-counted towards the required credits).
TA: Tianwen CHEN
Email: tchenaj <AT> connect.ust.hk
Office Hours: TBA
Data mining has emerged as a major frontier field of study in recent years.
Aimed at extracting useful and interesting patterns and knowledge from large
data repositories such as databases and the Web, the field of data mining
integrates techniques from databases, statistics, and artificial intelligence.
This course provides a broad overview of the field and prepares students
to conduct research in it.
- Association
- Clustering
- Classification
- Data Warehouse
- Data Mining over Data Streams
- Web Databases
- Papers
- Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition)
- Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar.
Boston: Pearson Addison Wesley (2006)
- Assignment 30%
- Project 30%
- Final Exam 40%
NOTE: No late submissions are allowed.
No. |
Topic |
References |
1 |
Overview (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition)
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Boston : Pearson Addison Wesley (2006)
|
2 |
Association (ppt) |
R. Agrawal, R. Srikant, "Fast Algorithms for Mining
Association Rules", VLDB 1994 (pdf) |
3 |
FP-Tree (ppt) |
J. Han, J. Pei, Y. Yin, "Mining Frequent Patterns without
Candidate Generation", SIGMOD 2000 (pdf) |
4 |
Clustering (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition)
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Boston : Pearson Addison Wesley (2006)
|
5 |
Other Clustering Techniques (ppt) |
A. P. Dempster, N. M. Laird, D. B. Rubin, "Maximum
Likelihood from Incomplete Data via the EM Algorithm", Journal of the Royal
Statistical Society, Series B, Vol. 39, No. 1, 1977 (pdf)
M. Ester, H.-P. Kriegel, J. Sander, X. Xu, "A Density-based Algorithm for
Discovering Clusters in Large Spatial Databases with Noise", SIGKDD 1996 (pdf)
T. Zhang, R. Ramakrishnan, M. Livny, "BIRCH: An efficient data clustering
method for very large databases", SIGMOD 1996 (pdf) |
6 |
Outlier (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition)
M. M. Breunig, H.-P. Kriegel, R. T. Ng, J. Sander, "LOF: Identifying
Density-Based Local Outliers", SIGMOD 2000 (pdf) |
7 |
Subspace Clustering (ppt) |
K. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft, "When is
Nearest Neighbor Meaningful?", ICDT 1999 (pdf)
R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, "Automatic
Subspace Clustering of High Dimensional Data for Data Mining Applications",
SIGMOD 1998 (pdf)
C.-H. Cheng, A. W.-C. Fu and Y. Zhang, "Entropy-based Subspace Clustering
for Mining Numerical Data", SIGKDD 1999 (pdf)
|
8 |
Classification (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition)
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar
Boston : Pearson Addison Wesley (2006)
|
9 |
Other Classification Model 1: Support Vector Machine (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition) |
10 |
Other Classification Model 2: Neural Network (ppt) |
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei.
Morgan Kaufmann Publishers (3rd edition) |
11 |
Other Classification Model 3: Recurrent Neural Network (pptx) |
S. Hochreiter, J. Schmidhuber. "Long Short-Term Memory", Neural Computation. 9 (8): 1735–1780 (1997) (pdf)
K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio. "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", arXiv 2014 (pdf) |
12 |
Data Warehouse (ppt) |
V. Harinarayan, A. Rajaraman, J. Ullman, "Implementing Data
Cubes Efficiently", SIGMOD 1996 (pdf)
|
13 |
Data Mining over Data Streams (ppt) |
G. S. Manku, R. Motwani, "Approximate Frequency Counts over
Data Streams", VLDB 2002 (pdf)
A. Metwally, D. Agrawal, A. El Abbadi, "Efficient Computation of Frequent
and Top-k Elements in Data Streams", ICDT 2005 (pdf) |
14 |
Other Data Stream Models (ppt) |
P. Domingos and G. Hulten, "Mining High-Speed Data Streams",
SIGKDD 2000 (pdf)
|
15 |
Web DB (ppt) |
J. M. Kleinberg, "Authoritative Sources in a Hyperlinked
Environment", Journal of the ACM, 46:5, Sept. 1999, pp 604-632 (pdf)
L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank Citation Ranking:
Bringing Order to the Web", Manuscript, 1998 (pdf) |
16 |
Multi-Criteria Decision Making (ppt) |
D. Papadias, Y. Tao, G. Fu, B. Seeger, "Progressive Skyline
Computation in Database Systems", ACM Transactions on Database Systems (TODS),
30(1), 41-82, 2005 (pdf)
|
17 |
Advanced Topic (ppt) |
-
|
- Online Final Exam
Date: 13 Dec, 2019 (Fri) (HK Time)
Time: 4:30pm-6:30pm (HK Time)
Venue: A quiet place near you with good internet access (Original Exam Venue: LT L (CYT Building))
Submission Site: Canvas ("Courses" --> "COMP5331 (L1) ..." --> "Assignments" --> "Final Exam")
- Some of you may not be able to access Canvas directly from your usual computer (e.g., when connecting from a country with a firewall). For this case, the university provides a guideline on "Accessing Canvas via a virtual desktop".
Please read the section titled "Accessing Canvas via a virtual desktop" at this link.
Details of Online Final Exam:
- Exam Paper Delivery
- The instructor will send an email between 4:25pm and 4:30pm on 13 Dec (Fri) (HK Time)
to all of you.
- This email contains the exam paper (PDF).
- Exam Paper Writing
- Complete the exam paper within the exam period.
- This exam is conducted in a "trusted" environment: you should complete the exam paper by yourself.
Please do not discuss or communicate with other people while you are doing the exam.
(Honestly, we cannot prevent you from communicating with others in this "online" setting.
Honesty is an attitude that you should have as a university student.)
- You may write on paper
or type your answers electronically (in any form).
Note: If a question requires you to write symbols,
writing on paper is faster.
- The final submission of your exam paper must be a PDF file.
If you write on paper, you have to scan it (or take a picture)
to generate a PDF file for submission.
If you type your answers electronically (in a format such as DOC), you should convert
the file to PDF.
- If you have any questions (about the exam paper) during the exam period,
please send an email to me. I will be ready to reply to your emails.
- During the exam period, please check for emails from me (if any)
in case there are clarifications about the questions on the exam paper.
- Exam Paper Submission
- Before the exam ending time (i.e., 6:30pm),
please generate your PDF file and submit it.
We allow a 10-minute buffer for submission
(i.e., you may submit within 10 minutes after the exam ending time)
since you may need time to generate the PDF file and the network may be slow.
- The submission site is Canvas.
Canvas has a feature for detecting plagiarism,
which is why we are using it.
Details of Trial Final Exam:
- Since this may be your first time taking an "online" exam and your first time
using Canvas in our course, I will hold a "trial" final exam session
(a short version of the real exam) as follows.
Trial Exam Date: 12 Dec, 2019 (Thu) (HK time) (i.e., one day before the real exam)
Trial Exam Time: 4:30pm-5:00pm (HK time)
Venue: A quiet place near you with good internet access
- All procedures in this "trial" final exam are the same as those in the "real" final exam
(only the duration is shorter).
- This "trial" final exam only lets you "experience" the whole "online" exam format.
- No scores are counted for this "trial" final exam.
Thus, it is "optional". However, you are encouraged to take it so that
you can work quickly in the "real" online exam.
- If you have other tasks (e.g., other exams) during this "trial" final exam period, it does not matter:
the submission site will accept submissions after the trial exam ending time
(to accommodate students who are not free during the trial exam time).
The details of the course project can be found at this link.
|