Prerequisites
Participants are expected to be familiar with the basics of probability and statistics, multivariable calculus, and
linear algebra (familiarity with basic vector/matrix notation should be sufficient).
We can recommend the following online materials for self-study:
- MIT course Introduction to Probability. Excellent course covering a lot of material (not just probability, as the name suggests, but also statistical inference).
- Khan Academy course Multivariable Calculus.
Note: these courses cover a lot more material than actually required for the data mining course!
Literature
The required literature consists of the lecture slides (see the schedule), the lecture notes, and selected book chapters and articles. Below, we specify the required literature per subject. Any part that is optional additional reading is explicitly marked as such.
Introduction
- A. Feelders, H. Daniels, M. Holsheimer: Methodological and Practical Aspects of Data Mining, Information & Management 37(5), 2000, pp. 271-281.
Classification Trees, Regression Trees, Bagging and Random Forests
- Lecture Notes on Classification Trees.
- Chapter 8 from An Introduction to Statistical Learning (ISLR).
Video lectures accompanying the book are available online.
Undirected Graphical Models (Markov Random Fields)
Frequent Item Set Mining
Text Mining
- Chapters 4 (Naive Bayes and Sentiment) and 5 (Logistic Regression) from Speech and Language Processing (3rd ed.) by Jurafsky and Martin.
Bayesian Networks
Social Network Mining
- Qing Lu, Lise Getoor: Link-based Classification, Proceedings of ICML-2003, Washington DC, 2003.
- David Liben-Nowell, Jon Kleinberg: The Link Prediction Problem for Social Networks.
- Mohammad Al Hasan, Vineet Chaoji, Saeed Salem, and Mohammed Zaki: Link Prediction Using Supervised Learning, SDM Workshop on Link Analysis, 2006.