Pattern Set Mining

Planning

Which papers are discussed at which meeting is listed here. Links to these papers are given under the tab Literature

April 29: Introduction to the course.
May 4: papers 1 - 3. The first two papers created the field of frequent pattern mining, the third was among the first - if not the first - that showed that one does not need transactions to do frequent pattern mining.
May 6: papers 4 and 5. One way to limit the number of patterns you discover is by formulating more exactly what patterns you want to find, i.e., by specifying (extra) constraints. Clearly it would be best to discover all patterns that satisfy the constraints directly rather than by a filtering step. This is called constraint pushing. The first paper for today introduces FP-Growth an algorithm to find all frequent item sets. The second paper shows how (a certain class of) constraints can be pushed into FP-Growth.
May 11: papers 6 and 7. Constraint pushing is one way to avoid filtering, there are alternatives. The first for today gives such an alternative. While the second questions whether pushing constraints is a good idea at all.
May 13: paper 8. Constraints are one way to limit the number of interesting patterns, condensed representations are another one. A condensed representation is a subset of all frequent item sets that allows the reconstruction of the complete set of frequent item sets (including their support!) without looking at the data. Today we encounter the first such representation, viz., closed itemsets.
May 18: paper 9. Non-derivable itemsets. As condensed as it can be.
May 20: papers 10 - 12. The constraints we saw earlier are a way to define what makes a pattern interesting. Another way of solving the pattern explosion problem is by specifiyng what makes a set of patterns interesting. If you know that, all you have to do is search for the most interesting set of patterns. Today we search for informative collections of item sets.
May 25: paper 13. Tiles are another way to define what makes a pattern -- and a set of patterns -- interesting.
May 27: no class because of Ascension day
June 1: papers 14 and 15. Statistics offers different ways to determine whether or not a set of patterns is interesting. Today we look at the first statistical approach, viz., sampling.
June 3: papers 16 and 17. Next to sampling, Statistics offers statistical tests to determine the importance (significance) of results. Todays papers both use tests to find good sets of patterns
June 8: papers 18 and 19. The final approach that Statistics offers for pattern set selection is by modelling. Todays (related) papers use Maximal Entropy and Subjective Interestingness to model.
June 10: Paper 20. Statistics is one principled way to infer knowledge from data. A more computational way is called Algorithmic Information Theory (AIT). Todays paper shows how the Minimum Description Length Principle (MDL), which is an instantiation of AIT, can be used to discover interesting sets of patterns by introducing the Krimp algorithm.