The Course

One of the characteristic problems of Big Data is that the volume of data requires sub-linear algorithms to process it. Sub-linear often simply means that you sample the data. But how do you sample the data if you don't know its distribution? PAC learning is an answer to that problem. How to sample depends not only on the data distribution, but also on the data mining problem you want to solve. To a large extent, this course aims to solve the problem of how to sample for frequent item set mining. Other topics are addressed because they illustrate the power of the PAC learning framework or because they have a direct bearing on the frequent item set mining problem.

This is a tough course: we discuss many theorems and their proofs. Handling Big Data in a sound way requires firm foundations. Note, however, that the course is about understanding this formal framework, that is, being able to explain it, rather than being able to reproduce it. See the exam here to better understand what that means.

To understand the material, it is important that you ask questions whenever you are in doubt.

  • Firstly: feel free to ask questions while I'm explaining. Because I'm sharing my screen with you during classes, all I see are the slides, i.e., I don't see raised hands or comments in the chat. Hence, it is perfectly fine to switch on your microphone and ask whatever you want to ask whenever you have a question; I don't mind if you interrupt me. Remember: not asking a question you have is infinitely more stupid than any question you may ask.
  • Secondly: there is an (almost) weekly Q&A session, see here. In normal times these were exercise classes. Now they have a double purpose. First, you can ask questions about the lectures. To make this easy, I'll put up a shared document for each Q&A session in which you can (anonymously) type in your questions. Second, the usual exercises are also posted, and you can, of course, ask any question you have about those exercises.
  • Note that the Q&A sessions will continue after the lectures are done, while you are preparing your essay. So any questions that come up while writing will still be answered.

As long as the course is online, we could record the lectures. There is a legal issue, however. The GDPR implies that I need your permission to record you when you ask a question. However, because I'm the one who grades you, I cannot ask for your permission, since that could be construed as coercion. So the lectures will be recorded only if you can convince me that all students unanimously, without coercion or duress, want them to be recorded.

Note that if the lectures are recorded, the GDPR prohibits you from reposting the recordings anywhere other than on the Teams site.

The Q&A sessions will not be recorded, so all students should feel free to ask any question they have.