Practical Assignments
There are two practical assignments, counting for 30% and 20% of your final grade respectively.
Both practical assignments should be completed in teams of three students.
The purpose of the first practical assignment is to deepen your understanding of classification tree algorithms and random forests by making your own implementation from scratch in Python or R. Furthermore, you will develop basic data analysis skills by applying your algorithm to a bug prediction data set. Use of Python or R is mandatory for the first assignment.
To install Python, please go here. A list of IDEs for Python can be found here.
To install R, please go here. RStudio is an IDE for R and is warmly recommended.
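To give a feel for what implementing a classification tree from scratch involves, here is a minimal Python sketch of one building block: scoring candidate splits on a single numeric feature. It assumes the Gini index as impurity measure and candidate thresholds halfway between consecutive distinct values. The function names `gini` and `best_split` are illustrative only; the assignment text itself determines the required impurity measure, function signatures, and any constraints on splits.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a sequence of class labels (assumed impurity measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold t minimising the weighted impurity of the split x <= t.

    Returns (threshold, weighted_impurity), or None when all values are equal
    and no split is possible.
    """
    n = len(labels)
    distinct = sorted(set(values))
    best = None
    # Candidate thresholds lie halfway between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best


if __name__ == "__main__":
    print(best_split([1, 2, 3, 4], [0, 0, 1, 1]))
```

A full tree-growing algorithm would repeat this search over all features and recurse on the two resulting subsets until a stopping rule applies.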
The purpose of the second practical assignment is to further develop your data analysis (and pre-processing!) skills. You will analyse a collection of text documents and build a prediction model. For the second assignment, you may use whatever tools you want to perform the analysis, and you can use existing implementations of the data analysis algorithms you apply.
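Pre-processing text typically starts by turning raw documents into token counts. The following is a minimal sketch of that step, assuming that lowercased alphabetic tokens are sufficient; the names `tokenize` and `bag_of_words` are illustrative and not part of the assignment. Since existing implementations are allowed for the second assignment, a library may well be the more practical choice.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of the letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(documents):
    """Map each document to a Counter of token frequencies."""
    return [Counter(tokenize(doc)) for doc in documents]


if __name__ == "__main__":
    for counts in bag_of_words(["The room was clean.", "Clean room, clean sheets!"]):
        print(counts)
```

The resulting counts can serve as features for a prediction model, usually after further steps such as removing very common words.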
Assignment 1: Classification Trees and Random Forests (deadline: Friday, October 11, 2024)
- The assignment.
- Getting started with the assignment in Python.
- Getting started with the assignment in R.
- The credit data.
- The Pima Indians data.
- Download the data for part 2 of the assignment here.
- The accompanying article can be found here.
Assignment 2: Text Mining (deadline: Friday, November 1, 2024)
- The assignment.
- Go here to get the data.