
Lab 10 - Machine Learning


  • Understand basic concepts of machine learning
  • Remember examples of real-world problems that can be solved with machine learning
  • Learn the most common performance evaluation metrics for machine learning models
  • Analyse the behaviour of typical machine learning algorithms using the most popular techniques
  • Be able to compare multiple machine learning models


The exercises are solved in Python, using several popular libraries that are commonly used in machine learning projects:

  • scikit-learn: fast model development, performance metrics, pipelines, dataset splitting
  • Pandas: data frames, CSV parsing, data analysis
  • NumPy: scientific computing
  • Matplotlib: data plotting
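As a minimal sketch of how these libraries fit together (assuming scikit-learn 0.23 or newer, which supports `as_frame=True`), the snippet below loads the built-in Iris dataset as a pandas DataFrame, splits it into training and test sets, chains a scaler and a classifier in a pipeline, and reports test accuracy. The dataset, the model and the `run_demo` helper are illustrative choices, not part of the lab skeleton.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_demo(random_state: int = 0) -> float:
    """Hypothetical helper: train and evaluate a small pipeline on Iris."""
    # Load a built-in dataset as a pandas DataFrame (no input file needed)
    iris = load_iris(as_frame=True)
    X: pd.DataFrame = iris.data
    y: pd.Series = iris.target

    # Hold out 25% of the samples, keeping class proportions equal in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Chain preprocessing and the model into a single estimator
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    # Evaluate only on data the model has not seen during training
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"Test accuracy: {run_demo():.3f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into preprocessing.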

All tasks are tutorial-based, and every exercise is associated with at least one “TODO” in the code. The tasks can be found in the exercises package, but we recommend following the entire skeleton code to better understand the concepts presented in this lab. Every function is documented, and some exercises also include hints in the code.

Because the various tasks and exercises are spread throughout the laboratory text, they are marked with a ⚠️ emoji. Make sure you look for this emoji so that you don't miss any of them!

⚠️ [5p] Task 4.B

Comment on the results: specify which model fits the dataset best and which models overfit or underfit it.


Solve the tasks marked with TODO - TASK B.
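The skeleton for Exercise 4 is not reproduced here, so the following is a hypothetical sketch rather than the lab's actual setup: it fits polynomial regressions of degree 1, 4 and 15 to noisy synthetic data and compares training and test error. The usual reading is that a model with high error on both sets underfits, a model with low training error but clearly higher test error overfits, and the model with the lowest test error fits best. The data-generating function and the `compare_fits` helper are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def compare_fits(degrees=(1, 4, 15), seed=0):
    """Hypothetical helper: return {degree: (train_mse, test_mse)}."""
    rng = np.random.default_rng(seed)
    # Noisy samples from a smooth non-linear function on [0, 1]
    X = np.sort(rng.uniform(0, 1, 60)).reshape(-1, 1)
    y = np.cos(1.5 * np.pi * X).ravel() + rng.normal(0, 0.1, X.shape[0])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=seed
    )

    results = {}
    for degree in degrees:
        # Higher degree = more flexible model
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        train_mse = mean_squared_error(y_train, model.predict(X_train))
        test_mse = mean_squared_error(y_test, model.predict(X_test))
        results[degree] = (train_mse, test_mse)
    return results


if __name__ == "__main__":
    for degree, (train_mse, test_mse) in compare_fits().items():
        print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

With these settings, degree 1 typically underfits (both errors high) and degree 15 typically overfits (lowest training error, but a larger gap to the test error); the exact numbers depend on the seed.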

⚠️ [15p] Exercise 5

In this exercise, you will learn how to properly evaluate a clustering model. We chose the K-means clustering algorithm for this example, but feel free to explore other alternatives. You can find out more about K-means clustering here. You don't need an input file for any of the associated tasks, because the clusters are generated in the skeleton. The model must learn how to group points in a 2D space.
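A minimal sketch of this setup, assuming the skeleton generates the points with scikit-learn's `make_blobs` and clusters them with `KMeans` (the actual skeleton may differ; `fit_kmeans` and its default parameters are hypothetical):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def fit_kmeans(n_clusters=4, cluster_std=1.0, seed=42):
    """Hypothetical helper: generate 2D blobs and cluster them with K-means."""
    # Synthetic 2D points grouped around n_clusters random centres
    X, _ = make_blobs(
        n_samples=500,
        centers=n_clusters,
        n_features=2,
        cluster_std=cluster_std,
        random_state=seed,
    )
    # n_init is set explicitly so results do not depend on library defaults
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return X, labels, model
```

The true blob labels returned by `make_blobs` are discarded on purpose: K-means is unsupervised and only sees the coordinates.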

The solution for this exercise should be written in the TODO sections marked in the file. Please follow the skeleton code and understand what it does. To run the code, uncomment perform_clustering() in

⚠️ [5p] Task 5.A

Compute the silhouette score of the model using a scikit-learn function from the metrics package.


Solve the tasks marked with TODO - TASK A.
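One possible approach, as a sketch: `sklearn.metrics.silhouette_score` takes the data and the predicted labels and returns the mean silhouette coefficient, a value in [-1, 1] where higher means tighter, better-separated clusters. The blob-generation parameters and the `kmeans_silhouette` helper are hypothetical and may not match the skeleton.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def kmeans_silhouette(cluster_std=1.0, n_clusters=4, seed=42):
    """Hypothetical helper: cluster synthetic blobs and score the result."""
    # Synthetic 2D blobs (assumed parameters, similar in spirit to the skeleton)
    X, _ = make_blobs(
        n_samples=500,
        centers=n_clusters,
        n_features=2,
        cluster_std=cluster_std,
        random_state=seed,
    )
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # Mean silhouette coefficient over all samples
    return silhouette_score(X, labels)
```

Comparing, for example, `kmeans_silhouette(0.5)` and `kmeans_silhouette(3.0)` should show the score dropping as the blobs overlap more.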

⚠️ [10p] Task 5.B

Fetch the cluster centres (the model should already have them ready for you :-)) and plot them together with a colour-coded 2D representation of the data groups. Your plot should look similar to the one below:

You can also play around with the standard deviation of the generated blobs and observe the different outcomes of the clustering algorithm:


You should be able to discuss these observations with the assistant.

HINT: The plotting code is very similar to the code found in the skeleton. You can also search for examples online. ;-)


Use the hint above and solve the tasks marked with TODO - TASK B. Try at least 3 different values for the standard deviation, which means that at least 3 plots should be generated. Save each plot in a separate file.
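A sketch of the plotting step, assuming the fitted model is a scikit-learn `KMeans` instance, which exposes the centres through its `cluster_centers_` attribute. The hypothetical `plot_clusters` helper colours each point by its predicted label, overlays the centres, and saves one PNG file per `cluster_std` value; the file names, marker style and colour map are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # render straight to files, no display needed
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def plot_clusters(cluster_std, n_clusters=4, seed=42, filename=None):
    """Hypothetical helper: cluster blobs, plot points and centres, save to PNG."""
    X, _ = make_blobs(
        n_samples=500, centers=n_clusters, cluster_std=cluster_std, random_state=seed
    )
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    centres = model.cluster_centers_  # shape (n_clusters, 2)

    fig, ax = plt.subplots(figsize=(6, 5))
    # One colour per predicted cluster
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15, alpha=0.7)
    # Cluster centres drawn on top as large markers
    ax.scatter(centres[:, 0], centres[:, 1], c="red", marker="X", s=200,
               edgecolors="black", label="centres")
    ax.set_title(f"K-means, cluster_std={cluster_std}")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    ax.legend()

    if filename is None:
        filename = f"kmeans_std_{cluster_std}.png"
    fig.savefig(filename, dpi=100)
    plt.close(fig)  # free memory when generating several plots
    return filename


if __name__ == "__main__":
    # Three standard deviations -> three separate plot files
    for std in (0.5, 1.5, 3.0):
        print(plot_clusters(std))
```

As the standard deviation grows, the blobs start to overlap and the boundaries K-means draws between them become less meaningful, which is a useful point to discuss with the assistant.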

⚠️ [10p] Exercise 6


Please take a minute to fill in the feedback form for this lab.


Last modified: 2021/12/04 16:13 by vlad.stefanescu