This shows you the differences between two versions of the page.

ep:labs:10 [2021/12/04 16:13]
vlad.stefanescu [⚠️ [10p] Task 4.A]
ep:labs:10 [2022/09/24 14:46] (current)
Line 9: Line 9:
   * Be able to compare multiple machine learning models
-===== Exercises =====
+===== Resources =====
-The exercises will be solved in Python, using various popular libraries that are usually integrated in machine learning projects:
+In this lab, we will study basic performance evaluation techniques used in machine learning, covering elementary concepts such as classification, regression, data fitting, clustering and much more.
 +You will work in an easy-to-use environment that provides tools for manipulating data and visualizing results. We will use a **Jupyter Notebook** hosted on **Google Colab**, which comes with a variety of useful tools already installed.
 +The exercises will be solved in Python, using popular libraries that are usually integrated in machine learning projects:
   * [[https://scikit-learn.org/stable/documentation.html|Scikit-Learn]]: fast model development, performance metrics, pipelines, dataset splitting
Line 18: Line 22:
   * [[https://matplotlib.org/3.1.1/users/index.html|Matplotlib]]: data plotting
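As a rough illustration of how the Scikit-Learn features listed above fit together (dataset splitting, a pipeline, and a performance metric), here is a minimal sketch. It uses synthetic data from ''make_classification'' as a stand-in, which is an assumption made for illustration; the lab itself works with the Kaggle datasets. The choice of ''LogisticRegression'' and ''StandardScaler'' is likewise only an example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real lab uses Kaggle datasets).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Dataset splitting: hold out 25% of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Pipeline: scale the features, then fit a simple classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Performance metric: accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Evaluating on held-out data rather than on the training set is what makes the reported score a fair estimate of how the model generalizes.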
-All tasks are tutorial based and every exercise will be associated with at least one "**TODO**" within the code. Those tasks can be found in the //exercises// package, but our recommendation is to follow the entire skeleton code for a better understanding of the concepts presented in this laboratory class. Each functionality is properly documented and for some exercises, there are also hints placed in the code.
+As datasets, we will use some public corpora provided by the Kaggle community:
-<note important>
+  * [[https://www.kaggle.com/uciml/pima-indians-diabetes-database/data|Classification Dataset]]
-Because the various **tasks** and **exercises** are **spread throughout the laboratory text**, they are marked with a ⚠️ emoji. Make sure you look for this emoji so that you don't miss any of them!
+  * [[https://www.kaggle.com/zaraavagyan/weathercsv|Regression dataset]]
 +You can also check out these cheat sheets for quick reference to the most common libraries:
 +**Cheat sheets:**
 +  * [[https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf|python]]
 +  * [[https://​s3.amazonaws.com/​assets.datacamp.com/​blog_assets/​Numpy_Python_Cheat_Sheet.pdf|numpy]]
 +  * [[https://​s3.amazonaws.com/​assets.datacamp.com/​blog_assets/​Python_Matplotlib_Cheat_Sheet.pdf|matplotlib]]
 +  * [[https://​s3.amazonaws.com/​assets.datacamp.com/​blog_assets/​Scikit_Learn_Cheat_Sheet_Python.pdf|sklearn]]
 +  * [[https://​github.com/​pandas-dev/​pandas/​blob/​master/​doc/​cheatsheet/​Pandas_Cheat_Sheet.pdf|pandas]]
 <solution -hidden>
 </solution>
 +===== Tasks =====
 +==== Google Colab Notebook ====
 +For this lab, we will use Google Colab for exploring performance evaluation in machine learning. Please solve your tasks [[https://​github.com/​vladastefanescu/​machine-learning-introduction/​blob/​main/​Machine_Learning_Introduction.ipynb|here]] by clicking "​**Open in Colaboratory**"​.
 +You can then export this Python notebook as a PDF (**File -> Print**) and upload it to **Moodle**.
 +===== Feedback =====
 +Please take a minute to fill in the **[[https://​forms.gle/​LWBWYsMiJq8FsYdN9 | feedback form]]** for this lab.
-==== ⚠️ [5p] Task 4.B ==== 
-Comment the results by specifying which is the **best model** in terms of fitting and which are the models that **overfit** or **underfit** the dataset. 
-⚠️⚠️ **NON-DEMO TASK** 
-Solve the tasks marked with **TODO - TASK B**. 
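One common way to reason about **overfitting** and **underfitting** is to compare the error on the training set with the error on held-out data. The sketch below is an assumption-laden illustration, not the lab's skeleton code: the synthetic sine data, the polynomial models and the degrees ''1'', ''4'' and ''15'' are all hypothetical choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noisy sine curve (not the lab's dataset).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=120)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

# Fit polynomial models of increasing complexity and record
# (training MSE, test MSE) for each degree.
errors = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(x_train))
    test_mse = mean_squared_error(y_test, model.predict(x_test))
    errors[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

A model whose training and test errors are both high is a candidate for underfitting, while a low training error paired with a noticeably higher test error suggests overfitting.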
-==== ⚠️ [15p] Exercise 5 ==== 
-In this exercise, you will learn how to properly evaluate a **clustering model**. We chose a **K-means clustering algorithm** for this example, but feel free to explore other alternatives. You can find out more about K-means clustering algorithms [[https://​towardsdatascience.com/​understanding-k-means-clustering-in-machine-learning-6a6e67336aa1|here]]. For all the associated tasks, you don't have to use any input file, because the clusters are generated in the skeleton. The model must learn how to group together **points in a 2D space**. 
-<note important>​ 
-The solution for this exercise should be written in the **TODO** sections marked in the //​**clustering.py**//​ file. Please follow the skeleton code and understand what it does. To run the code, uncomment **perform_clustering()** in //​**app.py**//​. 
-==== ⚠️ [5p] Task 5.A ==== 
-Compute the **silhouette score** of the model by using a //​Scikit-learn//​ function found in the **metrics** package. 
-⚠️⚠️ **NON-DEMO TASK** 
-Solve the tasks marked with **TODO - TASK A**. 
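For reference, Scikit-learn exposes the silhouette score as ''silhouette_score'' in ''sklearn.metrics''. The following is a minimal sketch under stated assumptions: the blobs are generated here with ''make_blobs'' and the parameter values (300 points, 3 centres) are hypothetical, since the lab's skeleton generates its own clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical 2D points grouped around 3 centres (the skeleton code
# generates its own data; these values are only for illustration).
points, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Fit K-means and get the cluster label assigned to each point.
model = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = model.fit_predict(points)

# The silhouette score lies in [-1, 1]; values closer to 1 indicate
# compact, well-separated clusters.
score = silhouette_score(points, labels)
print(f"Silhouette score: {score:.3f}")
```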
-==== ⚠️ [10p] Task 5.B ==== 
-Fetch the **centres of the clusters** (the model should already have them ready for you :-)) and **plot** them together with a **colourful 2D representation** of the data groups. Your plot should look similar to the one below: 
-{{ :​ep:​labs:​22._clustering_plot.png?​600 |}} 
-You can also play around with the **standard deviation** of the generated blobs and observe the different outcomes of the clustering algorithm: 
-You should be able to discuss these observations with the assistant. 
-**HINT: **The **plotting code** is very similar to the one found in the skeleton. You can also [[https://​lmgtfy.com/?​q=plot+k+means+clusters+python|Google]] it out. ;-) 
-⚠️⚠️ **NON-DEMO TASK** 
-Look at the hint above and solve the tasks marked with **TODO - TASK B**. Make **at least 3** changes to the standard deviation. That means that **3 plots should be generated**. Save each plot **in a separate file**. 
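A possible shape for this task is sketched below. It is not the skeleton's plotting code: the helper ''plot_clusters'', the file names and the three standard deviation values are hypothetical, and the data is generated with ''make_blobs'' as an assumption. A fitted ''KMeans'' model exposes its centres through the ''cluster_centers_'' attribute.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so plots can be saved without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def plot_clusters(cluster_std, filename):
    """Cluster generated blobs, plot points and centres, save to a file.

    Hypothetical helper for illustration; returns the cluster centres.
    """
    points, _ = make_blobs(
        n_samples=300, centers=3, cluster_std=cluster_std, random_state=42
    )
    model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(points)
    centres = model.cluster_centers_

    fig, ax = plt.subplots()
    # Colour each point by the cluster it was assigned to.
    ax.scatter(points[:, 0], points[:, 1], c=model.labels_, cmap="viridis", s=15)
    # Mark the cluster centres on top of the data.
    ax.scatter(centres[:, 0], centres[:, 1], c="red", marker="x", s=120,
               label="cluster centres")
    ax.set_title(f"K-means clusters (standard deviation = {cluster_std})")
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)
    return centres


# One plot per standard deviation, each saved in a separate file.
saved_files = []
for std in (0.5, 1.5, 3.0):
    path = Path(f"clusters_std_{std}.png")
    plot_clusters(std, path)
    saved_files.append(path)
```

Larger standard deviations make the blobs overlap, so the boundaries found by K-means become less meaningful even though the algorithm still returns three clusters.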
-==== ⚠️ [10p] Exercise 6 ==== 
-⚠️⚠️ **NON-DEMO TASK** 
-Please take a minute to fill in the **[[https://​forms.gle/​KHMVUhNfCPoR71Ew7 | feedback form]]** for this lab. 
-===== References ===== 
-[[https://​www.kaggle.com/​uciml/​pima-indians-diabetes-database/​data|Classification Dataset]] 
-[[https://​towardsdatascience.com/​a-beginners-guide-to-linear-regression-in-python-with-scikit-learn-83a8f7ae2b4f|Regression dataset]] 
ep/labs/10.1638627222.txt.gz · Last modified: 2021/12/04 16:13 by vlad.stefanescu
CC Attribution-Share Alike 3.0 Unported