====== Lab 10 - Machine Learning ======

===== Objectives =====

  * Understand basic concepts of machine learning
  * Remember examples of real-world problems that can be solved with machine learning
  * Learn the most common performance evaluation metrics for machine learning models
  * Analyse the behaviour of typical machine learning algorithms using the most popular techniques
  * Be able to compare multiple machine learning models
===== Exercises =====

The exercises will be solved in Python, using various popular libraries that are usually integrated in machine learning projects:

  * [[https://scikit-learn.org/stable/documentation.html|Scikit-Learn]]: fast model development, performance metrics, pipelines, dataset splitting
  * [[https://pandas.pydata.org/pandas-docs/stable/|Pandas]]: data frames, CSV parser, data analysis
  * [[https://numpy.org/doc/|NumPy]]: scientific computation
  * [[https://matplotlib.org/3.1.1/users/index.html|Matplotlib]]: data plotting

All tasks are tutorial-based and every exercise is associated with at least one "**TODO**" within the code. These tasks can be found in the //exercises// package, but our recommendation is to follow the entire skeleton code for a better understanding of the concepts presented in this laboratory class. Each functionality is properly documented and, for some exercises, there are also hints placed in the code.

<note important>
Because the various **tasks** and **exercises** are **spread throughout the laboratory text**, they are marked with a ⚠️ emoji. Make sure you look for this emoji so that you don't miss any of them!
</note>
<solution -hidden>
Solution: {{:ep:labs:lab_12_ml_revisited_solution.zip}}
</solution>
==== ⚠️ [10p] Task 4.A ====

For each model, make predictions on both the **training set** and **test set** and compute the corresponding **accuracy values**.

<note>
The model is already trained, so you can directly use it to yield predictions on the two sets. To evaluate these predictions, you can use the already familiar **evaluate_classifier** function from //**classification.py**//.
</note>

⚠️⚠️ **NON-DEMO TASK**

Look at the hint above and solve the tasks marked with **TODO - TASK A**.
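
For reference, here is a minimal, self-contained sketch of the idea using plain //Scikit-learn//. It trains its own toy classifier and does not use the lab skeleton; all names and parameter values below are illustrative:

<code python>
# Minimal sketch: compare accuracy on the training set vs. the test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy dataset and model, standing in for the skeleton's trained models
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Predict on both sets and compute the two accuracy values
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
</code>

In the exercise itself, the predictions should be evaluated through the skeleton's **evaluate_classifier** helper rather than by calling **accuracy_score** directly.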
==== ⚠️ [5p] Task 4.B ====

Comment on the results by specifying which is the **best model** in terms of fitting and which models **overfit** or **underfit** the dataset. As a rule of thumb, a model that scores much better on the training set than on the test set is overfitting, while one that scores poorly on both is underfitting.

⚠️⚠️ **NON-DEMO TASK**

Solve the tasks marked with **TODO - TASK B**.
==== ⚠️ [15p] Exercise 5 ====

In this exercise, you will learn how to properly evaluate a **clustering model**. We chose a **K-means clustering algorithm** for this example, but feel free to explore other alternatives. You can find out more about K-means clustering algorithms [[https://towardsdatascience.com/understanding-k-means-clustering-in-machine-learning-6a6e67336aa1|here]]. For all the associated tasks, you don't have to use any input file, because the clusters are generated in the skeleton. The model must learn how to group together **points in a 2D space**.

<note important>
The solution for this exercise should be written in the **TODO** sections marked in the //**clustering.py**// file. Please follow the skeleton code and understand what it does. To run the code, uncomment **perform_clustering()** in //**app.py**//.
</note>
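
If you want to experiment outside the skeleton first, the sketch below reproduces the general setup on its own: it generates synthetic 2D blobs and fits a K-means model to them. The constants and variable names are illustrative, not the skeleton's:

<code python>
# Minimal sketch: generate 2D points around a few centres and cluster them.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

N_CLUSTERS = 4        # illustrative values, not the skeleton's constants
CLUSTERS_STD = 1.0

# 2D points grouped around N_CLUSTERS centres
points, _ = make_blobs(n_samples=600, centers=N_CLUSTERS,
                       cluster_std=CLUSTERS_STD, random_state=42)

# Fit K-means and obtain a cluster label for every point
kmeans = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=42)
labels = kmeans.fit_predict(points)
</code>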
==== ⚠️ [5p] Task 5.A ====

Compute the **silhouette score** of the model by using a //Scikit-learn// function found in the **metrics** package.

⚠️⚠️ **NON-DEMO TASK**

Solve the tasks marked with **TODO - TASK A**.
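
As a reference for what is expected, a self-contained version of this computation looks roughly like the sketch below (it reuses the synthetic blobs from the Exercise 5 sketch; the values are illustrative):

<code python>
# Minimal sketch: evaluate a clustering with the silhouette score.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

points, _ = make_blobs(n_samples=600, centers=4, cluster_std=1.0,
                       random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(points)

# silhouette_score ranges from -1 (poor) to +1 (well-separated clusters)
score = silhouette_score(points, labels)
print(f"silhouette score: {score:.3f}")
</code>

In the exercise, the score should of course be computed on the clusters produced by the skeleton's model, not on this toy data.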
==== ⚠️ [10p] Task 5.B ====

Fetch the **centres of the clusters** (the model should already have them ready for you :-)) and **plot** them together with a **colourful 2D representation** of the data groups. Your plot should look similar to the one below:

{{ :ep:labs:22._clustering_plot.png?600 |}}

You can also play around with the **standard deviation** of the generated blobs and observe the different outcomes of the clustering algorithm:

<code>
CLUSTERS_STD = 2
</code>

You should be able to discuss these observations with the assistant.

<note>
**HINT:** The **plotting code** is very similar to the one found in the skeleton. You can also [[https://lmgtfy.com/?q=plot+k+means+clusters+python|Google]] it. ;-)
</note>

⚠️⚠️ **NON-DEMO TASK**

Look at the hint above and solve the tasks marked with **TODO - TASK B**. Make **at least 3** changes to the standard deviation, which means that **3 plots should be generated**. Save each plot **in a separate file**.
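
The plotting part can look roughly like the self-contained sketch below; the colour map, marker style, and output file name are illustrative choices, not requirements:

<code python>
# Minimal sketch: colour the points by cluster label and overlay the centres.
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

CLUSTERS_STD = 2   # change this value and re-run to compare the outcomes

points, _ = make_blobs(n_samples=600, centers=4,
                       cluster_std=CLUSTERS_STD, random_state=42)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(points)

# Scatter the points coloured by their cluster label
plt.scatter(points[:, 0], points[:, 1], c=kmeans.labels_, cmap="viridis", s=20)
# Overlay the fitted cluster centres
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c="red", marker="x", s=200, label="cluster centres")
plt.legend()
plt.savefig(f"clusters_std_{CLUSTERS_STD}.png")   # one file per std value
</code>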
==== ⚠️ [10p] Exercise 6 ====

⚠️⚠️ **NON-DEMO TASK**

Please take a minute to fill in the **[[https://forms.gle/KHMVUhNfCPoR71Ew7 | feedback form]]** for this lab.
===== References =====

[[https://www.kaggle.com/uciml/pima-indians-diabetes-database/data|Classification dataset]]

[[https://towardsdatascience.com/a-beginners-guide-to-linear-regression-in-python-with-scikit-learn-83a8f7ae2b4f|Regression dataset]]

{{namespace>:ep:labs:10:contents:tasks&nofooter&noeditbutton}}