Differences

This shows you the differences between two versions of the page.

--- ewis:laboratoare:07 [2022/04/19 22:22]
alexandru.predescu [Training. Validation. Testing.]
+++ ewis:laboratoare:07 [2023/04/19 18:08] (current)
alexandru.predescu [Exercises]
@@ Line 210: / Line 210: @@
 <note tip>
-RMSE and R-Squared provide a rough estimation of model over/underfitting on a given dataset. Cross-validation can be used to evaluate the model on independent datasets.
+RMSE and R-Squared provide a rough estimate of model over/underfitting on a given dataset when test and validation results are compared. Cross-validation can additionally be used to evaluate the model on independent datasets.
 </note>
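The comparison described in the note can be sketched as follows. This is a minimal illustration, not the lab's own code: the helper names ''rmse'' and ''r_squared'', the small sample, and the 8/2 split are assumptions chosen for the example, and ''np.polyfit'' stands in for whichever regression model is being evaluated.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# small illustrative sample (assumed for this sketch)
x = np.arange(1, 11, dtype=float)
y = np.array([2, 3, 3, 4, 7, 8, 12, 14, 16, 20], dtype=float)

# hold out the last points for validation (assumed 8/2 split)
n_train = 8
coeffs = np.polyfit(x[:n_train], y[:n_train], deg=1)
pred_train = np.polyval(coeffs, x[:n_train])
pred_val = np.polyval(coeffs, x[n_train:])

# a validation error much larger than the training error hints at poor generalization
print("train:      RMSE =", rmse(y[:n_train], pred_train), " R2 =", r_squared(y[:n_train], pred_train))
print("validation: RMSE =", rmse(y[n_train:], pred_val), " R2 =", r_squared(y[n_train:], pred_val))
```

R-Squared can be negative on held-out data, which means the model predicts worse than the mean of the validation targets.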
+==== Scikit-learn ====
+<code python>
+import numpy as np
+from sklearn.linear_model import LinearRegression
+import matplotlib.pyplot as plt
+# define some test data
+x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
+y = np.array([2, 3, 3, 4, 7, 8, 12, 14, 16, 20])
+# split train/test data
+n_train = int(len(x)*0.8)
+x_train = x[:n_train]
+y_train = y[:n_train]
+x_predict = x[n_train:]
+# create and fit the LR model
+model = LinearRegression()
+reg = model.fit(x_train.reshape(-1, 1), y_train)
+# print model parameters
+print(model.coef_)
+print(model.intercept_)
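+# use the model to make predictions over the full input range (training and held-out points)
+model_output = model.predict(x.reshape(-1, 1))
+# plot the results compared to the target values
+# (two separate calls: plt.plot(x, y, model_output) would plot the predictions against their index, not against x)
+plt.plot(x, y)
+plt.plot(x, model_output)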
+plt.title("Linear regression")
+plt.xlabel("x")
+plt.ylabel("y")
+plt.legend(["target", "predicted"])
+plt.show()
+</code>
 ==== Exercises ====
@@ Line 218: / Line 256: @@
 Download the {{:ewis:laboratoare:lab7:project_lab7.zip|project archive}} and unzip it on your PC. Install the requirements using pip (e.g. //py -3 -m pip install -r requirements.txt//).
-The code sample (//task12.py//) uses linear regression to fit a sample of generated data.
+The script (//task1.py//) uses linear regression to fit a sample of generated data.
 Run the program and solve the following scenarios:
   * Experiment with different polynomial orders
   * Plot the RMSE and R-Squared values for each case
-*This task is required for solving Task 2.
-=== Task 2 (3p) ===
 Based on the experimental results in Task 1, answer the following questions:
-  * Q1: Which is the optimal polynomial order for this dataset with regards to the RMSE and model complexity? Tip: the RMSE improvement starts to decrease after a certain polynomial order on the RMSE chart.
+  * Q1: Which is the optimal polynomial order for this dataset with regards to the RMSE and model complexity? Hint: the RMSE improvement starts to decrease after a certain polynomial order on the RMSE chart.
-  * Q2: Which is the optimal polynomial order for this dataset with regards to the R-Squared coefficient and model complexity? Tip: the R2 coefficient improvement starts to decrease after a certain polynomial order on the R-Squared chart.
+  * Q2: Which is the optimal polynomial order for this dataset with regards to the R-Squared coefficient and model complexity? Hint: the R2 coefficient improvement starts to decrease after a certain polynomial order on the R-Squared chart.
   * Q3: Explain the results based on the provided function that is used to generate the dataset.
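One possible way to tabulate RMSE and R-Squared per polynomial order is sketched below. The generating function (a noisy cubic), the noise level, and the helper name ''evaluate_orders'' are assumptions made for this illustration; they are not taken from //task1.py//, whose data generator may differ.

```python
import numpy as np

def evaluate_orders(x, y, orders):
    """Fit one polynomial per order and return (order, rmse, r2) tuples."""
    results = []
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    for order in orders:
        coeffs = np.polyfit(x, y, deg=order)
        residuals = y - np.polyval(coeffs, x)
        rmse = float(np.sqrt(np.mean(residuals ** 2)))
        r2 = float(1.0 - np.sum(residuals ** 2) / ss_tot)
        results.append((order, rmse, r2))
    return results

# hypothetical dataset: a cubic with Gaussian noise (assumed, not the lab's generator)
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = 0.5 * x**3 - x + rng.normal(scale=1.0, size=x.size)

results = evaluate_orders(x, y, range(1, 7))
for order, rmse, r2 in results:
    print(f"order={order}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

On the fitting data the RMSE cannot increase with the order, so the order worth picking is the one after which the improvement flattens out, not the one with the lowest value.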
-Submit your answers on Moodle as PDF report.
+=== Task 2 (3p) ===
-=== Task 3 (4p) ===
+The script (//task2.py//) loads a dataset from a CSV file. Run an analysis similar to Task 1 and present your results.
-The code sample (//task3.py//) loads the Boston Housing Dataset and trains a linear model over multiple features. The prediction results (median housing prices in thousands of dollars) are shown in the plot and compared to the original dataset.
+[[https://www.kaggle.com/datasets/meetnagadia/bitcoin-stock-data-sept-17-2014-august-24-2021|Bitcoin Price Dataset]]
+=== Task 3 (3p) ===
+The script (//task3.py//) loads the Boston Housing Dataset and trains a linear model over multiple features. The prediction results (median housing prices in thousands of dollars) are shown in the plot and compared to the original dataset.
 Run the program and solve the following scenarios:
   * [TODO 1] Change the size of the training dataset (percent) and evaluate the models that are obtained in each case using RMSE
@@ Line 245: / Line 284: @@
   * Q2. What is the amount (percent) of training data that provides the best results in terms of prediction accuracy on validation data?
   * Q3. What happens if the amount of training data is small, e.g. 10%, with regards to the prediction accuracy and the over/underfitting of the regression model?
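The effect of the training set size can be explored with a sketch like the one below. It uses a synthetic multi-feature dataset as a stand-in (an assumption for this example, not the Boston Housing data loaded by //task3.py//), and the helper name ''rmse_by_train_size'' is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def rmse_by_train_size(X, y, train_fractions, seed=0):
    """Return {fraction: (train_rmse, validation_rmse)} for a linear model."""
    results = {}
    for frac in train_fractions:
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, train_size=frac, random_state=seed
        )
        model = LinearRegression().fit(X_tr, y_tr)
        train_rmse = float(np.sqrt(mean_squared_error(y_tr, model.predict(X_tr))))
        val_rmse = float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))
        results[frac] = (train_rmse, val_rmse)
    return results

# synthetic stand-in data: 200 samples, 5 features, linear target with noise (assumed)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_weights = np.array([3.0, -2.0, 0.5, 0.0, 1.5])
y = X @ true_weights + rng.normal(scale=1.0, size=200)

results = rmse_by_train_size(X, y, [0.1, 0.3, 0.5, 0.7, 0.9])
for frac, (train_rmse, val_rmse) in results.items():
    print(f"train={frac:.0%}  train RMSE={train_rmse:.3f}  validation RMSE={val_rmse:.3f}")
```

A large gap between the training and validation RMSE at small fractions suggests overfitting. Note that at large fractions the validation set becomes small, so its RMSE is a noisier estimate.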
-Submit your answers on Moodle as PDF report.
 ==== Resources ====
-  * {{:ewis:laboratoare:lab7:project_lab7.zip|Project}}
+  * {{:ewis:laboratoare:lab7:lab7.zip|Project}}
   * {{:ewis:laboratoare:python_workflow.pdf|Python Workflow}}
   * [[https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html|The Boston Housing Dataset]]

ewis/laboratoare/07.1650396128.txt.gz · Last modified: 2022/04/19 22:22 by alexandru.predescu