Differences

This shows you the differences between two versions of the page.

--- dsm:assignments:02 [2024/10/06 22:45]
emilian.radoi created
+++ dsm:assignments:02 [2025/11/13 17:00] (current)
andrei.niculae1004
@@ Line 1: / Line 1: @@
 ====== Competition ======
+The competition is hosted on Kaggle at [[https://www.kaggle.com/competitions/dsm-2025|this link]].
+Each competitor will participate individually. Please log in using your **student email** (@stud.acs.upb.ro) and check the box shown in the image below.
+{{:dsm:assignments:kaggle_mail_share.png?500|}}
+We provide a [[https://www.kaggle.com/code/andreiniculae/dsm-2025-starting-code|starter code]] which demonstrates how to read the data, train a network and make a submission. You are encouraged to start your work from this notebook.
+Beating the baseline on the private leaderboard is worth **1p**; the top 3 on the private leaderboard will have their final exam grade set to 10 (4p).
+===== Description =====
+Traditional image classification models rely heavily on accurately labeled training data, but in real-world scenarios, acquiring large quantities of labeled images can be costly and time-consuming.
+Additionally, available datasets exhibit a notable class imbalance, with benign cases significantly outnumbering malignant ones. This imbalance is consistent with the epidemiological reality that the majority of individuals undergoing screening are found to be cancer-free, as malignancies occur in only a small subset of the tested population.
+In this challenge, we provide you with a dataset that poses both obstacles: a significant portion of the training data remains unlabeled and the labeled data is heavily imbalanced.
+Your task is to develop innovative deep learning algorithms and techniques to overcome these challenges and build a robust image classification model.
+To succeed in this competition, participants are encouraged to explore semi-supervised/unsupervised learning methods that leverage the unlabeled data to improve the model's performance. Developing strategies to mitigate the impact of the class imbalance and enhance the model's ability to generalize effectively will be crucial. We encourage creative ideas.
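As one possible starting point for the two ideas above, the sketch below shows inverse-frequency class weights (to counter the imbalance) and confidence-thresholded pseudo-labeling (to make use of unlabeled images). It is not part of the starter notebook: the function names, the `0.95` threshold, and the toy inputs are assumptions chosen for illustration, and the probabilities would normally come from your own model.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses n_samples / (n_classes * class_count), so rare classes
    (here, malignant) contribute more to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}


def select_pseudo_labels(malignant_probs, threshold=0.95):
    """Pick unlabeled samples the model is confident about.

    malignant_probs: predicted P(label == 1) for each unlabeled sample.
    Returns (index, pseudo_label) pairs for samples whose predicted
    probability is at least `threshold` for either class.
    The threshold value is an assumption, not a recommended setting.
    """
    selected = []
    for i, p in enumerate(malignant_probs):
        if p >= threshold:
            selected.append((i, 1))
        elif p <= 1.0 - threshold:
            selected.append((i, 0))
    return selected


# Toy inputs, made up for illustration only.
toy_labels = [0, 0, 0, 1]
toy_probs = [0.99, 0.50, 0.01, 0.96]

weights = inverse_frequency_weights(toy_labels)
pseudo = select_pseudo_labels(toy_probs, threshold=0.95)
```

The selected pseudo-labeled samples can then be added to the labeled set for another round of training. Note that pseudo-labels tend to inherit the model's bias toward the majority class, so it may help to combine them with the class weights.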
+===== Data =====
+=== Files ===
+**train_labeled.csv** - paths to the labeled training set, with their corresponding labels
+**train_unlabeled.csv** - paths to the unlabeled training set
+**test.csv** - paths for the test set, for which you will need to make predictions
+=== Columns ===
+**ID** - the path to the file
+**label** - 0 for benign samples, 1 for malignant samples
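A minimal sketch of reading files with these columns, using only the Python standard library. The helper names and the sample rows are hypothetical; the real CSVs may be easier to load with the approach shown in the starter notebook.

```python
import csv
import io
from collections import Counter


def load_labeled(fileobj):
    """Read (ID, label) pairs from a file shaped like train_labeled.csv."""
    reader = csv.DictReader(fileobj)
    return [(row["ID"], int(row["label"])) for row in reader]


def load_paths(fileobj):
    """Read the ID column from a file shaped like train_unlabeled.csv or test.csv."""
    reader = csv.DictReader(fileobj)
    return [row["ID"] for row in reader]


# Made-up rows standing in for the real files; the paths are placeholders.
labeled_csv = io.StringIO("ID,label\nimg/a.png,0\nimg/b.png,0\nimg/c.png,1\n")
unlabeled_csv = io.StringIO("ID\nimg/d.png\nimg/e.png\n")

labeled_rows = load_labeled(labeled_csv)
unlabeled_paths = load_paths(unlabeled_csv)

# Counting labels up front makes the benign/malignant imbalance visible.
label_counts = Counter(label for _, label in labeled_rows)
```

With the real data, replace the `io.StringIO` objects with `open("train_labeled.csv", newline="")` and the equivalent for the other files.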
+===== Rules =====
+   * Using pretrained models is not allowed.
+   * Using additional training data apart from the data provided is not allowed.
+   * Searching on the internet for the clean dataset or labels for the test set is not allowed.
+   * You will need to provide the Jupyter notebook/Python script that was used to train the model that generated your submitted solution. Upload it on [[https://curs.upb.ro/2025/mod/assign/view.php?id=24468|Moodle]].

dsm/assignments/02.1728243941.txt.gz · Last modified: 2024/10/06 22:45 by emilian.radoi
