Differences

dsm:assignments:02 [2024/10/06 22:45]
emilian.radoi created
dsm:assignments:02 [2025/10/06 20:14] (current)
andrei.niculae1004
Line 1: Line 1:
 ====== Competition ======
 +
 +<​hidden>​
 +The competition is hosted on Kaggle at [[https://www.kaggle.com/competitions/dsm-2025|this link]].
 +
 +Each competitor will participate individually. Please log in using your **student email** (@stud.acs.upb.ro) and check the box shown in the image below.
 +
 +{{:dsm:assignments:kaggle_mail_share.png?500|}}
 +
 +We provide [[https://www.kaggle.com/code/andreiniculae/dsm-2025-starting-code|starter code]] that demonstrates how to read the data, train a network, and make a submission. You are encouraged to start your work from this notebook.
 +
 +Beating the baseline on the private leaderboard is worth **1p**. The top 3 competitors on the private leaderboard will have their final exam grade set to 10 (4p).
 +===== Description =====
 +Traditional image classification models heavily rely on accurately labeled data for training, but in real-world scenarios, acquiring large quantities of labeled images can be costly and time-consuming.
 +
 +Additionally, available datasets exhibit a notable class imbalance, with benign cases significantly outnumbering malignant ones. This imbalance is consistent with the epidemiological reality that most individuals undergoing screening are found to be cancer-free, as malignancies occur in only a small subset of the tested population.
 +
 +In this challenge, we provide you with a dataset that poses both obstacles: a significant portion of the training data remains unlabeled and the labeled data is heavily imbalanced.
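One common way to counter the label imbalance described above is to weight each class inversely to its frequency when computing the loss. The following is a minimal sketch, not the starter notebook's method: the function name `inverse_frequency_weights` and the toy label list are hypothetical, and the resulting weights would still need to be passed to whatever loss function you use.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, where weight = n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so their errors count more in a
    weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels: 9 benign (0) and 1 malignant (1).
toy_labels = [0] * 9 + [1]
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

With these toy labels the malignant class gets a weight nine times larger than the benign class. Oversampling the minority class is an alternative with a similar goal.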
 +
 +Your task is to develop innovative deep learning algorithms and techniques to overcome these challenges and build a robust image classification model.
 +
 +To succeed in this competition, participants are encouraged to explore semi-supervised/unsupervised learning methods that leverage the unlabeled data to improve the model's performance. Developing strategies to mitigate the impact of the class imbalance and enhance the model's ability to generalize effectively will be crucial. We encourage creative ideas.
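Pseudo-labeling is one simple semi-supervised technique that fits this setting: a model trained on the labeled set predicts on the unlabeled set, and only the confident predictions are added back as training labels. The sketch below shows just the selection step, under stated assumptions: the function name `select_pseudo_labels`, the 0.95 threshold, and the toy probabilities are all hypothetical, and the probabilities are assumed to be the model's predicted probability of the malignant class.

```python
def select_pseudo_labels(ids, malignant_probs, threshold=0.95):
    """Keep only samples whose prediction is confident in either direction.

    Returns a list of (id, pseudo_label) pairs: label 1 when the malignant
    probability is at least `threshold`, label 0 when it is at most
    1 - threshold. Uncertain samples are skipped.
    """
    selected = []
    for sample_id, prob in zip(ids, malignant_probs):
        if prob >= threshold:
            selected.append((sample_id, 1))
        elif prob <= 1.0 - threshold:
            selected.append((sample_id, 0))
    return selected


# Hypothetical predictions for three unlabeled images.
toy_ids = ["a.png", "b.png", "c.png"]
toy_probs = [0.99, 0.50, 0.02]
pseudo_labeled = select_pseudo_labels(toy_ids, toy_probs)
print(pseudo_labeled)
```

Because the labeled data is imbalanced, a fixed threshold tends to admit far more benign than malignant pseudo-labels, so the selected set may need rebalancing before retraining.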
 +
 +===== Data =====
 +=== Files ===
 +
 +**train_labeled.csv** - paths to the labeled training set, with their corresponding labels
 +
 +**train_unlabeled.csv** - paths to the unlabeled training set
 +
 +**test.csv** - paths for the test set, for which you will need to make predictions
 +
 +=== Columns ===
 +
 +**ID** - the path to the file
 +
 +**label** - 0 for benign samples, 1 for malignant samples
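As a rough illustration of the file layout above, the sketch below parses a labeled CSV with the **ID** and **label** columns and counts the samples per class. It uses only the Python standard library and an in-memory sample. The file paths in the sample are made up, and the real files may contain details not described here, so treat this as an assumption-laden sketch and not as a replacement for the starter notebook.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(csv_file):
    """Parse rows with 'ID' and 'label' columns into (path, int_label) pairs."""
    reader = csv.DictReader(csv_file)
    return [(row["ID"], int(row["label"])) for row in reader]


# Hypothetical in-memory stand-in for train_labeled.csv.
sample_csv = io.StringIO(
    "ID,label\n"
    "img_0001.png,0\n"
    "img_0002.png,1\n"
    "img_0003.png,0\n"
)
rows = read_labeled_rows(sample_csv)

# Count samples per class to inspect the imbalance (0 = benign, 1 = malignant).
label_counts = Counter(label for _, label in rows)
print(label_counts)
```

To read the actual file, pass an open file handle, for example `open("train_labeled.csv", newline="")`, in place of the in-memory sample.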
 +
 +===== Rules =====
 +   * Using pretrained models is not allowed.
 +   * Using additional training data apart from the data provided is not allowed.
 +   * Searching on the internet for the clean dataset or labels for the test set is not allowed.
 +   * You must provide the Jupyter notebook or Python script that was used to train the model that generated your submitted solution. Upload it on [[https://curs.upb.ro/2025/mod/assign/view.php?id=24468|Moodle]].
 +</​hidden>​
dsm/assignments/02.1728243941.txt.gz · Last modified: 2024/10/06 22:45 by emilian.radoi
CC Attribution-Share Alike 3.0 Unported