Competition

The competition is hosted on Kaggle at this link.

Each competitor participates individually. Please log in using your student email address (@stud.acs.upb.ro).

We provide starter code that demonstrates how to read the data, train a network and make a submission. You are encouraged to build your solution on top of this notebook.

Beating the baseline on the private leaderboard is worth 1p. The top 3 on the private leaderboard will have their final exam grade set to 10 (4p).

Description

Traditional image classification models heavily rely on accurately labeled data for training, but in real-world scenarios, acquiring large quantities of labeled images can be costly and time-consuming.

Additionally, available datasets exhibit a notable class imbalance, with benign cases significantly outnumbering malignant ones. This imbalance reflects the epidemiological reality that most individuals who undergo screening are cancer-free, since malignancies occur in only a small subset of the tested population.

In this challenge, we provide you with a dataset that poses both obstacles: a significant portion of the training data remains unlabeled and the labeled data is heavily imbalanced.

Your task is to develop innovative deep learning algorithms and techniques to overcome these challenges and build a robust image classification model.

To succeed in this competition, participants are encouraged to explore semi-supervised/unsupervised learning methods that leverage the unlabeled data to improve the model's performance. Developing strategies to mitigate the impact of the class imbalance and enhance the model's ability to generalize effectively will be crucial. We encourage creative ideas.
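One simple option among many semi-supervised techniques is confidence-based pseudo-labeling: a model trained on the labeled subset predicts on the unlabeled images, and only predictions above a confidence threshold are kept as extra training labels. The sketch below shows only the selection step. The function name `select_pseudo_labels` and the 0.95 threshold are illustrative assumptions, not part of the starter code, and with heavily imbalanced data a per-class threshold may be needed so the malignant class is not drowned out by confident benign predictions.

```python
import torch


@torch.no_grad()
def select_pseudo_labels(model: torch.nn.Module, images: torch.Tensor, threshold: float = 0.95):
    """Hypothetical helper: keep unlabeled samples whose top softmax probability >= threshold.

    Returns the indices of the kept samples within `images` and their predicted labels.
    """
    model.eval()  # dropout off, batch norm uses running statistics; call model.train() afterwards
    probs = torch.softmax(model(images), dim=1)
    confidence, labels = probs.max(dim=1)
    keep = confidence >= threshold
    return keep.nonzero(as_tuple=True)[0], labels[keep]


# Toy usage: nn.Identity treats each 2-value input row as the logits a real CNN would output.
toy_model = torch.nn.Identity()
fake_logits = torch.tensor([[5.0, 0.0], [0.0, 5.0], [0.1, 0.0]])
idx, pseudo = select_pseudo_labels(toy_model, fake_logits)
```

Here the first two rows have a top probability of about 0.993 and are kept, while the third (about 0.525) is discarded. In a full pipeline, the kept samples would be added to the labeled set for another round of training.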

Data

Files

train_labeled.csv - paths to the labeled training set, with their corresponding labels

train_unlabeled.csv - paths to the unlabeled training set

test.csv - paths to the test set, for which you need to make predictions

Columns

ID - the path to the file

label - 0 for benign samples, 1 for malignant samples
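As an illustration of the columns above, the sketch below reads a labeled CSV with pandas and derives inverse-frequency class weights for a weighted cross-entropy loss, one common way to counter the imbalance between benign and malignant samples. It assumes the files are comma-separated with exactly the `ID` and `label` columns listed above. The helper name `class_weights_from_labels` is hypothetical; the formula is the same one scikit-learn uses for `class_weight='balanced'`.

```python
import io

import pandas as pd
import torch
import torch.nn as nn


def class_weights_from_labels(df: pd.DataFrame, num_classes: int = 2) -> torch.Tensor:
    """Hypothetical helper: inverse-frequency ("balanced") class weights.

    weight_c = n_samples / (num_classes * count_c), so every class contributes
    the same total weight to the loss.
    """
    counts = df["label"].value_counts().reindex(range(num_classes), fill_value=0)
    counts = torch.tensor(counts.to_numpy(), dtype=torch.float32)
    # clamp avoids division by zero if a class is missing from the labels
    return counts.sum() / (num_classes * counts.clamp(min=1))


# In the competition this would be pd.read_csv("train_labeled.csv") (path assumed);
# a tiny in-memory stand-in with made-up file names is used here.
csv_text = "ID,label\nimg_0.png,0\nimg_1.png,0\nimg_2.png,0\nimg_3.png,1\n"
labeled = pd.read_csv(io.StringIO(csv_text))

weights = class_weights_from_labels(labeled)  # ≈ tensor([0.6667, 2.0000])
criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy all-zero logits: the weighted mean loss then equals ln(2).
targets = torch.tensor(labeled["label"].to_numpy())
loss = criterion(torch.zeros(len(labeled), 2), targets)
```

An alternative with the same goal is to oversample the minority class with `torch.utils.data.WeightedRandomSampler` instead of reweighting the loss.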

Rules

  • Using pretrained models is not allowed.
  • Using additional training data apart from the data provided is not allowed.
  • Searching on the internet for the clean dataset or labels for the test set is not allowed.
  • You must provide the Jupyter notebook or Python script used to train the model that generated your submitted solution, so the solution has to be reproducible. You can use the following snippet to make sure the starting seed is always the same:
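import os
import random

import numpy as np
import torch

def seed_everything(seed=42):
    random.seed(seed)
    # Only affects Python processes started after this call, not the running interpreter.
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Use deterministic cuDNN kernels and disable autotuning, which may pick different algorithms between runs.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False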
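`seed_everything` seeds the main process only. If training uses a `DataLoader` with shuffling or several worker processes, the PyTorch reproducibility notes additionally recommend passing an explicitly seeded `torch.Generator` and a `worker_init_fn` that seeds each worker. A sketch under those assumptions; the `TensorDataset` is a placeholder for the real image dataset, and `make_loader` is a hypothetical helper name:

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Each worker gets a torch seed derived from the loader's generator;
    # reuse it to seed NumPy and Python's random module inside that worker.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


def make_loader(dataset, seed=42, batch_size=4, num_workers=0):
    g = torch.Generator()
    g.manual_seed(seed)  # fixes the shuffling order across runs
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=num_workers, worker_init_fn=seed_worker, generator=g)


# Two loaders built with the same seed yield the same batch order.
data = TensorDataset(torch.arange(10))
order_a = [batch[0].tolist() for batch in make_loader(data)]
order_b = [batch[0].tolist() for batch in make_loader(data)]
```

With `num_workers=0` (as here) `seed_worker` is never called; it only takes effect once `num_workers` is set above zero for real training.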
dsm/assignments/02.txt · Last modified: 2024/11/04 15:31 by andrei.niculae1004
CC Attribution-Share Alike 3.0 Unported