
Competition

Competitors who beat the baseline: please complete this form!

The competition is hosted on Kaggle at this link.

Each competitor will participate individually. Please log in using your student email (@stud.acs.upb.ro) and check this box.

We provide starter code that demonstrates how to read the data, train a network, and make a submission. You are encouraged to start your work from this notebook.

Beating the baseline on the private leaderboard is worth 1p, and the top 3 on the private leaderboard will have their final exam grade set to 10 (4p).

Description

Traditional image classification models heavily rely on accurately labeled data for training, but in real-world scenarios, acquiring large quantities of labeled images can be costly and time-consuming.

Additionally, the available datasets exhibit a notable class imbalance, with benign cases significantly outnumbering malignant ones. This imbalance reflects the epidemiological reality that most individuals undergoing screening are found to be cancer-free, as malignancies occur in only a small subset of the tested population.
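One common way to account for this imbalance (a suggestion, not something the provided starter code is known to do) is to weight each class inversely to its frequency in the labeled set. The sketch below computes such weights in plain Python; the helper name `inverse_frequency_weights` and the 90/10 example split are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels, num_classes=2):
    """Return one weight per class, proportional to 1 / class frequency.

    Uses weight_c = N / (num_classes * count_c), so a perfectly balanced
    dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(num_classes):
        if counts[c] == 0:
            raise ValueError(f"class {c} has no labeled samples")
        weights.append(total / (num_classes * counts[c]))
    return weights


# Hypothetical labeled set: 90 benign (0) and 10 malignant (1) samples
labels = [0] * 90 + [1] * 10
print(inverse_frequency_weights(labels))  # → [0.5555555555555556, 5.0]
```

In PyTorch, a list like this can be passed to the loss as `torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))`, so mistakes on the rare malignant class cost more; oversampling that class with `torch.utils.data.WeightedRandomSampler` is an alternative.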

In this challenge, we provide you with a dataset that poses both obstacles: a significant portion of the training data remains unlabeled and the labeled data is heavily imbalanced.

Your task is to develop innovative deep learning algorithms and techniques to overcome these challenges and build a robust image classification model.

To succeed in this competition, participants are encouraged to explore semi-supervised/unsupervised learning methods that leverage the unlabeled data to improve the model's performance. Developing strategies to mitigate the impact of the class imbalance and enhance the model's ability to generalize effectively will be crucial. We encourage creative ideas.
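Pseudo-labeling is one simple semi-supervised technique that fits this setting (named here as an example, not as the expected solution): a model trained on the labeled set predicts the unlabeled images, and only confident predictions are kept as extra training labels. The sketch below shows just the selection step, assuming `probs` holds the model's predicted probability of the malignant class for each image in `train_unlabeled.csv`; the function name, thresholds, and file names are hypothetical.

```python
def select_pseudo_labels(ids, probs, low=0.05, high=0.95):
    """Keep unlabeled samples whose predicted P(malignant) is confident.

    ids:   image paths (e.g. the ID column of train_unlabeled.csv)
    probs: predicted probability of class 1 (malignant) for each image
    prob <= low becomes label 0 (benign), prob >= high becomes label 1
    (malignant); everything in between stays unlabeled.
    """
    if len(ids) != len(probs):
        raise ValueError("ids and probs must have the same length")
    selected = []
    for path, p in zip(ids, probs):
        if p >= high:
            selected.append((path, 1))
        elif p <= low:
            selected.append((path, 0))
    return selected


# Hypothetical predictions for four unlabeled images
ids = ["img_a.png", "img_b.png", "img_c.png", "img_d.png"]
probs = [0.01, 0.50, 0.97, 0.93]
print(select_pseudo_labels(ids, probs))
# → [('img_a.png', 0), ('img_c.png', 1)]
```

The selected pairs can then be added to the labeled set before retraining. Since malignant cases are rare, it is worth checking how many pseudo-labels each class receives so the extra data does not make the imbalance worse.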

Data

Files

train_labeled.csv - paths to the labeled training set, with their corresponding labels

train_unlabeled.csv - paths to the unlabeled training set

test.csv - paths for the test set, for which you will need to make predictions

Columns

ID - the path to the file

label - 0 for benign samples, 1 for malignant samples
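A minimal sketch of loading one of these files and checking its label distribution, using only Python's standard `csv` module. It assumes each file has a header row with the `ID` and `label` columns described above (`train_unlabeled.csv` and `test.csv` presumably carry only `ID`); the helper names and sample paths are hypothetical, and the starter notebook may read the data differently.

```python
import csv
import io
from collections import Counter


def read_split(path_or_file):
    """Read a split CSV into a list of row dicts keyed by column name."""
    if hasattr(path_or_file, "read"):
        return list(csv.DictReader(path_or_file))
    with open(path_or_file, newline="") as f:
        return list(csv.DictReader(f))


def label_counts(rows):
    """Count samples per label (0 = benign, 1 = malignant)."""
    return Counter(int(row["label"]) for row in rows)


# Tiny in-memory stand-in for train_labeled.csv (hypothetical paths)
sample = io.StringIO("ID,label\nimgs/a.png,0\nimgs/b.png,0\nimgs/c.png,1\n")
rows = read_split(sample)
print(dict(label_counts(rows)))  # → {0: 2, 1: 1}
```

On the real data, pass the path instead, e.g. `read_split("train_labeled.csv")`, adjusted to wherever the competition files are located.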

Rules

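```python
import os
import random
import numpy as np
import torch

def seed_everything(seed=42):
    # Seed the Python, NumPy and PyTorch (CPU and every GPU) generators
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)  # only affects child processes
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Use deterministic cuDNN kernels and disable benchmark autotuning
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```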