

Project

  • Team: 2 members.
  • Project Selection:
    • Option 1: Choose from the list of pre-defined project ideas provided below.
    • Option 2: Propose your own project idea, which requires approval from the course team.

Project Workflow

Each team is required to:

  • Implement their chosen project idea, building a model or system that addresses a specific medical data science problem.
  • Evaluate their approach using appropriate metrics (accuracy, precision, recall, etc.), and compare results to existing state-of-the-art methods.
  • Document their progress and findings in both a formal report and presentation, in English, using the IEEE format available here.
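The evaluation step above can be sketched in plain Python; the label vectors here are invented for illustration, and in practice a library such as scikit-learn provides these metrics directly:

```python
# Sketch: accuracy, precision, recall and F1 for a binary classifier.
# The label vectors below are illustrative, not from any real dataset.

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

In a submitted report the same numbers would normally come from sklearn.metrics, with the averaging mode chosen to match the task (binary vs. multi-class).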

Milestones and Deliverables

1. M1 - Related Work (1p)
  • Objective: Establish the research context by reviewing and summarising related work (existing studies and relevant literature related to your topic).
    • Action Items:
      • Conduct a thorough literature review of papers, articles, and studies relevant to your project.
      • Summarise the current state-of-the-art methods in the field.
      • Identify gaps in the research or areas for potential improvement.
    • Documentation (English, IEEE format): Create a report section (2 pages excluding references) detailing your findings, including citations of key papers and a discussion of how your project will build upon or differ from existing work.
    • Grading
      • (0.3p) References: Include a minimum of 10 academic papers in your review.
      • (0.2p) Research Questions: Provide at least 2 meaningful research questions that your work will answer, or that should be addressed in future work. Helpful guide here.
      • (0.5p) Content: Capture the current landscape of the topic. You may use qualitative surveys for inspiration (also check the tips below). In your review, explicitly answer these sub-questions, each worth 0.1p:
        • What datasets are used and why? (0.1p)
        • What benchmarks or evaluation methods are used? Are there any limitations? (0.1p)
        • What are the current shortcomings in the field? (0.1p)
        • How do the works relate to each other (e.g., one paper addresses another's limitations, or several share the same assumptions or techniques)? (0.1p)
        • What architectures and training techniques are used? Any noteworthy or novel approaches? (0.1p)
    • Presentation:
      • Prepare a 5-minute presentation (with slides) that summarises your review.
      • State the topic / problem, the context, and the direction you intend to pursue (your research questions).
      • The presentation is graded on a 0 to 1 scale and will be used to scale the M1 score.
  • Tips:
    • How to read research papers efficiently - the 3-pass method.
    • Suggested workflow:
      • Start with recent literature reviews (surveys of scholarly sources) related to your topic of interest - example.
      • Select individual papers from the surveys and read them in detail.
      • Document each paper by noting its main contributions (these are usually stated explicitly by the authors).
      • Explore top conferences using the CORE ranking portal.
        • Focus on highly ranked conferences such as CVPR, ECCV, NeurIPS, EMNLP, etc. You can scout the accepted papers in the current year by searching for them on arXiv. Example: NeurIPS 2025 accepted papers.
      • Evaluate papers based on the number of citations (though newer papers may have fewer), year of publication and author credibility.
    • Choose a topic that truly interests you. It will make the research process more engaging and enjoyable.
    • Upload M1 (Documentation): Upload must contain title and authors.
2. M2 (18.11.24) - Dataset Collection and Baseline Results (1p)
  • Objective: Obtain the datasets required for your project and implement a baseline model for comparison.
    • Action Items:
      • Dataset Collection:
        • Obtain a relevant dataset, either from the provided resources or other public sources (e.g., Kaggle, UCI, Papers with Code).
        • Preprocess the data (e.g., cleaning, normalization, dealing with missing values).
      • Baseline Model:
        • Implement at least one baseline method (e.g., logistic regression, support vector machines, a simple neural network, pretrained model).
        • Obtain preliminary results to compare against future improvements.
      • Evaluation Metrics: Choose appropriate metrics (e.g., accuracy, F1-score, ROC-AUC) and document initial performance.
    • Documentation (English, IEEE format): Submit a report section (2 pages excluding references) describing the dataset, preprocessing steps, baseline model, and results.
    • Grading
      • (0.5p) Dataset description: Description and rationale for the selected datasets, including dataset purpose, number of records, data quality, collection method, size, and feature description.
      • (0.1p) Baseline description: Clear explanation of the implemented baseline method.
      • (0.3p) Initial results: Presentation of baseline performance results.
      • (0.1p) Result analysis: Interpretation and insights from the obtained results.
    • Upload M2 (Documentation): Upload must contain title and authors.
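As a rough illustration of the preprocessing bullet in M2 (missing values, normalization), here is a minimal NumPy sketch on a toy matrix; the data is made up, and a real pipeline would typically use scikit-learn's SimpleImputer and StandardScaler instead:

```python
import numpy as np

# Toy feature matrix with a missing value (NaN); rows are records, columns are features.
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# 1. Impute missing values with the column mean.
col_means = np.nanmean(X, axis=0)
nan_rows, nan_cols = np.where(np.isnan(X))
X[nan_rows, nan_cols] = col_means[nan_cols]

# 2. Z-score normalization: zero mean, unit variance per feature.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm.mean(axis=0))  # approximately [0, 0]
```

Fitting the imputation and scaling statistics on the training split only (and reusing them on the test split) avoids leaking test information into the baseline.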
3. M3 (18.12.24) - Own Contribution (1p)
  • Objective: Develop and implement your original contribution to the field, either by addressing a new research problem or improving an existing approach.
    • Types of contributions:
      • Address a new problem: Investigate a medical data science challenge that has not been sufficiently explored in prior work.
      • Improve existing methods:
        • Improve results: Optimise existing models or experiment with alternative techniques to achieve better performance.
        • Extensive experiments: Conduct comprehensive testing to evaluate your model’s robustness across multiple datasets or varying conditions. Novelty is important: even if your model does not outperform the state-of-the-art, it should offer a new perspective or insight.
          • Examples:
            • Use a different deep learning architecture (e.g., ResNet vs. EfficientNet).
            • Apply a novel training strategy, such as self-supervised learning or data augmentation techniques.
            • Propose a hybrid model that combines multiple approaches (e.g., combining CNNs with decision trees).
    • Documentation (English, IEEE format): Write a report section (2 pages excluding references) justifying your chosen approach, detailing your contribution, how it differs from existing work, and comparing your experimental results to the baseline and state-of-the-art.
    • Grading:
      • (0.3p) Clear and well-justified explanation of your contribution/s.
      • (0.5p) Implementation and results on selected dataset/s.
      • (0.2p) Result analysis, and comparison with the baseline.
    • Upload M3 (Documentation and Code): Upload must contain title and authors.
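As a toy illustration of the augmentation direction listed among the M3 examples, the sketch below flips a stand-in "image" and adds mild Gaussian noise; a real project would use a library such as torchvision or albumentations with domain-appropriate transforms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a grayscale medical image (H x W), values in [0, 1].
image = rng.random((4, 4))

def augment(img, rng):
    """Return a horizontally flipped copy with small Gaussian noise added."""
    flipped = img[:, ::-1]
    noisy = flipped + rng.normal(0.0, 0.01, size=img.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep values in the valid intensity range

aug = augment(image, rng)
print(aug.shape)
```

For medical images, check that each transform preserves the diagnostic content (e.g., a horizontal flip may be invalid for anatomies with a fixed left/right orientation).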
4. M4 (08.01.25) - Final paper + Presentation (1p)
  • Objective: Compile your project into a well-organised academic report.
    • Action Items:
      • Write a research-style report following the IEEE format (8 pages excluding references).
      • Structure:
        • Abstract: Summarise the project, contributions and key findings.
        • Introduction: Describe the problem, motivation and background.
        • Related Work: Include the literature review from M1.
        • Methodology: Detail your approach, including algorithms, models and techniques.
        • Experiments: Describe datasets, baseline methods, and results from M2.
        • Own Contribution: Document your original contribution, as in M3.
        • Results and Discussion: Present results with visualisations (graphs, tables) and discuss implications.
        • Conclusion: Summarise outcomes, limitations and suggest directions for future research.
    • Documentation (English, IEEE format): Submit a polished, formal academic report in IEEE format (8 pages excluding references).
    • Presentation (6 minutes) - prepare a concise, visually clear presentation covering:
      • The problem addressed and its relevance.
      • Key steps of your methodology.
      • Experimental results and your main contributions.
      • Conclusions and potential areas for future research.
    • Presentations tips: create well-polished slides with clear visuals, including figures, graphs and performance metrics.
    • Evaluation: The presentation will be graded based on clarity, depth of explanation, the quality of results and the Q&A section. The final project grade will be weighted based on the presentation quality.
    • Upload M4 (Documentation and Slides): Upload must contain title and authors.

Grading System

  • Total Points = M1 + M2 + M3 + M4
    • M1 and M4 will have their grade (G) scaled by the corresponding presentation score (P): scaled grade = G * P
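Read literally, the scaling rule combines with the total as follows; the milestone scores and presentation factors in this sketch are hypothetical:

```python
def total_points(m1, m2, m3, m4, p1, p4):
    """M1 and M4 are scaled by their presentation scores (0..1); M2 and M3 are not."""
    return m1 * p1 + m2 + m3 + m4 * p4

# Hypothetical scores: each milestone graded out of 1 point.
print(total_points(m1=0.9, m2=1.0, m3=0.8, m4=1.0, p1=0.9, p4=1.0))
```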

Examples of Project Ideas

1. Bad Posture Detection

  • Objective: Detect posture abnormalities from videos or images and suggest exercises to correct them.
  • Relevant work: Posture Detection.

2. Smoker Detection

  • Objective: Identify whether a person is a smoker based on lung capacity, voice analysis, or X-ray images.
  • Dataset: Gather data from publicly available voice or medical image datasets.
  • Note: Each chosen modality (audio, video, image), or a combination of modalities, may yield a distinct project with little overlap.

3. Retinal Lesion Detection

  • Objective: Detect retinal lesions from medical images, aiding early diagnosis of conditions like diabetic retinopathy.
  • Proposed Dataset: Retinal Lesions Dataset.

4. Fracture Detection in X-rays

  • Objective: Develop a model that identifies fractures in X-ray images, which could help radiologists in making faster diagnoses.
  • Proposed Datasets: MURA, RSNA etc.

5. Cancer Detection from Histopathology Images

  • Objective: Detect and classify cancerous tissue in histopathology images.
  • Proposed Dataset: Choose an appropriate one from maduc7/Histopathology-Datasets

6. Alzheimer’s Disease Progression Prediction

  • Objective: Predict the progression of Alzheimer’s disease using imaging (e.g., MRI) or genetic data.
  • Proposed dataset: OASIS Alzheimer's Detection.

7. Interpretation of Knee MRI

  • Objective: Develop models for automated interpretation of knee MRI scans.
  • Proposed dataset: MRNet - Kaggle. / MRNet

8. Your own project

We encourage you to choose and define your own project.

Potential contributions

  • NEW - The term “New” refers to something that was not tried for YOUR task, so you can try to adapt techniques from other tasks and they will count as contributions.

Depending on the task and recent work, contributions may be:

  • New pipelines: Some solutions are implemented using a pipeline of models. You can tackle some parts of it and try to improve them.
  • Different architecture: You can modify the structure of a well-established model, BUT the modification should be based on sound reasons, even if in the end it does not give better results. Random "mutations" of known models won't count as a contribution.
  • Augmentation techniques: Check if you can augment the data in a new way. Maybe synthetically generated data can help, or not.
  • A new benchmark or new evaluation metrics: If you feel the tests in the literature are not robust to some cases, you can design a new set of qualitative tests. This should translate into at least a couple of hundred new tests / examples.
  • Explainability: If you feel that the works you reviewed do not provide much insight into the decisions that are being made, well, you can work on that: evaluate existing explainability tools under your task conditions.
  • Cross-task adaptation: Explore whether techniques that worked in related domains can be adapted to your task.
  • Robustness to noise/adversaries: Investigate how the system performs under noisy, adversarial, or out-of-distribution inputs, and propose methods to improve robustness.
  • Human-in-the-loop integration: Design hybrid workflows where humans assist the model (or vice versa) to achieve better results than either alone.

Tips for contributions:

  • Check limitations and future work: Most papers will have discussions around their limitations and propose future work items. Sometimes it is just that the authors did not focus on that aspect.
  • Error analysis on the baseline: You can analyse the errors made by your baseline and try to propose targeted solutions. In this case, the baseline should be a well performing model from the related work, not a simple finetuned architecture.
dsm/assignments/01.1759683157.txt.gz · Last modified: 2025/10/05 19:52 by emilian.radoi
CC Attribution-Share Alike 3.0 Unported