
Project

  • Team: 2 members.
  • Project Selection:
    • Option 1: Choose from the list of pre-defined project ideas provided below.
    • Option 2: Propose your own project idea, which requires approval from the course team.

Project Workflow

Each team is required to:

  • Implement their chosen project idea, building a model or system that addresses a specific medical data science problem.
  • Evaluate their approach using appropriate metrics (accuracy, precision, recall, etc.), and compare results to existing state-of-the-art methods.
  • Document their progress and findings in both a formal report and presentation, in English, using the IEEE format available here.
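The evaluation metrics named above can be computed by hand for a binary task. The sketch below is illustrative only (in practice you would use a library such as scikit-learn); the function name is ours:

```python
def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Toy example: 6 ground-truth labels vs. model predictions
acc, prec, rec = precision_recall_accuracy([1, 0, 1, 1, 0, 0],
                                           [1, 0, 0, 1, 1, 0])
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

Reporting several metrics side by side like this is what makes a fair comparison against state-of-the-art methods possible.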

Milestones and Deliverables

1. M1 - Related Work (1p)
  • Objective: Establish the research context by reviewing and summarising related work (existing studies and relevant literature on your topic).
    • Action Items:
      • Conduct a thorough literature review of papers, articles, and studies relevant to your project.
      • Summarise the current state-of-the-art methods in the field.
      • Identify gaps in the research or areas for potential improvement.
    • Documentation: Create a report section (2 pages excluding references) detailing your findings, including citations of key papers and a discussion of how your project will build upon or differ from existing work, in English.
    • Upload Documentation: Upload (Must contain title and authors)
    • Grading
      • (0.3p) References: Include a minimum of 10 academic papers in your review.
      • (0.2p) Research Questions: Provide at least 2 meaningful research questions that you will answer, or that should be addressed in future work. Helpful guide here.
      • (0.5p) Content: Capture the current landscape of the topic. You may use qualitative surveys for inspiration (also check the tips below). In your review, explicitly answer these sub-questions, each worth 0.1p:
        • What datasets are used and why? (0.1p)
        • What benchmarks or evaluation methods are used? Are there any limitations? (0.1p)
        • What are the current shortcomings in the field? (0.1p)
        • How do the works relate to each other (e.g., one paper addresses another's limitations, or several share the same assumptions or techniques)? (0.1p)
        • What architectures and training techniques are used? Any noteworthy or novel approaches? (0.1p)
    • Presentation:
      • Prepare a 5-minute presentation (with slides) that summarises your review.
      • State the topic / problem, the context, and the direction you intend to pursue (your research questions).
      • The presentation is graded on a 0 to 1 scale and will be used to scale the M1 score.
  • Tips:
    • How to read research papers efficiently - the 3-pass method.
    • Suggested workflow:
      • Start with recent literature reviews (surveys of scholarly sources) related to your topic of interest - example.
      • Select individual papers from the surveys and read them in detail.
      • Document each paper by noting its main contributions (these are usually stated explicitly by the authors).
      • Explore top conferences using the CORE ranking portal.
        • Focus on highly ranked conferences such as CVPR, ECCV, NeurIPS, EMNLP, etc. You can scout the accepted papers in the current year by searching for them on arXiv. Example: NeurIPS 2025 accepted papers.
      • Evaluate papers based on citation count (though newer papers may have fewer), year of publication, and author credibility.
    • Choose a topic that truly interests you. It will make the research process more engaging and enjoyable.
2. M2 (18.11.24) - Dataset Collection and Baseline Results (1p)
  • Objective: Obtain the datasets required for your project and implement a baseline model for comparison.
    • Action Items:
      • Dataset Collection:
        • Obtain a relevant dataset, either from the provided resources or other public sources (e.g., Kaggle, UCI, Papers with Code).
        • Preprocess the data (e.g., cleaning, normalization, dealing with missing values).
      • Baseline Model:
        • Implement at least one baseline method (e.g., logistic regression, support vector machines, a simple neural network, pretrained model).
        • Obtain preliminary results to compare against future improvements.
      • Evaluation Metrics: Choose appropriate metrics (e.g., accuracy, F1-score, ROC-AUC) and document initial performance.
    • Documentation (IEEE conference paper format): Submit a report section (2 pages excluding references) describing the dataset, preprocessing steps, baseline model, and results.
    • Upload Documentation: Upload
    • Grading
      • (0.5p) Dataset description: Description and rationale for the selected datasets, including dataset purpose, number of records, data quality, collection method, size, and feature description.
      • (0.1p) Baseline description: Clear explanation of the implemented baseline method.
      • (0.3p) Initial results: Presentation of baseline performance results.
      • (0.1p) Result analysis: Interpretation and insights from the obtained results.
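The preprocessing and baseline steps above can be sketched in a few lines of plain Python. A majority-class predictor is one of the simplest possible baselines; all names below are illustrative, and a real project would use pandas/scikit-learn equivalents:

```python
from collections import Counter

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def min_max_scale(column):
    """Scale numeric values to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]

class MajorityBaseline:
    """Predicts the most frequent training label for every input."""
    def fit(self, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self
    def predict(self, X):
        return [self.label for _ in X]

# Toy feature column with a missing value, then a baseline prediction
feature = impute_mean([0.2, None, 0.8, 1.0])
scaled = min_max_scale(feature)
baseline = MajorityBaseline().fit([0, 1, 1, 1])
print(scaled, baseline.predict([[0.5]] * 2))
```

Even a trivial baseline like this gives you a reference number that later, more sophisticated models must beat.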
3. M3 (18.12.24) - Own Contribution (1p)
  • Objective: Implement your novel contribution to the field, either by solving a new problem or improving an existing method.
    • Types of Contributions:
      • Address a New Problem: Tackle a medical data science issue that has not been addressed by related work.
      • Improve Existing Methods:
        • Improve Results: Enhance the performance of an existing solution by optimising models or experimenting with different techniques.
        • Extensive Experiments: Conduct extensive experiments to assess your model’s robustness, including testing with different datasets or under varying conditions.
        • New Approach: Introduce a new method or architecture (e.g., switching from traditional CNNs to transformers), even if it does not outperform state-of-the-art methods, as long as it provides a novel perspective.
          • Examples:
            • Use a different deep learning architecture (e.g., ResNet vs. EfficientNet).
            • Apply a novel training strategy, such as self-supervised learning or data augmentation techniques.
            • Propose a hybrid model that combines multiple approaches (e.g., combining CNNs with decision trees).
    • Documentation (IEEE conference paper format): Write a report section (2 pages excluding references) justifying your chosen approach, detailing your contribution, how it differs from existing work, and comparing your experimental results to the baseline and state of the art.
    • Upload Documentation and code: Upload
    • Grading:
      • (0.3p) Description of proposed contribution.
      • (0.5p) Implementation and results on the selected dataset.
      • (0.2p) Result analysis and comparison with the baseline.
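As one illustration of the data augmentation route mentioned above, two cheap image augmentations (a horizontal flip and additive Gaussian pixel noise) can be written in plain Python. The function names are ours; real projects would typically rely on a library such as torchvision or albumentations:

```python
import random

def horizontal_flip(image):
    """Mirror a 2-D image (list of pixel rows) left-to-right."""
    return [row[::-1] for row in image]

def add_noise(image, sigma=0.05, seed=0):
    """Add small Gaussian pixel noise; a common, cheap augmentation."""
    rng = random.Random(seed)
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

# Each augmented copy can be added to the training set
img = [[0.0, 0.5], [1.0, 0.25]]
augmented = [horizontal_flip(img), add_noise(img)]
print(augmented[0])
```

For medical images, check that an augmentation preserves the clinical meaning of the label (e.g., a left-right flip may be invalid for laterality-dependent findings).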
4. M4 (08.01.25) - Final paper + Presentation (1p)
  • Objective: Compile your project into a well-organised academic report.
    • Action Items:
      • Write a research-style report following the IEEE conference paper format.
      • Structure:
        • Abstract: Briefly summarize your project, contributions, and key findings.
        • Introduction: Explain the problem you are addressing, motivation, and background.
        • Related Work: Include the summary from M1.
        • Methodology: Detail your approach, including algorithms, models, and techniques used.
        • Experiments: Describe the datasets, baseline methods, and results from M2.
        • Own Contribution: Document your original contribution, as outlined in M3.
        • Results and Discussion: Present detailed results with visualizations (graphs, tables) and discuss their implications.
        • Conclusion: Summarize the outcomes, limitations, and future work.
    • Documentation: Submit a polished, formal academic report in IEEE format (8 pages excluding references).
    • Upload Documentation: Upload
    • Presentation: Prepare a 6-minute presentation covering:
      • The problem you addressed and its relevance.
      • Key steps of your methodology.
      • Experimental results and key contributions.
      • Conclusions and potential areas for future research.
    • Create well-polished slides with clear visuals, including figures, graphs, and performance metrics.
    • Evaluation: Your presentation will be graded based on clarity, depth of explanation, the quality of results and the Q&A section. The final project grade will be weighted by your presentation quality.
    • Upload Presentation: Upload

Grading System

  • Total Points = M1 + M2 + M3 + M4
    • M1 and M4 will have their grade (G) scaled by the score of the presentation (P): TOTAL = G * P
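Assuming the presentation multiplier applies per milestone as described (M1 and M4 scaled, M2 and M3 unscaled), the total can be computed as:

```python
def total_points(m1, m2, m3, m4, p1, p4):
    """M1 and M4 grades are scaled by their presentation scores (0 to 1)."""
    return m1 * p1 + m2 + m3 + m4 * p4

# e.g., full milestone scores but 0.8 presentation scores on M1 and M4
print(total_points(1.0, 1.0, 1.0, 1.0, 0.8, 0.8))  # → 3.6
```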

Examples of Project Ideas

1. Bad Posture Detection

  • Objective: Detect posture abnormalities from videos or images and suggest exercises to correct them.
  • Relevant work: Posture Detection.

2. Smoker Detection

  • Objective: Identify whether a person is a smoker based on lung capacity, voice analysis, or X-ray images.
  • Dataset: Gather data from publicly available voice or medical image datasets.
  • Note: Each chosen modality (audio, video, image), or a combination of them, may result in a distinct project with little overlap.

3. Retinal Lesion Detection

  • Objective: Detect retinal lesions from medical images, aiding early diagnosis of conditions like diabetic retinopathy.
  • Proposed Dataset: Retinal Lesions Dataset.

4. Fracture Detection in X-rays

  • Objective: Develop a model that identifies fractures in X-ray images, which could help radiologists in making faster diagnoses.
  • Proposed Datasets: MURA, RSNA etc.

5. Cancer Detection from Histopathology Images

  • Objective: Detect cancer in histopathology images to support diagnosis.
  • Proposed Dataset: Choose an appropriate one from maduc7/Histopathology-Datasets

6. Alzheimer’s Disease Progression Prediction

  • Objective: Predict the progression of Alzheimer’s disease using imaging (e.g., MRI) or genetic data.
  • Proposed dataset: OASIS Alzheimer's Detection.

7. Interpretation of Knee MRI

  • Objective: Develop models for automated interpretation of knee MRIs.
  • Proposed dataset: MRNet - Kaggle. / MRNet

8. Your own project

We encourage you to choose and define your own project.

Potential contributions

  • NEW - The term “New” refers to something that was not tried for YOUR task, so you can try to adapt techniques from other tasks and they will count as contributions.

Depending on the task and recent work, contributions may be:

  • New pipelines: Some solutions are implemented using a pipeline of models. You can tackle some parts of it and try to improve them.
  • Different architecture: You can modify the structure of a well-established model, BUT the modification should be based on sound reasoning, even if it ultimately does not give better results. Random “mutations” of known models won't count as a contribution.
  • Augmentation techniques: Check if you can augment the data in a new way. Maybe synthetically generated data can help, or not.
  • A new benchmark or new evaluation metrics: If you feel the tests in the literature are not robust to some cases, you can design a new set of qualitative tests. This should translate into at least a couple of hundred new tests / examples.
  • Explainability: If you feel that the works you reviewed do not provide much insight into the decisions that are being made, well, you can work on that: evaluate existing explainability tools under your task conditions.
  • Cross-task adaptation: Explore whether techniques that worked in related domains can be adapted to your task.
  • Robustness to noise/adversaries: Investigate how the system performs under noisy, adversarial, or out-of-distribution inputs, and propose methods to improve robustness.
  • Human-in-the-loop integration: Design hybrid workflows where humans assist the model (or vice versa) to achieve better results than either alone.
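A minimal sketch of a robustness study: sweep the input noise level and measure how accuracy degrades. The toy threshold classifier below is only a stand-in for a real model, and all names are illustrative:

```python
import random

def threshold_classifier(x):
    """Stand-in model: predicts class 1 when the feature exceeds 0.5."""
    return 1 if x > 0.5 else 0

def accuracy_under_noise(xs, ys, sigma, trials=200, seed=0):
    """Average accuracy when Gaussian noise of scale sigma corrupts inputs."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        for x, y in zip(xs, ys):
            if threshold_classifier(x + rng.gauss(0.0, sigma)) == y:
                correct += 1
    return correct / (trials * len(xs))

xs = [0.1, 0.3, 0.7, 0.9]
ys = [0, 0, 1, 1]
for sigma in (0.0, 0.1, 0.3):
    print(f"sigma={sigma}: accuracy={accuracy_under_noise(xs, ys, sigma):.2f}")
```

Plotting accuracy against the noise level gives a robustness curve, which is itself a presentable experimental result.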

Tips for contributions:

  • Check limitations and future work: Most papers will have discussions around their limitations and propose future work items. Sometimes it is just that the authors did not focus on that aspect.
  • Error analysis on the baseline: You can analyse the errors made by your baseline and try to propose targeted solutions. In this case, the baseline should be a well-performing model from the related work, not a simple fine-tuned architecture.
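One simple way to start such an error analysis is to tabulate (true, predicted) label pairs and inspect the mismatched counts; the labels below are illustrative:

```python
from collections import Counter

def error_breakdown(y_true, y_pred):
    """Count (true, predicted) pairs to see where a baseline fails."""
    return Counter(zip(y_true, y_pred))

y_true = ["healthy", "lesion", "lesion", "healthy", "lesion"]
y_pred = ["healthy", "healthy", "lesion", "healthy", "healthy"]
for (t, p), n in sorted(error_breakdown(y_true, y_pred).items()):
    marker = "" if t == p else "  <-- error"
    print(f"true={t} pred={p}: {n}{marker}")
```

Concentrated off-diagonal counts (here, lesions predicted as healthy) point to the specific failure mode your contribution should target.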
dsm/assignments/01.1759679766.txt.gz · Last modified: 2025/10/05 18:56 by emilian.radoi
CC Attribution-Share Alike 3.0 Unported