May 5 at 2 pm
Doctoral Programme | Computer Science
Defense | Weakly Supervised Multimodal Explanations for Medical Image Classification
Student | Isabel Cristina Rio-Torto de Oliveira
Date: May 5
Time: 2 pm
Venue: Room FC6 029
President:
Miguel Tavares Coimbra
Full Professor
Faculty of Sciences, University of Porto
Examiners:
Mauricio Reyes
Associate Professor
ARTORG Center for Biomedical Engineering Research, University of Bern (Switzerland)
Ana Catarina Fidalgo Barata
Assistant Professor
Instituto Superior Técnico, University of Lisbon
Jury members:
Francesco Renna
Assistant Professor
Faculty of Sciences, University of Porto
Luís Filipe Pinto de Almeida Teixeira (Supervisor)
Associate Professor
Faculty of Engineering, University of Porto
Abstract:
Deep Learning (DL) models have demonstrated remarkable success in medical image analysis, matching or even surpassing human performance. However, their lack of interpretability remains a critical barrier to clinical adoption. Research in Explainable AI (XAI) has grown to address this challenge. Most methods produce post-hoc visual explanations because they are easy to apply and do not require retraining. However, these explanations have critical limitations: they highlight which features are relevant but not why, they are often hard to interpret, and their faithfulness to the model is uncertain. Moreover, different users (e.g., clinicians vs. patients) have different needs and preferences, so explainability methods must be able to adapt and generate explanations in different modalities. While XAI methods improve interpretability, they often increase the annotation burden, especially for concept-based explanations and Natural Language Explanations (NLEs). For NLEs in particular, relying too heavily on full human supervision can even make explanations unfaithful.
This thesis addresses these challenges by pursuing two main objectives: (i) to develop alternatives to post-hoc visual explanations for medical image classification, and (ii) to do so without increasing the annotation burden, by leveraging weakly supervised learning. To this end, the work explores multiple explanation modalities and introduces novel frameworks that balance interpretability, scalability, and predictive performance. First, we investigate in-model visual explanation methods, including attention mechanisms and inherently interpretable architectures such as B-cos networks, providing the first systematic study of their utility in medical imaging. While attention and B-cos maps offer localized interpretability, the results reveal important limitations regarding faithfulness and task-specific performance. Second, a new paradigm for concept-based explanations is proposed through CBVLM, which transforms Large Vision-Language Models (LVLMs) into Concept Bottleneck Models (CBMs). Operating under weak supervision through In-Context Learning (ICL), CBVLM significantly reduces annotation costs while outperforming traditional baselines across diverse medical datasets. Third, the thesis advances research on NLEs by introducing WeNLEX, a weakly supervised framework capable of generating faithful and audience-adaptive textual explanations. Notably, WeNLEX improves both explainability and predictive accuracy when integrated into model training. Finally, the thesis presents DeViL, a vision-language explanation framework that aligns the visual and textual modalities by training on image-caption pairs, producing open-vocabulary attributions and natural language descriptions of model features.
Collectively, these contributions demonstrate that effective medical XAI can be achieved without costly annotation pipelines, through the strategic use of weak supervision and multimodal explanations.
