
February 25 at 2:30 p.m.


Doctoral Programme | Computer Science

Defense | Unfolding the Temporal Structure of Narratives

Student | Hugo Miguel Oliveira de Sousa 

 

Date: February 25
Time: 14:30
Venue: Room FC5 278


President:

António Mário da Silva Marcos Florido
Associate Professor
Faculty of Sciences, University of Porto


Examiners:

Adam Jatowt
Professor
Department of Computer Science, University of Innsbruck (Austria)

António Manuel Horta Branco
Full Professor
Faculty of Sciences, University of Lisbon


Committee Members:

Bruno Emanuel da Graça Martins
Associate Professor
Instituto Superior Técnico, University of Lisbon

Álvaro Pedro de Barros Borges Reis Figueira
Assistant Professor
Faculty of Sciences, University of Porto

Alípio Mário Guedes Jorge (Supervisor)
Full Professor
Faculty of Sciences, University of Porto


Abstract:

When reading a story or a news article, humans can understand the chronological order of the events mentioned, even when that information is only vaguely specified. This is a fundamental skill for comprehending a narrative. For instance, from the sentence “Bob sent a message to Alice while she was leaving her birthday party.” we understand that the event “sent” occurs within the time span of the event “leaving”, even though this is not explicitly stated in the text. This PhD thesis addresses the task of temporal information extraction, tackling both core challenges and practical applications across multiple domains and languages. We structure the problem into two main components: temporal entity identification and temporal relation classification.

For temporal entity identification, we explore methods across diverse settings. We develop a biomedical entity extraction pipeline for Portuguese oncology health records, combining neural models with entity linking. We also study the use of large language models to extract narrative entities from Portuguese news articles via prompt engineering, showing that their effectiveness can be comparable to that of methods fine-tuned for the task. Additionally, we introduce TEI2GO, a suite of multilingual models for temporal expression identification that achieves state-of-the-art results in four of the six languages evaluated. For temporal relation classification, we propose decomposing interval relations into point relations between entity endpoints. This method reaches a temporal awareness score of 70.1% on the TempEval-3 dataset, establishing a new state of the art on this benchmark. Building on this insight, we introduce a novel formulation of the task that recasts relation classification as a sequential decision-making problem. This perspective enables the application of reinforcement learning algorithms to learn temporal reasoning from experience. All research was conducted on top of tieval, a Python library we developed and open-sourced to support the research community. This framework standardizes the evaluation of temporal information extraction across multiple corpora and provides domain-specific tools such as temporal closure and temporal awareness metrics.
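As an illustration of the idea of decomposing interval relations into point relations between endpoints, the following is a minimal sketch and not the method or code from the thesis. It assumes each event has known numeric start and end times, which is a simplification (in practice the endpoint relations would have to be predicted from text). The names `Interval`, `endpoint_relations` and `interval_relation`, as well as the small label set, are hypothetical and chosen only for this example; it does not use the tieval API.

```python
from typing import Dict, NamedTuple


class Interval(NamedTuple):
    """Hypothetical event representation: a start and an end time point."""
    start: float
    end: float


def point_relation(a: float, b: float) -> str:
    """Relation between two time points: '<', '>' or '='."""
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


def endpoint_relations(x: Interval, y: Interval) -> Dict[str, str]:
    """Decompose the relation between x and y into four point relations."""
    return {
        "xs-ys": point_relation(x.start, y.start),
        "xs-ye": point_relation(x.start, y.end),
        "xe-ys": point_relation(x.end, y.start),
        "xe-ye": point_relation(x.end, y.end),
    }


def interval_relation(points: Dict[str, str]) -> str:
    """Map four point relations back to a coarse interval label.

    The label set is an assumption made for this sketch; anything not
    covered by the explicit rules falls back to OVERLAP.
    """
    if points["xe-ys"] == "<":
        return "BEFORE"
    if points["xs-ye"] == ">":
        return "AFTER"
    if points["xs-ys"] == "=" and points["xe-ye"] == "=":
        return "SIMULTANEOUS"
    if points["xs-ys"] in (">", "=") and points["xe-ye"] in ("<", "="):
        return "IS_INCLUDED"
    if points["xs-ys"] in ("<", "=") and points["xe-ye"] in (">", "="):
        return "INCLUDES"
    return "OVERLAP"


# The example from the abstract, with made-up time values:
# "sent" happens while Alice is "leaving".
sent = Interval(start=10, end=11)
leaving = Interval(start=5, end=20)
label = interval_relation(endpoint_relations(sent, leaving))
```

With these made-up values, "sent" starts after "leaving" starts and ends before "leaving" ends, so the sketch labels the pair as included, matching the reading given in the abstract.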

The contributions of this thesis range from practical advances in healthcare and multilingual systems to methodological innovations in temporal identification and classification. Together, they advance the state of the art and broaden the foundations of temporal information extraction.

Keywords: temporal information extraction, temporal entity identification, temporal relation classification