May 8 at 14:30
Doctoral Programme | Computer Science
Doctoral Examination | The Relationship Between Artificial Intelligence, Machine Learning, and Research Data Management
Student | Christine Rose Kirkpatric
Date: May 8
Time: 14:30
Venue: Auditorium FC6 029
Chair:
Pedro Gabriel Dias Ferreira
Full Professor
Faculdade de Ciências da Universidade do Porto
Examiners:
Luiz Olavo Bonino da Silva Santos
Associate Professor
University of Twente (The Netherlands)
Jane Greenberg
Alice B. Kroeger Professor
College of Computing and Informatics, Drexel University (USA)
Committee Members:
Pedro Miguel Alves Brandão
Assistant Professor
Faculdade de Ciências da Universidade do Porto
Inês de Castro Dutra (Supervisor)
Assistant Professor
Faculdade de Ciências da Universidade do Porto
Abstract:
As artificial intelligence (AI) and machine learning (ML) increasingly shape data-driven science, the assumption that progress depends primarily on scale (larger models and larger datasets) has obscured the foundational role of data preparation, stewardship, and quality. Good practices have been set aside in the AI race. This is most evident in the proliferation of deep learning tools, which rely on vast amounts of (scavenged) data. These advances have required specialized infrastructure at scale that consumes massive amounts of energy (and water), which has fueled growing concern about the energy consumption of AI. This dissertation examines the intersection of AI and Research Data Management (RDM), two domains that have historically evolved along separate methodological and disciplinary lines, to assess how data practices might inform ML accuracy and overall AI efficiency. Drawing on principles from FAIR (Findable, Accessible, Interoperable, Reusable) and open science, the work investigates what it means for data to be ‘machine readable’ and ‘machine actionable’ across different AI data architectures, including simple data structures used by ‘Data Science’ ML, foundation models, and knowledge graphs. Through experimental studies exploring MLCommons benchmarks and data from the Unidade de Cardiologia Materno Fetal (UCMF), the research demonstrates that while FAIR practices improve data reuse and accessibility, they are not sufficient on their own to ensure AI readiness; following RDM practices in addition to the FAIR principles helps develop ML models that are both more effective and more efficient. The work also shows that AI methods require different types of data preparation and explores how these data requirements align with the FAIR principles. This dissertation challenges the notion that “FAIR equals AI ready,” defines AI readiness for data, and provides actionable guidance for data producers, stewards, and AI practitioners.
By bridging Information Science and Computer Science perspectives, this work contributes both conceptual clarity and applied methods for aligning RDM practices with the realities of using current AI methods and technologies.
