June 1 at 2:30 pm
Doctoral Programme | Computer Science
Defense | Multi-tool integration for comprehensive characterization and validation of genomic and transcriptomic data
Student | Marta Patrícia Ribeiro Ferreira
Date: June 1
Time: 2:30 pm
Venue: Room FC6 029
President:
Alípio Mário Guedes Jorge
Full Professor
Faculdade de Ciências da Universidade do Porto
Examiners:
Rui Manuel Ribeiro Castro Mendes
Assistant Professor
Escola de Engenharia da Universidade do Minho
Joana Gonçalves de Gouveia Maia Xavier
Invited Assistant Professor
Faculdade de Medicina e Ciências Biomédicas da Universidade do Algarve
Committee Members:
Miriam Raquel Seoane Pereira Seguro Santos
Assistant Professor
Faculdade de Ciências da Universidade do Porto
Pedro Gabriel Dias Ferreira (Co-supervisor)
Assistant Professor with Habilitation
Faculdade de Ciências da Universidade do Porto
Abstract:
The advent of Next-Generation Sequencing (NGS) technologies has reshaped the landscape of genomic and transcriptomic research, unlocking unprecedented opportunities for precision medicine. However, the complexity and volume of sequencing data present substantial computational challenges, particularly in variant detection, data interpretation, and handling low-quality samples. This thesis addresses these challenges by developing novel computational pipelines designed to optimize and automate whole-genome sequencing (WGS) and RNA sequencing (RNA-seq) analysis, ultimately advancing bioinformatics workflows in genomics.
We present TotalGenome, an advanced, modular pipeline that integrates multiple state-of-the-art variant callers (DeepVariant, HaplotypeCaller, Lumpy, Delly, and GRIDSS) to improve the detection of single nucleotide variants (SNVs) and structural variants (SVs) in WGS data. By combining several tools, TotalGenome achieves enhanced accuracy in variant calling, particularly in the analysis of noncoding regulatory variants associated with hereditary diffuse gastric cancer (HDGC). The pipeline demonstrates how multi-tool synergies can enhance both sensitivity and precision, providing a more comprehensive approach to genomic variant detection and improving risk prediction in clinical genetics.
For RNA-seq data, we introduce Transcriptomate, a flexible, scalable, and user-friendly tool for transcriptomic profiling that simplifies differential expression analysis while providing real-time visualization of gene expression dynamics. Designed to function independently of traditional workflow management systems (WFMS), Transcriptomate streamlines data analysis and metric extraction, making it suitable for both computational biologists and experimental researchers. When applied to CDH1-related gastric cancer, the tool identified key gene expression changes and regulatory disruptions, offering a deeper understanding of tumorigenesis.
This thesis also explores the integration of advanced sequencing technologies in the context of challenging samples, with NanoString and AmpliSeq serving as reliable alternatives for low-quality RNA analysis. NanoString demonstrated robust fibrosis gene expression profiling, and AmpliSeq accurately characterized oncocytic tumor subtypes from frozen samples, revealing distinct molecular signatures and hub genes.
This work emphasizes the importance of automated pipelines and integrative methodologies in advancing genomic research. Future directions include leveraging machine learning and artificial intelligence to improve variant prioritization, enhancing enrichment analysis for deeper biological insights, and developing multi-omics integration strategies for precision medicine and systems biology.
