"Some ideas for practical model selection in NLP"

On 20 April 2022, at 15:30, in room S2 of the DCC (FC6 1.40), Prof. Manuel Vilares Ferro will give a talk entitled "Some ideas for practical model selection in NLP".


The talk is organized by DCC-FCUP and the LIAAD-INESCTEC research group and is open to everyone interested.


Short Bio:


Manuel Vilares Ferro graduated in Mathematics (1987) and obtained a degree (1988) from the Univ. of Santiago de Compostela, and holds a PhD in Computer Science from the Univ. of Nice-Sophia Antipolis (France) (1992). In 1991 he joined the Univ. of A Coruña, where he was an associate professor (1994-2001). In 2001 he obtained a chair at the Univ. of Vigo, where he coordinates the COLE group. The latter is recognized by the Xunta de Galicia as an excellence research unit and by the Univ. of Vigo as a reference group. His area of expertise is Computational Linguistics and Natural Language Processing, with applications to Information Retrieval (IR), Question Answering (QA) and Opinion Mining (OM). He has led 8 national projects in collaboration with other groups, coordinating 4 of them, as well as 7 projects financed by the Xunta de Galicia, always in this area, 5 of which he coordinated. He has also led 3 bilateral actions between Spain and France and 1 between Spain and Portugal. He coordinated the Galician Network of Language Processing and IR (2006-14) and, since 1993, has coordinated the agreement between the Univs. of Vigo and A Coruña and the Ramón Piñeiro Center for Research in Humanities for the development of a Galician language tagger. He has also led research contracts with companies (3.14 Financial Contents, Feuga, ...) and participated in others (Telémaco S.L., ...), always in the same field. He coordinated the ESF Research Networking Programme "Evaluating Information Access Systems" at the Univ. of Vigo. He is the author of 2 books, 7 book chapters, 39 articles in JCR journals, 26 articles in other journals and 60 peer-reviewed conference papers with published proceedings, most of them (53) international. He has supervised 11 PhD theses, 3 of which received an extraordinary doctorate award and 4 the European/International mention. He has organized 5 international conferences and is a regular referee for JCR-indexed journals.

Abstract:

The possibility of accessing massive amounts of data and the falling cost of disk storage have contributed decisively to the growing popularity of Machine Learning (ML) algorithms as the basis for modelling computational tasks. However, preparing training databases is often an expensive and time-consuming activity, especially when expert knowledge is needed. Natural language processing is particularly sensitive to these drawbacks, especially in new application domains where training resources are scarce or even non-existent. This justifies the interest in developing techniques that allow us to discard, as early as possible, learning strategies that do not meet our specifications.
Our aim is to evaluate the training effort in order to support decision making, looking for a trade-off between the human and computational resources needed during the learning process and the performance of the generated model. We focus on three fundamental matters, depending on the type of ML strategy considered: early estimation of accuracy, efficient sampling, and early detection of over-training.
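As a generic illustration of "early estimation of accuracy" (not necessarily the method presented in the talk), one common approach is to measure accuracy on a few small training sizes, fit an inverse power-law learning curve acc(n) = a - b * n^(-c), and extrapolate to larger sizes before committing to more annotation. The sketch below assumes that curve shape; the function names `fit_power_law` and `predict_accuracy` and the grid-search fitting procedure are hypothetical choices made for this example, using only the Python standard library.

```python
import math
from typing import List, Tuple

Params = Tuple[float, float, float]  # (a, b, c) of acc(n) = a - b * n**(-c)


def predict_accuracy(params: Params, n: float) -> float:
    """Evaluate the assumed learning curve acc(n) = a - b * n**(-c)."""
    a, b, c = params
    return a - b * n ** (-c)


def fit_power_law(sizes: List[int], accs: List[float], grid: int = 200) -> Params:
    """Fit acc(n) = a - b * n**(-c) to (training size, accuracy) pairs.

    The asymptote a is found by grid search; for each candidate a, b and c
    come from ordinary least squares on log(a - acc) = log(b) - c * log(n).
    """
    if len(sizes) != len(accs) or len(set(sizes)) < 2:
        raise ValueError("need matching lists with at least two distinct sizes")
    top = max(accs)
    if top >= 1.0:
        raise ValueError("accuracies must be below 1.0")
    xs = [math.log(n) for n in sizes]
    mx = sum(xs) / len(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    best = None
    for step in range(1, grid + 1):
        # Candidate asymptotes lie strictly above the best observed accuracy, up to 1.0.
        a = top + (1.0 - top) * step / grid
        ys = [math.log(a - acc) for acc in accs]
        my = sum(ys) / len(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        params = (a, math.exp(my - slope * mx), -slope)
        # Score each candidate by squared error on the original accuracy scale.
        sse = sum((predict_accuracy(params, n) - acc) ** 2
                  for n, acc in zip(sizes, accs))
        if best is None or sse < best[0]:
            best = (sse, params)
    return best[1]
```

For instance, one would fit on accuracies measured with 100 to 1,600 training examples and call `predict_accuracy(params, 6400)` to judge whether a fourfold annotation effort is likely to pay off. A power law is only one of several curve families used for this purpose, and extrapolations far beyond the observed sizes should be treated with caution.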

