Towards Efficient and Scalable Probabilistic Inductive Logic Programming
Joana Côrte-Real
July 2018
Abstract
Many real-world processes exhibit both relational structure and
uncertainty, and Machine Learning approaches should be able to deal
with these aspects. Probabilistic Inductive Logic Programming
(PILP), a subset of Statistical Relational Learning, uses Inductive
Logic Programming (ILP) extended with probabilistic facts to produce
meaningful and interpretable models. This merger between First Order
Logic (FOL) theories and uncertainty makes PILP an adequate tool for
knowledge representation and extraction. However, this flexibility in
PILP systems is coupled with an exponential search space growth when
looking for good models (inherited from ILP), and so often only a
subset of all possible models is explored due to limited
resources. Furthermore, the probabilistic evaluation of FOL theories,
from the underlying probabilistic logic language and its solver, is
also computationally demanding. In order to mitigate this problem,
this thesis introduces novel PILP pruning strategies that can help
reduce the time taken to generate good probabilistic theories, and
shows how they can be applied in several distinct parts of the PILP
search space. It also describes a safe pruning criterion which
guarantees that the optimal model is not pruned away when used in
combination with one of the pruning strategies, as well as two
alternative more aggressive criteria that do not provide this
guarantee. Because these pruning criteria are based on the data’s
probabilistic characteristics, the use of pruning strategies results
in minimum information loss, which in turn maintains the quality of
the generated theories while causing a significant reduction in
average execution time. Experiments performed using benchmarks from
different areas and two PILP systems (one developed during this thesis
and another from the literature) show that the pruning strategies are
effective in maintaining predictive accuracy for all criteria and
experimental settings, and reducing the execution time when using some
of the more aggressive strategies and criteria, compared to using no
pruning. Finally, a real world study using medical data was conducted
and the resulting PILP models were more accurate in identifying false
negatives when compared to other techniques.
Bibtex
@PhdThesis{cortereal-phd,
author = {J. Côrte-Real},
title = {{Towards Efficient and Scalable Probabilistic Inductive Logic Programming}},
school = {University of Porto},
address = {Portugal},
month = {July},
year = {2018},
type = {{PhD Thesis}},
}
Download Thesis
PDF file