Towards Efficient and Scalable Probabilistic Inductive Logic Programming

Joana Côrte-Real

July 2018


Abstract

Many real-world processes exhibit both relational structure and uncertainty, and Machine Learning approaches should be able to deal with these aspects. Probabilistic Inductive Logic Programming (PILP), a subset of Statistical Relational Learning, uses Inductive Logic Programming (ILP) extended with probabilistic facts to produce meaningful and interpretable models. This merger between First Order Logic (FOL) theories and uncertainty makes PILP an adequate tool for knowledge representation and extraction. However, this flexibility in PILP systems is coupled with an exponential search space growth when looking for good models (inherited from ILP), and so often only a subset of all possible models is explored due to limited resources. Furthermore, the probabilistic evaluation of FOL theories, from the underlying probabilistic logic language and its solver, is also computationally demanding. In order to mitigate this problem, this thesis introduces novel PILP pruning strategies that can help reduce the time taken to generate good probabilistic theories, and shows how they can be applied in several distinct parts of the PILP search space. It also describes a safe pruning criterion which guarantees that the optimal model is not pruned away when used in combination with one of the pruning strategies, as well as two alternative more aggressive criteria that do not provide this guarantee. Because these pruning criteria are based on the data’s probabilistic characteristics, the use of pruning strategies results in minimum information loss, which in turn maintains the quality of the generated theories while causing a significant reduction in average execution time. Experiments performed using benchmarks from different areas and two PILP systems (one developed during this thesis and another from the literature) show that the pruning strategies are effective in maintaining predictive accuracy for all criteria and experimental settings, and reducing the execution time when using some of the more aggressive strategies and criteria, compared to using no pruning. Finally, a real world study using medical data was conducted and the resulting PILP models were more accurate in identifying false negatives when compared to other techniques.

Bibtex

@PhdThesis{cortereal-phd,
  author =  {J. Côrte-Real},
  title =   {{Towards Efficient and Scalable Probabilistic Inductive Logic Programming}},
  school =  {University of Porto},
  address = {Portugal},
  month =   {July},
  year =    {2018},
  type =    {{PhD Thesis}},
}

Download Thesis

PDF file