Parallel ILP for Distributed Memory Architectures
Nuno A. Fonseca, Ashwin Srinivasan, Fernando Silva and Rui Camacho
2009
Abstract
The growth of machine-generated relational databases, both in the
sciences and in industry, is rapidly outpacing our ability to extract
useful information from them by manual means. This has brought into
focus machine learning techniques like Inductive Logic Programming
(ILP) that are able to extract human-comprehensible models for complex
relational data. The price to pay is that ILP techniques are not
efficient: they can be seen as performing a form of discrete
optimisation, which is known to be computationally hard; and the
complexity is usually some super-linear function of the number of
examples. While little can be done to alter the theoretical bounds on
the worst-case complexity of ILP systems, some practical gains may
follow from the use of multiple processors. In this paper we survey
the state-of-the-art on parallel ILP. We implement several parallel
algorithms and study their performance using some standard
benchmarks. The principal findings of interest are these: (1) of the
techniques investigated, one that simply constructs models in parallel
on each processor using a subset of data and then combines the models
into a single one, yields the best results; and (2) sequential
(approximate) ILP algorithms based on randomized searches have lower
execution times than (exact) parallel algorithms, without sacrificing
the quality of the solutions found.
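The data-parallel scheme highlighted in finding (1) can be sketched in a few lines: each processor induces a model from its own subset of the examples, and the per-processor models are then combined into one. The sketch below is a toy illustration of that scheme only; `learn_rules` is a hypothetical stand-in for an ILP learner, not the system evaluated in the paper.

```python
# Toy sketch of data-parallel learning with model combination:
# partition the examples, learn a model per partition in a separate
# process (mimicking distributed memory), then merge the models.
from multiprocessing import Pool

def learn_rules(examples):
    # Hypothetical "learner": derives one rule name per distinct label.
    return {f"class_{label}" for _, label in examples}

def combine(models):
    # Combine per-processor models by taking the union of their rules.
    merged = set()
    for model in models:
        merged |= model
    return merged

def parallel_learn(examples, n_workers=2):
    # Split the examples into n_workers roughly equal subsets.
    chunks = [examples[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        models = pool.map(learn_rules, chunks)
    return combine(models)

if __name__ == "__main__":
    data = [(x, x % 3) for x in range(30)]
    print(sorted(parallel_learn(data)))
```

In a genuine ILP setting the combination step is harder than a set union (redundant or subsumed clauses must be pruned), but the overall structure, independent learning on subsets followed by a merge, is the one the abstract describes.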
Bibtex
@Article{fonseca-mlj09,
  author  = {N. A. Fonseca and A. Srinivasan and F. Silva and R. Camacho},
  title   = {{Parallel ILP for Distributed Memory Architectures}},
  journal = {Machine Learning},
  volume  = {74},
  number  = {3},
  pages   = {257--279},
  year    = {2009},
}