Ricardo Rocha Homepage

Compile the hypothesis space: do it once, use it often

Nuno A. Fonseca, Rui Camacho, Ricardo Rocha and Vítor Santos Costa

2008

Abstract

Inductive Logic Programming (ILP) is a powerful and well-developed abstraction for multi-relational data mining techniques. Despite the considerable success of ILP, deployed ILP systems still have efficiency problems when applied to complex problems. In this paper we propose a novel technique that avoids the procedure of deducing each example to evaluate each constructed clause. The technique is based on the Mode Directed Inverse Entailment approach to ILP, where a bottom clause is generated for each example and the generated clauses are subsets of the literals of such bottom clause. We propose to store in a prefix-tree all clauses that can be generated from all bottom clauses together with some extra information. We show that this information is sufficient to estimate the number of examples that can be deduced from a clause and present an ILP algorithm that exploits this representation. We also present an extension of the algorithm where each prefix-tree is computed only once (compiled) per example. The evaluation of hypotheses requires only basic and efficient operations on trees. This proposal avoids re-computation of hypothesis' value in theory-level search, in cross-validation evaluation procedures and in parameter tuning. Both proposals are empirically evaluated on real applications and considerable speedups were observed.

Bibtex

@Article{fonseca-fi08,
  author =    {N. A. Fonseca and R. Camacho and R. Rocha and V. Santos Costa},
  title =     {{Compile the hypothesis space: do it once, use it often}},
  journal =   {Fundamenta Informaticae},
  pages =     {45--67},
  volume =    {89},
  number =    {1},
  year =      {2008},
}

Download Paper

PDF file

IOS Press