Technical Report: DCC-97-7

Optimising Parallel Logic Programming Systems for Scalable Machines

VĂ­tor Santos Costa & Ricardo Bianchini
LIACC & COPPE/Sistemas
Universidade do Porto & Universidade Federal do Rio de Janeiro

September 1997


Abstract

Logic programs are good examples of symbolic applications that often exhibit large amounts of implicit parallelism and that can greatly benefit from parallel computers. Parallel logic programming (PLP) systems have obtained excellent results for traditional bus-based shared-memory architectures. However, the scalable multiprocessors being developed today pose new challenges, such as the high latency of memory accesses and the demand for scalability. Our experience with a sophisticated PLP system, Andorra-I, demonstrates that indeed performance suffers greatly on modern architectures. This paper addresses the question of whether the poor performance is inherent to the complex structure of PLP systems or can be improved through careful analysis and tuning. In order to answer this question, we perform a detailed analysis of the cache behaviour of all Andorra-I data structures via execution-driven simulation of a DASH-like multiprocessor. Based on this analysis we optimise the Andorra-I code using 5 different techniques. We present the isolated and combined performance improvements provided by these optimisations. The overall results we obtained are quite impressive. A few of our logic programs begun to approach linear speedup as a result of our optimisations. In fact, for one of our programs the speedup of the modified Andorra-I is a factor of 3 higher than that of the original version of the system on 24 processors. Our main conclusion is then that, even though PLP systems are indeed complex and sometimes irregular, these systems can and should perform well on modern scalable multiprocessors.

Keywords: Scalable Multiprocessors, Parallel Logic Programming, And-Or Parallelism, Performance Evaluation.