
Introduction

Parallel processing is an important means of obtaining good performance from computationally demanding programs. However, in order to obtain that performance, programmers usually need to deal with a very complex programming model, and it is therefore common for the development of parallel programs to take much longer than the development of their sequential versions. Our work focuses on the implicit parallelisation of logic programs. Logic programming presents a declarative programming model that contrasts with the popular imperative programming model in that the programmer worries about what problem to solve rather than how to solve it. Because of this declarative characteristic, logic programming offers opportunities to exploit implicit parallelism: programmers write their programs as sequential programs and can run them efficiently in a parallel environment without any modification to the source code. This clearly reduces the cost of software development, as the user writes a single program that runs on one processor just as well as on several processors. Our work concentrates on developing techniques to implement parallel logic programming systems efficiently.
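As a minimal illustration of this declarative style (our own example, not taken from any of the systems discussed here), consider the classic ancestor relation: the program states what holds, and the engine decides how to search for answers. The same source code can run unchanged on a sequential or on an implicitly parallel Prolog system.

    % The program states WHAT holds; the engine decides HOW to search.
    parent(ana, bruno).
    parent(ana, carla).
    parent(bruno, dalia).

    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

    % ?- ancestor(ana, Who).
    % Who = bruno ; Who = carla ; Who = dalia.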

The rapidly growing interest in logic programming, and in parallel logic programming systems, is confirmed by the quality of the papers presented at the most important conferences in the area: the International Logic Programming Symposium (ILPS), the International Conference on Logic Programming (ICLP), and the Joint International Conference and Symposium on Logic Programming (JICSLP). Specific workshops on parallelism and implementation techniques held together with these main conferences have been very successful and have produced published books of selected papers [47]. The Practical Applications of Prolog (PAP) conference and the Practical Aspects of Declarative Languages workshop also confirm industry's interest in using Prolog as a language for software development. Many other important conferences (PLDI, PPoPP, Euro-Par, PLILP, among others) have published high-quality papers on the development of sequential and parallel logic programming systems. Lastly, the Journal of Logic Programming (JLP) is the specialised journal in the area.

Our main goal in providing a parallel logic programming environment is to allow programmers to develop more quickly applications that are well suited to a declarative language, such as optimisation problems, relational databases, inductive logic programming, and resource allocation problems, among many others. The benchmark set we defined consists of such programs.

Logic programming exposes two main forms of parallelism that can be exploited implicitly: or-parallelism, which arises when the alternative clauses that match a goal are explored concurrently, and and-parallelism, which arises when the goals in the body of a clause are executed concurrently, as sketched below.
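The following sketch (our own example) shows both forms in a few lines: when solving color(C), the three facts are alternative branches that an or-parallel system may explore simultaneously; within the body of pair/2, the two color/1 goals share no unbound variables and may run as and-parallel work.

    % Or-parallelism: alternative clauses explored concurrently.
    color(red).
    color(green).
    color(blue).

    % And-parallelism: independent goals in a clause body run
    % concurrently; color(C1) and color(C2) share no variables.
    pair(C1, C2) :- color(C1), color(C2), C1 \= C2.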

In the literature we can find several systems that exploit only and-parallelism [61,59], only or-parallelism [2,77,20,108], or a combination of and- and or-parallelism. Systems that combine both and- and or-parallelism are still under research [55,111,112,75,76], and several problems remain to be solved: for instance, what kind of data structure is most adequate for combining and-parallelism with or-parallelism, what kind of and-parallelism is most common in application programs, or what implementation strategy to use when developing parallel logic programming systems for hybrid parallel architectures such as distributed shared memory systems [79]. Most well-known parallel logic programming systems currently available were implemented for shared-memory architectures, because of their simpler programming model. However, some applications need more scalable architectures. Therefore, our objective is to develop a parallel system that can run efficiently on scalable architectures such as the NCP2 [3] or the Alewife [1].

Parallel logic programming systems have evolved along three main fronts: (1) new algorithms and data structures for integrating and-parallelism and or-parallelism, (2) efficient techniques to dynamically distribute work among the processors, and (3) compile-time analysis support for efficient parallelisation. Our work tackles several of these issues, focusing on static analysis tools, post-run analysis tools, and implementation strategies for the runtime environment, including smart scheduling strategies.

Among algorithms and data structures to support or-parallelism, the literature includes several works for shared-memory architectures [71,2] and for distributed-memory architectures [29,73,20,77]. As mentioned before, the most popular systems were implemented for shared-memory architectures because of the simpler programming model. These systems use either stack sharing (the binding array model [108]) or stack copying [2] in order to maintain multiple bindings for different branches of the search tree.
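The need for multiple bindings can be seen in a two-clause example (our own illustration): in the or-parallel search tree for the query below, X is bound to left on one branch and to right on another at the same time, and binding arrays or stack copying are the mechanisms that keep those bindings separate.

    branch(X) :- X = left.
    branch(X) :- X = right.

    % ?- branch(X).
    % One branch binds X = left while another binds X = right; each
    % or-parallel worker needs its own view of X (a private binding
    % array slot, or its own copy of the stacks).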

Among the systems that exploit independent and-parallelism, we can mention &-Prolog [62] and &ACE [59], both implemented for shared-memory machines.

Systems that implement dependent and-parallelism include DASWAM [90], Parlog [52], AKL [60], and &ACE [59], also implemented for shared-memory machines.
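The difference between the two flavours can be sketched as follows (our own example): in sum_squares/3 the two square/2 goals share no unbound variables and can run fully independently, whereas in pipeline/2 the goals share the list Xs, so the system must coordinate them as producer and consumer.

    square(X, Y) :- Y is X * X.

    % Independent and-parallelism: no shared unbound variables.
    sum_squares(A, B, S) :-
        square(A, SA),
        square(B, SB),
        S is SA + SB.

    ints(N, N, [N]) :- !.
    ints(I, N, [I|Rest]) :- I < N, I1 is I + 1, ints(I1, N, Rest).

    sum_all([], 0).
    sum_all([X|Xs], S) :- sum_all(Xs, S1), S is S1 + X.

    % Dependent and-parallelism: ints/3 and sum_all/2 share Xs, so
    % ints/3 acts as producer and sum_all/2 as consumer.
    pipeline(N, S) :-
        ints(1, N, Xs),
        sum_all(Xs, S).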

Integration of and-parallelism with or-parallelism is a very complex task, and few systems, such as Andorra-I [87] and ACE [59], have been successful in doing it.

All these systems were designed and developed for shared-memory architectures. Several distributed-memory parallel logic programming systems have also been proposed. Some support pure independent and-parallelism [107] or pure or-parallelism [19,77]. Both and-parallelism and or-parallelism are supported in Conery's And/Or model [30], in the dataflow-based models [67], and in the Distributed Andorra Prolog [16].

Systems implemented for centralised-memory architectures present several limitations: (1) the hardware is not scalable, and (2) algorithms and data structures designed for shared-memory architectures are sometimes not scalable. Distributed implementations of parallel logic programming systems that combine and-parallelism with or-parallelism, in turn, require quite complex parallel programming skills.

New software/hardware platforms have been developed that combine the shared-memory programming model with scalable hardware (distributed shared memory systems, DSMs [79]). In this work we propose a new parallel logic programming system that integrates the shared-memory programming model with the distributed-memory programming model, in order to obtain an efficient system capable of running on a hybrid architecture (for example, a cluster of workstations connected via a fast network that supports a shared-memory programming model). Research carried out by the authors on the performance of parallel logic programming systems originally written for shared-memory machines shows that it is possible to achieve significant performance improvements when running these systems on DSM architectures by modifying only the data layout and making few modifications to the algorithms [86,85,84,89,25,24,23]. Our experiments also show that architectural parameters can significantly affect the performance of parallel logic programming systems [93,92,36]. These results confirm that our parallel runtime model, designed to run on DSM platforms, can obtain maximum performance from scalable architectures such as the NCP2 and the Alewife mentioned before.

Regarding work distribution, much research has been done on efficient techniques for distributing work among processors during the parallel execution of Prolog programs. The execution of Prolog programs tends to generate irregular computational patterns, where work (both and-work and or-work) is created dynamically. Therefore, the scheduler is an important part of any parallel logic programming system that aims at efficiency. Several schedulers were written for distributing or-work in Aurora, for example: Argonne [21], Wavefront [18], Manchester [22], Bristol [15], and Dharma [95]. Each of them deals with some kind of scheduling problem: dispatching on topmost or bottommost work, parallel execution of side-effect predicates, execution of speculative work, application-dependent strategies, etc.
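One of these problems, speculative work, can be illustrated with a small program (our own sketch): because of the cut in first_ok/1, any untried alternative of guess/1 may be pruned as soon as one solution passes check/1, so exploring those alternatives in parallel is speculative effort that may be wasted.

    guess(1).
    guess(2).
    guess(3).

    check(X) :- X > 1.

    % The cut prunes the remaining guess/1 alternatives once a
    % solution is found; or-work to the right of the committed
    % branch is speculative.
    first_ok(X) :- guess(X), check(X), !.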

In the context of our project, several scheduling strategies were implemented in order to maximise performance and minimise the frequency and number of communication messages while preserving locality [105,33,28]. In order to develop and test these strategies simultaneously with the development of the DAOS prototype, they were either simulated or implemented on our available parallel logic programming systems: Plosys [77] and Andorra-I [87].

Regarding compile-time analysis tools for parallelisation support, we concentrated on granularity and complexity analysis. Our work is based on the GRANLOG model [14], which extends and generalises previous models proposed in the literature [99,38,37]. The GRANLOG tool, composed of several modules, accepts a Prolog program as input and generates an annotated Prolog program containing information about the complexity of goals and clauses. The information produced by GRANLOG is very useful for guiding scheduling decisions. For example, based on the number and grain size of the alternatives in a choicepoint, the scheduler can select better pieces of work to assign to processors. Global information about the amounts of and-parallelism and or-parallelism can be inferred from the grain sizes provided by the GRANLOG tool, so that we can choose between sources of and-parallel work and sources of or-parallel work [42,43,45,46]. In the context of our work, three research efforts made use of GRANLOG information: (1) the centralised scheduler used by the Plosys system was modified to support GRANLOG information [49,50], (2) the Andorra-I reconfigurer [44] and the Andorra-I engine were modified to support GRANLOG information [65], and (3) our DSLP scheme (Distributed Scheduling for Logic Programming) was designed and developed using information provided by GRANLOG [33].
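To give a flavour of the idea (the actual GRANLOG annotation syntax may differ; the granularity/3 directive sketched here is hypothetical), the annotated program might attach complexity and grain information to each predicate, which a scheduler can consult when deciding whether a choicepoint's alternatives are worth exporting to another processor.

    % Hypothetical annotation: naive reverse is quadratic in the
    % length of its input list, coarse enough to be worth exporting.
    % :- granularity(nrev/2, complexity(n*n), grain(coarse)).
    nrev([], []).
    nrev([H|T], R) :- nrev(T, RT), app(RT, [H], R).

    % :- granularity(app/3, complexity(n), grain(fine)).
    app([], L, L).
    app([H|T], L, [H|R]) :- app(T, L, R).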

Our preliminary results are very encouraging and show that logic programming can be an alternative for efficient software development, producing very good performance at a very low programming cost. Static analysis can produce useful information to guide scheduling decisions, and our runtime model can be more efficient than systems developed for a single kind of parallel platform (shared-memory only or distributed-memory only).

This work is organised as follows. Section 2 presents our DAOS (Distributing And/Or in Scalable machines) runtime environment, describing the engine used, the main data structures, memory management, the algorithms that combine and-parallelism with or-parallelism, and scheduling issues. Section 3 presents our compile-time analysis tools, which generate useful information to efficiently exploit parallelism and guide scheduling decisions. Section 4 presents our post-run tool, which is useful for visualising the parallel execution of Prolog programs and identifying sources of overhead and opportunities for performance improvement, mainly in the schedulers. Section 5 gives a brief description of the benchmark set we defined for Prolog. Finally, Section 6 presents our conclusions and future work.

