Studying Resource Management Systems
Resource Management Systems (RMS) are responsible for managing the
submission of user applications. The main tasks of an RMS are:
- to accept a job with a given specification and possible
requirements
- to place the job in a waiting queue
- to plan the best way of dispatching the jobs from the waiting
queue, allocating resources in the best possible way, according to
the specification/requirements provided by the user
The specification/requirements are usually written by the user and
stored in a file which is then used by the RMS to decide when and
where to dispatch a task.
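The three tasks above can be sketched as a toy model. This is only an illustration, not how any real RMS is implemented: the class `ToyRMS`, its methods and the job names are hypothetical, the only requirement modelled is a CPU count, and the policy (lowest priority value first, then arrival order, stopping when the head of the queue does not fit) is an assumption chosen for simplicity.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Job:
    priority: int                      # lower value is dispatched first
    seq: int                           # submission order, breaks priority ties
    name: str = field(compare=False)
    cpus: int = field(compare=False)   # requirement taken from the user's specification


class ToyRMS:
    """Hypothetical, minimal model of an RMS managing a pool of CPUs."""

    def __init__(self, total_cpus):
        self.free_cpus = total_cpus
        self.queue = []                # waiting queue (heap ordered by priority, then arrival)
        self._seq = count()

    def submit(self, name, cpus, priority=0):
        """Accept a job with its requirements and place it in the waiting queue."""
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, cpus))

    def dispatch(self):
        """Start queued jobs in queue order while the head job fits in the free CPUs."""
        started = []
        while self.queue and self.queue[0].cpus <= self.free_cpus:
            job = heapq.heappop(self.queue)
            self.free_cpus -= job.cpus
            started.append(job.name)
        return started

    def finish(self, cpus):
        """Return the CPUs of a finished job to the pool."""
        self.free_cpus += cpus


rms = ToyRMS(total_cpus=4)
rms.submit("pi_mpi", cpus=4, priority=1)
rms.submit("quicksort_small", cpus=1, priority=0)
rms.submit("quicksort_large", cpus=1, priority=0)
print(rms.dispatch())  # ['quicksort_small', 'quicksort_large']
```

Here `pi_mpi` stays in the queue because only two CPUs remain free after the two quicksort jobs start. A real RMS offers much richer policies (several queues, backfilling, fair share), which is part of what you are asked to explore.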
The RMS acts as a scheduler on top of the operating system.
The main advantage of using an RMS is the ability to plan the
execution of several user jobs with different requirements while
using resources efficiently.
Several RMSs are available, for example: Condor, SGE (Sun Grid
Engine), PBS (Portable Batch System), Torque (an open source version
of PBS), LSF (Load Sharing Facility) and LoadLeveler.
Your task is to choose one of them (NB: some of them are
proprietary), install it on your machine, configure it, and test it
with some scenarios for the requirements (for example, you may
configure the waiting queues and priorities).
You can choose any program to test your RMS.
Examples of programs that can be tested:
- Distributed calculation of pi written in C, using MPI
- C program implementing quicksort (sequential): use different
list sizes and submit several quicksorts
Below are some examples of requirements:
- your program should take a maximum of one hour to run
- your program should run on two nodes, each with two CPUs
- your program should run using exactly four CPUs, no matter the
computer architecture (i.e., multicore or distributed)
- your program redirects all standard I/O to a disk file out.txt
- your program is parallel and uses MPI
- your program uses lots of memory (e.g., 16 GB)
- your program is interactive and uses two CPUs for half an hour
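As an illustration of how some of these requirements might be written down, the sketch below generates a job script assuming Torque/PBS syntax (`#PBS` directives read by `qsub`). Other RMSs use different languages, so treat this only as a starting point. The helper `pbs_script` and the command `mpirun ./pi` are hypothetical names chosen for the example, and the sketch covers only part of the list above.

```python
def pbs_script(command, walltime=None, nodes=None, ppn=None, mem=None, output=None):
    """Build the text of a Torque/PBS job script (hypothetical helper)."""
    lines = ["#!/bin/sh"]
    if walltime:
        # Maximum run time, written as HH:MM:SS
        lines.append(f"#PBS -l walltime={walltime}")
    if nodes:
        # Number of nodes and, optionally, processors per node
        resource = f"nodes={nodes}" + (f":ppn={ppn}" if ppn else "")
        lines.append(f"#PBS -l {resource}")
    if mem:
        # Memory requested for the job
        lines.append(f"#PBS -l mem={mem}")
    if output:
        # Write standard output to a file and merge standard error into it
        lines.append(f"#PBS -o {output}")
        lines.append("#PBS -j oe")
    # Run from the directory where the job was submitted
    lines.append("cd $PBS_O_WORKDIR")
    lines.append(command)
    return "\n".join(lines) + "\n"


# One hour at most, two nodes with two CPUs each, 16 GB, output in out.txt
script = pbs_script("mpirun ./pi", walltime="01:00:00", nodes=2, ppn=2,
                    mem="16gb", output="out.txt")
print(script)
```

The printed text can be saved to a file and submitted with `qsub`. In Torque, an interactive job is requested on the command line with `qsub -I` rather than in a script. Whether the remaining requirements can be expressed is one of the questions your report should answer.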
What you should hand in:
A brief report answering the following questions:
- What are the main characteristics of the RMS you installed?
(brief description of the RMS, kind of language used, scheduling
model, etc.)
- Where did you find the RMS? Did you need to configure some
repository? Was the RMS already installed in your system?
- Details about the configuration: did you need to configure
your machine to install the RMS? Which components did you
need to configure? What choices did you make when configuring the
RMS? Did you need to modify, initialize or enable any service of
the OS to have the RMS up?
- Was it possible to describe all example requirements above
with the language provided by the installed RMS?
- Submit several jobs, observe the queue and report on waiting
times and execution times.
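For the last item, waiting time is the interval between submission and start, and execution time is the interval between start and end. A minimal sketch of that arithmetic follows. The function `job_times` is a hypothetical helper, the timestamp format is an assumption (adapt it to whatever your RMS prints), and the timestamps are invented sample values, not measurements.

```python
from datetime import datetime


def job_times(submitted, started, finished, fmt="%Y-%m-%d %H:%M:%S"):
    """Return (waiting, execution) times in seconds for one job."""
    t_submit, t_start, t_end = (
        datetime.strptime(t, fmt) for t in (submitted, started, finished)
    )
    waiting = (t_start - t_submit).total_seconds()    # time spent in the queue
    execution = (t_end - t_start).total_seconds()     # time spent running
    return waiting, execution


# Invented sample timestamps, for illustration only
w, e = job_times("2013-02-01 10:00:00", "2013-02-01 10:02:30", "2013-02-01 10:07:45")
print(f"waiting: {w:.0f} s, execution: {e:.0f} s")  # waiting: 150 s, execution: 315 s
```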
Deadline: February 27th, 2013.