Studying Resource Management Systems

Resource Management Systems (RMS) are responsible for managing the submission of user applications. The main tasks of an RMS are:

  • to accept a job with a given specification and possible requirements
  • to place the job in a waiting queue
  • to plan the best way of dispatching the jobs in the waiting queue, allocating resources as well as possible according to the specification/requirements provided by the user

    The specification/requirements are usually written by the user and stored in a file, which the RMS then uses to decide when and where to dispatch a job.
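
    The accept/queue/dispatch cycle described above can be illustrated with a toy model. This is only a minimal sketch under simplifying assumptions: the names ToyRMS, Job, submit and dispatch are hypothetical, the only resource is a CPU count, and a job that does not fit blocks everything behind it. Real systems such as those named in this assignment are far more elaborate.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Job:
    priority: int                      # lower value is dispatched first
    seq: int                           # submission order, breaks priority ties
    name: str = field(compare=False)
    cpus: int = field(compare=False)   # requirement taken from the job specification


class ToyRMS:
    """Hypothetical, greatly simplified resource manager (illustration only)."""

    def __init__(self, free_cpus):
        self.free_cpus = free_cpus
        self.queue = []                # waiting queue, kept as a binary heap
        self._seq = count()

    def submit(self, name, cpus, priority=0):
        # Accept the job and place it in the waiting queue.
        heapq.heappush(self.queue, Job(priority, next(self._seq), name, cpus))

    def dispatch(self):
        # Start queued jobs in priority order for as long as the head job fits.
        started = []
        while self.queue and self.queue[0].cpus <= self.free_cpus:
            job = heapq.heappop(self.queue)
            self.free_cpus -= job.cpus
            started.append(job.name)
        return started


rms = ToyRMS(free_cpus=4)
rms.submit("a", cpus=2, priority=1)
rms.submit("b", cpus=4, priority=0)
rms.submit("c", cpus=2, priority=1)
first = rms.dispatch()    # only "b" fits: it has the best priority and takes all 4 CPUs
rms.free_cpus += 4        # pretend "b" has finished and released its CPUs
second = rms.dispatch()   # "a" and "c" now start, in submission order
```

    Configuring waiting queues and priorities in a real RMS amounts to tuning the policy that this sketch hard-codes in dispatch.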

    The RMS acts as a scheduler on top of the operating system.

    The main advantage of using an RMS is the ability to organize and plan the execution of several user jobs with different requirements while using resources efficiently.

    Several RMS are available, for example Condor, SGE (Sun Grid Engine), PBS (Portable Batch System), Torque (an open source version of PBS), LSF (Load Sharing Facility), and LoadLeveler. Your task is to choose one of them (NB: some of them are proprietary), install it on your machine, configure it, and test it with some scenarios for the requirements (for example, you may configure the waiting queues and priorities).

    You can choose any program to test your RMS. Some examples of requirements follow:

  • your program should take a maximum of one hour to run
  • your program should run on two nodes, each with two CPUs
  • your program should run using exactly four CPUs, no matter the computer architecture (i.e., multicore or distributed)
  • your program redirects all standard I/O to a disk file out.txt
  • your program is parallel and uses MPI
  • your program uses lots of memory (e.g., 16 GB)
  • your program is interactive and uses two CPUs for half an hour
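
    As an illustration of how several of these requirements could be written down as a job specification, here is a minimal sketch that renders a Torque/PBS-style job script. It assumes you chose Torque or PBS; other systems use different syntax, so check the manual of the RMS you install. The function pbs_script and the program name ./my_program are hypothetical, and only standard output and standard error are redirected here.

```python
def pbs_script(command, walltime=None, nodes=None, ppn=None, mem=None, output=None):
    """Render a Torque/PBS-style job script (assumed syntax, illustration only)."""
    lines = ["#!/bin/sh"]
    if walltime:
        # Maximum run time, written as HH:MM:SS.
        lines.append(f"#PBS -l walltime={walltime}")
    if nodes:
        # Number of nodes, optionally with processors per node (ppn).
        resources = f"nodes={nodes}" + (f":ppn={ppn}" if ppn else "")
        lines.append(f"#PBS -l {resources}")
    if mem:
        lines.append(f"#PBS -l mem={mem}")
    if output:
        # Write standard output to the file and merge standard error into it.
        lines.append(f"#PBS -o {output}")
        lines.append("#PBS -j oe")
    # Run from the directory the job was submitted from.
    lines.append('cd "$PBS_O_WORKDIR"')
    lines.append(command)
    return "\n".join(lines) + "\n"


# One hour at most, two nodes with two CPUs each, 16 GB, MPI, output in out.txt.
script = pbs_script(
    "mpirun -np 4 ./my_program",
    walltime="01:00:00",
    nodes=2,
    ppn=2,
    mem="16gb",
    output="out.txt",
)
print(script)
```

    With Torque, the generated text would be saved to a file and submitted with qsub. An interactive session is requested on the command line instead, for example with qsub -I -l nodes=1:ppn=2,walltime=00:30:00.
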
    What you should hand in:

  • Brief report answering the following questions:

    Deadline: March 6th, 2012.