Checkpointing mechanism for a parallel program

  • Posted
  • Proposals 1
  • Remote
  • #10634
  • Expired

Description

Experience Level: Expert
The aim of the project is to create a checkpointing solution. While running, the application should be able to checkpoint itself at different intervals, continue after a checkpoint is done, and restart after the application fails. So these three functions should be available:

checkpoint()
continue()
restart()
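
To illustrate the kind of interface meant here, the following is a minimal single-process sketch and not a finished design. The `app_state` struct, the `ckpt_` prefix and the file layout are assumptions made for the example. Note that `continue` is a reserved keyword in C, so the real functions would need different names; in this sketch "continue" is simply the application carrying on after `ckpt_checkpoint` returns. The rename-over-target step assumes a POSIX file system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application state; a real program would define its own. */
typedef struct {
    long   iteration;
    double values[4];
} app_state;

/* Write the state to a temporary file, then rename it over the target, so
   a crash of the process during the write does not damage the previous
   checkpoint. Returns 0 on success, -1 on failure. */
int ckpt_checkpoint(const char *path, const app_state *s)
{
    assert(path != NULL && s != NULL);
    char tmp[512];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;

    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Load a previously written checkpoint into *s.
   Returns 0 on success, -1 if no usable checkpoint exists. */
int ckpt_restart(const char *path, app_state *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

At startup the application would call `ckpt_restart` first and fall back to a fresh start if it returns -1.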

The user should also have the option to put function calls in their program that trigger the checkpointing mechanism. That is, the checkpoint interval should be both user-defined and automatic.
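
One possible way to combine both triggers is sketched below, assuming the application polls a small policy object once per iteration. The names `ckpt_policy`, `ckpt_request` and `ckpt_due` are hypothetical, and the current time is passed in by the caller so the logic can be exercised without waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical policy object: an automatic timer plus a user request flag. */
typedef struct {
    double interval_sec;  /* automatic interval; <= 0 disables the timer */
    time_t last;          /* time of the last checkpoint (or of start-up) */
    bool   requested;     /* set when the user explicitly asks for one */
} ckpt_policy;

void ckpt_policy_init(ckpt_policy *p, double interval_sec, time_t now)
{
    assert(p != NULL);
    p->interval_sec = interval_sec;
    p->last = now;
    p->requested = false;
}

/* Called from user code to force a checkpoint at the next poll. */
void ckpt_request(ckpt_policy *p)
{
    assert(p != NULL);
    p->requested = true;
}

/* Returns true if a checkpoint should be taken now, either because the
   user requested one or because the automatic interval has elapsed.
   A true result resets both the request flag and the timer. */
bool ckpt_due(ckpt_policy *p, time_t now)
{
    assert(p != NULL);
    bool timer_expired = p->interval_sec > 0 &&
                         difftime(now, p->last) >= p->interval_sec;
    if (!p->requested && !timer_expired)
        return false;
    p->requested = false;
    p->last = now;
    return true;
}
```

In the main loop this would be polled as `if (ckpt_due(&policy, time(NULL))) { ... }`, with the actual checkpoint call inside the branch.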

The checkpoint mechanism should be able to checkpoint parallel applications whose processes run on different machines. In my case these are MPI applications (MPI applications are C applications that use a set of libraries to enable parallelisation; the syntax is C).
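
For the multi-process case, one common arrangement (an assumption here, not a requirement of the project) is for every MPI process to write its own file, named by its rank and a checkpoint number. The helper below only builds such a path and needs no MPI itself; in a real MPI program the rank would come from `MPI_Comm_rank`, and the processes would normally synchronise, for example with `MPI_Barrier`, so that all files of one checkpoint belong to the same point in the computation. The function name and the file naming scheme are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a per-process checkpoint path such as
   "<dir>/ckpt_e000012_r0003.bin" for checkpoint number 12 and rank 3.
   Returns 0 on success, -1 if dir is empty or the buffer is too small. */
int ckpt_rank_path(char *buf, size_t len, const char *dir,
                   long epoch, int rank)
{
    assert(buf != NULL && dir != NULL);
    assert(epoch >= 0 && rank >= 0);
    if (strlen(dir) == 0)
        return -1;
    int n = snprintf(buf, len, "%s/ckpt_e%06ld_r%04d.bin", dir, epoch, rank);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

On restart, each process would rebuild the same path from its own rank and the number of the last complete checkpoint.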
