Software tools for decentralized servicing of parallel MPI job streams in geographically distributed multicluster computer systems
Parallel program scheduling is an important problem in organizing the operation of computer systems (CS). For every incoming program, the computing resources (subsystems) for its execution have to be found. The job scheduler must take into account the dynamics of the system composition and the variable workload of the compute nodes.

Centralized schedulers have a significant inherent shortcoming: a failure of the CS front end may cause the whole system to fail. In addition, the time overhead of the resource search increases when these tools are used in large-scale CS. The development of decentralized models, algorithms and system software for parallel program scheduling in distributed CS is therefore an urgent problem.

In decentralized scheduling, a collective of schedulers operates in the system and searches for the resources that programs require. This makes it possible to achieve robustness even in large-scale CS, which can then continue to function when some subsystems fail.

This paper proposes decentralized algorithms and software for parallel program scheduling in geographically distributed and GRID systems.

A geographically distributed CS composed of H subsystems is considered. N is the total number of compute nodes in the subsystems. Let n_i be the number of nodes in subsystem i ∈ S = {1, 2, …, H}. Let also b_ij = b(i, j, m) be the channel capacity between subsystems i, j ∈ S when transferring a message of m bytes ([b(i, j, m)] = byte/s).

A scheduler running on each subsystem manages a queue of parallel programs and searches for computing resources for job execution. The collective of schedulers is represented by a graph G = (S, E), in which the vertices correspond to schedulers and the edges to logical links between them. The presence of an edge (i, j) ∈ E means that scheduler i can send resource requests (jobs from its queue) to scheduler j. The set of vertices j adjacent to i forms its local neighborhood L(i) = {j ∈ S: (i, j) ∈ E}.

A user submits a job with a resource request to one of the schedulers. That scheduler, in accordance with the implemented algorithms, searches for a (sub)optimal subsystem j* ∈ S on which to execute the user's job. A job is characterized by its rank r (the number of parallel branches), its expected execution time t (walltime) and the total size z of its executable files and data ([z] = byte).

The GBroker software suite for decentralized scheduling of parallel MPI programs in geographically distributed CS has been created and is being developed at the Centre of Parallel Computing Technology of the Siberian State University of Telecommunications and Information Sciences (CPCT SibSUTIS) and the Computer Systems Laboratory of the Institute of Semiconductor Physics of SB RAS (ISP SB RAS). The suite includes the scheduler gbroker, the client tool gclient and netmon, a monitor of channel performance between subsystems at the TCP/IP protocol level.

The developed scheduling toolkit was studied on a working multicluster CS built by CPCT SibSUTIS together with ISP SB RAS. The experimental results show that the mean job service times under centralized and decentralized scheduling are comparable. The scheduling time is small compared with the job execution time.

The decentralized scheduling software suite is one of the components needed to ensure the robustness of the geographically distributed multicluster CS of CPCT SibSUTIS and ISP SB RAS.
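The scheduling model in this abstract (the graph G = (S, E), the local neighborhood L(i), the job parameters r, t, z and the channel capacity b(i, j, m)) can be illustrated with a minimal Python sketch. The abstract does not state how j* is chosen, so the selection rule below is only an assumption made for illustration, not the GBroker algorithm: it picks the subsystem that has at least r nodes and the smallest estimated transfer time z / b(i, j, z). All names (Job, local_neighborhood, select_subsystem) and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    rank: int        # r: number of parallel branches
    walltime: float  # t: expected execution time, seconds
    size: int        # z: total size of executable files and data, bytes


def local_neighborhood(i, edges):
    """L(i) = {j in S : (i, j) in E} for a set of directed edges."""
    return {j for (a, j) in edges if a == i}


def select_subsystem(i, job, nodes, edges, bandwidth):
    """Pick a subsystem for `job` submitted to scheduler i.

    Assumed criterion (not taken from the paper): among subsystem i and
    its neighbors L(i), keep those with at least job.rank nodes and
    choose the one with the smallest transfer time z / b(i, j, z).
    Returns None if no candidate has enough nodes.
    """
    candidates = {i} | local_neighborhood(i, edges)
    feasible = [j for j in candidates if nodes[j] >= job.rank]
    if not feasible:
        return None

    def transfer_time(j):
        # No data transfer is needed when the job stays on subsystem i.
        return 0.0 if j == i else job.size / bandwidth(i, j, job.size)

    # Ties are broken by the smaller subsystem index.
    return min(feasible, key=lambda j: (transfer_time(j), j))


# Hypothetical three-subsystem configuration: n_1 = 4, n_2 = 16, n_3 = 32.
NODES = {1: 4, 2: 16, 3: 32}
EDGES = {(1, 2), (1, 3), (2, 3)}
RATES = {(1, 2): 10e6, (1, 3): 100e6, (2, 3): 50e6}  # byte/s


def sample_bandwidth(i, j, m):
    """b(i, j, m); here assumed independent of the message size m."""
    return RATES[(i, j)]


if __name__ == "__main__":
    job = Job(rank=8, walltime=3600.0, size=10**9)
    print(select_subsystem(1, job, NODES, EDGES, sample_bandwidth))
```

In GBroker, channel capacity would presumably come from measurements such as those collected by netmon, and a realistic criterion would also account for queue state and the walltime t. Both are left out here to keep the sketch short.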
Keywords
GRID systems, geographically distributed computer systems, parallel task scheduling, brokering, parallel program scheduling
Authors
Name | Organization | Email |
Kurnosov Mikhail G. | Siberian State University of Telecommunications and Information Sciences (Novosibirsk) | mkurnosov@gmail.com |
Paznikov Alexey A. | Siberian State University of Telecommunications and Information Sciences (Novosibirsk) | apaznikov@gmail.com |