Software tools for decentralized servicing of parallel MPI-jobs streams in geographically distributed multicluster computer systems | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2011. № 3(16).

Software tools for decentralized servicing of parallel MPI-jobs streams in geographically distributed multicluster computer systems

Parallel program scheduling is an important problem in organizing the operation of computer systems (CS). For every incoming program, computing resources (subsystems) for its execution have to be found, and the job scheduler must take into account the changing composition of the system and the varying workload of its computer nodes. Centralized schedulers have an inherent significant shortcoming: a failure of the CS front-end may cause a failure of the whole system. Besides that, the time overhead of resource search grows when such tools are used in large-scale CS. Developing decentralized models, algorithms and system software for parallel program scheduling in distributed CS is therefore an urgent problem. With decentralized scheduling, a collective of schedulers operates in the system and searches for the resources that programs require. This provides robustness in large-scale CS as well, allowing them to continue functioning when some subsystems fail. This paper proposes decentralized algorithms and software for parallel program scheduling in geographically distributed and GRID systems.

A geographically distributed CS composed of $H$ subsystems is considered. Let $n_i$ be the number of nodes in subsystem $i \in S = \{1, 2, \ldots, H\}$, and let $N = \sum_{i=1}^{H} n_i$ be the total number of computer nodes in all subsystems. Let also $b_{ij} = b(i, j, m)$ be the channel capacity between subsystems $i, j \in S$ when transferring a message of $m$ bytes ($[b(i, j, m)]$ = byte/s).

The scheduler running on each subsystem manages a queue of parallel programs and searches for computing resources for job execution. The collective of schedulers is represented by a graph $G = (S, E)$ whose vertices correspond to schedulers and whose edges correspond to logical links between them. The presence of an edge $(i, j) \in E$ means that scheduler $i$ is able to send resource requests (jobs from its queue) to scheduler $j$. The set of vertices $j$ adjacent to $i$ forms its local neighborhood $L(i) = \{j \in S : (i, j) \in E\}$. A user submits a job with a resource request to one of the schedulers.
That scheduler (in accordance with the implemented algorithms) searches for a (sub)optimal subsystem $j^* \in S$ on which to execute the user's job. A job is characterized by its rank $r$ (the number of parallel branches), its expected execution time $t$ (walltime) and the total size $z$ of its executable files and data ($[z]$ = byte).

The GBroker software suite for decentralized scheduling of parallel MPI programs in geographically distributed CS has been created and is being developed at the Centre of Parallel Computing Technology of the Siberian State University of Telecommunications and Information Sciences (CPCT SibSUTIS) and the Computer Systems Laboratory of the Institute of Semiconductor Physics of SB RAS (ISP SB RAS). The suite includes the scheduler gbroker, the client tool gclient and netmon, a monitor of channel performance between subsystems at the TCP/IP protocol level.

The developed parallel job scheduling toolkit was investigated on the operating multicluster CS created by CPCT SibSUTIS together with ISP SB RAS. Experimental results have shown that mean job service times under centralized and decentralized scheduling are comparable, and that the scheduling time is small compared with the job execution time. The decentralized scheduling software suite is one of the necessary components for ensuring the robustness of the geographically distributed multicluster CS of CPCT SibSUTIS and ISP SB RAS.
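The scheduling model described in the abstract (graph $G = (S, E)$, neighborhoods $L(i)$, capacities $b(i, j, m)$, jobs with rank $r$, walltime $t$ and size $z$) can be illustrated with a small sketch. It assumes, purely for illustration, that scheduler $i$ considers itself and its neighborhood $L(i)$ as candidates, discards subsystems with fewer than $r$ nodes, and picks the candidate with the smallest estimated completion time: transfer time $z / b(i, j, z)$ plus an estimated queue wait plus $t$. The cost function, the queue-wait estimate and the names `Job`, `Subsystem`, `neighborhood` and `select_subsystem` are hypothetical; this is not the GBroker algorithm or its API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical channel-capacity function b(i, j, m) in byte/s for an m-byte message.
Bandwidth = Callable[[int, int, int], float]


@dataclass
class Job:
    rank: int        # r: number of parallel branches (MPI processes)
    walltime: float  # t: expected execution time, s
    size: int        # z: total size of executables and data, bytes


@dataclass
class Subsystem:
    nodes: int               # n_i: number of computer nodes
    queue_wait: float = 0.0  # hypothetical estimate of the local queue wait, s


def neighborhood(edges: Set[Tuple[int, int]], i: int) -> Set[int]:
    """Local neighborhood L(i) = {j in S : (i, j) in E}."""
    return {j for (src, j) in edges if src == i}


def select_subsystem(i: int, job: Job, subsystems: Dict[int, Subsystem],
                     edges: Set[Tuple[int, int]], b: Bandwidth) -> Optional[int]:
    """Pick j* among i and L(i) using a hypothetical completion-time estimate.

    Returns None if no candidate has at least job.rank nodes.
    Ties are broken in favour of the lowest subsystem index.
    """
    best: Optional[int] = None
    best_cost = float("inf")
    for j in sorted({i} | neighborhood(edges, i)):
        sub = subsystems[j]
        if sub.nodes < job.rank:
            continue  # the subsystem cannot host r parallel branches
        # No transfer is needed when the job stays on the local subsystem.
        transfer = 0.0 if j == i else job.size / b(i, j, job.size)
        # t is the same for every candidate (homogeneous nodes assumed), so it
        # does not change the choice; it only turns the cost into a time estimate.
        cost = transfer + sub.queue_wait + job.walltime
        if cost < best_cost:
            best, best_cost = j, cost
    return best


# Usage: three fully connected subsystems, 10 MB/s links, a 100 MB job of rank 8.
S = {1: Subsystem(nodes=4, queue_wait=600.0),
     2: Subsystem(nodes=8, queue_wait=0.0),
     3: Subsystem(nodes=16, queue_wait=30.0)}
E = {(i, j) for i in S for j in S if i != j}
job = Job(rank=8, walltime=3600.0, size=100_000_000)
print(select_subsystem(1, job, S, E, lambda i, j, m: 10e6))  # prints 2
```

With these assumed numbers, subsystem 1 is skipped because it has only 4 nodes, and subsystem 2 beats subsystem 3 (an estimated 3610 s versus 3640 s). In a real deployment the capacity $b(i, j, m)$ would presumably come from channel measurements such as those netmon collects, and the queue-wait estimates from neighboring schedulers; both are hard-coded here.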


Keywords

GRID systems, geographically distributed computer systems, parallel task scheduling, parallel program scheduling, brokering

Authors

Name | Organization | E-mail
Kurnosov Mikhail G. | Siberian State University of Telecommunications and Information Sciences (Novosibirsk) | mkurnosov@gmail.com
Paznikov Alexey A. | Siberian State University of Telecommunications and Information Sciences (Novosibirsk) | apaznikov@gmail.com
