Modeling network contention effects on process allocation in computer systems discrete function and automatons
Interconnection networks of modern high-performance distributed computer systems have at least a two-level hierarchical organization. The first level of the communication network is formed by a switch-based network (InfiniBand, Ethernet). The second level is represented by the shared memory of SMP/NUMA compute nodes. In such systems, the communication time between processors depends on their placement in the system. In this paper, we present a benchmark for estimating the message passing time when MPI processes share communication channels. We analyze the degradation of communication network performance when message passing queues are formed in computer systems with NUMA/SMP compute nodes. We consider three levels of the communication environment: the shared memory of a compute node, the processor interconnect in NUMA nodes, and the network interconnect between nodes (InfiniBand and Gigabit Ethernet). A resource and job management system (RJMS) forms a subsystem of p processor cores. If the computer system consists of multiprocessor nodes, the subsystem can be formed in many ways. For example, a symmetric set of nodes of rank eight can be formed in three ways: one compute node with eight processor cores (1x8), two nodes with four cores each (2x4), or four nodes with two cores each (4x2). The completion time of collective communication operations differs across these subsystems. Therefore, developing algorithms that allocate nodes with regard to the message passing structure of the target program is of practical interest. The authors have developed software for predicting the execution time of the All-to-all operation on a given subsystem of nodes. The software uses experimental estimates of the performance degradation of MPI_Send/MPI_Recv operations when a set of processes uses a communication channel simultaneously.
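The rank-eight example in the abstract can be reproduced with a short enumeration. The following is a minimal sketch, not the authors' software: the function name `symmetric_allocations` and its parameters are hypothetical, and the lower bound of two cores per node is an assumption chosen here so that the result matches the three configurations listed in the abstract.

```python
def symmetric_allocations(p, max_cores_per_node, min_cores_per_node=1):
    """Enumerate symmetric subsystems of p cores.

    Returns a list of (nodes, cores_per_node) pairs such that
    nodes * cores_per_node == p and every node contributes the same
    number of cores, ordered from the fewest nodes to the most.
    """
    allocations = []
    # Walk from the densest packing (most cores per node) to the sparsest.
    for cores in range(max_cores_per_node, min_cores_per_node - 1, -1):
        if p % cores == 0:
            allocations.append((p // cores, cores))
    return allocations


# Assumed setup: 8-core nodes, at least 2 cores used on each node.
for nodes, cores in symmetric_allocations(8, max_cores_per_node=8,
                                          min_cores_per_node=2):
    print(f"{nodes}x{cores}")
```

Each pair is a candidate subsystem whose All-to-all completion time could then be compared by a contention-aware predictor such as the one the abstract describes.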
Keywords
parallel multiprogramming, organization of functioning, computer systems, collective communications, network contention, computer clusters
Authors
Name | Organization | Email |
Peryshkova Eugene N. | Siberian State University of Telecommunications and Information Sciences; Rzhanov Institute of Semiconductor Physics of SB RAS | e.peryshkova@gmail.com |
Kurnosov Mikhail G. | Siberian State University of Telecommunications and Information Sciences; Rzhanov Institute of Semiconductor Physics of SB RAS | mkurnosov@gmail.com |

Modeling network contention effects on process allocation in computer systems discrete function and automatons | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2019. № 47. DOI: 10.17223/19988605/47/11