Algorithms of optimization of scalable thread-safe pool based on diffracting trees for multicore computing systems
Most of all modern multicore computer systems (CS) are large-scale, hierarchically organized and include multiple architectures. Program execution time on these systems strongly depends on the efficiency of parallel thread synchronization methods. With increasing number of CPU cores in computer systems the problem of scalable concurrent data structures development dramatically arises. Such data structures must ensure scalable access with increasing number of parallel threads and workload. Thread-safe pool is one of the most widely used concurrent data structures. Pool is an unordered set of objects, which supports operations for insertion (push) and removal (pop) of the objects. Pools are widely used while producer-consumer model implementation in multithreaded programs. In this model some producer threads generate the objects followed by their utilization by consumer threads. Simple concurrent pools implementations based on concurrent queues (both lockable and lock-free) poorly scale for large number of threads and high pool access rate. There are some methods of access contention reduction like elimination arrays or delegation of pool operation to the remote CPU cores. Nonetheless these methods lead to the bottlenecks on certain elements and severe throughput reduction in the case of large number of threads and high intensity of pool operations. Workpile and work-stealing methods ensure predictable operations time performance but they are not effective on low frequency of pool treatment. Diffraction trees is one of the perspective approaches for access contention reduction in thread-safe data structures. There are some works which propose pool implementations based on diffraction trees with using of elimination arrays. The main drawback of these implementations are high overheads on active waiting and atomic variable states synchronization on each of the tree node. This fact severely increases the complexity of tree traversal from the root to the leaves. Thus the tree efficiency decreases with increasing of its size. Some pool implementations severely violate FIFO/LIFO order while operations performance. The other drawback is the optimization of variable parameters of these structures. In this paper we propose the novel approach for scalable concurrent lock-free pool implementation on the basis of diffraction trees. The approach is based on localization of tree nodes access and Thread-local storage (TLS) utilization. This approach increases throughput at high and low pool load and minimizes the operation latency. The concurrent pools LocOptDTPool and TLSDTPool were developed by the authors on the basis of proposed approach. The pools contain the diffraction tree, atomic bool arrays, corresponding to the tree nodes, array of concurrent queues, corresponding to the tree leaves, the CPU core affinity manager, thread number counters and push (insert) and pop (remove) methods. The concurrent queues in the tree leaves may be implemented in different ways. In the current implementation of the pools we used lock-free concurrent queues from boost library. In the LocOptDTPool each node of diffraction tree contains two bool atomic arrays which size is no more than thread number. Each thread accesses to the corresponding array's element in order to localize tree nodes' atomic bit access. Moreover, compared with the elimination array methods, new approach allows to reduce the overheads, arising from additional (elimination) array's elements and active waiting for the pair thread. For the minimization of operation latency in case of low workload LocOptDTPool implements the counting of the current number of active threads in the pool. If the current workload (thread number) is low, then objects distribution among the queues doesn't lead to the significant increase of pool throughput. In this case the only queue is used for element storage in the pool. This algorithm increases the pool throughput in case of low workload. The main idea of TLSDTPool is the allocation of the arrays bits in the tree nodes in the Thread-local storage (TLS). The tree nodes contain ordinary bool arrays. This approach allows to avoid the expensive atomic operations and reduces access contention for the shared bits in the tree nodes. For the queues load uniformity in the TLSDTPool we proposed the algorithm of tree nodes initialization based on the binary representation of thread identifiers. This algorithm submits the initial state of the array's bits according to the thread identifier. That scheme minimizes the impact of the "worst case", rising from the imbalance of the queues workload. The experiments for the developed pools on cluster computer systems has shown, that LocOptDTPool scales well for high number of threads and shows an increase of throughput until the number of threads is equal to the number of CPU cores. Throughput of Lo-cOptDTPool for the entire range of thread number corresponds the results of TLSDTPool based on Thread-local storage. At the same time the maximum throughput was achieved for thread number equals to processors core number or slightly more. Thread-safe lock-free queues are recommended as the objects storage in tree leaves. Thanks to tree initialization algorithm the concurrent queues were balanced well and possible "worst case" didn't significantly effect on pool efficiency. Designed pools can be used in the producer-consumer model in multithreading programs with constant number of active threads, where high throughput and low latency is highly desirable. The pools provides high scalability at multithreading programs execution, in compared with the similar pool implementations on the basis of diffraction trees. The maximum algorithm efficiency is achieved at the thread number, equals to total processor cores number. Tree size increasing doesn't lead to throughput reduction.
Keywords
многопоточное программирование, распределяющие деревья, неблокируемые структуры данных, масштабируемость, потокобезопасный пул, multithreaded programming, diffracting trees, lock-free data structures, scalability, thread-safe poolAuthors
Name | Organization | |
Anenkov Alexandr D. | Siberian State University of Telecommunications and Information Sciences | alex.anenkov@outlook.com |
Paznikov Alexey A. | Siberian State University of Telecommunications and Information Sciences | apaznikov@gmail.com |
References

Algorithms of optimization of scalable thread-safe pool based on diffracting trees for multicore computing systems | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2017. № 39. DOI: 10.17223/19988605/39/10