Optimization method of remote core locking | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2017. № 38. DOI: 10.17223/19988605/38/8

Optimization method of remote core locking

The number of processor cores in shared memory computer systems is increasing nowadays. Therefore, the need of scalable parallel programs rises up. These programs must be efficient on large number of parallel threads. Among the perspective scalable approaches of multithreading programming, one can emphasize lock-free algorithms and data structures and software transactional memory. The main drawbacks of lock-free algorithms and data structures are their restrained area of applying and essential complexity of parallel programs development. Besides the throughput of those structures is comparable with lock-based structures. Current implementations of software transactional memory do not provide high performance of multithreading programs and have not common in real applications yet. Classical approach of lock-based critical sections in multithreading programs remains the most prevalent. Locks are easier than lock-free algorithms and data structures and ensures high level of productivity. Besides that, the most of present software is lock-based. Thus, development of scalable locks is substantial now. Locks scalability depends on access contention while accessing of parallel threads to share memory and space cache locality. Access contention takes place while simultaneous access of multiple thread to one critical section, protected by one synchronization object. This leads to cache become inefficiency. Cache locality is significant when the critical section contains the access to shared data, which was accessed previously on the other core. This circumstance leads to cache misses and severe critical section execution time increasing. The most efficient approaches of scalable locks implementation are CAS spinlocks, MCS-locks, Flat combining CC-Synch, DSM-Synch, Oyama lock. However, these methods cannot guarantee the minimum of critical section execution. The critical section execution time includes time of lock ownership transfer and critical section execution time. In this work, we consider the method of critical section execution on remote processor cores (Remote Core Locking, RCL). RCL minimizes the execution time of legacy multithreading programs by reduction critical section execution critical path. This method replaces critical sections in legacy programs by procedures remote calls. All critical sections protected by one lock are performed by server thread launched on remote core. Therefore, by execution of the critical section by the particular thread minimizes the overheads of lock ownership transfer. In the current implementation RCL, they do not consider the non-uniform access to memory in NUMA-systems. These systems are the compositions multicore processors connected with local area of global memory. Time of critical section execution in RCL depends on RCL-server latency while accessing to memory blocks within critical section. Thus, in these programs memory should be allocated on NUMA node with the RCL-server. This reduces time of memory accesses and minimizes of programs execution time. In this work, we propose the algorithm minimizing the memory access time in NUMA systems while RCL-programs execution. The algorithm optimizes the memory allocation on NUMA-systems. The essential of the algorithms is allocating on the NUMA-node, on which RCL-server is executing. The results of experiments show that the memory allocation algorithm increases the throughput of critical sections execution by reducing the time of RCL-server access to memory.

Download file
Counter downloads: 226

Keywords

RCL, многопоточное программирование, критические секции, масштабируемость, RCL, multithreading programming, critical sections, scalability

Authors

NameOrganizationE-mail
Paznikov Alexey A.Rzhanov Institute of Semiconductor Physics Siberian Branch of Russian Academy of Sciencesapaznikov@gmail.com
Всего: 1

References

Lozi J.P. et al. Remote Core Locking: Migrating Critical-Section Execution to Improve the Performance of Multithreaded Applications // USENIX Annual Technical Conference. 2012. P. 65-76.
Calciu I., Gottschlich J.E., Herlihy M. Using elimination and delegation to implement a scalable NUMA-friendly stack // Proc. Use-nix Workshop on Hot Topics in Parallelism (HotPar). 2013. P. 1-7.
Metreveli Z., Zeldovich N., Kaashoek M.F. Cphash: A cache-partitioned hash table // ACM SIGPLAN Notices. ACM, 2012. V. 47, No. 8. P. 319-320.
Suleman M.A. et al. Accelerating critical section execution with asymmetric multi-core architectures // ACM SIGARCH Computer Architecture News. ACM, 2009. V. 37. No. 1. P. 253-264.
Oyama Y., Taura K., Yonezawa A. Executing parallel programs with synchronization bottlenecks efficiently // Proceedings of the International Workshop on Parallel and Distributed Computing for Symbolic and Irregular Applications, PDSIA '99. 1999. P. 1-24.
Fatourou P., Kallimanis N.D. Revisiting the combining synchronization technique // ACM SIGPLAN Notices. ACM, 2012. V. 47. No. 8. P. 257-266.
Hendler D. et al. Flat combining and the synchronization-parallelism tradeoff // Proceedings of the twenty-second annual ACM sym posium on Parallelism in algorithms and architectures. ACM, 2010. P. 355-364.
Mellor-Crummey J.M., Scott M.L. Algorithms for scalable synchronization on shared-memory multiprocessors // ACM Transactions on Computer Systems (TOCS). 1991. V. 9. No. 1. P. 21-65.
Herlihy M., Shavit N. The Art of Multiprocessor Programming, Revised Reprint. Elsevier, 2012. 528 с.
Herlihy M., Moss J. E.B. Transactional memory: Architectural support for lock-free data structures // Proceedings of the 20th annual international symposium on computer architecture ACM. ACM, 1993. V. 21. No. 2. P. 289-300.
Хорошевский В.Г. Распределённые вычислительные системы с программируемой структурой // Вестник СибГУТИ. 2010. № 2 (10). С. 3-41.
 Optimization method of remote core locking | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2017. № 38. DOI: 10.17223/19988605/38/8

Optimization method of remote core locking | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2017. № 38. DOI: 10.17223/19988605/38/8

Download full-text version
Counter downloads: 733