Evaluating the generalization ability of deep learning models for sound source localization | Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitelnaja tehnika i informatika – Tomsk State University Journal of Control and Computer Science. 2025. № 72. DOI: 10.17223/19988605/72/11

Evaluating the generalization ability of deep learning models for sound source localization

This paper evaluates how well deep learning models for sound source localization with a spatial resolution of 10° generalize when the configuration settings change. The evaluation was carried out in a closed reverberant environment using an orthogonal microphone array. Two models were considered: SI-GCC-CNN, which feeds combined sound intensity and generalized cross-correlation with phase transform (GCC-PHAT) features into a convolutional neural network, and SI-CNN, which feeds only sound intensity features into a convolutional neural network. Simulation results show that the SI-GCC-CNN model generalizes better than the SI-CNN model, improving localization accuracy by 22.1% when the room size is changed, by 15.6% when the location of the microphone array is changed, and by 32% when the distance between the source and the center of the microphone array is changed.

Contribution of the authors: the authors contributed equally to this article. The authors declare no conflicts of interest.

Keywords

generalization ability, deep learning models, reverberant environment, orthogonal microphone array, sound intensity, generalized cross-correlation with phase transform, convolutional neural networks, sound source localization

Authors

Name | Organization | E-mail
Shahoud Ghiath M. | Siberian Federal University | ghiathlovealaa@gmail.com
Agafonov Evgeny D. | Siberian Federal University | evgeny.agafonov@mail.ru
Total: 2
