YOLO convolutional neural network models with attention mechanism for real-time computer vision systems
New convolutional neural network (CNN) models with an attention mechanism are developed and investigated for detecting small-sized flying objects (FO). Two baseline YOLO-class CNN models, YOLOv5s and YOLOv8s, were selected as the starting points. Based on them, four hybrid CNN models were created using the SWT module and the SEA module, which implement different versions of the attention mechanism. The baseline and hybrid models were trained, validated and studied on a dataset of labeled images of small-sized FO of three classes: "Helicopter-type unmanned aerial vehicle (UAV)", "Airplane-type UAV" and "Bird". The study showed that the hybrid YOLOv8s + SEA model is the preferred option for designing real-time computer vision systems intended for detecting small-sized FO. Author contributions: the authors contributed equally to this article. The authors declare no conflict of interest.
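The abstract does not spell out how the SEA module works. As a rough illustration only, the sketch below assumes SEA denotes squeeze-and-excitation channel attention in the sense of Hu et al. (2018): each channel is averaged to a single value, a small two-layer gate turns those values into per-channel weights in (0, 1), and the feature map is rescaled by them. The function names (`squeeze`, `excite`, `se_block`), the toy weights and the pure-Python list layout are hypothetical and are not taken from the authors' models.

```python
import math

def squeeze(feature_map):
    """Global average pooling: reduce each (H x W) channel to one scalar."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def matvec(weights, vector):
    """Plain matrix-vector product for nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def excite(z, w1, w2):
    """Bottleneck gate: ReLU after the reduction layer, sigmoid after the expansion."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]

def se_block(feature_map, w1, w2):
    """Rescale every channel by its learned gate (weights are assumed given)."""
    gates = excite(squeeze(feature_map), w1, w2)
    return [[[g * v for v in row] for row in ch]
            for g, ch in zip(gates, feature_map)]

# Toy input: 2 channels of size 2 x 2, reduction to 1 hidden unit.
x = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]
w1 = [[1.0, 1.0]]      # hypothetical reduction weights, shape (1, 2)
w2 = [[0.0], [2.0]]    # hypothetical expansion weights, shape (2, 1)
y = se_block(x, w1, w2)
```

In a YOLO-style detector a block of this kind would typically be placed after a backbone or neck stage, so that informative channels are amplified at the cost of only a few extra parameters; where exactly the authors insert their module is not stated in the abstract.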
Keywords
real-time computer vision system,
YOLO convolutional neural network,
detection and classification of flying objects,
attention mechanism
Authors
| Klekovkin Vadim A. | National Research Tomsk Polytechnic University | vak37@tpu.ru |
| Markov Nikolay G. | National Research Tomsk Polytechnic University | markovng@tpu.ru |
| Nebaba Stepan G. | National Research Tomsk Polytechnic University | stepanlfx@tpu.ru |
Total: 3