Alphabet SIBI sign language recognition using YOLOv11 for real-time gesture detection

Authors

  • Salsabilla Azahra Putri, Universitas Ahmad Dahlan, Yogyakarta, Indonesia, https://orcid.org/0000-0002-7718-1298
  • Murinto Murinto, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
  • Sunardi Sunardi, Universitas Ahmad Dahlan, Yogyakarta, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i1.408

Keywords:

Backbone optimization, Data augmentation, Gesture recognition, Real-time detection, SIBI alphabet

Abstract

Modern gesture recognition systems for sign language struggle to balance computational efficiency with detection accuracy in complex, dynamic environments. To address this challenge, this study proposes a SIBI (Sistem Isyarat Bahasa Indonesia) alphabet recognition framework based on YOLOv11, optimized for real-time applications. The architecture integrates a modified, efficient YOLOv11 backbone that extracts hand gesture features accurately with minimal latency. The model is trained on a custom SIBI dataset of alphabet signs and essential vocabulary, with data augmentation techniques that improve robustness to variations in position, lighting, and background. Experimental results show that the model achieves high detection accuracy, with an mAP50 of 97%, while significantly reducing computational complexity. These findings demonstrate that a lightweight yet highly accurate deep learning model can be applied effectively to sign language recognition, particularly for SIBI in the Indonesian context. From a practical standpoint, the framework offers a real-time gesture detection solution suitable for deployment on resource-constrained devices such as mobile or embedded systems. It can replace or complement traditional communication aids, especially in inclusive education, public services, and healthcare. The method can also be adapted for gesture-based interaction in other domains, such as athletic training, physical education, and app-based fitness programs, where accurate real-time motion recognition is essential.
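The reported mAP50 metric counts a predicted bounding box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.5. The paper's own evaluation code is not shown; the following is a minimal sketch of that matching rule, with illustrative `(x1, y1, x2, y2)` pixel-coordinate boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def matches_at_50(pred, truth):
    """True positive under the mAP50 criterion: IoU >= 0.5."""
    return iou(pred, truth) >= 0.5

# A prediction covering most of the ground-truth hand region counts:
print(matches_at_50((10, 10, 50, 50), (12, 12, 52, 52)))  # True
```

Per-class average precision is then computed over these matches and averaged across the 26 alphabet classes to give the mAP50 figure.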


Published

2025-07-18

How to Cite

Putri, S. A., Murinto, M., & Sunardi, S. (2025). Alphabet SIBI sign language recognition using YOLOv11 for real-time gesture detection. Jurnal Mandiri IT, 14(1), 123–135. https://doi.org/10.35335/mandiri.v14i1.408