Implementation of Vision Transformer for Offensive Language Detection on TikTok Social Media
DOI:
https://doi.org/10.35335/mandiri.v14i1.435
Keywords:
Content Moderation, Deep Learning, Offensive Language Detection, TikTok, Vision Transformer
Abstract
The rise of social media platforms such as TikTok has introduced new challenges for content moderation, particularly the spread of offensive language and hate speech. One promising approach is automatic detection with deep learning. This study implements the Vision Transformer (ViT) to detect offensive language on TikTok from visual data in the form of comment screenshots. The dataset consists of 1,401 labeled images in two classes: offensive and non-offensive. The model was trained for 50 epochs without a validation split and evaluated using accuracy, precision, recall, and F1-score. At the 40th epoch it reached an accuracy of 99.93%, a precision of 0.9979, a recall of 1.000, and an F1-score of 1.000, and these values remained stable through the end of training. The findings indicate that ViT can extract useful visual features from image-based comments even without access to the raw text. This is particularly relevant for TikTok, where comments often appear in visual formats such as thumbnails, screenshots, or reaction videos. The research opens up opportunities for image-based offensive language detection systems that strengthen content moderation by adapting to various visual formats. Further work should use a larger dataset and a more systematic data split to test the model's generalization capability.
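The four metrics named in the abstract all follow from the counts of a binary confusion matrix. The sketch below is a minimal illustration of the standard definitions, not the authors' evaluation code. The names `Confusion` and `EXAMPLE` are hypothetical, and the counts in `EXAMPLE` are invented for illustration: the source does not publish a confusion matrix, so they were chosen only so that the total is 1,401 images and the arithmetic lands near the reported accuracy and precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with 'offensive' as the positive class."""
    tp: int  # offensive images predicted offensive
    fp: int  # non-offensive images predicted offensive
    fn: int  # offensive images predicted non-offensive
    tn: int  # non-offensive images predicted non-offensive


def accuracy(c: Confusion) -> float:
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: Confusion) -> float:
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c: Confusion) -> float:
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def f1_score(c: Confusion) -> float:
    # F1 is the harmonic mean of precision and recall.
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# HYPOTHETICAL counts (not from the source): 1,401 images, one false positive.
EXAMPLE = Confusion(tp=475, fp=1, fn=0, tn=925)

if __name__ == "__main__":
    print(f"accuracy  = {accuracy(EXAMPLE):.4f}")
    print(f"precision = {precision(EXAMPLE):.4f}")
    print(f"recall    = {recall(EXAMPLE):.4f}")
    print(f"f1        = {f1_score(EXAMPLE):.4f}")
```

Under these definitions, a precision of 0.9979 combined with a recall of 1.000 gives a harmonic mean of roughly 0.999, slightly below the reported F1-score of 1.000. The difference may come from rounding or from how the metrics were averaged, which the abstract does not state.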
License
Copyright (c) 2025 Zulekha Rahmawaty, Fatsyarina Fitriastuti, Ryan Ari Setyawan

This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




