Implementation of Vision Transformer for offensive language detection on TikTok social media

Authors

  • Zulekha Rahmawaty Universitas Janabadra Yogyakarta, Indonesia
  • Fatsyarina Fitriastuti Universitas Janabadra Yogyakarta, Indonesia
  • Ryan Ari Setyawan Universitas Janabadra Yogyakarta, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i1.435

Keywords:

Content Moderation, Deep Learning, Offensive Language Detection, TikTok, Vision Transformer

Abstract

The rise of social media platforms such as TikTok has introduced new challenges in content moderation, particularly concerning the spread of offensive language and hate speech. One promising approach to this issue is automatic detection using deep learning. This study implements a Vision Transformer (ViT) to detect offensive language on the TikTok platform from visual data in the form of comment screenshots. The dataset consists of 1,401 labeled images in two classes: offensive and non-offensive. Training was conducted over 50 epochs without a validation split, and evaluation used accuracy, precision, recall, and F1-score. The model achieved high performance, with an accuracy of 99.93%, a precision of 0.9979, a recall of 1.000, and an F1-score of 1.000 at the 40th epoch, remaining stable through the end of training. These findings indicate that ViT can extract discriminative visual features from image-based comments even without access to the raw text. The approach is particularly relevant to TikTok, where comments often appear in visual formats such as thumbnails, screenshots, or reaction videos. This research opens opportunities for image-based offensive language detection systems that can strengthen content moderation by adapting to varied visual formats. Further development is recommended using a larger dataset and a more systematic train/validation/test split to assess the model's generalization capability.
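As a minimal sketch of the evaluation described above, the snippet below computes accuracy, precision, recall, and F1-score for a binary offensive/non-offensive classifier from a confusion matrix. The counts used here are hypothetical illustrations (not the study's actual confusion matrix, which is not given in the abstract); only the metric formulas follow the standard definitions.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn = true positives, false positives, false negatives,
    true negatives, treating "offensive" as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical example over 1,401 images: 700 offensive comments all
# detected (fn=0, so recall = 1.0), with 2 non-offensive comments
# misflagged (fp=2).
m = binary_metrics(tp=700, fp=2, fn=0, tn=699)
print(m)
```

Note that with a perfect recall of 1.0, the F1-score equals 2P/(P+1), so it tracks precision closely; reporting all four metrics together, as the study does, guards against any single metric masking class-imbalance effects.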

Published

2025-07-19

How to Cite

Rahmawaty, Z., Fitriastuti, F., & Setyawan, R. A. (2025). Implementation of Vision Transformer for offensive language detection on TikTok social media. Jurnal Mandiri IT, 14(1), 76–85. https://doi.org/10.35335/mandiri.v14i1.435