Hybrid clustering and supervised learning model for digital MSME segmentation

Authors

  • Dona Marcelina, Universitas Indo Global Mandiri, Indonesia
  • Terttiaavini Terttiaavini, Universitas Indo Global Mandiri, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i1.404

Keywords:

Digital Policy, Hybrid Clustering, MSMEs Segmentation, Supervised Learning, Unsupervised Learning

Abstract

Digitalization has become a key factor in the competitiveness of Micro, Small, and Medium Enterprises (MSMEs), but its implementation still faces challenges, including low technology adoption and inaccurate segmentation of MSME data. This study develops a hybrid approach that combines clustering with supervised learning to segment MSMEs by their level of digitalization and to predict their segment. Four clustering algorithms were tested: K-Means, Agglomerative Clustering, Gaussian Mixture Model, and HDBSCAN. HDBSCAN outperformed the others on all three internal validation measures, achieving the highest Silhouette Score (0.3501), the lowest Davies-Bouldin Index (0.9557), and the highest Calinski-Harabasz Index (132.38). The segmentation produced three distinct clusters: Cluster 0 (Traditional: low digitalization, small revenue), Cluster 1 (Semi-Digital: moderate technology adoption, medium revenue), and Cluster 2 (Fully Digital: high technology adoption, large revenue). The cluster assignments were then used as labels to train six classification algorithms. Among them, XGBoost and Neural Network performed best, reaching a prediction accuracy of 98.63%. The main contribution of the study is an analytical framework for data-driven segmentation and prediction of MSMEs, intended to support more precise, targeted, and adaptive national digitalization strategies.
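
The two-stage idea described in the abstract (cluster first, then train a classifier on the cluster labels) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The data are synthetic, and the two features (a digital adoption score and a scaled revenue figure) are hypothetical. Plain k-means stands in for HDBSCAN, a nearest-centroid rule stands in for XGBoost and the neural network, and only the Silhouette Score is computed, not the Davies-Bouldin or Calinski-Harabasz indices.

```python
import math
import random

def mean_point(members):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = mean_point(members)
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            mean_dist[c] = sum(d) / len(d) if d else None
        a = mean_dist[labels[i]]
        if a is None:  # point is alone in its cluster
            continue
        b = min(v for c, v in mean_dist.items() if c != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic stand-in data: (digital adoption score, scaled revenue),
# drawn around three hypothetical segment centers.
rng = random.Random(42)
centers = [(0.2, 0.1), (0.5, 0.5), (0.8, 0.9)]
points = [(rng.gauss(cx, 0.05), rng.gauss(cy, 0.05))
          for cx, cy in centers for _ in range(60)]

# Stage 1: unsupervised segmentation and an internal validity score.
labels = kmeans(points, k=3)
score = silhouette(points, labels)

# Stage 2: treat cluster labels as classes and evaluate on a held-out 20%.
order = list(range(len(points)))
rng.shuffle(order)
cut = len(order) * 4 // 5
train, test = order[:cut], order[cut:]
model = {c: mean_point([points[i] for i in train if labels[i] == c])
         for c in sorted(set(labels))}
predicted = [min(model, key=lambda c: math.dist(points[i], model[c])) for i in test]
accuracy = sum(p == labels[i] for p, i in zip(predicted, test)) / len(test)

print(f"silhouette={score:.3f}  held-out accuracy={accuracy:.3f}")
```

Because the classifier is trained on labels that the clustering step itself produced, its accuracy measures how well it reproduces the cluster boundaries, not agreement with an independent ground truth.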

Published

2025-07-19

How to Cite

Marcelina, D., & Terttiaavini, T. (2025). Hybrid clustering and supervised learning model for digital MSME segmentation. Jurnal Mandiri IT, 14(1), 86–96. https://doi.org/10.35335/mandiri.v14i1.404