Bayesian-Optimized XGBoost Model for Predicting Mushroom Toxicity

Authors

  • Aria Kusumah Sastradinata Republic of Indonesia Defence University, Bogor, Indonesia https://orcid.org/0009-0009-1139-5857
  • Sunarta Sunarta Republic of Indonesia Defence University, Bogor, Indonesia
  • Rd. Apip Miptahudin Republic of Indonesia Defence University, Bogor, Indonesia
  • M. Daffa Abdurrahman Republic of Indonesia Defence University, Bogor, Indonesia
  • Rangga Taqwa Republic of Indonesia Defence University, Bogor, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i2.465

Keywords:

Bayesian Optimization, Classification, Machine Learning, Mushroom Toxicity, XGBoost

Abstract

Mushroom poisoning remains a significant public health concern because edible and poisonous species often look alike, which makes traditional visual identification unreliable. This study develops an accurate and interpretable machine learning framework for mushroom toxicity prediction using a Bayesian-optimized Extreme Gradient Boosting (XGBoost) model. The dataset consists of morphological and ecological features from the Secondary Mushroom Dataset and was preprocessed with imputation, standardization, and one-hot encoding. Bayesian Optimization, implemented with the Hyperopt Tree-structured Parzen Estimator (TPE) algorithm, was used to tune the XGBoost hyperparameters automatically, improving convergence and reducing manual experimentation. The model was evaluated using 10-fold cross-validation and standard metrics: accuracy, precision, recall, F1-score, and the Area Under the ROC Curve (AUC). The proposed framework achieved an accuracy of 99.99% and an AUC of 1.0000, indicating near-perfect discrimination between edible and poisonous mushrooms. Feature importance analysis showed that habitat, veil color, and stem root were the most influential predictors of toxicity. The findings highlight the effectiveness of Bayesian-optimized ensemble learning on high-dimensional biological data and offer a reliable, transparent, and computationally efficient approach for biosafety assessment and ecological data analysis.
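
The evaluation protocol named in the abstract (10-fold cross-validation scored with accuracy, precision, recall, F1-score, and AUC) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `k_fold_indices`, `binary_metrics`, and `roc_auc`, the toy labels and scores, the 0.5 decision threshold, and the choice of label 1 for the poisonous class are all assumptions made for illustration. It omits the XGBoost model and the Hyperopt search entirely.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Return k disjoint test folds that together cover every sample index."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = poisonous, 0 = edible.
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.45, 0.4, 0.2, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

    print(binary_metrics(y_true, y_pred))
    print("AUC:", roc_auc(y_true, scores))
    print("fold sizes:", [len(f) for f in k_fold_indices(25, k=10)])
```

In the toy data every poisonous sample scores above every edible one, so the ranking-based AUC is perfect even though one sample falls on the wrong side of the threshold. A pattern like this is one way an accuracy slightly below 100% can coexist with a reported AUC of 1.0000; rounding to four decimals is another, and the abstract does not say which applies.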

Published

2025-10-24

How to Cite

Sastradinata, A. K., Sunarta, S., Miptahudin, R. A., Abdurrahman, M. D., & Taqwa, R. (2025). Bayesian-Optimized XGBoost Model for Predicting Mushroom Toxicity. Jurnal Mandiri IT, 14(2), 247–259. https://doi.org/10.35335/mandiri.v14i2.465