Design and construction of Telegram bot-based data breach preprocessing application for cyber threat intelligence in Institution X

Authors

  • Seto Gandhara Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
  • Tegar Pandu Satria Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
  • Hondor Saragih Universitas Pertahanan Republik Indonesia, Bogor, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i1.413

Keywords:

CSIRT, Cyber Threat Intelligence, Data Breach, Data Preprocessing, Telegram Bot

Abstract

Data breaches pose a significant threat in today's digital landscape, especially for organizations that handle sensitive information, such as government institutions. These incidents can have serious consequences, including risks to national security, loss of public trust, and financial harm. Institution X, an Indonesian organization dedicated to cyber threat prevention, struggles with the high volume of unstructured, "dirty" leaked data, which is often shared through hidden platforms such as the dark web and Telegram. To address this problem, a Telegram bot-based application was designed and developed using the Rapid Application Development (RAD) method. The application automates data collection, cleaning, and preprocessing, and offers features such as keyword-based search and conversion to CSV files. It was built in Python, deployed on the Replit cloud platform, and uses the Telebot library to interact with the Telegram Bot API. Internal testing covered six usage scenarios, including keyword processing, multi-file handling, and unauthorized access control, and all six produced successful outcomes. The application significantly improves the effectiveness and efficiency of the CSIRT team in responding to cyber threats. The results confirm the system's readiness for operational deployment and its potential contribution to enhancing cyber threat intelligence for Institution X and other government agencies.
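
The preprocessing steps named in the abstract (cleaning, keyword-based search, CSV conversion, and access control) can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout (an email address followed by a separator and a secret), the allowlist of user IDs, and all function names are assumptions made for illustration. The Telebot message handlers that would call these functions are omitted, so the sketch uses only the Python standard library.

```python
import csv
import io
import re

# Hypothetical allowlist of Telegram user IDs (assumption: the abstract
# only states that unauthorized access is rejected, not how).
AUTHORIZED_USER_IDS = {1001, 1002}

# Assumed record shape: "<email><separator><secret>", where the separator
# is one of ':', ';', '|', ',' or a tab. Real leak dumps vary widely.
RECORD_RE = re.compile(r"^\s*([^\s:;|,]+@[^\s:;|,]+)\s*[:;|,\t]\s*(\S.*?)\s*$")


def is_authorized(user_id):
    """Return True only for user IDs on the allowlist."""
    return user_id in AUTHORIZED_USER_IDS


def clean_records(lines):
    """Parse raw lines, drop malformed ones, and remove duplicates."""
    seen = set()
    records = []
    for line in lines:
        match = RECORD_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like a record
        email = match.group(1).lower()  # normalize case for deduplication
        secret = match.group(2)
        key = (email, secret)
        if key in seen:
            continue  # skip exact duplicates
        seen.add(key)
        records.append({"email": email, "secret": secret})
    return records


def filter_by_keyword(records, keyword):
    """Keep records whose email contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in records if needle in r["email"]]


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["email", "secret"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

In a bot, a document handler would presumably check `is_authorized` on the sender's ID first, then pass the uploaded file's lines through `clean_records` and `filter_by_keyword`, and send the output of `to_csv` back as a file.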

Published

2025-07-23

How to Cite

Gandhara, S., Satria, T. P., & Saragih, H. (2025). Design and construction of Telegram bot-based data breach preprocessing application for cyber threat intelligence in Institution X. Jurnal Mandiri IT, 14(1), 14–158. https://doi.org/10.35335/mandiri.v14i1.413