Design and construction of Telegram bot-based data breach preprocessing application for cyber threat intelligence in Institution X
DOI:
https://doi.org/10.35335/mandiri.v14i1.413
Keywords:
CSIRT, Cyber Threat Intelligence, Data Breach, Data Preprocessing, Telegram Bot
Abstract
Data breaches pose a significant threat in today's digital landscape, especially for organizations handling sensitive information, such as government institutions. These incidents can result in serious consequences, including risks to national security, loss of public trust, and financial harm. Institution X, an Indonesian organization dedicated to cyber threat prevention, faces challenges due to the high volume of unstructured and "dirty" leaked data, often shared through hard-to-monitor channels such as the dark web and Telegram. To address this issue, a Telegram bot-based application was designed and developed using the Rapid Application Development (RAD) method. The application automates data collection, cleaning, and preprocessing, with features such as keyword-based search and CSV file conversion. It was built using Python and deployed on the Replit cloud platform, using the Telebot library to interact with the Telegram Bot API. Internal testing covered six usage scenarios, including keyword processing, multi-file handling, and unauthorized access control, and all six scenarios passed. The application improves the CSIRT team's effectiveness and efficiency in responding to cyber threats. The results indicate that the system is ready for operational deployment and could contribute to enhancing cyber threat intelligence for Institution X and other government agencies.
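The abstract names keyword-based search, cleaning, and CSV conversion as the core preprocessing features but does not show how they fit together. The following is a minimal sketch of one way such a step could look, using only the Python standard library. The function names (clean_line, preprocess), the set of field delimiters, and the sample records are all hypothetical and are not taken from the paper; in the described application, a Telebot message handler would presumably pass the uploaded file's text to a function of this kind.

```python
import csv
import io
import re

# Assumption: leaked dumps separate fields with one of these characters.
# The paper does not list the delimiters it handles.
_DELIMITERS = re.compile(r"[:;|,\t]")


def clean_line(line):
    """Drop control characters and outer whitespace, then split into trimmed fields."""
    line = re.sub(r"[\x00-\x08\x0b-\x1f]", "", line).strip()
    if not line:
        return []
    return [field.strip() for field in _DELIMITERS.split(line)]


def preprocess(raw_text, keyword):
    """Return CSV text with the de-duplicated records that contain the keyword."""
    seen = set()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in raw_text.splitlines():
        # Case-insensitive keyword search on the raw line.
        if keyword.lower() not in line.lower():
            continue
        fields = clean_line(line)
        key = tuple(fields)
        if not fields or key in seen:
            continue
        seen.add(key)
        writer.writerow(fields)
    return out.getvalue()


# Hypothetical "dirty" input: mixed delimiters, stray whitespace, a duplicate.
sample = (
    "  alice@agency.example:pass123 \n"
    "bob@other.example;hunter2\n"
    "alice@agency.example:pass123\n"
    "CAROL@AGENCY.EXAMPLE|qwerty\n"
)
result = preprocess(sample, "agency.example")
print(result)
```

With the sample above, only the two distinct records that mention agency.example are kept, each written as one CSV row. Splitting on a fixed delimiter set is a simplification: values that themselves contain a colon or comma would be split incorrectly, so a real deployment would need format detection per source.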
License
Copyright (c) 2025 Seto Gandhara, Tegar Pandu Satria, Hondor Saragih

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
						
							



