Automated news monitoring and sentiment analysis system using web scraping and large language models

Authors

  • Zerusealtin David Naibaho, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia
  • Hondor Saragih, Universitas Pertahanan Republik Indonesia, Bogor, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i3.492

Keywords:

Django Framework, Large Language Models, News Monitoring, Sentiment Analysis, Web Scraping

Abstract

Organizations increasingly need efficient systems to monitor and analyze large volumes of online news for timely, informed decision-making. Manual monitoring is inadequate because of information overload and the time-sensitive nature of digital content. This study presents the design, development, and evaluation of an automated web-based news monitoring and sentiment analysis system that integrates web scraping and artificial intelligence. The system was implemented with the Django web framework and a PostgreSQL database, Playwright browser automation for extracting dynamic content, and Google’s Gemini API for contextual sentiment classification. Three main functions were developed: automated data collection based on keywords and date ranges; AI-driven sentiment analysis that labels each article as positive, negative, or neutral using contextual understanding; and automated reporting with interactive visualizations exportable to XLSX and CSV formats. Functional black-box testing passed all 28 test cases (100%), verifying reliability in authentication, data acquisition, sentiment analysis, and visualization. Performance evaluation showed that the system could collect 50–200 articles within 2–4 minutes and classify sentiment in 1–2 seconds per article. The proposed system turns manual workflows into automated operations, enabling systematic media monitoring, sentiment tracking, and data-driven decision support.
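The three functions named in the abstract (keyword and date-range selection, three-way sentiment labeling, and CSV export) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Article` type, the function names, and the injected `classify` callable, which stands in for a call to a hosted model such as Gemini, are all hypothetical, and the fallback to "neutral" for unrecognized replies is an assumption made for the example.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List

LABELS = ("positive", "negative", "neutral")


@dataclass
class Article:
    """Hypothetical record for one collected news article."""
    title: str
    body: str
    published: date
    sentiment: str = ""


def matches(article: Article, keyword: str, start: date, end: date) -> bool:
    """True if the article mentions the keyword and falls inside the date range."""
    text = f"{article.title} {article.body}".lower()
    return keyword.lower() in text and start <= article.published <= end


def normalize_label(raw: str) -> str:
    """Map a free-text model reply onto one of the three labels."""
    cleaned = raw.strip().lower()
    for label in LABELS:
        if cleaned.startswith(label):
            return label
    # Assumption: replies that match no label are treated as neutral.
    return "neutral"


def analyze(
    articles: Iterable[Article],
    keyword: str,
    start: date,
    end: date,
    classify: Callable[[str], str],
) -> List[Article]:
    """Select matching articles and label each one using the injected classifier."""
    selected = [a for a in articles if matches(a, keyword, start, end)]
    for article in selected:
        article.sentiment = normalize_label(classify(article.body))
    return selected


def to_csv(articles: Iterable[Article]) -> str:
    """Serialize labeled articles to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "published", "sentiment"])
    for article in articles:
        writer.writerow([article.title, article.published.isoformat(), article.sentiment])
    return buffer.getvalue()
```

Passing the classifier in as a callable keeps the selection and export logic testable without network access; in a deployment along the lines the abstract describes, that callable would wrap the model request and the articles would come from the scraper rather than an in-memory list.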

Published

2026-01-15

How to Cite

Naibaho, Z. D., & Saragih, H. (2026). Automated news monitoring and sentiment analysis system using web scraping and large language models. Jurnal Mandiri IT, 14(3), 344–355. https://doi.org/10.35335/mandiri.v14i3.492
