Automated news monitoring and sentiment analysis system using web scraping and large language models
DOI:
https://doi.org/10.35335/mandiri.v14i3.492
Keywords:
Django Framework, Large Language Models, News Monitoring, Sentiment Analysis, Web Scraping
Abstract
Organizations increasingly need efficient systems to monitor and analyze large volumes of online news for timely, informed decision-making. Manual monitoring is inadequate because of information overload and the time-sensitive nature of digital content. This study presents the design, development, and evaluation of an automated web-based news monitoring and sentiment analysis system that integrates web scraping and artificial intelligence. The system was implemented using the Django web framework with a PostgreSQL database, Playwright browser automation for dynamic content extraction, and Google’s Gemini API for contextual sentiment classification. Three main functions were developed: automated data collection based on keywords and date ranges; AI-driven sentiment analysis that assigns positive, negative, or neutral labels based on context; and automated reporting with interactive visualizations and export to XLSX and CSV formats. Functional black-box testing passed all 28 test cases (100%), covering authentication, data acquisition, sentiment analysis, and visualization. Performance evaluation showed that the system could collect 50–200 articles within 2–4 minutes and perform sentiment analysis at 1–2 seconds per article. The proposed system transforms manual workflows into fully automated operations, enabling systematic media monitoring, sentiment tracking, and data-driven decision support.
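The three functions described in the abstract (collection by keyword and date range, three-way sentiment labeling, and CSV export) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Playwright scraping step and the Gemini API call are not shown, and every name here (`Article`, `select_articles`, `placeholder_classifier`, `label_articles`, `to_csv`) is hypothetical. The classifier is a toy word-list rule that stands in for the LLM call only so that the example runs offline.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, List

# The three labels named in the abstract.
LABELS = ("positive", "negative", "neutral")


@dataclass
class Article:
    """Hypothetical record for one collected news article."""
    title: str
    body: str
    published: date


def select_articles(articles: Iterable[Article], keyword: str,
                    start: date, end: date) -> List[Article]:
    """Keep articles that mention the keyword and fall inside the date range."""
    kw = keyword.lower()
    return [
        a for a in articles
        if start <= a.published <= end
        and (kw in a.title.lower() or kw in a.body.lower())
    ]


def placeholder_classifier(text: str) -> str:
    """Assumption: toy stand-in for the LLM-based classifier (not the Gemini API)."""
    lowered = text.lower()
    pos = sum(w in lowered for w in ("growth", "success", "improve"))
    neg = sum(w in lowered for w in ("crisis", "loss", "fail"))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def label_articles(articles: Iterable[Article],
                   classify: Callable[[str], str]) -> List[Dict[str, str]]:
    """Attach one sentiment label per article, rejecting unknown labels."""
    rows = []
    for a in articles:
        label = classify(f"{a.title}\n{a.body}")
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        rows.append({
            "published": a.published.isoformat(),
            "title": a.title,
            "sentiment": label,
        })
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize labeled rows to CSV text using the standard library."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["published", "title", "sentiment"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data, for illustration only.
sample = [
    Article("Regional bank reports growth", "Lending improved this quarter.", date(2025, 3, 1)),
    Article("Bank faces liquidity crisis", "Analysts expect a loss.", date(2025, 3, 5)),
    Article("Bank opens new branch", "The office opens Monday.", date(2025, 4, 20)),
    Article("Weather update", "Rain expected.", date(2025, 3, 2)),
]
selected = select_articles(sample, "bank", date(2025, 3, 1), date(2025, 3, 31))
rows = label_articles(selected, placeholder_classifier)
report = to_csv(rows)
print(report)
```

Passing the classifier as a function argument keeps the collection and reporting steps independent of the model, so a real LLM-backed classifier could be substituted for the placeholder without changing the rest of the sketch.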
Copyright (c) 2025 Zerusealtin David Naibaho, Hondor Saragih

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




