Handwritten text segmentation using deep learning method
DOI: https://doi.org/10.35335/mandiri.v14i4.509
Keywords: Deep Learning, Document Image Analysis, Handwritten Recognition, Text Segmentation, U-Net
Abstract
The rapid development of artificial intelligence and deep learning technologies has increased the risk of fabricated digital assignments in academic environments, encouraging educators to reintroduce handwritten assignments as an authentic evaluation method. In handwritten document analysis systems, background segmentation is a critical preprocessing step that separates text from complex document backgrounds. This study proposes the U-Net deep learning architecture for background segmentation of handwritten document images. Two datasets were used: the public cBAD dataset and a custom dataset of Indonesian handwritten student assignments. Both datasets were processed with an identical pipeline and evaluated using 5-fold cross-validation. Model performance was measured using the Dice Similarity Coefficient and Intersection over Union (IoU). Experimental results show that the proposed U-Net model achieved an average Dice coefficient (F1-Score) of 0.74 on the cBAD dataset and 0.83 on the student assignment dataset. These results indicate that the model performs consistently across cross-validation folds on both datasets. The proposed approach is therefore suitable as an initial segmentation stage in handwritten document recognition systems.
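The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary masks in which 1 marks a text pixel and 0 marks background; the function names, the nested-list mask representation, and the smoothing term `eps` are illustrative assumptions and are not taken from the study's own pipeline.

```python
from typing import Sequence, Tuple

# A mask is a 2-D grid of 0/1 values (assumed convention: 1 = text, 0 = background).
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives pixel by pixel."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def dice_coefficient(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice = 2*TP / (2*TP + FP + FN); for binary masks this equals the F1-score."""
    tp, fp, fn = confusion_counts(pred, target)
    # eps (an assumed smoothing term) avoids division by zero on empty masks.
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def iou(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """IoU = TP / (TP + FP + FN), the overlap divided by the union of the two masks."""
    tp, fp, fn = confusion_counts(pred, target)
    return (tp + eps) / (tp + fp + fn + eps)


if __name__ == "__main__":
    # Toy masks for illustration only; these are not data from the study.
    predicted = [[1, 1, 0],
                 [0, 1, 0]]
    reference = [[1, 0, 0],
                 [0, 1, 1]]
    print(f"Dice: {dice_coefficient(predicted, reference):.3f}")
    print(f"IoU:  {iou(predicted, reference):.3f}")
```

For a single pair of masks the two metrics are linked by IoU = Dice / (2 - Dice), so IoU is never higher than Dice for the same prediction.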
License
Copyright (c) 2026 Zulkarnaen Hatala, Ahmad Thariq, Josseano Parera, Muhammad Hatala

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.




