Handwritten text segmentation using deep learning method

Zulkarnaen Hatala; Ahmad Thariq; Josseano Parera; Muhammad Hatala

doi:10.35335/mandiri.v14i4.509

Authors

Zulkarnaen Hatala, Politeknik Negeri Ambon, Indonesia
Ahmad Thariq, Politeknik Negeri Ambon, Indonesia
Josseano Parera, Politeknik Negeri Ambon, Indonesia
Muhammad Hatala, Institut Bisnis dan Teknologi Kalimantan, Indonesia

DOI:

https://doi.org/10.35335/mandiri.v14i4.509

Keywords:

Deep Learning, Document Image Analysis, Handwritten Recognition, Text Segmentation, U-Net

Abstract

The rapid development of artificial intelligence and deep learning technologies has increased the risk of fabricated digital assignments in academic environments, encouraging educators to reintroduce handwritten assignments as a more authentic form of evaluation. In handwritten document analysis systems, background segmentation is a critical preprocessing step that separates text from complex document backgrounds. This study proposes the U-Net deep learning architecture for background segmentation of handwritten document images. Two datasets were used: the public cBAD dataset and a custom dataset of handwritten assignments written by Indonesian students. Both datasets were processed with an identical pipeline and evaluated with 5-fold cross-validation. Model performance was measured using the Dice Similarity Coefficient and Intersection over Union (IoU). The proposed U-Net model achieved an average Dice coefficient (equivalent to the F1-score for binary masks) of 0.74 on the cBAD dataset and 0.83 on the student assignment dataset. These results suggest that the model performs consistently and generalizes stably across cross-validation folds, indicating that the approach is suitable as an initial segmentation stage in handwritten document recognition systems.
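The two metrics named above can be illustrated with a short sketch. For a predicted text mask P and a ground-truth mask G, Dice = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|, so Dice = 2·IoU / (1 + IoU) and the two always rank models the same way. The code below is a minimal illustration, not the authors' implementation: the function names `dice_and_iou` and `k_fold_indices` are hypothetical, masks are assumed to be binary arrays with 1 = text and 0 = background, the `eps` smoothing term is an assumption, and the shuffled, seeded fold assignment is an assumption because the abstract does not say how the 5 folds were formed.

```python
import numpy as np


def dice_and_iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of the same shape.

    Assumption: nonzero pixels are text (foreground), zero pixels are background.
    Assumption: `eps` keeps the ratios defined when both masks are empty
    (both scores then evaluate to 1.0).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()  # |P ∩ G|
    union = np.logical_or(pred, target).sum()          # |P ∪ G|
    total = pred.sum() + target.sum()                  # |P| + |G|
    dice = (2.0 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split.

    Each sample lands in exactly one validation fold; shuffling with a fixed
    seed is an illustrative assumption, not a documented detail of the study.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx


if __name__ == "__main__":
    # Toy 2x3 masks: 3 predicted text pixels, 3 true text pixels, 2 shared.
    pred = np.array([[1, 1, 0],
                     [0, 1, 0]])
    target = np.array([[1, 0, 0],
                       [0, 1, 1]])
    dice, iou = dice_and_iou(pred, target)
    print(f"Dice={dice:.4f} IoU={iou:.4f}")  # Dice≈0.6667, IoU=0.5000

    for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, k=5)):
        print(fold, len(train_idx), len(val_idx))  # 8 training and 2 validation images per fold
```

A reported "average Dice" in a 5-fold setup would then correspond to the mean of the per-fold Dice scores (for example `np.mean(fold_scores)`), each computed on that fold's held-out images.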

References

Arifin, M. (2017). Handwritten Javanese Character Recognition Using Discriminative Learning [Bachelor’s Thesis, Universitas Indonesia]. https://doi.org/10.1109/ICITISEE.2017.8285521

Calvo-Zaragoza, J., & Gallego, A.-J. (2019). A selectional auto-encoder approach for document image binarization. Pattern Recognition, 86, 37–47. https://doi.org/10.1016/j.patcog.2018.08.011

Diem, M., Kleber, F., Fiel, S., Grüning, T., & Gatos, B. (2017). cBAD: ICDAR2017 competition on baseline detection. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 1, 1355–1360.

Farhansyah, M. R., Johari, M. Z. F., Amiral, A., Purwarianti, A., Yuana, K. A., & Wijaya, D. T. (2024). DriveThru: A Document Extraction Platform and Benchmark Datasets for Indonesian Local Language Archives. arXiv Preprint arXiv:2411.09318. https://doi.org/10.48550/arXiv.2411.09318

Fizaine, F. C., Bard, P., Paindavoine, M., Robin, C., Bouyé, E., Lefèvre, R., & Vinter, A. (2024). Historical text line segmentation using deep learning algorithms: Mask-RCNN against U-Net networks. Journal of Imaging, 10(3), 65. https://doi.org/10.3390/jimaging10030065

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. https://www.deeplearningbook.org

Grüning, T., Labahn, R., Diem, M., Kleber, F., & Fiel, S. (2018). READ-BAD: A Dataset and Evaluation Scheme for Baseline Detection in Archival Documents. International Conference on Document Analysis and Recognition (ICDAR). https://doi.org/10.1109/ICDAR.2017.307

He, S., & Schomaker, L. (2019). DeepOtsu: Document enhancement and binarization using iterative deep learning. Pattern Recognition, 91, 379–390. https://doi.org/10.1016/j.patcog.2019.01.025

Jo, J., Koo, H. I., Soh, J. W., & Cho, N. I. (2020). Handwritten text segmentation via end-to-end learning of convolutional neural networks. Multimedia Tools and Applications, 79(43), 32137–32150. https://doi.org/10.1007/s11042-020-09624-9

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv Preprint arXiv:1412.6980.

Kohavi, R. (1995). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the International Joint Conference on Artificial Intelligence, 1137–1145.

Likforman-Sulem, L., Zahour, A., & Taconet, B. (2007). Text Line Segmentation of Historical Documents: A Survey. International Journal on Document Analysis and Recognition, 9(2–4), 123–138. https://doi.org/10.1007/s10032-007-0045-8

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully Convolutional Networks for Semantic Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 3431–3440. https://doi.org/10.1109/CVPR.2015.7298965

Lukács, H. I., Beregi, B. Z., Porteleki, B., Fischl, T., & Botzheim, J. (2025). Attention U-Net-based semantic segmentation for welding line detection. Scientific Reports, 15(1), 15276. https://doi.org/10.1038/s41598-025-00257-2

Mechi, O., Mehri, M., Ingold, R., & Amara, N. E. B. (2019). Text line segmentation in historical document images using an adaptive U-Net architecture. 2019 International Conference on Document Analysis and Recognition (ICDAR), 369–374. https://doi.org/10.1109/ICDAR.2019.00066

Milletari, F., Navab, N., & Ahmadi, S.-A. (2016). V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Proceedings of the International Conference on 3D Vision (3DV), 565–571. https://doi.org/10.1109/3DV.2016.79

Neha, F., Bhati, D., Shukla, D. K., Dalvi, S. M., Mantzou, N., & Shubbar, S. (2025). An analytics-driven review of U-Net for medical image segmentation. Healthcare Analytics, 100416. https://doi.org/10.1016/j.health.2025.100416

Oliveira, S. A., Seguin, B., & Kaplan, F. (2018). dhSegment: A generic deep-learning approach for document segmentation. 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), 7–12. https://doi.org/10.1109/ICFHR-2018.2018.00011

Otsu, N. (1979). A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1), 62–66. https://doi.org/10.1109/TSMC.1979.4310076

Prasetyo, A., et al. (2022). DeepLontar: A Dataset for Handwritten Balinese Character Detection and Recognition. Scientific Data. https://doi.org/10.1038/s41597-022-01867-5

Rabaev, I., & Litvak, M. (2025). Recent advances in text line segmentation and baseline detection in historical document images: A systematic review. International Journal on Document Analysis and Recognition. https://doi.org/10.1007/s10032-025-00526-w

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241. https://doi.org/10.1007/978-3-319-24574-4_28

Sauvola, J., & Pietikäinen, M. (2000). Adaptive Document Image Binarization. Pattern Recognition, 33(2), 225–236. https://doi.org/10.1016/S0031-3203(99)00055-2

Tensmeyer, C., & Martinez, T. (2017). Document image binarization with fully convolutional neural networks. 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), 1, 99–104. https://doi.org/10.1109/ICDAR.2017.25

Xiong, W., Zhou, L., Yue, L., Li, L., & Wang, S. (2021). An enhanced binarization framework for degraded historical document images. EURASIP Journal on Image and Video Processing, 2021(1), 13. https://doi.org/10.1186/s13640-021-00556-4

Jurnal Mandiri IT