Optimization of a FastText-Based BiLSTM Model with IndoBERT Semantic Data Augmentation for Indonesian Text Classification
Keywords:
Automatic Essay Scoring, BiLSTM, FastText, IndoBERT, Data Augmentation
Abstract
Cognitive assessment through short-answer essays requires consistent and objective scoring; however, manual evaluation is time-consuming and prone to inter-rater variability. Automatic Essay Scoring (AES) has emerged as a promising way to automate the assessment process. This study proposes an optimized Bidirectional Long Short-Term Memory (BiLSTM) model combined with FastText embeddings for Indonesian text classification, trained on semantically augmented data generated with IndoBERT. The training dataset was obtained by applying the EDA_Synonym_IndoBERT augmentation technique to the UKARA dataset, while the validation and testing datasets consisted of original, non-augmented responses. The model was optimized by integrating Global Max Pooling to enhance feature representation and class weighting to mitigate class imbalance. Experimental results show that the proposed model achieved an accuracy of 93.49% on the validation set and 78.00% on the independent test set. This gap between validation and test performance indicates that, although semantic augmentation increases the diversity of the training data, generalization to previously unseen data remains a challenge. Furthermore, class weighting improved the model's ability to recognize minority-class instances, achieving a recall of 92%. These findings demonstrate that architectural optimization and training strategies play a crucial role in improving the performance of Automatic Essay Scoring systems for the Indonesian language.
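The two optimization steps named in the abstract, class weighting and Global Max Pooling, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which weighting formula was used, so the common "balanced" heuristic (n_samples / (n_classes * class_count)) is assumed here, and the function names, labels, and hidden-state values are hypothetical toy inputs.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed 'balanced' heuristic: n_samples / (n_classes * class_count).

    Rarer classes receive larger weights, so their errors count more
    in a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def global_max_pool(sequence):
    """Collapse a (timesteps x features) sequence into one feature vector
    by taking the maximum of each feature across all timesteps."""
    return [max(column) for column in zip(*sequence)]


# Hypothetical toy labels: class 0 is the minority class.
labels = [1, 1, 1, 1, 1, 1, 0, 0]
weights = balanced_class_weights(labels)

# Hypothetical BiLSTM outputs: 3 timesteps, 3 features each.
hidden_states = [
    [0.1, -0.5, 0.3],
    [0.7, 0.2, -0.1],
    [0.4, 0.9, 0.0],
]
pooled = global_max_pool(hidden_states)

print(weights)
print(pooled)
```

In this toy case the minority class receives a weight three times that of the majority class, and the pooled vector keeps the strongest activation of each feature regardless of where in the answer it occurred.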
License
Copyright (c) 2026 Nur Fadilah, Bayu Anugerah Putra, Muh. Isbar Pratama

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.