Optimization of a FastText-Based BiLSTM Model on IndoBERT Semantically Augmented Data for Indonesian Text Classification

Nur Fadilah; Bayu Anugerah Putra; Muh. Isbar Pratama

doi:10.61255/pisces.v4i1.1249

Authors

Nur Fadilah Universitas Negeri Makassar https://orcid.org/0009-0006-9450-6046
Bayu Anugerah Putra Universitas Muhammadiyah Riau
Muh. Isbar Pratama Universitas Negeri Makassar

Keywords:

Automatic Essay Scoring, BiLSTM, FastText, IndoBERT, Data Augmentation

Abstract

Cognitive assessment through short-answer essays requires consistent and objective scoring, but manual evaluation is slow and subject to inter-rater variability. Automatic Essay Scoring (AES) has emerged as a promising way to automate this process. This study proposes an optimized Bidirectional Long Short-Term Memory (BiLSTM) model with FastText embeddings for Indonesian text classification, trained on semantically augmented data generated by IndoBERT. The training set was produced by applying the EDA_Synonym_IndoBERT augmentation technique to the UKARA dataset, while the validation and test sets consisted of original, non-augmented responses. The model was optimized by adding Global Max Pooling to enhance feature representation and class weighting to mitigate class imbalance. The proposed model achieved an accuracy of 93.49% on the validation set and 78.00% on the independent test set. This gap indicates that, although semantic augmentation increases the diversity of the training data, generalization to previously unseen data remains a challenge. Class weighting also improved the model's ability to recognize minority-class instances, achieving a recall of 92%. These findings show that architectural optimization and training strategy play a crucial role in improving the performance of Automatic Essay Scoring systems for the Indonesian language.
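
The two optimizations named in the abstract, class weighting and Global Max Pooling, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which weighting formula was used, so the common "balanced" heuristic (n_samples / (n_classes * class_count)) is assumed here, and the label list and hidden-state values are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumed heuristic: rarer classes receive larger weights so that
    their errors contribute more to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def global_max_pool(sequence):
    """Collapse a (timesteps x features) sequence into one feature vector
    by taking the maximum of each feature across all timesteps."""
    return [max(feature) for feature in zip(*sequence)]


# Hypothetical imbalanced label set: 8 responses of class 1, 2 of class 0.
labels = [1] * 8 + [0] * 2
weights = balanced_class_weights(labels)

# Hypothetical BiLSTM outputs: 3 timesteps, 3 features per timestep.
hidden_states = [
    [0.1, -0.4, 0.3],
    [0.7, 0.2, -0.1],
    [0.0, 0.5, 0.2],
]
pooled = global_max_pool(hidden_states)

print(weights)  # {1: 0.625, 0: 2.5}
print(pooled)   # [0.7, 0.5, 0.3]
```

In this toy example the minority class receives four times the weight of the majority class, and pooling keeps the strongest activation of each feature regardless of where it occurs in the response.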
