Mapping thematic patterns in Indonesian novels through concept mining and computational linguistics
DOI:
https://doi.org/10.64595/lingtech.v2i1.128
Keywords:
computational linguistics, concept mining, digital humanities, Indonesian novels, thematic analysis
Abstract
Background: The expansion of Indonesian novels across historical periods has produced complex and overlapping thematic formations that remain difficult to map systematically using conventional close-reading approaches.
Objective: This study aims to identify dominant themes, trace their temporal shifts, and examine conceptual overlap among Indonesian novels through a computational framework.
Method: Employing a digital humanities approach, the study analyzes a corpus of 30 Indonesian novels (1920-2022) using concept mining, CF-IDF weighting, semantic similarity measurement, and network analysis.
Results: The findings reveal dominant thematic clusters centered on social inequality, nationalism, gender, religion, and modernization; clear temporal shifts in thematic emphasis across literary periods; and dense conceptual overlap, with social inequality functioning as a central thematic hub. Theme–theme projection and betweenness centrality analysis further demonstrate that thematic meaning emerges through relational structures rather than isolated categories.
Implication: These results strengthen empirical literary analysis by integrating computational rigor with interpretive criticism.
Novelty: This study introduces a replicable, network-based thematic mapping model for Indonesian novels, advancing computational literary studies in the Indonesian context.
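Illustrative sketch: The pipeline named in the Method and Results paragraphs (CF-IDF weighting, similarity measurement, theme–theme projection, betweenness centrality) can be approximated on toy data as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The novel names, concept lists, and concept-to-theme mapping are hypothetical. CF-IDF is assumed to be relative concept frequency multiplied by log(N / document frequency), cosine similarity stands in for the unspecified similarity measure, and betweenness is computed with Brandes' algorithm on an unweighted projection.

```python
import math
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy input: concepts already extracted per novel
# (stand-in data, not the study's 30-novel corpus).
NOVELS = {
    "novel_a": ["poverty", "poverty", "homeland", "class"],
    "novel_b": ["class", "marriage", "poverty"],
    "novel_c": ["poverty", "prayer", "prayer"],
    "novel_d": ["prayer", "city", "factory"],
}

# Hypothetical mapping from mined concepts to the themes named in the abstract.
THEME_OF = {
    "poverty": "social inequality", "class": "social inequality",
    "homeland": "nationalism", "marriage": "gender",
    "prayer": "religion", "city": "modernization", "factory": "modernization",
}

def cf_idf(novels):
    """Assumed CF-IDF: relative concept frequency * log(N / document frequency)."""
    n = len(novels)
    df = Counter(c for concepts in novels.values() for c in set(concepts))
    weights = {}
    for name, concepts in novels.items():
        counts = Counter(concepts)
        weights[name] = {c: (k / len(concepts)) * math.log(n / df[c])
                         for c, k in counts.items()}
    return weights

def cosine(u, v):
    """Cosine similarity between two sparse concept-weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def project_themes(novels, theme_of):
    """Theme-theme projection: link two themes that co-occur in any novel."""
    adj = {t: set() for t in set(theme_of.values())}
    for concepts in novels.values():
        themes = sorted({theme_of[c] for c in concepts if c in theme_of})
        for a, b in combinations(themes, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes) on an undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:  # breadth-first search counting shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate path dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: b / 2 for v, b in score.items()}

weights = cf_idf(NOVELS)
theme_graph = project_themes(NOVELS, THEME_OF)
centrality = betweenness(theme_graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])
```

On this toy input the hub is "social inequality" only because the hypothetical data were built that way; the sketch shows the mechanics and does not reproduce the study's findings.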
License
Copyright (c) 2026 Kun Andyan Anindita, Susan Hockey, Tomi Wahyu Septarianto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.