Soran University Repository

      Language Resources and Evaluation

      Sentiment analysis in low‑resource contexts: BERT’s impact on Central Kurdish

Authors: Muhealddin Awlla, Kozhin; Veisi, Hadi; Abas Abdullah, Abdulhady
Abstract: This paper advances sentiment analysis for the Central Kurdish language by integrating Bidirectional Encoder Representations from Transformers (BERT) into Natural Language Processing techniques. Kurdish is a low-resourced language with a high level of linguistic diversity and minimal computational resources, which makes sentiment analysis challenging. Earlier work relied on traditional word embedding models such as Word2Vec, but the emergence of new language models, specifically BERT, promises improvements. BERT's stronger word embedding capabilities aid this study in capturing the nuanced semantics and contextual intricacies of the Kurdish language, setting a new benchmark for sentiment analysis in low-resource languages. The steps include collecting and normalizing a large corpus of Kurdish texts, pretraining BERT with a dedicated tokenizer for Kurdish, and developing different models for sentiment analysis, including a Bidirectional Long Short-Term Memory (BiLSTM), a Multi-Layer Perceptron (MLP), and a fine-tuned BERT classifier. The proposed approach performs three-class sentiment analysis (positive, negative, and neutral) using BERT sentiment embeddings in four different configurations. The best-performing classifier on this setup, the BiLSTM, reaches 74.09% accuracy; the BERT-with-MLP model reaches a maximum of 73.96%, while the fine-tuned BERT model tops the others with 75.37% accuracy. Additionally, the fine-tuned BERT model improves substantially on the two-class (positive vs. negative) sentiment analysis, with an accuracy of 86.31%. The study makes a comprehensive comparison, highlighting BERT's superiority over traditional embeddings in both accuracy and semantic understanding. These results show that the proposed BERT-based models outperform the conventionally used Word2Vec models by a remarkable accuracy gain in most sentiment analysis tasks.
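The final step the abstract describes (a BERT sentiment embedding fed to a classifier head that predicts one of three labels) can be sketched in pure Python. The embedding vector, weight matrix, and bias below are illustrative stand-ins, not the paper's actual parameters or model; the sketch only shows the shape of the decision: logits per class, softmax, argmax.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # the paper's 3-class setup

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, weights, bias):
    # weights: one row of coefficients per class -- a stand-in for the
    # output layer of an MLP head on top of a pooled BERT embedding.
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

# Toy 4-dimensional "sentence embedding" and hand-picked weights
# (illustrative only; a real BERT embedding has hundreds of dimensions).
emb = [0.9, -0.2, 0.4, 0.1]
W = [[-1.0, 0.5, -0.3, 0.2],   # negative
     [ 0.1, 0.1,  0.1, 0.1],   # neutral
     [ 1.2, -0.4, 0.6, 0.0]]   # positive
b = [0.0, 0.0, 0.0]

label, probs = classify(emb, W, b)
print(label)  # highest-scoring class for this toy embedding
```

In the fine-tuned BERT configuration the abstract mentions, this head and the encoder are trained jointly; in the BiLSTM and MLP configurations, only the head is trained on frozen embeddings.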
      URI: http://192.64.112.23/xmlui/handle/311/91
Subject: Sentiment analysis, Deep learning, BERT, BiLSTM, Central Kurdish language
Collections:
      • Article
      • Download: (86.37Kb)

Contributor author: Muhealddin Awlla, Kozhin
Contributor author: Veisi, Hadi
Contributor author: Abas Abdullah, Abdulhady
Date accessioned: 2025-02-21T19:12:05Z
Date available: 2025-02-21T19:12:05Z
Date issued: 2025
Identifier URI: http://192.64.112.23/xmlui/handle/311/91
Language (ISO): en_US
Publisher: Language Resources and Evaluation
Subject: Sentiment analysis
Subject: Deep learning
Subject: BERT
Subject: BiLSTM
Subject: Central Kurdish language
Title: Sentiment analysis in low‑resource contexts: BERT’s impact on Central Kurdish
Type: Article
      Digital Repository Software, Supported by Negasht Company

      71 Title Indexed

       