Soran University Repository

      Language Resources and Evaluation

      Sentiment analysis in low‑resource contexts: BERT’s impact on Central Kurdish

Authors: Muhealddin Awlla, Kozhin; Veisi, Hadi; Abas Abdullah, Abdulhady
Abstract: This paper advances sentiment analysis for the Central Kurdish language by integrating Bidirectional Encoder Representations from Transformers (BERT) into Natural Language Processing techniques. Kurdish is a low-resourced language with a high level of linguistic diversity and minimal computational resources, which makes sentiment analysis challenging. Earlier work relied on traditional word embedding models such as Word2Vec, but the emergence of new language models, specifically BERT, promises improvements. BERT's stronger word embedding capabilities aid this study in capturing the nuanced semantics and contextual intricacies of the Kurdish language, setting a new benchmark for sentiment analysis in low-resource languages. The steps include collecting and normalizing a large corpus of Kurdish texts, pretraining BERT with a dedicated tokenizer for Kurdish, and developing different models for sentiment analysis, including a Bidirectional Long Short-Term Memory (BiLSTM), a Multi-Layer Perceptron (MLP), and a fine-tuned BERT classifier. The proposed approach performs three-class sentiment analysis (positive, negative, and neutral) using BERT sentiment embeddings in four different configurations. The best-performing classifier on this setup, the BiLSTM, reaches 74.09% accuracy; the BERT-with-MLP model reaches a maximum of 73.96%, while the fine-tuned BERT model tops the others with 75.37% accuracy. Additionally, the fine-tuned BERT model improves substantially on the two-class (positive vs. negative) sentiment analysis, with an accuracy of 86.31%. The study makes a comprehensive comparison, highlighting BERT's superiority over traditional embeddings in both accuracy and semantic understanding. These results show that the proposed BERT-based models outperform the conventionally used Word2Vec models by a remarkable accuracy gain in most sentiment analysis tasks.
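The final step the abstract describes (a BERT sentiment embedding fed to a classifier head that predicts one of three labels) can be sketched in pure Python. The embedding vector, weight matrix, and bias below are illustrative stand-ins, not the paper's actual parameters or model; the sketch only shows the shape of the decision: logits per class, softmax, argmax.

```python
import math

LABELS = ["negative", "neutral", "positive"]  # the paper's 3-class setup

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(embedding, weights, bias):
    # weights: one row of coefficients per class -- a stand-in for the
    # output layer of an MLP head on top of a pooled BERT embedding.
    logits = [sum(w * x for w, x in zip(row, embedding)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

# Toy 4-dimensional "sentence embedding" and hand-picked weights
# (illustrative only; a real BERT embedding has hundreds of dimensions).
emb = [0.9, -0.2, 0.4, 0.1]
W = [[-1.0, 0.5, -0.3, 0.2],   # negative
     [ 0.1, 0.1,  0.1, 0.1],   # neutral
     [ 1.2, -0.4, 0.6, 0.0]]   # positive
b = [0.0, 0.0, 0.0]

label, probs = classify(emb, W, b)
print(label)  # highest-scoring class for this toy embedding
```

In the fine-tuned BERT configuration the abstract mentions, this head and the encoder are trained jointly; in the BiLSTM and MLP configurations, only the head is trained on frozen embeddings.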
      URI: http://192.64.112.23/xmlui/handle/311/91
Subject: Sentiment analysis, Deep learning, BERT, BiLSTM, Central Kurdish language
Collections:
      • Article
      • Download: (86.37Kb)

Contributor author: Muhealddin Awlla, Kozhin
Contributor author: Veisi, Hadi
Contributor author: Abas Abdullah, Abdulhady
Date accessioned: 2025-02-21T19:12:05Z
Date available: 2025-02-21T19:12:05Z
Date issued: 2025
Identifier URI: http://192.64.112.23/xmlui/handle/311/91
Language (ISO): en_US
Publisher: Language Resources and Evaluation
Subject: Sentiment analysis
Subject: Deep learning
Subject: BERT
Subject: BiLSTM
Subject: Central Kurdish language
Title: Sentiment analysis in low‑resource contexts: BERT’s impact on Central Kurdish
Type: Article
      Digital Repository Software, Supported by Negasht Company

      71 Title Indexed

       