Cross-lingual sentiment analysis of official EU Slavic languages

Doctoral thesis

Category: Theses and dissertations
Type: Doctoral thesis
Year: 2022
Status: Defended

Abstract

In this dissertation, we develop deep learning models based on transformer neural networks
for sentiment analysis in low-resource languages. Sentiment analysis is formally defined as
the task of determining the polar orientation (polarity) of a given text; to perform it for
low-resource languages, we make use of the resources available in high-resource languages.
We develop our models and conduct experiments on a set of low-resource and high-resource
South Slavic languages.
The dissertation is divided into three sections.
1) In the first section, we use a probing mechanism to select a suitable pre-trained model
from publicly available resources. We develop a simple scoring technique that correlates
probing scores with sentiment analysis performance. To test our hypothesis that a model is
suitable for cross-lingual sentiment transfer, we compute these scores before and after
fine-tuning.
2) In the second section, we conduct numerous experiments using Slavic and non-Slavic
language datasets. We also examine the effect of the Cyrillic and Latin scripts on the
transfer of sentiment. We combine datasets from multiple languages and determine the best
strategy for combining them. We also propose a multi-task learning framework for
cross-lingual sentiment analysis.
3) In the third section, we examine the effect of data augmentation on low-resource
sentiment analysis tasks. We conduct experiments with existing data augmentation methods
and propose two novel methods. Our proposed methods do not rely on external supervision
or resources. Our analysis of the results shows that transformer-based fine-tuning schemes
do not benefit from augmented data, because these schemes are invariant to the augmented
instances.
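
The abstract does not specify the scoring technique used in the first section. As a minimal sketch, assuming that each candidate pre-trained model has one probing score and one downstream sentiment score (for example macro-F1), the agreement between the two can be measured with Spearman rank correlation, and the candidate with the highest probing score can then be selected. The function names (`average_ranks`, `pearson`, `spearman`, `select_model`) and the choice of Spearman correlation are illustrative assumptions, not the method of the thesis.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of a run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def select_model(probe_scores: Dict[str, float]) -> str:
    """Hypothetical selection rule: pick the candidate with the highest probing score."""
    return max(probe_scores, key=lambda name: probe_scores[name])
```

A value of `spearman(probe, f1)` close to 1 would mean that the probing scores rank the candidate models in the same order as their sentiment performance, which is the property a probe-based model selection relies on.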
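
The abstract does not name the existing augmentation methods that were evaluated, nor does it describe the two proposed ones. For illustration only, the following sketch implements two simple operations popularized by EDA (Easy Data Augmentation): random swap and random deletion of tokens. It assumes whitespace tokenization and that every augmented copy keeps the label of its source sentence; `augment` and its parameters are hypothetical names.

```python
import random
from typing import List, Tuple


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the tokens at two random positions, n_swaps times (EDA-style random swap)."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    if not kept:
        # Never return an empty sentence: fall back to one random original token.
        kept = [rng.choice(tokens)]
    return kept


def augment(sentence: str, label: str, n_aug: int, rng: random.Random,
            p_delete: float = 0.1) -> List[Tuple[str, str]]:
    """Create n_aug label-preserving variants, alternating swap and deletion.

    Assumption: whitespace tokenization, and every variant inherits `label`.
    """
    tokens = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps=1, rng=rng)
        else:
            new_tokens = random_deletion(tokens, p=p_delete, rng=rng)
        variants.append((" ".join(new_tokens), label))
    return variants
```

Neither operation needs external resources such as lexicons or translation systems; each variant is only a lightly perturbed copy of an existing training sentence.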

Keywords

Sentiment analysis, Low-resource, Multilingual, South Slavic, Cross-lingual, Sentiment classification, Data augmentation, Multi-task learning, Probing, Negation, Semantic, Cross-family, Multi-source