Title: Applying Distributional Semantics to Enhance Classifying Emotions in Arabic Tweets

Speaker: Shahad Alharbi, MSc

Abstract: In recent years, user-generated content has emerged through online social media, which has affected the way people express their emotions. This has increased researchers’ interest in classifying and analysing the sentiment and emotions expressed in social media content in order to identify people’s opinions and reviews.

Most of these studies have analysed sentiment and emotions in English texts. The few studies conducted on Arabic content have focused on classifying sentiment as positive or negative rather than on distinct emotion classes. We have therefore focused on analysing six emotion classes in Arabic content, specifically Arabic tweets, whose unstructured nature makes the task more challenging than analysing the formal, structured content found in Arabic journals and books.

In addition, recent developments in distributional semantic models encouraged us to test the effect of distributional measures on the classification process, which had not been investigated in any other classification-related study of Arabic texts. The resulting model improved the average accuracy to more than 86% compared with other sentiment and emotion studies on classifying Arabic texts. It does so through a semi-supervised approach that we developed, which exploits contextual and co-occurrence information from a large unlabelled dataset instead of enlarging the labelled dataset. Enlarging the labelled dataset is the standard way to improve the performance of supervised methods, but it can be impractical because of the time and cost involved.

Among the other notable results, the model recorded a high average accuracy of 85.30% after the labels were removed from the contextual information that was used in the labelled dataset during classification. Moreover, because of the unstructured nature of Twitter content, we identified a general set of pre-processing techniques for Arabic texts, which increased the accuracy across the six emotion classes to 85.95% when employing the contextual information from the unlabelled dataset.

MSc Thesis Supervisor: Dr Mathew Purver

Date/Time: December 2, 2014 at 12-1pm

Location: Auditorium 6F49 in CCIS and broadcast to CCIS Building 31 in 2079