Arabic Spam Detection in Twitter

Nora Al-Twairesh, Mawaheb Al-Tuwaijri, Afnan Al-Moammar and Sarah Al-Humoud. 2016

Spam on Twitter has emerged from the proliferation of this social network among users worldwide, coupled with the ease of creating content. Because Twitter spam has different characteristics from Web or email spam, detecting it has become a new research problem. This study aims to analyse the content of Saudi tweets to detect spam by developing two approaches: a rule-based approach that exploits a spam lexicon extracted from the tweets, and a supervised learning approach that uses statistical methods based on the bag-of-words model and several features. The focus is on spam in trending hashtags in the Saudi Twittersphere, since most of the spam in Saudi tweets is found in hashtags. The features used were identified through empirical analysis and then applied in the classification approaches developed. Both approaches showed comparable results on the reported performance measures, reaching an average F-measure of 85% for the rule-based approach and 91.6% for the supervised learning approach.
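To make the rule-based idea concrete, the following is a minimal sketch of lexicon matching followed by an F-measure calculation. It is not the authors' implementation: the lexicon entries, the `min_hits` threshold, the function names, and the sample tweets are all invented placeholders (English tokens are used for readability), and the F-measure here is computed for the spam class only, whereas the abstract reports an average.

```python
# Illustrative sketch only: the lexicon entries, threshold, and sample tweets
# below are invented placeholders, not data or rules from the paper.

SPAM_LEXICON = {"discount", "followers", "whatsapp", "offer"}  # hypothetical


def tokenize(tweet):
    """Lowercase, split on whitespace, and strip common punctuation."""
    return [tok.strip(".,!?:;\"'") for tok in tweet.lower().split()]


def is_spam(tweet, lexicon=SPAM_LEXICON, min_hits=2):
    """Flag a tweet as spam when it contains at least min_hits lexicon terms."""
    hits = sum(1 for tok in tokenize(tweet) if tok in lexicon)
    return hits >= min_hits


def f_measure(gold, predicted):
    """Compute the F1 score for the positive (spam) class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labelled tweets: (text, is_spam_label)
tweets = [
    ("Get more followers now, special offer!", True),
    ("Discount offer, contact us on WhatsApp", True),
    ("Watching the match tonight #trending", False),
    ("Great offer at the bookstore today", False),
    ("Buy followers cheap", True),
]
gold = [label for _, label in tweets]
predicted = [is_spam(text) for text, _ in tweets]
score = f_measure(gold, predicted)
print(predicted, round(score, 3))
```

The last sample tweet contains only one lexicon term and is therefore missed, which illustrates how the hit threshold trades recall for precision in a rule of this kind.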

Conference location

Portorož (Slovenia)

Conference name

The 2nd Workshop on Arabic Corpora and Processing Tools

Funding organization

European Language Resources Association (ELRA)

More publications

The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets

The field of natural language processing (NLP) has witnessed a boom in language representation models with the introduction of pretrained language models that are trained on massive textual data…

Building an Arabic Flight Booking Dialogue System Using a Hybrid Rule-Based and Data Driven Approach

Approaches for developing Dialogue Systems (DSs) are typically categorized into rule-based and data-driven. Data-driven DSs require a massive quantity of training data, while rule-based DSs rely…

2021

Part-of-speech tagging for Arabic tweets using CRF and Bi-LSTM

Over the past few years, Twitter has experienced massive growth and the volume of its online content has increased rapidly. This content has been a rich source for several studies that focused on…

2021

Nora S. AlTwairesh