A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift

Niddal Imam, Biju Issac, Seibu Mary Jacob

Research output: Contribution to journal › Article › Research › peer-review

Abstract

Twitter has changed the way people get information by allowing them to express opinions and comment on daily tweets. Unfortunately, its popularity has made it very attractive to spammers. Unlike other types of spam, Twitter spam has become a serious issue in the last few years. The large number of users and the high volume of information shared on Twitter accelerate the spread of spam. To protect users, Twitter and the research community have been developing spam detection systems based on various machine-learning techniques. However, a recent study showed that current machine-learning-based detection systems cannot detect spam accurately because the characteristics of spam tweets vary over time. This issue is called 'Twitter Spam Drift'. This paper proposes a semi-supervised learning approach (SSLA) to tackle it. The new approach uses unlabeled data to learn the structure of the domain. Experiments on English and Arabic datasets show that the proposed SSLA can reduce the effect of Twitter Spam Drift and outperform existing techniques.
Original language: English
Number of pages: 18
Journal: International Journal of Computational Intelligence and Applications
Volume: 18
Issue number: 2
Publication status: Published - 30 Jun 2019

Cite this

@article{4812a6fb66434a8b96cde5386c529eca,
title = "A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift",
abstract = "Twitter has changed the way people get information by allowing them to express their opinion and comments on the daily tweets. Unfortunately, due to the high popularity of Twitter, it has become very attractive to spammers. Unlike other types of spam, Twitter spam has become a serious issue in the last few years. The large number of users and the high amount of information being shared on Twitter play an important role in accelerating the spread of spam. In order to protect the users, Twitter and the research community have been developing different spam detection systems by applying different machine-learning techniques. However, a recent study showed that the current machine learning-based detection systems are not able to detect spam accurately because spam tweet characteristics vary over time. This issue is called 'Twitter Spam Drift'. In this paper, a semi-supervised learning approach (SSLA) has been proposed to tackle this. The new approach uses the unlabeled data to learn the structure of the domain. Different experiments were performed on English and Arabic datasets to test and evaluate the proposed approach and the results show that the proposed SSLA can reduce the effect of Twitter Spam Drift and outperform the existing techniques.",
author = "Imam, Niddal and Issac, Biju and Jacob, Seibu Mary",
year = "2019",
month = "6",
day = "30",
language = "English",
volume = "18",
journal = "International Journal of Computational Intelligence and Applications",
issn = "1757-5885",
publisher = "World Scientific Publishing Co.",
number = "2",
}

A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift. / Imam, Niddal; Issac, Biju; Jacob, Seibu Mary.

In: International Journal of Computational Intelligence and Applications, Vol. 18, No. 2, 30.06.2019.

TY - JOUR

T1 - A Semi-Supervised Learning Approach for Tackling Twitter Spam Drift

AU - Imam, Niddal

AU - Issac, Biju

AU - Jacob, Seibu Mary

PY - 2019/6/30

Y1 - 2019/6/30

N2 - Twitter has changed the way people get information by allowing them to express their opinion and comments on the daily tweets. Unfortunately, due to the high popularity of Twitter, it has become very attractive to spammers. Unlike other types of spam, Twitter spam has become a serious issue in the last few years. The large number of users and the high amount of information being shared on Twitter play an important role in accelerating the spread of spam. In order to protect the users, Twitter and the research community have been developing different spam detection systems by applying different machine-learning techniques. However, a recent study showed that the current machine learning-based detection systems are not able to detect spam accurately because spam tweet characteristics vary over time. This issue is called 'Twitter Spam Drift'. In this paper, a semi-supervised learning approach (SSLA) has been proposed to tackle this. The new approach uses the unlabeled data to learn the structure of the domain. Different experiments were performed on English and Arabic datasets to test and evaluate the proposed approach and the results show that the proposed SSLA can reduce the effect of Twitter Spam Drift and outperform the existing techniques.

M3 - Article

VL - 18

JO - International Journal of Computational Intelligence and Applications

JF - International Journal of Computational Intelligence and Applications

SN - 1757-5885

IS - 2

ER -