TY - JOUR
T1 - Evaluating spam filters and stylometric detection of AI-generated phishing emails
AU - Opara, Chidimma
AU - Modesti, Paolo
AU - Golightly, Lewis
PY - 2025/3/15
Y1 - 2025/3/15
N2 - The advanced architecture of Large Language Models (LLMs) has revolutionised natural language processing, enabling the creation of text that convincingly mimics legitimate human communication, including phishing emails. As AI-generated phishing emails become increasingly sophisticated, a critical question arises: How effectively can current email systems and detection mechanisms identify these threats? This study addresses this issue by analysing 63 AI-generated phishing emails created using GPT-4o. It evaluates the effectiveness of major email services, Gmail, Outlook, and Yahoo, in filtering these malicious communications. The findings reveal that Gmail and Outlook allowed more AI-generated phishing emails to bypass their filters compared to Yahoo, highlighting vulnerabilities in existing email filtering systems. To mitigate these challenges, we applied 60 stylometric features across four machine learning models: Logistic Regression, Support Vector Machine, Random Forest, and XGBoost. Among these, XGBoost demonstrated superior performance, achieving 96% accuracy and an AUC score of 99%. Key features such as imperative verb count, clause density, and first-person pronoun usage were instrumental to the model’s success. The dataset of AI-generated phishing emails is publicly available on Kaggle to foster further research.
UR - https://doi.org/10.1016/j.eswa.2025.127044
DO - 10.1016/j.eswa.2025.127044
M3 - Article
SN - 0957-4174
VL - 276
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 127044
ER -