TY - JOUR
T1 - CDSMOTE: class decomposition and synthetic minority class oversampling technique for imbalanced-data classification
AU - Elyan, Eyad
AU - Moreno-Garcia, Carlos Francisco
AU - Jayne, Chrisina
N1 - Publisher Copyright: © 2020, The Author(s). Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020/7/18
Y1 - 2020/7/18
AB - Class-imbalanced datasets are common across domains such as health, banking, and security. The dominance of majority class instances (the negative class) often results in biased learning models, so classifying such datasets requires methods to combat the problem. In this paper, we propose a new hybrid approach that reduces the dominance of the majority class using class decomposition and increases the number of minority class instances using an oversampling method. Unlike undersampling methods, which suffer data loss, our method preserves the majority class instances yet significantly reduces their dominance, resulting in a more balanced dataset and hence improved results. A large-scale experiment using 60 public datasets was carried out to validate the proposed method. Across three standard evaluation metrics, the results are comparable or superior to those of other common and state-of-the-art techniques.
UR - http://www.scopus.com/inward/record.url?scp=85088127619&partnerID=8YFLogxK
U2 - 10.1007/s00521-020-05130-z
DO - 10.1007/s00521-020-05130-z
M3 - Article
AN - SCOPUS:85088127619
SN - 0941-0643
JO - Neural Computing and Applications
JF - Neural Computing and Applications
ER -