Abstract
Significant progress has been made in text classification and
natural language processing. However, like datasets in many other
domains, text-based datasets may suffer from class imbalance, which
biases models toward majority-class instances. In this paper, we
present a new approach to handling class imbalance in text data by
means of unsupervised learning algorithms. We apply class
decomposition using two unsupervised methods, k-means and
Density-Based Spatial Clustering of Applications with Noise (DBSCAN),
to two sentiment analysis datasets. The experimental results show
that using clustering to find within-class similarities can
significantly improve the performance of learning algorithms and
reduce the dominance of majority-class instances without causing
information loss.
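
As a rough illustration of class decomposition, the sketch below clusters the majority class with a plain k-means and relabels each cluster as its own sub-class, so that no instance is discarded. It is only a minimal sketch under stated assumptions: the toy 2-D vectors, the label names, the choice of k = 2, and the helper names `kmeans` and `decompose_class` are all hypothetical and are not taken from the paper, whose features, parameters, and classifiers are not described in this record.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_assign == assign:
            break  # assignments are stable, so the centers are too
        assign = new_assign
        # Update step: move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def decompose_class(X, y, target, k, seed=0):
    """Split class `target` into up to k sub-classes found by k-means.

    Returns the new label list and a dict mapping every new label back
    to its original class, so predictions can be merged afterwards.
    """
    idx = [i for i, label in enumerate(y) if label == target]
    clusters = kmeans([X[i] for i in idx], k, seed=seed)
    new_y = list(y)
    for i, c in zip(idx, clusters):
        new_y[i] = f"{target}_c{c}"
    mapping = {label: label for label in set(y) if label != target}
    mapping.update({new_y[i]: target for i in idx})
    return new_y, mapping


# Hypothetical toy data: six majority ("pos") and two minority ("neg") vectors.
X = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
     (5.0, 5.1), (5.1, 5.0), (5.2, 5.1),
     (9.0, 0.0), (9.1, 0.2)]
y = ["pos"] * 6 + ["neg"] * 2

new_y, mapping = decompose_class(X, y, target="pos", k=2)
# Map sub-class labels (or a classifier's predictions on them) back.
restored = [mapping[label] for label in new_y]
```

A classifier would then be trained on `new_y` instead of `y`, and its predictions mapped back through `mapping`. Every original instance is kept, which is the sense in which decomposition avoids the information loss of undersampling.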
Original language | English |
---|---|
Publication status | Accepted/In press - 10 Apr 2021 |
Event | International Joint Conference on Neural Networks 2021 - Virtual. Duration: 18 Jul 2021 → 22 Jul 2021. http://ijcnn.org |
Conference
Conference | International Joint Conference on Neural Networks 2021 |
---|---|
Abbreviated title | IJCNN 2021 |
Period | 18/07/21 → 22/07/21 |
Internet address | http://ijcnn.org |