TY - GEN
T1 - Efficient method for feature selection in text classification
AU - Sun, Jian
AU - Zhang, Xiang
AU - Liao, Dan
AU - Chang, Victor
PY - 2018/3/8
Y1 - 2018/3/8
N2 - In text classification, Chinese word segmentation yields a large number of feature words per article, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. Although in theory a large number of feature words can better characterize a document, most of these features contribute little to classification. It is therefore necessary to select the words that are useful for classification in order to reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, based on the shortcomings of the traditional chi-square test, presents an improved chi-square test method that incorporates frequency and interclass concentration. Experiments show that the method compares favorably with the traditional chi-square test method.
AB - In text classification, Chinese word segmentation yields a large number of feature words per article, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. Although in theory a large number of feature words can better characterize a document, most of these features contribute little to classification. It is therefore necessary to select the words that are useful for classification in order to reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, based on the shortcomings of the traditional chi-square test, presents an improved chi-square test method that incorporates frequency and interclass concentration. Experiments show that the method compares favorably with the traditional chi-square test method.
UR - http://www.scopus.com/inward/record.url?scp=85047877564&partnerID=8YFLogxK
U2 - 10.1109/ICEngTechnol.2017.8308201
DO - 10.1109/ICEngTechnol.2017.8308201
M3 - Conference contribution
AN - SCOPUS:85047877564
T3 - Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
SP - 1
EP - 6
BT - Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 International Conference on Engineering and Technology
Y2 - 21 August 2017 through 23 August 2017
ER -