Efficient method for feature selection in text classification

Jian Sun, Xiang Zhang, Dan Liao, Victor Chang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

In Chinese text classification, word segmentation leaves each article with a large number of feature words, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. In theory, a large number of feature words characterizes a document better, but most of those features contribute little to classification. It is therefore necessary to select the feature words that carry classification value and so reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, based on the shortcomings of the traditional chi-square test, presents an improved chi-square method that incorporates word frequency and interclass concentration. Experiments show that the method improves on the traditional chi-square test.
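For orientation, the traditional chi-square score that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard textbook statistic for one term and one class, not the authors' improved method: the frequency and interclass-concentration adjustments are not specified in this record and are not shown. The function names `chi_square` and `select_features` and the example counts are hypothetical.

```python
from typing import Dict, List, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(counts: Dict[str, Tuple[int, int, int, int]], k: int) -> List[str]:
    """Return the k terms with the highest chi-square score (hypothetical helper)."""
    scores = {term: chi_square(*table) for term, table in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Illustrative document counts only, not data from the paper.
example_counts = {
    "football": (10, 0, 0, 10),  # appears only in the class
    "the": (5, 5, 5, 5),         # spread evenly, so uninformative
}
top_terms = select_features(example_counts, k=1)
```

Note that this score counts only whether a term occurs in a document, not how often, which is one commonly cited limitation of the plain chi-square test.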

Original language: English
Title of host publication: Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538619490
DOIs
Publication status: Published - 8 Mar 2018
Event: 2017 International Conference on Engineering and Technology - UniversityAntalya, Antalya, Turkey
Duration: 21 Aug 2017 - 23 Aug 2017

Publication series

Name: Proceedings of 2017 International Conference on Engineering and Technology, ICET 2017
Volume: 2018-January

Conference

Conference: 2017 International Conference on Engineering and Technology
Abbreviated title: ICET 2017
Country: Turkey
City: Antalya
Period: 21/08/17 - 23/08/17
