Comparisons and Selections of Features and Classifiers for Short Text Classification

abstract

Published under licence by IOP Publishing Ltd. Short text differs considerably from traditional long text documents because of its brevity and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps: preprocessing, feature selection and classifier comparison. In this paper, we illustrate step by step how we approach our goals. Specifically, for feature selection we compared the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec and paragraph2vec. For classification, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. We then compared and analysed the classifiers horizontally against each other and vertically across the feature selection methods. For the dataset, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them under two classes, the big and the small. There are eight labels in the big class and 59 labels in the small class.
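To make the pipeline in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It covers only two of the four feature methods (one-hot encoding and tf-idf weighting) and only one of the five classifiers (K-nearest Neighbor, here with k=1 and cosine similarity, which is an assumed choice). The training texts, the labels `dividend` and `meeting`, the smoothed idf formula, and all function names are hypothetical and chosen for illustration; they are not taken from the paper or its dataset.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper's real data is not reproduced here.
TRAIN = [
    ("board approves annual cash dividend", "dividend"),
    ("notice of dividend payment date", "dividend"),
    ("notice of annual shareholder meeting", "meeting"),
    ("shareholder meeting resolution announced", "meeting"),
]


def tokenize(text):
    """Preprocessing step: lowercase and split on whitespace."""
    return text.lower().split()


def build_vocab_and_idf(docs):
    """Return a sorted vocabulary and a smoothed idf weight per term."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    vocab = sorted(doc_freq)
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1 keeps every weight positive.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def one_hot(tokens, vocab):
    """Binary vector: 1.0 if the term occurs in the text, else 0.0."""
    present = set(tokens)
    return [1.0 if term in present else 0.0 for term in vocab]


def tf_idf(tokens, vocab, idf):
    """Term frequency (count / text length) scaled by idf."""
    counts = Counter(tokens)
    length = len(tokens) or 1
    return [counts[term] / length * idf[term] for term in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=1):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


VOCAB, IDF = build_vocab_and_idf([tokenize(text) for text, _ in TRAIN])
FEATURES = {
    "one-hot": lambda tokens: one_hot(tokens, VOCAB),
    "tf-idf": lambda tokens: tf_idf(tokens, VOCAB, IDF),
}


def classify(text, feature_name):
    """Classify one short text with the chosen feature method."""
    featurize = FEATURES[feature_name]
    train_vecs = [featurize(tokenize(t)) for t, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return knn_predict(featurize(tokenize(text)), train_vecs, labels)


if __name__ == "__main__":
    for name in FEATURES:
        for text in ("cash dividend payment announced", "shareholder meeting notice"):
            print(name, "|", text, "->", classify(text, name))
```

Swapping the entry in `FEATURES` while holding the classifier fixed mirrors the "vertical" comparison the abstract describes; swapping the classifier while holding the features fixed would mirror the "horizontal" one.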

authors

Lu, Mi

published proceedings

2017 International Conference on Artificial Intelligence Applications and Technologies (AIAAT 2017)

author list (cited authors)

Wang, Y., Zhou, Z., Jin, S., Liu, D., & Lu, M.

citation count

35

complete list of authors

Wang, Ye||Zhou, Zhi||Jin, Shan||Liu, Debin||Lu, Mi

publication date

October 2017

publisher

IOP Publishing

published in

IOP Conference Series: Materials Science and Engineering

keywords

46 Information And Computing Sciences
4602 Artificial Intelligence
4605 Data Management And Data Science
4611 Machine Learning
Networking and Information Technology R&D (NITRD)

Digital Object Identifier (DOI)

10.1088/1757-899X/261/1/012018

start page

012018

end page

012018

volume

261

issue

1

URL

http://dx.doi.org/10.1088/1757-899x/261/1/012018

