Stock Price Prediction Based on Natural Language Processing

Complexity 2022 · Xiaobin Tang, Nuo Lei, Manru Dong, Dan Ma

Keywords used in traditional stock price prediction are chosen mainly from the literature and from experience. This paper designs a new text mining method for keyword augmentation based on natural language processing models, including Bidirectional Encoder Representations from Transformers (BERT) and Neural Contextualized Representation for Chinese Language Understanding (NEZHA). BERT vectorization and the NEZHA keyword discrimination model extend the seed keywords along two dimensions, similarity and importance respectively, to construct a keyword thesaurus for stock price prediction. Furthermore, the predictive ability of the seed words and of the generated words is compared using an LSTM model, taking the CSI 300 as an example. The results show that, compared with the seed keywords, the search indexes of the extracted words correlate more strongly with the CSI 300 and improve its forecasting performance. The keyword augmentation model designed in this paper can therefore serve as a reference for expanding other variables in financial time series forecasting.
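
The similarity dimension of the keyword expansion can be illustrated with a minimal sketch. It assumes each word already has an embedding vector (in the paper these come from BERT; here they are hand-written toy vectors) and keeps candidates whose cosine similarity to any seed word reaches a threshold. The function names, the toy vectors, and the threshold value are all assumptions made for illustration, not the authors' implementation, and the NEZHA importance step is not covered.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seed_vecs, candidate_vecs, threshold=0.7):
    """Return candidate words whose best similarity to any seed word is >= threshold.

    seed_vecs and candidate_vecs map word -> embedding vector (hypothetical inputs).
    The result is ordered from most to least similar.
    """
    scores = {}
    for word, vec in candidate_vecs.items():
        if word in seed_vecs:
            continue  # already a seed keyword
        best = max((cosine(vec, s) for s in seed_vecs.values()), default=0.0)
        if best >= threshold:
            scores[word] = best
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional vectors standing in for real BERT embeddings (assumption).
seed_vecs = {"stock": [1.0, 0.0, 0.0]}
candidate_vecs = {
    "share": [0.9, 0.1, 0.0],
    "equity": [0.8, 0.6, 0.0],
    "weather": [0.0, 0.0, 1.0],
}

print(expand_keywords(seed_vecs, candidate_vecs))
```

In this toy setup "share" and "equity" pass the threshold while "weather" is discarded. A real pipeline would replace the toy vectors with model embeddings and tune the threshold on held-out data.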
