The StockNet dataset is a comprehensive dataset for stock movement prediction from tweets and historical stock prices. It covers two years of price movements, from 01/01/2014 to 01/01/2016, for 88 stocks: all 8 stocks in the Conglomerates sector and the top 10 stocks by capital size in each of the other 8 sectors.
25 PAPERS • 1 BENCHMARK
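As a rough illustration of what "stock movement prediction" means as a learning target, the sketch below turns a series of closing prices into binary up/down labels. The function name `movement_labels`, the price values, and the simple "rose versus previous day" rule are all assumptions made for illustration; they are not StockNet's actual labeling procedure.

```python
def movement_labels(closes):
    """Label each day 1 if its close rose versus the previous day, else 0.

    Assumed rule for illustration only; a real dataset may use
    thresholds or adjusted prices instead.
    """
    return [1 if today > prev else 0 for prev, today in zip(closes, closes[1:])]


# Hypothetical closing prices, not taken from the dataset.
closes = [100.0, 101.5, 101.0, 101.0, 103.2]
labels = movement_labels(closes)
print(labels)
```

Under this rule a flat day counts as "not up", which is one of several reasonable choices; a classifier trained on tweets and prices would then try to predict these labels.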
The Earning Calls dataset consists of processed earnings conference call data (text and audio). It can be used to predict financial risk from both the textual and vocal features of conference calls.
7 PAPERS • NO BENCHMARKS YET
The EDT dataset is a benchmark for corporate event detection and text-based stock prediction (trading strategy).
2 PAPERS • NO BENCHMARKS YET
We introduce KPI-EDGAR, a novel dataset for joint named entity recognition and relation extraction, built on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. The main objective is to extract Key Performance Indicators (KPIs) from financial documents (the named entity recognition part) and link them to their numerical values (the relation extraction part).
2 PAPERS • 1 BENCHMARK
This research examines customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From a risk management perspective, the predictive accuracy of the estimated probability of default is more valuable than the binary classification result (credible or not credible clients). Because the real probability of default is unknown, the study presents the novel Sorting Smoothing Method to estimate it. With the real probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, artificial neural networ
1 PAPER • NO BENCHMARKS YET
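The regression check described in the entry above (Y = A + BX, with an intercept near zero, a slope near one, and a high coefficient of determination indicating well-calibrated predictions) can be sketched as follows. This is a minimal illustration under assumptions: the function name `fit_line` and the sample values are hypothetical, and the "estimated real" probabilities are taken as given rather than produced by the Sorting Smoothing Method.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b, r_squared), where r_squared is the coefficient
    of determination of the simple linear regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    b = sxy / sxx              # regression coefficient B
    a = mean_y - b * mean_x    # regression intercept A
    r_squared = (sxy ** 2) / (sxx * syy)
    return a, b, r_squared


# Made-up illustrative values, not taken from the dataset.
predicted = [0.1, 0.2, 0.4, 0.6, 0.8]            # X: model's predicted probability
estimated_real = [0.12, 0.18, 0.41, 0.63, 0.79]  # Y: assumed "real" probability
a, b, r2 = fit_line(predicted, estimated_real)
print(f"A={a:.3f}, B={b:.3f}, R^2={r2:.3f}")
```

A model whose fit gives A close to 0, B close to 1, and R^2 close to 1 is one whose predicted probabilities track the estimated real probabilities closely, which is the comparison criterion the description applies across the six methods.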
MAEC is a new, large-scale, multi-modal earnings-call dataset of paired text and audio, based on S&P 1500 companies.
0 PAPERS • NO BENCHMARKS YET