Research Replication Prediction Using Weakly Supervised Learning

Knowing whether a published research result can be replicated is important. Carrying out direct replication of published research incurs a high cost. There are efforts tried to use machine learning aided methods to predict scientific claims{'} replicability. However, existing machine learning aided approaches use only hand-extracted statistics features such as p-value, sample size, etc. without utilizing research papers{'} text information and train only on a very small size of annotated data without making the most use of a large number of unlabeled articles. Therefore, it is desirable to develop effective machine learning aided automatic methods which can automatically extract text information as features so that we can benefit from Natural Language Processing techniques. Besides, we aim for an approach that benefits from both labeled and the large number of unlabeled data. In this paper, we propose two weakly supervised learning approaches that use automatically extracted text information of research papers to improve the prediction accuracy of research replication using both labeled and unlabeled datasets. Our experiments over real-world datasets show that our approaches obtain much better prediction performance compared to the supervised models utilizing only statistic features and a small size of labeled dataset. Further, we are able to achieve an accuracy of 75.76{\%} for predicting the replicability of research.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here