Automated title and abstract screening for scoping reviews using the GPT-4 Large Language Model

14 Nov 2023  ·  David Wilkins

Scoping reviews, a type of literature review, require intensive human effort to screen large numbers of scholarly sources for their relevance to the review objectives. This manuscript introduces GPTscreenR, a package for the R statistical programming language that uses the GPT-4 Large Language Model (LLM) to automatically screen sources. The package makes use of the chain-of-thought technique with the goal of maximising performance on complex screening tasks. In validation against consensus human reviewer decisions, GPTscreenR performed similarly to an alternative zero-shot technique, with a sensitivity of 71%, specificity of 89%, and overall accuracy of 84%. Neither method achieved perfect accuracy or human levels of intraobserver agreement. GPTscreenR demonstrates the potential for LLMs to support scholarly work and provides a user-friendly software framework that can be integrated into existing review processes.
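
The sensitivity, specificity, and accuracy figures quoted above are standard confusion-matrix metrics computed by comparing automated include/exclude decisions with the consensus human decisions. The following is a minimal sketch of that calculation in Python, not the GPTscreenR implementation (which is an R package). The function name `screening_metrics` and the toy decision vectors are hypothetical and are not data from the manuscript.

```python
import math
from typing import Dict, Sequence


def screening_metrics(human: Sequence[bool], model: Sequence[bool]) -> Dict[str, float]:
    """Compare model screening decisions with human consensus decisions.

    True means "include the source", False means "exclude the source".
    Hypothetical helper for illustration only.
    """
    if len(human) != len(model):
        raise ValueError("human and model decision lists must have equal length")

    # Tally the four cells of the confusion matrix.
    tp = sum(1 for h, m in zip(human, model) if h and m)          # correctly included
    tn = sum(1 for h, m in zip(human, model) if not h and not m)  # correctly excluded
    fp = sum(1 for h, m in zip(human, model) if not h and m)      # wrongly included
    fn = sum(1 for h, m in zip(human, model) if h and not m)      # wrongly excluded

    def ratio(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),   # share of relevant sources the model included
        "specificity": ratio(tn, tn + fp),   # share of irrelevant sources the model excluded
        "accuracy": ratio(tp + tn, len(human)),
    }


# Toy example (made-up decisions for eight sources, not from the paper).
human_consensus = [True, True, True, False, False, False, False, False]
model_decisions = [True, True, False, False, False, False, True, False]
metrics = screening_metrics(human_consensus, model_decisions)
print(metrics)
```

In a screening context sensitivity is usually the metric to watch most closely, because a relevant source that is wrongly excluded is lost to the review, whereas a wrongly included source can still be caught at full-text screening.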
