Assessing the Ability of ChatGPT to Screen Articles for Systematic Reviews

Paper · arXiv 2307.06464 · Published July 12, 2023

EUGENE SYRIANI, DIRO, Université de Montréal, Canada

ISTVAN DAVID, DIRO, Université de Montréal, Canada

GAURANSH KUMAR, DIRO, Université de Montréal, Canada

“By organizing knowledge within a research field, Systematic Reviews (SR) provide valuable leads to steer research. Evidence suggests that SRs have become first-class artifacts in software engineering. However, the tedious manual effort associated with the screening phase of SRs renders these studies a costly and error-prone endeavor. While screening has traditionally been considered not amenable to automation, the advent of generative AI-driven chatbots, backed with large language models, is set to disrupt the field. In this report, we propose an approach to leverage these novel technological developments for automating the screening of SRs. We assess the consistency, classification performance, and generalizability of ChatGPT in screening articles for SRs and compare these figures with those of traditional classifiers used in SR automation. Our results indicate that ChatGPT is a viable option.”
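To make the evaluation criteria named in the abstract concrete, here is a minimal Python sketch of how consistency and classification performance could be measured for repeated include/exclude screening decisions. It is an illustration under stated assumptions, not the authors' method: the excerpt does not say which metrics or aggregation rule the paper uses, so the agreement-based consistency measure, the majority vote, the function names, and the toy data below are all hypothetical.

```python
from collections import Counter


def majority_vote(decisions):
    """Aggregate repeated runs into one decision (assumed rule: ties count as include)."""
    counts = Counter(decisions)
    return counts[True] >= counts[False]


def consistency(runs_per_article):
    """Fraction of articles for which every repeated run gave the same decision."""
    unanimous = sum(1 for runs in runs_per_article if len(set(runs)) == 1)
    return unanimous / len(runs_per_article)


def precision_recall(predicted, actual):
    """Precision and recall of include decisions against human reviewer labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): three repeated model decisions per article,
# where True means "include", plus the human reviewers' decision per article.
runs = [
    [True, True, True],
    [True, False, True],
    [False, False, False],
    [False, True, False],
]
human = [True, True, False, True]

predicted = [majority_vote(r) for r in runs]
agreement = consistency(runs)
precision, recall = precision_recall(predicted, human)
```

In screening, recall is usually the metric to watch most closely, since a wrongly excluded article is lost to the review, whereas a wrongly included one can still be discarded at a later stage.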

“This work provides the first look at the opportunities of using ChatGPT and similar LLMs for the automation of article screening in SRs. Through detailed and systematic experiments, we show that ChatGPT performs comparably in making decisions about the inclusion of articles into an SR compared to traditional classifiers. Our results indicate that ChatGPT is a viable option to automate screening and its costs are minimal at the time of writing. Due to these beneficial qualities, we foresee a rapid adoption curve of LLMs into survey tools and novel surveying techniques to appear, e.g., solo reviewing aided by ChatGPT.”