Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study

Journal article

Pedro Rodrigues Vidor et al. · Nature Portfolio · 2026

Open access available

Author / responsible party: Pedro Rodrigues Vidor et al.

Publisher: Nature Portfolio

Year: 2026

ISSN: 2045-2322

Language: English (eng)

Resource access

Identified as open access; whether the link leads directly to the full text has not been automatically confirmed.

Abstract

Risk-of-bias assessment is a crucial step in evidence synthesis. The traditionally adopted tool, however, is complex, resource-intensive, and unreliable. While prior investigations have focused on whether large language models (LLMs) could perform assessments with RoB 2, this study is the first to evaluate the reliability of ROBUST-RCT, a novel risk-of-bias tool, as applied by humans and LLMs. Reviewers working independently used ROBUST-RCT to assess different aspects of a sample of RCTs and then reached a consensus through discussion. A chain-of-thought prompt instructed four LLMs on how to apply ROBUST-RCT. The primary analysis used Gwet’s AC2 to assess inter-rater reliability based on all the final ratings (i.e., the ratings in the second step of the tool) for all the core items of ROBUST-RCT. For each LLM, a sample of 56 assessments, derived from 9 studies, was compared against human consensus. In the primary analysis, Gwet’s AC2 varied across the LLMs. DeepSeek-R1, the lowest performer, yielded an AC2 of 0.46 (95% CI: 0.24 to 0.69). At the other end, Gemini 2.5 Pro Preview, the model most consistent with human consensus, yielded an AC2 of 0.69 (95% CI: 0.54 to 0.84). With 95% confidence, three of the four tested LLMs achieved ‘moderate’ or higher reliability based on benchmarking. LLMs could be helpful in the risk-of-bias assessment of systematic reviews using the ROBUST-RCT tool.
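As an illustration of the agreement statistic named in the abstract, the following is a minimal Python sketch of Gwet's AC2 for two raters. It is not the authors' analysis code. The linear weighting scheme, the function name `gwet_ac2`, the four-level rating labels, and the toy ratings are all assumptions made for illustration. The abstract does not state which weights were used, and the study's confidence intervals and benchmarking are not reproduced here.

```python
from collections import Counter


def gwet_ac2(rater_a, rater_b, categories):
    """Gwet's AC2 for two raters on an ordered rating scale.

    Assumption: linear weights, w[k][l] = 1 - |k - l| / (q - 1).
    With only two categories these weights are the identity matrix,
    so the result equals the unweighted AC1.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    n = len(rater_a)
    index = {c: i for i, c in enumerate(categories)}

    # Partial credit for near-misses on the ordered scale.
    weights = [[1 - abs(k - l) / (q - 1) for l in range(q)] for k in range(q)]

    # Weighted observed agreement.
    p_a = sum(weights[index[a]][index[b]] for a, b in zip(rater_a, rater_b)) / n

    # Mean of the two raters' marginal proportions for each category.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in categories]

    # Chance agreement: T_w / (q * (q - 1)) * sum_k pi_k * (1 - pi_k),
    # where T_w is the sum of all weights.
    t_w = sum(sum(row) for row in weights)
    p_e = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)

    return (p_a - p_e) / (1 - p_e)


# Hypothetical rating scale and toy ratings; NOT data from the study.
SCALE = ["definitely low", "probably low", "probably high", "definitely high"]
human = ["definitely low", "probably low", "probably high", "definitely high", "probably low"]
llm = ["definitely low", "probably low", "probably low", "definitely high", "probably high"]
ac2 = gwet_ac2(human, llm, SCALE)
```

The function takes the ordered category list explicitly so that a category neither rater happened to use still counts toward q, which affects the chance-agreement term.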

How to cite

APA 7

Pedro Rodrigues Vidor et al. (2026). Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study. https://doi.org/10.1038/s41598-026-44303-z

MLA

Pedro Rodrigues Vidor et al. "Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study." 2026. https://doi.org/10.1038/s41598-026-44303-z.

Chicago

Pedro Rodrigues Vidor et al. 2026. "Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study." https://doi.org/10.1038/s41598-026-44303-z.

Harvard

Pedro Rodrigues Vidor et al. 2026, Assessing the risk of bias of clinical trials with large language models and ROBUST-RCT: a feasibility study, Nature Portfolio, available at: https://doi.org/10.1038/s41598-026-44303-z [Accessed 28 Jun. 2026].

Subjects

Risk of bias; Inter-rater reliability; Randomized controlled trials; Large language models
