RT Journal Article
T1 Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry
A1 Caponio, Vito Carlo Alberto
A1 Lorenzo-Pouso, Alejandro I
A1 Magalhaes, Marco
A1 Ali, Aiman
A1 Adamo, Daniela
A1 Cirillo, Nicola
A1 López-Pintor Muñoz, Rosa María
A1 Musella, Gennaro
AB Objectives: Evidence-based dentistry relies heavily on systematic reviews and meta-analyses (SRMA), considered the most robust forms of evidence. Still, conducting SRMA is time- and resource-intensive, with high error rates in data extraction. Artificial intelligence (AI) and large language models (LLMs) offer the potential to automate and accelerate SRMA processes such as data extraction. However, assessing the reliability and accuracy of these new AI-based technologies for SRMA is crucial. This study evaluated the accuracy of four LLMs (DeepSeek v3 R1, Claude 3.5 Sonnet, ChatGPT-4o, and Gemini 2.0-flash) in extracting different primary numeric outcomes data in various dental topics. Methods: LLMs were queried via APIs using default settings and a SMART-format prompt. Descriptive analysis was conducted at sub-outcome, outcome, and study levels. Errors were classified as hallucinations, missed, or omitted data. Results: Overall extraction accuracy was exceptionally high at the sub-outcome level, with only 3 hallucinations (from Gemini 2.0-flash). Total errors increased at the outcome level and study level. Gemini 2.0-flash generally performed significantly worse than others (p < 0.01). Claude 3.5 Sonnet and DeepSeek-v3 R1 generally exhibited superior accuracy and lower omission rates in full-text extraction compared to Gemini 2.0-flash and ChatGPT-4o. Conclusions: This first comparative evaluation of multiple LLMs for data extraction in dental research from full-text PDFs highlights their significant potential but also limitations. Performance varied notably between models, with cost not directly correlating with superior performance. While single data point extraction was highly accurate, errors increased at higher aggregation levels. Standardized outcome reporting in studies could benefit future LLM extraction, and we offer a solid benchmark for future performance comparisons. Clinical significance: This study demonstrates that LLMs can achieve high accuracy in extracting single numeric outcomes, but omission errors in full-text analyses limit their independent use in SRMA. Improving outcome reporting standards and leveraging accurate, lower-cost models may enhance evidence synthesis efficiency in dentistry and beyond.
PB Elsevier
SN 0300-5712
YR 2025
FD 2025-11-19
LK https://hdl.handle.net/20.500.14352/129877
UL https://hdl.handle.net/20.500.14352/129877
LA eng
NO Caponio VCA, Lorenzo-Pouso AI, Magalhaes M, Ali A, Adamo D, Cirillo N, López-Pintor RM, Musella G. Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry. J Dent. 2026 Jan;164:106245. doi: 10.1016/j.jdent.2025.106245
DS Docta Complutense
RD 26 Feb 2026