RT Journal Article
T1 Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry
A1 Caponio, Vito Carlo Alberto
A1 Lorenzo-Pouso, Alejandro I
A1 Magalhaes, Marco
A1 Ali, Aiman
A1 Adamo, Daniela
A1 Cirillo, Nicola
A1 López-Pintor Muñoz, Rosa María
A1 Musella, Gennaro
AB Objectives: Evidence-based dentistry relies heavily on systematic reviews and meta-analyses (SRMA), considered the most robust forms of evidence. Still, conducting SRMA is time- and resource-intensive, with high error rates in data extraction. Artificial intelligence (AI) and large language models (LLMs) offer the potential to automate and accelerate SRMA processes such as data extraction. However, assessing the reliability and accuracy of these new AI-based technologies for SRMA is crucial. This study evaluated the accuracy of four LLMs (DeepSeek v3 R1, Claude 3.5 Sonnet, ChatGPT-4o, and Gemini 2.0-flash) in extracting different primary numeric outcomes data in various dental topics. Methods: LLMs were queried via APIs using default settings and a SMART-format prompt. Descriptive analysis was conducted at sub-outcome, outcome, and study levels. Errors were classified as hallucinations, missed, or omitted data. Results: Overall extraction accuracy was exceptionally high at the sub-outcome level, with only 3 hallucinations (from Gemini 2.0-flash). Total errors increased at the outcome level and study level. Gemini 2.0-flash generally performed significantly worse than others (p < 0.01). Claude 3.5 Sonnet and DeepSeek-v3 R1 generally exhibited superior accuracy and lower omission rates in full-text extraction compared to Gemini 2.0-flash and ChatGPT-4o. Conclusions: This first comparative evaluation of multiple LLMs for data extraction in dental research from full-text PDFs highlights their significant potential but also limitations. Performance varied notably between models, with cost not directly correlating with superior performance.
While single data point extraction was highly accurate, errors increased at higher aggregation levels. Standardized outcome reporting in studies could benefit future LLM extraction, and we offer a solid benchmark for future performance comparisons. Clinical significance: This study demonstrates that LLMs can achieve high accuracy in extracting single numeric outcomes, but omission errors in full-text analyses limit their independent use in SRMA. Improving outcome reporting standards and leveraging accurate, lower-cost models may enhance evidence synthesis efficiency in dentistry and beyond.
PB Elsevier
SN 0300-5712
YR 2025
FD 2025-11-19
LK https://hdl.handle.net/20.500.14352/129877
UL https://hdl.handle.net/20.500.14352/129877
LA eng
NO Caponio VCA, Lorenzo-Pouso AI, Magalhaes M, Ali A, Adamo D, Cirillo N, López-Pintor RM, Musella G. Accuracy of LLMs to retrieve numeric data for meta-analysis in dentistry. J Dent. 2026 Jan;164:106245. doi: 10.1016/j.jdent.2025.106245
DS Docta Complutense
RD 22 Jan 2026