In Vouchers We (Hope to) Trust: Unveiling Hidden Errors in GenBank's Tetrapod Taxonomic Foundations

Publication date

2025

Publisher

Wiley

Citation

Carné, A., Vieites, D.R. and van den Burg, M.P. (2025), In Vouchers We (Hope to) Trust: Unveiling Hidden Errors in GenBank's Tetrapod Taxonomic Foundations. Mol Ecol, 34: e17812. https://doi.org/10.1111/mec.17812

Abstract

Genetic repositories are invaluable resources foundational to various biological disciplines. While their data and metadata reliability are essential for robust research outcomes, numerous studies have highlighted data quality and consistency issues. Here, we detect and quantify errors at the most fundamental level by analysing the congruence of sequences derived from the same genetic marker and specimen voucher across tetrapods. Our analysis reveals that 32% of re-sequenced vouchers (with identical field or museum numbers) yield unequal sequences, ranging from a few mutations to significant divergences (0.06%–33.95%). These divergences may result from sample misidentification, labelling errors, fidelity disparities between sequencing methods, or contamination at various stages of the research process. Our findings demonstrate errors within GenBank at its most basal level and suggest that, although undetectable, a similar error rate likely exists in non-re-sequenced data. These previously overlooked errors are concerning because they arise from replicated experiments, which are uncommon, and raise serious questions about the reliability of non-re-sequenced specimens. Such errors can compromise the accuracy of biodiversity assessments (e.g., taxonomic assessment, eDNA and barcoding), phylogenetic analyses and conservation planning by artificially inflating the intraspecific divergence or misidentifying (to-be-described) species. Additionally, the accuracy of large-scale biological studies that rely on such data can be compromised. Our concerning results call for protocols ensuring sample traceability to the specimens or tissues during the whole process of data generation, analysis and deposition in a database. We propose a third-party annotation system for individual GenBank records that would allow flagging common errors and alert both the original submitter and all users to potential problems without modifying the original records.
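
The congruence check described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the sequences are already aligned to equal length, uses an uncorrected p-distance as the divergence measure, and the record layout `(voucher, marker, sequence)` and both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(a, b):
    """Uncorrected p-distance (in %) between two pre-aligned sequences.

    Positions where either sequence has a gap or an ambiguous base are
    skipped. Returns None when no position can be compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            mismatches += x != y
    if compared == 0:
        return None
    return 100.0 * mismatches / compared


def discordant_vouchers(records):
    """Map (voucher, marker) to its largest pairwise divergence in %.

    `records` is an iterable of (voucher, marker, aligned_sequence) tuples
    (a hypothetical layout). Only vouchers sequenced more than once for the
    same marker, with at least one non-identical pair, are reported.
    """
    groups = defaultdict(list)
    for voucher, marker, seq in records:
        groups[(voucher, marker)].append(seq)

    flagged = {}
    for key, seqs in groups.items():
        distances = [
            d
            for s, t in combinations(seqs, 2)
            if (d := p_distance(s, t)) is not None
        ]
        if distances and max(distances) > 0:
            flagged[key] = max(distances)
    return flagged


# Toy data with made-up voucher labels, not real GenBank records.
records = [
    ("V1", "COI", "ACGTACGTAC"),
    ("V1", "COI", "ACGTACGTAT"),
    ("V2", "COI", "ACGT-CGT"),
    ("V2", "COI", "ACGTNCGT"),
]
print(discordant_vouchers(records))
```

In this toy input only voucher `V1` is flagged, because the two `V2` sequences differ solely at a position holding a gap or an ambiguous base. A real analysis would also need an alignment step and a rule for normalising voucher labels, neither of which is covered here.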

Description

This work was supported by Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación 10.13039/501100011033.