In Vouchers We (Hope to) Trust: Unveiling Hidden Errors in GenBank's Tetrapod Taxonomic Foundations

dc.contributor.author: Carné, Albert
dc.contributor.author: Vieites, David R.
dc.contributor.author: van den Burg, Matthijs P.
dc.date.accessioned: 2025-08-07T12:24:04Z
dc.date.available: 2025-08-07T12:24:04Z
dc.date.issued: 2025-06
dc.description: This work was supported by Ministerio de Ciencia, Innovación y Universidades, Agencia Estatal de Investigación 10.13039/501100011033.
dc.description.abstract: Genetic repositories are invaluable resources foundational to various biological disciplines. While their data and metadata reliability are essential for robust research outcomes, numerous studies have highlighted data quality and consistency issues. Here, we detect and quantify errors at the most fundamental level by analysing the congruence of sequences derived from the same genetic marker and specimen voucher across tetrapods. Our analysis reveals that 32% of re-sequenced vouchers (with identical field or museum numbers) yield unequal sequences, ranging from a few mutations to significant divergences (0.06%–33.95%). These divergences may result from sample misidentification, labelling errors, fidelity disparities between sequencing methods, or contamination at various stages of the research process. Our findings demonstrate errors within GenBank at its most basal level and suggest that, although undetectable, a similar error rate likely exists in non-re-sequenced data. These previously overlooked errors are concerning because they arise from replicated experiments, which are uncommon, and raise serious questions about the reliability of non-re-sequenced specimens. Such errors can compromise the accuracy of biodiversity assessments (e.g., taxonomic assessment, eDNA and barcoding), phylogenetic analyses and conservation planning by artificially inflating the intraspecific divergence or misidentifying (to-be-described) species. Additionally, the accuracy of large-scale biological studies that rely on such data can be compromised. Our concerning results call for protocols ensuring sample traceability to the specimens or tissues during the whole process of data generation, analysis and deposition in a database. We propose a third-party annotation system for individual GenBank records that would allow flagging common errors and alert both the original submitter and all users to potential problems without modifying the original records.
dc.description.department: Depto. de Biodiversidad, Ecología y Evolución
dc.description.faculty: Fac. de Ciencias Biológicas
dc.description.refereed: TRUE
dc.description.sponsorship: Ministerio de Ciencia, Innovación y Universidades (España)
dc.description.status: pub
dc.identifier.citation: Carné, A., Vieites, D.R. and van den Burg, M.P. (2025), In Vouchers We (Hope to) Trust: Unveiling Hidden Errors in GenBank's Tetrapod Taxonomic Foundations. Mol Ecol, 34: e17812. https://doi.org/10.1111/mec.17812
dc.identifier.doi: 10.1111/mec.17812
dc.identifier.essn: 1365-294X
dc.identifier.issn: 0962-1083
dc.identifier.officialurl: https://doi.org/10.1111/mec.17812
dc.identifier.relatedurl: https://onlinelibrary.wiley.com/doi/10.1111/mec.17812
dc.identifier.uri: https://hdl.handle.net/20.500.14352/123118
dc.journal.title: Molecular Ecology
dc.language.iso: eng
dc.publisher: Wiley
dc.relation.projectID: info:eu-repo/grantAgreement/MICINN//DIN2021-011964/ES
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject.cdu: 57.06
dc.subject.cdu: 59
dc.subject.cdu: 575
dc.subject.cdu: 57.08
dc.subject.keyword: Data quality
dc.subject.keyword: DNA sequencing
dc.subject.keyword: GenBank
dc.subject.keyword: Genetic data
dc.subject.keyword: Intraspecific diversity
dc.subject.keyword: Museum specimens
dc.subject.keyword: Repositories
dc.subject.keyword: Taxonomy
dc.subject.ucm: Zoología
dc.subject.ucm: Genética
dc.subject.ucm: Biología molecular (Biología)
dc.subject.ucm: Informática (Informática)
dc.subject.unesco: 2401.14 Taxonomía Animal
dc.subject.unesco: 2401.08 Genética Animal
dc.subject.unesco: 1203.12 Bancos de Datos
dc.title: In Vouchers We (Hope to) Trust: Unveiling Hidden Errors in GenBank's Tetrapod Taxonomic Foundations
dc.type: review article
dc.type.hasVersion: VoR
dc.volume.number: 34
dspace.entity.type: Publication
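
Illustrative note on the abstract: the congruence check it describes (comparing sequences that share a genetic marker and a specimen voucher, and reporting their divergence as a percentage) can be sketched roughly as follows. This is a minimal sketch, not the authors' pipeline, which this record does not describe. It assumes the sequences are already aligned to equal length and uses an uncorrected p-distance as the divergence measure; the function names, record layout, accessions, and voucher numbers are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Percentage of differing sites between two pre-aligned sequences.

    Assumption: an uncorrected p-distance; sites where either sequence
    has a gap or an ambiguous base (anything other than A/C/G/T) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * mismatches / compared


def incongruent_pairs(records):
    """Group records by (voucher, marker) and return pairs that differ.

    `records` is a hypothetical list of
    (accession, voucher, marker, aligned_sequence) tuples.
    """
    groups = defaultdict(list)
    for accession, voucher, marker, seq in records:
        groups[(voucher, marker)].append((accession, seq))
    flagged = []
    for key, members in groups.items():
        for (acc_a, seq_a), (acc_b, seq_b) in combinations(members, 2):
            divergence = p_distance(seq_a, seq_b)
            if divergence > 0:
                flagged.append((key, acc_a, acc_b, divergence))
    return flagged


if __name__ == "__main__":
    # Made-up example data: two records share voucher "MUS 123" and marker "COI".
    example = [
        ("X1", "MUS 123", "COI", "ACGTACGTAC"),
        ("X2", "MUS 123", "COI", "ACGTACGTAT"),
        ("X3", "MUS 999", "COI", "ACGTACGTAC"),
    ]
    for key, acc_a, acc_b, divergence in incongruent_pairs(example):
        print(key, acc_a, acc_b, f"{divergence:.2f}%")
```

Pairs are only compared within the same voucher and marker, so a difference between two vouchers of the same species is never flagged; only re-sequenced vouchers that disagree with themselves are.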


Original bundle

Name: In_vouchers _we _hope.pdf
Size: 366.92 KB
Format: Adobe Portable Document Format
