Phase transition in the computational complexity of the shortest common superstring and genome assembly

dc.contributor.authorFernández Pérez, Luis Antonio
dc.contributor.authorMartín Mayor, Víctor
dc.contributor.authorYllanes, D.
dc.date.accessioned2025-11-05T15:29:44Z
dc.date.available2025-11-05T15:29:44Z
dc.date.issued2024-01-24
dc.description©2024 American Physical Society
dc.description.abstractGenome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the “easy” phase (solvable by polynomialtime algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime.
dc.description.departmentDepto. de Física Teórica
dc.description.facultyFac. de Ciencias Físicas
dc.description.refereedTRUE
dc.description.sponsorshipMinisterio de Ciencia, Innovación y Universidades (España)
dc.description.sponsorshipAgencia Estatal de Investigación (España)
dc.description.sponsorshipEuropean Commission
dc.description.sponsorshipChan Zuckerberg Biohub
dc.description.statuspub
dc.identifier.citationFernandez, L. A., et al. «Phase Transition in the Computational Complexity of the Shortest Common Superstring and Genome Assembly». Physical Review E, vol. 109, n.o 1, enero de 2024, p. 014133. DOI.org (Crossref), https://doi.org/10.1103/PhysRevE.109.014133
dc.identifier.doi10.1103/physreve.109.014133
dc.identifier.essn2470-0053
dc.identifier.issn2470-0045
dc.identifier.officialurlhttps//doi.org/10.1103/physreve.109.014133
dc.identifier.relatedurlhttps://journals.aps.org/pre/abstract/10.1103/PhysRevE.109.014133
dc.identifier.relatedurlhttps://arxiv.org/abs/2210.09986
dc.identifier.urihttps://hdl.handle.net/20.500.14352/125779
dc.issue.number1
dc.journal.titlePhysical Review E
dc.language.isoeng
dc.page.final014133-8
dc.page.initial014133-1
dc.publisherAmerican Physical Society
dc.relation.projectIDinfo:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136374NB-C21/ES/COMPLEJIDAD EN FISICA Y MAS ALLA: DE LOS VIDRIOS DE ESPIN A LAS INTERACCIONES SOCIALES/
dc.rights.accessRightsopen access
dc.subject.cdu531.1:004
dc.subject.cdu57:004
dc.subject.keywordBioinformatics
dc.subject.keywordComputational complexity
dc.subject.keywordNP-hard problems
dc.subject.keywordPhase transitions
dc.subject.keywordSequencing analysis
dc.subject.keywordMonte Carlo methods
dc.subject.ucmFísica (Física)
dc.subject.ucmAnálisis numérico
dc.subject.ucmInformática (Informática)
dc.subject.ucmBiología
dc.subject.unesco2212 Física Teórica
dc.subject.unesco1203.17 Informática
dc.subject.unesco2205.10 Mecánica Estadística
dc.subject.unesco1206 Análisis Numérico
dc.titlePhase transition in the computational complexity of the shortest common superstring and genome assembly
dc.typejournal article
dc.type.hasVersionAM
dc.volume.number109
dspace.entity.typePublication
relation.isAuthorOfPublication146096b1-5825-4230-8ad9-b2dad468673b
relation.isAuthorOfPublication061118c0-eadf-4ee3-8897-2c9b65a6df66
relation.isAuthorOfPublication.latestForDiscovery146096b1-5825-4230-8ad9-b2dad468673b

Download

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
(2024). Phase transition in the computational complexity ....pdf
Size:
824.54 KB
Format:
Adobe Portable Document Format
Description:
last revised 11 Mar 2024 (this version, v2) Postprint

Collections