Fernández Pérez, Luis AntonioMartín Mayor, VíctorYllanes, D.2025-11-052025-11-052024-01-24Fernandez, L. A., et al. «Phase Transition in the Computational Complexity of the Shortest Common Superstring and Genome Assembly». Physical Review E, vol. 109, n.o 1, enero de 2024, p. 014133. DOI.org (Crossref), https://doi.org/10.1103/PhysRevE.109.0141332470-004510.1103/physreve.109.014133https://hdl.handle.net/20.500.14352/125779©2024 American Physical SocietyGenome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the “easy” phase (solvable by polynomialtime algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime.engPhase transition in the computational complexity of the shortest common superstring and genome assemblyjournal article2470-0053https//doi.org/10.1103/physreve.109.014133https://journals.aps.org/pre/abstract/10.1103/PhysRevE.109.014133https://arxiv.org/abs/2210.09986open access531.1:00457:004BioinformaticsComputational complexityNP-hard problemsPhase transitionsSequencing analysisMonte Carlo methodsFísica (Física)Análisis numéricoInformática (Informática)Biología2212 Física Teórica1203.17 Informática2205.10 Mecánica Estadística1206 Análisis Numérico