RT Journal Article T1 Phase transition in the computational complexity of the shortest common superstring and genome assembly A1 Fernández Pérez, Luis Antonio A1 Martín Mayor, Víctor A1 Yllanes, D. AB Genome assembly, the process of reconstructing a long genetic sequence by aligning and merging short fragments, or reads, is known to be NP-hard, either as a version of the shortest common superstring problem or in a Hamiltonian-cycle formulation. That is, the computing time is believed to grow exponentially with the problem size in the worst case. Despite this fact, high-throughput technologies and modern algorithms currently allow bioinformaticians to handle datasets of billions of reads. Using methods from statistical mechanics, we address this conundrum by demonstrating the existence of a phase transition in the computational complexity of the problem and showing that practical instances always fall in the “easy” phase (solvable by polynomialtime algorithms). In addition, we propose a Markov-chain Monte Carlo method that outperforms common deterministic algorithms in the hard regime. PB American Physical Society SN 2470-0045 YR 2024 FD 2024-01-24 LK https://hdl.handle.net/20.500.14352/125779 UL https://hdl.handle.net/20.500.14352/125779 LA eng NO Fernandez, L. A., et al. «Phase Transition in the Computational Complexity of the Shortest Common Superstring and Genome Assembly». Physical Review E, vol. 109, n.o 1, enero de 2024, p. 014133. DOI.org (Crossref), https://doi.org/10.1103/PhysRevE.109.014133 NO ©2024 American Physical Society NO Ministerio de Ciencia, Innovación y Universidades (España) NO Agencia Estatal de Investigación (España) NO European Commission NO Chan Zuckerberg Biohub DS Docta Complutense RD 18 mar 2026