
oai:docta.ucm.es:20.500.14352/133698 2026-03-03T00:57:58Z com_20.500.14352_14 col_20.500.14352_15

Title: Hybrid reward-driven reinforcement learning for efficient quantum circuit synthesis
Authors: Giordano, Sara; Sen, Kornikar; Martín-Delgado Alcántara, Miguel Ángel
UDC: 004.27; 004.85; 530.145
Keywords: Circuit depth; Circuit optimization; Quantum circuits; Reinforcement learning
Subjects: Computer science; Quantum theory; Artificial intelligence (Computer science)
UNESCO codes: 1203.04 Artificial Intelligence; 2212.12 Quantum Field Theory; 1203.02 Algorithmic Languages
Notes: © The Author(s) 2026. Next Generation EU PRTR-C17. W911NF-14-1-0103.

Abstract: A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach uses tabular Q-learning, based on action sequences, within a discretized quantum state space to manage the exponential growth of the state-space dimension. The framework introduces a hybrid reward mechanism that combines a static, domain-informed reward, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. The reward is therefore circuit-aware, in contrast to most current work on this topic, which relies primarily on fidelity-based rewards. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while minimizing computational overhead. In benchmarks on graph-state preparation tasks for up to seven qubits, the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's robustness and adaptability.
The results confirm that this RL-driven approach, with its fully circuit-aware reward, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.

Funders: Ministerio de Ciencia e Innovación (España); Agencia Estatal de Investigación; European Commission; Comunidad de Madrid; Ministerio de Transformación Digital y de la Función Pública (España); U.S. Army Research Office
Department: Depto. de Física Teórica
Faculty: Fac. de Ciencias Físicas
Flags (as recorded): TRUE; pub
Dates: 2026-03-02T18:05:38Z; 2026-03-02T18:05:38Z; 2026-02-03
Type: journal article
Version: VoR
Handle: https://hdl.handle.net/20.500.14352/133698
ISSN: 2524-4906; 2524-4914
DOI: 10.1007/s42484-026-00359-8
Language: eng
Grant agreement: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-122547NB-I00/ES/TECNOLOGIAS CLAVE PARA COMPUTACION CUANTICA/
Project codes: TEC-2024/COM-84; QUITEMAD-CM
Citation: Giordano, Sara, et al. "Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis". Quantum Machine Intelligence, vol. 8, no. 1, June 2026, p. 9. DOI.org (Crossref), https://doi.org/10.1007/s42484-026-00359-8.
License: Attribution-NonCommercial-NoDerivatives 4.0 International, http://creativecommons.org/licenses/by-nc-nd/4.0/
Access: open access
Format: application/pdf
Publisher: Springer
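
Illustrative sketch (not part of the record, and not the authors' code): the abstract describes tabular Q-learning driven by a hybrid reward, that is, a static reward pointing toward the target plus dynamic penalties for state revisits and gate congestion. The toy below shows one way such a reward could be wired into a Q-learning loop. Everything in it is an assumption made for illustration: classical bit strings stand in for discretized quantum states, a bit flip stands in for a gate, a Hamming-distance term stands in for the domain-informed static reward, "congestion" is approximated as acting on the same qubit twice in a row, and all names and numeric values are hypothetical.

```python
import random

# Toy stand-ins (assumptions): 3-bit strings replace discretized quantum
# states, and action q (flip bit q) replaces a gate applied to qubit q.
N_QUBITS = 3
START, TARGET = 0b000, 0b101
ACTIONS = range(N_QUBITS)


def hamming(a, b):
    """Number of differing bits; a toy proxy for distance to the target."""
    return bin(a ^ b).count("1")


def hybrid_reward(next_state, action, visited, last_action,
                  revisit_penalty=0.5, congestion_penalty=0.5):
    """Static reward toward the target plus dynamic, history-based penalties."""
    # Static part: bonus on reaching the target, otherwise a distance cost.
    static = 10.0 if next_state == TARGET else -0.1 * hamming(next_state, TARGET)
    dynamic = 0.0
    if next_state in visited:      # redundant state revisit
        dynamic -= revisit_penalty
    if action == last_action:      # same qubit twice in a row (toy "congestion")
        dynamic -= congestion_penalty
    return static + dynamic


def train(episodes=500, max_steps=10, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Epsilon-greedy tabular Q-learning over the toy state space."""
    rng = random.Random(seed)
    q = [[0.0] * N_QUBITS for _ in range(2 ** N_QUBITS)]
    for _ in range(episodes):
        state, visited, last_action = START, {START}, None
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(N_QUBITS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state = state ^ (1 << action)
            reward = hybrid_reward(next_state, action, visited, last_action)
            done = next_state == TARGET
            best_next = 0.0 if done else max(q[next_state])
            # Standard Q-learning update.
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            visited.add(next_state)
            state, last_action = next_state, action
            if done:
                break
    return q


def greedy_circuit(q, max_steps=10):
    """Read off a gate sequence by following the greedy policy from START."""
    state, gates = START, []
    while state != TARGET and len(gates) < max_steps:
        action = max(ACTIONS, key=lambda a: q[state][a])
        gates.append(action)
        state ^= 1 << action
    return gates, state
```

Calling `greedy_circuit(train())` returns the learned action sequence and the final state. In this toy the penalties depend on the episode history while the Q-table is indexed by state alone, so they act as a soft bias against wasteful moves rather than as part of a strictly Markovian reward; how the paper handles this is not stated in the record.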