Hybrid reward-driven reinforcement learning for efficient quantum circuit synthesis

Authors: Giordano, Sara; Sen, Kornikar; Martín-Delgado Alcántara, Miguel Ángel
Dates: 2026-03-02; 2026-03-02; 2026-02-03
Citation: Giordano, Sara, et al. "Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis". Quantum Machine Intelligence, vol. 8, no. 1, June 2026, p. 9. DOI.org (Crossref), https://doi.org/10.1007/s42484-026-00359-8.
ISSN: 2524-4906; 2524-4914
DOI: 10.1007/s42484-026-00359-8
Handle: https://hdl.handle.net/20.500.14352/133698
Notes: © The Author(s) 2026. Next Generation EU PRTR-C17. W911NF-14-1-0103.
Language: eng
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Type: journal article
Publisher version: https://dx.doi.org/10.1007/s42484-026-00359-8
Access URL: https://link-springer-com.bucm.idm.oclc.org/article/10.1007/s42484-026-00359-8
Access: open access
Classification: 004.27; 004.85; 530.145
Keywords: Circuit depth; Circuit optimization; Quantum circuits; Reinforcement learning
Subjects: Computer science; Quantum theory; Artificial intelligence (Computer science)
UNESCO codes: 1203.04 Artificial Intelligence; 2212.12 Quantum Field Theory; 1203.02 Algorithmic Languages

Abstract: A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach uses tabular Q-learning, based on action sequences, within a discretized quantum state space to manage the exponential growth of the space dimension. The framework introduces a hybrid reward mechanism that combines a static, domain-informed reward, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. This reward is circuit-aware, in contrast to the current trend of works on this topic, which are primarily fidelity-based. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while minimizing computational overhead. Benchmarking on graph-state preparation tasks for up to seven qubits, we demonstrate that the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's robustness and adaptability. The results confirm that this RL-driven approach, with our completely circuit-aware method, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
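
The hybrid reward summarized in the abstract (a static reward minus dynamic penalties for state revisits and gate congestion, fed into a tabular Q-learning update) might look roughly like the Python sketch below. This record does not give the paper's actual reward formulas, so the penalty forms, the weights, the congestion limit, and the names `hybrid_reward` and `q_update` are all illustrative assumptions rather than the authors' implementation.

```python
from collections import defaultdict


def hybrid_reward(static_reward, state, visits, gates_on_qubit,
                  revisit_penalty=0.5, congestion_penalty=0.1,
                  congestion_limit=2):
    """Static reward minus dynamic penalties (assumed forms and weights)."""
    # Penalize every earlier visit to this discretized state.
    revisit_cost = revisit_penalty * visits[state]
    # Penalize gates stacked on one qubit beyond an assumed limit.
    excess = max(0, gates_on_qubit - congestion_limit)
    congestion_cost = congestion_penalty * excess
    return static_reward - revisit_cost - congestion_cost


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update, applied in place."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Toy usage with hypothetical state labels and gate names.
actions = ["H0", "CZ01"]
q = defaultdict(float)        # Q-table keyed by (state, action)
visits = defaultdict(int)     # visit counts per discretized state
visits["s1"] = 2              # pretend "s1" was already reached twice
r = hybrid_reward(1.0, "s1", visits, gates_on_qubit=4)
new_q = q_update(q, "s0", "H0", r, "s1", actions)
```

In this toy call the two penalties outweigh the static reward, so the value of taking `H0` in `s0` becomes slightly negative, which is the kind of signal that would steer an agent away from redundant or congested circuits.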