Hybrid reward-driven reinforcement learning for efficient quantum circuit synthesis

Giordano, Sara; Sen, Kornikar; Martín-Delgado Alcántara, Miguel Ángel

doi:10.1007/s42484-026-00359-8

Download

Quantum Mach. Intell. 8, 9 (2026).pdf (4.75 MB)

Official URL

https://dx.doi.org/10.1007/s42484-026-00359-8

Publication date

2026

Authors

Giordano, Sara

Sen, Kornikar

Martín-Delgado Alcántara, Miguel Ángel

Publisher

Springer

URI

https://hdl.handle.net/20.500.14352/133698

Citation

Giordano, Sara, et al. "Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis." Quantum Machine Intelligence, vol. 8, no. 1, June 2026, p. 9. DOI.org (Crossref), https://doi.org/10.1007/s42484-026-00359-8.

Abstract

A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach utilizes tabular Q-learning, based on action sequences, within a discretized quantum state space, to effectively manage the exponential growth of the space dimension. The framework introduces a hybrid reward mechanism, combining a static, domain-informed reward that guides the agent toward the target state with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. This is a circuit-aware reward, in contrast to the current trend of works on this topic, which are primarily fidelity-based. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while minimizing computational overhead. Benchmarking on graph-state preparation tasks for up to seven qubits, we demonstrate that the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm’s robustness and adaptability. The results confirm that this RL-driven approach, with our completely circuit-aware method, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
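
The ideas named in the abstract (tabular Q-learning over a discretized state space, with a static reward plus dynamic penalties) can be illustrated with a toy sketch. This is not the authors' implementation. The two-qubit gate set, the reward values, the rounding-based discretization, the "same gate twice in a row" stand-in for gate congestion, and all function names below are assumptions chosen for illustration. The sparse matrix representation mentioned in the abstract is not used here.

```python
import math
import random
from collections import defaultdict

S = 1 / math.sqrt(2)
ACTIONS = ("H0", "H1", "CZ")      # assumed toy gate set on two qubits
INITIAL = (1.0, 0.0, 0.0, 0.0)    # |00>, amplitude index = 2*b1 + b0
TARGET = (0.5, 0.5, 0.5, -0.5)    # two-qubit graph state CZ (H x H)|00>

def apply_gate(state, action):
    """Apply H on qubit 0/1 or CZ to a real 4-amplitude state vector."""
    new = list(state)
    if action == "CZ":
        new[3] = -new[3]
    else:
        mask = 1 << int(action[1])
        for i in range(4):
            if not i & mask:
                a, b = state[i], state[i | mask]
                new[i], new[i | mask] = S * (a + b), S * (a - b)
    return tuple(new)

def discretize(state, decimals=3):
    """Round amplitudes so a state can be used as a Q-table key."""
    return tuple(round(x, decimals) + 0.0 for x in state)  # +0.0 maps -0.0 to 0.0

def is_target(state):
    overlap = sum(a * b for a, b in zip(state, TARGET))
    return abs(overlap) > 1 - 1e-9

def hybrid_reward(state, key, action, prev_action, visited):
    """Static part: target bonus and per-gate cost. Dynamic part: penalties."""
    if is_target(state):
        return 10.0                # assumed static reward for reaching the target
    reward = -1.0                  # assumed static cost per gate
    if key in visited:
        reward -= 2.0              # dynamic penalty: redundant state revisit
    if action == prev_action:
        reward -= 1.0              # dynamic penalty: crude congestion stand-in
    return reward

def train(episodes=3000, max_steps=6, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)         # Q-table keyed by (discretized state, action)
    for ep in range(episodes):
        eps = 0.3 * (1 - ep / episodes)   # exploration rate decays linearly to 0
        state, prev = INITIAL, None
        key = discretize(state)
        visited = {key}
        for _ in range(max_steps):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(key, a)])
            nxt = apply_gate(state, action)
            nkey = discretize(nxt)
            r = hybrid_reward(nxt, nkey, action, prev, visited)
            done = is_target(nxt)
            best_next = 0.0 if done else max(q[(nkey, a)] for a in ACTIONS)
            q[(key, action)] += alpha * (r + gamma * best_next - q[(key, action)])
            if done:
                break
            visited.add(nkey)
            state, key, prev = nxt, nkey, action
    return q

def greedy_circuit(q, max_steps=6):
    """Follow the learned greedy policy from the initial state."""
    state, circuit = INITIAL, []
    while not is_target(state) and len(circuit) < max_steps:
        key = discretize(state)
        action = max(ACTIONS, key=lambda a: q[(key, a)])
        circuit.append(action)
        state = apply_gate(state, action)
    return circuit, is_target(state)

q_table = train()
circuit, reached = greedy_circuit(q_table)
print(circuit, reached)
```

In this toy setting the target needs both Hadamards and one CZ, so any circuit reaching it has at least three gates. The penalties only lower the value of detours such as applying a gate that leaves the state unchanged.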

UCM subjects

Computer science (Computer science), Quantum theory, Artificial intelligence (Computer science)

Unesco subjects

1203.04 Artificial Intelligence, 2212.12 Quantum Field Theory, 1203.02 Algorithmic Languages

Collections

Articles
