Solving the task scheduling and GPU reconfiguration problem on MIG devices via deep reinforcement learning

dc.contributor.authorVillarrubia Elvira, Jorge
dc.contributor.authorCostero Valero, Luis María
dc.contributor.authorIgual Peña, Francisco Daniel
dc.contributor.authorOlcoz Herrero, Katzalin
dc.date.accessioned2025-11-17T18:35:21Z
dc.date.available2025-11-17T18:35:21Z
dc.date.issued2026-03
dc.description© 2025 The Author(s).
dc.description.abstractRecent advances in dynamic GPU partitioning, such as NVIDIA's Multi-Instance GPU (MIG) technology, have enhanced resource utilization by enabling task co-execution without contention. However, existing MIG schedulers remain limited to static or task-agnostic methods that sacrifice optimality for tractability. This paper presents a Deep Reinforcement Learning framework that seeks to minimize the completion time of a task queue by holistically addressing the dimensions of the problem: task molding, GPU reconfiguration and execution order. To manage the vast solution space, we apply optimizations such as discrete and canonical representation of states, unification of equivalent configurations, action masking, and promoting the exploration of reconfigurations; these offer insights for similar resource management scenarios. The proposed models are extensively evaluated with widely used benchmarks from the Rodinia and Altis suites, and with synthetic workloads generated to emulate a wide range of plausible real situations. The final model improves on the state of the art, especially in workloads that clearly contradict the assumptions of previous proposals, coming within 20% of the optimum. Additionally, two different approaches to the problem are considered (offline vs. online), discussing their theoretical advantages and disadvantages, and evaluating them experimentally for the final model.
dc.description.departmentDepto. de Arquitectura de Computadores y Automática
dc.description.facultyFac. de Ciencias Físicas
dc.description.facultyFac. de Informática
dc.description.refereedTRUE
dc.description.sponsorshipMinisterio de Ciencia e Innovación (España)
dc.description.sponsorshipAgencia Estatal de Investigación
dc.description.sponsorshipEuropean Commission
dc.description.statuspub
dc.identifier.citationVillarrubia Elvira, Jorge, et al. «Solving the task scheduling and GPU reconfiguration problem on MIG devices via deep reinforcement learning». Future Generation Computer Systems, vol. 176, March 2026, p. 108145. https://doi.org/10.1016/j.future.2025.108145.
dc.identifier.doi10.1016/j.future.2025.108145
dc.identifier.essn1872-7115
dc.identifier.issn0167-739X
dc.identifier.officialurlhttps://dx.doi.org/10.1016/j.future.2025.108145
dc.identifier.relatedurlhttps://www.sciencedirect.com/science/article/pii/S0167739X2500439X?via%3Dihub
dc.identifier.urihttps://hdl.handle.net/20.500.14352/126143
dc.journal.titleFuture Generation Computer Systems
dc.language.isoeng
dc.page.final108145-16
dc.page.initial108145-1
dc.publisherElsevier
dc.relation.projectIDinfo:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2021-126576NB-I00/ES/SOFTWARE DE SISTEMA PARA ARQUITECTURAS Y APLICACIONES DE NUEVA GENERACION/
dc.rightsAttribution-NonCommercial 4.0 International
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by-nc/4.0/
dc.subject.cdu004
dc.subject.keywordMulti-Instance GPU (MIG)
dc.subject.keywordMoldable resource management
dc.subject.keywordDeep reinforcement learning
dc.subject.keywordTask scheduling
dc.subject.ucmInformática (Informática)
dc.subject.unesco1203.17 Informática
dc.titleSolving the task scheduling and GPU reconfiguration problem on MIG devices via deep reinforcement learning
dc.typejournal article
dc.type.hasVersionVoR
dc.volume.number176
dspace.entity.typePublication
relation.isAuthorOfPublication8788ef00-9b4e-469d-8693-d45f3dfa836a
relation.isAuthorOfPublicationb2616c88-d3da-43df-86cb-3ced1084f460
relation.isAuthorOfPublicatione1ed9960-37d5-4817-8e5c-4e0e392b4d66
relation.isAuthorOfPublication8cfc18ec-4816-404d-982d-21dc07318c07
relation.isAuthorOfPublication.latestForDiscovery8788ef00-9b4e-469d-8693-d45f3dfa836a

Download

Original bundle

Name: Future Generation Computer Systems, vol 176, march 2026, 108145.pdf
Size: 25.34 MB
Format: Adobe Portable Document Format

Collections