Leveraging Multi-Instance GPUs through moldable task scheduling

Villarrubia Elvira, Jorge; Costero Valero, Luis María; Olcoz Herrero, Katzalin; Igual Peña, Francisco Daniel

doi:10.1016/j.jpdc.2025.105128

dc.contributor.author	Villarrubia Elvira, Jorge
dc.contributor.author	Costero Valero, Luis María
dc.contributor.author	Olcoz Herrero, Katzalin
dc.contributor.author	Igual Peña, Francisco Daniel
dc.date.accessioned	2026-02-23T15:30:38Z
dc.date.available	2026-02-23T15:30:38Z
dc.date.issued	2025
dc.description.abstract	NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable task scheduling with dynamic reconfigurations. Specifically, we propose a makespan minimization problem for multi-task execution under MIG constraints. Our profiling shows that assuming monotonicity of task work with respect to resources, as is usual in multicore scheduling, is not viable. Relying on a state-of-the-art proposal that does not require such an assumption, we present FAR, a 3-phase algorithm to solve the problem. Phase 1 of FAR builds on a classical task moldability method, phase 2 combines Longest Processing Time First and List Scheduling with a novel repartitioning tree heuristic tailored to MIG constraints, and phase 3 employs local search via task moves and swaps. FAR schedules tasks in batches offline, concatenating their schedules on the fly in an improved way that favors resource reuse. Excluding reconfiguration costs, the List Scheduling proof shows an approximation factor of 7/4 on the NVIDIA A30 model. We adapt the technique to the particular constraints of an NVIDIA A100/H100 to obtain an approximation factor of 2. Including the reconfiguration cost, our real-world experiments reveal a makespan no worse than 1.22× the optimum for a well-known suite of benchmarks, and no worse than 1.10× for synthetic inputs inspired by real kernels. We obtain good experimental results for each batch of tasks, and also for the concatenation of batches, with large improvements over the state-of-the-art and over proposals without GPU reconfiguration. Moreover, we show that the proposed heuristics adapt correctly to tasks of very different characteristics.
Beyond the specific algorithm, the paper demonstrates the research potential of the MIG technology and suggests useful metrics, workload characterizations and evaluation techniques for future work in this field.
dc.description.department	Depto. de Arquitectura de Computadores y Automática
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	pub
dc.identifier.doi	10.1016/j.jpdc.2025.105128
dc.identifier.uri	https://hdl.handle.net/20.500.14352/132930
dc.journal.title	Journal of Parallel and Distributed Computing
dc.language.iso	eng
dc.publisher	Elsevier
dc.rights.accessRights	open access
dc.subject.keyword	Multi-Instance GPU (MIG)
dc.subject.keyword	Moldable Resource Management
dc.subject.keyword	Task Scheduling
dc.subject.ucm	Computer Science (Computer Science)
dc.subject.unesco	33 Technological Sciences
dc.title	Leveraging Multi-Instance GPUs through moldable task scheduling
dc.type	journal article
dc.type.hasVersion	AM
dc.volume.number	24
dspace.entity.type	Publication
relation.isAuthorOfPublication	8788ef00-9b4e-469d-8693-d45f3dfa836a
relation.isAuthorOfPublication	b2616c88-d3da-43df-86cb-3ced1084f460
relation.isAuthorOfPublication	8cfc18ec-4816-404d-982d-21dc07318c07
relation.isAuthorOfPublication	e1ed9960-37d5-4817-8e5c-4e0e392b4d66
relation.isAuthorOfPublication.latestForDiscovery	8788ef00-9b4e-469d-8693-d45f3dfa836a
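
Illustrative note: the abstract states that phase 2 of FAR combines Longest Processing Time First (LPT) with List Scheduling. The sketch below shows only the textbook LPT list-scheduling rule on identical machines, as a reminder of the underlying idea. It is not the FAR algorithm: it ignores MIG partitioning constraints, task moldability, the repartitioning tree heuristic, and reconfiguration costs. The function name and the example durations are hypothetical.

```python
import heapq


def lpt_list_schedule(durations, num_machines):
    """Textbook LPT list scheduling on identical machines (not FAR).

    Tasks are taken in order of decreasing duration, and each one is
    placed on the machine that currently has the smallest load.
    Returns (assignment, makespan), where assignment[m] lists the task
    indices placed on machine m.
    """
    # Min-heap of (current load, machine index).
    loads = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(num_machines)]

    # LPT order: longest task first (ties keep their input order).
    order = sorted(range(len(durations)),
                   key=lambda t: durations[t], reverse=True)

    for task in order:
        load, m = heapq.heappop(loads)      # least-loaded machine
        assignment[m].append(task)
        heapq.heappush(loads, (load + durations[task], m))

    makespan = max(load for load, _ in loads)
    return assignment, makespan


# Hypothetical durations, two machines.
assignment, makespan = lpt_list_schedule([7, 5, 4, 3, 3, 2], 2)
print(assignment, makespan)
```

On identical machines this greedy rule keeps the makespan within a constant factor of the optimum; the approximation factors quoted in the abstract (7/4 for the A30 model, 2 for A100/H100) refer to the paper's own MIG-specific analysis, not to this sketch.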

Original bundle

Name: Leveraging_multi-instance_GPUs.pdf
Size: 1.68 MB
Format: Adobe Portable Document Format

Collections

Articles