Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

Herrero, José Ramón; Quintana-Ortí, Enrique S.; Catalán Pallarés, Sandra; Igual Peña, Francisco Daniel; Rodríguez Sánchez, Rafael

doi:10.1016/j.jpdc.2023.01.004

dc.contributor.author	Herrero, José Ramón
dc.contributor.author	Quintana-Ortí, Enrique S.
dc.contributor.author	Catalán Pallarés, Sandra
dc.contributor.author	Igual Peña, Francisco Daniel
dc.contributor.author	Rodríguez Sánchez, Rafael
dc.date.accessioned	2025-01-21T12:23:02Z
dc.date.available	2025-01-21T12:23:02Z
dc.date.issued	2023-01-19
dc.description.abstract	We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming.
dc.description.department	Sección Deptal. de Arquitectura de Computadores y Automática (Físicas)
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	pub
dc.identifier.citation	Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí, Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures, Journal of Parallel and Distributed Computing, Volume 175, 2023, Pages 51-65, ISSN 0743-7315, https://doi.org/10.1016/j.jpdc.2023.01.004.
dc.identifier.doi	10.1016/j.jpdc.2023.01.004
dc.identifier.officialurl	https://www.sciencedirect.com/science/article/pii/S0743731523000047?via%3Dihub
dc.identifier.uri	https://hdl.handle.net/20.500.14352/115348
dc.journal.title	Journal of Parallel and Distributed Computing
dc.language.iso	eng
dc.page.final	65
dc.page.initial	51
dc.publisher	Elsevier
dc.rights.accessRights	open access
dc.subject.keyword	NUMA architectures
dc.subject.keyword	Chiplets
dc.subject.keyword	Dense linear algebra
dc.subject.keyword	Shared-memory programming
dc.subject.keyword	Portability
dc.subject.ucm	Software
dc.subject.unesco	33 Ciencias Tecnológicas
dc.title	Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
dc.type	journal article
dc.type.hasVersion	AM
dc.volume.number	175
dspace.entity.type	Publication
relation.isAuthorOfPublication	9c042df5-5a71-4088-a155-194f339a226e
relation.isAuthorOfPublication	e1ed9960-37d5-4817-8e5c-4e0e392b4d66
relation.isAuthorOfPublication	02e9ebb2-af1f-451a-a819-47cb4e4ce515
relation.isAuthorOfPublication.latestForDiscovery	9c042df5-5a71-4088-a155-194f339a226e
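
The abstract describes multi-domain implementations with a hybrid task- and loop-level parallelization that fixes core-to-data binding. The sketch below is a hypothetical illustration of that general idea and is not the authors' code. It assumes a block-cyclic ownership of block columns across NUMA domains (an assumption; the paper's actual distribution may differ). Work is split at two levels: an outer, task-like level in which each domain owns whole block columns, and an inner, loop-like level in which every core of the owning domain always receives the same static slice of those columns, so a given core keeps touching the same data. All names (`Domain`, `make_domains`, `owned_blocks`, `core_plan`, `split_range`) are invented for this example. In a real shared-memory code, the fixed binding would typically come from OpenMP settings such as `OMP_PROC_BIND` and `OMP_PLACES` combined with first-touch page placement.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A NUMA domain and the (hypothetical) core ids that belong to it."""
    domain_id: int
    cores: tuple[int, ...]


def make_domains(n_cores: int, n_domains: int) -> list[Domain]:
    """Split cores 0..n_cores-1 into n_domains equal, contiguous groups."""
    if n_domains <= 0 or n_cores % n_domains != 0:
        raise ValueError("n_cores must be a positive multiple of n_domains")
    per = n_cores // n_domains
    return [Domain(d, tuple(range(d * per, (d + 1) * per))) for d in range(n_domains)]


def split_range(start: int, stop: int, parts: int) -> list[range]:
    """Static, contiguous split of [start, stop) into `parts` nearly equal slices."""
    base, extra = divmod(stop - start, parts)
    slices, lo = [], start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        slices.append(range(lo, hi))
        lo = hi
    return slices


def owned_blocks(n: int, b: int, domains: list[Domain]) -> dict[int, list[int]]:
    """Outer (task-like) level: block column j is owned by domain j mod len(domains)."""
    n_blocks = (n + b - 1) // b
    plan: dict[int, list[int]] = {d.domain_id: [] for d in domains}
    for j in range(n_blocks):
        plan[domains[j % len(domains)].domain_id].append(j)
    return plan


def core_plan(n: int, b: int, domains: list[Domain]) -> dict[int, list[tuple[int, range]]]:
    """Inner (loop-like) level: each core of the owning domain gets a fixed slice of
    every block column that domain owns, so the same core always touches the same columns."""
    blocks = owned_blocks(n, b, domains)
    plan: dict[int, list[tuple[int, range]]] = {c: [] for d in domains for c in d.cores}
    for dom in domains:
        for j in blocks[dom.domain_id]:
            cols = split_range(j * b, min((j + 1) * b, n), len(dom.cores))
            for core, sl in zip(dom.cores, cols):
                plan[core].append((j, sl))
    return plan


if __name__ == "__main__":
    # Hypothetical setup: 8 cores in 2 NUMA domains, a 1000-column matrix, block size 256.
    doms = make_domains(8, 2)
    print(owned_blocks(1000, 256, doms))  # {0: [0, 2], 1: [1, 3]}
    print(core_plan(1000, 256, doms)[4])  # [(1, range(256, 320)), (3, range(768, 826))]
```

Because the column slices are computed statically rather than handed out dynamically, repeated sweeps over the matrix (for example, successive trailing updates in a blocked factorization) would revisit the same data from the same core, which is the locality property the abstract refers to. A dynamic scheduler would balance load more easily but could migrate data accesses across domains.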
