Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures
dc.contributor.author | Herrero, José Ramón | |
dc.contributor.author | Quintana-Ortí, Enrique S. | |
dc.contributor.author | Catalán Pallarés, Sandra | |
dc.contributor.author | Igual Peña, Francisco Daniel | |
dc.contributor.author | Rodríguez Sánchez, Rafael | |
dc.date.accessioned | 2025-01-21T12:23:02Z | |
dc.date.available | 2025-01-21T12:23:02Z | |
dc.date.issued | 2023-01-19 | |
dc.description.abstract | We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming. | |
dc.description.department | Sección Deptal. de Arquitectura de Computadores y Automática (Físicas) | |
dc.description.faculty | Fac. de Informática | |
dc.description.refereed | TRUE | |
dc.description.status | pub | |
dc.identifier.citation | Sandra Catalán, Francisco D. Igual, José R. Herrero, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí, Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures, Journal of Parallel and Distributed Computing, Volume 175, 2023, Pages 51-65, ISSN 0743-7315, https://doi.org/10.1016/j.jpdc.2023.01.004. | |
dc.identifier.doi | 10.1016/j.jpdc.2023.01.004 | |
dc.identifier.officialurl | https://www.sciencedirect.com/science/article/pii/S0743731523000047?via%3Dihub | |
dc.identifier.uri | https://hdl.handle.net/20.500.14352/115348 | |
dc.journal.title | Journal of Parallel and Distributed Computing | |
dc.language.iso | eng | |
dc.page.final | 65 | |
dc.page.initial | 51 | |
dc.publisher | Elsevier | |
dc.rights.accessRights | open access | |
dc.subject.keyword | NUMA architectures | |
dc.subject.keyword | Chiplets | |
dc.subject.keyword | Dense linear algebra | |
dc.subject.keyword | Shared-memory programming | |
dc.subject.keyword | Portability | |
dc.subject.ucm | Software | |
dc.subject.unesco | 33 Technological Sciences | |
dc.title | Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures | |
dc.type | journal article | |
dc.type.hasVersion | AM | |
dc.volume.number | 175 | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 9c042df5-5a71-4088-a155-194f339a226e | |
relation.isAuthorOfPublication | e1ed9960-37d5-4817-8e5c-4e0e392b4d66 | |
relation.isAuthorOfPublication | 02e9ebb2-af1f-451a-a819-47cb4e4ce515 | |
relation.isAuthorOfPublication.latestForDiscovery | 9c042df5-5a71-4088-a155-194f339a226e | |