Person:
Prieto Matías, Manuel

First Name

Manuel

Last Name

Prieto Matías

URI

https://hdl.handle.net/20.500.14352/78600

Affiliation

Universidad Complutense de Madrid

Faculty / Institute

Informática

Department

Arquitectura de Computadores y Automática

Area

Arquitectura y Tecnología de Computadores

Identifiers

Full item page

Search Results

Now showing 1 - 5 of 5

Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
(IEEE Transactions on Computers, 2024) Mallasén Quintana, David; Del Barrio García, Alberto Antonio; Prieto Matías, Manuel
The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in delivering even higher accuracy in such computations. In this work, we explore the native use of 64-bit posits in a series of numerical benchmarks and compare their timing performance, accuracy and hardware cost to IEEE 754 doubles. In addition, we also study the conjugate gradient method for numerically solving systems of linear equations in real-world applications. For this, we extend the PERCIVAL RISC-V core and the Xposit custom RISC-V extension with posit64 and quire operations. Results show that posit64 can obtain up to 4 orders of magnitude lower mean square error than doubles. This leads to a reduction in the number of iterations required for convergence in some iterative solvers. However, leveraging the quire accumulator register can limit the order of some operations such as matrix multiplications. Furthermore, detailed FPGA and ASIC synthesis results highlight the significant hardware cost of 64-bit posit arithmetic and quire. Despite this, the large accuracy improvements achieved with the same memory bandwidth suggest that posit arithmetic may provide a potential alternative representation for scientific computing.
PERCIVAL: Open-source posit RISC-V core with quire capability
(IEEE transactions on emerging topics in computing, 2022) Mallasén Quintana, David; Murillo Montero, Raúl; Barrio García, Alberto Antonio del; Botella Juan, Guillermo; Prieto Matías, Manuel
The posit representation for real numbers is an alternative to the ubiquitous IEEE 754 floating-point standard. In this work, we present PERCIVAL, an application-level posit RISC-V core based on CVA6 that can execute all posit instructions, including the quire fused operations. This solves the obstacle encountered by previous works, which only included partial posit support or which had to emulate posits in software. In addition, Xposit, a RISC-V extension for posit instructions is incorporated into LLVM. Therefore, PERCIVAL is the first work that integrates the complete posit instruction set in hardware. These elements allow for the native execution of posit instructions as well as the standard floating-point ones, further permitting the comparison of these representations. FPGA and ASIC synthesis show the hardware cost of implementing 32-bit posits and highlight the significant overhead of including a quire accumulator. However, results show that the quire enables a more accurate execution of dot products. In general matrix multiplications, the accuracy error is reduced up to 4 orders of magnitude. Furthermore, performance comparisons show that these accuracy improvements do not hinder their execution, as posits run as fast as single-precision floats and exhibit better timing than double-precision floats, thus potentially providing an alternative representation.
LFOC+: A Fair OS-Level Cache-Clustering Policy for Commodity Multicore Systems
(IEEE Transactions on Computers, 2022) Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Fanizzi, Graziano; Prieto Matías, Manuel
Commodity multicore systems are increasingly adopting hardware support that enables the system software to partition the last-level cache (LLC). This support makes it possible for the operating system (OS) or the Virtual Machine Monitor (VMM) to mitigate shared-resource contention effects on multicores by assigning different co-running applications to various cache partitions. Recently cache-clustering (or partition-sharing) strategies have emerged as a way to improve system throughput and fairness on new platforms with cache-partitioning support. As opposed to strict cache-partitioning, which allocates separate cache partitions to each application, cache-clustering allows partitions to be shared by a group of applications. In this article we propose LFOC+, a fairness-aware OS-level cache-clustering policy for commodity multicore systems. LFOC+ tries to mimic the behavior of the optimal cache-clustering solution for fairness, which we could obtain for different workload scenarios by using a simulation tool. Our dynamic cache-clustering strategy continuously gathers data fromperformancemonitoring counters to classify applications at runtime based on the degree of cache sensitivity and contentiousness, and effectively separates cache-sensitive applications fromaggressor programs to improve fairness,while providing acceptable system throughput.We implemented LFOC+ in the Linux kernel and evaluated it on a real systemfeaturing an Intel Skylake processor, wherewe compare its effectiveness to that of four previously proposed cache-clustering policies. Our experimental análisis reveals that LFOC+ constitutes a lightweight OS-level policy and improves fairness relative to two other state-of-the-art fairness-aware strategies –Dunn and LFOC–, by up to 22% and up to 20.6%, respectively, and by9% and 4.9%on average.
Evaluation of Intel's DPC++ Compatibility Tool in heterogeneous computing
(Journal of Parallel and Distributed Computing, 2022) Castaño Roldán, Germán; Faqir-Rhazoui, Youssef; García Sánchez, Carlos; Prieto Matías, Manuel
The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), thus assisting in the migration process. DPC++ is an implementation of the programming standard for heterogeneous computing known as SYCL, which unifies the development of parallel applications on CPUs, GPUs or even FPGAs. This paper analyzes the DPC++ Compatibility Tool by considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks. For this suite, this tool achieves an impressive rate of almost 87% for code successfully migrated. Moreover, a comparative study of the performance obtained by the migrated code was carried out, showing a moderate overhead in most of the migrated examples. Finally, a performance comparison on different devices was also performed.
Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors
(2020) Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Prieto Matías, Manuel
Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that asymmetric designs may deliver higher energy efficiency than symmetric multicores for diverse workloads. Despite their benefits, AMPs pose significant challenges to runtime systems of parallel programming models. While previous work has mainly explored how to efficiently execute task-based parallel applications on AMPs, via enhancements in the runtime system, improving the performance of unmodified data-parallel applications on these architectures is still a big challenge. In this work we analyze the particular case of loop-based OpenMP applications, which are widely used today in scientific and engineering domains, and constitute the dominant application type in many parallel benchmark suites used for performance evaluation on multicore systems. We observed that conventional loop-scheduling OpenMP approaches are unable to efficiently cope with the load imbalance that naturally stems from the different performance delivered by big and small cores. To address this shortcoming, we propose Asymmetric Iteration Distribution (AID), a set of novel loop-scheduling methods for AMPs that distribute iterations unevenly across worker threads to efficiently deal with performance asymmetry. We implemented AID in libgomp –the GNU OpenMP runtime system–, and evaluated it on two different asymmetric multicore platforms. Our analysis reveals that the AID methods constitute effective replacements of the static and dynamic methods on AMPs, and are capable of improving performance over these conventional strategies by up to 56% and 16.8%, respectively.

Person:
Prieto Matías, Manuel

First Name

Last Name

URI

Affiliation

Faculty / Institute

Department

Area

Identifiers

Filters

Author

Subject

Date

Has files

Item Type

Settings

Sort By

Results per page

Search Results

Person: Prieto Matías, Manuel

First Name

Last Name

URI

Affiliation

Faculty / Institute

Department

Area

Identifiers

Filters

Author

Subject

Date

Has files

Item Type

Settings

Sort By

Results per page

Search Results

Person:
Prieto Matías, Manuel