Person:
Prieto Matías, Manuel

First Name

Manuel

Last Name

Prieto Matías

URI

https://hdl.handle.net/20.500.14352/78600

Affiliation

Universidad Complutense de Madrid

Faculty / Institute

Informática

Department

Arquitectura de Computadores y Automática

Area

Arquitectura y Tecnología de Computadores

Search Results

Customized Nios II multi-cycle instructions to accelerate block-matching techniques
(SPIE Proceedings, 2015) González, Diego; Botella Juan, Guillermo; García Sánchez, Carlos; Meyer Bäse, Anke; Meyer Bäse, Uwe; Prieto Matías, Manuel
This study focuses on accelerating the optimization of motion estimation algorithms, which are widely used in video coding standards, by using both the paradigm based on Altera Custom Instructions and the efficient combination of SDRAM and On-Chip memory of the Nios II processor. First, complete code profiling is carried out before the optimization in order to detect the time leaks affecting the motion compensation algorithms. Then, a multi-cycle Custom Instruction, which will be added to the specific embedded design, is implemented. The approach deployed is based on optimizing SoC performance by using an efficient combination of On-Chip memory and SDRAM with regard to the reset vector, exception vector, stack, heap, read/write data (.rwdata), read-only data (.rodata), and program text (.text) in the design. Furthermore, this approach aims to enhance these algorithms by incorporating Custom Instructions in the Nios II ISA. Finally, the efficient combination of both methods is developed to build the final embedded system. The present contribution thus facilitates motion coding for low-cost Soft-Core microprocessors, particularly the RISC architecture of Nios II implemented in FPGA. It enables us to construct an SoC which processes 50×50 @ 180 fps.
2-D wavelet transform enhancement on general-purpose microprocessors: memory hierarchy and SIMD parallelism exploitation
(High performance computing - HIPC 2002, proceedings, 2002) Chaver Martínez, Daniel Ángel; Tenllado van der Reijden, Christian; Piñuel Moreno, Luis; Prieto Matías, Manuel; Tirado Fernández, Francisco
This paper addresses the implementation of a 2-D Discrete Wavelet Transform on general-purpose microprocessors, focusing on both memory hierarchy and SIMD parallelization issues. Both topics are somewhat related, since SIMD extensions are only useful if the memory hierarchy is efficiently exploited. In this work, locality has been significantly improved by means of a novel approach called pipelined computation, which complements previous techniques based on loop tiling and non-linear layouts. As experimental platforms we have employed a Pentium-III (P-III) and a Pentium-4 (P-4) microprocessor. However, our SIMD-oriented tuning has been exclusively performed at source code level. Basically, we have reordered some loops and introduced some modifications that allow automatic vectorization. Taking into account the abstraction level at which the optimizations are carried out, the speedups obtained on the investigated platforms are quite satisfactory, even though further improvement can be obtained by dropping the level of abstraction (compiler intrinsics or assembly code).
A low cost matching motion estimation sensor based on the NIOS II microprocessor.
(Sensors, 2012) González, Diego; Botella Juan, Guillermo; Meyer Baese, Uwe; García Sánchez, Carlos; Sanz, Concepción; Prieto Matías, Manuel; Tirado Fernández, Francisco
Fast-Coding Robust Motion Estimation Model in a GPU
(2015) García Sánchez, Carlos; Botella Juan, Guillermo; Sande, Francisco de; Prieto Matías, Manuel
Nowadays vision systems are used for countless purposes. Motion estimation is a discipline that allows relevant information to be extracted, such as pattern segmentation, 3D structure or object tracking. However, the real-time requirements of most applications have limited its consolidation, given the need to adopt high-performance systems to meet response times. With the emergence of highly parallel devices known as accelerators, this gap has narrowed. Two extreme endpoints in the spectrum of the most common accelerators are Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs), which usually offer higher performance rates than general-purpose processors. Moreover, the use of GPUs as accelerators involves the efficient exploitation of any parallelism in the target application. This task is not easy because performance rates are affected by many aspects that programmers must address. In this paper, we evaluate the OpenACC standard, a directive-based programming model that eases porting code to a GPU, in the context of a motion estimation application. The results confirm that this programming paradigm is suitable for image processing applications of this kind, achieving a very satisfactory acceleration in convolution-based problems such as the well-known Lucas & Kanade method.
Big-PERCIVAL: Exploring the Native Use of 64-Bit Posit Arithmetic in Scientific Computing
(IEEE Transactions on Computers, 2024) Mallasén Quintana, David; Del Barrio García, Alberto Antonio; Prieto Matías, Manuel
The accuracy requirements in many scientific computing workloads result in the use of double-precision floating-point arithmetic in the execution kernels. Nevertheless, emerging real-number representations, such as posit arithmetic, show promise in delivering even higher accuracy in such computations. In this work, we explore the native use of 64-bit posits in a series of numerical benchmarks and compare their timing performance, accuracy and hardware cost to IEEE 754 doubles. In addition, we also study the conjugate gradient method for numerically solving systems of linear equations in real-world applications. For this, we extend the PERCIVAL RISC-V core and the Xposit custom RISC-V extension with posit64 and quire operations. Results show that posit64 can obtain up to 4 orders of magnitude lower mean square error than doubles. This leads to a reduction in the number of iterations required for convergence in some iterative solvers. However, leveraging the quire accumulator register can limit the order of some operations such as matrix multiplications. Furthermore, detailed FPGA and ASIC synthesis results highlight the significant hardware cost of 64-bit posit arithmetic and quire. Despite this, the large accuracy improvements achieved with the same memory bandwidth suggest that posit arithmetic may provide a potential alternative representation for scientific computing.
Offset printing plate quality sensor on a low-cost processor
(Sensors, 2013) Poljak, Jelena; Botella Juan, Guillermo; García Sánchez, Carlos; Poljacek, Sanja Mahovic; Prieto Matías, Manuel; Tirado Fernández, Francisco
The aim of this work is to develop a microprocessor-based sensor that measures the quality of the offset printing plate through the introduction of different image analysis applications. The main features of the presented system are its low cost, low power consumption, modularity and easy integration with other industrial modules for printing plates, and robustness in noisy environments. For the sake of clarity, a viability analysis of previous software is presented through different strategies, based on dynamic histogram and Hough transform. This paper provides performance and scalability data compared with existing costly commercial devices. Furthermore, a general overview of quality control possibilities for printing plates is presented and could be useful to a system where such controls are regularly conducted.
Formation of stellar inner discs and rings in spiral galaxies through minor mergers
(Fourth Science Meeting with the GTC, 2013) Eliche Moral, María del Carmen; González García, A. C.; Balcells, M.; Aguerri, J.A.L.; Gallego Maestro, Jesús; Zamorano Calvo, Jaime; Prieto Matías, Manuel
Recent observations show that inner disks and rings (IDs and IRs) are not preferentially found in barred galaxies, pointing to the relevance of formation mechanisms different to the traditional bar-origin scenario. Nevertheless, the role of minor mergers in the formation of these inner components (ICs), while often invoked, is still poorly understood. We have investigated the capability of minor mergers to trigger the formation of IDs and IRs in spiral galaxies through collisionless N-body simulations. Our models prove that minor mergers are an efficient mechanism to form rotationally-supported stellar ICs in spirals, neither requiring strong dissipation nor noticeable bars, and suggest that their role in the formation of ICs must have been much more complex than just bar triggering.
PERCIVAL: Open-source posit RISC-V core with quire capability
(IEEE transactions on emerging topics in computing, 2022) Mallasén Quintana, David; Murillo Montero, Raúl; Barrio García, Alberto Antonio del; Botella Juan, Guillermo; Prieto Matías, Manuel
The posit representation for real numbers is an alternative to the ubiquitous IEEE 754 floating-point standard. In this work, we present PERCIVAL, an application-level posit RISC-V core based on CVA6 that can execute all posit instructions, including the quire fused operations. This solves the obstacle encountered by previous works, which only included partial posit support or which had to emulate posits in software. In addition, Xposit, a RISC-V extension for posit instructions is incorporated into LLVM. Therefore, PERCIVAL is the first work that integrates the complete posit instruction set in hardware. These elements allow for the native execution of posit instructions as well as the standard floating-point ones, further permitting the comparison of these representations. FPGA and ASIC synthesis show the hardware cost of implementing 32-bit posits and highlight the significant overhead of including a quire accumulator. However, results show that the quire enables a more accurate execution of dot products. In general matrix multiplications, the accuracy error is reduced up to 4 orders of magnitude. Furthermore, performance comparisons show that these accuracy improvements do not hinder their execution, as posits run as fast as single-precision floats and exhibit better timing than double-precision floats, thus potentially providing an alternative representation.
LFOC+: A Fair OS-Level Cache-Clustering Policy for Commodity Multicore Systems
(IEEE Transactions on Computers, 2022) Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Fanizzi, Graziano; Prieto Matías, Manuel
Commodity multicore systems are increasingly adopting hardware support that enables the system software to partition the last-level cache (LLC). This support makes it possible for the operating system (OS) or the Virtual Machine Monitor (VMM) to mitigate shared-resource contention effects on multicores by assigning different co-running applications to various cache partitions. Recently, cache-clustering (or partition-sharing) strategies have emerged as a way to improve system throughput and fairness on new platforms with cache-partitioning support. As opposed to strict cache-partitioning, which allocates separate cache partitions to each application, cache-clustering allows partitions to be shared by a group of applications. In this article we propose LFOC+, a fairness-aware OS-level cache-clustering policy for commodity multicore systems. LFOC+ tries to mimic the behavior of the optimal cache-clustering solution for fairness, which we could obtain for different workload scenarios by using a simulation tool. Our dynamic cache-clustering strategy continuously gathers data from performance monitoring counters to classify applications at runtime based on the degree of cache sensitivity and contentiousness, and effectively separates cache-sensitive applications from aggressor programs to improve fairness, while providing acceptable system throughput. We implemented LFOC+ in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of four previously proposed cache-clustering policies. Our experimental analysis reveals that LFOC+ constitutes a lightweight OS-level policy and improves fairness relative to two other state-of-the-art fairness-aware strategies, Dunn and LFOC, by up to 22% and up to 20.6%, respectively, and by 9% and 4.9% on average.
Implementation of a low-cost mobile devices to support medical diagnosis
(Computational and Mathematical Methods in Medicine, 2013) García Sánchez, Carlos; Botella Juan, Guillermo; Ayuso Márquez, Fermín; González Rodríguez, Diego; Prieto Matías, Manuel; Tirado Fernández, Francisco
Medical imaging has become an absolutely essential diagnostic tool for clinical practices; at present, pathologies can be detected with an earliness never before known. Its use has not only been relegated to the field of radiology but also, increasingly, to computer-based imaging processes prior to surgery. Motion analysis, in particular, plays an important role in analyzing activities or behaviors of live objects in medicine. This short paper presents several low-cost hardware implementation approaches for the new generation of tablets and/or smartphones for estimating motion compensation and segmentation in medical images. These systems have been optimized for breast cancer diagnosis using magnetic resonance imaging technology with several advantages over traditional X-ray mammography, for example, obtaining patient information during a short period. This paper also addresses the challenge of offering a medical tool that runs on widespread portable devices, both on tablets and/or smartphones to aid in patient diagnostics.