Person: Olcoz Herrero, Katzalin
Universidad Complutense de Madrid
Faculty / Institute
Arquitectura de Computadores y Automática
Arquitectura y Tecnología de Computadores
Publication: A unified cloud-enabled discrete event parallel and distributed simulation architecture (Elsevier, 2022-07)
Risco Martín, José Luis; Henares Vilaboa, Kevin; Mittal, Saurabh; Almendras Aruzamen, Luis Fernando; Olcoz Herrero, Katzalin
Cloud infrastructure provides rapid resource provision for on-demand computational requirements. Cloud simulation environments today are largely employed to model and simulate complex systems for remote accessibility and variable capacity requirements. In this regard, scalability issues in Modeling and Simulation (M&S) computational requirements can be tackled through the elasticity of on-demand Cloud deployment. However, implementing a high-performance cloud M&S framework following these elastic principles is not a trivial task, as parallelizing and distributing existing architectures is challenging. Indeed, parallel and distributed M&S developments have evolved along separate paths. Parallel solutions have always focused on ad hoc approaches, while distributed approaches have led to the definition of standard distributed frameworks like the High Level Architecture (HLA) or influenced the use of distributed technologies like the Message Passing Interface (MPI). Only a few developments have been able to evolve with the current resilience of computing hardware resources deployment, largely focused on the implementation of Simulation as a Service (SaaS), albeit independently of the parallel ad hoc methods branch. In this paper, we present a unified parallel and distributed M&S architecture with enough flexibility to deploy parallel and distributed simulations in the Cloud with low effort, without modifying the underlying model source code, and achieving significant speedups over the sequential simulation, especially in the parallel implementation. Our framework is based on the Discrete Event System Specification (DEVS) formalism.
The performance of the parallel and distributed framework is tested using the xDEVS M&S tool, Application Programming Interface (API) and the DEVStone benchmark with up to eight computing nodes, obtaining maximum speedups of 15.95x and 1.84x, respectively.
Publication: Genome sequence alignment-design space exploration for optimal performance and energy architectures (Institute of Electrical and Electronics Engineers (IEEE), 2021-12-01)
Qureshi, Yasir Mahmood; Herruzo, José M.; Zapater, Marina; Olcoz Herrero, Katzalin; González Navarro, Sonia; Plata, Óscar; Atienza, David
Next generation workloads, such as genome sequencing, have an astounding impact on the healthcare sector. Sequence alignment, the first step in genome sequencing, has experienced recent breakthroughs, which resulted in next generation sequencing (NGS). As NGS applications are memory-bound with random memory access patterns, we propose the use of high bandwidth memories like 3D stacked HBM2, instead of traditional DRAMs like DDR4, along with energy efficient compute cores to improve both performance and energy efficiency. Three state-of-the-art NGS applications, Bowtie2, BWA-MEM, and HISAT2, are used as case studies to explore and optimize NGS computing architectures. Then, using the gem5-X architectural simulator, we obtain an overall 68 percent performance improvement and 71 percent energy savings using HBM2 instead of DDR4. Furthermore, we propose an architecture based on ARMv8 cores and demonstrate that 16 ARMv8 64-bit OoO cores with HBM2 outperform 32 cores of an Intel Xeon Phi Knights Landing (KNL) processor with 3D stacked memory. Moreover, we show that by using frequency scaling we can achieve up to 59 percent and 61 percent energy savings for ARM in-order and OoO cores, respectively. Lastly, we show that many ARMv8 in-order cores at 1.5 GHz match the performance of fewer OoO cores at 2 GHz, while attaining 4.5x energy savings.
Publication: Gem5-X: a many-core heterogeneous simulation platform for architectural exploration and optimization (Association for Computing Machinery, 2021-12)
Qureshi, Yasir Mahmood; Simon, William Andrew; Zapater, Marina; Olcoz Herrero, Katzalin; Atienza, David
The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework, based on gem5, for architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, real-time video analytics is used as a case study. It is composed of two kernels, namely, video encoding and image classification using convolutional neural networks (CNNs). First, we explore through gem5-X the benefits of the latest 3D high bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop a new optimized clustered-heterogeneous architecture with HBM2 in gem5-X for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice to run visual recognition using CNNs, as they outperform DDR4-based systems by up to 30% both in terms of performance and energy savings.
Publication: Metodología de internacionalización de material docente basada en el uso de Markdown y Pandoc [Methodology for internationalizing teaching material based on Markdown and Pandoc] (2018-06-30)
Sáez Alcaide, Juan Carlos; Sánchez-Elez Martín, Marcos; Risco Martín, José Luis; Castro Rodríguez, Fernando; Prieto Matías, Manuel; Sáez Puche, Regino; Chaver Martínez, Daniel; Olcoz Herrero, Katzalin; Clemente Barreira, Juan Antonio; Igual Peña, Francisco; García García, Adrián; Sánchez Foces, David
The internationalization of teaching offers great opportunities for the University, but it also poses significant challenges for students and teachers. In particular, effectively creating and maintaining the teaching material for a course taught simultaneously in several languages, with a high degree of coordination among its groups (e.g., a common final exam or common lab assignments for all students), can be a major challenge for teachers. To address this problem, we have designed a specific strategy for creating and managing dual-language teaching material (e.g., English-Spanish) and developed a set of cross-platform tools to put it into practice. The general idea is to keep the content of the document in both languages in a single text file, providing the translation into the other language right after each paragraph and heading, using special delimiters. These dual documents are written in Markdown, a lightweight markup language that, given its simplicity and versatility, is being rapidly adopted by a wide range of professionals, from novelists and journalists to website administrators. From the dual documents created with Markdown, the final document for each language can be generated automatically in the desired format and made available to students. For this task, we rely on Pandoc, a tool that converts Markdown documents to a large number of formats, such as PDF, docx (Microsoft Word), EPUB (e-book), or HTML. As part of our project, we have created Pandoc extensions to enable the creation of dual documents in Markdown and to increase the expressiveness of the language with constructs commonly used in teaching documents.
Publication: Gem5-x: a gem5-based system level simulation framework to optimize many-core platforms (IEEE, 2019)
Mahmood Qureshi, Yasir; Simon, William Andrew; Zapater, Marina; Atienza, David; Olcoz Herrero, Katzalin
The rapid expansion of online-based services requires novel energy and performance efficient architectures to meet power and latency constraints. Fast architectural exploration has become a key enabler in the proposal of architectural innovation. In this paper, we present gem5-X, a gem5-based system level simulation framework, and a methodology to optimize many-core systems for performance and power. As real-life case studies of many-core server workloads, we use real-time video transcoding and image classification using convolutional neural networks (CNNs). Gem5-X allows us to identify bottlenecks and evaluate the potential benefits of architectural extensions such as in-cache computing and 3D stacked High Bandwidth Memory. For real-time video transcoding, we achieve 15% speed-up using in-order cores with in-cache computing when compared to a baseline in-order system and 76% energy savings when compared to an Out-of-Order system. When using HBM, we further accelerate real-time transcoding and CNNs by up to 7% and 8%, respectively.
Publication: Optimization of a Line detection algorithm for autonomous vehicles on a RISC-V with accelerator (Universidad Nacional de La Plata, 2022-10)
Belda Beneyto, María José; Olcoz Herrero, Katzalin; Castro Rodríguez, Fernando; Tirado Fernández, Francisco
In recent years, autonomous vehicles have attracted the attention of many research groups, both in academia and business, including researchers from leading companies such as Google, Uber and Tesla. These vehicles are equipped with systems that are subject to very strict requirements, essentially aimed at performing safe operations (both for potential passengers and pedestrians) as well as carrying out the processing needed for decision making in real time. In many instances, general-purpose processors alone cannot ensure that these safety, reliability and real-time requirements are met, so it is common to implement paper explores the acceleration of a line detection aprunning without accelerator.
Publication: Server Power Modeling for Run-time Energy Optimization of Cloud Computing Facilities. (Elsevier Science BV, 2014)
Arroba, Patricia; Risco Martín, José Luis; Zapater Sancho, Marina; Moya, José Manuel; Ayala Rodrigo, José Luis; Olcoz Herrero, Katzalin
As advanced Cloud services are becoming mainstream, the contribution of data centers to the overall power consumption of modern cities is growing dramatically. The average consumption of a single data center is equivalent to the energy consumption of 25,000 households. Modeling the power consumption for these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling is a complex challenge for high-end servers not yet satisfied by analytical approaches. This work proposes an automatic method, based on Multi-Objective Particle Swarm Optimization, for the identification of power models of enterprise servers in Cloud data centers.
Our approach, as opposed to previous procedures, not only considers the workload consolidation for deriving the power model, but also incorporates other non-traditional factors like the static power consumption and its dependence on temperature. Our experimental results show that we reach slightly better models than classical approaches, while simultaneously simplifying the power model structure and thus the number of sensors needed, which is very promising for short-term energy prediction. This work, validated with real Cloud applications, broadens the possibilities to derive efficient energy saving techniques for Cloud facilities.
Publication: A machine learning-based framework for throughput estimation of time-varying applications in multi-core servers (IEEE, 2019)
Iranfar, Arman; Souza, Wellington Silva de; Zapater, Marina; Olcoz Herrero, Katzalin; Souza, Samuel Xavier de; Atienza, David
Accurate workload prediction and throughput estimation are key to efficient proactive power and performance management of multi-core platforms. Although hardware performance counters available on modern platforms contain important information about the application behavior, employing them efficiently is not straightforward when dealing with time-varying applications, even if they have iterative structures. In this work, we propose a machine learning-based framework for workload prediction and throughput estimation using hardware events. Our framework enables throughput estimation over various available system configurations, namely, number of parallel threads and operating frequency. In particular, we first employ workload clustering and classification techniques along with Markov chains to predict the next workload for each available system configuration. Then, the predicted workload is used to estimate the next expected throughput through a machine learning-based regression model.
The comparison with the state of the art demonstrates that our framework is able to improve Quality of Service (QoS) by 3.4x, while consuming 15% less power thanks to the more accurate throughput estimation.
Publication: Virtualización de Laboratorios de la Materia Sistemas Operativos y Redes mediante Contenedores [Virtualization of the laboratories of the Operating Systems and Networks subject using containers] (2023-07-18)
Sánchez-Élez Martín, Marcos; Pardines Lence, Inmaculada; Gómez Pérez, José Ignacio; Moreno Vozmediano, Rafael Aurelio; Olcoz Herrero, Katzalin; Risco Martín, José Luis; Ruiz Gallego-Largo, Rafael; Soria Jiménez, David; Miñana Ropero, Guadalupe; Molina Prego, Mª Carmen; Sánchez Muñoz, Eduardo
Publication: Resource management for power-constrained HEVC transcoding using reinforcement learning (IEEE Computer Society, 2020-12-01)
Costero Valero, Luis María; Iranfar, Arman; Zapater, Marina; Atienza, David; Olcoz Herrero, Katzalin
The advent of online video streaming applications and services, along with the users' demand for high-quality content, requires High Efficiency Video Coding (HEVC), which provides higher video quality and more compression at the cost of increased complexity. On the one hand, HEVC exposes a set of dynamically tunable parameters to provide trade-offs among Quality-of-Service (QoS), performance, and power consumption of multi-core servers in the video providers' data center. On the other hand, resource management of modern multi-core servers is in charge of adapting system-level parameters, such as operating frequency and multithreading, to deal with concurrent applications and their requirements. Therefore, efficient multi-user HEVC streaming necessitates joint adaptation of application- and system-level parameters. Nonetheless, dealing with such a large and dynamic design space is challenging and difficult to address through conventional resource management strategies.
Thus, in this work, we develop a multi-agent Reinforcement Learning framework to jointly adjust application- and system-level parameters at runtime to satisfy the QoS of multi-user HEVC streaming in power-constrained servers. In particular, the design space, composed of all design parameters, is split into smaller independent sub-spaces. Each design sub-space is assigned to a particular agent so that it can explore it faster, yet accurately. The benefits of our approach are revealed in terms of adaptability and quality (with up to 4x improvements in terms of QoS when compared to a static resource management scheme), and learning time (6x faster than an equivalent mono-agent implementation). Finally, we show that the power-capping techniques formulated outperform the hardware-based power capping with respect to quality.