Person: Olcoz Herrero, Katzalin
Universidad Complutense de Madrid
Faculty / Institute
Arquitectura de Computadores y Automática
Arquitectura y Tecnología de Computadores
Publication: A unified cloud-enabled discrete event parallel and distributed simulation architecture (Elsevier, 2022-07)
Authors: Risco Martín, José Luis; Henares Vilaboa, Kevin; Mittal, Saurabh; Almendras Aruzamen, Luis Fernando; Olcoz Herrero, Katzalin
Cloud infrastructure provides rapid resource provisioning for on-demand computational requirements. Cloud simulation environments are now widely used to model and simulate complex systems that need remote accessibility and variable capacity. In this regard, scalability issues in the computational requirements of Modeling and Simulation (M&S) can be tackled through the elasticity of on-demand Cloud deployment. However, implementing a high-performance cloud M&S framework that follows these elastic principles is not a trivial task, as parallelizing and distributing existing architectures is challenging. Indeed, parallel and distributed M&S developments have evolved along separate paths. Parallel solutions have always focused on ad-hoc approaches, while distributed approaches have led to the definition of standard distributed frameworks such as the High Level Architecture (HLA) or have influenced the use of distributed technologies such as the Message Passing Interface (MPI). Only a few developments have been able to keep pace with the current resilience of computing hardware resource deployment, mostly focused on implementing Simulation as a Service (SaaS), albeit independently of the parallel ad-hoc branch. In this paper, we present a unified parallel and distributed M&S architecture flexible enough to deploy parallel and distributed simulations in the Cloud with low effort, without modifying the underlying model source code, and achieving significant speedups over the sequential simulation, especially in the parallel implementation. Our framework is based on the Discrete Event System Specification (DEVS) formalism. The performance of the parallel and distributed framework is tested using the xDEVS M&S tool and Application Programming Interface (API) and the DEVStone benchmark with up to eight computing nodes, obtaining maximum speedups of 15.95x and 1.84x, respectively.

Publication: Genome sequence alignment-design space exploration for optimal performance and energy architectures (Institute of Electrical and Electronics Engineers (IEEE), 2021-12-01)
Authors: Qureshi, Yasir Mahmood; Herruzo, José M.; Zapater, Marina; Olcoz Herrero, Katzalin; González Navarro, Sonia; Plata, Óscar; Atienza, David
Next generation workloads, such as genome sequencing, have an astounding impact on the healthcare sector. Sequence alignment, the first step in genome sequencing, has experienced recent breakthroughs, which resulted in next generation sequencing (NGS). As NGS applications are memory-bound with random memory access patterns, we propose the use of high bandwidth memories such as 3D stacked HBM2, instead of traditional DRAMs such as DDR4, along with energy-efficient compute cores to improve both performance and energy efficiency. Three state-of-the-art NGS applications, Bowtie2, BWA-MEM, and HISAT2, are used as case studies to explore and optimize NGS computing architectures. Then, using the gem5-X architectural simulator, we obtain an overall 68 percent performance improvement and 71 percent energy savings by using HBM2 instead of DDR4. Furthermore, we propose an architecture based on ARMv8 cores and demonstrate that 16 ARMv8 64-bit out-of-order (OoO) cores with HBM2 outperform 32 cores of an Intel Xeon Phi Knights Landing (KNL) processor with 3D stacked memory. Moreover, we show that by using frequency scaling we can achieve up to 59 percent and 61 percent energy savings for ARM in-order and OoO cores, respectively. Lastly, we show that many ARMv8 in-order cores at 1.5GHz match the performance of fewer OoO cores at 2GHz, while attaining 4.5x energy savings.
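The DEVS formalism on which the cloud simulation framework in the first entry is based can be illustrated with a single atomic model. The sketch below is a generic, textbook-style DEVS atomic model (time advance, internal and external transitions, output function) driven by a minimal sequential simulation loop. The names `Processor` and `simulate` are hypothetical and are not part of the xDEVS API, and the sketch does not attempt the parallel or distributed execution described in the abstract.

```python
import math
from dataclasses import dataclass
from typing import Any


@dataclass
class Processor:
    """Hypothetical DEVS atomic model: serves one job at a time for a fixed service time."""
    service_time: float
    phase: str = "passive"
    sigma: float = math.inf   # time remaining in the current state
    job: Any = None

    def ta(self) -> float:
        # Time advance function: how long the model stays in its current state.
        return self.sigma

    def delta_int(self) -> None:
        # Internal transition: the job is finished, so become passive again.
        self.phase, self.sigma, self.job = "passive", math.inf, None

    def delta_ext(self, elapsed: float, x: Any) -> None:
        # External transition: accept a job only when passive; while busy, drop it
        # and just account for the time that has elapsed.
        if self.phase == "passive":
            self.phase, self.sigma, self.job = "active", self.service_time, x
        else:
            self.sigma -= elapsed

    def output(self) -> Any:
        # Output function (lambda), called immediately before delta_int.
        return self.job


def simulate(model: Processor, inputs: list[tuple[float, Any]], t_end: float) -> list[tuple[float, Any]]:
    """Minimal sequential simulation loop for one atomic model.

    inputs: external events as (time, value) pairs sorted by time.
    t_end must be finite. Returns the (time, value) outputs produced up to t_end.
    """
    t_last, outputs, i = 0.0, [], 0
    while True:
        t_int = t_last + model.ta()                            # next internal event
        t_ext = inputs[i][0] if i < len(inputs) else math.inf  # next external event
        if min(t_int, t_ext) > t_end:
            break
        if t_int <= t_ext:  # on ties, fire the internal event first
            outputs.append((t_int, model.output()))
            model.delta_int()
            t_last = t_int
        else:
            model.delta_ext(t_ext - t_last, inputs[i][1])
            t_last = t_ext
            i += 1
    return outputs


proc = Processor(service_time=2.0)
print(simulate(proc, [(0.0, "job-a"), (1.0, "job-b"), (3.0, "job-c")], t_end=10.0))
# [(2.0, 'job-a'), (5.0, 'job-c')]
```

Here `job-b` arrives while the processor is busy and is dropped. In a parallel or distributed DEVS engine, the model interface would typically stay as it is while the coordination loop changes, which is in the spirit of the abstract's claim that model source code does not need to be modified.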

Publication: Gem5-X: a many-core heterogeneous simulation platform for architectural exploration and optimization (Association for Computing Machinery, 2021-12)
Authors: Qureshi, Yasir Mahmood; Simon, William Andrew; Zapater, Marina; Olcoz Herrero, Katzalin; Atienza, David
The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework based on gem5 for architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, real-time video analytics is used as a case study. It is composed of two kernels, namely video encoding and image classification using convolutional neural networks (CNNs). First, we use gem5-X to explore the benefits of the latest 3D high bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop a new optimized clustered-heterogeneous architecture with HBM2 in gem5-X for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice to run visual recognition using CNNs, as they outperform a DDR4-based system by up to 30% in terms of both performance and energy savings.
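Architectural exploration of the kind described in the gem5-X entry ends with comparing simulated configurations on more than one metric at once. One generic way to do that, shown here as a hedged sketch rather than the paper's two-step methodology, is to keep only the Pareto-optimal (execution time, energy) points. The `Config` records and their values are hypothetical placeholders; in practice they would come from the simulator's statistics output, which this sketch does not parse.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    time_s: float     # execution time reported by the simulator (lower is better)
    energy_j: float   # energy reported by the simulator (lower is better)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both metrics and strictly better on at least one."""
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep the configurations that no other configuration dominates (O(n^2))."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up placeholder numbers, NOT results from the gem5-X paper.
candidates = [
    Config("A", time_s=10.0, energy_j=5.0),
    Config("B", time_s=6.0, energy_j=4.0),
    Config("C", time_s=4.0, energy_j=6.0),
    Config("D", time_s=5.0, energy_j=7.0),
]
print([c.name for c in pareto_front(candidates)])  # ['B', 'C']
```

The quadratic scan is adequate for the small sweeps typical of simulator-based exploration; B and C survive because each is better than the other on one metric, while A and D are each beaten on both.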

Publication: Optimization of a Line detection algorithm for autonomous vehicles on a RISC-V with accelerator (Universidad Nacional de La Plata, 2022-10)
Authors: Belda Beneyto, María José; Olcoz Herrero, Katzalin; Castro Rodríguez, Fernando; Tirado Fernández, Francisco
In recent years, autonomous vehicles have attracted the attention of many research groups, both in academia and in industry, including researchers from leading companies such as Google, Uber and Tesla. These vehicles are equipped with systems that are subject to very strict requirements, essentially aimed at performing safe operations (for both potential passengers and pedestrians) and at carrying out the processing needed for real-time decision making. In many instances, general-purpose processors alone cannot ensure that these safety, reliability and real-time requirements are met, so it is common to implement […] paper explores the acceleration of a line detection ap[…] running without accelerator.

Publication: Virtualización de Laboratorios de la Materia Sistemas Operativos y Redes mediante Contenedores [English: Virtualization of the Laboratories of the Operating Systems and Networks Course Using Containers] (2023-07-18)
Authors: Sánchez-Élez Martín, Marcos; Pardines Lence, Inmaculada; Gómez Pérez, José Ignacio; Moreno Vozmediano, Rafael Aurelio; Olcoz Herrero, Katzalin; Risco Martín, José Luis; Ruiz Gallego-Largo, Rafael; Soria Jiménez, David; Miñana Ropero, Guadalupe; Molina Prego, Mª Carmen; Sánchez Muñoz, Eduardo

Publication: Resource management for power-constrained HEVC transcoding using reinforcement learning (IEEE Computer Society, 2020-12-01)
Authors: Costero Valero, Luis María; Iranfar, Arman; Zapater, Marina; Atienza, David; Olcoz Herrero, Katzalin
The advent of online video streaming applications and services, along with users' demand for high-quality content, requires High Efficiency Video Coding (HEVC), which provides higher video quality and more compression at the cost of increased complexity. On one hand, HEVC exposes a set of dynamically tunable parameters that provide trade-offs among Quality-of-Service (QoS), performance, and power consumption of multi-core servers in the video providers' data center. On the other hand, resource management of modern multi-core servers is in charge of adapting system-level parameters, such as operating frequency and multithreading, to deal with concurrent applications and their requirements. Therefore, efficient multi-user HEVC streaming requires joint adaptation of application- and system-level parameters. Nonetheless, such a large and dynamic design space is challenging and difficult to address through conventional resource management strategies. Thus, in this work, we develop a multi-agent Reinforcement Learning framework that jointly adjusts application- and system-level parameters at runtime to satisfy the QoS of multi-user HEVC streaming in power-constrained servers. In particular, the design space, composed of all design parameters, is split into smaller independent sub-spaces. Each design sub-space is assigned to a particular agent so that it can be explored faster, yet accurately. The benefits of our approach are revealed in terms of adaptability and quality (with up to 4x improvements in QoS compared to a static resource management scheme) and learning time (6x faster than an equivalent mono-agent implementation). Finally, we show that the proposed power-capping techniques outperform hardware-based power capping with respect to quality.

Publication: Containergy: a container-based energy and performance profiling tool for next generation workloads (MDPI, 2020-05)
Authors: Souza, Wellington Silva de; Iranfar, Arman; Braulio, Anderson; Zapater, Marina; Souza, Samuel Xavier de; Olcoz Herrero, Katzalin; Atienza, David
Run-time profiling of software applications is key to energy efficiency. Even the most optimized hardware combined with optimally designed software may become inefficient if operated poorly. Moreover, the diversification of modern computing platforms and the broadening of their run-time configuration space make optimally operating software ever more complex. With the growing financial and environmental impact of data center operation and cloud-based applications, optimal software operation becomes increasingly relevant to existing and next-generation workloads. To guide software operation towards energy savings, energy and performance data must be gathered to provide a meaningful assessment of application behavior under different system configurations, which existing tools do not appropriately address. In this work we present Containergy, a new performance evaluation and profiling tool that uses software containers to perform run-time assessment of applications, providing energy and performance profiling data with negligible overhead (below 2%). It is focused on energy efficiency for next generation workloads. Practical experiments with emerging workloads, such as video transcoding and machine-learning image classification, are presented. The profiling results are analyzed in terms of performance and energy savings from a Quality-of-Service (QoS) perspective. For video transcoding, we verified that wrong choices in the configuration space can increase energy consumption by more than 300% for the same task and operational levels. In the image classification case study, the results show that the choice of machine-learning algorithm and model significantly affects energy efficiency. Profiling datasets of AlexNet and SqueezeNet, which present similar accuracy, indicate that the latter achieves 55.8% energy savings compared to the former.
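The HEVC resource-management entry above splits a joint design space into independent sub-spaces, each explored by its own agent. A minimal sketch of that decomposition follows, under heavy simplifying assumptions: two stateless epsilon-greedy learners (effectively independent multi-armed bandits), one per hypothetical knob, share a single reward computed by a made-up `toy_reward` function. The knob names, the number of levels, and the reward are illustrative only; the paper's agents, states and reward function are not reproduced here.

```python
import random


class SubspaceAgent:
    """Stateless epsilon-greedy learner that controls a single knob (one sub-space)."""

    def __init__(self, n_actions: int, epsilon: float, rng: random.Random) -> None:
        self.q = [0.0] * n_actions  # mean observed reward per action; 0 is optimistic since rewards <= 0
        self.counts = [0] * n_actions
        self.epsilon = epsilon
        self.rng = rng

    def best(self) -> int:
        # Greedy action (ties go to the lowest index).
        return max(range(len(self.q)), key=self.q.__getitem__)

    def act(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))  # explore
        return self.best()                          # exploit

    def update(self, action: int, reward: float) -> None:
        # Incremental sample-average update of the action value.
        self.counts[action] += 1
        self.q[action] += (reward - self.q[action]) / self.counts[action]


def toy_reward(freq_level: int, thread_level: int) -> float:
    """Hypothetical stand-in for a measured QoS/power score; maximal at (2, 2)."""
    return -abs(freq_level - 2) - abs(thread_level - 2)


def train(steps: int = 5000, seed: int = 0) -> tuple[int, int]:
    rng = random.Random(seed)
    freq_agent = SubspaceAgent(4, 0.1, rng)    # hypothetical: 4 frequency levels
    thread_agent = SubspaceAgent(4, 0.1, rng)  # hypothetical: 4 thread-count levels
    for _ in range(steps):
        f, t = freq_agent.act(), thread_agent.act()
        r = toy_reward(f, t)  # one shared reward, observed by both agents
        freq_agent.update(f, r)
        thread_agent.update(t, r)
    return freq_agent.best(), thread_agent.best()


print(train())
```

Each agent searches 4 options instead of the 4 x 4 = 16 joint configurations, which is the intuition behind faster exploration. The toy reward is separable, so independent learners converge easily here; strongly coupled knobs would make this kind of decomposition harder.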

Publication: Applying game-learning environments to power capping scenarios via reinforcement learning (Springer International Publishing, 2022-08-05)
Authors: Hernández Aguado, Pablo; Costero Valero, Luis María; Olcoz Herrero, Katzalin; Igual Peña, Francisco Daniel
Research in deep learning for video game playing has received much attention and has provided very relevant results in recent years. Frameworks and libraries have been developed to ease game-playing research leveraging Reinforcement Learning techniques. In this paper, we propose to use two of them (RLlib and Gym) in a very different scenario: learning to apply resource management policies in a multi-core server. Specifically, we couple the facilities of both frameworks to derive power-capping policies. Using RLlib and Gym enables implementing different resource management policies in a simple and fast way and, as they are based on neural networks, guarantees the efficiency of the solution and the use of hardware accelerators for both training and inference. The results demonstrate that game-learning environments provide effective support for casting a completely different scenario, and open new research avenues in the field of resource management using reinforcement learning techniques with minimal development effort.

Publication: Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding (Springer, 2020-02-25)
Authors: Costero Valero, Luis María; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin; Tirado Fernández, José Francisco
The coexistence of parallel applications in shared computing nodes, each one featuring different Quality of Service (QoS) requirements, poses new challenges for improving resource occupation while keeping acceptable QoS levels. As more application-specific and system-wide metrics are included as QoS dimensions, or in situations in which resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications in an automatic and optimal fashion becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results reveal an excellent adaptation of resource and knob management to heterogeneous QoS requests, and increases in the number of concurrently served users of up to 1.24x compared with alternative approaches considering homogeneous QoS requests.

Publication: Gem5-X: A Many-core Heterogeneous Simulation Platform for Architectural Exploration and Optimization (Association for Computing Machinery, 2021-07-17)
Authors: Qureshi, Yasir M.; Simon, William A.; Zapater Sancho, Marina; Olcoz Herrero, Katzalin; Atienza Alonso, David
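The entry "Applying game-learning environments to power capping scenarios via reinforcement learning" casts server resource management as a game-like environment. A toy sketch of that idea follows: a pure-Python environment whose `reset`/`step` methods loosely mirror the shape used by older Gym releases (`step` returning observation, reward, done, info), without importing Gym or RLlib. The linear power model, the 25 W cap, the reward, and the class name `PowerCapEnv` are all assumptions made for illustration; they are not the paper's environment.

```python
from dataclasses import dataclass


@dataclass
class PowerCapEnv:
    """Toy power-capping environment with a Gym-like reset/step shape (hypothetical model)."""
    power_cap_w: float = 25.0
    n_levels: int = 5      # frequency levels 0..4
    episode_len: int = 10
    level: int = 0
    t: int = 0

    def power_w(self) -> float:
        # Hypothetical linear power model: 10 W idle plus 5 W per frequency level.
        return 10.0 + 5.0 * self.level

    def reset(self) -> int:
        self.level, self.t = 0, 0
        return self.level  # observation: current frequency level

    def step(self, action: int) -> tuple[int, float, bool, dict]:
        # action: 0 = lower frequency, 1 = keep, 2 = raise frequency (clamped to valid levels)
        self.level = min(max(self.level + action - 1, 0), self.n_levels - 1)
        self.t += 1
        power = self.power_w()
        # Reward throughput (proportional to the level) under the cap; penalize violations.
        reward = float(self.level) if power <= self.power_cap_w else -10.0
        done = self.t >= self.episode_len
        return self.level, reward, done, {"power_w": power}


env = PowerCapEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    action = 2 if obs < 3 else 1  # hand-written rule; an RL agent's policy would go here
    obs, reward, done, info = env.step(action)
    total += reward
print(obs, total)
```

Under these assumptions the highest level that respects the cap is 3 (exactly 25 W), so the hand-written rule climbs to it and stays there; a trained agent would be expected to discover the same behavior from the reward signal alone.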