Person:
Olcoz Herrero, Katzalin

First Name
Katzalin
Last Name
Olcoz Herrero
Affiliation
Universidad Complutense de Madrid
Faculty / Institute
Ciencias Físicas
Department
Arquitectura de Computadores y Automática
Area
Arquitectura y Tecnología de Computadores
Identifiers
UCM identifier, ORCID, Scopus Author ID, Dialnet ID, Google Scholar ID

Search Results

Now showing 1 - 10 of 22
  • Item
    Level-Spread: A New Job Allocation Policy for Dragonfly Networks
    (2018) Yijia Zhang; Ozan Tuncer; Fulya Kaplan; Vitus J. Leung; Ayse K. Coskun; Olcoz Herrero, Katzalin
    The dragonfly network topology has attracted attention in recent years owing to its high radix and constant diameter. However, the influence of job allocation on communication time in dragonfly networks is not fully understood. Recent studies have shown that random allocation is better at balancing the network traffic, while compact allocation is better at harnessing the locality in dragonfly groups. Based on these observations, this paper introduces a novel allocation policy called Level-Spread for dragonfly networks. This policy spreads jobs within the smallest network level that a given job can fit in at the time of its allocation. In this way, it simultaneously harnesses node adjacency and balances link congestion. To evaluate the performance of Level-Spread, we run packet-level network simulations using a diverse set of application communication patterns, job sizes, and communication intensities. We also explore the impact of network properties such as the number of groups, number of routers per group, machine utilization level, and global link bandwidth. Level-Spread reduces the communication overhead by 16% on average (and up to 71%) compared to state-of-the-art allocation policies.
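    A minimal sketch of the policy in Python (all data structures and names are illustrative assumptions, not the paper's code): find the smallest level whose idle nodes can host the job, then pick nodes round-robin across that level's sub-units to balance congestion while keeping locality.

      def level_spread(job_size, levels):
          """levels: list of (name, node_groups) ordered from smallest
          (single router) to largest (whole machine); node_groups is a
          list of idle-node lists, one per sub-unit of that level."""
          for name, node_groups in levels:
              idle = sum(len(g) for g in node_groups)
              if idle >= job_size:
                  picked, i = [], 0
                  # round-robin across sub-units: spread within the level
                  while len(picked) < job_size:
                      group = node_groups[i % len(node_groups)]
                      if group:
                          picked.append(group.pop(0))
                      i += 1
                  return name, picked
          return None, []  # no level can host the job; it must wait

      # Example: two routers with 4 idle nodes each inside one group.
      routers = [list(range(0, 4)), list(range(4, 8))]
      print(level_spread(5, [("router", [routers[0]]), ("group", routers)]))
      # -> ('group', [0, 4, 1, 5, 2]): the job spills to the group level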
  • Item
    Leveraging knowledge-as-a-service (KaaS) for QoS-aware resource management in multi-user video transcoding
    (The Journal of Supercomputing, 2020) Costero Valero, Luis María; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin; Tirado Fernández, José Francisco
    The coexistence of parallel applications in shared computing nodes, each one featuring different Quality of Service (QoS) requirements, poses new challenges for improving resource occupation while keeping acceptable QoS rates. As more application-specific and system-wide metrics are included as QoS dimensions, or under situations in which resource-usage limits are strict, building and serving the most appropriate set of actions (application control knobs and system resource assignment) to concurrent applications in an automatic and optimal fashion becomes mandatory. In this paper, we propose strategies to build and serve this type of knowledge to concurrent applications by leveraging Reinforcement Learning techniques. Taking multi-user video transcoding as a driving example, our experimental results reveal an excellent adaptation of resource and knob management to heterogeneous QoS requests, and increases of up to 1.24× in the number of concurrently served users compared with alternative approaches that consider homogeneous QoS requests.
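    The abstract does not spell out the learning algorithm, so the following is only a toy stand-in: tabular Q-learning over invented (knob, resource) actions, to illustrate how QoS-aware knowledge could be built and served. The paper's actual techniques may differ.

      import random
      from collections import defaultdict

      # hypothetical actions: a control knob plus a resource assignment
      ACTIONS = [("threads", 2), ("threads", 4), ("freq", "low"), ("freq", "high")]
      Q = defaultdict(float)                     # (state, action) -> value
      alpha, gamma, eps = 0.1, 0.9, 0.2

      def choose(state):
          # epsilon-greedy: explore sometimes, otherwise serve best action
          if random.random() < eps:
              return random.choice(ACTIONS)
          return max(ACTIONS, key=lambda a: Q[(state, a)])

      def update(state, action, reward, next_state):
          # reward would encode QoS satisfaction vs. resource usage
          best_next = max(Q[(next_state, a)] for a in ACTIONS)
          Q[(state, action)] += alpha * (reward + gamma * best_next
                                         - Q[(state, action)])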
  • Item
    Genome sequence alignment-design space exploration for optimal performance and energy architectures
    (IEEE transactions on computers, 2021) Qureshi, Yasir Mahmood; Herruzo, José M.; Zapater, Marina; Olcoz Herrero, Katzalin; González Navarro, Sonia; Plata, Óscar; Atienza, David
    Next generation workloads, such as genome sequencing, have an astounding impact on the healthcare sector. Sequence alignment, the first step in genome sequencing, has experienced recent breakthroughs, which resulted in next generation sequencing (NGS). As NGS applications are memory bound with random memory access patterns, we propose the use of high bandwidth memories like 3D stacked HBM2, instead of traditional DRAMs like DDR4, along with energy-efficient compute cores to improve both performance and energy efficiency. Three state-of-the-art NGS applications, Bowtie2, BWA-MEM, and HISAT2, are used as case studies to explore and optimize NGS computing architectures. Using the gem5-X architectural simulator, we obtain an overall 68 percent performance improvement and 71 percent energy savings with HBM2 instead of DDR4. Furthermore, we propose an architecture based on ARMv8 cores and demonstrate that 16 ARMv8 64-bit OoO cores with HBM2 outperform 32 cores of an Intel Xeon Phi Knights Landing (KNL) processor with 3D stacked memory. Moreover, we show that frequency scaling can achieve up to 59 percent and 61 percent energy savings for ARM in-order and OoO cores, respectively. Lastly, we show that many ARMv8 in-order cores at 1.5 GHz match the performance of fewer OoO cores at 2 GHz, while attaining 4.5x energy savings.
  • Item
    Gem5-X: A Many-core Heterogeneous Simulation Platform for Architectural Exploration and Optimization
    (ACM Transactions on Architecture and Code Optimization, 2021) Qureshi, Yasir M.; Simon, William A.; Zapater Sancho, Marina; Olcoz Herrero, Katzalin; Atienza Alonso, David
    The increasing adoption of smart systems in our daily life has led to the development of new applications with varying performance and energy constraints, and suitable computing architectures need to be developed for these new applications. In this article, we present gem5-X, a system-level simulation framework, based on gem5, for the architectural exploration of heterogeneous many-core systems. To demonstrate the capabilities of gem5-X, real-time video analytics is used as a case study. It is composed of two kernels, namely, video encoding and image classification using convolutional neural networks (CNNs). First, we explore through gem5-X the benefits of the latest 3D high bandwidth memory (HBM2) in different architectural configurations. Then, using a two-step exploration methodology, we develop a new optimized clustered-heterogeneous architecture with HBM2 in gem5-X for the video analytics application. In this proposed clustered-heterogeneous architecture, an ARMv8 in-order cluster with an in-cache computing engine executes the video encoding kernel, giving 20% performance and 54% energy benefits compared to baseline ARM in-order and Out-of-Order systems, respectively. Furthermore, thanks to gem5-X, we conclude that ARM Out-of-Order clusters with HBM2 are the best choice to run visual recognition using CNNs, as they outperform a DDR4-based system by up to 30% in terms of both performance and energy savings.
  • Item
    A unified cloud-enabled discrete event parallel and distributed simulation architecture
    (Simulation modelling practice and theory, 2022) Risco Martín, José Luis; Henares Vilaboa, Kevin; Mittal, Saurabh; Almendras Aruzamen, Luis Fernando; Olcoz Herrero, Katzalin
    Cloud infrastructure provides rapid resource provision for on-demand computational requirements. Cloud simulation environments today are largely employed to model and simulate complex systems for remote accessibility and variable capacity requirements. In this regard, scalability issues in Modeling and Simulation (M&S) computational requirements can be tackled through the elasticity of on-demand Cloud deployment. However, implementing a high performance cloud M&S framework following these elastic principles is not a trivial task, as parallelizing and distributing existing architectures is challenging. Indeed, parallel and distributed M&S developments have evolved along separate paths. Parallel solutions have always focused on ad hoc approaches, while distributed approaches have led to the definition of standard distributed frameworks like the High Level Architecture (HLA) or influenced the use of distributed technologies like the Message Passing Interface (MPI). Only a few developments have been able to evolve with the current elastic deployment of computing hardware resources, largely focused on the implementation of Simulation as a Service (SaaS), albeit independently of the parallel ad hoc branch. In this paper, we present a unified parallel and distributed M&S architecture flexible enough to deploy parallel and distributed simulations in the Cloud with low effort, without modifying the underlying model source code, and achieving significant speedups over sequential simulation, especially in the parallel implementation. Our framework is based on the Discrete Event System Specification (DEVS) formalism. The performance of the parallel and distributed framework is tested using the xDEVS M&S tool and Application Programming Interface (API) and the DEVStone benchmark with up to eight computing nodes, obtaining maximum speedups of 15.95x and 1.84x for the parallel and distributed implementations, respectively.
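    For readers unfamiliar with the formalism, here is a textbook DEVS atomic model sketched in Python; it illustrates the four functions the framework builds on and is not the xDEVS API itself.

      INFINITY = float("inf")

      class Generator:
          """DEVS atomic model that emits a 'job' event every `period` units."""
          def __init__(self, period):
              self.period = period
              self.sigma = period      # time remaining to next internal event

          def ta(self):                # time-advance function
              return self.sigma

          def out(self):               # output function, fired before delta_int
              return "job"

          def delta_int(self):         # internal transition: schedule next job
              self.sigma = self.period

          def delta_ext(self, e, x):   # external transition: input x after e
              self.sigma = INFINITY    # a real model would react to x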
  • Item
    Server Power Modeling for Run-time Energy Optimization of Cloud Computing Facilities.
    (Energy Procedia, 6th International conference on sustainability in energy and buildings, 2014) Arroba, Patricia; Risco Martín, José Luis; Zapater Sancho, Marina; Moya, José Manuel; Ayala Rodrigo, José Luis; Olcoz Herrero, Katzalin
    As advanced Cloud services become mainstream, the contribution of data centers to the overall power consumption of modern cities is growing dramatically. The average consumption of a single data center is equivalent to the energy consumption of 25,000 households. Modeling the power consumption of these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling is a complex challenge for high-end servers not yet satisfied by analytical approaches. This work proposes an automatic method, based on Multi-Objective Particle Swarm Optimization, for the identification of power models of enterprise servers in Cloud data centers. Our approach, as opposed to previous procedures, not only considers workload consolidation for deriving the power model, but also incorporates other non-traditional factors like static power consumption and its dependence on temperature. Our experimental results show that we reach slightly better models than classical approaches while simultaneously simplifying the power model structure and thus the number of sensors needed, which is very promising for short-term energy prediction. This work, validated with real Cloud applications, broadens the possibilities of deriving efficient energy-saving techniques for Cloud facilities.
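    A hedged sketch of the idea: a power model with a temperature-dependent static term, fitted here with a plain single-objective PSO for brevity (the paper uses a multi-objective variant, and its model form is richer than this linear toy).

      import random

      def power(coef, util, temp):
          a, b, c = coef                   # dynamic, static and thermal terms
          return a * util + b + c * temp   # P = a*util + P_static(T)

      def fit_pso(samples, n_particles=20, iters=200):
          """samples: list of (util, temp, measured_power) tuples."""
          def err(c):
              return sum((power(c, u, t) - p) ** 2 for u, t, p in samples)

          dim = 3
          pos = [[random.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(n_particles)]
          vel = [[0.0] * dim for _ in range(n_particles)]
          pbest = [p[:] for p in pos]
          gbest = min(pbest, key=err)
          for _ in range(iters):
              for i in range(n_particles):
                  for d in range(dim):      # standard global-best PSO update
                      vel[i][d] = (0.7 * vel[i][d]
                                   + 1.4 * random.random() * (pbest[i][d] - pos[i][d])
                                   + 1.4 * random.random() * (gbest[d] - pos[i][d]))
                      pos[i][d] += vel[i][d]
                  if err(pos[i]) < err(pbest[i]):
                      pbest[i] = pos[i][:]
              gbest = min(pbest + [gbest], key=err)
          return gbest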
  • Item
    Applying game-learning environments to power capping scenarios via reinforcement learning
    (Cloud Computing, Big Data and Emerging Topics, 2022) Hernández Aguado, Pablo; Costero Valero, Luis María; Olcoz Herrero, Katzalin; Igual Peña, Francisco Daniel
    Research in deep learning for video game playing has received much attention and produced very relevant results in recent years. Frameworks and libraries have been developed to ease game-playing research leveraging Reinforcement Learning techniques. In this paper, we propose to use two of them (RLlib and Gym) in a very different scenario: learning to apply resource management policies in a multi-core server. Specifically, we couple the facilities of both frameworks to derive policies for power capping. Using RLlib and Gym enables implementing different resource management policies in a simple and fast way and, as the policies are based on neural networks, guarantees efficiency in the solution and enables the use of hardware accelerators for both training and inference. The results demonstrate that game-learning environments provide effective support for casting a completely different scenario, and open new research avenues in the field of resource management using reinforcement learning techniques with minimal development effort.
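    A minimal sketch of how such an environment could look with the classic Gym API (gym.Env with reset/step); the state, action set, toy power model and reward below are assumptions, not the paper's formulation. An environment like this can then be handed to RLlib for training.

      import gym
      import numpy as np
      from gym import spaces

      class PowerCapEnv(gym.Env):
          """Observation: (measured power, performance); action: DVFS level."""
          def __init__(self, power_cap=100.0):
              super().__init__()
              self.power_cap = power_cap
              self.action_space = spaces.Discrete(4)        # 4 DVFS levels
              self.observation_space = spaces.Box(
                  0.0, np.inf, shape=(2,), dtype=np.float32)

          def reset(self):
              self.state = np.array([50.0, 1.0], dtype=np.float32)
              return self.state

          def step(self, action):
              freq = 0.5 + 0.5 * action                     # toy DVFS scaling
              power = 30.0 * freq + np.random.normal(0, 2)  # toy power model
              perf = freq
              # reward performance, penalize exceeding the cap
              reward = perf - max(0.0, power - self.power_cap)
              self.state = np.array([power, perf], dtype=np.float32)
              return self.state, float(reward), False, {}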
  • Item
    Dynamic power budget redistribution under a power cap on multi-application environments
    (Sustainable Computing-Informatics & Systems, 2023) Costero Valero, Luis María; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin
    We present a two-level implementation of an infrastructure that allows performance maximization under a power cap in multi-application environments with minimal user intervention. At the application level, we integrate bar (Power Budget-Aware Runtime Scheduler) into existing task-based runtimes, e.g. OpenMP; bar implements combined software/hardware techniques (thread malleability and DVFS) to maximize application performance without violating a granted power budget. At a higher level, we introduce barman (Power Budget-Aware Resource Manager), system-wide software that manages resources globally, gathering the power needs of registered applications and redistributing the available overall power budget across them. The combined, cooperative operation of both pieces of software yields performance and energy efficiency improvements in environments in which power capping is established globally and also granted asymmetrically to different co-existing applications. This behaviour is demonstrated to be stable under different workloads (a selection of task-based scientific applications and PARSEC benchmarks are tested) and different levels of power capping.
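    The abstract leaves the redistribution rule unspecified; as a sketch under that caveat, a barman-like manager could split the global cap proportionally to the registered applications' requests:

      def redistribute(global_cap, requests):
          """requests: {app: watts_needed} -> {app: granted_watts}."""
          total = sum(requests.values())
          if total == 0:
              return {app: 0.0 for app in requests}
          if total <= global_cap:
              # everyone fully served; hand back slack proportionally
              return {app: w + (global_cap - total) * w / total
                      for app, w in requests.items()}
          # over-subscribed: scale every request down proportionally
          return {app: global_cap * w / total for app, w in requests.items()}

      print(redistribute(100.0, {"appA": 80.0, "appB": 40.0}))
      # -> {'appA': 66.66..., 'appB': 33.33...}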
  • Item
    A machine learning-based framework for throughput estimation of time-varying applications in multi-core servers
    (2019 IFIP/IEEE 27th International conference on very large scale integration (VLSI-SOC), 2019) Iranfar, Arman; Souza, Wellington Silva de; Zapater, Marina; Olcoz Herrero, Katzalin; Souza, Samuel Xavier de; Atienza, David
    Accurate workload prediction and throughput estimation are key to efficient proactive power and performance management of multi-core platforms. Although the hardware performance counters available on modern platforms contain important information about application behavior, employing them efficiently is not straightforward when dealing with time-varying applications, even if they have iterative structures. In this work, we propose a machine learning-based framework for workload prediction and throughput estimation using hardware events. Our framework enables throughput estimation across the available system configurations, namely, the number of parallel threads and the operating frequency. In particular, we first employ workload clustering and classification techniques along with Markov chains to predict the next workload for each available system configuration. Then, the predicted workload is used to estimate the next expected throughput through a machine learning-based regression model. A comparison with the state of the art demonstrates that our framework is able to improve Quality of Service (QoS) by 3.4x, while consuming 15% less power thanks to the more accurate throughput estimation.
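    As an illustration of the prediction step only: once workload phases have been clustered into discrete labels, a first-order Markov chain can propose the most likely next phase for the regression model. The clustering and the regressor are assumed to exist elsewhere.

      from collections import Counter, defaultdict

      class MarkovPredictor:
          def __init__(self):
              self.trans = defaultdict(Counter)  # state -> Counter(next state)
              self.prev = None

          def observe(self, state):
              if self.prev is not None:
                  self.trans[self.prev][state] += 1
              self.prev = state

          def predict_next(self):
              row = self.trans[self.prev]
              return row.most_common(1)[0][0] if row else self.prev

      m = MarkovPredictor()
      for phase in ["A", "B", "A", "B", "A"]:
          m.observe(phase)
      print(m.predict_next())  # after 'A', 'B' is the most frequent successor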
  • Item
    Project number: 302
    Using the para.TI@UCM services to improve academic management in departments
    (2016) Risco Martín, José Luis; Olcoz Herrero, Katzalin; Chaver Martínez, Daniel; Sánchez-Élez Martín, Marcos
    In this project we aim to put into practice the various possibilities for department-level academic management offered by the Google tools integrated into the para.TI@UCM suite of services (Gmail, Hangouts, Docs, Drive, Calendar, etc.), with the goal of streamlining and facilitating the different levels of management that departments carry out today.