Person:
Olcoz Herrero, Katzalin

First Name
Katzalin
Last Name
Olcoz Herrero
Affiliation
Universidad Complutense de Madrid
Faculty / Institute
Ciencias Físicas
Department
Arquitectura de Computadores y Automática
Area
Arquitectura y Tecnología de Computadores
Identifiers
UCM identifier · ORCID · Scopus Author ID · Dialnet ID · Google Scholar ID

Search Results

Now showing 1 - 10 of 11
  • Item
    Level-Spread: A New Job Allocation Policy for Dragonfly Networks
    (2018) Yijia Zhang; Ozan Tuncer; Fulya Kaplan; Vitus J. Leung; Ayse K. Coskun; Olcoz Herrero, Katzalin
    The dragonfly network topology has attracted attention in recent years owing to its high radix and constant diameter. However, the influence of job allocation on communication time in dragonfly networks is not fully understood. Recent studies have shown that random allocation is better at balancing the network traffic, while compact allocation is better at harnessing the locality in dragonfly groups. Based on these observations, this paper introduces a novel allocation policy called Level-Spread for dragonfly networks. This policy spreads jobs within the smallest network level that a given job can fit in at the time of its allocation. In this way, it simultaneously harnesses node adjacency and balances link congestion. To evaluate the performance of Level-Spread, we run packet-level network simulations using a diverse set of application communication patterns, job sizes, and communication intensities. We also explore the impact of network properties such as the number of groups, number of routers per group, machine utilization level, and global link bandwidth. Level-Spread reduces the communication overhead by 16% on average (and up to 71%) compared to the state-of-the-art allocation policies.
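    A minimal sketch of the level-selection idea described above, written in Python. The free-node bookkeeping, the best-fit tie-breaking, and the reduction to three levels (router, group, machine) are simplifying assumptions made for illustration; the paper's policy operates inside a packet-level simulator with richer state.

      # Hypothetical input: {(group_id, router_id): free_node_count}.
      def pick_allocation_level(job_size, free_nodes_per_router):
          # Level 1: a single router whose free nodes can hold the whole job.
          routers = [r for r, free in free_nodes_per_router.items() if free >= job_size]
          if routers:
              return ("router", min(routers, key=lambda r: free_nodes_per_router[r]))
          # Level 2: a single group with enough free nodes across its routers.
          free_per_group = {}
          for (group, _router), free in free_nodes_per_router.items():
              free_per_group[group] = free_per_group.get(group, 0) + free
          groups = [g for g, free in free_per_group.items() if free >= job_size]
          if groups:
              return ("group", min(groups, key=lambda g: free_per_group[g]))
          # Level 3: spread the job over the whole machine.
          return ("machine", None)

      # Toy dragonfly with 2 groups and 2 routers per group.
      free = {(0, 0): 1, (0, 1): 3, (1, 0): 4, (1, 1): 2}
      print(pick_allocation_level(3, free))   # fits in one router -> ('router', (0, 1))
      print(pick_allocation_level(5, free))   # needs a whole group -> ('group', 1)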
  • Item
    Server Power Modeling for Run-time Energy Optimization of Cloud Computing Facilities.
    (Energy Procedia, 6th International Conference on Sustainability in Energy and Buildings, 2014) Arroba, Patricia; Risco Martín, José Luis; Zapater Sancho, Marina; Moya, José Manuel; Ayala Rodrigo, José Luis; Olcoz Herrero, Katzalin
    As advanced Cloud services become mainstream, the contribution of data centers to the overall power consumption of modern cities is growing dramatically. The average consumption of a single data center is equivalent to the energy consumption of 25,000 households. Modeling the power consumption of these infrastructures is crucial to anticipate the effects of aggressive optimization policies, but accurate and fast power modeling for high-end servers is a complex challenge not yet satisfied by analytical approaches. This work proposes an automatic method, based on Multi-Objective Particle Swarm Optimization, for the identification of power models of enterprise servers in Cloud data centers. Our approach, as opposed to previous procedures, not only considers workload consolidation for deriving the power model, but also incorporates other non-traditional factors such as the static power consumption and its dependence on temperature. Our experimental results show that we obtain slightly better models than classical approaches while simultaneously simplifying the power model structure, and thus the number of sensors needed, which is very promising for short-term energy prediction. This work, validated with real Cloud applications, broadens the possibilities to derive efficient energy-saving techniques for Cloud facilities.
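    As a rough illustration of the kind of model identified above (static power plus utilization- and temperature-dependent terms), the following Python sketch fits hypothetical coefficients with a plain random search standing in for the Multi-Objective Particle Swarm Optimization used in the work; the model form and the measurements are invented for the example.

      import random

      # Hypothetical model: P = p_static + a*util + b*temp + c*util*temp.
      def power_model(coeffs, util, temp):
          p_static, a, b, c = coeffs
          return p_static + a * util + b * temp + c * util * temp

      def mse(coeffs, samples):
          return sum((power_model(coeffs, u, t) - p) ** 2 for u, t, p in samples) / len(samples)

      # Toy samples: (utilization 0-1, temperature in C, measured power in W).
      samples = [(0.1, 40, 95.0), (0.5, 55, 140.0), (0.9, 70, 190.0), (0.3, 45, 115.0)]

      best, best_err = None, float("inf")
      for _ in range(20000):                       # random search in place of MOPSO
          cand = [random.uniform(0, 100), random.uniform(0, 200),
                  random.uniform(0, 2), random.uniform(0, 5)]
          err = mse(cand, samples)
          if err < best_err:
              best, best_err = cand, err

      print("coefficients:", [round(x, 2) for x in best], "MSE:", round(best_err, 2))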
  • Item
    A machine learning-based framework for throughput estimation of time-varying applications in multi-core servers
    (2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC), 2019) Iranfar, Arman; Souza, Wellington Silva de; Zapater, Marina; Olcoz Herrero, Katzalin; Souza, Samuel Xavier de; Atienza, David
    Accurate workload prediction and throughput estimation are key to efficient proactive power and performance management of multi-core platforms. Although the hardware performance counters available on modern platforms contain important information about application behavior, employing them efficiently is not straightforward when dealing with time-varying applications, even if they have iterative structures. In this work, we propose a machine learning-based framework for workload prediction and throughput estimation using hardware events. Our framework enables throughput estimation over the available system configurations, namely the number of parallel threads and the operating frequency. In particular, we first employ workload clustering and classification techniques along with Markov chains to predict the next workload for each available system configuration. Then, the predicted workload is used to estimate the next expected throughput through a machine learning-based regression model. The comparison with the state of the art demonstrates that our framework is able to improve Quality of Service (QoS) by 3.4x while consuming 15% less power, thanks to the more accurate throughput estimation.
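    A heavily simplified sketch of the pipeline described above: cluster hardware-counter vectors into workload classes, learn a Markov transition matrix between classes, and regress throughput from the counters. It uses scikit-learn, synthetic data, and a single shared regressor, whereas the paper trains estimators per system configuration (thread count and frequency).

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      counters = rng.random((200, 2))              # synthetic per-interval counter vectors
      throughput = 50 + 30 * counters[:, 0] - 10 * counters[:, 1] + rng.normal(0, 1, 200)

      # 1) Workload classes from clustering the counter vectors.
      kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(counters)
      labels = kmeans.labels_

      # 2) First-order Markov chain over classes: transition counts -> probabilities.
      trans = np.zeros((3, 3))
      for a, b in zip(labels[:-1], labels[1:]):
          trans[a, b] += 1
      trans = trans / trans.sum(axis=1, keepdims=True)

      # 3) Throughput regression on the counter features.
      reg = LinearRegression().fit(counters, throughput)

      # Predict the most likely next class and its expected throughput (from the centroid).
      next_class = int(np.argmax(trans[labels[-1]]))
      est = reg.predict(kmeans.cluster_centers_[[next_class]])[0]
      print("next class:", next_class, "expected throughput:", round(float(est), 1))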
  • Item
    Project number: 302
    Uso de los servicios para.TI@UCM para mejorar la gestión académica en los departamentos
    (2016) Risco Martín, José Luis; Olcoz Herrero, Katzalin; Chaver Martínez, Daniel; Sánchez-Élez Martín, Marcos
    In this project we aim to put into practice the different possibilities for department-level academic management offered by the Google tools integrated into the para.TI@UCM suite of services (Gmail, Hangouts, Docs, Drive, Calendar, etc.), in order to streamline and simplify the various management tasks that departments carry out today.
  • Item
    A QoS and container-based approach for energy saving and performance profiling in multi-core servers
    (2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC), 2019) Souza, Wellington Silva de; Iranfar, Arman; Silva, Anderson; Zapater, Marina; Souza, Samuel Xavier de; Olcoz Herrero, Katzalin; Atienza, David
    In this work we present ContainEnergy, a new performance evaluation and profiling tool that uses software containers to perform application runtime assessment, providing energy and performance profiling data. It focuses on energy efficiency for next-generation workloads and IT infrastructure.
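    Purely as an illustration of the general idea (not ContainEnergy itself): launch a containerized workload and bracket it with readings from the Linux RAPL energy counters. The image, command, and counter path are assumptions, and reading the counter typically needs root on a machine that exposes RAPL.

      import subprocess, time

      RAPL = "/sys/class/powercap/intel-rapl:0/energy_uj"   # package-0 energy, microjoules

      def read_energy_uj():
          with open(RAPL) as f:
              return int(f.read())

      # Hypothetical containerized workload; any image/command could be profiled this way.
      cmd = ["docker", "run", "--rm", "alpine", "sh", "-c",
             "i=0; while [ $i -lt 1000000 ]; do i=$((i+1)); done"]

      e0, t0 = read_energy_uj(), time.time()
      subprocess.run(cmd, check=True)
      e1, t1 = read_energy_uj(), time.time()

      elapsed = t1 - t0
      energy_j = (e1 - e0) / 1e6        # counter wrap-around on long runs is ignored here
      print(f"runtime {elapsed:.2f} s, package energy {energy_j:.2f} J, "
            f"avg power {energy_j / elapsed:.2f} W")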
  • Item
    Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra
    (Parallel Computing, 2017) Costero Valero, Luis María; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin; Catalán Pallarés, Sandra; Rodríguez Sánchez, Rafael; Quintana-Ortí, Enrique S.
    Dealing with asymmetry in the architecture opens a plethora of questions related to the performance- and energy-efficient scheduling of task-parallel applications. While there exist early attempts to tackle this problem, for example via ad-hoc strategies embedded in a runtime framework, in this paper we take a different path, which consists in addressing the asymmetry at the library level by developing a few asymmetry-aware fundamental kernels. The appealing consequence is that the architectural heterogeneity then remains hidden from the task scheduler. To illustrate the advantage of our approach, we employ two well-known matrix factorizations, key to the solution of dense linear systems of equations. From the perspective of the architecture, we consider two low-power processors, one of them equipped with ARM big.LITTLE technology; furthermore, we include in the study a different scenario, in which the asymmetry arises when the cores of an Intel Xeon server operate at two distinct frequencies. For the specific domain of dense linear algebra, we show that dealing with asymmetry at the library level is not only possible but delivers higher performance than a naive approach based on an asymmetry-oblivious scheduler. Furthermore, this solution is also competitive in terms of performance with an ad-hoc asymmetry-aware scheduler furnished with sophisticated scheduling techniques.
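    A toy Python illustration of the library-level idea: a kernel that splits its work between a "big" and a "LITTLE" worker in proportion to an assumed speed ratio, so the caller (and the task scheduler above it) sees a single symmetric operation. The speed ratio and core counts are made-up parameters, and the paper's kernels are blocked dense linear algebra routines, not this elementwise loop.

      from concurrent.futures import ThreadPoolExecutor

      BIG_SPEED, LITTLE_SPEED = 2.0, 1.0      # assumed per-core throughput ratio
      N_BIG, N_LITTLE = 4, 4                  # assumed cluster sizes (e.g. big.LITTLE SoC)

      def axpy_chunk(alpha, x, y, lo, hi):
          # y[lo:hi] += alpha * x[lo:hi] -- stand-in for an asymmetry-aware kernel.
          for i in range(lo, hi):
              y[i] += alpha * x[i]

      def asymmetric_axpy(alpha, x, y):
          n = len(x)
          big_share = N_BIG * BIG_SPEED / (N_BIG * BIG_SPEED + N_LITTLE * LITTLE_SPEED)
          split = int(n * big_share)          # portion of the work handed to the big cluster
          with ThreadPoolExecutor(max_workers=2) as pool:
              f1 = pool.submit(axpy_chunk, alpha, x, y, 0, split)
              f2 = pool.submit(axpy_chunk, alpha, x, y, split, n)
              f1.result(); f2.result()

      x, y = [1.0] * 12, [0.0] * 12
      asymmetric_axpy(0.5, x, y)
      print(y)                                # 2:1 work split, result identical to plain axpy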
  • Item
    Gem5-X: a gem5-based system-level simulation framework to optimize many-core platforms
    (2019 Spring Simulation Conference (SpringSim), 2019) Mahmood Qureshi, Yasir; Simon, William Andrew; Zapater, Marina; Atienza, David; Olcoz Herrero, Katzalin
    The rapid expansion of online services requires novel energy- and performance-efficient architectures to meet power and latency constraints. Fast architectural exploration has become a key enabler of architectural innovation. In this paper, we present gem5-X, a gem5-based system-level simulation framework, and a methodology to optimize many-core systems for performance and power. As real-life case studies of many-core server workloads, we use real-time video transcoding and image classification with convolutional neural networks (CNNs). Gem5-X allows us to identify bottlenecks and evaluate the potential benefits of architectural extensions such as in-cache computing and 3D-stacked High Bandwidth Memory (HBM). For real-time video transcoding, we achieve a 15% speed-up using in-order cores with in-cache computing when compared to a baseline in-order system, and 76% energy savings when compared to an out-of-order system. When using HBM, we further accelerate real-time transcoding and CNNs by up to 7% and 8%, respectively.
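    The optimization methodology boils down to sweeping candidate configurations in the simulator and comparing them against a baseline. The sketch below shows only that bookkeeping step, with invented runtime and energy figures standing in for simulator output; gem5-X itself is driven through its own configuration scripts, which are not reproduced here.

      # Invented numbers for illustration only (runtime in s, energy in J).
      results = {
          "in-order baseline":           {"runtime": 10.0, "energy": 50.0},
          "in-order + in-cache compute": {"runtime": 8.7,  "energy": 46.0},
          "out-of-order":                {"runtime": 8.0,  "energy": 190.0},
          "in-order + HBM":              {"runtime": 9.3,  "energy": 48.0},
      }

      baseline = results["in-order baseline"]
      for name, r in results.items():
          speedup = baseline["runtime"] / r["runtime"]
          savings = 1.0 - r["energy"] / baseline["energy"]
          print(f"{name:30s} speed-up {speedup:4.2f}x   energy vs baseline {savings:+.0%}")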
  • Item
    MAMUT: Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-User Video Transcoding
    (2019) Costero Valero, Luis María; Iranfar, Arman; Zapater Sancho, Marina; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin; Atienza Alonso, David
    Real-time video transcoding has recently emerged as a valid alternative to address the ever-increasing demand for video content in server infrastructures in current multi-user environments. High Efficiency Video Coding (HEVC) makes efficient online transcoding feasible as it enhances user experience by providing the adequate video configuration, reduces pressure on the network, and minimizes inefficient and costly video storage. However, the computational complexity of HEVC, together with its myriad of configuration parameters, raises challenges for power management, throughput control, and Quality of Service (QoS) satisfaction. This is particularly challenging in multi-user environments where multiple users with different resolution demands and bandwidth constraints need to be served simultaneously. In this work, we present MAMUT, a multi-agent machine learning approach to tackle these challenges. Our proposal breaks the design space composed of run-time adaptation of the transcoder and system parameters into smaller sub-spaces that can be explored in a reasonable time by individual agents. While working cooperatively, each agent is in charge of learning and applying the optimal values for internal HEVC and system-wide parameters. In particular, MAMUT dynamically tunes the Quantization Parameter, selects the number of threads per video, and sets the operating frequency with throughput and video quality objectives under compression and power consumption constraints. We implement MAMUT on an enterprise multicore server and compare equivalent scenarios to state-of-the-art alternative approaches. The obtained results reveal that MAMUT consistently attains up to 8x improvement in terms of FPS violations (and thus Quality of Service), 24% power reduction, as well as faster and more accurate adaptation to both the video contents and the available resources.
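    A much-simplified sketch of the multi-agent idea: one learning agent per knob (quantization parameter, threads per video, frequency), all trained from a shared reward that trades an FPS target against power. The agents here are stateless epsilon-greedy learners and the "measurements" are an invented model, whereas MAMUT uses proper reinforcement-learning agents driven by the real transcoder and power sensors.

      import random

      def measure(qp, threads, freq):            # invented stand-in for real measurements
          fps = 4.0 * threads * freq * (qp / 32.0)
          power = 10.0 + 5.0 * threads * freq
          return fps, power

      def reward(fps, power, fps_target=30.0, power_budget=60.0):
          return -abs(fps - fps_target) - max(0.0, power - power_budget)

      class Agent:                                # epsilon-greedy value learner for one knob
          def __init__(self, actions, eps=0.2, alpha=0.3):
              self.actions, self.eps, self.alpha = actions, eps, alpha
              self.q = {a: 0.0 for a in actions}
          def act(self):
              if random.random() < self.eps:
                  return random.choice(self.actions)
              return max(self.actions, key=self.q.get)
          def learn(self, a, r):
              self.q[a] += self.alpha * (r - self.q[a])

      qp_agent = Agent([22, 27, 32, 37])          # quantization parameter
      thread_agent = Agent([1, 2, 4, 8])          # threads per video
      freq_agent = Agent([1.2, 1.8, 2.4])         # core frequency (GHz)

      for _ in range(2000):
          qp, th, f = qp_agent.act(), thread_agent.act(), freq_agent.act()
          r = reward(*measure(qp, th, f))         # shared reward -> cooperative behaviour
          for agent, action in ((qp_agent, qp), (thread_agent, th), (freq_agent, f)):
              agent.learn(action, r)

      print("learned knobs:", max(qp_agent.q, key=qp_agent.q.get),
            max(thread_agent.q, key=thread_agent.q.get),
            max(freq_agent.q, key=freq_agent.q.get))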
  • Item
    Project number: 38
    Metodología de internacionalización de material docente basada en el uso de Markdown y Pandoc
    (2018) Sáez Alcaide, Juan Carlos; Sánchez-Elez Martín, Marcos; Risco Martín, José Luis; Castro Rodríguez, Fernando; Prieto Matías, Manuel; Sáez Puche, Regino; Chaver Martínez, Daniel; Olcoz Herrero, Katzalin; Clemente Barreira, Juan Antonio; Igual Peña, Francisco; García García, Adrián; Sánchez Foces, David
    The internationalization of teaching offers great opportunities for the University, but it also poses significant challenges for students and lecturers. In particular, creating and effectively maintaining the teaching material of a course taught simultaneously in several languages, with a high degree of coordination among its groups (e.g., a final exam or lab assignments common to all students), can be a major challenge for lecturers. To address this problem, we have designed a specific strategy for creating and managing dual-language teaching material (e.g., English-Spanish) and developed a set of cross-platform tools to put it into practice. The general idea is to keep, in a single text file, the content of the document to be built in both languages, providing right after each paragraph and heading in one language its translation into the other, using special delimiters. These dual documents are written in Markdown, a lightweight markup language whose simplicity and versatility have led to rapid adoption by a broad spectrum of professionals, from novelists and journalists to website administrators. From the dual documents created with Markdown, the final document for each language can be generated automatically in the desired format and made available to the students. For this task we rely on the Pandoc tool, which can convert Markdown documents into a large number of formats, such as PDF, docx (Microsoft Word), EPUB (e-book), or HTML. As part of our project, we have created Pandoc extensions that enable the creation of dual documents in Markdown and extend the expressiveness of the language with constructs commonly used in teaching documents.
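    A minimal sketch of the splitting step: extract the per-language versions from a dual Markdown source before handing each one to Pandoc. The "<!--EN-->" / "<!--ES-->" markers are hypothetical delimiters chosen for this example; the project's actual Pandoc extensions define their own syntax.

      def split_dual(text):
          blocks, current = {"EN": [], "ES": []}, "EN"
          for line in text.splitlines():
              if line.strip() == "<!--ES-->":
                  current = "ES"
              elif line.strip() == "<!--EN-->":
                  current = "EN"
              else:
                  blocks[current].append(line)
          return "\n".join(blocks["EN"]), "\n".join(blocks["ES"])

      dual = "\n".join([
          "# Assignment 1",
          "<!--ES-->",
          "# Práctica 1",
          "<!--EN-->",
          "Write a report describing your results.",
          "<!--ES-->",
          "Escribe una memoria describiendo tus resultados.",
      ])

      english, spanish = split_dual(dual)
      print(english)
      # Each per-language file would then be converted with Pandoc, e.g.:
      #   pandoc notes_en.md -o notes_en.pdf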
  • Item
    Energy efficiency optimization of task-parallel codes on asymmetric architectures
    (2017) Costero Valero, Luis María; Igual Peña, Francisco Daniel; Olcoz Herrero, Katzalin; Tirado Fernández, José Francisco
    We present a family of policies that, integrated within a runtime task scheduler (Nanox), pursue the goal of improving the energy efficiency of task-parallel executions with no intervention from the programmer. The proposed policies tackle the problem by modifying the core operating frequency via DVFS mechanisms, or by enabling/disabling the mapping of tasks to specific cores at selected execution points, depending on the internal status of the scheduler. Experimental results on an asymmetric SoC (Exynos 5422) and for a specific operation (Cholesky factorization) reveal gains of up to 29% in terms of energy efficiency and considerable reductions in average power.
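    A sketch, in the spirit of the policies described above, of a frequency-scaling rule driven by the runtime's ready-task queue: lower the frequency when little work is queued, raise it when a backlog builds up. The thresholds, frequencies, and use of the Linux cpufreq "userspace" governor (which requires root) are assumptions of this sketch, not details of the Nanox policies.

      LOW_FREQ_KHZ, HIGH_FREQ_KHZ = 800000, 1400000
      LOW_WATERMARK, HIGH_WATERMARK = 2, 8

      def set_frequency(cpu, freq_khz):
          path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"
          with open(path, "w") as f:          # needs the "userspace" governor and root
              f.write(str(freq_khz))

      def dvfs_policy(ready_tasks, cpus):
          if ready_tasks <= LOW_WATERMARK:
              target = LOW_FREQ_KHZ           # little work queued: save energy
          elif ready_tasks >= HIGH_WATERMARK:
              target = HIGH_FREQ_KHZ          # backlog building up: run faster
          else:
              return                          # keep the current setting in between
          for cpu in cpus:
              set_frequency(cpu, target)

      # A task scheduler would call dvfs_policy(len(ready_queue), range(n_cores))
      # whenever tasks are enqueued or completed.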