Person:
Sáez Alcaide, Juan Carlos

First Name

Juan Carlos

Last Name

Sáez Alcaide

URI

https://hdl.handle.net/20.500.14352/78726

Affiliation

Universidad Complutense de Madrid

Faculty / Institute

Informática

Department

Arquitectura de Computadores y Automática

Area

Arquitectura y Tecnología de Computadores

Search Results

LFOC+: A Fair OS-Level Cache-Clustering Policy for Commodity Multicore Systems
(IEEE Transactions on Computers, 2022) Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Fanizzi, Graziano; Prieto Matías, Manuel
Commodity multicore systems are increasingly adopting hardware support that enables the system software to partition the last-level cache (LLC). This support makes it possible for the operating system (OS) or the Virtual Machine Monitor (VMM) to mitigate shared-resource contention effects on multicores by assigning different co-running applications to various cache partitions. Recently, cache-clustering (or partition-sharing) strategies have emerged as a way to improve system throughput and fairness on new platforms with cache-partitioning support. As opposed to strict cache-partitioning, which allocates separate cache partitions to each application, cache-clustering allows partitions to be shared by a group of applications. In this article we propose LFOC+, a fairness-aware OS-level cache-clustering policy for commodity multicore systems. LFOC+ tries to mimic the behavior of the optimal cache-clustering solution for fairness, which we could obtain for different workload scenarios by using a simulation tool. Our dynamic cache-clustering strategy continuously gathers data from performance monitoring counters to classify applications at runtime based on the degree of cache sensitivity and contentiousness, and effectively separates cache-sensitive applications from aggressor programs to improve fairness, while providing acceptable system throughput. We implemented LFOC+ in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of four previously proposed cache-clustering policies. Our experimental analysis reveals that LFOC+ constitutes a lightweight OS-level policy and improves fairness relative to two other state-of-the-art fairness-aware strategies (Dunn and LFOC) by up to 22% and up to 20.6%, respectively, and by 9% and 4.9% on average.
Calidad de servicio en procesadores con multithreading simultaneo (SMT)
(2006) Alonso Fernández, Alejandro; Morón Tabernero, Noelia; Sáez Alcaide, Juan Carlos; Prieto Matías, Manuel
SMT: Implementing an algorithm for Quality of Service. Although hyperthreading (a mechanism based on providing several logical processors without duplicating all the hardware) generally yields higher throughput, this improvement may come at the expense of the resources available to critical processes. Previous experiments indicate that the traditional Linux priority assignment is not enough to obtain policies that maximize the number of deadlines met by tasks with soft real-time requirements; SMT must also be taken into account. It is therefore interesting to study the effect of Quality of Service-based scheduling that optimizes the performance of one task without degrading the response of the other processes running in the system. Our Quality of Service proposal has been developed for processors of the Pentium 4 and Intel Xeon families on the Linux 2.6 kernel series. We consider that Quality of Service mechanisms for this kind of processor have not been sufficiently studied and, to the best of our knowledge, scheduling mechanisms that provide Quality of Service on them have not yet been implemented on real systems.
Planificación de procesos en sistemas multicore asimétricos = Thread Scheduling on Asymmetric Multicore Systems
(2011) Sáez Alcaide, Juan Carlos; Prieto Matías, Manuel
Symmetric-ISA (Instruction Set Architecture) asymmetric-performance multicore processors (AMPs) were shown to deliver higher performance per watt and area than their symmetric counterparts [1, 2], and so it is likely that future multicore processors will combine a few fast cores characterized by complex pipelines, high clock frequency, high area requirements and power consumption, and many slow ones, characterized by simple pipelines, low clock frequency, low area requirements and power consumption. Recent research has highlighted that the efficiency of AMP systems could be improved using two kinds of core specialization [3, 4]. The former ensures that fast cores are used for those applications that efficiently utilize these cores' "expensive" features, while slow cores would be used for applications spending a majority of their execution time stalling the processor, thus utilizing complex cores inefficiently. The latter leverages the effectiveness of these systems by using fast cores to accelerate sequential phases of parallel applications, and devoting slow cores to running parallel phases. To fully tap into the potential of specialization, the operating system (OS) must be aware of the hardware asymmetry when making scheduling decisions and map applications to cores in consideration of their performance characteristics. While the design and the theoretical benefits of AMPs have been extensively investigated [1, 5], the study of real-world operating system support for these upcoming architectures has not been addressed comprehensively to date. So the questions as to whether this potential can be delivered efficiently by the operating system to unmodified applications, and what the associated overheads are, remain open. In this thesis, we propose a set of OS-level scheduling algorithms aimed at unleashing the potential of specialization.
These algorithms have been implemented on an actual operating system and extensively evaluated on real multicore hardware made asymmetric via dynamic voltage and frequency scaling (DVFS). Notably, none of these algorithms require changes to applications but only moderate changes to the target operating system, providing proof of concept towards lightweight OS support for asymmetric hardware. Our evaluation also includes an extensive comparison with previously proposed asymmetry-aware schedulers to provide a clearer understanding of the pros and cons behind our proposals. [SUMMARY] Asymmetric multicore processors with a common instruction set (AMPs, Asymmetric Multicore Processors) have recently been proposed as a solid alternative to current symmetric multicores, promising higher performance per watt. Future generations of multicore processors are therefore likely to integrate, on a single chip, a few complex cores together with many simpler, low-power cores. The potential of AMP systems can be extracted mainly through two core-specialization techniques. The first ensures that complex cores are used by the applications that most efficiently exploit their sophisticated microarchitectural features, and relegates the remaining applications to simple cores. The second exploits the single-thread acceleration capability of complex cores to run the sequential phases of applications, while parallel phases run on simple cores. Although the benefits of core specialization have been demonstrated in several studies, no exhaustive analysis has been carried out to date of the support needed in a real operating system to deliver these benefits transparently to applications.
In this thesis we have shown how, and to what extent, specialization strategies can be exploited through process scheduling in the operating system. To that end, we have proposed several scheduling algorithms for AMPs, implemented in a real operating system and exhaustively evaluated on emulated asymmetric multicore platforms. The main contributions of this thesis are the proposed techniques for detecting and accelerating sequential phases in parallel software, as well as the models for estimating the speedup that applications experience when running on complex cores relative to simple cores.
LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores
(2019) García García, Adrián; Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Prieto Matías, Manuel
Multicore processors constitute the main architecture choice for modern computing systems in different market segments. Despite their benefits, the contention that naturally appears when multiple applications compete for the use of shared resources among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This may have a negative impact on key system aspects such as throughput and fairness. Assigning the various applications in the workload to separate LLC partitions with possibly different sizes, has been proven effective to mitigate shared-resource contention effects. In this article we propose LFOC, a clustering-based cache partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages the Intel Cache Allocation Technology (CAT), which enables the system software to divide the LLC into different partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we could approximate by means of a simulator in different scenarios. To this end, LFOC effectively identifies streaming aggressor programs and cache sensitive applications, which are then assigned to separate cache partitions. We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, where we compare its effectiveness to that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC is able to bring a higher reduction in unfairness by leveraging a lightweight algorithm suitable for adoption in a real OS.
Project number: 254
Diseño y aplicación al aula de un modelo de asistente semi-automático para procesos de aprendizaje presenciales
(2017) Guijarro Mata-García, María; Santos Peñas, Matilde; Fuentes Fernández, Rubén; Saenz Pérez, Fernando; Navarro Martín, Antonio; Fernández Prados, Juan Sebastián; Vicente Hernanz, María Lina; Guijarro de Mata-García, Marta; Prieto Fernández, Lucía Amparo; Garnica Alcázar, Antonio Óscar; Jiménez Castellanos, Juan Francisco; Fernández, Isabel; Sáez Alcaide, Juan Carlos
Project number: 38
Metodología de internacionalización de material docente basada en el uso de Markdown y Pandoc
(2018) Sáez Alcaide, Juan Carlos; Sánchez-Elez Martín, Marcos; Risco Martín, José Luis; Castro Rodríguez, Fernando; Prieto Matías, Manuel; Sáez Puche, Regino; Chaver Martínez, Daniel; Olcoz Herrero, Katzalin; Clemente Barreira, Juan Antonio; Igual Peña, Francisco; García García, Adrián; Sánchez Foces, David
Internationalizing teaching offers great opportunities for the University, but it also poses significant challenges for students and instructors. In particular, effectively creating and maintaining the teaching material of a course taught simultaneously in several languages, with a high degree of coordination among its groups (e.g., a common final exam or common lab assignments for all students), can be a major challenge for instructors. To address this problem, we have designed a specific strategy for creating and managing dual-language teaching material (e.g., English-Spanish), and developed a set of cross-platform tools to put it into practice. The general idea is to keep the content of the document in both languages in a single text file, placing right after each paragraph and heading in one language its translation into the other, marked with special delimiters. These dual documents are written in Markdown, a lightweight markup language whose simplicity and versatility are driving rapid adoption by a wide range of professionals, from novelists and journalists to website administrators. From the dual documents written in Markdown, the final document for each language can be generated automatically in the desired format and made available to students. For this task we rely on Pandoc, which converts Markdown documents to a large number of formats, such as PDF, docx (Microsoft Word), EPUB (e-book) or HTML. As part of our project, we have created Pandoc extensions that enable dual documents to be written in Markdown and that increase the expressiveness of the language with constructs commonly used in teaching documents.
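The single-file, dual-language idea described in this abstract can be illustrated with a minimal Python sketch. It assumes a purely hypothetical delimiter, a leading `[en]` or `[es]` tag on each paragraph or heading; the project's actual delimiters and Pandoc extensions are not given in this record, and the function name `extract_language` is likewise an assumption.

```python
import re

# Hypothetical delimiter: a leading "[en]" or "[es]" tag on each block.
# This is an assumption for illustration, not the project's real syntax.
TAG = re.compile(r"^\[(en|es)\]\s*")

def extract_language(text, lang):
    """Return the Markdown for one language from a dual-language source."""
    kept = []
    for block in text.split("\n\n"):      # blocks are separated by blank lines
        m = TAG.match(block)
        if m is None:
            kept.append(block)            # untagged block: shared by both languages
        elif m.group(1) == lang:
            kept.append(block[m.end():])  # keep the block, drop its tag
    return "\n\n".join(kept)

dual = (
    "[en] # Lab 1\n\n"
    "[es] # Práctica 1\n\n"
    "[en] Read the file.\n\n"
    "[es] Lee el fichero.\n\n"
    "    int x;"
)

english = extract_language(dual, "en")
spanish = extract_language(dual, "es")
```

Each extracted string is ordinary Markdown, so it could be written to a file (the name `lab_en.md` is hypothetical) and converted with a standard Pandoc call such as `pandoc lab_en.md -o lab_en.pdf`.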
Planificación simbiótica en arquitecturas CMP
(2007) Sáez Alcaide, Juan Carlos; Prieto Matías, Manuel
This work presents an operating-system-level symbiotic scheduler for CMP architectures that implements a quality-of-service policy based on disabling cores. The term symbiosis is currently used to refer to how effectively higher performance is obtained when multiple threads run simultaneously on multithreading (MT) architectures [12]. However, the concept can be extended to CMP architectures (and consequently to CMT architectures), since cores still share a notable amount of resources (the L2 cache or the Front Side Bus) whose impact on the performance of current applications remains critical [13]. The symbiotic scheduler has been implemented on Linux 2.6.21 running on a two-way CMP architecture (Intel Core 2 Duo). On this type of architecture, the Linux 2.6.x scheduler guarantees quality of service for processes running on the same core. However, the system allows two tasks of different priority to run on different cores, ignoring the performance degradation that the higher-priority task may suffer because of conflicts over the resources shared by the cores. For this reason, Linux does not offer quality of service (QoS) for processes running on different cores.
Reuse detector: improving the management of STT-RAM SLLCs
(The Computer Journal, 2018) Rodríguez Rodríguez, Roberto Alonso; Díaz, Javier; Castro Rodríguez, Fernando; Ibáñez, Pablo; Chaver Martínez, Daniel Ángel; Viñals, Víctor; Sáez Alcaide, Juan Carlos; Prieto Matías, Manuel; Piñuel Moreno, Luis; Monreal, Teresa; Llabería, José María
Various constraints of Static Random Access Memory (SRAM) are prompting the consideration of new memory technologies as candidates for building on-chip shared last-level caches (SLLCs). Spin-Transfer Torque RAM (STT-RAM) is currently postulated as the prime contender due to its better energy efficiency, smaller die footprint and higher scalability. However, STT-RAM also exhibits some drawbacks, like slow and energy-hungry write operations, that need to be mitigated before it can be used in SLLCs for the next generation of computers. In this work, we address these shortcomings by leveraging a new management mechanism for STT-RAM SLLCs. This approach is based on the previous observation that although the stream of references arriving at the SLLC of a Chip MultiProcessor (CMP) exhibits limited temporal locality, it does exhibit reuse locality, i.e. those blocks referenced several times manifest high probability of forthcoming reuse. As such, conventional STT-RAM SLLC management mechanisms, mainly focused on exploiting temporal locality, behave inefficiently. In this paper, we employ a cache management mechanism that selects the contents of the SLLC so as to exploit reuse locality instead of temporal locality. Specifically, our proposal consists of including a Reuse Detector (RD) between the private cache levels and the STT-RAM SLLC. Its mission is to detect blocks that do not exhibit reuse, in order to avoid their insertion in the SLLC, hence reducing the number of write operations and the energy consumption in the STT-RAM. Our evaluation, using multiprogrammed workloads in quad-core, eight-core and 16-core systems, reveals that our scheme reports, on average, energy reductions in the SLLC in the range of 37–30%, additional energy savings in the main memory in the range of 6–8% and performance improvements of 3% (quad-core), 7% (eight-core) and 14% (16-core) compared with an STT-RAM SLLC baseline where no RD is employed. 
More importantly, our approach outperforms DASCA, the state-of-the-art STT-RAM SLLC management, reporting —depending on the specific scenario and the kind of applications used— SLLC energy savings in the range of 4–11% higher than those of DASCA, delivering higher performance in the range of 1.5–14% and additional improvements in DRAM energy consumption in the range of 2–9% higher than DASCA.
Enabling performance portability of data-parallel OpenMP applications on asymmetric multicore processors
(2020) Sáez Alcaide, Juan Carlos; Castro Rodríguez, Fernando; Prieto Matías, Manuel
Asymmetric multicore processors (AMPs) couple high-performance big cores and low-power small cores with the same instruction-set architecture but different features, such as clock frequency or microarchitecture. Previous work has shown that asymmetric designs may deliver higher energy efficiency than symmetric multicores for diverse workloads. Despite their benefits, AMPs pose significant challenges to runtime systems of parallel programming models. While previous work has mainly explored how to efficiently execute task-based parallel applications on AMPs, via enhancements in the runtime system, improving the performance of unmodified data-parallel applications on these architectures is still a big challenge. In this work we analyze the particular case of loop-based OpenMP applications, which are widely used today in scientific and engineering domains, and constitute the dominant application type in many parallel benchmark suites used for performance evaluation on multicore systems. We observed that conventional loop-scheduling OpenMP approaches are unable to efficiently cope with the load imbalance that naturally stems from the different performance delivered by big and small cores. To address this shortcoming, we propose Asymmetric Iteration Distribution (AID), a set of novel loop-scheduling methods for AMPs that distribute iterations unevenly across worker threads to efficiently deal with performance asymmetry. We implemented AID in libgomp –the GNU OpenMP runtime system–, and evaluated it on two different asymmetric multicore platforms. Our analysis reveals that the AID methods constitute effective replacements of the static and dynamic methods on AMPs, and are capable of improving performance over these conventional strategies by up to 56% and 16.8%, respectively.
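As a rough illustration of distributing iterations unevenly across worker threads, the following Python sketch splits a loop's iteration space in proportion to an assumed relative speed per worker. It is not the AID implementation in libgomp: the function name `split_iterations`, the proportional rule, and the speed values are assumptions made for this example.

```python
def split_iterations(n_iters, speeds):
    """Split range(n_iters) into contiguous chunks sized in proportion to speeds.

    speeds[i] is an assumed relative speed of worker i (e.g. 2 for a big
    core, 1 for a small core). Returns one (start, end) pair per worker,
    with end exclusive.
    """
    total = sum(speeds)
    # Ideal, possibly fractional, share of each worker.
    ideal = [n_iters * s / total for s in speeds]
    counts = [int(x) for x in ideal]
    # Give the leftover iterations to the largest fractional remainders.
    leftover = n_iters - sum(counts)
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    # Turn per-worker counts into contiguous index ranges.
    chunks, start = [], 0
    for c in counts:
        chunks.append((start, start + c))
        start += c
    return chunks

# Two hypothetical big cores (speed 2) and two small cores (speed 1).
chunks = split_iterations(100, [2, 2, 1, 1])
```

With an even static split each worker would receive 25 iterations, so the faster workers would finish early and wait; a speed-weighted split like this one aims to make all workers finish at about the same time.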
Project number: 172
Integración de los servicios para.TI@UCM en una plataforma de e-learning similar al Campus Virtual
(2014) Sánchez-Elez Martín, Marcos; Risco Martín, José Luis; Pardines Lence, Inmaculada; Miñana Ropero, Guadalupe; Garnica Alcázar, Oscar; Gómez Pérez, José Ignacio; Olcoz Herrero, Katzalin; Chaver Martínez, Daniel Ángel; Castro Rodríguez, Fernando; Sáez Alcaide, Juan Carlos; Igual Peña, Francisco
The integration of the para.TI@UCM services at our University prompts us to consider new teaching and assessment methodologies in the teaching-learning process. This project is a continuation of project PIMCD UCM 138 (2013), titled “Uso de los servicios para.TI@UCM para integrar tareas docentes y fomentar el aprendizaje activo y colaborativo de los alumnos”, carried out by this same group of instructors. As a result of this project, a series of tutorials has been produced on using Google applications in teaching tasks as tools to foster student learning. Building on the new teaching framework created in PIMCD UCM 138 (2013), in which both the teaching material and the activities proposed to students are developed in the cloud, the goal of this new project is to integrate all the applications needed to carry out teaching entirely in the cloud (para.TI@UCM), both Google's own and those developed by third parties. Our aim is to try to create an e-learning platform similar to the Campus Virtual. This requires studying, on the one hand, the functionality offered by the Campus Virtual and, on the other, which of that functionality is available in the para.TI@UCM resources. The next step would be to work out how the functionality that is sought but not found in para.TI@UCM can be implemented on top of Google applications.