Reinforcement Learning-Based Joint Reliability and Performance Optimization for Hybrid-Cache Computing Servers

Huang, Darong; Pahlevan, Ali; Costero Valero, Luis María; Zapater, Marina; Atienza Alonso, David

dc.contributor.author	Huang, Darong
dc.contributor.author	Pahlevan, Ali
dc.contributor.author	Costero Valero, Luis María
dc.contributor.author	Zapater, Marina
dc.contributor.author	Atienza Alonso, David
dc.date.accessioned	2024-01-30T12:29:21Z
dc.date.available	2024-01-30T12:29:21Z
dc.date.issued	2022-03-11
dc.description.abstract	Computing servers have played a key role in the development and processing of emerging compute-intensive applications in recent years. However, they need to operate energy-efficiently while maximizing the performance and lifetime of the hottest server components (i.e., cores and cache). Previous methods focused either on improving energy efficiency by adopting new hybrid-cache architectures that include resistive random-access memory (RRAM) and static random-access memory (SRAM) at the hardware level, or on exploring tradeoffs between lifetime limitations and performance of multicore processors under stable workload conditions. As a result, no work has so far proposed a co-optimization method for hybrid-cache-based server architectures in real-life dynamic scenarios that takes scalability, performance, lifetime reliability, and energy efficiency into account at the same time. In this article, we first formulate a reliability model for the hybrid-cache architecture to enable precise lifetime reliability management and energy efficiency optimization. We also include the performance and energy overheads of cache switching, and optimize the benefits of hybrid-cache usage for better energy efficiency and performance. Then, we propose a runtime Q-learning-based reliability management and performance optimization approach for multicore microprocessors with the hybrid-cache architecture, combined with a dynamic preemptive priority queue management method that improves overall task performance by aiming to respect the tasks' end time limits. Experimental results show that our proposed method achieves up to 44% average performance (i.e., task execution time) improvement while keeping the whole-system design lifetime longer than five years, compared to the latest state-of-the-art energy efficiency optimization and reliability management methods for computing servers.
dc.description.department	Depto. de Arquitectura de Computadores y Automática
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	pub
dc.identifier.doi	10.1109/TCAD.2022.3158832
dc.identifier.issn	1937-4151
dc.identifier.uri	https://hdl.handle.net/20.500.14352/96484
dc.issue.number	12
dc.journal.title	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
dc.language.iso	eng
dc.page.final	5609
dc.page.initial	5596
dc.publisher	IEEE
dc.rights	Attribution-ShareAlike 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-sa/4.0/
dc.subject.ucm	Inteligencia artificial (Informática)
dc.subject.ucm	Programación de ordenadores (Informática)
dc.subject.unesco	3304.06 Arquitectura de Ordenadores
dc.title	Reinforcement Learning-Based Joint Reliability and Performance Optimization for Hybrid-Cache Computing Servers
dc.type	journal article
dc.volume.number	41
dspace.entity.type	Publication
relation.isAuthorOfPublication	b2616c88-d3da-43df-86cb-3ced1084f460
relation.isAuthorOfPublication	cbef6c8a-04b5-428f-b092-c8399eb856a4
relation.isAuthorOfPublication.latestForDiscovery	b2616c88-d3da-43df-86cb-3ced1084f460

Original bundle

Name: Zapater_2022_reinforcement-learning-based.pdf
Size: 7.69 MB
Format: Adobe Portable Document Format

Collections

Articles