Evaluación de rendimiento y eficiencia energética de procesos de inferencia en GPUs
Publication date: 2024
Abstract
NVIDIA has become a leading innovator in GPU development, positioning itself among the fastest-growing companies in recent months. The release of the new Blackwell GPU series in 2024 shows how much weight is being placed on this sector as a long-term project. The vast amounts of data driven by the rise of artificial intelligence make these machines so important, and part of the ongoing research focuses on extracting maximum performance from the enormous number of cores they contain. That is the goal pursued in this work, using the MLPerf benchmark. Through the Offline scenario provided by the benchmark, a series of inference models is studied, ranging from automatic speech recognition to 3D image segmentation for biomedical applications. These pre-trained models are analyzed in terms of latency, throughput, and energy efficiency. The hardware used to contrast results consists of an NVIDIA A30 and a Jetson AGX Orin, using the nvpmodel tool provided by the latter to cap its maximum power consumption. It was observed that as the maximum number of requests passed to the model increases, so does the number of samples processed per second, because the GPUs are able to parallelize the large number of requests across their cores. It was also shown that energy efficiency improves as the machine's maximum power decreases: the inference process does not supply a sufficiently heavy workload, so the higher-power machines cannot fully exploit their parallelism. Finally, when studying latency, it was verified that increasing the batch size passed to the model (i.e., the number of requests) does not change the execution time when the number of expected samples is fixed to the number of resulting samples.
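The throughput and energy-efficiency metrics discussed in the abstract reduce to two simple ratios: samples per second, and samples per joule (throughput divided by average power draw). A minimal sketch with hypothetical numbers — none of the figures below are measurements from this thesis:

```python
# Sketch (hypothetical numbers): how throughput and energy efficiency
# are derived from an MLPerf Offline run. Sample counts, durations,
# and wattages below are illustrative only.

def throughput(samples: int, seconds: float) -> float:
    """Samples processed per second (the MLPerf Offline metric)."""
    return samples / seconds

def energy_efficiency(samples: int, seconds: float, avg_power_w: float) -> float:
    """Samples per joule: throughput divided by average power draw."""
    return throughput(samples, seconds) / avg_power_w

# Two hypothetical power caps on a Jetson-class device: the lower cap
# finishes the run more slowly but wins in samples per joule, matching
# the trend reported in the abstract.
eff_high_cap = energy_efficiency(samples=24576, seconds=60.0, avg_power_w=50.0)
eff_low_cap = energy_efficiency(samples=24576, seconds=80.0, avg_power_w=30.0)
assert eff_low_cap > eff_high_cap
```

The comparison only illustrates the shape of the result: a lower power cap trades raw throughput for better samples-per-joule when the workload is not large enough to saturate the higher-power configuration.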
Description
Final Degree Project (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática, Universidad Complutense de Madrid, Departamento de Arquitectura de Computadores y Automática, academic year 2023/2024.
The repository with the contributions, scripts, and results obtained is available at:
https://github.com/GonzaloIslaLlave/TFG-Evaluacion-de-rendimiento-y-eficiencia-energetica-de-procesos-de-inferencia-en-GPUs
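The power-capping step described in the abstract relies on NVIDIA's nvpmodel utility on the Jetson AGX Orin. A minimal, hardware-dependent sketch of how such a cap might be applied before launching a benchmark run — the mode number shown is illustrative, since mode-to-wattage mappings come from `/etc/nvpmodel.conf` and vary per device:

```shell
# Query the currently active power mode on the Jetson board.
sudo nvpmodel -q

# Switch to a lower-power mode (mode numbers are device-specific;
# consult /etc/nvpmodel.conf or `sudo nvpmodel -q --verbose`).
sudo nvpmodel -m 2

# Optionally log power and utilization while the benchmark runs
# (interval in milliseconds).
sudo tegrastats --interval 1000 &
```

These commands must run on the Jetson itself; on a discrete GPU such as the A30, power limits are instead managed through `nvidia-smi`.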













