ChatGPT on the edge: evaluating natural language models on low-power devices
Publication date
2024
Abstract
The objective of this work is to study the feasibility of running transformer-based AI models on low-power devices. The study takes a dual perspective: feasibility, in terms of performance and energy consumption, and usability, through a web application that allows the models to be used remotely. These technologies have become increasingly present in our lives, with exponential growth in recent years; tools such as ChatGPT, Copilot and DALL-E are examples. All of them are extremely costly to create and, above all, to train, and running them for inference is also computationally expensive depending on the task, although to a lesser degree. In this context, low-power devices are crucial for scenarios such as on-device mobile inference and edge computing, since they allow AI models to run efficiently without quickly draining the device's battery or compromising its performance. This is essential for a smooth, sustained user experience and for the large-scale adoption of AI applications on mobile devices and in IoT environments, where computational and energy resources are limited.
This study therefore assesses the viability of such models on low-power devices, since accessibility on these platforms will be decisive for their adoption in society. In addition, detailed measurements of response time and of resource and energy consumption were taken for each model.
These data are essential for understanding how the models perform in a real production environment, where they would be crucial to the viability of a prospective project. For this reason, as part of the research, a web application was developed to interact with the transformer-based AI models. It sends requests to the models and obtains responses in real time, underlining the importance of the models' performance on low-power devices, the predominant type of platform on which such applications will be deployed.
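The abstract does not detail the measurement harness. As a rough illustration only, a per-request benchmark could look like the sketch below; `run_inference` is a hypothetical stand-in for the on-device model call, and the fixed average-power figure is a placeholder for what a real study would read from a hardware power meter or an on-board sensor:

```python
import statistics
import time

def run_inference(prompt: str) -> str:
    # Hypothetical stand-in for a transformer model call
    # (e.g. a quantized model running on a low-power board).
    return prompt[::-1]

def benchmark(prompts, avg_power_watts=4.0):
    """Time each request and estimate energy as power x time.

    avg_power_watts is an assumed device draw; a real measurement
    campaign would sample an actual power sensor instead.
    """
    latencies = []
    for p in prompts:
        t0 = time.perf_counter()
        run_inference(p)
        latencies.append(time.perf_counter() - t0)
    total_time = sum(latencies)
    return {
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
        "energy_joules_est": avg_power_watts * total_time,
    }

stats = benchmark(["hola, mundo"] * 20)
```

Energy here is approximated as average power times total inference time; the thesis itself may use a different, more direct measurement method.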
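The request/response loop between the web application and the on-device model could be sketched as follows. This is a minimal stand-in built on Python's standard library; the handler, the JSON schema and the `run_inference` stub are all assumptions for illustration, not the project's actual stack:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

def run_inference(prompt: str) -> str:
    # Hypothetical stand-in for the transformer model running on-device.
    return f"echo: {prompt}"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept a JSON body like {"prompt": "..."} and return the model's
        # reply, mirroring the remote request/response loop described above.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        reply = run_inference(body.get("prompt", ""))
        data = json.dumps({"reply": reply}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        # Silence per-request logging for the demo.
        pass

if __name__ == "__main__":
    # Bind to an ephemeral port and exercise one round trip.
    server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
    port = server.server_address[1]
    Thread(target=server.serve_forever, daemon=True).start()

    from urllib.request import Request, urlopen
    req = Request(f"http://127.0.0.1:{port}",
                  data=json.dumps({"prompt": "hola"}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        print(json.loads(resp.read())["reply"])  # echo: hola
    server.shutdown()
```

On a real deployment the latency of each such round trip is dominated by the model call, which is why the on-device performance figures above matter.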
Description
Final-year degree project (Trabajo de Fin de Grado) in Computer Science Engineering and Computer Engineering, Facultad de Informática, UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2023/2024.