Exploration-based reinforcement learning for RPG video games: the case of Pokémon Red
Publication date
2025
Abstract
Artificial intelligence (AI) has seen major advances in recent years thanks to large language models (LLMs), yet it still struggles in video games where exploration is a central component; in these cases, traditional reinforcement learning continues to deliver better results. Video games are an ideal testbed for AI techniques, as they present complex scenarios that demand planning, decision-making, and constant adaptation. In this context, exploration as a form of intrinsic reward in reinforcement learning has been a common strategy, though one with considerable room for development. Map exploration is a problem applicable to many scenarios, from autonomous cleaning robots to search-and-rescue operations. This work evaluates the potential of exploration as the sole reward for a reinforcement learning agent. Using neural network architectures such as LSTMs and CNNs, techniques such as Go-Explore and LSTM state warm-up, and aided by the explainability tool Captum, a training setup is proposed in which the agent maps the screen of the video game Pokémon Red the way a human would: visually, with no access to RAM. The results show that this reward alone can produce surprisingly human-like behavior, autonomous navigation, and the ability to progress through the game without explicit objectives. The generated map is also valuable in itself, demonstrating the agent's potential as an automatic mapping system.
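The central technical idea in the abstract, exploration as the agent's only reward signal computed from the screen rather than from RAM, can be made concrete with a short sketch. The snippet below is a minimal illustration under assumptions of ours, not the thesis implementation: the downscaling factor, the hashing scheme, and all names are hypothetical. It pays the agent exactly once for each sufficiently distinct screen it reaches.

    import hashlib
    import numpy as np

    class ExplorationReward:
        """Exploration-only intrinsic reward: 1.0 the first time a
        (downscaled) screen is seen, 0.0 on every later visit."""

        def __init__(self, downscale: int = 8):
            self.downscale = downscale
            self.seen = set()

        def __call__(self, frame: np.ndarray) -> float:
            # Coarsen the frame so sprite animation does not register
            # as novelty; only genuinely different screens hash anew.
            small = frame[::self.downscale, ::self.downscale]
            key = hashlib.md5(small.tobytes()).hexdigest()
            if key in self.seen:
                return 0.0
            self.seen.add(key)
            return 1.0

    # Usage with two random Game Boy-sized (160x144) frames:
    reward = ExplorationReward()
    rng = np.random.default_rng(0)
    frame_a = rng.integers(0, 256, size=(144, 160), dtype=np.uint8)
    frame_b = rng.integers(0, 256, size=(144, 160), dtype=np.uint8)
    print(reward(frame_a))  # 1.0 -> first visit to this screen
    print(reward(frame_a))  # 0.0 -> already seen
    print(reward(frame_b))  # 1.0 -> a new screen

In a full training loop this signal would be the only term in the return optimized by the policy (for instance a CNN encoder feeding an LSTM, as the abstract describes), with Go-Explore used to restore the emulator to previously reached frontiers; the sketch covers only the reward itself.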
Description
Master's thesis (Trabajo de Fin de Máster) in Computer Science Engineering, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2024/2025.