Bandidos Contextuales: Fundamentos y Aplicaciones

Hernández Roldán, Iván; Magarzo Gonzalo, Alejandro

dc.contributor.advisor	Palomino Tarjuelo, Miguel
dc.contributor.author	Hernández Roldán, Iván
dc.contributor.author	Magarzo Gonzalo, Alejandro
dc.date.accessioned	2023-09-11T15:50:55Z
dc.date.available	2023-09-11T15:50:55Z
dc.date.issued	2023
dc.degree.title	Doble Grado en Ingeniería Informática y Administración y Dirección de Empresas
dc.description	Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering and final thesis of the Double Degree in Computer Engineering and Business Administration and Management, Facultad de Informática UCM, Departamento de Sistemas Informáticos y de Computación, academic year 2022/2023.
dc.description.abstract	Como punto de partida, se abordan los fundamentos teóricos subyacentes a los bandidos multi-brazo, preparando así el terreno para la profundización en los bandidos contextuales. Los bandidos, como elemento fundamental en el aprendizaje por refuerzo, ofrecen una respuesta eficiente a los problemas básicos del dilema de la exploración frente a la explotación. Un problema de bandidos implica un juego secuencial entre un agente y un entorno, donde en cada ronda el agente tiene varias acciones a su disposición y debe elegir una para recibir la recompensa correspondiente como resultado. Basado en las recompensas anteriores, el agente deberá mejorar su toma de decisiones para obtener la máxima recompensa acumulada al final del juego, manteniendo un balance entre explorar acciones menos probadas y explotar la mejor acción según la información que posee. Además, se explican los bandidos estocásticos y antagonistas como preludio para presentar varios algoritmos que serán de gran utilidad en una variante particular del modelo de bandidos: los bandidos contextuales. En este tipo de bandido, cada acción disponible está asociada a una distribución de probabilidad de recompensas, desconocida de antemano por el agente, de la cual se obtiene la recompensa correspondiente tras elegir una acción. Por lo tanto, el agente tratará de maximizar sus recompensas eligiendo los brazos que mayor recompensa media tengan en función del contexto. A lo largo de este trabajo se presentan los algoritmos que resuelven los problemas de los bandidos planteados y se comparan sus rendimientos a través de la métrica del remordimiento. También, se tratan las diferencias entre los remordimientos de los algoritmos que se adaptan al contexto y los que no gracias a la exposición de un juego contextual. 
Tras abordar cada concepto teórico del ámbito de los bandidos contextuales, se expone una aplicación práctica en consonancia para estudiar el desempeño de los bandidos contextuales en diversos dominios. Las principales aportaciones prácticas de este trabajo se localizan dentro del sector financiero, concretamente en el departamento de la automatización de la inversión en el mercado de valores a través de los bots de comercio, y en el mundo digital, realizando un sistema recomendador de películas.
dc.description.abstract	As a starting point, this work covers the theoretical foundations of multi-armed bandits, laying the groundwork for a closer study of contextual bandits. Bandits, a fundamental element of reinforcement learning, offer an efficient answer to the basic problems of the exploration-exploitation dilemma. A bandit problem is a sequential game between an agent and an environment: in each round the agent has several actions available and must choose one, receiving the corresponding reward. Based on past rewards, the agent must improve its decisions to maximize the cumulative reward at the end of the game, balancing the exploration of less-tested actions against the exploitation of the best action according to the information it has. Stochastic and adversarial bandits are then explained as a prelude to presenting several algorithms that are useful in a particular variant of the bandit model: contextual bandits. In this type of bandit, each available action is associated with a probability distribution over rewards, unknown to the agent in advance, from which the reward is drawn once the action is chosen. The agent therefore tries to maximize its rewards by choosing the arms with the highest mean reward given the context. Throughout this work, algorithms that solve the bandit problems posed are presented, and their performance is compared using regret as the metric. The differences in regret between algorithms that adapt to the context and those that do not are also examined by means of a contextual game.
After each theoretical concept in the field of contextual bandits is covered, a corresponding practical application is presented to study the performance of contextual bandits in different domains. The main practical contributions of this work lie in the financial sector, specifically in automating stock-market investment through trading bots, and in the digital world, with the implementation of a movie recommendation system.
dc.description.department	Depto. de Sistemas Informáticos y Computación
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	unpub
dc.identifier.uri	https://hdl.handle.net/20.500.14352/87700
dc.language.iso	spa
dc.page.total	105
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.cdu	004(043.3)
dc.subject.keyword	Bandidos multi-brazo
dc.subject.keyword	Exploración-explotación
dc.subject.keyword	Remordimiento
dc.subject.keyword	Bandidos estocásticos
dc.subject.keyword	Bandidos antagonistas
dc.subject.keyword	Bandidos contextuales
dc.subject.keyword	Clase política
dc.subject.keyword	Exp4
dc.subject.keyword	Bots de comercio
dc.subject.keyword	Sistema recomendador
dc.subject.keyword	Multi-armed bandits
dc.subject.keyword	Exploration-exploitation
dc.subject.keyword	Regret
dc.subject.keyword	Stochastic bandits
dc.subject.keyword	Adversarial bandits
dc.subject.keyword	Contextual bandits
dc.subject.keyword	Policy class
dc.subject.keyword	Trading bots
dc.subject.keyword	Recommendation system
dc.subject.ucm	Informática (Informática)
dc.subject.unesco	33 Ciencias Tecnológicas
dc.title	Bandidos Contextuales: Fundamentos y Aplicaciones
dc.title.alternative	Contextual Bandits: Foundations and Applications
dc.type	bachelor thesis
dc.type.hasVersion	AM
dspace.entity.type	Publication
relation.isAdvisorOfPublication	52909b00-b705-4307-84db-d3211eedef69
relation.isAdvisorOfPublication.latestForDiscovery	52909b00-b705-4307-84db-d3211eedef69

Original bundle

Name: 88302_ALEJANDRO_MAGARZO_GONZALO_Bandidos_Contextuales_67_2404368_2008781148.pdf
Size: 4.37 MB
Format: Adobe Portable Document Format

Collections

Trabajos Fin de Grado (TFG) y Diplomas de Estudios Avanzados (DEA)