Performance assessment of credit risk models with boosting algorithms and transfer learning from Large Language Models

Sanz Guerrero, Mario

Publication date

2023

Authors

Sanz Guerrero, Mario

Advisors (or tutors)

Arroyo Gallardo, Javier

Caparrini López, Antonio

URI

https://hdl.handle.net/20.500.14352/101304

Abstract

This Bachelor's Thesis explores the potential of machine learning and transfer learning for credit risk analysis in peer-to-peer (P2P) lending. The absence of a traditional financial intermediary in P2P lending creates significant information asymmetry, which increases risk. One way to mitigate this risk is to accurately predict whether a loan will be repaid. However, the information available at the time a loan is granted is limited. This study proposes a novel approach that uses the description provided by the borrower, which consists of unstructured free text. To leverage this information and incorporate it into a traditional classification algorithm, we use transfer learning with deep neural networks, specifically BERT, a Large Language Model developed by Google in 2018 that is widely used in classification tasks. The study builds on recent research that employs advanced machine learning techniques, such as gradient boosting algorithms, hyperparameter optimization using genetic algorithms, and explainable AI techniques, to analyze the role of the input variables. These models are extended with an additional input variable: a score generated by the BERT model that indicates the probability of default based on the loan description. Our work demonstrates that descriptions contain useful information for predicting default, and that including them significantly improves the performance of credit-granting models.
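The idea of feeding a text-derived default score into a tabular classifier can be illustrated with a minimal sketch. It is not the thesis's implementation: a toy word-frequency scorer (`fit_text_scorer`, a hypothetical name) stands in for the fine-tuned BERT model, the feature values and descriptions are invented for illustration, and the out-of-fold scoring in `add_text_score` is an assumption about how label leakage might be avoided, not something the abstract states. The augmented rows would then be passed to a gradient boosting classifier.

```python
import math
from collections import Counter


def fit_text_scorer(texts, labels):
    """Toy stand-in for a fine-tuned BERT classifier (assumption for illustration).

    Learns smoothed per-word log-odds of default and returns a function that
    maps a loan description to a probability-like score in (0, 1).
    """
    default_counts, repaid_counts = Counter(), Counter()
    for text, label in zip(texts, labels):
        target = default_counts if label == 1 else repaid_counts
        target.update(text.lower().split())
    n_default = sum(default_counts.values())
    n_repaid = sum(repaid_counts.values())
    vocab = set(default_counts) | set(repaid_counts)
    vocab_size = len(vocab)

    def score(text):
        log_odds = 0.0
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                p_default = (default_counts[word] + 1) / (n_default + vocab_size)
                p_repaid = (repaid_counts[word] + 1) / (n_repaid + vocab_size)
                log_odds += math.log(p_default / p_repaid)
        return 1.0 / (1.0 + math.exp(-log_odds))

    return score


def add_text_score(rows, texts, labels, n_folds=2):
    """Append an out-of-fold text score as one extra column per row.

    Each row is scored by a text model that never saw that row's label,
    so the new column does not leak the target into the downstream model.
    """
    augmented = [None] * len(rows)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        held_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        scorer = fit_text_scorer(
            [texts[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        for i in held_idx:
            augmented[i] = list(rows[i]) + [scorer(texts[i])]
    return augmented


# Hypothetical tabular features: [loan amount, interest rate].
rows = [[5000, 0.12], [12000, 0.25], [3000, 0.08], [20000, 0.30]]
texts = [
    "consolidate credit card debt stable job",
    "need cash urgent bills overdue",
    "urgent loan overdue rent",
    "home improvement stable income",
]
labels = [0, 1, 1, 0]  # 1 = default, 0 = repaid

augmented = add_text_score(rows, texts, labels)
full_scorer = fit_text_scorer(texts, labels)
```

Each row in `augmented` keeps its original columns and gains one final column holding the text score, so any classifier that accepts numeric features can consume it unchanged.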

Description

Bachelor's Thesis in Computer Engineering, Facultad de Informática UCM, Departamento de Ingeniería de Software e Inteligencia Artificial (ISIA), academic year 2022/2023.

UCM subjects

Computer Science (Informática)

Unesco subjects

33 Technological Sciences

Collections

Bachelor's Theses (TFG) and Diplomas of Advanced Studies (DEA)
