Performance assessment of credit risk models with boosting algorithms and transfer learning from Large Language Models
Publication date
2023
Abstract
The objective of this Bachelor's Thesis is to explore the potential of machine learning and transfer learning techniques for analyzing credit risk in peer-to-peer (P2P) lending. Because P2P lending has no traditional financial intermediary, it suffers from significant information asymmetry, which increases risk. One way to mitigate this risk is to accurately predict whether a loan will default. However, the information available at the time the loan is granted is limited. This study proposes a novel approach that uses the description provided by the borrower, which consists of unstructured free text. To leverage this information and incorporate it into a traditional classification algorithm, we use transfer learning with deep neural networks, specifically BERT, a Large Language Model developed by Google in 2018 that is widely used in classification tasks.

This study builds on recent research that employs advanced machine learning techniques, such as gradient boosting algorithms, hyperparameter optimization with genetic algorithms, and explainable AI techniques for analyzing the role of the input variables. These models are supplemented with an additional input variable: a score generated by the BERT model that indicates the probability of default based on the loan description.

Our work demonstrates that the descriptions contain information useful for predicting default, and that including it significantly improves the performance of credit-granting models.
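The core idea of the abstract is to add a text-derived default score as one more input variable to a gradient boosting classifier. The plain-Python sketch below illustrates that idea under stated assumptions. It is not the thesis implementation:

- The data are synthetic. `make_loans`, `dti` and `text_score` are hypothetical names.
- `text_score` only stands in for the fine-tuned BERT probability; no language model is run.
- The booster is a simplified gradient boosting with regression stumps, not a production library.
- The genetic-algorithm hyperparameter search and the explainability analysis are not shown.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def auc(labels, scores):
    """ROC AUC as the Mann-Whitney statistic (tied pairs count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_stump(X, residuals):
    """Least-squares regression stump: (feature, threshold, left value, right value)."""
    n, total = len(X), sum(residuals)
    best_gain, best = -math.inf, None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(n - 1):
            left_sum += residuals[order[k]]
            lo, hi = X[order[k]][j], X[order[k + 1]][j]
            if lo == hi:  # only split between distinct feature values
                continue
            n_left, right_sum = k + 1, total - left_sum
            # Maximising this term is equivalent to minimising the squared error.
            gain = left_sum ** 2 / n_left + right_sum ** 2 / (n - n_left)
            if gain > best_gain:
                best_gain = gain
                best = (j, (lo + hi) / 2, left_sum / n_left, right_sum / (n - n_left))
    if best is None:  # all features constant: predict the mean residual everywhere
        return (0, math.inf, total / n, total / n)
    return best

def stump_value(stump, row):
    j, threshold, left, right = stump
    return left if row[j] <= threshold else right

def fit_boosting(X, y, n_rounds=40, learning_rate=0.2):
    """Gradient boosting on the log-loss with regression stumps (plain gradient
    steps; libraries such as XGBoost or LightGBM use second-order leaf values)."""
    rate = sum(y) / len(y)
    base = math.log(rate / (1 - rate))  # start from the prior log-odds
    raw = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the raw score.
        residuals = [yi - sigmoid(f) for yi, f in zip(y, raw)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        raw = [f + learning_rate * stump_value(stump, row) for f, row in zip(raw, X)]
    return base, learning_rate, stumps

def predict_proba(model, row):
    base, learning_rate, stumps = model
    return sigmoid(base + learning_rate * sum(stump_value(s, row) for s in stumps))

def make_loans(n, rng):
    """Synthetic loans; every column name here is hypothetical."""
    rows, labels = [], []
    for _ in range(n):
        dti = rng.gauss(0, 1)     # tabular variable, e.g. a debt-to-income ratio
        noise = rng.gauss(0, 1)   # tabular variable unrelated to default
        latent = rng.gauss(0, 1)  # risk that only shows up in the free-text description
        text_score = sigmoid(latent + rng.gauss(0, 0.3))  # stand-in for the BERT score
        labels.append(1 if rng.random() < sigmoid(dti + 2 * latent) else 0)
        rows.append([dti, noise, text_score])
    return rows, labels

rng = random.Random(0)
X_train, y_train = make_loans(400, rng)
X_test, y_test = make_loans(300, rng)

def evaluate(columns):
    """Fit on the chosen columns and return the AUC on the held-out loans."""
    def pick(rows):
        return [[r[c] for c in columns] for r in rows]
    model = fit_boosting(pick(X_train), y_train)
    return auc(y_test, [predict_proba(model, r) for r in pick(X_test)])

auc_tabular = evaluate([0, 1])
auc_with_text = evaluate([0, 1, 2])
print(f"AUC, tabular variables only:     {auc_tabular:.3f}")
print(f"AUC, plus the description score: {auc_with_text:.3f}")
```

The two AUC values follow the same with/without-description comparison as the study. Because the data are synthetic, the printed numbers say nothing about the thesis results.

In a real pipeline, the description score for the training loans would likely need to be produced out-of-fold (or on a separate split). Otherwise the booster learns from a score that the text model fitted on the same labels.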
Description
Bachelor's Thesis in Computer Engineering, Facultad de Informática UCM, Department of Software Engineering and Artificial Intelligence (ISIA), academic year 2022/2023.