Simplificación de textos basada en deep-learning (Deep-learning-based text simplification)
Publication date
2024
Abstract
Everyday texts can be difficult to understand for some social groups for a variety of reasons, such as a low educational level, aging, intellectual disability, or learning disorders. Text simplification emerged to give these groups easier access to information. It is understood as the process of transforming a text into an equivalent one that is easier to understand. The process comprises several tasks, such as splitting complex sentences into simpler ones and replacing complex vocabulary with simpler, more everyday words. Traditionally, text simplification was done manually by editors familiar with simplification guidelines, but manual simplification is costly in time and effort, especially now that information is generated constantly. The idea of automating part of the work arose to speed up the task, giving rise to automatic text simplification. Language models play an important role in this task, since they are currently the basis of Natural Language Processing techniques. In this final degree project (TFG) we analyze the corpora and Large Language Models currently available for text simplification in Spanish. From this analysis we conclude that the choice of corpus shapes the simplification task under study, because each corpus was created for a specific simplification task. To select the best model for each simplification task, we carry out an experimental study in which we use evaluation metrics to measure the performance of each model on each corpus.
Finally, to put the studied models into practice, we build a web application that simplifies texts in Spanish, taking into account the different types of simplification and the conclusions drawn from the study of the corpora and the models.
Description
Final degree project for the Double Degree in Business Administration and Management and Computer Science Engineering, Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2023/2024.
The final application is available at https://simplificacion.pythonanywhere.com/ and the source code can be consulted at https://github.com/XinxiangZ/tfg.