exTRAE: classification of tweets addressed to the RAE using the topic modeling tools LSA and LDA
Publication date
2021
Defense date
16/07/2021
Abstract
RESUMEN: This work aims to offer a computational method that classifies the tweets received by the Real Academia Española (RAE) on its Twitter account @RAEinforma. Given the enormous number of tweets in which users raise doubts about linguistic questions, computational methods are needed to help manage such data. Throughout this work, we will try to outline the current situation of institutions in the digital sphere, with a particular focus on the RAE and its work on Twitter. We will then explain topic modeling and two of its methods: Latent Semantic Analysis and Latent Dirichlet Allocation. Both will be used to classify a corpus of more than nine thousand tweets. We will conclude by carrying out a test to check how successful the results are.
ABSTRACT: This dissertation aims to provide a computational method for classifying the tweets received by the Real Academia Española (Spanish Royal Academy, RAE) through its Twitter account @RAEinforma. Given the enormous number of tweets raising linguistic questions, computational methods are needed to help manage these data. Throughout the paper, we will shed some light on the current digital situation of institutions, focusing specifically on the RAE and its role on Twitter. We will then introduce topic modeling and two of its methods, Latent Semantic Analysis and Latent Dirichlet Allocation, and apply them to a corpus of more than nine thousand tweets. A final test on additional tweets will be performed to check whether our results are successful.
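The abstract names Latent Semantic Analysis (LSA) as one of the two topic-modeling methods applied to the tweets. As a rough illustration only, and not the dissertation's own code, the sketch below applies LSA to a handful of made-up tweets. It builds a term-document count matrix, keeps the top k singular vectors with `numpy.linalg.svd`, and folds unseen tweets into that latent space, labelling each tweet with its dominant latent dimension. The stopword list, the example tweets and the helper names (`fit_lsa`, `doc_topics`, `classify`) are hypothetical. A real pipeline would likely use tf-idf weighting, a full Spanish stopword list and a tuned number of topics.

```python
import re
import numpy as np

# Hypothetical toy stopword list; a real pipeline would use a complete Spanish list.
STOPWORDS = {"de", "el", "en", "es", "y", "cuál", "lleva"}

def tokenize(text):
    """Lowercase a tweet, keep runs of letters (including Spanish accented ones), drop stopwords."""
    return [t for t in re.findall(r"[a-záéíóúüñ]+", text.lower()) if t not in STOPWORDS]

def fit_lsa(docs, k):
    """Build a term-document count matrix X and keep its top-k singular triplets."""
    vocab = {w: i for i, w in enumerate(sorted({t for d in docs for t in tokenize(d)}))}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            X[vocab[t], j] += 1
    # X = U diag(s) Vt; singular values come back in descending order
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return vocab, U[:, :k], s[:k], Vt[:k, :]

def doc_topics(Vt_k):
    """Label each training tweet with the latent dimension of largest absolute weight."""
    return [int(i) for i in np.argmax(np.abs(Vt_k), axis=0)]

def classify(text, vocab, U_k, s_k):
    """Fold a new tweet into the latent space (q_hat = S_k^-1 U_k^T q) and return its dominant dimension."""
    q = np.zeros(len(vocab))
    for t in tokenize(text):
        if t in vocab:  # words unseen during training are ignored
            q[vocab[t]] += 1
    if not q.any():
        return -1  # no known words: the tweet cannot be placed in the latent space
    q_hat = (U_k.T @ q) / s_k
    return int(np.argmax(np.abs(q_hat)))

# Made-up example tweets: three about accent marks, two about plurals.
tweets = [
    "¿Lleva tilde solo?",
    "Tilde en solo y guion",
    "¿Guion lleva tilde?",
    "Plural de currículum",
    "¿Cuál es el plural de régimen?",
]
vocab, U_k, s_k, Vt_k = fit_lsa(tweets, k=2)
print(doc_topics(Vt_k))                                 # [0, 0, 0, 1, 1]
print(classify("¿El guion lleva tilde?", vocab, U_k, s_k))  # 0
print(classify("Plural de régimen", vocab, U_k, s_k))       # 1
```

Labelling by the largest absolute weight is a simplification: LSA dimensions can mix positive and negative term loadings, which is one reason probabilistic models such as LDA are often preferred when each tweet should be assigned to a topic.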