exTRAE : clasificación de tuits dirigidos a la RAE mediante las herramientas de modelado de tópicos LSA y LDA

Hernández de la Cruz, Jose María; Saiz Escobar, Bárbara

Download

Docta_TFM_Hernández de la Cruz, Jose María y Saiz Escobar, Bárbara_2021.pdf (1.62 MB)

Publication date

2021

Defense date

16/07/2021

Authors

Hernández de la Cruz, Jose María

Saiz Escobar, Bárbara

Advisors (or tutors)

Caballero Roldán, Rafael

Riesco Rodríguez, Adrián

URI

https://hdl.handle.net/20.500.14352/118425

Abstract

ABSTRACT: This dissertation presents a computational method for classifying the tweets that the Real Academia Española (RAE) receives through its Twitter account, @RAEinforma. Because the account receives an enormous number of tweets raising questions about language, computational methods are needed to help manage the data. We first survey the current situation of institutions in the digital sphere, focusing on the RAE and its activity on Twitter. We then introduce topic modeling and two of its methods: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Both are applied to classify a corpus of more than nine thousand tweets. We conclude with a test on additional tweets to check whether the results are successful.
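
As a rough illustration of what assigning tweets to latent topics can look like, the sketch below runs a small collapsed Gibbs sampler for LDA in plain Python. It is not the dissertation's implementation: the example tweets are invented, the `tokenize` and `lda_gibbs` helpers are hypothetical names, and the preprocessing and hyperparameters (`alpha`, `beta`, `n_iter`) are assumptions chosen only to keep the example short. LSA, the other method named in the abstract, is not shown.

```python
import random
from collections import Counter

def tokenize(text):
    # Crude stand-in for real preprocessing: lowercase, split on whitespace,
    # drop @mentions and tokens of two characters or fewer.
    return [w for w in text.lower().split() if not w.startswith("@") and len(w) > 2]

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA over a list of tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per (document, topic)
    topic_word = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                       # tokens per topic
    assign = []                                        # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word

# Invented example tweets, for illustration only.
tweets = [
    "@RAEinforma is the accent mark required on solo",
    "@RAEinforma accent mark rules for monosyllables",
    "@RAEinforma what does the word bizarro mean",
    "@RAEinforma meaning of the word procrastinar",
]
docs = [tokenize(t) for t in tweets]
theta, topic_word = lda_gibbs(docs, n_topics=2)

# Label each tweet with its most probable topic.
labels = [max(range(2), key=lambda k: row[k]) for row in theta]
for tweet, label in zip(tweets, labels):
    print(label, tweet)
```

Topic indices produced this way are arbitrary, so the labels only group tweets; interpreting a group as, say, "spelling" or "word meaning" is a separate manual step. On a real corpus of thousands of tweets, an established library implementation would normally be preferable to a hand-written sampler.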

UCM subjects

Linguistics, Computer science (Philology), Artificial intelligence (Computer science)

Unesco subjects

57 Linguistics, 5701.04 Computational Linguistics, 1203.04 Artificial Intelligence

Collections

Trabajos Fin de Master (TFM)
