Language identification using natural language processing techniques
Publication date
2023
Abstract
This project covers the analysis, design, and implementation of a system that identifies the language in which a text is written. The goal is to compare different implementations of the algorithm and evaluate their speed and efficiency for each language in scope.
To that end, multiple texts written in different European languages (obtained from the European Parliament database) are available to work with throughout the process.
The project consists of two parts. First, programs will be developed to adapt the original texts to a format that the chosen language detection algorithms can read. Second, timing and efficiency tests will be run on those algorithms to evaluate how well they detect each language in scope.
Both the detection algorithms and the text adaptation program will be written in Python.
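The two parts described above can be illustrated with a minimal sketch. The abstract does not name the detection algorithms that were compared, so the example below swaps in a simple character-trigram profile with cosine similarity purely as an illustration; it is not the thesis's method. All function names (`normalize`, `trigram_profile`, `detect`, `timed_detect`) are hypothetical, and the two training sentences are tiny placeholders rather than European Parliament data.

```python
import time
from collections import Counter
from math import sqrt


def normalize(text):
    """Lowercase and collapse whitespace (stand-in for the text adaptation step)."""
    return " ".join(text.lower().split())


def trigram_profile(text):
    """Count overlapping character trigrams of the normalized, space-padded text."""
    clean = f" {normalize(text)} "
    return Counter(clean[i:i + 3] for i in range(len(clean) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(samples):
    """Build one trigram profile per language from a {language: text} mapping."""
    return {lang: trigram_profile(text) for lang, text in samples.items()}


def detect(text, profiles):
    """Return the language whose profile is most similar to the text."""
    query = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


def timed_detect(text, profiles):
    """Return (language, elapsed seconds) for a single detection call."""
    start = time.perf_counter()
    lang = detect(text, profiles)
    return lang, time.perf_counter() - start


# Placeholder training data (assumption): one short sentence per language.
SAMPLES = {
    "es": "el objetivo del proyecto es identificar el idioma en el que se ha escrito un texto",
    "en": "the objective of the project is to identify the language in which a text has been written",
}
PROFILES = train(SAMPLES)
```

Calling `timed_detect("the language of this text", PROFILES)` returns the predicted language code together with the elapsed time, which is the kind of per-language speed measurement the abstract refers to. A real comparison would repeat the call over many texts per language and aggregate both the timings and the proportion of correct predictions.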
Description
Bachelor's thesis (Trabajo de Fin de Grado) in Software Engineering, Facultad de Informática UCM, Departamento de Sistemas Informáticos y Computación, academic year 2022/2023.