Herramienta de anonimización y conversión de voz con Inteligencia Artificial
Publication date
2025
Abstract
Este trabajo describe el proceso de creación de una herramienta que genera audios con la información privada protegida usando varias técnicas de Inteligencia Artificial. Para ello, se llevó a cabo una extensa investigación, documentada en esta memoria, que ayudó a profundizar en las técnicas usadas (Transcripción de Audio, Reconocimiento de Entidades en Texto y Generación de Audio) y en los mejores modelos producidos por otros investigadores. Con estos conocimientos se concluyó que una buena estrategia para solucionar el problema sería crear un sistema formado por varios modelos con tres funciones: transcribir el audio a texto, encontrar la información personal del texto y censurarla, y generar el audio de nuevo a partir del texto anonimizado, con la posibilidad de cambiar la voz para mayor privacidad. Para conseguirlo, se probaron diversos modelos en cada ámbito, comparando su rendimiento para elegir los mejores en cada tarea, y se entrenaron otros modelos que ofrecieron mejor eficiencia en cuanto al tiempo de procesado. Por último, se unificó la herramienta final y se creó, con diversos audios y transcripciones, un conjunto de datos nuevo para medir el rendimiento y la eficiencia al anonimizar la información privada de los audios. Los resultados de las pruebas demuestran una generación de audio exitosa con el 86 % de los datos privados censurados, un EER variable según la elección de voz (4 % con la misma voz, 90 % con voz diferente) y valores de MOS próximos a 4 sobre 5 en la calidad del audio.
This document presents the process of creating a tool that generates audio in which private information is protected, using several Artificial Intelligence techniques. To that end, an extensive investigation was carried out and documented in this report; it deepened the understanding of the techniques involved (Audio Transcription, Text Entity Recognition, and Audio Generation) and of the leading models developed by other researchers. From this knowledge, it was concluded that an effective strategy would be to build a system composed of several models with three functions: transcribing the audio into text, identifying personal information in the text and censoring it, and synthesizing the audio again from the anonymized text, with the option of changing the voice for additional privacy. To accomplish this, various models were tested in each area and their performance compared in order to select the best one for each task, and additional models were trained that offered better processing-time efficiency. Lastly, the final tool was integrated, and a new dataset was created from a variety of audio samples and transcripts to measure its performance and efficiency in anonymizing private information in audio. The results show successful audio generation with 86 % of private data censored, an EER (equal error rate) that varies with the voice choice (4 % with the same voice, 90 % with a different voice), and MOS (mean opinion score) values close to 4 out of 5 for audio quality.
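The three-stage design described in the abstract (transcribe, detect and censor personal information, synthesize again) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis implementation: the regular expressions stand in for the trained entity-recognition models, the `transcribe` and `synthesize` callables are hypothetical placeholders for the speech-to-text and text-to-speech models, and all names (`find_entities`, `censor`, `anonymize_audio`) are invented here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# ASSUMPTION: toy patterns standing in for a trained entity-recognition model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b"),
}


@dataclass
class Entity:
    start: int   # index of the first character of the entity
    end: int     # index one past the last character
    label: str   # entity type, e.g. "EMAIL"


def find_entities(text: str) -> List[Entity]:
    """Return every pattern match in the text, ordered by position."""
    found = [
        Entity(m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(found, key=lambda e: e.start)


def censor(text: str, entities: List[Entity]) -> str:
    """Replace each detected span with a placeholder such as [EMAIL]."""
    pieces, cursor = [], 0
    for entity in entities:
        if entity.start < cursor:
            continue  # skip spans that overlap one already censored
        pieces.append(text[cursor:entity.start])
        pieces.append(f"[{entity.label}]")
        cursor = entity.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def anonymize_audio(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    synthesize: Callable[[str, str], bytes],
    voice: str = "original",
) -> bytes:
    """Chain the three stages: audio -> text -> censored text -> audio."""
    text = transcribe(audio)
    safe_text = censor(text, find_entities(text))
    return synthesize(safe_text, voice)


# HYPOTHETICAL stubs so the sketch runs without any speech model.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_synthesize(text: str, voice: str) -> bytes:
    return f"{voice}:{text}".encode("utf-8")


result = anonymize_audio(
    b"Write to ana@example.com or call 612 345 678 today",
    fake_transcribe,
    fake_synthesize,
    voice="other",
)
```

Passing the speech models in as callables keeps the censoring step independent of any particular transcription or synthesis model, which matches the abstract's description of comparing several candidates for each task before choosing one.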
Description
Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering (Ingeniería de Computadores), Facultad de Informática UCM, Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2024/2025












