Applied speech emotion recognition on a serverless Cloud architecture

Farzan Rodríguez, Robert

Applied speech emotion recognition on a serverless Cloud architecture

Download

FARZAN RODRÍGUEZ 54079_ROBERT_FARZAN_RODRIGUEZ_Reconocimiento_de_emociones_de_la_voz_aplicado_sobre_una_arquitectura_Cloud_serverless_1398832_1444014109.pdf (2.97 MB)

Publication date

2022

Authors

Farzan Rodríguez, Robert

Advisors (or tutors)

Vázquez Poletti, José Luis

Citations

Exportar

URI

https://hdl.handle.net/20.500.14352/3256

Abstract

The purpose of this final degree thesis Applied speech emotion recognition on a serverless Cloud architecture is to do research into emotion recognition on human voice through several techniques including audio signal processing and deep learning technologies to classify a certain emotion detected on a piece of audio, as well as finding ways to deploy this functionality on Cloud (serverless). From there we can get a brief implementation of a streaming nearly real-time system in which an end user could record audio and retrieve responses of the emotions continuously. The idea intends to be a "emotion tracking system" that couples the technologies mentioned above along with a simple end-user GUI app that anyone could use purposefully to track their own voices in different situations - during a call, a meeting etc. - and get a brief summary visualization of their emotions across time with just a quick glance. This prototype seems to be one of the first software products of its kind, as there is a lot of literature on the Internet on Speech Emotion Recognition and tools for software engineers to facilitate this task but an easy final user product or solution for real-time SER appears to be non-existent. As a short summary of the project road map and the technologies involved, the process is as follows: development of a CNN model on Tensorflow 2.0 (with Python) to get emotion labels as output from a short chunk of audio as input; deployment of a Python script that uses this previously mentioned CNN model to return the emotion predictions in AWS Lambda (the Amazon service for serverless Cloud); and finally the design of a Python app with GUI integrated to send requests to the Lambda service and retrieve the responses with emotion predictions to present them with beautiful visualizations.
El propósito de este TFG Reconocimiento de emociones de la voz aplicado sobre una arquitectura Clous serverless es investigar el reconocimiento de emociones en la voz humana usando diversas técnicas, entre las que se incluye el procesamiento de señal y deep learning para clasificar una cierta emoción en una pieza de audio, así como encontrar maneras de desplegar esta funcionalidad en el Cloud (serverless). A partir de estos pasos se podrá obtener una implementación de un sistema en streaming en tiempo cuasi real, en el que un usuario pueda grabarse a sí mismo y recibir respuestas cronológicas sobre su estado de ánimo continuamente. Esta idea trata de ser un "sistema monitor de emociones", que envuelva las tecnologías mencionadas arriba junto con una simple interfaz gráfica de usuario que cualquiera pueda usar para monitorizar intencionadamente su voz en diferentes situaciones - durante una llamada, una reunión etc. - y obtener una breve visualización de sus emociones a lo largo del tiempo en un simple vistazo. Este prototipo apunta a ser una de las primeras soluciones software de este tipo, ya que a pesar de haber mucha literatura en Internet acerca de Speech Emotion Recognition y herramientas para desarrolladores en esta tarea, parece no haber productos o soluciones de SER en tiempo real para usuarios. Como breve resumen de la hoja de ruta del proyecto y las tecnologías involucradas, el proceso es el siguiente: desarrollo de una red neuronal convolucional en TensorFlow 2.0 (con Python) para predecir emociones a partir de una pieza de audio como input; despliegue de un script de Python que use la red neuronal para devolver predicciones en AWS Lambda (el servicio de Amazon para serverless); y finalmente el diseño de una aplicación final para usuario en Python que incluya una interfaz gráfica que se conecte con los servicios de Lambda y devuelva respuestas con las predicciones y haga visualizaciones a partir de ellas.

Description

Trabajo de Fin de Grado en Ingeniería Informática, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, Curso 2021-2022. The source code of this project can be found both in GitHub and Google Drive: https://github.com/RobertFarzan/Speech-Emotion-Recognition-system https://drive.google.com/file/d/1XobYLxcARE73EFwZ3VUr6Po7vum42ajh/view?usp=sharing

UCM subjects

Informática (Informática)

Unesco subjects

1203.17 Informática

Collections

Trabajos Fin de Grado (TFG) y Diplomas de Estudios Avanzados (DEA)

Full item page

Applied speech emotion recognition on a serverless Cloud architecture

Download

Official URL

Full text at PDC

Publication date

Authors

Advisors (or tutors)

Editors

Journal Title

Journal ISSN

Volume Title

Publisher

Citations

Exportar

URI

Citation

Abstract

Research Projects

Organizational Units

Journal Issue

Description

UCM subjects

Unesco subjects

Keywords

Collections