Applied speech emotion recognition on a serverless Cloud architecture

Farzan Rodríguez, Robert

dc.contributor.advisor	Vázquez Poletti, José Luis
dc.contributor.author	Farzan Rodríguez, Robert
dc.date.accessioned	2023-06-16T13:24:06Z
dc.date.available	2023-06-16T13:24:06Z
dc.date.issued	2022
dc.degree.title	Grado en Ingeniería Informática
dc.description	Final degree project (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2021-2022. The source code of this project is available on both GitHub and Google Drive: https://github.com/RobertFarzan/Speech-Emotion-Recognition-system https://drive.google.com/file/d/1XobYLxcARE73EFwZ3VUr6Po7vum42ajh/view?usp=sharing
dc.description.abstract	The purpose of this final degree thesis, "Applied speech emotion recognition on a serverless Cloud architecture", is to investigate emotion recognition in the human voice using several techniques, including audio signal processing and deep learning, to classify the emotion detected in a piece of audio, and to find ways to deploy this functionality on a serverless Cloud. This leads to a brief implementation of a near-real-time streaming system in which an end user can record audio and continuously receive the detected emotions. The idea is an "emotion tracking system" that couples the technologies mentioned above with a simple end-user GUI app that anyone can use to track their own voice in different situations (during a call, a meeting, etc.) and get a brief summary visualization of their emotions over time at a quick glance. This prototype seems to be one of the first software products of its kind: there is a lot of literature on the Internet on Speech Emotion Recognition (SER), as well as tools that help software engineers with this task, but an easy-to-use end-user product or solution for real-time SER appears to be non-existent. As a short summary of the project road map and the technologies involved, the process is as follows: development of a CNN model in TensorFlow 2.0 (with Python) that takes a short chunk of audio as input and outputs emotion labels; deployment on AWS Lambda (the Amazon service for serverless Cloud) of a Python script that uses this CNN model to return the emotion predictions; and finally the design of a Python app with an integrated GUI that sends requests to the Lambda service, retrieves the responses with emotion predictions, and presents them as visualizations.
dc.description.abstract	The purpose of this TFG (final degree project), "Reconocimiento de emociones de la voz aplicado sobre una arquitectura Cloud serverless", is to investigate emotion recognition in the human voice using several techniques, including signal processing and deep learning, to classify a given emotion in a piece of audio, as well as to find ways to deploy this functionality in the Cloud (serverless). From these steps, an implementation of a near-real-time streaming system can be obtained, in which users can record themselves and continuously receive chronological responses about their mood. The idea is to be an "emotion monitoring system" that wraps the technologies mentioned above together with a simple graphical user interface that anyone can use to deliberately monitor their voice in different situations (during a call, a meeting, etc.) and get a brief visualization of their emotions over time at a glance. This prototype aims to be one of the first software solutions of its kind: although there is a lot of literature on the Internet about Speech Emotion Recognition (SER) and tools for developers working on this task, there appear to be no real-time SER products or solutions for end users. As a brief summary of the project road map and the technologies involved, the process is as follows: development of a convolutional neural network in TensorFlow 2.0 (with Python) to predict emotions from a piece of audio as input; deployment on AWS Lambda (the Amazon service for serverless) of a Python script that uses the neural network to return predictions; and finally the design of an end-user application in Python with a graphical interface that connects to the Lambda services, returns responses with the predictions, and builds visualizations from them.
dc.description.department	Depto. de Arquitectura de Computadores y Automática
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	unpub
dc.eprint.id	https://eprints.ucm.es/id/eprint/74519
dc.identifier.relatedurl	https://github.com/RobertFarzan/Speech-Emotion-Recognition-system
dc.identifier.relatedurl	https://drive.google.com/file/d/1XobYLxcARE73EFwZ3VUr6Po7vum42ajh/view?usp=sharing
dc.identifier.uri	https://hdl.handle.net/20.500.14352/3256
dc.language.iso	eng
dc.page.total	73
dc.rights	Atribución-NoComercial 3.0 España
dc.rights.accessRights	open access
dc.rights.uri	https://creativecommons.org/licenses/by-nc/3.0/es/
dc.subject.cdu	004(043.3)
dc.subject.keyword	Serverless
dc.subject.keyword	CNN
dc.subject.keyword	Artificial Intelligence
dc.subject.keyword	Cloud
dc.subject.keyword	Python
dc.subject.keyword	TensorFlow
dc.subject.keyword	AWS
dc.subject.keyword	Speech Emotion Recognition
dc.subject.keyword	GUI
dc.subject.keyword	Inteligencia Artificial
dc.subject.keyword	Interfaz Gráfica de Usuario (GUI)
dc.subject.ucm	Informática (Informática)
dc.subject.unesco	1203.17 Informática
dc.title	Applied speech emotion recognition on a serverless Cloud architecture
dc.title.alternative	Reconocimiento de emociones de la voz aplicado sobre una arquitectura Cloud serverless
dc.type	bachelor thesis
dspace.entity.type	Publication
relation.isAdvisorOfPublication	d3c2b5a8-3672-4a45-b84e-cbd3ba076155
relation.isAdvisorOfPublication.latestForDiscovery	d3c2b5a8-3672-4a45-b84e-cbd3ba076155

Original bundle

Name: FARZAN RODRÍGUEZ 54079_ROBERT_FARZAN_RODRIGUEZ_Reconocimiento_de_emociones_de_la_voz_aplicado_sobre_una_arquitectura_Cloud_serverless_1398832_1444014109.pdf
Size: 2.97 MB
Format: Adobe Portable Document Format

Collections

Trabajos Fin de Grado (TFG) y Diplomas de Estudios Avanzados (DEA)