A deep learning approach for automatically generating descriptions of images containing people

Aracil Muñoz, Marta

dc.contributor.advisor	Méndez Pozo, Gonzalo
dc.contributor.advisor	Hervás Ballesteros, Raquel
dc.contributor.author	Aracil Muñoz, Marta
dc.date.accessioned	2023-06-17T14:59:32Z
dc.date.available	2023-06-17T14:59:32Z
dc.date.issued	2018-09
dc.degree.title	Grado en Ingeniería Informática
dc.description	Universidad Complutense, Facultad de Informática. Departamento de Ingeniería del Software e Inteligencia Artificial, academic year 2017/2018
dc.description.abstract	Generating image descriptions is a challenging Artificial Intelligence problem with many interesting applications, such as robot communication or assisting visually impaired people. However, it is a complex task for computers: it requires Computer Vision algorithms to understand what the image depicts and Natural Language Processing algorithms to generate a well-formed sentence. Deep neural networks are currently the state of the art in both fields. Furthermore, we believe that images containing people are described in a slightly different manner, and that restricting an image description generator to these images may produce better descriptions. Therefore, the main objective of this project is to develop a Deep Learning model that automatically produces descriptions of images containing people, and to determine whether restricting the model to this kind of image is good practice. For this purpose, we have reviewed the literature in the field and have built, trained and compared four different models using Deep Learning techniques, a GPU to speed up the computation, and a large, comprehensive dataset.
dc.description.abstract	Generating image descriptions is an Artificial Intelligence problem with many interesting applications, such as robot communication or helping people with visual impairments. However, it is a complex task for a computer: it requires computer vision algorithms to understand what the image represents and natural language processing algorithms to generate a well-formed sentence. Today, deep neural networks are the state of the art in these two fields of Artificial Intelligence. In addition, we believe that images containing people are described in a slightly different way, and that restricting an image description generation model to images of this type may produce better descriptions. Therefore, the main objective of this project is to develop a deep learning model that automatically produces descriptions of images containing people, and to conclude whether restricting the model to this class of images is good practice. To this end, we have reviewed and studied the literature and have built, trained and compared four different models using deep learning techniques and a GPU to speed up the computations, as well as a large and complete dataset.
dc.description.department	Depto. de Ingeniería de Software e Inteligencia Artificial (ISIA)
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	unpub
dc.eprint.id	https://eprints.ucm.es/id/eprint/50248
dc.identifier.uri	https://hdl.handle.net/20.500.14352/15119
dc.language.iso	eng
dc.page.total	76
dc.rights	Attribution-NonCommercial 3.0 Spain
dc.rights.accessRights	open access
dc.rights.uri	https://creativecommons.org/licenses/by-nc/3.0/es/
dc.subject.cdu	004(043.3)
dc.subject.keyword	Deep Learning
dc.subject.keyword	Computer Vision
dc.subject.keyword	Natural Language Processing
dc.subject.keyword	Image description generation
dc.subject.keyword	Keras
dc.subject.keyword	GPU
dc.subject.keyword	Dataset
dc.subject.ucm	Informática (Informática)
dc.subject.unesco	1203.17 Informática
dc.title	A deep learning approach for automatically generating descriptions of images containing people
dc.type	bachelor thesis
dspace.entity.type	Publication
relation.isAdvisorOfPublication	bdd570a9-0372-451a-9992-e7f9cfb22e71
relation.isAdvisorOfPublication.latestForDiscovery	bdd570a9-0372-451a-9992-e7f9cfb22e71

Original bundle

Name: 128.pdf
Size: 3.51 MB
Format: Adobe Portable Document Format

Collections

Trabajos Fin de Grado (TFG) y Diplomas de Estudios Avanzados (DEA)