Automatic analysis of high dimensional categorical variables in medical databases for the prediction of hospital bacteremia

Rey García, Jaime del

dc.contributor.advisor	Garnica Alcázar, Óscar
dc.contributor.advisor	Ruiz Giardín, José Manuel
dc.contributor.author	Rey García, Jaime del
dc.date.accessioned	2023-06-17T10:57:12Z
dc.date.available	2023-06-17T10:57:12Z
dc.date.issued	2021
dc.degree.title	Grado en Ingeniería Informática
dc.description	Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2020/2021.
dc.description.abstract	This project continues and consolidates a study of bacteremia detection and diagnosis carried out last year by fellow students at the faculty. That first study, based on the analysis of numerical variables, gave a deeper understanding of the problem and outlined an approach for a quick detection model. This work adds categorical variables in order to improve the results of the classifier models. Categorical variables have been used in classifier models for at least five years, made possible by the increase in computational capacity, and the resulting benefits for the classifiers are clear. Yet language is complex and abstract, and classifiers have been shown to struggle when the data to be predicted contains slang or abbreviations, even when the linguistic register is heavily bounded, as it is when the data relates strictly to medical issues. Throughout the study we apply text cleaning and text processing methods to prepare the variables for use, since their format is heterogeneous and unsuitable for processing by Machine Learning tools. We also apply a string similarity method to identify the classes that can help the classification process, and we assess the most suitable types of encoding for working with these variables. Finally, we apply the Random Forest Machine Learning algorithm to the data set, using techniques that help avoid learning bias, and we assess the results in terms of success rates and the relevance of each variable in the algorithm's decision-making process.
dc.description.department	Depto. de Arquitectura de Computadores y Automática
dc.description.faculty	Fac. de Informática
dc.description.refereed	TRUE
dc.description.status	unpub
dc.eprint.id	https://eprints.ucm.es/id/eprint/74572
dc.identifier.uri	https://hdl.handle.net/20.500.14352/10603
dc.language.iso	eng
dc.page.total	65
dc.rights	Attribution-NonCommercial 3.0 Spain
dc.rights.accessRights	open access
dc.rights.uri	https://creativecommons.org/licenses/by-nc/3.0/es/
dc.subject.cdu	004(043.3)
dc.subject.keyword	Bacteremia
dc.subject.keyword	Comorbidity
dc.subject.keyword	Predictive medicine
dc.subject.keyword	Pathogenesis
dc.subject.keyword	Dataframe
dc.subject.keyword	Dirty category
dc.subject.keyword	String similarity
dc.subject.keyword	One hot encoding
dc.subject.keyword	Adjacency matrix
dc.subject.keyword	Adjacency list
dc.subject.keyword	Binary encoding
dc.subject.keyword	K-Nearest Neighbors (KNN)
dc.subject.keyword	Bias and Variance
dc.subject.keyword	K-Fold Cross Validation
dc.subject.keyword	Random forest
dc.subject.keyword	ROC
dc.subject.keyword	SHAP
dc.subject.ucm	Informática (Informática)
dc.subject.unesco	1203.17 Informática
dc.title	Automatic analysis of high dimensional categorical variables in medical databases for the prediction of hospital bacteremia
dc.title.alternative	Análisis automático de variables categóricas de alta dimensionalidad en bases de datos médicas para la predicción de bacteriemias hospitalarias
dc.type	bachelor thesis
dspace.entity.type	Publication

Original bundle

Name: REY GARCÍA 82332_JAIME_DEL_REY_GARCIA_Analisis_automatico_de_variables_categoricas_de_alta_dimensionalidad_en_bases_de_datos_medicas_para_la_prediccion_de_1000412445.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

Collections

Trabajos Fin de Grado (TFG) y Diplomas de Estudios Avanzados (DEA)