%0 Thesis
%A Barreñada Taleb, Lasai Alai
%T Imputación de datos mediante Random Forest
%D 2021
%U https://hdl.handle.net/20.500.14352/5138
%X The amount of available information keeps growing, and official statistical institutes must make use of it to create innovative and effective processes. Statistical learning is the set of techniques used to better understand data. Random forests, based on an ensemble of decision trees, are one of the most widely used supervised learning techniques. In this work, random forests have been used to impute data in short-term economic surveys, specifically in the Industrial Turnover Indices (Índices de Cifras de Negocios de la Industria). Imputation is the process of assigning a value to an item for which no information was previously available. This study develops the imputation methodology after analysing the quality criteria required to produce an official statistic. First, feature selection is carried out to identify the most relevant variables for calculating turnover figures. Next, the parameter selection process is addressed in order to obtain the optimal random forest model for the selected data set. Finally, the random forest is applied in practice to the imputations, which are then evaluated, with satisfactory results.
%X The amount of information available in National Statistical Institutes is increasing rapidly, and they must make use of it to develop innovative and effective processes. Statistical learning is the set of techniques used to better understand data. Random Forests, based on decision tree ensembles, are one of the most widely used supervised learning techniques. In this thesis, Random Forests have been used to impute data in short-term business statistics. Imputation is defined as the method of assigning a value to an item that was previously missing. In this study, a new methodology is developed after analysing the quality requirements for official statistics. First, feature selection is carried out to obtain the set of variables that will be included in the model. Next, the forests are tuned to obtain the optimal forest. Finally, this model is used to impute the missing values, and the accuracy of the estimates is assessed, with satisfactory results.
%~