%0 Thesis
%A Bohnensteffen, Sarah
%T Selective Data Editing of Continuous Variables with Random Forests in Official Statistics
%D 2020
%U https://hdl.handle.net/20.500.14352/9156
%X Technological advances and new demands due to economic and socio-cultural changes regularly challenge the National Statistical Institutes to adapt to their evolving environment. The application of machine learning methods as important and promising tools for official statistics is discussed in the context of these changes, in the context of opportunities arising from new digital data sources, and in light of the difficult task of balancing a variety of quality requirements at the national and international levels. Selective statistical data editing is an approach that detects influential units and selects them for manual follow-up in order to make the editing process more efficient. In this thesis, a simple and a two-step approach are developed to apply random forests to selective editing of continuous variables in the context of short-term business survey data. We present a score function based on decision forest models which allows for an efficient selection of units relevant for the estimation of the final aggregates. The approach is found to be applicable also at the disaggregated levels of the autonomous communities and economic branches.
%X Technological progress and new demands arising from economic and socio-cultural changes regularly challenge the National Statistical Institutes to adapt to their constantly evolving environment. The application of machine learning methods as important and promising tools for official statistics is analyzed in the context of these changes, in the context of the opportunities arising from new digital data sources, and taking into account the difficult task of balancing a variety of quality requirements at the national and international levels. Selective editing is a set of techniques for detecting influential units and selecting them for manual follow-up in order to make the process more efficient. In this work, a simple approach and a two-step approach are developed to apply random forests to the selective editing of continuous variables in the context of short-term business survey data. A score function based on random forest models is presented that allows an efficient selection of units relevant for the estimation of the final aggregates. The approach developed is also applicable at the disaggregated levels of the autonomous communities and branches of business for the data used.
%~