Complexity analysis and practical resolution of the data classification problem with private characteristics
| dc.contributor.author | Pantoja, David | |
| dc.contributor.author | Rodríguez Laguna, Ismael | |
| dc.contributor.author | Rubio Díez, Fernando | |
| dc.contributor.author | Segura Díaz, Clara María | |
| dc.date.accessioned | 2025-05-19T14:38:02Z | |
| dc.date.available | 2025-05-19T14:38:02Z | |
| dc.date.issued | 2025-05-08 | |
| dc.description.abstract | In this work we analyze the problem of, given the probability distribution of a population, questioning an unknown individual that is representative of the distribution so that our uncertainty about certain characteristics is significantly reduced—but the uncertainty about others, deemed private or sensitive, is not. Thus, the goal of the problem is extracting information being relevant to a legitimate purpose while preserving the privacy of individuals, which is crucial to enable non-intrusive selection processes in several areas. For instance, it is essential in the design of non-discriminatory personnel selection, promotion, and layoff processes in companies and institutions; in the retrieval of customer information being relevant to the service provided by a company (and no more); in certifications not revealing sensitive industrial information being irrelevant for the certification itself; etc. Interactive questioning processes are constructed for this purpose, which requires generalizing the notion of decision trees to account the amount of desired and undesired information retrieved for each branch of the plan. Our findings about this problem are both theoretical and practical: on the one hand, we prove its NP-completeness by a reduction from the Set Cover problem; and on the other hand, given this intractability, we provide heuristic solutions to find reasonable solutions in affordable time. In particular, a greedy algorithm and two genetic algorithms are presented. Our experiments indicate that the best results are obtained using a genetic algorithm reinforced with a greedy strategy. | |
| dc.description.department | Depto. de Sistemas Informáticos y Computación | |
| dc.description.faculty | Fac. de Informática | |
| dc.description.faculty | Instituto de Tecnología del Conocimiento (ITC) | |
| dc.description.refereed | TRUE | |
| dc.description.sponsorship | Plan Estatal de Investigación Científica y Técnica y de Innovación | |
| dc.description.status | pub | |
| dc.identifier.citation | Pantoja, D., Rodríguez, I., Rubio, F., & Segura, C. (2025). Complexity analysis and practical resolution of the data classification problem with private characteristics. Complex & Intelligent Systems, 11(6), 1-23. | |
| dc.identifier.doi | 10.1007/s40747-025-01911-y | |
| dc.identifier.officialurl | https://doi.org/10.1007/s40747-025-01911-y | |
| dc.identifier.relatedurl | https://link.springer.com/article/10.1007/s40747-025-01911-y | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14352/120216 | |
| dc.journal.title | Complex & Intelligent Systems | |
| dc.language.iso | eng | |
| dc.publisher | Springer | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-108528RB-C22/ES/METODOS RIGUROSOS PARA EL DESARROLLO DE SISTEMAS SOFTWARE DE CALIDAD Y FIABILIDAD CERTIFICADAS/ | |
| dc.relation.projectID | PID2023-149943OB-I00 | |
| dc.rights | Attribution-NonCommercial-ShareAlike 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | |
| dc.subject.keyword | Computational complexity | |
| dc.subject.keyword | NP-completeness | |
| dc.subject.keyword | Data classification | |
| dc.subject.keyword | Decision trees | |
| dc.subject.keyword | Selection processes | |
| dc.subject.keyword | Privacy preservation | |
| dc.subject.ucm | Informática (Informática) | |
| dc.subject.unesco | 1203 Ciencia de Los Ordenadores | |
| dc.title | Complexity analysis and practical resolution of the data classification problem with private characteristics | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dc.volume.number | 11 | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 28429d40-53cb-4bb3-a3f6-82ec557a34ed | |
| relation.isAuthorOfPublication | 24d04c3b-f9e3-4ad0-95cb-c28e064f7a03 | |
| relation.isAuthorOfPublication | b7547876-744e-4e9b-b551-c0dfab2a2d83 | |
| relation.isAuthorOfPublication.latestForDiscovery | 28429d40-53cb-4bb3-a3f6-82ec557a34ed |
Download
Original bundle
1 - 1 of 1


