Complexity analysis and practical resolution of the data classification problem with private characteristics

dc.contributor.authorPantoja, David
dc.contributor.authorRodríguez Laguna, Ismael
dc.contributor.authorRubio Díez, Fernando
dc.contributor.authorSegura Díaz, Clara María
dc.date.accessioned2025-05-19T14:38:02Z
dc.date.available2025-05-19T14:38:02Z
dc.date.issued2025-05-08
dc.description.abstractIn this work we analyze the problem of, given the probability distribution of a population, questioning an unknown individual that is representative of the distribution so that our uncertainty about certain characteristics is significantly reduced—but the uncertainty about others, deemed private or sensitive, is not. Thus, the goal of the problem is extracting information being relevant to a legitimate purpose while preserving the privacy of individuals, which is crucial to enable non-intrusive selection processes in several areas. For instance, it is essential in the design of non-discriminatory personnel selection, promotion, and layoff processes in companies and institutions; in the retrieval of customer information being relevant to the service provided by a company (and no more); in certifications not revealing sensitive industrial information being irrelevant for the certification itself; etc. Interactive questioning processes are constructed for this purpose, which requires generalizing the notion of decision trees to account the amount of desired and undesired information retrieved for each branch of the plan. Our findings about this problem are both theoretical and practical: on the one hand, we prove its NP-completeness by a reduction from the Set Cover problem; and on the other hand, given this intractability, we provide heuristic solutions to find reasonable solutions in affordable time. In particular, a greedy algorithm and two genetic algorithms are presented. Our experiments indicate that the best results are obtained using a genetic algorithm reinforced with a greedy strategy.
dc.description.departmentDepto. de Sistemas Informáticos y Computación
dc.description.facultyFac. de Informática
dc.description.facultyInstituto de Tecnología del Conocimiento (ITC)
dc.description.refereedTRUE
dc.description.sponsorshipPlan Estatal de Investigación Científica y Técnica y de Innovación
dc.description.statuspub
dc.identifier.citationPantoja, D., Rodríguez, I., Rubio, F., & Segura, C. (2025). Complexity analysis and practical resolution of the data classification problem with private characteristics. Complex & Intelligent Systems, 11(6), 1-23.
dc.identifier.doi10.1007/s40747-025-01911-y
dc.identifier.officialurlhttps://doi.org/10.1007/s40747-025-01911-y
dc.identifier.relatedurlhttps://link.springer.com/article/10.1007/s40747-025-01911-y
dc.identifier.urihttps://hdl.handle.net/20.500.14352/120216
dc.journal.titleComplex & Intelligent Systems
dc.language.isoeng
dc.publisherSpringer
dc.relation.projectIDinfo:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-108528RB-C22/ES/METODOS RIGUROSOS PARA EL DESARROLLO DE SISTEMAS SOFTWARE DE CALIDAD Y FIABILIDAD CERTIFICADAS/
dc.relation.projectIDPID2023-149943OB-I00
dc.rightsAttribution-NonCommercial-ShareAlike 4.0 Internationalen
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject.keywordComputational complexity
dc.subject.keywordNP-completeness
dc.subject.keywordData classification
dc.subject.keywordDecision trees
dc.subject.keywordSelection processes
dc.subject.keywordPrivacy preservation
dc.subject.ucmInformática (Informática)
dc.subject.unesco1203 Ciencia de Los Ordenadores
dc.titleComplexity analysis and practical resolution of the data classification problem with private characteristics
dc.typejournal article
dc.type.hasVersionVoR
dc.volume.number11
dspace.entity.typePublication
relation.isAuthorOfPublication28429d40-53cb-4bb3-a3f6-82ec557a34ed
relation.isAuthorOfPublication24d04c3b-f9e3-4ad0-95cb-c28e064f7a03
relation.isAuthorOfPublicationb7547876-744e-4e9b-b551-c0dfab2a2d83
relation.isAuthorOfPublication.latestForDiscovery28429d40-53cb-4bb3-a3f6-82ec557a34ed

Download

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Complexity_analysis.pdf
Size:
548.05 KB
Format:
Adobe Portable Document Format

Collections