A Method to Generate Soft Reference Data for Topic Identification
dc.conference.date | June 15–19, 2020 | |
dc.conference.place | Lisboa, Portugal | |
dc.conference.title | International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2020) | |
dc.contributor.author | Vélez Serrano, Daniel | |
dc.contributor.author | Villarino, Guillermo | |
dc.contributor.author | Rodríguez González, Juan Tinguaro | |
dc.contributor.author | Gómez González, Daniel | |
dc.date.accessioned | 2025-01-20T16:03:10Z | |
dc.date.available | 2025-01-20T16:03:10Z | |
dc.date.issued | 2020 | |
dc.description.abstract | Text mining and topic identification models are becoming increasingly relevant for extracting value from the huge amount of unstructured textual information that companies obtain from their users and clients. Soft approaches to these problems are also gaining relevance, since in some contexts it may be unrealistic to assume that every document is associated with a single topic without any further consideration of the uncertainties involved. However, there is an almost total lack of reference documents that allow a proper assessment of the performance of soft classifiers in such soft topic identification tasks. To address this gap, this paper proposes a method that generates topic identification reference documents of a soft but objective nature by combining, in random but known proportions, phrases from existing documents that deal with different topics. We also provide a computational study that illustrates the application of the proposed method on a well-known benchmark for topic identification and shows that it makes an informative evaluation of soft classifiers possible in the context of soft topic identification. | |
dc.description.department | Depto. de Estadística e Investigación Operativa | |
dc.description.department | Depto. de Estadística y Ciencia de los Datos | |
dc.description.faculty | Fac. de Ciencias Matemáticas | |
dc.description.faculty | Fac. de Estudios Estadísticos | |
dc.description.faculty | Instituto de Matemática Interdisciplinar (IMI) | |
dc.description.refereed | TRUE | |
dc.description.sponsorship | Ministerio de Ciencia, Innovación y Universidades | |
dc.description.sponsorship | Universidad Complutense de Madrid | |
dc.description.status | pub | |
dc.identifier.citation | Vélez, D., Villarino, G., Rodríguez, J.T., Gómez, D. (2020). A Method to Generate Soft Reference Data for Topic Identification. In: Lesot, MJ., et al. Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2020. Communications in Computer and Information Science, vol 1239. Springer, Cham. https://doi.org/10.1007/978-3-030-50153-2_5 | |
dc.identifier.doi | 10.1007/978-3-030-50153-2_5 | |
dc.identifier.isbn | 9783030501525 | |
dc.identifier.isbn | 9783030501532 | |
dc.identifier.issn | 1865-0929 | |
dc.identifier.issn | 1865-0937 | |
dc.identifier.officialurl | https://doi.org/10.1007/978-3-030-50153-2_5 | |
dc.identifier.uri | https://hdl.handle.net/20.500.14352/115196 | |
dc.language.iso | eng | |
dc.page.final | 67 | |
dc.page.initial | 54 | |
dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-096509-B-I00/ES/GESTION INTELIGENTE DE INFORMACION BORROSA/ | |
dc.relation.projectID | UCM Research Group 910149 | |
dc.rights.accessRights | restricted access | |
dc.subject.keyword | Soft classification | |
dc.subject.keyword | Text mining | |
dc.subject.keyword | Topic identification | |
dc.subject.ucm | Estadística matemática (Matemáticas) | |
dc.subject.ucm | Investigación operativa (Matemáticas) | |
dc.subject.unesco | 1207 Investigación Operativa | |
dc.subject.unesco | 1209 Estadística | |
dc.title | A Method to Generate Soft Reference Data for Topic Identification | |
dc.type | conference paper | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | 1375c631-ecbd-4b51-b213-c7d4148c3eba | |
relation.isAuthorOfPublication | ddad170a-793c-4bdc-b983-98d313c81b03 | |
relation.isAuthorOfPublication | 4dcf8c54-8545-4232-8acf-c163330fd0fe | |
relation.isAuthorOfPublication.latestForDiscovery | 1375c631-ecbd-4b51-b213-c7d4148c3eba |
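The generation procedure summarized in the abstract (mixing phrases from documents on different topics in random but known proportions) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names make_soft_document and mean_absolute_error are hypothetical, the mixing weights are assumed to come from a flat Dirichlet distribution, and the phrase pools are toy data. Each synthetic document's soft label is the known share of its phrases drawn from each topic.

```python
import random
from collections import Counter


def make_soft_document(phrases_by_topic, n_phrases, rng):
    """Build one synthetic document and its soft topic label (hypothetical helper)."""
    topics = sorted(phrases_by_topic)
    # Random mixing weights: normalized Gamma(1, 1) variates give a flat
    # Dirichlet draw (an assumption, not necessarily the paper's choice).
    raw = [rng.gammavariate(1.0, 1.0) for _ in topics]
    total = sum(raw)
    weights = [w / total for w in raw]
    # Assign each phrase slot to a topic according to the mixing weights;
    # the slots come out in random order, so topics are interleaved.
    chosen = rng.choices(topics, weights=weights, k=n_phrases)
    # Fill each slot with a phrase sampled from that topic's pool.
    phrases = [rng.choice(phrases_by_topic[t]) for t in chosen]
    # The soft label is the realized share of phrases per topic, known exactly.
    counts = Counter(chosen)
    label = {t: counts[t] / n_phrases for t in topics}
    return " ".join(phrases), label


def mean_absolute_error(predicted, reference):
    """Average absolute gap between a predicted and a reference topic distribution."""
    return sum(abs(predicted.get(t, 0.0) - reference[t]) for t in reference) / len(reference)


# Usage with toy phrase pools (hypothetical data).
rng = random.Random(0)
pools = {
    "sport": ["The match ended in a draw.", "He scored twice in the final."],
    "politics": ["The bill passed the senate.", "Elections are due in May."],
}
text, label = make_soft_document(pools, n_phrases=10, rng=rng)
print(text)
print(label)
print(mean_absolute_error({"sport": 0.5, "politics": 0.5}, label))
```

Recording the realized phrase shares rather than the sampled weights keeps the label exactly consistent with the generated text. A topic distribution predicted by any soft classifier can then be scored against it; the mean absolute error used here is only one of several possible comparison measures.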