A method for K-Means seeds generation applied to text mining

dc.contributor.authorVélez Serrano, Daniel
dc.contributor.authorSueiras, Jorge
dc.contributor.authorOrtega, Alejandro
dc.contributor.authorVelez, Jose F.
dc.date.accessioned2024-01-31T08:43:35Z
dc.date.available2024-01-31T08:43:35Z
dc.date.issued2015-11-11
dc.description.abstractThis paper proposes a methodology for producing a set of seeds that serve as the starting point for K-Means-type unsupervised classification algorithms in text mining. The proposal uses the eigenvectors obtained from principal component analysis to extract the initial seeds, after appropriate treatment, in search of slightly overlapping clusters that are also clearly identified by keywords. The work is motivated by the authors' interest in identifying previously unknown topics and themes in short texts. To validate the method, it was applied to a sample of labeled e-mails (NG20), a gold standard in the field of text mining. Specifically, several corpora referenced in the literature were used, each configured according to a mix of topics contained in the sample. The proposed method improves on the results of the other state-of-the-art methods with which it is compared.
dc.description.departmentDepto. de Estadística e Investigación Operativa
dc.description.facultyFac. de Ciencias Matemáticas
dc.description.refereedTRUE
dc.description.sponsorshipMinisterio de Economía y Competitividad
dc.description.statuspub
dc.identifier.doi10.1007/s10260-015-0345-4
dc.identifier.issn1618-2510
dc.identifier.issn1613-981X
dc.identifier.officialurlhttps://link.springer.com/article/10.1007/s10260-015-0345-4
dc.identifier.urihttps://hdl.handle.net/20.500.14352/96877
dc.journal.titleStatistical Methods & Applications
dc.language.isoeng
dc.page.final499
dc.page.initial477
dc.publisherSpringer
dc.relation.projectIDinfo:eu-repo/grantAgreement/MINECO//TIN2014-57458-R/ES/ALGORITMOS Y TECNICAS PARA LOS RETOS DE LA EXTRACCION DE CONTENIDO SEMANTICO DESDE IMAGENES DE DOCUMENTOS ESCANEADOS/
dc.rights.accessRightsrestricted access
dc.subject.keywordText mining
dc.subject.keywordK-Means
dc.subject.keywordPCA
dc.subject.keywordClassification
dc.subject.keywordSeeds
dc.subject.keywordEigenvectors
dc.subject.ucmEstadística matemática (Matemáticas)
dc.subject.ucmAnálisis numérico
dc.subject.unesco12 Matemáticas
dc.titleA method for K-Means seeds generation applied to text mining
dc.typejournal article
dc.volume.number25
dspace.entity.typePublication
relation.isAuthorOfPublication1375c631-ecbd-4b51-b213-c7d4148c3eba
relation.isAuthorOfPublication.latestForDiscovery1375c631-ecbd-4b51-b213-c7d4148c3eba
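
The seeding idea summarized in the abstract (initial K-Means centroids derived from PCA eigenvectors) can be illustrated with a minimal sketch. The record does not describe how the eigenvectors are treated to obtain seeds, so the rule used here is an assumption and not the authors' method: each seed is the mean of the `top_m` documents with the largest projection onto one leading eigenvector. The names `pca_seeds`, `kmeans`, `top_m` and the toy document-term matrix `docs` are all hypothetical.

```python
import numpy as np


def pca_seeds(X, k, top_m=2):
    """Derive k initial centroids from the leading principal axes of X.

    Assumption (not taken from the paper): each seed is the mean of the
    top_m rows with the largest projection onto one eigenvector.
    Requires k <= min(n_rows, n_columns).
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are eigenvectors of the covariance matrix, ordered by
    # decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    seeds = []
    for axis in vt[:k]:
        scores = centered @ axis
        # The sign of an eigenvector is arbitrary; orient it toward the
        # side with the larger extreme projection.
        if abs(scores.min()) > abs(scores.max()):
            scores = -scores
        top = np.argsort(scores)[-top_m:]
        seeds.append(X[top].mean(axis=0))
    return np.vstack(seeds)


def assign(X, centers):
    """Return the index of the nearest center for every row of X."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, seeds, n_iter=50):
    """Plain Lloyd iterations started from the given seeds."""
    X = np.asarray(X, dtype=float)
    centers = np.array(seeds, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(X, centers), centers


# Toy document-term counts (hypothetical): rows are documents,
# columns are terms.
docs = np.array([
    [3, 2, 0, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 3, 2, 0, 0],
    [0, 1, 2, 3, 0, 0],
    [0, 0, 0, 0, 3, 2],
    [0, 0, 0, 1, 2, 3],
], dtype=float)

seeds = pca_seeds(docs, k=3)
labels, centers = kmeans(docs, seeds)
print(labels)
```

Because the seeds are computed deterministically from the data, repeated runs start from the same centroids, unlike random initialization. On real text the input would typically be a weighted document-term matrix rather than raw counts; that choice is also left open here.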