A method for K-Means seeds generation applied to text mining

Vélez Serrano, Daniel; Sueiras, Jorge; Ortega, Alejandro; Velez, Jose F.

doi:10.1007/s10260-015-0345-4

dc.contributor.author	Vélez Serrano, Daniel
dc.contributor.author	Sueiras, Jorge
dc.contributor.author	Ortega, Alejandro
dc.contributor.author	Velez, Jose F.
dc.date.accessioned	2024-01-31T08:43:35Z
dc.date.available	2024-01-31T08:43:35Z
dc.date.issued	2015
dc.description.abstract	This paper proposes a method for producing a set of seeds to serve as the starting point for K-Means-type unsupervised classification algorithms in text mining. The proposal uses the eigenvectors obtained from principal component analysis to extract initial seeds, after appropriate treatment, in search of lightly overlapping clusters that are also clearly identified by keywords. The work is motivated by the authors' interest in identifying previously unknown topics and themes in short texts. To validate the method, it was applied to a sample of labeled e-mails (NG20), a gold standard in the field of text mining. Specifically, several corpora referenced in the literature were used, each configured according to a mix of topics contained in the sample. The proposed method improves on the results of the state-of-the-art methods with which it is compared.	en
dc.description.department	Depto. de Estadística e Investigación Operativa
dc.description.faculty	Fac. de Ciencias Matemáticas
dc.description.refereed	TRUE
dc.description.sponsorship	Ministerio de Economía, Comercio y Empresa (España)
dc.description.status	pub
dc.identifier.citation	Velez D, Sueiras J, Ortega A, Velez JF (2016) A method for K-Means seeds generation applied to text mining. Stat Methods Appl 25:477–499. https://doi.org/10.1007/s10260-015-0345-4
dc.identifier.doi	10.1007/s10260-015-0345-4
dc.identifier.essn	1613-981X
dc.identifier.issn	1618-2510
dc.identifier.officialurl	https://doi.org/10.1007/s10260-015-0345-4
dc.identifier.relatedurl	https://link.springer.com/article/10.1007/s10260-015-0345-4
dc.identifier.uri	https://hdl.handle.net/20.500.14352/96877
dc.journal.title	Statistical Methods & Applications
dc.language.iso	eng
dc.page.final	499
dc.page.initial	477
dc.publisher	Springer
dc.relation.projectID	info:eu-repo/grantAgreement/MINECO//TIN2014-57458-R/ES/ALGORITMOS Y TECNICAS PARA LOS RETOS DE LA EXTRACCION DE CONTENIDO SEMANTICO DESDE IMAGENES DE DOCUMENTOS ESCANEADOS/
dc.rights.accessRights	restricted access
dc.subject.keyword	Text mining
dc.subject.keyword	K-Means
dc.subject.keyword	PCA
dc.subject.keyword	Classification
dc.subject.keyword	Seeds
dc.subject.keyword	Eigenvectors
dc.subject.ucm	Mathematical statistics (Mathematics)
dc.subject.ucm	Numerical analysis
dc.subject.unesco	12 Mathematics
dc.title	A method for K-Means seeds generation applied to text mining	en
dc.type	journal article
dc.type.hasVersion	VoR
dc.volume.number	25
dspace.entity.type	Publication
relation.isAuthorOfPublication	1375c631-ecbd-4b51-b213-c7d4148c3eba
relation.isAuthorOfPublication.latestForDiscovery	1375c631-ecbd-4b51-b213-c7d4148c3eba

Original bundle

Name: K-Means_seeds_generation.pdf
Size: 766.18 KB
Format: Adobe Portable Document Format

Collections

Articles