RT Journal Article
T1 A method for K-Means seeds generation applied to text mining
A1 Vélez Serrano, Daniel
A1 Sueiras, Jorge
A1 Ortega, Alejandro
A1 Velez, Jose F.
AB In this paper, we propose a methodology for producing a set of seeds to be used as the starting point of K-Means-type unsupervised classification algorithms for text mining. Our proposal uses the eigenvectors obtained from principal component analysis to extract the initial seeds, after appropriate treatment aimed at finding slightly overlapping clusters that are also clearly identified by keywords. This work is motivated by the authors' interest in identifying previously unknown topics and themes in short texts. To validate the method, it was applied to a sample of labeled e-mails (NG20), a gold standard in the field of text mining. Specifically, several corpora referenced in the literature were used, configured according to mixes of the topics contained in the sample. The proposed method improves on the results of the other state-of-the-art methods with which it is compared.
PB Springer
SN 1618-2510
YR 2015
FD 2015
LK https://hdl.handle.net/20.500.14352/96877
UL https://hdl.handle.net/20.500.14352/96877
LA eng
NO Velez D, Sueiras J, Ortega A, Velez JF (2016) A method for K-Means seeds generation applied to text mining. Stat Methods Appl 25:477–499. https://doi.org/10.1007/s10260-015-0345-4
NO Ministerio de Economía, Comercio y Empresa (España)
DS Docta Complutense
RD 8 Jun 2026