A method for K-Means seeds generation applied to text mining

Vélez Serrano, Daniel; Sueiras, Jorge; Ortega, Alejandro; Velez, Jose F.

doi:10.1007/s10260-015-0345-4

Official URL

https://doi.org/10.1007/s10260-015-0345-4

Publication date

2015

Authors

Vélez Serrano, Daniel

Sueiras, Jorge

Ortega, Alejandro

Velez, Jose F.

Publisher

Springer

URI

https://hdl.handle.net/20.500.14352/96877

Citation

Velez D, Sueiras J, Ortega A, Velez JF (2016) A method for K-Means seeds generation applied to text mining. Stat Methods Appl 25:477–499. https://doi.org/10.1007/s10260-015-0345-4

Abstract

This paper proposes a methodology for producing a set of seeds to be used as the starting point for K-Means-type unsupervised classification algorithms in text mining. The proposal uses the eigenvectors obtained from principal component analysis to extract the initial seeds, after appropriate treatment aimed at finding lightly overlapping clusters that are also clearly identified by keywords. The work is motivated by the authors' interest in identifying previously unknown topics and themes in short texts. To validate the method, it was applied to a sample of labeled e-mails (NG20), a gold standard in the field of text mining. Specifically, several corpora referenced in the literature were used, each configured as a mix of the topics contained in the sample. The proposed method improves on the results of the other state-of-the-art methods with which it is compared.
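
The abstract does not spell out the treatment applied to the eigenvectors, so the following is only a minimal sketch of the general idea of deriving K-Means seeds from principal components, not the authors' procedure. It assumes a small dense document-term matrix, and the helper name `pca_seed_candidates`, its parameters `n_components` and `top_m`, and the toy data are all hypothetical. For each leading component, the documents with the most extreme projection scores at either end are averaged into a candidate seed.

```python
import numpy as np


def pca_seed_candidates(X, n_components, top_m):
    """Hypothetical helper: two candidate seeds per principal component.

    X is a (documents x terms) matrix. This is an illustrative assumption,
    not the procedure described in the paper.
    """
    Xc = X - X.mean(axis=0)  # center the term columns, as PCA does
    # The rows of vt are the eigenvectors of the covariance matrix of X,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    seeds = []
    for v in vt[:n_components]:
        order = np.argsort(Xc @ v)  # documents by ascending projection score
        seeds.append(X[order[-top_m:]].mean(axis=0))  # high-score end
        seeds.append(X[order[:top_m]].mean(axis=0))   # low-score end
    return np.vstack(seeds)


# Toy corpus (assumed data): documents 0-2 use terms 0-1, documents 3-5 use terms 2-3.
X = np.array([
    [3, 3, 0, 0],
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
], dtype=float)

# One component yields two candidate seeds, one per end of the component.
seeds = pca_seed_candidates(X, n_components=1, top_m=3)
```

On this toy corpus the first component separates the two vocabularies, so the two candidate seeds are the centroids of the two document groups. Seeds of this kind could then be handed to a K-Means implementation as its initial centers, for example through the `init` parameter of scikit-learn's `KMeans`, which accepts an array of initial centers.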

UCM subjects

Mathematical statistics (Mathematics), Numerical analysis

Unesco subjects

12 Mathematics

Collections

Articles
