CRB-NCE: An adaptable cohesion rule-based approach to number of clusters estimation

Publication date

2026

Publisher

Elsevier

Abstract

Accurate number-of-clusters estimation (NCE) is a central task in many clustering applications, particularly for prototype-based 𝑘-centers methods like 𝑘-Means, which require the number of clusters 𝑘 to be specified in advance. This paper presents CRB-NCE, a general cluster cohesion rule-based framework for NCE integrating three main innovations: (i) the introduction of tail ratios to reliably identify decelerations in sequences of cohesion measures, (ii) a threshold-based rule system supporting accurate NCE, and (iii) an optimization-driven approach to learn these thresholds from synthetic datasets with controlled clustering complexity. Two cohesion measures are considered: inertia (SSE) and a new, scale-invariant metric called the mean coverage index. CRB-NCE is mainly applied to derive general-purpose NCE methods, but, most importantly, it also provides an adaptable framework that enables producing specialized procedures with enhanced performance under specific conditions, such as particular clustering algorithms or overlapping cluster structures. Extensive evaluations on synthetic Gaussian datasets (both standard and high-dimensional), clustering benchmarks, and real-world datasets show that CRB-NCE methods consistently achieve robust and competitive NCE performance with efficient runtimes compared to a broad baseline of internal clustering validity indices and other NCE methods.
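
As a rough illustration of the kind of cohesion sequence the abstract refers to, the sketch below computes inertia (SSE) for 𝑘 = 1..6 on a toy dataset and picks the 𝑘 after which the decrease slows down most sharply. This is not the CRB-NCE procedure: the tail ratios, thresholds, and mean coverage index are not defined on this page, so the example swaps in a plain ratio of consecutive SSE values as a stand-in. The function names, the farthest-point initialization, and the toy data are all assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from its nearest already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers

def kmeans_inertia(points, k, iters=100):
    """Run plain Lloyd's k-Means and return the final inertia (SSE)."""
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if empty.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Toy data: three well-separated 2-D Gaussian blobs, 30 points each.
rng = random.Random(0)
blob_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (rng.gauss(mx, 0.5), rng.gauss(my, 0.5))
    for mx, my in blob_means
    for _ in range(30)
]

# Cohesion sequence: SSE for each candidate number of clusters.
sse = {k: kmeans_inertia(points, k) for k in range(1, 7)}

# Stand-in deceleration rule (hypothetical, NOT the paper's tail ratio):
# choose the k whose step from k-1 shrinks the SSE by the largest factor.
best_k = max(range(2, 7), key=lambda k: sse[k - 1] / sse[k])
print(best_k)
```

A single ratio like this works on cleanly separated blobs but is fragile with overlapping clusters, which is the setting where the abstract's learned thresholds and specialized procedures are meant to help.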

Description

2025 CRUE transformative agreements
