What Data-Centric AI Can Do For kmeans–a Faster, Robust kmeans-d

Parichit Sharma, Hasan Kurban, Mehmet M. Dalkilic – Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, Data Centric Machine Learning Research Workshop (DMLR), 2024 (Accepted)


In this work, we investigate the role of data in enhancing the venerable kmeans algorithm. Instead of running the traditional data-agnostic iteration, we treat data as a first-class citizen and proactively channel most of the computation toward high-expressive (HE) data points rather than low-expressive (LE) ones. We show that LE data do not affect the convergence or the quality of the results. Our experiments revealed that real-world data contain a substantial number of LE points, so concentrating computation on HE points yields significant savings in compute, training resources (memory), and time. The concept is illustrated by the cover figure.


The main contributions are:

TBA