Shanghai Jiao Tong University wireless communication network laboratory

Research

Location：Home >> Research >> Smart Data Analytics >> Distributed Machine Learning

Federated kernel k-means algorithm

Conventionally, kernel k-means is conducted in a centralized manner where data are stored in one place. There exist two drawbacks in such a manner: high cost of communication bandwidth and lack of data privacy. A promising approach to resolving these issues is to design a federated kernel k-means algorithm such that users upload their local computational results, rather than their local data, to a central place to construct the kernel k-means model. However, two challenges lie ahead: 1) how to solve the problem of kernel $k$-means in a distributed manner; 2) how to maintain communication efficiency in this algorithm. To tackle the first challenge, an approximated solution to the problem of kernel k-means is employed, which requires to determine the eigenpairs of the kernel matrix in kernel k-means problem. The problem of determining the eigenpairs can be reformulated into a convex stochastic composite optimization (SCO) problem, and a distributed stochastic proximal gradient descent (DSPGD) algorithm is then developed to solve the SCO problem. To tackle the second challenge, a communication efficient mechanism is designed to reinforce DSPGD. Theoretical analysis shows that DSPGD converges to the optimal solution to the SCO problem with an O(1/T) rate, where T is the number of iterations. The communication cost and the clustering loss are also analyzed for the developed algorithm. In comparison to the state-of-the-art schemes, the developed algorithm achieves the highest clustering quality with the communication cost reduced by more than 60%.

People:

Xiaochen Zhou