Fast and Communication-Efficient Algorithm for Distributed Support Vector Machine Training Academic Article

abstract

  • Support Vector Machines (SVMs) are widely used as supervised learning models to solve classification problems in machine learning. Training SVMs on large datasets is extremely challenging due to excessive storage and computational requirements. Tackling such big data problems requires scalable distributed algorithms that parallelize model training, together with efficient implementations of these algorithms. In this paper, we propose a distributed algorithm for SVM training that is scalable and communication-efficient. The algorithm uses a compact representation of the kernel matrix, based on the QR decomposition of low-rank approximations, to reduce both the computation and the storage requirements of the training stage. This also yields a considerable reduction in the communication required by a distributed implementation of the algorithm. Experiments on benchmark datasets with up to five million samples demonstrate negligible communication overhead and scalability on up to 64 cores. Execution times are substantially lower than those of other widely used packages. Furthermore, the proposed algorithm has linear time complexity with respect to the number of samples, making it well suited to SVM training in decentralized environments such as smart embedded systems and edge-based Internet of Things (IoT) devices.
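
  The abstract describes compressing the kernel matrix through a low-rank approximation followed by a QR decomposition so that the training stage works with small factors instead of the full n-by-n kernel. The sketch below illustrates that general idea only; it uses a Nystrom-style low-rank factorization with an RBF kernel as an assumed concrete instantiation, and the function names, parameters, and rank choice are hypothetical rather than the authors' implementation.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.1):
        """Gaussian (RBF) kernel between the rows of A and the rows of B."""
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-gamma * sq)

    def low_rank_qr_factor(X, rank=50, gamma=0.1, seed=0):
        """Nystrom-style low-rank factor of the kernel matrix, then a thin QR.

        Returns Q (n x rank, orthonormal columns) and R (rank x rank) with
        K ~= (Q R)(Q R)^T, so a solver can operate on the compact factors
        instead of the full n x n kernel matrix. Illustrative sketch only.
        """
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        idx = rng.choice(n, size=rank, replace=False)   # landmark samples
        C = rbf_kernel(X, X[idx], gamma)                # n x rank cross-kernel block
        W = C[idx]                                      # rank x rank landmark kernel
        # Symmetric inverse square root of W gives K ~= G G^T with G = C W^{-1/2}
        eigval, eigvec = np.linalg.eigh(W)
        eigval = np.maximum(eigval, 1e-12)              # guard against tiny eigenvalues
        G = C @ (eigvec / np.sqrt(eigval)) @ eigvec.T   # n x rank low-rank factor
        Q, R = np.linalg.qr(G, mode="reduced")          # thin QR of the factor
        return Q, R

    if __name__ == "__main__":
        X = np.random.default_rng(1).standard_normal((1000, 20))
        Q, R = low_rank_qr_factor(X, rank=50)
        K_approx = (Q @ R) @ (Q @ R).T                  # implicit approximate kernel
        print(Q.shape, R.shape, K_approx.shape)

  In a distributed setting, storing and exchanging the thin Q and small R factors rather than the dense kernel is what reduces the storage and communication costs the abstract refers to; the specific partitioning and solver details are in the paper itself.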

published proceedings

  • IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

author list (cited authors)

  • Dass, J., Sarin, V., & Mahapatra, R. N.

citation count

  • 16

complete list of authors

  • Dass, Jyotikrishna||Sarin, Vivek||Mahapatra, Rabi N

publication date

  • May 2019