Fast and Communication-Efficient Algorithm for Distributed Support Vector Machine Training Academic Article

abstract

  • Support Vector Machines (SVMs) are widely used as supervised learning models to solve classification problems in machine learning. Training SVMs on large datasets is extremely challenging due to excessive storage and computational requirements. Tackling such big data problems requires scalable distributed algorithms that parallelize model training, together with efficient implementations of these algorithms. In this paper, we propose a distributed algorithm for SVM training that is scalable and communication-efficient. The algorithm uses a compact representation of the kernel matrix, based on the QR decomposition of low-rank approximations, to reduce both the computation and the storage requirements of the training stage. This also yields a considerable reduction in the communication required by a distributed implementation of the algorithm. Experiments on benchmark datasets with up to five million samples demonstrate negligible communication overhead and scalability on up to 64 cores. Execution times are substantially lower than those of other widely used packages. Furthermore, the proposed algorithm has linear time complexity with respect to the number of samples, making it well suited to SVM training in decentralized environments such as smart embedded systems and edge-based Internet of Things (IoT) devices.
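
  The abstract describes compressing the kernel matrix through a low-rank approximation followed by a QR decomposition so that the training stage works with small factors instead of the full n-by-n kernel. The sketch below illustrates that general idea only; it uses a Nystrom-style low-rank factorization with an RBF kernel as an assumed concrete instantiation, and the function names, parameters, and rank choice are hypothetical rather than the authors' implementation.

    import numpy as np

    def rbf_kernel(A, B, gamma=0.1):
        """Gaussian (RBF) kernel between the rows of A and the rows of B."""
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-gamma * sq)

    def low_rank_qr_factor(X, rank=50, gamma=0.1, seed=0):
        """Nystrom-style low-rank factor of the kernel matrix, then a thin QR.

        Returns Q (n x rank, orthonormal columns) and R (rank x rank) with
        K ~= (Q R)(Q R)^T, so a solver can operate on the compact factors
        instead of the full n x n kernel matrix. Illustrative sketch only.
        """
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        idx = rng.choice(n, size=rank, replace=False)   # landmark samples
        C = rbf_kernel(X, X[idx], gamma)                # n x rank cross-kernel block
        W = C[idx]                                      # rank x rank landmark kernel
        # Symmetric inverse square root of W gives K ~= G G^T with G = C W^{-1/2}
        eigval, eigvec = np.linalg.eigh(W)
        eigval = np.maximum(eigval, 1e-12)              # guard against tiny eigenvalues
        G = C @ (eigvec / np.sqrt(eigval)) @ eigvec.T   # n x rank low-rank factor
        Q, R = np.linalg.qr(G, mode="reduced")          # thin QR of the factor
        return Q, R

    if __name__ == "__main__":
        X = np.random.default_rng(1).standard_normal((1000, 20))
        Q, R = low_rank_qr_factor(X, rank=50)
        K_approx = (Q @ R) @ (Q @ R).T                  # implicit approximate kernel
        print(Q.shape, R.shape, K_approx.shape)

  In a distributed setting, storing and exchanging the thin Q and small R factors rather than the dense kernel is what reduces the storage and communication costs the abstract refers to; the specific partitioning and solver details are in the paper itself.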

published proceedings

  • IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

author list (cited authors)

  • Dass, J., Sarin, V., & Mahapatra, R. N.

citation count

  • 16

complete list of authors

  • Dass, Jyotikrishna||Sarin, Vivek||Mahapatra, Rabi N

publication date

  • May 2019