Benefits of Jointly Training Autoencoders: An Improved Neural Tangent Kernel Analysis

abstract

A remarkable recent discovery in machine learning has been that deep neural networks can achieve impressive performance (in terms of both lower training error and higher generalization capacity) in the regime where they are massively over-parameterized. Consequently, over the past year, the community has devoted growing interest to analyzing the optimization and generalization properties of over-parameterized networks, and several breakthrough works have led to important theoretical progress. However, the majority of existing work applies only to supervised learning scenarios and hence is limited to settings such as classification and regression. In contrast, the role of over-parameterization in the unsupervised setting has received far less attention. In this paper, we study the gradient dynamics of two-layer over-parameterized autoencoders with ReLU activation. We make very few assumptions about the given training dataset (other than mild non-degeneracy conditions). Starting from a randomly initialized autoencoder network, we rigorously prove the linear convergence of gradient descent in two learning regimes, namely: (i) the weakly-trained regime where only the encoder is trained, and (ii) the jointly-trained regime where both the encoder and the decoder are trained. Our results indicate the considerable benefits of joint training over weak training for finding global optima, achieving a dramatic decrease in the required level of over-parameterization. We also analyze the case of weight-tied autoencoders (a commonly used architectural choice in practical settings) and prove that in the over-parameterized setting, training such networks from randomly initialized points leads to certain unexpected degeneracies.

authors

Wong, Raymond Ka Wai

published proceedings

IEEE TRANSACTIONS ON INFORMATION THEORY

author list (cited authors)

Nguyen, T. V., Wong, R., & Hegde, C.

citation count

5

complete list of authors

Nguyen, Thanh V; Wong, Raymond KW; Hegde, Chinmay

publication date

July 2021

publisher

Institute of Electrical and Electronics Engineers (IEEE)

keywords

Autoencoders
Convergence
Data Models
Decoding
Gradient Dynamics
Heuristic Algorithms
Kernel
Neural Tangent Kernel
Task Analysis
Training

Digital Object Identifier (DOI)

10.1109/TIT.2021.3065212

start page

4669

end page

4692

volume

67

issue

7

URL

http://dx.doi.org/10.1109/tit.2021.3065212
