Pixel Transposed Convolutional Networks
- Additional Document Info
- View All
Transposed convolutional layers have been widely used in a variety of deep models for up-sampling, including encoder-decoder networks for semantic segmentation and deep generative models for unsupervised learning. One of the key limitations of transposed convolutional operations is that they result in the so-called checkerboard problem. This is caused by the fact that no direct relationship exists among adjacent pixels on the output feature map. To address this problem, we propose the pixel transposed convolutional layer (PixelTCL) to establish direct relationships among adjacent pixels on the up-sampled feature map. Our method is based on a fresh interpretation of the regular transposed convolutional operation. The resulting PixelTCL can be used to replace any transposed convolutional layer in a plug-and-play manner without compromising the fully trainable capabilities of original models. The proposed PixelTCL may result in slight decrease in efficiency, but this can be overcome by an implementation trick. Experimental results on semantic segmentation demonstrate that PixelTCL can consider spatial features such as edges and shapes and yields more accurate segmentation outputs than transposed convolutional layers. When used in image generation tasks, our PixelTCL can largely overcome the checkerboard problem suffered by regular transposed convolutional operations.
author list (cited authors)
Gao, H., Yuan, H., Wang, Z., & Ji, S.