Yuan, Hao (2021-06). Towards Explainable Deep Models for Images, Texts, and Graphs. Doctoral Dissertation.


  • Deep neural networks have been widely studied and applied across many domains in recent years due to their strong performance. Yet even though deep models have proven powerful and promising, most of them are developed as black boxes. Without meaningful explanations of how and why predictions are made, we do not fully understand their inner working mechanisms. Such models therefore cannot be fully trusted, which prevents their use in critical applications involving fairness, privacy, and safety. This raises the need to explain deep learning models and to investigate several questions: What input factors are important to a prediction? How are decisions made inside deep networks? And what is the meaning of hidden neurons? In this dissertation, we investigate explanation techniques for different types of deep models. In particular, we explore both instance-level and model-level explanations for image models, text models, and graph models. Deep image models are the most natural starting point for explaining deep models, since images are well structured and easily visualized. Hence, we begin by proposing a novel discrete masking method for explaining deep image classifiers. Our method follows the generative adversarial network formalism: the deep model to be explained is regarded as the discriminator, while we train a generator to explain it. The generator is trained to capture discriminative image regions that convey the same or similar semantic meaning as the original image from the model's perspective. It produces a probability map from which a discrete mask can be sampled. The discriminator then measures the quality of the sampled mask and provides feedback for updating the generator. Because of the sampling operations, the generator cannot be trained directly by back-propagation, so we propose to update it using the policy gradient.
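As a toy illustration of the sample-and-update loop described above, the sketch below trains a Bernoulli probability map with the REINFORCE policy gradient. Everything here is a simplified stand-in, not the dissertation's actual implementation: `reward_fn` plays the role of the discriminator's feedback, the four-pixel `target` marks a hypothetical discriminative region, and the running `baseline` is a standard variance-reduction trick added for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward_fn(mask, target):
    # Hypothetical stand-in for the discriminator: rewards masks that keep
    # the "discriminative" target pixels and penalizes keeping the rest.
    return float((mask * target).sum() - 0.5 * (mask * (1 - target)).sum())

def train_mask_generator(target, steps=300, lr=0.05):
    probs = np.full(target.shape, 0.5)  # initial probability map
    baseline = 0.0                      # running reward baseline (variance reduction)
    for _ in range(steps):
        # Sample a discrete mask from the probability map; this sampling step
        # is what blocks ordinary back-propagation.
        mask = (rng.random(probs.shape) < probs).astype(float)
        r = reward_fn(mask, target)
        # REINFORCE: gradient of the log Bernoulli likelihood of the sample.
        grad_logp = mask / probs - (1 - mask) / (1 - probs)
        probs = np.clip(probs + lr * (r - baseline) * grad_logp, 1e-3, 1 - 1e-3)
        baseline = 0.9 * baseline + 0.1 * r
    return probs
```

Running `train_mask_generator(np.array([1.0, 1.0, 0.0, 0.0]))` drives the keep-probabilities of the two rewarded pixels up and the others down, mirroring how the generator learns to highlight regions the discriminator scores highly.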
Furthermore, we propose to incorporate gradients as auxiliary information to reduce the search space and facilitate training. We conduct both quantitative and qualitative experiments on the ILSVRC dataset to demonstrate the effectiveness of the proposed method. Experimental results indicate that our method provides reasonable explanations for both correct and incorrect predictions and outperforms existing approaches. In addition, our method passes the model randomization test, indicating that its explanations genuinely depend on the network's learned parameters. Unlike image models, text models are more difficult to explain, since texts are represented as discrete variables and cannot be directly visualized. In addition, most explanation methods focus only on the input space of the models and ignore the hidden space. Hence, we propose to explain deep models for text analysis by exploring the meaning of the hidden space. Specifically, we propose an approach to investigate the meaning of hidden neurons in convolutional neural network models for sentence classification tasks. We first employ the saliency map technique to identify important spatial locations in the hidden layers. We then use optimization techniques to approximate the information these hidden locations detect from input sentences. Furthermore, we develop regularization terms and search over words in the vocabulary to explain the detected information. Experimental results demonstrate that our approach identifies meaningful and reasonable explanations for hidden spatial locations. Additionally, we show that our approach can describe the decision procedure of deep text models. These findings further motivate us to study explanation techniques for graph neural networks (GNNs). Unlike images and texts, graph data are usually represented as continuous feature matrices together with discrete adjacency matrices.
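The two-step pipeline for text models, saliency to pick an important hidden location and then optimization plus a vocabulary search to explain it, can be sketched minimally in numpy. The toy vocabulary, random embeddings, window-1 convolution, linear classifier head, and norm clipping below are all illustrative assumptions, not the dissertation's actual architecture or regularization terms.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup (all weights hypothetical): a small vocabulary of embeddings,
# one 1-D conv layer (window size 1 for brevity), and a linear classifier.
vocab = ["good", "bad", "movie", "plot"]
emb = rng.normal(size=(4, 8))       # word embeddings, one row per word
conv_w = rng.normal(size=(3, 8))    # 3 conv filters over the embedding dim
cls_w = np.array([2.0, -0.5, 0.1])  # classifier weight per max-pooled feature

def hidden(sentence_ids):
    # Conv (window 1) + ReLU, shape (seq_len, n_filters).
    return np.maximum(emb[sentence_ids] @ conv_w.T, 0.0)

def score(sentence_ids):
    # Max-pool over positions, then the linear classifier head.
    return float(hidden(sentence_ids).max(axis=0) @ cls_w)

# Step 1: saliency over hidden locations. With this linear head, the
# gradient of the score w.r.t. each pooled feature is just cls_w, so the
# most important filter is the one with the largest absolute weight.
important = int(np.argmax(np.abs(cls_w)))

# Step 2: optimize a free embedding vector to maximize that filter's
# response, approximating the information the hidden location detects.
x = np.zeros(8)
for _ in range(100):
    x += 0.1 * conv_w[important]      # gradient ascent on conv_w[important] @ x
    x /= max(np.linalg.norm(x), 1.0)  # keep the vector bounded (regularization)

# Step 3: explain the detected information by its nearest vocabulary word.
nearest = vocab[int(np.argmax(emb @ x))]
```

With a window-1 convolution the optimized vector simply converges to the direction of the chosen filter, so the nearest-word readout reduces to the word whose embedding best aligns with that filter; real sentence classifiers need wider windows and the learned regularizers the text describes.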
The structural information in the adjacency matrices is important and should be taken into account when providing explanations. Thus, methods developed for images and texts cannot be directly applied to graph data.

publication date

  • August 2021