Sui, Xiaopeng (2018-12). Feature Selection for Unsupervised and Supervised Learning. Doctoral Dissertation.

abstract

  • Unsupervised and semi-supervised learning are explored in convex clustering with metric learning, while supervised learning is explored in a novel feature selection method. First, we evaluate the performance of convex clustering against previous clustering formulations. We then implement two metric learning schemes in convex clustering to replace the Euclidean distance used in the original convex clustering formulation. The first scheme uses a full-rank positive definite matrix to characterize a Mahalanobis metric; the second uses a sparse compositional metric, which is a weighted sum of a set of orthonormal rank-1 basis vectors. In experiments on both simulated and real-world data, convex clustering with metric learning, especially with a sparse compositional metric, can outperform standard convex clustering, other methods based on convex clustering, and previously popular clustering algorithms. Second, a novel feature selection method is proposed that uses Chow-Liu tree approximations to estimate Shannon's mutual information. In experimental analysis, this Chow-Liu tree feature selection method outperforms previous feature selection methods when classification accuracy is used as the performance measure.
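
The two metrics named in the abstract can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the choice of the standard basis, and the weights are all hypothetical, and the compositional metric is assumed here to take the form M = sum_k w_k b_k b_k^T with nonnegative weights w_k. A weight of zero drops the corresponding direction from the distance, which is the sense in which such a metric can be sparse.

```python
import numpy as np


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a PSD matrix M."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(d @ M @ d)


def compositional_metric(basis, weights):
    """Assumed form: M = sum_k w_k * b_k b_k^T.

    basis   : (d, K) array whose columns b_k are orthonormal vectors
    weights : (K,) array of nonnegative weights; zeros make the metric sparse
    """
    basis = np.asarray(basis, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative so that M stays PSD")
    # Scaling column k by w_k and multiplying by basis^T sums the rank-1 terms.
    return (basis * weights) @ basis.T


# Hypothetical example: standard basis in R^3, with the second direction switched off.
basis = np.eye(3)
weights = np.array([2.0, 0.0, 0.5])
M_sparse = compositional_metric(basis, weights)

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0])

d_euclid = mahalanobis_sq(x, y, np.eye(3))  # identity matrix recovers squared Euclidean distance
d_sparse = mahalanobis_sq(x, y, M_sparse)   # ignores the second coordinate entirely
```

With the identity matrix the function reduces to the squared Euclidean distance used in the original convex clustering formulation; replacing it with a learned matrix changes which directions in feature space count toward the distance between points.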

publication date

  • December 2018