Hoang, Duc (2021-04). 3M-POSE: MULTI-RESOLUTION, MULTI-PATH AND MULTI-OUTPUT NEURAL ARCHITECTURE SEARCH FOR BOTTOM-UP POSE PREDICTION. Master's Thesis.

abstract

  • Human pose estimation is a challenging computer vision task that often hinges on carefully handcrafted architectures. This paper aims to be the first to apply Neural Architecture Search (NAS) to automatically design a bottom-up, one-stage human pose estimation model with significantly lower computational cost and smaller model size than existing bottom-up approaches. Our framework, dubbed 3M-Pose, co-searches and co-trains with a novel building block, Early Escape Layers (EELs), producing natively modular architectures that are optimized to support dynamic inference for even lower average computational cost. To flexibly explore the fine-grained trade-off between performance and computational budget, we propose Dynamic Ensemble Gumbel Softmax (Dyn-EGS), a novel approach to sampling the micro and macro search spaces that allows a varying number of operators and inputs to be selected individually for each cell. We additionally enforce a computational constraint with student-teacher guidance to avoid the trivial search collapse caused by the pursuit of lightweight models. Experiments demonstrate that 3M-Pose finds models of drastically superior speed and efficiency compared to existing works, reducing computational cost by up to 93% and parameter size by up to 75% at the cost of a minor loss in performance.

publication date

  • April 2021