Applicability Study of Deep Learning-Based Human Action Recognition Models

  • Chuan Dai

Student thesis: Doctoral Thesis

Abstract

Deep learning-based methods for human action recognition (HAR) have been applied in various domains such as intelligent surveillance, autonomous driving, and anomaly detection. As research achievements are transferred into applications, a series of challenges urgently need to be addressed, including computational efficiency, scalability, and robustness. This research proposes improved methods and optimised models, with the ultimate goal of facilitating the transition of such research outcomes into applications. The key ideas developed in this thesis are: 1) avoiding the use of large-scale pre-trained models; 2) reducing neural network scale and exploring modelling methods adaptable to tasks; and 3) enabling self-supervised contrastive representation learning. The first line of investigation argues that the pattern of using large-scale pre-trained models for feature extraction followed by fine-tuning has inherent flaws. These weaknesses include enormous model size, high demands on target applications or devices, and high model deployment costs. Instead, this thesis proposes methods to improve model performance for HAR applications by optimising model design and data processing, without using pre-trained models. The second line of investigation suggests that effectively managing the scale of neural networks is a fundamental prerequisite for controlling the size of the generated models, which in turn is a critical condition for applying research achievements. Therefore, this thesis proposes the use of sparsification techniques to reduce the size of deep learning models. Two types of enhanced algorithms for network sparsification are proposed. Furthermore, this research proposes modelling methods relevant to training tasks under the premise of network sparsification. The third line of investigation highlights that, in the modelling process, utilising massive amounts of unlabelled real-world data is essential for continuously improving model robustness. Therefore, this research proposes a framework based on self-supervised contrastive representation learning, while employing a simple neural network architecture to effectively control network performance.
Date of Award: 5 Aug 2024
Original language: English
Supervisor: Zhijie Xu (Main Supervisor) & Minsi Chen (Co-Supervisor)
