An Investigation of Machine Learning-Augmented Vision Systems for Human Action Understanding

  • Chaolong Zhang

Student thesis: Doctoral Thesis


Recognising and understanding the complex visual world is the ultimate goal of intelligent vision systems. Computer vision and artificial intelligence have long been research hotspots, yielding a growing number of major discoveries and breakthroughs. Human action understanding is one of the crucial topics in the field because of its potential value to both academia and industry. Steep challenges remain, however. Video events have semantically implicit and ambiguous definitions, and streamed video carries inherent signal complexity because target occlusion and varying illumination conditions degrade it.
This research investigates classic strategies and techniques for addressing these critical challenges of human action understanding and proposes an innovative machine learning-augmented analytical framework for visual behaviour understanding. The corresponding operational pipeline first integrates the discrete wavelet transform (DWT) into the dense trajectory model to extract more discriminative human action features. End-to-end multimodal neural networks are then deployed for automatic feature learning and action classification. Performance is further enhanced by an efficient two-stream aggregation network that adopts optical flow-guided features and spatial-temporal fusion blocks in cascaded spatial and temporal spaces.
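The wavelet step can be pictured with a minimal sketch. The abstract does not specify how the DWT is combined with dense trajectories, so the following assumes, purely for illustration, that a single-level orthonormal Haar DWT is applied recursively to the normalised x and y displacement sequences of one tracked trajectory, in the spirit of the dense trajectory shape descriptor. The function names `haar_dwt` and `trajectory_wavelet_descriptor` and the `levels` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Single-level orthonormal Haar DWT.

    Odd-length input is padded by repeating its last sample (an assumption).
    Returns (approximation, detail) coefficient lists.
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def trajectory_wavelet_descriptor(
    points: Sequence[Tuple[float, float]], levels: int = 2
) -> List[float]:
    """Hypothetical wavelet descriptor for one trajectory of (x, y) positions."""
    # Frame-to-frame displacement vectors along the trajectory.
    dx = [b[0] - a[0] for a, b in zip(points, points[1:])]
    dy = [b[1] - a[1] for a, b in zip(points, points[1:])]
    # Normalise by the summed displacement magnitude, as in the
    # dense trajectory shape descriptor; guard against a static point.
    total = sum(math.hypot(u, v) for u, v in zip(dx, dy)) or 1.0
    dx = [u / total for u in dx]
    dy = [v / total for v in dy]

    features: List[float] = []
    for seq in (dx, dy):
        approx = seq
        # Keep the detail coefficients from every level, then the
        # coarsest approximation, for each coordinate sequence.
        for _ in range(levels):
            approx, detail = haar_dwt(approx)
            features.extend(detail)
        features.extend(approx)
    return features
```

For a 15-frame trajectory (16 tracked points), two decomposition levels give 8 + 4 detail and 4 approximation coefficients per coordinate, i.e. a 32-dimensional vector. Detail coefficients are zero for perfectly uniform motion, so they respond only to changes in velocity, which is one plausible reason to add wavelet coefficients to raw displacement features.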
This research also addresses the context-bias problem, which has long hampered the deep-learning community's efforts to achieve good generalisation. A long-short-term motion encoding scheme is presented that interprets human actions through the semantic meaning embedded in pose skeletons. By introducing the Euclidean and Additive Angular Margin Loss, the scheme greatly alleviates the open-set action recognition problem.
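For reference, the additive angular margin term can be sketched as follows. This is the standard ArcFace-style formulation. It adds a margin `m` to the angle between an L2-normalised embedding and its target class weight, then applies a scaled softmax cross-entropy. It is not necessarily the exact variant used in the thesis, and the Euclidean component is omitted. The function names and the defaults `s=64.0` and `m=0.5` are assumptions.

```python
import math
from typing import List, Sequence


def l2_normalise(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [a / n for a in v]


def additive_angular_margin_loss(
    embedding: Sequence[float],
    class_weights: Sequence[Sequence[float]],
    label: int,
    s: float = 64.0,
    m: float = 0.5,
) -> float:
    """Cross-entropy on scaled cosine logits with margin m added to the target angle."""
    x = l2_normalise(embedding)
    logits: List[float] = []
    for j, w in enumerate(class_weights):
        # Cosine similarity between unit embedding and unit class weight,
        # clamped so acos stays in its domain despite rounding error.
        cos = sum(a * b for a, b in zip(x, l2_normalise(w)))
        cos = max(-1.0, min(1.0, cos))
        if j == label:
            theta = math.acos(cos)
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * cos)
    # Numerically stable log-softmax: loss = logsumexp(logits) - logit[label].
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]
```

Setting `m=0` reduces this to a normalised softmax cross-entropy. A positive margin penalises the target class unless the embedding lies angularly closer to its class weight, which tightens class clusters. That tighter clustering is the usual motivation for margin losses when unknown classes must be rejected. This simplified version does not guard against `theta + m` exceeding pi, which full implementations typically handle.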
To facilitate the real-world implementation of the devised human action understanding models and techniques, the state of the art and future trends in edge computing have also been explored. Experiments demonstrate the viability and effectiveness of generating transferable models in an open format for rapid, large-scale deployment in live settings.
Date of Award: 11 Aug 2022
Original language: English
Supervisor: Lee McCluskey (Main Supervisor), Zhijie Xu (Co-Supervisor) & David Peebles (Co-Supervisor)
