Abstract
Video understanding represents a crucial research direction in smart surveillance. With the advancement of artificial intelligence technology, especially machine (deep) learning, autonomous and efficient computer models and frameworks are now capable of detecting individuals and comprehending their behaviours in videos, and this capability has become a research hot spot.
This thesis explores four primary areas, structured around the workflow of video comprehension: starting with low-level object recognition, advancing to mid-level object tracking, and culminating in high-level behaviour detection and estimation. Through extensive literature review and experimentation, major progress has been achieved in these aspects:
A bi-directional feature pyramid model and an extra output head for small object sizes have been integrated into regression CNNs (convolutional neural networks), significantly improving the detection of tiny objects, such as in face detection tasks. Furthermore, a novel integration of BoTNet with the DeepSort-FastReid technique has advanced pedestrian tracking. Additionally, the incorporation of attention mechanisms has markedly enhanced the detection of distracted driving behaviour. Lastly, an enhanced self-attention framework that combines spatial and frequency domain features has proven effective in the 3D skeleton-based behaviour estimation task, demonstrating the potential for further research and application in this domain.
Experimental results show that this theoretical framework for video understanding, along with its corresponding models and techniques, outperforms existing benchmarks and approaches and delivers robust performance.
Date of Award | 19 Jul 2024 |
---|---|
Original language | English |
Supervisor | Zhijie Xu (Main Supervisor) & Minsi Chen (Co-Supervisor) |