TY - JOUR
T1 - From “Mirror Flower, Water Moon” to Multi-Task Visual Prospective Representation Learning for Unmanned Aerial Vehicles Indoor Mapless Navigation
AU - Chang, Yingxiu
AU - Cheng, Yongqiang
AU - Murray, John
AU - Khalid, Muhammad
AU - Manzoor, Umar
N1 - Funding Information:
The authors acknowledge the assistance from the team member, Kang Xiang and Chi Jen Li, for data collection and video recording. The work has been supported in part by China Scholarship Council (202008010003).
Publisher Copyright:
© 2025 The Author(s). Journal of Field Robotics published by Wiley Periodicals LLC.
PY - 2025/9/1
Y1 - 2025/9/1
N2 - Vision-based deep learning models have been widely adopted in autonomous agents such as unmanned aerial vehicles (UAVs), particularly in reactive control policies that serve as a key component of navigation systems. These policies enable agents to respond instantaneously to dynamic environments without relying on pre-existing maps. However, open challenges remain in improving an agent's reactive control performance: (1) Is it possible, and how, to anticipate future states at the current moment to improve control precision? (2) Is it possible, and how, to anticipate future states for different sub-tasks when the agent's control consists of both discrete classification and continuous regression commands? Inspired by the Chinese idiom “Mirror Flower, Water Moon,” this paper hypothesizes that future states in the latent space can be learnt from sequential images using contrastive learning, and consequently proposes a lightweight Multi-Task Visual Prospective Representation Learning (MulVPRL) framework to benefit reactive control. Specifically, (1) this paper leverages contrastive learning to correlate the representations obtained from the latest sequential images with the representation of one future image; (2) this paper constructs an integrated contrastive-learning loss function for the classification and regression sub-tasks. The MulVPRL framework outperforms the benchmark models on the public HDIN and DroNet datasets and achieves the best performance in real-world experiments ((Formula presented.) SOTA (Formula presented.)). Therefore, the multi-task contrastive learning of the lightweight MulVPRL framework enhances reactive control performance on a 2D plane, and demonstrates the potential to be integrated with various intelligent strategies and implemented on ground vehicles.
KW - contrastive learning
KW - indoor unknown environment
KW - mapless navigation
KW - prospective classification-aware representation
KW - prospective regression-aware representation
KW - UAV
KW - visual prospective representation learning (VPRL)
UR - https://www.scopus.com/pages/publications/105014757186
U2 - 10.1002/rob.70057
DO - 10.1002/rob.70057
M3 - Article
AN - SCOPUS:105014757186
SN - 1556-4959
JO - Journal of Field Robotics
JF - Journal of Field Robotics
ER -