From “Mirror Flower, Water Moon” to Multi-Task Visual Prospective Representation Learning for Unmanned Aerial Vehicles Indoor Mapless Navigation

Yingxiu Chang, Yongqiang Cheng, John Murray, Muhammad Khalid, Umar Manzoor

Research output: Contribution to journal › Article › peer-review

Abstract

Vision-based deep learning models have been widely adopted in autonomous agents, such as unmanned aerial vehicles (UAVs), particularly in reactive control policies that serve as a key component of navigation systems. These policies enable agents to respond instantaneously to dynamic environments without relying on pre-existing maps. However, open challenges remain in improving the agent's reactive control performance: (1) is it possible, and if so how, to anticipate future states at the current moment to benefit control precision? (2) is it possible, and if so how, to anticipate future states for different sub-tasks when the agent's control consists of both discrete classification and continuous regression commands? Inspired by the Chinese idiom “Mirror Flower, Water Moon,” this paper hypothesizes that future states in the latent space can be learnt from sequential images using contrastive learning, and consequently proposes a lightweight Multi-task Visual Prospective Representation Learning (MulVPRL) framework to benefit reactive control. Specifically, (1) the paper leverages contrastive learning to correlate the representations obtained from the latest sequential images with that of one future image; (2) the paper constructs an integrated contrastive-learning loss function for the classification and regression sub-tasks. The MulVPRL framework outperforms the benchmark models on the public HDIN and DroNet datasets and achieves the best performance in real-world experiments ((Formula presented.) SOTA (Formula presented.)). Therefore, the multi-task contrastive learning of the lightweight MulVPRL framework enhances reactive control performance on a 2D plane and demonstrates the potential to be integrated with various intelligent strategies and implemented on ground vehicles.
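The abstract describes correlating representations of recent frames with that of one future frame via contrastive learning, and combining contrastive terms for the classification and regression sub-tasks into one integrated loss. The paper's actual formulation is not reproduced here; the following is a minimal dependency-free sketch assuming an InfoNCE-style contrastive objective and hypothetical task weights `w_cls` and `w_reg` (all names are illustrative, not from the paper).

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(query, positive, negatives, tau=0.1):
    """InfoNCE-style contrastive loss (a common choice; the paper's exact
    objective may differ). Pulls the representation of the recent frame
    sequence (query) toward the future-frame embedding (positive) and
    pushes it away from embeddings of unrelated frames (negatives)."""
    logits = [cosine(query, positive) / tau]
    logits += [cosine(query, n) / tau for n in negatives]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m) + math.log(denom)

def multitask_loss(l_con_cls, l_con_reg, w_cls=1.0, w_reg=1.0):
    """Hypothetical integration of the contrastive terms for the discrete
    classification and continuous regression sub-tasks into one loss."""
    return w_cls * l_con_cls + w_reg * l_con_reg
```

As a sanity check, a query embedding aligned with its future-frame positive yields a much lower loss than one aligned with a negative, which is the behaviour the prospective representation learning relies on.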

Original language: English
Number of pages: 26
Journal: Journal of Field Robotics
Early online date: 1 Sept 2025
DOIs
Publication status: E-pub ahead of print - 1 Sept 2025
Externally published: Yes

