Localization is a crucial problem in assistive technology for visually impaired people. In this paper, we propose a novel hierarchical visual localization pipeline for a wearable assistive navigation device. The pipeline comprises a deep descriptor network, 2D-3D geometric verification, and online sequence matching. Images in different modalities (RGB, infrared, and depth) are fed into the Dual Desc network to generate robust attentive global descriptors and local features. The global descriptors are used to retrieve coarse candidates for each query image. The 2D local features, together with a sparse 3D point cloud, are used in geometric verification to select the best result from the retrieved candidates. Finally, sequence matching makes the localization more robust by combining the verified results of successive frames. The proposed unified descriptor network, Dual Desc, surpasses the state-of-the-art NetVLAD and its variant on the task of image description. Validated on a real-world dataset captured by the wearable assistive device, the proposed method uses multimodal images to overcome the disadvantages of RGB images alone, and improves localization robustness through the deep descriptor network and the hierarchical pipeline. In the challenging scenarios of the Yuquan dataset, the proposed method achieves an F1 score of 0.77 and a mean localization error of 2.75, which is satisfactory for practical use.
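The three stages described above (coarse retrieval, verification, sequence matching) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine-similarity retrieval, the use of precomputed inlier counts as a stand-in for 2D-3D geometric verification, and the sliding median used for sequence matching are all assumptions made for illustration. It further assumes that database images are indexed in order along the route, so that neighboring indices correspond to neighboring places.

```python
import math
from statistics import median_low


def cosine(a, b):
    """Cosine similarity between two non-zero descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_candidates(query_desc, db_descs, k=3):
    """Coarse stage: indices of the k database descriptors most similar to the query."""
    sims = [cosine(query_desc, d) for d in db_descs]
    return sorted(range(len(db_descs)), key=lambda i: -sims[i])[:k]


def verify(candidates, inlier_counts):
    """Fine stage (hypothetical stand-in for geometric verification):
    keep the candidate with the most inliers. inlier_counts[i] is assumed
    to be the inlier count obtained for candidates[i]."""
    best = max(range(len(candidates)), key=lambda i: inlier_counts[i])
    return candidates[best]


def smooth_sequence(frame_matches, window=3):
    """Sequence stage (assumed form): sliding median over the verified matches
    of successive frames, with edge padding. window is assumed to be odd."""
    half = window // 2
    padded = (
        [frame_matches[0]] * half
        + list(frame_matches)
        + [frame_matches[-1]] * half
    )
    return [median_low(padded[i:i + window]) for i in range(len(frame_matches))]


if __name__ == "__main__":
    # Toy database of four one-hot "global descriptors".
    db = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    cands = retrieve_candidates([0.1, 0.9, 0.2, 0.0], db, k=2)
    print(cands, verify(cands, [30, 5]))
    # Frame 3 is an outlier match (index 9) that the median suppresses.
    print(smooth_sequence([1, 2, 9, 3, 4]))
```

In this toy run, the isolated match to index 9 is replaced by a value consistent with its neighbors, which is the kind of correction that combining successive frames is meant to provide.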