1) Firstly, Semantic-aware Iterative Segmentation (SIS) is proposed to extract unsupervised representations of multi-view images, which are converted into 2D pedestrian masks as pseudo labels, via our proposed iterative PCA and zero-shot semantic classes from vision-language models.
Firstly, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns both fully-supervised pedestrian detection and contextual segmentation via self-generated explicit labels of semantic classes by vision-language models.
Ranked #5 on Pedestrian Detection on Caltech
We develop two simple yet effective plug and play methods to improve the performance of existing local and non-local propagation-based VSR algorithms on widely-used public videos.
2) Secondly, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model.
no code implementations • 13 Apr 2022 • Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, Jinhai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, Jingsong Zhu, Dong Hu, Zijie Fang, Yang Chen, Yongbing Zhang, Yi Li, Yiwen Zou, Yiduo Yu, Xiaomeng Li, Haiming Li, Yanfen Cui, Guoqiang Han, Yan Xu, Jun Xu, Huihua Yang, Chunming Li, Zhenbing Liu, Cheng Lu, Xin Chen, Changhong Liang, Qingling Zhang, Zaiyi Liu
According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS.
Instead of estimating alignment information, we propose a simple and effective deep Recurrent Neural Network with Multi-scale Bi-directional Propagation (RNN-MBP) to effectively propagate and gather the information from unaligned neighboring frames for better video deblurring.
Ranked #11 on Deblurring on GoPro
The repeated cross-correlation and semi-FPN are designed based on this idea.
On the other hand, the video dehazing algorithms, which can acquire more satisfying dehazing results by exploiting the temporal redundancy from neighborhood hazy frames, receive less attention due to the absence of the video dehazing datasets.
These two branches are trained in a mutual-supervised way with full body annotations and visible body annotations, respectively.
However, automatic stem-leaf segmentation of maize shoots from three-dimensional (3D) point clouds remains challenging, especially for new emerging leaves that are very close and wrapped together during the seedling stage.
no code implementations • 10 Oct 2019 • Manzhang Xu, Bijun Tang, Yuhao Lu, Chao Zhu, Lu Zheng, Jingyu Zhang, Nannan Han, Yuxi Guo, Jun Di, Pin Song, Yongmin He, Lixing Kang, Zhiyong Zhang, Wu Zhao, Cuntai Guan, Xuewen Wang, Zheng Liu
Reducing the lateral scale of two-dimensional (2D) materials to one-dimensional (1D) has attracted substantial research interest not only to achieve competitive electronic device applications but also for the exploration of fundamental physical properties.
Specifically, we design a generalized cross-entropy loss for the training of the proposed framework to fully exploit the semantic priors via considering the relevance between adjacent levels and enlarge the distance between samples of different coarse classes.
To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes.