no code implementations • 30 May 2022 • Xiaosong Zhang, Yunjie Tian, Wei Huang, Qixiang Ye, Qi Dai, Lingxi Xie, Qi Tian
A key idea for efficient implementation is to discard the masked image patches (or tokens) throughout the target network (encoder), which requires the encoder to be a plain vision transformer (e.g., ViT), although hierarchical vision transformers (e.g., Swin Transformer) have potentially better properties in formulating vision inputs.
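As a rough illustration of the token-discarding idea in the abstract above, the sketch below drops a random subset of patch tokens so that only the visible ones would reach an encoder. It is not the authors' implementation: the helper name `keep_visible_tokens`, the 0.75 mask ratio, and the 196-token grid are assumptions chosen for illustration only.

```python
import random


def keep_visible_tokens(tokens, mask_ratio, seed=0):
    """Randomly mask a fraction of tokens and return only the visible ones.

    Hypothetical helper: returns (visible_tokens, kept_indices), where
    kept_indices records each visible token's original position.
    """
    rng = random.Random(seed)
    n = len(tokens)
    num_keep = n - int(n * mask_ratio)
    # Sample the positions to keep, then restore their original order.
    kept = sorted(rng.sample(range(n), num_keep))
    return [tokens[i] for i in kept], kept


# Assumed setup: 14 x 14 = 196 patch tokens (a 224 x 224 image, 16 x 16 patches).
tokens = [f"patch_{i}" for i in range(196)]
visible, kept = keep_visible_tokens(tokens, mask_ratio=0.75)
print(len(visible))  # 49 tokens remain for the encoder
```

With a plain ViT-style encoder the shortened sequence can be processed directly, since each token is handled independently of any spatial grid; a hierarchical design that merges neighbouring patches would need the full grid, which is the tension the abstract points to.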
1 code implementation • 27 Mar 2022 • Yunjie Tian, Lingxi Xie, Jiemin Fang, Mengnan Shi, Junran Peng, Xiaopeng Zhang, Jianbin Jiao, Qi Tian, Qixiang Ye
The past year has witnessed a rapid development of masked image modeling (MIM).
no code implementations • CVPR 2022 • Weixi Zhao, Weiqiang Wang, Yunjie Tian
In 2D-to-3D pose estimation, it is important to exploit the spatial constraints of 2D joints, but these constraints are not yet well modeled.
no code implementations • 5 Dec 2021 • Yunjie Tian, Lingxi Xie, Jiemin Fang, Jianbin Jiao, Qixiang Ye, Qi Tian
In this paper, we build the search algorithm upon a complicated search space with long-distance connections, and show that existing weight-sharing search algorithms mostly fail due to the existence of interleaved connections.
1 code implementation • 25 Nov 2021 • Yunjie Tian, Lingxi Xie, Xiaopeng Zhang, Jiemin Fang, Haohang Xu, Wei Huang, Jianbin Jiao, Qi Tian, Qixiang Ye
In this paper, we propose a self-supervised visual representation learning approach which involves both generative and discriminative proxies, where we focus on the former part by requiring the target network to recover the original image based on the mid-level features.
Ranked #56 on Semantic Segmentation on Cityscapes test (using extra training data)
2 code implementations • 17 Sep 2021 • Weixi Zhao, Yunjie Tian, Qixiang Ye, Jianbin Jiao, Weiqiang Wang
Exploiting relations among 2D joints plays a crucial role in 2D-to-3D pose estimation, yet remains underdeveloped.
1 code implementation • 8 Nov 2020 • Chang Liu, Yunjie Tian, Jianbin Jiao, Qixiang Ye
Conventional networks for object skeleton detection are usually hand-crafted.
1 code implementation • 7 Jul 2020 • Yunjie Tian, Chang Liu, Lingxi Xie, Jianbin Jiao, Qixiang Ye
The search cost of neural architecture search (NAS) has been largely reduced by weight-sharing methods.