no code implementations • 27 Nov 2023 • Zhiyang Chen, Yousong Zhu, Yufei Zhan, Zhaowen Li, Chaoyang Zhao, Jinqiao Wang, Ming Tang
Large vision-language models (LVLMs) suffer from hallucination a lot, generating responses that apparently contradict to the image content occasionally.
no code implementations • 10 Apr 2023 • Zhaowen Li, Xu Zhao, Peigeng Ding, Zongxin Gao, Yuting Yang, Ming Tang, Jinqiao Wang
In the high-frequency branch, a derivative-filter-like architecture is designed to extract the high-frequency information while a light extractor is employed in the low-frequency branch because the low-frequency information is usually redundant.
no code implementations • 28 Feb 2023 • Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
However, its high random mask ratio would result in two serious problems: 1) the data are not efficiently exploited, which brings inefficient pre-training (\eg, 1600 epochs for MAE $vs.$ 300 epochs for the supervised), and 2) the high uncertainty and inconsistency of the pre-trained model, \ie, the prediction of the same patch may be inconsistent under different mask rounds.
2 code implementations • 28 Sep 2022 • Zhiyang Chen, Yousong Zhu, Zhaowen Li, Fan Yang, Wei Li, Haixin Wang, Chaoyang Zhao, Liwei Wu, Rui Zhao, Jinqiao Wang, Ming Tang
Obj2Seq is able to flexibly determine input categories to satisfy customized requirements, and be easily extended to different visual tasks.
no code implementations • 31 Aug 2022 • Zhaowen Li, Xu Zhao, Chaoyang Zhao, Ming Tang, Jinqiao Wang
Previous unsupervised domain adaptation methods did not handle the cross-domain problem from the perspective of frequency for computer vision.
no code implementations • CVPR 2022 • Zhaowen Li, Yousong Zhu, Fan Yang, Wei Li, Chaoyang Zhao, Yingying Chen, Zhiyang Chen, Jiahao Xie, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
Furthermore, our method can also exploit single-centric-object dataset such as ImageNet and outperforms BYOL by 2. 5% with the same pre-training epochs in linear probing, and surpass current self-supervised object detection methods on COCO dataset, demonstrating its universality and potential.
no code implementations • NeurIPS 2021 • Zhaowen Li, Zhiyang Chen, Fan Yang, Wei Li, Yousong Zhu, Chaoyang Zhao, Rui Deng, Liwei Wu, Rui Zhao, Ming Tang, Jinqiao Wang
More importantly, the masked tokens together with the remaining tokens are further recovered by a global image decoder, which preserves the spatial information of the image and is more friendly to the downstream dense prediction tasks.