1 code implementation • 23 Aug 2024 • Qi Qian, Yuanhong Xu, Juhua Hu
To unleash the potential of fixed deep features, we propose a novel semantic adversarial augmentation (SeA) in the feature space for optimization.
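A minimal sketch of the feature-space adversarial idea, assuming a frozen encoder whose fixed features feed a linear classifier; the FGSM-style step and all names here are illustrative, not the paper's exact SeA objective:

```python
import torch
import torch.nn.functional as F

def adversarial_feature_augment(features, labels, classifier, eps=0.1):
    """Perturb fixed backbone features in the direction that increases the
    classification loss (illustrative FGSM-style step, not the exact SeA
    formulation)."""
    features = features.detach().requires_grad_(True)
    loss = F.cross_entropy(classifier(features), labels)
    grad, = torch.autograd.grad(loss, features)
    # Step along the loss-ascending direction, normalized per sample.
    return (features + eps * grad / (grad.norm(dim=1, keepdim=True) + 1e-12)).detach()

# Usage: `feats` stands in for features from a frozen pre-trained encoder.
classifier = torch.nn.Linear(512, 100)
feats, labels = torch.randn(32, 512), torch.randint(0, 100, (32,))
augmented = adversarial_feature_augment(feats, labels, classifier)
loss = F.cross_entropy(classifier(augmented), labels)  # train on augmented features
```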
1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench (Question Pair Acc metric)
Visual Question Answering (VQA) • Zero-Shot Video Question Answering
1 code implementation • ICCV 2023 • Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples.
4 code implementations • 1 Feb 2023 • Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.
Ranked #1 on Video Captioning on MSR-VTT
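A schematic of the multi-module composition idea, assuming disentangled per-modality encoders plus one universal module reused across modalities; the class and layer sizes are hypothetical, not mPLUG-2's actual architecture:

```python
import torch
import torch.nn as nn

class ModularComposition(nn.Module):
    """Sketch of a multi-module composition network: each modality keeps its
    own encoder (disentangled), while a universal module is shared across
    modalities for collaboration. Hypothetical sizes, not mPLUG-2's code."""
    def __init__(self, dim=256):
        super().__init__()
        self.text_encoder = nn.Embedding(30000, dim)   # modality-specific
        self.vision_encoder = nn.Linear(768, dim)      # modality-specific
        self.universal = nn.TransformerEncoder(        # shared universal module
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, text_ids, image_patches):
        t = self.universal(self.text_encoder(text_ids))        # text path
        v = self.universal(self.vision_encoder(image_patches)) # reuses same module
        return t, v

model = ModularComposition()
t, v = model(torch.randint(0, 30000, (2, 16)), torch.randn(2, 49, 768))
```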
no code implementations • 25 May 2022 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan
With our empirical results obtained from 1,330 models, we provide the following main observations: 1) ERM combined with data augmentation can achieve state-of-the-art performance if we choose a proper pre-trained model respecting the data property; 2) specialized algorithms further improve robustness on top of ERM when handling a specific type of distribution shift, e.g., GroupDRO for spurious correlation and CORAL for large-scale out-of-distribution data; 3) comparing different pre-training modes, architectures, and data sizes, we provide novel observations about pre-training under distribution shift, which sheds light on designing or selecting a pre-training strategy for different kinds of distribution shifts.
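For reference, the CORAL objective mentioned above aligns the second-order statistics (covariances) of features from two domains; a minimal sketch of the Deep CORAL loss, not the benchmark's own implementation:

```python
import torch

def coral_loss(source, target):
    """Deep CORAL: squared Frobenius distance between the feature covariance
    matrices of two domains, scaled by 1 / (4 d^2)."""
    def covariance(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)
    d = source.size(1)
    return ((covariance(source) - covariance(target)) ** 2).sum() / (4 * d * d)

# Usage: features from an in-distribution batch and a shifted batch.
src, tgt = torch.randn(64, 128), torch.randn(64, 128)
loss = coral_loss(src, tgt)
```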
no code implementations • 24 Nov 2021 • Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Xiangyang Ji, Antoni Chan, Rong Jin
The generalization result shows that the excess risk bound on a target task can be improved when appropriate pre-training data is included in fine-tuning.
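A hedged sketch of that mixing idea, assuming a supervised pre-training set whose labels pass through the same head; the weight `lam` and the single shared head are illustrative simplifications, not the paper's method:

```python
import torch
import torch.nn.functional as F

def joint_finetune_step(model, target_batch, pretrain_batch, optimizer, lam=0.5):
    """One fine-tuning step that mixes a target-task batch with a batch of
    pre-training data. `lam` is an illustrative choice, not from the paper."""
    (xt, yt), (xp, yp) = target_batch, pretrain_batch
    loss = F.cross_entropy(model(xt), yt) + lam * F.cross_entropy(model(xp), yp)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(64, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
tb = (torch.randn(16, 64), torch.randint(0, 10, (16,)))  # target batch
pb = (torch.randn(16, 64), torch.randint(0, 10, (16,)))  # pre-training batch
joint_finetune_step(model, tb, pb, opt)
```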
1 code implementation • CVPR 2022 • Qi Qian, Yuanhong Xu, Juhua Hu, Hao Li, Rong Jin
Clustering assigns each instance a pseudo label, which is then used to learn representations by discrimination.
Ranked #6 on Unsupervised Image Classification on CIFAR-10
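A minimal offline sketch of the pseudo-label pipeline: cluster embeddings with k-means, then discriminate the resulting assignments with cross entropy; the paper's method is an online, constrained clustering variant, so this is illustrative only:

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def pseudo_label_step(embeddings, classifier, k=10):
    """Assign each instance a pseudo label by k-means, then learn
    representations by discriminating those labels (offline sketch)."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(
        embeddings.detach().cpu().numpy())
    labels = torch.as_tensor(labels, dtype=torch.long)
    return F.cross_entropy(classifier(embeddings), labels)

embeddings = torch.randn(256, 64, requires_grad=True)
classifier = torch.nn.Linear(64, 10)
loss = pseudo_label_step(embeddings, classifier)
loss.backward()  # gradients flow to both the classifier and the embeddings
```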
no code implementations • 20 Jun 2020 • Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin
Label smoothing regularization (LSR) has achieved great success in training deep neural networks with stochastic algorithms such as stochastic gradient descent and its variants.
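LSR replaces the one-hot target with a mixture of the one-hot vector and the uniform distribution, y_smooth = (1 - eps) * one_hot(y) + eps / K; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """Cross entropy against smoothed targets:
    y_smooth = (1 - eps) * one_hot(y) + eps / K."""
    k = logits.size(1)
    log_probs = F.log_softmax(logits, dim=1)
    smooth = torch.full_like(log_probs, eps / k)
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps + eps / k)
    return -(smooth * log_probs).sum(dim=1).mean()

logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
print(label_smoothing_loss(logits, targets))
```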
1 code implementation • ICCV 2021 • Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu
To mitigate this challenge, we propose an algorithm to learn the fine-grained patterns for the target task, when only its coarse-class labels are available.
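One hedged way to realize this is to pair coarse-label cross entropy with an instance-discrimination term over two augmented views, so fine-grained structure within each coarse class is preserved; the combination, the one-direction InfoNCE, and the weight `lam` are assumptions, not the paper's exact algorithm:

```python
import torch
import torch.nn.functional as F

def coarse_plus_instance_loss(z1, z2, coarse_logits, coarse_labels,
                              temp=0.1, lam=1.0):
    """Coarse-label cross entropy plus an instance-level contrastive term
    over two augmented views of the same images (illustrative only)."""
    ce = F.cross_entropy(coarse_logits, coarse_labels)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temp          # matching views sit on the diagonal
    targets = torch.arange(z1.size(0))
    return ce + lam * F.cross_entropy(logits, targets)

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)        # two views
coarse_logits = torch.randn(32, 20)
coarse_labels = torch.randint(0, 20, (32,))
loss = coarse_plus_instance_loss(z1, z2, coarse_logits, coarse_labels)
```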
no code implementations • 6 Jul 2018 • Yuanhong Xu, Pei Dong, Junyu Dong, Lin Qi
Obtaining dense 3D reconstruction at low computational cost is an important goal in the field of SLAM.