2 code implementations • 15 Mar 2022 • Guanyu Cai, Yixiao Ge, Binjie Zhang, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, XiaoHu Qie, Jianping Wu, Mike Zheng Shou
Recent dominant methods for video-language pre-training (VLP) learn transferable representations from the raw pixels in an end-to-end manner to achieve advanced performance on downstream video-language retrieval.
1 code implementation • CVPR 2023 • Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, XiaoHu Qie, Mike Zheng Shou
In this work, we introduce, for the first time, an end-to-end video-language model, the all-in-one Transformer, that embeds raw video and textual signals into joint representations using a unified backbone architecture (see the sketch below).
Ranked #6 on TGIF-Transition on TGIF-QA (using extra training data)
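The following is a minimal, hypothetical sketch of the unified-backbone idea: video patches and text tokens are embedded into one sequence and processed by a single shared Transformer encoder. The class name, layer sizes, and the omission of positional embeddings are illustrative simplifications, not the paper's actual architecture.

```python
# Sketch of a unified-backbone video-language encoder (illustrative only).
# Positional embeddings are omitted for brevity.
import torch
import torch.nn as nn

class UnifiedVideoTextEncoder(nn.Module):
    def __init__(self, vocab_size=30522, dim=256, patch=16, depth=4):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        # Project 16x16 pixel patches of each frame to video tokens.
        self.video_embed = nn.Conv3d(3, dim, kernel_size=(1, patch, patch),
                                     stride=(1, patch, patch))
        self.type_embed = nn.Embedding(2, dim)  # 0 = video token, 1 = text token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, video, text_ids):
        # video: (B, 3, T, H, W); text_ids: (B, L)
        v = self.video_embed(video).flatten(2).transpose(1, 2)  # (B, Nv, dim)
        t = self.text_embed(text_ids)                           # (B, L, dim)
        v = v + self.type_embed.weight[0]
        t = t + self.type_embed.weight[1]
        joint = torch.cat([v, t], dim=1)  # one sequence, one shared backbone
        return self.backbone(joint)

enc = UnifiedVideoTextEncoder()
out = enc(torch.randn(2, 3, 4, 64, 64), torch.randint(0, 30522, (2, 12)))
print(out.shape)  # (2, num_video_tokens + 12, 256)
```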
1 code implementation • 2 Dec 2021 • Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs via aligning the semantics between visual and textual information.
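A common instantiation of such video-text alignment is a symmetric contrastive (InfoNCE) objective over paired embeddings. The sketch below shows this standard formulation; the paper's exact pre-training loss may differ.

```python
# Standard symmetric contrastive loss for aligning paired video-text embeddings.
import torch
import torch.nn.functional as F

def video_text_contrastive_loss(video_emb, text_emb, temperature=0.07):
    # video_emb, text_emb: (B, D) embeddings of B matched video-text pairs
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Matched pairs lie on the diagonal; pull them together, push others apart.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.t(), targets)
    return (loss_v2t + loss_t2v) / 2

loss = video_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```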
1 code implementation • CVPR 2022 • Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, XiaoHu Qie, Mike Zheng Shou
In this work, we present Object-aware Transformers, an object-centric approach that extends the video-language transformer to incorporate object representations (see the sketch below).
Ranked #20 on Zero-Shot Video Retrieval on DiDeMo
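One plausible way to realize this object-centric extension, sketched below with assumed shapes and names, is to project per-region detector features and append them as extra tokens to the joint video-text sequence.

```python
# Hypothetical sketch: object features from a detector are projected and
# appended as extra tokens alongside video and text tokens.
import torch
import torch.nn as nn

class ObjectAwareFusion(nn.Module):
    def __init__(self, dim=256, obj_feat_dim=1024, depth=2):
        super().__init__()
        self.obj_proj = nn.Linear(obj_feat_dim, dim)  # detector features -> model dim
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, video_tokens, text_tokens, object_feats):
        # video_tokens: (B, Nv, dim); text_tokens: (B, L, dim)
        # object_feats: (B, K, obj_feat_dim), e.g. K region features per clip
        obj_tokens = self.obj_proj(object_feats)
        joint = torch.cat([video_tokens, obj_tokens, text_tokens], dim=1)
        return self.encoder(joint)

fusion = ObjectAwareFusion()
out = fusion(torch.randn(2, 32, 256), torch.randn(2, 12, 256),
             torch.randn(2, 5, 1024))
```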
no code implementations • 27 May 2021 • Guanyu Cai, Lianghua He
In the first stage, we propose the local Lipschitzness regularization as the objective function to align different domains by exploiting intra-domain knowledge, which explores a promising direction for non-adversarial adaptive semantic segmentation.
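Local Lipschitzness is commonly encouraged by penalizing how much predictions change under small input perturbations; the sketch below illustrates that general idea, not the paper's exact regularizer for segmentation.

```python
# Sketch of a local Lipschitzness penalty: predictions should change little
# under small input perturbations (general idea only).
import torch
import torch.nn.functional as F

def local_lipschitz_penalty(model, x, epsilon=1e-2):
    # x: (B, C, H, W) unlabeled target-domain images
    noise = epsilon * torch.randn_like(x)
    p_clean = F.softmax(model(x), dim=1)
    logp_perturbed = F.log_softmax(model(x + noise), dim=1)
    # Penalize divergence between predictions on clean and perturbed inputs.
    return F.kl_div(logp_perturbed, p_clean, reduction="batchmean")

model = torch.nn.Conv2d(3, 19, kernel_size=1)  # stand-in segmentation head
reg = local_lipschitz_penalty(model, torch.randn(2, 3, 32, 32))
```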
1 code implementation • ICCV 2021 • Guanyu Cai, Jun Zhang, Xinyang Jiang, Yifei Gong, Lianghua He, Fufu Yu, Pai Peng, Xiaowei Guo, Feiyue Huang, Xing Sun
However, the performance of existing methods degrades in practice, since users often provide an incomplete description of an image, which leads to results filled with false positives that fit the incomplete description.
2 code implementations • 8 Jan 2021 • Chenyang Gao, Guanyu Cai, Xinyang Jiang, Feng Zheng, Jun Zhang, Yifei Gong, Pai Peng, Xiaowei Guo, Xing Sun
Second, a BERT with locality-constrained attention is proposed to obtain representations of descriptions at different scales (see the sketch below).
Ranked #15 on Text based Person Retrieval on CUHK-PEDES
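One plausible reading of locality-constrained attention, sketched below with an assumed window size and dimensions, restricts each token to attend only within a local window via an attention mask; the paper's exact mechanism may differ.

```python
# Sketch of locality-constrained self-attention via a band mask.
import torch
import torch.nn as nn

def local_attention_mask(seq_len, window=3):
    # True entries are *blocked* positions (PyTorch attn_mask convention).
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() > window

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
x = torch.randn(2, 20, 256)                # (B, L, D) token embeddings
mask = local_attention_mask(20, window=3)  # (L, L) boolean band mask
out, _ = attn(x, x, x, attn_mask=mask)
```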
1 code implementation • 26 May 2019 • Guanyu Cai, Lianghua He, Mengchu Zhou, Hesham Alhumade, Die Hu
When constructing a deep end-to-end model, to ensure the effectiveness and stability of unsupervised domain adaptation, three critical factors are considered in our proposed optimization strategy, i.e., the sample amount of the target domain, and the dimension and batch size of samples.
Ranked #1 on Domain Adaptation on SVHN-to-MNIST
1 code implementation • 25 Jan 2019 • Haifeng Shi, Guanyu Cai, Yuqin Wang, Shaohua Shang, Lianghua He
All generative paths share the same decoder network; in each path, the decoder is fed the concatenation of a distinct pre-computed amplified one-hot vector and the input Gaussian noise.
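A minimal sketch of the described conditioning follows; the layer sizes and the amplification factor are illustrative assumptions, not the paper's settings.

```python
# Sketch: an amplified one-hot class vector is concatenated with Gaussian
# noise and fed to a shared decoder (sizes and amp factor are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedDecoder(nn.Module):
    def __init__(self, num_classes=10, noise_dim=64, out_dim=784, amp=10.0):
        super().__init__()
        self.num_classes = num_classes
        self.amp = amp
        self.net = nn.Sequential(
            nn.Linear(num_classes + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, labels, noise):
        # labels: (B,) class indices; noise: (B, noise_dim) Gaussian noise
        one_hot = F.one_hot(labels, self.num_classes).float() * self.amp
        return self.net(torch.cat([one_hot, noise], dim=1))

dec = SharedDecoder()
imgs = dec(torch.randint(0, 10, (4,)), torch.randn(4, 64))
```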
no code implementations • 25 Apr 2018 • Guanyu Cai, Yuqin Wang, Mengchu Zhou, Lianghua He
Domain adaptation is widely used in learning problems that lack labeled data.