no code implementations • 30 Sep 2022 • Yizhou Zhao, Zhenyang Li, Xun Guo, Yan Lu
Temporal modeling is crucial for various video learning tasks.
no code implementations • CVPR 2022 • Yizhou Zhao, Xun Guo, Yan Lu
One-shot object detection aims at detecting novel objects according to merely one given instance.
1 code implementation • CVPR 2022 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu
It significantly improves the performance of several classic contrastive learning models in downstream tasks.
no code implementations • 29 Sep 2021 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu
Therefore, we assume the task-relevant information that is not shared between views cannot be ignored and theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation.
no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu
By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.
no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu
Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.
Ranked #23 on Self-Supervised Action Recognition on UCF101
no code implementations • CVPR 2021 • Xudong Guo, Xun Guo, Yan Lu
However, spatial correlations and temporal correlations represent different contextual information of scenes and temporal reasoning.
1 code implementation • 16 Sep 2018 • Yao Zhai, Xun Guo, Yan Lu, Houqiang Li
Recent research on person re-identification has focused on two trends.
5 code implementations • 2 Aug 2017 • Feng Jiang, Wen Tao, Shaohui Liu, Jie Ren, Xun Guo, Debin Zhao
The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end.