1 code implementation • CVPR 2023 • Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Wenqiang Zhang
Our framework, termed domain-aware sign language retrieval via Cross-lingual Contrastive learning, or CiCo for short, outperforms the pioneering method by large margins on various datasets, e.g., +22.4 T2V and +28.0 V2T R@1 improvements on the How2Sign dataset, and +13.7 T2V and +17.1 V2T R@1 improvements on the PHOENIX-2014T dataset.
Ranked #1 on Sign Language Retrieval on CSL-Daily
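To make the training objective concrete, here is a minimal sketch of the symmetric contrastive loss such a cross-lingual retrieval framework typically optimizes; the temperature value, embedding size, and batch construction are illustrative assumptions rather than the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (sign video, text) embeddings.

    video_emb, text_emb: (B, D) tensors; row i of each is a matched pair.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                   # (B, B) cosine similarities
    labels = torch.arange(v.size(0), device=v.device)
    loss_v2t = F.cross_entropy(logits, labels)       # video -> text retrieval
    loss_t2v = F.cross_entropy(logits.T, labels)     # text -> video retrieval
    return 0.5 * (loss_v2t + loss_t2v)

# Random features stand in for the video and text encoder outputs.
loss = symmetric_infonce(torch.randn(8, 512), torch.randn(8, 512))
```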
1 code implementation • CVPR 2023 • Kun Yan, Xiao Li, Fangyun Wei, Jinglu Wang, Chenbin Zhang, Ping Wang, Yan Lu
The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.
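A hedged sketch of that recipe in PyTorch; the confidence threshold and the single-model interface are assumptions for illustration, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

CONF_THRESHOLD = 0.9  # assumed cutoff; such thresholds are typically tuned

def pseudo_label_loss(model, labeled_x, labels, unlabeled_x):
    """Supervised loss on labeled frames plus a filtered pseudo-label loss."""
    sup_loss = F.cross_entropy(model(labeled_x), labels)

    with torch.no_grad():                          # generate pseudo labels
        probs = model(unlabeled_x).softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf > CONF_THRESHOLD               # keep only confident frames

    if keep.any():
        unsup_loss = F.cross_entropy(model(unlabeled_x[keep]), pseudo[keep])
    else:
        unsup_loss = torch.zeros((), device=labeled_x.device)
    return sup_loss + unsup_loss
```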
1 code implementation • CVPR 2023 • Ronglai Zuo, Fangyun Wei, Brian Mak
Sign languages are visual languages which convey information by signers' handshape, facial expression, body movement, and so forth.
Ranked #1 on Sign Language Recognition on WLASL-2000
1 code implementation • 15 Mar 2023 • Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu
Deep supervision, which adds extra supervision to the intermediate features of a neural network, was widely used in image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization, e.g., by mitigating vanishing gradients relative to vanilla training.
2 code implementations • CVPR 2023 • Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai
A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.
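As a rough illustration of the attention-bias mechanism (the shapes and module boundaries here are assumed, not the released implementation), the side branch's predicted bias can simply be added to the frozen model's attention logits:

```python
import torch

def biased_attention(q, k, v, attn_bias):
    """Scaled dot-product attention with an additive, externally predicted bias.

    q: (B, N, D) mask-query features; k, v: (B, L, D) frozen CLIP tokens;
    attn_bias: (B, N, L) predicted by the side network, steering each
    query's attention toward its mask region without touching CLIP weights.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5    # (B, N, L)
    logits = logits + attn_bias                    # side-network guidance
    return logits.softmax(dim=-1) @ v              # (B, N, D) pooled features

# Toy shapes: 2 images, 4 mask queries, 16 CLIP tokens, 64-dim features.
out = biased_attention(torch.randn(2, 4, 64), torch.randn(2, 16, 64),
                       torch.randn(2, 16, 64), torch.zeros(2, 4, 16))
```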
1 code implementation • CVPR 2023 • Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu
Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.
no code implementations • CVPR 2023 • Meng Cao, Fangyun Wei, Can Xu, Xiubo Geng, Long Chen, Can Zhang, Yuexian Zou, Tao Shen, Daxin Jiang
Weakly-Supervised Video Grounding (WSVG) aims to localize events of interest in untrimmed videos with only video-level annotations.
no code implementations • 16 Dec 2022 • Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang
To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.
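One minimal reading of this, with assumed shapes and an assumed scoring rule (cosine similarity between each image token and the text embedding):

```python
import torch
import torch.nn.functional as F

def retain_text_relevant_tokens(image_tokens, text_emb, keep_ratio=0.5):
    """Keep the image tokens most correlated with the text description.

    image_tokens: (B, L, D); text_emb: (B, D). Returns (B, k, D).
    """
    scores = F.normalize(image_tokens, dim=-1) @ \
             F.normalize(text_emb, dim=-1).unsqueeze(-1)   # (B, L, 1)
    k = max(1, int(image_tokens.size(1) * keep_ratio))
    idx = scores.squeeze(-1).topk(k, dim=1).indices        # (B, k) kept tokens
    idx = idx.unsqueeze(-1).expand(-1, -1, image_tokens.size(-1))
    return image_tokens.gather(1, idx)

kept = retain_text_relevant_tokens(torch.randn(2, 49, 512), torch.randn(2, 512))
```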
1 code implementation • 2 Nov 2022 • Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak
RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding.
no code implementations • 12 Oct 2022 • Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong
To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN.
no code implementations • 18 Jul 2022 • Xiaokang Chen, Fangyun Wei, Gang Zeng, Jingdong Wang
Inspired by Conditional DETR, an improved DETR with fast training convergence that introduced box queries (originally called spatial queries) for the internal decoder layers, we reformulate the object query as a box query: a composition of the embedding of a reference point and a transformation of the box with respect to that reference point.
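A hedged sketch of one way to compose such a box query, using the sinusoidal coordinate embeddings common in DETR variants; the specific projections and the multiplicative modulation below are illustrative assumptions:

```python
import math
import torch
import torch.nn as nn

def sine_embed(x, dim=128, temperature=10000):
    """Sinusoidal embedding of normalized coordinates in [0, 1]; x: (..., 1)."""
    freqs = temperature ** (torch.arange(dim // 2, dtype=torch.float32) * 2 / dim)
    angles = x * 2 * math.pi / freqs          # broadcasts to (..., dim // 2)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

class BoxQuery(nn.Module):
    """Box query = embedding of the reference point, modulated by box size."""
    def __init__(self, dim=256):
        super().__init__()
        self.point_proj = nn.Linear(dim, dim)   # maps the (x, y) embedding
        self.size_proj = nn.Linear(dim, dim)    # transformation from (w, h)

    def forward(self, ref_boxes):
        # ref_boxes: (N, 4) normalized (cx, cy, w, h) reference boxes.
        cx, cy, w, h = ref_boxes.unbind(-1)
        point = torch.cat([sine_embed(cx.unsqueeze(-1)),
                           sine_embed(cy.unsqueeze(-1))], dim=-1)  # (N, 256)
        size = torch.cat([sine_embed(w.unsqueeze(-1)),
                          sine_embed(h.unsqueeze(-1))], dim=-1)
        return self.point_proj(point) * self.size_proj(size).sigmoid()

queries = BoxQuery()(torch.rand(100, 4))   # 100 box queries of width 256
```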
1 code implementation • 8 Jul 2022 • Yu Du, Miaojing Shi, Fangyun Wei, Guoqi Li
In this paper, we propose a new framework to boost ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images.
1 code implementation • 7 Apr 2022 • Tony Huang, Jack Chu, Fangyun Wei
In this paper, we explore a different scenario, in which the labels of the target datasets are unprovided, and we present an unsupervised prompt learning (UPL) approach to avoid prompt engineering while simultaneously improving transfer performance of CLIP-like vision-language models.
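A sketch of that training loop under simplified assumptions; the toy text encoder and the random class-token stand-ins below replace the real frozen CLIP components:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnablePrompt(nn.Module):
    """CoOp-style learnable context prepended to class-name embeddings."""
    def __init__(self, n_ctx=16, n_classes=10, dim=512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Stand-in for the frozen token embeddings of each class name.
        self.register_buffer("cls_tokens", torch.randn(n_classes, 4, dim))

    def forward(self):
        ctx = self.ctx.unsqueeze(0).expand(self.cls_tokens.size(0), -1, -1)
        return torch.cat([ctx, self.cls_tokens], dim=1)   # (C, n_ctx+4, D)

# Training step: image features and pseudo labels come from frozen CLIP.
prompt = LearnablePrompt()
text_feats = prompt().mean(dim=1)                 # toy text encoder: mean pool
img_feats = torch.randn(32, 512)                  # frozen image encoder output
pseudo = torch.randint(0, 10, (32,))              # CLIP zero-shot pseudo labels
logits = F.normalize(img_feats, dim=-1) @ F.normalize(text_feats, dim=-1).T
loss = F.cross_entropy(logits / 0.07, pseudo)     # only prompt.ctx gets grads
loss.backward()
```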
1 code implementation • CVPR 2022 • Minghao Chen, Fangyun Wei, Chong Li, Deng Cai
In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner.
1 code implementation • CVPR 2022 • Yu Du, Fangyun Wei, Zihe Zhang, Miaojing Shi, Yue Gao, Guoqi Li
In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model.
3 code implementations • CVPR 2022 • Yutong Chen, Fangyun Wei, Xiao Sun, Zhirong Wu, Stephen Lin
Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.
Ranked #2 on Sign Language Translation on CSL-Daily
1 code implementation • 29 Dec 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai
However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.
Ranked #1 on Open Vocabulary Semantic Segmentation on Cityscapes
no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin
Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.
1 code implementation • 22 Nov 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.
1 code implementation • NeurIPS 2021 • Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai
We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.
1 code implementation • NeurIPS 2021 • Hanzhe Hu, Fangyun Wei, Han Hu, Qiwei Ye, Jinshi Cui, LiWei Wang
The confidence bank is leveraged as an indicator to tilt training towards under-performing categories, instantiated in three strategies: 1) adaptive Copy-Paste and CutMix data augmentations that give under-performing categories more chances to be copied or cut; 2) an adaptive data sampling approach that encourages pixels from under-performing categories to be sampled; 3) a simple yet effective re-weighting method that alleviates the training noise raised by pseudo-labeling.
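A sketch of the confidence-bank bookkeeping and how it can drive category-level sampling; the EMA momentum and the inverse-confidence weighting rule are illustrative assumptions:

```python
import torch

class ConfidenceBank:
    """Track per-category confidence with an EMA and favor weak categories."""
    def __init__(self, n_classes, momentum=0.99):
        self.conf = torch.ones(n_classes)
        self.momentum = momentum

    def update(self, probs, labels):
        # probs: (N, C) pixel softmax outputs; labels: (N,) (pseudo) labels.
        for c in labels.unique():
            mean_c = probs[labels == c, c].mean()
            self.conf[c] = self.momentum * self.conf[c] + \
                           (1 - self.momentum) * mean_c

    def category_weights(self):
        # Under-performing (low-confidence) categories get larger weights,
        # so they are more likely to be copy-pasted, cut-mixed, or sampled.
        w = 1.0 / self.conf.clamp(min=1e-3)
        return w / w.sum()

bank = ConfidenceBank(n_classes=21)
bank.update(torch.rand(1000, 21).softmax(-1), torch.randint(0, 21, (1000,)))
weights = bank.category_weights()
```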
no code implementations • 29 Sep 2021 • Qiwei Ye, Yuxuan Song, Chang Liu, Fangyun Wei, Tao Qin, Tie-Yan Liu
Stochastic policies have been widely applied for their good properties in exploration and uncertainty quantification.
Ranked #1 on MuJoCo Games on Ant-v3
no code implementations • 29 Sep 2021 • Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.
no code implementations • ICCV 2021 • Yangyu Huang, Hao Yang, Chong Li, Jongyoo Kim, Fangyun Wei
On the other hand, AAM is an attention module that produces an anisotropic attention mask focusing on the region of a point and its local edge connected by adjacent points; it responds more strongly along the tangent than along the normal, which means relaxed constraints in the tangent direction.
Ranked #3 on Face Alignment on 300W
1 code implementation • ICCV 2021 • Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang
In this paper, based on the observation that domain adaptation frameworks operating in the source and target domains are almost complementary in terms of image translation and SSL, we propose a novel dual path learning (DPL) framework to alleviate visual inconsistency.
6 code implementations • ICCV 2021 • Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
Ranked #4 on Semi-Supervised Object Detection on COCO 100% labeled data (using extra training data)
1 code implementation • NeurIPS 2021 • Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
no code implementations • CVPR 2021 • Yue Gao, Fangyun Wei, Jianmin Bao, Shuyang Gu, Dong Chen, Fang Wen, Zhouhui Lian
However, we observe that the generator tends to find a tricky way to hide information from the original image to satisfy the constraint of cycle consistency, making it impossible to maintain the rich details (e.g., wrinkles and moles) of non-editing areas.
3 code implementations • 24 Dec 2020 • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.
Ranked #36 on Instance Segmentation on COCO test-dev
2 code implementations • NeurIPS 2020 • Cheng Chi, Fangyun Wei, Han Hu
The proposed module is named bridging visual representations (BVR).
Ranked #63 on Object Detection on COCO test-dev
1 code implementation • NeurIPS 2020 • Yukuan Yang, Fangyun Wei, Miaojing Shi, Guoqi Li
In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative-based metric learning framework and a new inference scheme with negative and positive representatives.
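One plausible reading of the inference scheme, sketched under assumed details: a query is rejected as background whenever the negative representative matches it better than every positive representative:

```python
import torch
import torch.nn.functional as F

def classify_with_representatives(query, pos_reps, neg_rep):
    """query: (D,); pos_reps: (C, D), one representative per class;
    neg_rep: (D,), a representative of the background/negative class."""
    q = F.normalize(query, dim=-1)
    pos_sim = F.normalize(pos_reps, dim=-1) @ q      # (C,) class similarities
    neg_sim = F.normalize(neg_rep, dim=-1) @ q       # scalar
    best_sim, best_cls = pos_sim.max(dim=0)
    if neg_sim >= best_sim:                          # negative wins: reject
        return -1
    return int(best_cls)

pred = classify_with_representatives(torch.randn(256),
                                     torch.randn(5, 256), torch.randn(256))
```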
1 code implementation • ECCV 2020 • Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin
A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.
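For context, a minimal sketch of this center-point decoding (CenterNet-style; exact head designs vary across papers):

```python
import torch

def decode_center_boxes(heatmap, wh, k=100):
    """Decode boxes from a center heatmap and per-pixel width/height maps.

    heatmap: (C, H, W) class scores; wh: (2, H, W) regressed box sizes.
    Returns (k, 6) rows of (score, class, x1, y1, x2, y2) in pixel units.
    """
    C, H, W = heatmap.shape
    scores, idx = heatmap.flatten().topk(k)     # top-k center candidates
    cls = idx // (H * W)
    ys = (idx % (H * W)) // W
    xs = idx % W
    w = wh[0, ys, xs]
    h = wh[1, ys, xs]
    return torch.stack([scores, cls.float(),
                        xs - w / 2, ys - h / 2,
                        xs + w / 2, ys + h / 2], dim=-1)

out = decode_center_boxes(torch.rand(80, 128, 128), torch.rand(2, 128, 128) * 32)
```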
no code implementations • ECCV 2020 • Xiao Yang, Fangyun Wei, Hongyang Zhang, Jun Zhu
We consider universal adversarial patches for faces -- small visual elements whose addition to a face image reliably destroys the performance of face detectors.
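A generic sketch of how such a universal patch is usually optimized: gradient ascent on the detector's loss with the shared patch pasted into every training face. The paste location, optimizer, and loss interface below are assumptions:

```python
import torch

def optimize_patch(detector_loss_fn, faces, patch_size=32, steps=100, lr=1e-2):
    """Optimize one universal patch that degrades a face detector.

    detector_loss_fn(images) should return a scalar to *maximize*
    (e.g., negative detection confidence on the true faces).
    faces: (B, 3, H, W) training face images.
    """
    patch = torch.rand(3, patch_size, patch_size, requires_grad=True)
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        x = faces.clone()
        # Paste the shared patch at a fixed location (top-left here; real
        # attacks randomize placement and average over transformations).
        x[:, :, :patch_size, :patch_size] = patch
        loss = -detector_loss_fn(x)        # ascend on the detector's loss
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0, 1)             # keep a valid image patch
    return patch.detach()

toy_faces = torch.rand(4, 3, 128, 128)
patch = optimize_patch(lambda x: x.mean(), toy_faces, steps=3)  # toy loss
```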
9 code implementations • 25 Apr 2019 • Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu
In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation.
Ranked #54 on Instance Segmentation on COCO test-dev
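A sketch of the resulting query-independent global-context block; the bottleneck ratio and normalization follow the commonly cited design, but treat this as an illustration rather than the released code:

```python
import torch
import torch.nn as nn

class GlobalContextBlock(nn.Module):
    """Query-independent context: one attention map shared by all positions."""
    def __init__(self, channels, ratio=1 / 16):
        super().__init__()
        hidden = max(1, int(channels * ratio))
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)   # context mask
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # One softmax-normalized spatial attention map for the whole image.
        weights = self.attn(x).view(b, 1, h * w).softmax(dim=-1)    # (B,1,HW)
        context = x.view(b, c, h * w) @ weights.transpose(1, 2)     # (B,C,1)
        context = self.transform(context.view(b, c, 1, 1))          # bottleneck
        return x + context                 # broadcast-add to every position

y = GlobalContextBlock(64)(torch.randn(2, 64, 32, 32))
```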
no code implementations • 19 Feb 2019 • Chen Change Loy, Dahua Lin, Wanli Ouyang, Yuanjun Xiong, Shuo Yang, Qingqiu Huang, Dongzhan Zhou, Wei Xia, Quanquan Li, Ping Luo, Junjie Yan, Jian-Feng Wang, Zuoxin Li, Ye Yuan, Boxun Li, Shuai Shao, Gang Yu, Fangyun Wei, Xiang Ming, Dong Chen, Shifeng Zhang, Cheng Chi, Zhen Lei, Stan Z. Li, Hongkai Zhang, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen, Wu Liu, Boyan Zhou, Huaxiong Li, Peng Cheng, Tao Mei, Artem Kukharenko, Artem Vasenin, Nikolay Sergievskiy, Hua Yang, Liangqi Li, Qiling Xu, Yuan Hong, Lin Chen, Mingjun Sun, Yirong Mao, Shiying Luo, Yongjun Li, Ruiping Wang, Qiaokang Xie, Ziyang Wu, Lei Lu, Yiheng Liu, Wengang Zhou
This paper presents a review of the 2018 WIDER Challenge on Face and Pedestrian.