Search Results for author: Fangyun Wei

Found 45 papers, 33 papers with code

Rethinking Generative Large Language Model Evaluation for Semantic Comprehension

no code implementations 12 Mar 2024 Fangyun Wei, Xi Chen, Lin Luo

Through a comprehensive evaluation of 24 models across 11 benchmarks, we highlight several potential drawbacks of MCQA, for instance, the inconsistency between the MCQA evaluation and the generation of open-ended responses in practical scenarios.

Language Modelling Large Language Model +2

Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

1 code implementation 12 Mar 2024 Lei Zhu, Fangyun Wei, Yanye Lu

To achieve this, we present the Vision-to-Language Tokenizer, abbreviated as V2T Tokenizer, which transforms an image into a "foreign language" with the combined aid of an encoder-decoder, the LLM vocabulary, and a CLIP model.

Deblurring Image Captioning +5

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

1 code implementation 6 Feb 2024 Yu Du, Fangyun Wei, Hongyang Zhang

We also revisit the evaluation protocol introduced by previous works and identify a limitation in this protocol that leads to an artificially high pass rate.

Language Modelling Large Language Model

Towards Online Sign Language Recognition and Translation

1 code implementation 10 Jan 2024 Ronglai Zuo, Fangyun Wei, Brian Mak

Our approach comprises three phases: 1) developing a sign language dictionary encompassing all glosses present in a target sign language dataset; 2) training an isolated sign language recognition model on augmented signs using both conventional classification loss and our novel saliency loss; 3) employing a sliding window approach on the input sign sequence and feeding each sign clip to the well-optimized model for online recognition.

Sign Language Recognition speech-recognition +2
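
The third phase described in the entry above (sliding a window over the input sign sequence and feeding each clip to a trained model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window and stride values, and the `classify_clip` callback are all hypothetical, and the sketch only shows the generic sliding-window pattern.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_window_clips(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs, with end exclusive.

    Hypothetical helper: trailing frames that do not fill a complete
    window are dropped, which a real system might instead pad.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        # Sequence shorter than one window: use it as a single clip.
        return [(0, num_frames)]
    return [(start, start + window)
            for start in range(0, num_frames - window + 1, stride)]


def recognize_online(frames: Sequence,
                     classify_clip: Callable[[Sequence], object],
                     window: int = 16,
                     stride: int = 8) -> list:
    """Apply a clip-level classifier (assumed, supplied by the caller) to each window."""
    return [classify_clip(frames[start:end])
            for start, end in sliding_window_clips(len(frames), window, stride)]
```

With 10 frames, a window of 4, and a stride of 2, the helper yields the clips (0, 4), (2, 6), (4, 8), and (6, 10); overlapping windows trade extra computation for lower latency between predictions.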

A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

1 code implementation 9 Jan 2024 Ronglai Zuo, Fangyun Wei, Zenggui Chen, Brian Mak, Jiaolong Yang, Xin Tong

The objective of this paper is to develop a functional system for translating spoken languages into sign languages, referred to as Spoken2Sign translation.

Sign Language Translation Translation

RAIN: Your Language Models Can Align Themselves without Finetuning

1 code implementation 13 Sep 2023 Yuhui Li, Fangyun Wei, Jinjing Zhao, Chao Zhang, Hongyang Zhang

We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting.

Adversarial Attack

Exploring Non-additive Randomness on ViT against Query-Based Black-Box Attacks

no code implementations 12 Sep 2023 Jindong Gu, Fangyun Wei, Philip Torr, Han Hu

In this work, we first taxonomize the stochastic defense strategies against QBBA.

AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections

no code implementations 5 Sep 2023 Yue Wu, Sicheng Xu, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong

For the new task, we base our method on the generative radiance manifold representation and equip it with learnable facial and head-shoulder deformations.

Improving Continuous Sign Language Recognition with Cross-Lingual Signs

no code implementations ICCV 2023 Fangyun Wei, Yutong Chen

Experimentally, our approach achieves state-of-the-art performance on two widely-used CSLR datasets: Phoenix-2014 and Phoenix-2014T.

Sign Language Recognition speech-recognition +1

CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

1 code implementation CVPR 2023 Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Wenqiang Zhang

Our framework, termed as domain-aware sign language retrieval via Cross-lingual Contrastive learning or CiCo for short, outperforms the pioneering method by large margins on various datasets, e.g., +22.4 T2V and +28.0 V2T R@1 improvements on How2Sign dataset, and +13.7 T2V and +17.1 V2T R@1 improvements on PHOENIX-2014T dataset.

Contrastive Learning Retrieval +5

Natural Language-Assisted Sign Language Recognition

1 code implementation CVPR 2023 Ronglai Zuo, Fangyun Wei, Brian Mak

Sign languages are visual languages which convey information by signers' handshape, facial expression, body movement, and so forth.

Sign Language Recognition

Two-shot Video Object Segmentation

1 code implementation CVPR 2023 Kun Yan, Xiao Li, Fangyun Wei, Jinglu Wang, Chenbin Zhang, Ping Wang, Yan Lu

The underlying idea is to generate pseudo labels for unlabeled frames during training and to optimize the model on the combination of labeled and pseudo-labeled data.

Object Pseudo Label +5

DeepMIM: Deep Supervision for Masked Image Modeling

1 code implementation 15 Mar 2023 Sucheng Ren, Fangyun Wei, Samuel Albanie, Zheng Zhang, Han Hu

Deep supervision, which applies extra supervision to the intermediate features of a neural network, was widely used in image classification in the early deep learning era, since it significantly reduces training difficulty and eases optimization, for example by avoiding vanishing gradients, compared with vanilla training.

Image Classification object-detection +2

Side Adapter Network for Open-Vocabulary Semantic Segmentation

3 code implementations CVPR 2023 Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai

A side network is attached to a frozen CLIP model with two branches: one for predicting mask proposals, and the other for predicting attention bias which is applied in the CLIP model to recognize the class of masks.

Language Modelling Open Vocabulary Semantic Segmentation +3

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

2 code implementations CVPR 2023 Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget.

Image Classification Semantic Segmentation

Iterative Proposal Refinement for Weakly-Supervised Video Grounding

no code implementations CVPR 2023 Meng Cao, Fangyun Wei, Can Xu, Xiubo Geng, Long Chen, Can Zhang, Yuexian Zou, Tao Shen, Daxin Jiang

Weakly-Supervised Video Grounding (WSVG) aims to localize events of interest in untrimmed videos with only video-level annotations.

Sentence Video Grounding

Attentive Mask CLIP

1 code implementation ICCV 2023 Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.

Contrastive Learning Retrieval +1

Two-Stream Network for Sign Language Recognition and Translation

1 code implementation 2 Nov 2022 Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding.

Sign Language Recognition Sign Language Translation +2

AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars

1 code implementation 12 Oct 2022 Yue Wu, Yu Deng, Jiaolong Yang, Fangyun Wei, Qifeng Chen, Xin Tong

To achieve meaningful control over facial expressions via deformation, we propose a 3D-level imitative learning scheme between the generator and a parametric 3D face model during adversarial training of the 3D-aware GAN.

Disentanglement Face Model +1

Conditional DETR V2: Efficient Detection Transformer with Box Queries

no code implementations 18 Jul 2022 Xiaokang Chen, Fangyun Wei, Gang Zeng, Jingdong Wang

Inspired by Conditional DETR, an improved DETR with fast training convergence, that presented box queries (originally called spatial queries) for internal decoder layers, we reformulate the object query into the format of the box query that is a composition of the embeddings of the reference point and the transformation of the box with respect to the reference point.

Object object-detection +1

Boosting Zero-shot Learning via Contrastive Optimization of Attribute Representations

1 code implementation 8 Jul 2022 Yu Du, Miaojing Shi, Fangyun Wei, Guoqi Li

In this paper, we propose a new framework to boost ZSL by explicitly learning attribute prototypes beyond images and contrastively optimizing them with attribute-level features within images.

Attribute Zero-Shot Learning

Unsupervised Prompt Learning for Vision-Language Models

1 code implementation 7 Apr 2022 Tony Huang, Jack Chu, Fangyun Wei

In this paper, we explore a different scenario, in which the labels of the target datasets are unprovided, and we present an unsupervised prompt learning (UPL) approach to avoid prompt engineering while simultaneously improving transfer performance of CLIP-like vision-language models.

Prompt Engineering Transfer Learning

Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning

1 code implementation CVPR 2022 Minghao Chen, Fangyun Wei, Chong Li, Deng Cai

In this paper, we introduce a novel contrastive action representation learning (CARL) framework to learn frame-wise action representations, especially for long videos, in a self-supervised manner.

Action Classification Contrastive Learning +4

Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model

1 code implementation CVPR 2022 Yu Du, Fangyun Wei, Zihe Zhang, Miaojing Shi, Yue Gao, Guoqi Li

In this paper, we introduce a novel method, detection prompt (DetPro), to learn continuous prompt representations for open-vocabulary object detection based on the pre-trained vision-language model.

Image Classification Language Modelling +5

A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation

4 code implementations CVPR 2022 Yutong Chen, Fangyun Wei, Xiao Sun, Zhirong Wu, Stephen Lin

Concretely, we pretrain the sign-to-gloss visual network on the general domain of human actions and the within-domain of a sign-to-gloss dataset, and pretrain the gloss-to-text translation network on the general domain of a multilingual corpus and the within-domain of a gloss-to-text corpus.

Sign Language Recognition Sign Language Translation +2

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

2 code implementations 29 Dec 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Han Hu, Xiang Bai

However, semantic segmentation and the CLIP model operate at different visual granularities: semantic segmentation processes pixels, while CLIP operates on whole images.

Image Classification Language Modelling +8

Cross-Model Pseudo-Labeling for Semi-Supervised Action Recognition

no code implementations CVPR 2022 Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin

Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.

Action Recognition
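
The conventional pseudo-labeling recipe that the entry above describes (keep only confident predictions on unlabeled data and use them as training targets) can be sketched as follows. This illustrates the generic baseline only, not the cross-model method the paper proposes; the function name and the 0.95 threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def select_pseudo_labels(probabilities: Sequence[Sequence[float]],
                         threshold: float = 0.95) -> List[Tuple[int, int]]:
    """Return (sample_index, predicted_class) for confident predictions.

    `probabilities` holds one class-probability vector per unlabeled sample.
    The threshold value is a hypothetical choice, not taken from the paper.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        # Predicted class is the argmax of the probability vector.
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            selected.append((index, label))
    return selected
```

Samples below the threshold are simply left out of the pseudo-labeled set for that round; raising the threshold yields fewer but cleaner pseudo-labels.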

Towards Tokenized Human Dynamics Representation

1 code implementation 22 Nov 2021 Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

For human action understanding, a popular research direction is to analyze short video clips with unambiguous semantic content, such as jumping and drinking.

Action Segmentation Action Understanding +3

Bootstrap Your Object Detector via Mixed Training

1 code implementation NeurIPS 2021 Mengde Xu, Zheng Zhang, Fangyun Wei, Yutong Lin, Yue Cao, Stephen Lin, Han Hu, Xiang Bai

We introduce MixTraining, a new training paradigm for object detection that can improve the performance of existing detectors for free.

Data Augmentation Missing Labels +3

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning

1 code implementation NeurIPS 2021 Hanzhe Hu, Fangyun Wei, Han Hu, Qiwei Ye, Jinshi Cui, LiWei Wang

The confidence bank is leveraged as an indicator to tilt training towards under-performing categories, instantiated in three strategies: 1) adaptive Copy-Paste and CutMix data augmentation approaches which give more chance for under-performing categories to be copied or cut; 2) an adaptive data sampling approach to encourage pixels from under-performing category to be sampled; 3) a simple yet effective re-weighting method to alleviate the training noise raised by pseudo-labeling.

Data Augmentation Semi-Supervised Semantic Segmentation

Particle Based Stochastic Policy Optimization

no code implementations 29 Sep 2021 Qiwei Ye, Yuxuan Song, Chang Liu, Fangyun Wei, Tao Qin, Tie-Yan Liu

Stochastic policies have been widely applied for their good properties in exploration and uncertainty quantification.

MuJoCo Games Offline RL +2

Self-supervised Discovery of Human Actons from Long Kinematic Videos

no code implementations 29 Sep 2021 Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin

However, methods for understanding short semantic actions cannot be directly translated to long kinematic sequences such as dancing, where it becomes challenging even to semantically label the human movements.

Action Understanding Sentence

ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment

1 code implementation ICCV 2021 Yangyu Huang, Hao Yang, Chong Li, Jongyoo Kim, Fangyun Wei

On the other hand, AAM is an attention module that produces an anisotropic attention mask focusing on the region of a point and its local edge connected by adjacent points; it has a stronger response in the tangent direction than in the normal direction, which means relaxed constraints in the tangent direction.

Face Alignment

Dual Path Learning for Domain Adaptation of Semantic Segmentation

1 code implementation ICCV 2021 Yiting Cheng, Fangyun Wei, Jianmin Bao, Dong Chen, Fang Wen, Wenqiang Zhang

In this paper, based on the observation that domain adaptation frameworks performed in the source and target domain are almost complementary in terms of image translation and SSL, we propose a novel dual path learning (DPL) framework to alleviate visual inconsistency.

Domain Adaptation Segmentation +4

End-to-End Semi-Supervised Object Detection with Soft Teacher

8 code implementations ICCV 2021 Mengde Xu, Zheng Zhang, Han Hu, JianFeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai, Zicheng Liu

This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.

Instance Segmentation object-detection +4

Aligning Pretraining for Detection via Object-Level Contrastive Learning

1 code implementation NeurIPS 2021 Fangyun Wei, Yue Gao, Zhirong Wu, Han Hu, Stephen Lin

Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.

Contrastive Learning Object +6

High-Fidelity and Arbitrary Face Editing

no code implementations CVPR 2021 Yue Gao, Fangyun Wei, Jianmin Bao, Shuyang Gu, Dong Chen, Fang Wen, Zhouhui Lian

However, we observe that the generator tends to find a tricky way to hide information from the original image to satisfy the constraint of cycle consistency, making it impossible to maintain the rich details (e.g., wrinkles and moles) of non-editing areas.

Attribute Vocal Bursts Intensity Prediction

Global Context Networks

3 code implementations 24 Dec 2020 Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

The Non-Local Network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, via aggregating query-specific global context to each query position.

Instance Segmentation Object Detection

Restoring Negative Information in Few-Shot Object Detection

1 code implementation NeurIPS 2020 Yukuan Yang, Fangyun Wei, Miaojing Shi, Guoqi Li

In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives.

Few-Shot Learning Few-Shot Object Detection +4

Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation

1 code implementation ECCV 2020 Fangyun Wei, Xiao Sun, Hongyang Li, Jingdong Wang, Stephen Lin

A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person.

Instance Segmentation Object +5

Design and Interpretation of Universal Adversarial Patches in Face Detection

no code implementations ECCV 2020 Xiao Yang, Fangyun Wei, Hongyang Zhang, Jun Zhu

We consider universal adversarial patches for faces -- small visual elements whose addition to a face image reliably destroys the performance of face detectors.

Face Detection

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond

9 code implementations 25 Apr 2019 Yue Cao, Jiarui Xu, Stephen Lin, Fangyun Wei, Han Hu

In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation.

Instance Segmentation Object Detection +1
