Search Results for author: Nanyi Fei

Found 16 papers, 5 papers with code

Visual Prompt Tuning for Few-Shot Text Classification

no code implementations COLING 2022 Jingyuan Wen, Yutian Luo, Nanyi Fei, Guoxing Yang, Zhiwu Lu, Hao Jiang, Jie Jiang, Zhao Cao

In few-shot text classification, a feasible paradigm for deploying VL-PTMs is to align the input samples and their category names via the text encoders.

Few-Shot Learning Few-Shot Text Classification +3

CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

no code implementations7 Mar 2024 Yanqi Dai, Dong Jing, Nanyi Fei, Zhiwu Lu

To mitigate this issue, we propose a novel Comprehensive Task Balancing (CoTBal) algorithm for multi-task visual instruction tuning of LMMs.

Instruction Following

Improvable Gap Balancing for Multi-Task Learning

1 code implementation28 Jul 2023 Yanqi Dai, Nanyi Fei, Zhiwu Lu

In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL.

Multi-Task Learning

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

1 code implementation22 May 2023 Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding

We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.

Autonomous Driving Video Generation +1

LGDN: Language-Guided Denoising Network for Video-Language Modeling

no code implementations23 Sep 2022 Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e. g., scenery shot, transition or teaser).

Denoising Language Modelling

Multimodal foundation models are better simulators of the human brain

1 code implementation17 Aug 2022 Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen

Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

no code implementations CVPR 2022 Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen

Under a fair comparison setting, our COTS achieves the highest performance among all two-stream methods and comparable performance (but with 10, 800X faster in inference) w. r. t.

Contrastive Learning Cross-Modal Retrieval +5

Compressed Video Contrastive Learning

no code implementations NeurIPS 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.

Contrastive Learning Representation Learning

Towards artificial general intelligence via a multimodal foundation model

1 code implementation27 Oct 2021 Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, Jingyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen

To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.

Image Classification Reading Comprehension +2

L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing

2 code implementations CVPR 2021 Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang

To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes.

Attribute Disentanglement

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

no code implementations23 Jan 2021 Yizhao Gao, Nanyi Fei, Guangzhen Liu, Zhiwu Lu, Tao Xiang, Songfang Huang

First, data augmentations are introduced to both the support and query sets with each sample now being represented as an augmented embedding (AE) composed of concatenated embeddings of both the original and augmented versions.

Few-Shot Learning

Z-Score Normalization, Hubness, and Few-Shot Learning

no code implementations ICCV 2021 Nanyi Fei, Yizhao Gao, Zhiwu Lu, Tao Xiang

This means that these methods are prone to the hubness problem, that is, a certain class prototype becomes the nearest neighbor of many test instances regardless which classes they belong to.

Few-Shot Learning

MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning

no code implementations ICLR 2021 Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang

Most recent few-shot learning (FSL) approaches are based on episodic training whereby each episode samples few training instances (shots) per class to imitate the test condition.

Few-Shot Learning

Meta-Learning across Meta-Tasks for Few-Shot Learning

no code implementations11 Feb 2020 Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen

In this paper, we argue that the inter-meta-task relationships should be exploited and those tasks are sampled strategically to assist in meta-learning.

Domain Adaptation Few-Shot Learning +1

Zero-Shot Learning with Sparse Attribute Propagation

no code implementations11 Dec 2018 Nanyi Fei, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

The standard approach to ZSL requires a set of training images annotated with seen class labels and a semantic descriptor for seen/unseen classes (attribute vector is the most widely used).

Attribute Image Retrieval +1

Cannot find the paper you are looking for? You can Submit a new open access paper.