Search Results for author: Nanyi Fei

Found 16 papers, 5 papers with code

Visual Prompt Tuning for Few-Shot Text Classification

no code implementations • COLING 2022 • Jingyuan Wen, Yutian Luo, Nanyi Fei, Guoxing Yang, Zhiwu Lu, Hao Jiang, Jie Jiang, Zhao Cao

In few-shot text classification, a feasible paradigm for deploying VL-PTMs is to align the input samples and their category names via the text encoders.

Few-Shot Learning Few-Shot Text Classification +3

Paper
Add Code

CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning

no code implementations • 7 Mar 2024 • Yanqi Dai, Dong Jing, Nanyi Fei, Zhiwu Lu

To mitigate this issue, we propose a novel Comprehensive Task Balancing (CoTBal) algorithm for multi-task visual instruction tuning of LMMs.

Instruction Following

Paper
Add Code

Improvable Gap Balancing for Multi-Task Learning

1 code implementation • 28 Jul 2023 • Yanqi Dai, Nanyi Fei, Zhiwu Lu

In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL.

Multi-Task Learning

Paper
Code

VDT: General-purpose Video Diffusion Transformers via Mask Modeling

1 code implementation • 22 May 2023 • Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding

We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.

Autonomous Driving Video Generation +1

149

Paper
Code

LGDN: Language-Guided Denoising Network for Video-Language Modeling

no code implementations • 23 Sep 2022 • Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e. g., scenery shot, transition or teaser).

Denoising Language Modelling

Paper
Add Code

Multimodal foundation models are better simulators of the human brain

1 code implementation • 17 Aug 2022 • Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen

Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.

Paper
Code

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

no code implementations • CVPR 2022 • Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen

Under a fair comparison setting, our COTS achieves the highest performance among all two-stream methods and comparable performance (but with 10, 800X faster in inference) w. r. t.

Ranked #23 on Video Retrieval on MSR-VTT

Contrastive Learning Cross-Modal Retrieval +5

Paper
Add Code

A Roadmap for Big Model

no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang

With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.

Language Modelling Machine Translation +1

Paper
Add Code

Compressed Video Contrastive Learning

no code implementations • NeurIPS 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.

Contrastive Learning Representation Learning

Paper
Add Code

Towards artificial general intelligence via a multimodal foundation model

1 code implementation • 27 Oct 2021 • Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, Jingyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen

To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.

Image Classification Reading Comprehension +2

Paper
Code

L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing

2 code implementations • CVPR 2021 • Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang

To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes.

Attribute Disentanglement

Paper
Code

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

no code implementations • 23 Jan 2021 • Yizhao Gao, Nanyi Fei, Guangzhen Liu, Zhiwu Lu, Tao Xiang, Songfang Huang

First, data augmentations are introduced to both the support and query sets with each sample now being represented as an augmented embedding (AE) composed of concatenated embeddings of both the original and augmented versions.

Few-Shot Learning

Paper
Add Code

MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning

no code implementations • ICLR 2021 • Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang

Most recent few-shot learning (FSL) approaches are based on episodic training whereby each episode samples few training instances (shots) per class to imitate the test condition.

Few-Shot Learning

Paper
Add Code

Z-Score Normalization, Hubness, and Few-Shot Learning

no code implementations • ICCV 2021 • Nanyi Fei, Yizhao Gao, Zhiwu Lu, Tao Xiang

This means that these methods are prone to the hubness problem, that is, a certain class prototype becomes the nearest neighbor of many test instances regardless which classes they belong to.

Few-Shot Learning

Paper
Add Code

Meta-Learning across Meta-Tasks for Few-Shot Learning

no code implementations • 11 Feb 2020 • Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen

In this paper, we argue that the inter-meta-task relationships should be exploited and those tasks are sampled strategically to assist in meta-learning.

Domain Adaptation Few-Shot Learning +1

Paper
Add Code

Zero-Shot Learning with Sparse Attribute Propagation

no code implementations • 11 Dec 2018 • Nanyi Fei, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

The standard approach to ZSL requires a set of training images annotated with seen class labels and a semantic descriptor for seen/unseen classes (attribute vector is the most widely used).

Attribute Image Retrieval +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.