Search Results for author: Zhiwu Lu

Found 49 papers, 17 papers with code

Visual Prompt Tuning for Few-Shot Text Classification

no code implementations COLING 2022 Jingyuan Wen, Yutian Luo, Nanyi Fei, Guoxing Yang, Zhiwu Lu, Hao Jiang, Jie Jiang, Zhao Cao

In few-shot text classification, a feasible paradigm for deploying VL-PTMs is to align the input samples and their category names via the text encoders.

Few-Shot Learning Few-Shot Text Classification +3

VDT: An Empirical Study on Video Diffusion with Transformers

1 code implementation 22 May 2023 Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding

This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation.

Autonomous Driving Video Generation

UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling

2 code implementations 13 Feb 2023 Haoyu Lu, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Wei Zhan, Masayoshi Tomizuka, Mingyu Ding

Particularly, on the MSRVTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% model parameters, outperforming the latest competitors by 2.0%.

Retrieval Text Retrieval +3

TikTalk: A Multi-Modal Dialogue Dataset for Real-World Chitchat

1 code implementation 14 Jan 2023 Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu

Compared with previous image-based dialogue datasets, the richer sources of context in TikTalk lead to a greater diversity of conversations.

Text2Poster: Laying out Stylized Texts on Retrieved Images

1 code implementation 6 Jan 2023 Chuhao Jin, Hongteng Xu, Ruihua Song, Zhiwu Lu

Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience.

Image Retrieval Layout Design +1

LGDN: Language-Guided Denoising Network for Video-Language Modeling

no code implementations 23 Sep 2022 Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu

However, this hypothesis often fails for two reasons: (1) With the rich semantics of video contents, it is difficult to cover all frames with a single video-level description; (2) A raw video typically has noisy/meaningless information (e.g., scenery shot, transition or teaser).

Denoising Language Modelling

A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language

1 code implementation 12 Sep 2022 Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, Ji-Rong Wen

Although artificial intelligence (AI) has made significant progress in understanding molecules in a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality.

Contrastive Learning Cross-Modal Retrieval +3

Multimodal foundation models are better simulators of the human brain

1 code implementation 17 Aug 2022 Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen

Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.

COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

no code implementations CVPR 2022 Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen

Under a fair comparison setting, our COTS achieves the highest performance among all two-stream methods and comparable performance (but with 10,800X faster in inference) w.r.t.

Contrastive Learning Cross-Modal Retrieval +5

Compressed Video Contrastive Learning

no code implementations NeurIPS 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo

To enhance the representation ability of the motion vectors, hence the effectiveness of our method, we design a cross guidance contrastive learning algorithm based on multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.

Contrastive Learning Representation Learning
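
The snippet above mentions an InfoNCE-based cross guidance objective. As a rough illustration only, the following is a minimal sketch of the plain single-positive InfoNCE loss, not the multi-instance variant the paper describes. The function name `info_nce`, the `temperature` default, and the idea of treating `anchor` as a motion-vector embedding and `positive` as the matching RGB embedding are assumptions made for this example; all vectors are assumed to be non-zero.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Single-positive InfoNCE loss over cosine similarities (illustrative sketch)."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        # Assumes v is non-zero; scales it to unit length.
        norm = math.sqrt(dot(v, v))
        return [x / norm for x in v]

    a = normalize(anchor)
    # The positive pair goes first, followed by every negative pair.
    logits = [dot(a, normalize(positive)) / temperature]
    logits += [dot(a, normalize(n)) / temperature for n in negatives]

    # Numerically stable log-sum-exp for the softmax denominator.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))

    # Negative log-probability assigned to the positive pair.
    return log_denominator - logits[0]
```

Swapping the roles of the two inputs (RGB embedding as anchor, motion-vector embedding as positive) gives the reverse direction of supervision that "and vice versa" refers to.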

Towards artificial general intelligence via a multimodal foundation model

1 code implementation 27 Oct 2021 Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, Jingyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen

To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.

Image Classification Reading Comprehension +2

Improved Generalization Risk Bounds for Meta-Learning with PAC-Bayes-kl Analysis

no code implementations 29 Sep 2021 Jiechao Guan, Zhiwu Lu, Yong Liu

In particular, we identify that when the number of training tasks is large, utilizing a prior generated from an informative hyperposterior can achieve the same order of PAC-Bayes-kl bound as that obtained through setting a localized distribution-dependent prior for a novel task.

Generalization Bounds Learning Theory +1

Task Relatedness-Based Generalization Bounds for Meta Learning

no code implementations ICLR 2022 Jiechao Guan, Zhiwu Lu

Supposing the $n$ training tasks and the new task are sampled from the same environment, traditional meta learning theory derives an error bound on the expected loss over the new task in terms of the empirical training loss, uniformly over the set of all hypothesis spaces.

Generalization Bounds Learning Theory +2

L2M-GAN: Learning To Manipulate Latent Space Semantics for Facial Attribute Editing

2 code implementations CVPR 2021 Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang

To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes.


HR-NAS: Searching Efficient High-Resolution Neural Architectures with Lightweight Transformers

1 code implementation CVPR 2021 Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo

Last, we proposed an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space, and finds optimal architectures given various tasks and computation resources.

Image Classification Neural Architecture Search +3

Learning Versatile Neural Architectures by Propagating Network Codes

1 code implementation ICLR 2022 Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo

(4) Thorough studies of NCP on inter-, cross-, and intra-tasks highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transferring between different tasks.

Image Segmentation Neural Architecture Search +2

Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning

no code implementations 23 Jan 2021 Yizhao Gao, Nanyi Fei, Guangzhen Liu, Zhiwu Lu, Tao Xiang, Songfang Huang

First, data augmentations are introduced to both the support and query sets with each sample now being represented as an augmented embedding (AE) composed of concatenated embeddings of both the original and augmented versions.

Few-Shot Learning

IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning

1 code implementation ICLR 2021 Manli Zhang, Jianhong Zhang, Zhiwu Lu, Tao Xiang, Mingyu Ding, Songfang Huang

Importantly, at the episode-level, two SSL-FSL hybrid learning objectives are devised: (1) The consistency across the predictions of an FSL classifier from different extended episodes is maximized as an episode-level pretext task.

Few-Shot Learning Self-Supervised Learning +1

MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning

no code implementations ICLR 2021 Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang

Most recent few-shot learning (FSL) approaches are based on episodic training whereby each episode samples few training instances (shots) per class to imitate the test condition.

Few-Shot Learning

Self-Supervised Video Representation Learning with Constrained Spatiotemporal Jigsaw

no code implementations 1 Jan 2021 Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo

With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.

Representation Learning

Z-Score Normalization, Hubness, and Few-Shot Learning

no code implementations ICCV 2021 Nanyi Fei, Yizhao Gao, Zhiwu Lu, Tao Xiang

This means that these methods are prone to the hubness problem, that is, a certain class prototype becomes the nearest neighbor of many test instances regardless of which classes they belong to.

Few-Shot Learning
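
To make the hubness notion in the snippet above concrete, here is a minimal sketch that z-scores feature vectors and counts how many test instances pick each class prototype as their nearest neighbor. It assumes per-vector z-scoring (each vector normalized across its own dimensions) and Euclidean nearest-prototype assignment; the helper names `zscore`, `nearest_prototype`, and `hub_counts` are hypothetical and this is not taken from the paper's code.

```python
import math
from collections import Counter


def zscore(vec):
    """Z-score one feature vector across its own dimensions."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    std = math.sqrt(var) or 1.0  # guard against a constant vector
    return [(x - mean) / std for x in vec]


def nearest_prototype(query, prototypes):
    """Index of the prototype closest to the query in Euclidean distance."""
    dists = [math.dist(query, p) for p in prototypes]
    return dists.index(min(dists))


def hub_counts(queries, prototypes):
    """How many queries select each prototype as their nearest neighbor."""
    return Counter(nearest_prototype(q, prototypes) for q in queries)
```

A prototype whose count is far above the others is acting as a hub. Comparing `hub_counts` on raw features against `hub_counts` on `zscore`-normalized features is one simple way to check whether normalization spreads the assignments more evenly on a given dataset.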

Margin-Based Transfer Bounds for Meta Learning with Deep Feature Embedding

no code implementations 2 Dec 2020 Jiechao Guan, Zhiwu Lu, Tao Xiang, Timothy Hospedales

By transferring knowledge learned from seen/previous tasks, meta learning aims to generalize well to unseen/future tasks.

Classification General Classification +2

Counterfactual VQA: A Cause-Effect Look at Language Bias

1 code implementation CVPR 2021 Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen

VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.

Counterfactual Inference Question Answering +1

Domain-Adaptive Few-Shot Learning

1 code implementation 19 Mar 2020 An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.

Domain Adaptation Few-Shot Learning

AdarGCN: Adaptive Aggregation GCN for Few-Shot Learning

no code implementations 28 Feb 2020 Jianhong Zhang, Manli Zhang, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images.

Denoising Few-Shot Learning +1

Meta-Learning across Meta-Tasks for Few-Shot Learning

no code implementations 11 Feb 2020 Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen

In this paper, we argue that the inter-meta-task relationships should be exploited and that those tasks should be sampled strategically to assist in meta-learning.

Domain Adaptation Few-Shot Learning +1

Few-Shot Learning as Domain Adaptation: Algorithm and Analysis

no code implementations 6 Feb 2020 Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

Specifically, armed with a set transformer based attention module, we construct each episode with two sub-episodes without class overlap on the seen classes to simulate the domain shift between the seen and unseen classes.

Domain Adaptation Few-Shot Image Classification +1

Every Frame Counts: Joint Learning of Video Segmentation and Optical Flow

no code implementations 28 Nov 2019 Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo

Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.

Optical Flow Estimation Semantic Segmentation +2

Mobile Video Action Recognition

no code implementations 27 Aug 2019 Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Zhiwu Lu, Ji-Rong Wen

In addition to motion vectors, we also provide a temporal fusion method to explicitly induce the temporal context.

Action Recognition Temporal Action Localization

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions

no code implementations 8 Jul 2019 Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang

Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.

Multiple Instance Learning Referring Expression

Zero-Shot Learning with Sparse Attribute Propagation

no code implementations 11 Dec 2018 Nanyi Fei, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen

The standard approach to ZSL requires a set of training images annotated with seen class labels and a semantic descriptor for seen/unseen classes (attribute vector is the most widely used).

Image Retrieval Zero-Shot Learning

Recursive Visual Attention in Visual Dialog

1 code implementation CVPR 2019 Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.

Question Answering Visual Dialog +1

Transferrable Feature and Projection Learning with Class Hierarchy for Zero-Shot Learning

no code implementations 19 Oct 2018 Aoxue Li, Zhiwu Lu, Jiechao Guan, Tao Xiang, Li-Wei Wang, Ji-Rong Wen

Inspired by the fact that an unseen class is not exactly 'unseen' if it belongs to the same superclass as a seen class, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap.

Few-Shot Learning Zero-Shot Learning

Zero and Few Shot Learning with Semantic Feature Synthesis and Competitive Learning

no code implementations 19 Oct 2018 Zhiwu Lu, Jiechao Guan, Aoxue Li, Tao Xiang, An Zhao, Ji-Rong Wen

Specifically, we assume that each synthesised data point can belong to any unseen class; and the most likely two class candidates are exploited to learn a robust projection function in a competitive fashion.

Few-Shot Learning Zero-Shot Learning

RUM: network Representation learning throUgh Multi-level structural information preservation

no code implementations 8 Oct 2017 Yanlei Yu, Zhiwu Lu, Jiajun Liu, Guoping Zhao, Ji-Rong Wen, Kai Zheng

We propose a novel network representation learning framework called RUM (network Representation learning throUgh Multi-level structural information preservation).

Representation Learning

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

no code implementations 5 Sep 2017 Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang

In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from objects and scenes to abstract concepts; 2) how to annotate an image with the optimal number of class labels.

Zero-Shot Fine-Grained Classification by Deep Feature Learning with Semantics

no code implementations 4 Jul 2017 Aoxue Li, Zhiwu Lu, Li-Wei Wang, Tao Xiang, Xinqi Li, Ji-Rong Wen

In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification.

Classification Domain Adaptation +3

Pairwise Constraint Propagation: A Survey

no code implementations 19 Feb 2015 Zhenyong Fu, Zhiwu Lu

As one of the most important types of (weaker) supervised information in machine learning and pattern recognition, pairwise constraint, which specifies whether a pair of data points occur together, has recently received significant attention, especially the problem of pairwise constraint propagation.

Image classification by visual bag-of-words refinement and reduction

no code implementations 18 Jan 2015 Zhiwu Lu, Li-Wei Wang, Ji-Rong Wen

This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks associated with the visual BOW model which has been widely used for image classification.

Classification General Classification +1

Pairwise Constraint Propagation on Multi-View Data

no code implementations 18 Jan 2015 Zhiwu Lu, Li-Wei Wang

This paper presents a graph-based learning approach to pairwise constraint propagation on multi-view data.

Graph Construction Retrieval

Can Image-Level Labels Replace Pixel-Level Labels for Image Parsing

no code implementations 7 Mar 2014 Zhiwu Lu, Zhen-Yong Fu, Tao Xiang, Li-Wei Wang, Ji-Rong Wen

By oversegmenting all the images into regions, we formulate noisily tagged image parsing as a weakly supervised sparse learning problem over all the regions, where the initial labels of each region are inferred from image-level labels.

Sparse Learning

Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications

1 code implementation 22 Sep 2011 Zhiwu Lu, Horace H. S. Ip, Yuxin Peng

This paper presents a novel pairwise constraint propagation approach by decomposing the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs.
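
For readers unfamiliar with the building block named in the snippet above, the following is a minimal sketch of generic iterative label propagation on a symmetric k-nearest-neighbor graph. It illustrates only a single semi-supervised subproblem and is not the paper's constraint propagation algorithm. The function names `knn_graph` and `propagate`, the binary edge weights, and the defaults `alpha=0.8` and `iters=50` are assumptions chosen for this example.

```python
import math


def knn_graph(points, k):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest neighbors."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in order[:k]:
            w[i][j] = w[j][i] = 1.0
    return w


def propagate(w, y, alpha=0.8, iters=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^{-1/2} W D^{-1/2}."""
    n = len(w)
    deg = [sum(row) or 1.0 for row in w]  # avoid dividing by zero for isolated nodes
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f
```

Here `y` holds the initial scores (for instance +1 and -1 on the few labeled points and 0 elsewhere), and the sign of each entry in the returned list gives the propagated label. This dense version costs O(n^2) per iteration; a sparse representation of the k-NN graph would reduce that.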

