no code implementations • COLING 2022 • Jingyuan Wen, Yutian Luo, Nanyi Fei, Guoxing Yang, Zhiwu Lu, Hao Jiang, Jie Jiang, Zhao Cao
In few-shot text classification, a feasible paradigm for deploying VL-PTMs is to align the input samples and their category names via the text encoders.
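As a rough illustration of this alignment paradigm (a sketch only, not the paper's exact method), a sample can be classified by encoding it and each prompted category name with the same text encoder and picking the most similar class; the `encode_text` function and the prompt template below are hypothetical placeholders.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def classify_by_name_alignment(sample, class_names, encode_text):
    """Pick the class whose (prompted) name embedding is closest to the sample embedding."""
    sample_emb = encode_text(sample)
    # Wrap each category name in a simple prompt before encoding (illustrative template).
    class_embs = {c: encode_text(f"This text is about {c}.") for c in class_names}
    return max(class_embs, key=lambda c: cosine(sample_emb, class_embs[c]))
```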
no code implementations • 15 Dec 2024 • Zelong Sun, Dong Jing, Guoxing Yang, Nanyi Fei, Zhiwu Lu
Composed Image Retrieval (CIR) aims to retrieve target images from a candidate set using a hybrid-modality query consisting of a reference image and a relative caption that describes the user's intent.
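A minimal late-fusion baseline for this setting (a sketch, not the method proposed in the paper) forms the query by mixing the reference-image and caption embeddings and ranks candidates by cosine similarity; the precomputed embeddings and the weight `alpha` are assumptions.

```python
import numpy as np

def cir_rank(ref_img_emb, caption_emb, candidate_embs, alpha=0.5):
    """Rank candidate images for a composed (image + text) query by cosine similarity."""
    query = alpha * ref_img_emb + (1.0 - alpha) * caption_emb
    query = query / (np.linalg.norm(query) + 1e-8)
    cands = candidate_embs / (np.linalg.norm(candidate_embs, axis=1, keepdims=True) + 1e-8)
    return np.argsort(-(cands @ query))  # candidate indices, best match first
```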
1 code implementation • 16 Nov 2024 • Jinqiang Long, Yanqi Dai, Guoxing Yang, Hongpeng Lin, Nanyi Fei, Yizhao Gao, Zhiwu Lu
As research on Multimodal Large Language Models (MLLMs) gains popularity, an advanced MLLM is typically required to handle various textual and visual tasks (e.g., VQA, Detection, OCR, and ChartQA) simultaneously for real-world applications.
Optical Character Recognition (OCR)
Visual Question Answering (VQA)
1 code implementation • 8 Aug 2024 • Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu
To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comprehensive framework, MMRole, for their development and evaluation, which comprises a personalized multimodal dataset and a robust evaluation method.
no code implementations • 7 Mar 2024 • Yanqi Dai, Dong Jing, Nanyi Fei, Zhiwu Lu
To mitigate this issue, we propose a novel Comprehensive Task Balancing (CoTBal) algorithm for multi-task visual instruction tuning of LMMs.
no code implementations • CVPR 2024 • Yutian Luo, Shiqi Zhao, Haoran Wu, Zhiwu Lu
To tackle these challenges, we introduce DECO (Dual-Enhanced Coreset Selection with Class-wise Collaboration), an approach that starts by establishing a class-wise balanced memory to address data imbalances, followed by a tailored class-wise gradient-based similarity scoring system that provides reasonable score guidance to all classes for refined coreset selection.
1 code implementation • 28 Jul 2023 • Yanqi Dai, Nanyi Fei, Zhiwu Lu
In this paper, following the loss balancing framework, we propose two novel improvable gap balancing (IGB) algorithms for MTL: one takes a simple heuristic, and the other (for the first time) deploys deep reinforcement learning for MTL.
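A heuristic sketch in the spirit of the first (non-RL) variant, assuming each task has an estimated lower bound on its loss so that the "improvable gap" is the current loss minus that bound; the exact weighting rule in the paper may differ.

```python
import numpy as np

def igb_weights(current_losses, ideal_losses, temperature=1.0):
    """Weight each task by its improvable gap (current loss minus an estimated lower
    bound): tasks with more room for improvement receive larger loss weights."""
    gaps = np.maximum(np.asarray(current_losses) - np.asarray(ideal_losses), 0.0)
    w = np.exp(gaps / temperature)
    return w / w.sum() * len(gaps)  # normalize so the weights average to 1.0

# total_loss = sum(w * l for w, l in zip(igb_weights(losses, bounds), losses))
```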
1 code implementation • 22 May 2023 • Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding
We also propose a unified spatial-temporal mask modeling mechanism, seamlessly integrated with the model, to cater to diverse video generation scenarios.
2 code implementations • 13 Feb 2023 • Haoyu Lu, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Wei Zhan, Masayoshi Tomizuka, Mingyu Ding
Particularly, on the MSR-VTT retrieval task, UniAdapter achieves 49.7% recall@1 with 2.2% of the model parameters, outperforming the latest competitors by 2.0%.
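UniAdapter belongs to the family of parameter-efficient adapters; the sketch below is a generic residual bottleneck adapter in PyTorch (not the paper's exact unimodal/cross-modal weight-sharing design), where only the small adapter layers are trained while the backbone stays frozen.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter: down-project, non-linearity, up-project,
    then add back to the frozen backbone's hidden states."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        nn.init.zeros_(self.up.weight)  # start as a (near-)identity mapping for stable training
        nn.init.zeros_(self.up.bias)

    def forward(self, x):  # x: (batch, seq_len, hidden_dim)
        return x + self.up(self.act(self.down(x)))
```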
1 code implementation • 14 Jan 2023 • Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu
Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall.
1 code implementation • 6 Jan 2023 • Chuhao Jin, Hongteng Xu, Ruihua Song, Zhiwu Lu
Poster generation is a significant task with a wide range of applications; it is often time-consuming and requires extensive manual editing and artistic experience.
no code implementations • 23 Sep 2022 • Haoyu Lu, Mingyu Ding, Nanyi Fei, Yuqi Huo, Zhiwu Lu
However, this hypothesis often fails for two reasons: (1) with the rich semantics of video content, it is difficult to cover all frames with a single video-level description; (2) a raw video typically contains noisy or meaningless information (e.g., scenery shots, transitions, or teasers).
4 code implementations • 12 Sep 2022 • Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, Ji-Rong Wen
Although artificial intelligence (AI) has made significant progress in understanding molecules across a wide range of fields, existing models generally acquire a single cognitive ability from a single molecular modality.
Ranked #12 on Molecule Captioning on ChEBI-20
1 code implementation • 17 Aug 2022 • Haoyu Lu, Qiongyi Zhou, Nanyi Fei, Zhiwu Lu, Mingyu Ding, Jingyuan Wen, Changde Du, Xin Zhao, Hao Sun, Huiguang He, Ji-Rong Wen
Further, from the perspective of neural encoding (based on our foundation model), we find that both visual and lingual encoders trained multimodally are more brain-like compared with unimodal ones.
no code implementations • CVPR 2022 • Haoyu Lu, Nanyi Fei, Yuqi Huo, Yizhao Gao, Zhiwu Lu, Ji-Rong Wen
Under a fair comparison setting, our COTS achieves the highest performance among all two-stream methods and comparable performance (but 10,800X faster in inference) w.r.t.
Ranked #24 on Video Retrieval on MSR-VTT
no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.
no code implementations • NeurIPS 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Nanyi Fei, Zhiwu Lu, Ji-Rong Wen, Ping Luo
To enhance the representation ability of the motion vectors, and hence the effectiveness of our method, we design a cross-guidance contrastive learning algorithm based on a multi-instance InfoNCE loss, where motion vectors can take supervision signals from RGB frames and vice versa.
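A simplified single-positive version of such a cross-guidance objective (the paper uses a multi-instance InfoNCE; this sketch assumes one matching RGB clip per motion-vector clip in the batch):

```python
import torch
import torch.nn.functional as F

def cross_guidance_infonce(mv_emb, rgb_emb, temperature=0.07):
    """Symmetric InfoNCE between motion-vector and RGB clip embeddings (both (B, D)):
    matching indices are positives, all other pairs in the batch are negatives,
    so each modality supervises the other."""
    mv = F.normalize(mv_emb, dim=-1)
    rgb = F.normalize(rgb_emb, dim=-1)
    logits = mv @ rgb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(mv.size(0), device=mv.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```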
1 code implementation • 27 Oct 2021 • Nanyi Fei, Zhiwu Lu, Yizhao Gao, Guoxing Yang, Yuqi Huo, Jingyuan Wen, Haoyu Lu, Ruihua Song, Xin Gao, Tao Xiang, Hao Sun, Ji-Rong Wen
To overcome this limitation and take a solid step towards artificial general intelligence (AGI), we develop a foundation model pre-trained with huge multimodal data, which can be quickly adapted for various downstream cognitive tasks.
no code implementations • ICLR 2022 • Jiechao Guan, Zhiwu Lu
Supposing the $n$ training tasks and the new task are sampled from the same environment, traditional meta learning theory derives an error bound on the expected loss over the new task in terms of the empirical training loss, uniformly over the set of all hypothesis spaces.
no code implementations • 29 Sep 2021 • Jiechao Guan, Zhiwu Lu, Yong Liu
In particular, we identify that when the number of training tasks is large, utilizing a prior generated from an informative hyperposterior can achieve the same order of PAC-Bayes-kl bound as that obtained by setting a localized distribution-dependent prior for a novel task.
2 code implementations • CVPR 2021 • Guoxing Yang, Nanyi Fei, Mingyu Ding, Guangzhen Liu, Zhiwu Lu, Tao Xiang
To overcome these limitations, we propose a novel latent space factorization model, called L2M-GAN, which is learned end-to-end and effective for editing both local and global attributes.
no code implementations • 14 Jun 2021 • Xu Han, Zhengyan Zhang, Ning Ding, Yuxian Gu, Xiao Liu, Yuqi Huo, Jiezhong Qiu, Yuan YAO, Ao Zhang, Liang Zhang, Wentao Han, Minlie Huang, Qin Jin, Yanyan Lan, Yang Liu, Zhiyuan Liu, Zhiwu Lu, Xipeng Qiu, Ruihua Song, Jie Tang, Ji-Rong Wen, Jinhui Yuan, Wayne Xin Zhao, Jun Zhu
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success and become a milestone in the field of artificial intelligence (AI).
1 code implementation • CVPR 2021 • Mingyu Ding, Xiaochen Lian, Linjie Yang, Peng Wang, Xiaojie Jin, Zhiwu Lu, Ping Luo
Last, we propose an efficient fine-grained search strategy to train HR-NAS, which effectively explores the search space and finds optimal architectures given various tasks and computation resources.
no code implementations • 1 May 2021 • Zhaoxin Fan, Zhenbo Song, Hongyan Liu, Zhiwu Lu, Jun He, Xiaoyong Du
Point-cloud-based large-scale place recognition is fundamental to many applications such as Simultaneous Localization and Mapping (SLAM).
Ranked #2 on 3D Place Recognition on Oxford RobotCar Dataset
1 code implementation • ICLR 2022 • Mingyu Ding, Yuqi Huo, Haoyu Lu, Linjie Yang, Zhe Wang, Zhiwu Lu, Jingdong Wang, Ping Luo
(4) Thorough studies of NCP on inter-, cross-, and intra-task settings highlight the importance of cross-task neural architecture design, i.e., multitask neural architectures and architecture transfer between different tasks.
2 code implementations • 11 Mar 2021 • Yuqi Huo, Manli Zhang, Guangzhen Liu, Haoyu Lu, Yizhao Gao, Guoxing Yang, Jingyuan Wen, Heng Zhang, Baogui Xu, Weihao Zheng, Zongzheng Xi, Yueqian Yang, Anwen Hu, Jinming Zhao, Ruichen Li, Yida Zhao, Liang Zhang, Yuqing Song, Xin Hong, Wanqing Cui, Danyang Hou, Yingyan Li, Junyi Li, Peiyu Liu, Zheng Gong, Chuhao Jin, Yuchong Sun, ShiZhe Chen, Zhiwu Lu, Zhicheng Dou, Qin Jin, Yanyan Lan, Wayne Xin Zhao, Ruihua Song, Ji-Rong Wen
We further construct a large Chinese multi-source image-text dataset called RUC-CAS-WenLan for pre-training our BriVL model.
Ranked #1 on Image Retrieval on RUC-CAS-WenLan
no code implementations • 23 Jan 2021 • Yizhao Gao, Nanyi Fei, Guangzhen Liu, Zhiwu Lu, Tao Xiang, Songfang Huang
First, data augmentations are introduced to both the support and query sets with each sample now being represented as an augmented embedding (AE) composed of concatenated embeddings of both the original and augmented versions.
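A rough sketch of the augmented-embedding idea, where `embed` and `augment` stand in for whatever feature extractor and augmentation are used (the prototype step below is the usual few-shot classification step, not a detail taken from the paper):

```python
import numpy as np

def augmented_embedding(x, embed, augment):
    """Represent a sample by concatenating the embeddings of its original and augmented views."""
    return np.concatenate([embed(x), embed(augment(x))], axis=-1)

def class_prototypes(support_aes, support_labels):
    """Mean augmented embedding per class, usable for nearest-prototype classification."""
    return {c: np.mean([e for e, y in zip(support_aes, support_labels) if y == c], axis=0)
            for c in sorted(set(support_labels))}
```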
no code implementations • ICCV 2021 • Nanyi Fei, Yizhao Gao, Zhiwu Lu, Tao Xiang
This means that these methods are prone to the hubness problem, that is, a certain class prototype becomes the nearest neighbor of many test instances regardless of which classes they belong to.
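Hubness can be made concrete with a small diagnostic (a sketch, not from the paper): count how often each prototype is the nearest neighbor over a set of test embeddings; a heavily skewed count distribution indicates hub prototypes.

```python
import numpy as np

def hub_counts(test_embs, prototypes):
    """test_embs: (N, D), prototypes: (C, D). Returns, for each prototype, how many
    test instances it 'wins' as nearest neighbor."""
    dists = np.linalg.norm(test_embs[:, None, :] - prototypes[None, :, :], axis=-1)  # (N, C)
    return np.bincount(dists.argmin(axis=1), minlength=prototypes.shape[0])
```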
no code implementations • 1 Jan 2021 • Yuqi Huo, Mingyu Ding, Haoyu Lu, Zhiwu Lu, Tao Xiang, Ji-Rong Wen, Ziyuan Huang, Jianwen Jiang, Shiwei Zhang, Mingqian Tang, Songfang Huang, Ping Luo
With the constrained jigsaw puzzles, instead of solving them directly, which could still be extremely hard, we carefully design four surrogate tasks that are more solvable but meanwhile still ensure that the learned representation is sensitive to spatiotemporal continuity at both the local and global levels.
1 code implementation • ICLR 2021 • Manli Zhang, Jianhong Zhang, Zhiwu Lu, Tao Xiang, Mingyu Ding, Songfang Huang
Importantly, at the episode level, two SSL-FSL hybrid learning objectives are devised: (1) the consistency across the predictions of an FSL classifier from different extended episodes is maximized as an episode-level pretext task.
no code implementations • ICLR 2021 • Nanyi Fei, Zhiwu Lu, Tao Xiang, Songfang Huang
Most recent few-shot learning (FSL) approaches are based on episodic training whereby each episode samples few training instances (shots) per class to imitate the test condition.
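For reference, episodic training boils down to repeatedly sampling N-way K-shot episodes, roughly as in the sketch below (the dataset format and split sizes are illustrative assumptions):

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one N-way K-shot episode from (sample, label) pairs:
    K support and n_query query instances per sampled class."""
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    classes = random.sample(list(by_class), n_way)
    support, query = [], []
    for c in classes:
        items = random.sample(by_class[c], k_shot + n_query)
        support += [(x, c) for x in items[:k_shot]]
        query += [(x, c) for x in items[k_shot:]]
    return support, query
```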
no code implementations • 2 Dec 2020 • Jiechao Guan, Zhiwu Lu, Tao Xiang, Timothy Hospedales
By transferring knowledge learned from seen/previous tasks, meta learning aims to generalize well to unseen/future tasks.
1 code implementation • CVPR 2021 • Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
VQA models may tend to rely on language bias as a shortcut and thus fail to sufficiently learn the multi-modal knowledge from both vision and language.
1 code implementation • 19 Mar 2020 • An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo
Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples.
no code implementations • 28 Feb 2020 • Jianhong Zhang, Manli Zhang, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
To address this problem, we propose a graph convolutional network (GCN)-based label denoising (LDN) method to remove the irrelevant images.
no code implementations • 11 Feb 2020 • Nanyi Fei, Zhiwu Lu, Yizhao Gao, Jia Tian, Tao Xiang, Ji-Rong Wen
In this paper, we argue that the inter-meta-task relationships should be exploited and those tasks are sampled strategically to assist in meta-learning.
no code implementations • 6 Feb 2020 • Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
Specifically, armed with a set-transformer-based attention module, we construct each episode with two sub-episodes without class overlap on the seen classes to simulate the domain shift between the seen and unseen classes.
2 code implementations • CVPR 2020 • Mingyu Ding, Yuqi Huo, Hongwei Yi, Zhe Wang, Jianping Shi, Zhiwu Lu, Ping Luo
3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information.
Ranked #17 on Vehicle Pose Estimation on KITTI Cars Hard
no code implementations • 28 Nov 2019 • Mingyu Ding, Zhe Wang, Bolei Zhou, Jianping Shi, Zhiwu Lu, Ping Luo
Moreover, our framework is able to utilize both labeled and unlabeled frames in the video through joint training, while no additional calculation is required in inference.
no code implementations • 27 Aug 2019 • Yuqi Huo, Xiaoli Xu, Yao Lu, Yulei Niu, Zhiwu Lu, Ji-Rong Wen
In addition to motion vectors, we also provide a temporal fusion method to explicitly induce the temporal context.
no code implementations • 8 Jul 2019 • Yulei Niu, Hanwang Zhang, Zhiwu Lu, Shih-Fu Chang
Specifically, our framework exploits the reciprocal relation between the referent and context, i.e., either of them influences estimation of the posterior distribution of the other, and thereby the search space of context can be greatly reduced.
no code implementations • CVPR 2019 • Mingyu Ding, An Zhao, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
To address the training data scarcity problem, our FFCSN model is trained with both meta learning and adversarial learning.
no code implementations • 11 Dec 2018 • Nanyi Fei, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
The standard approach to ZSL requires a set of training images annotated with seen class labels and a semantic descriptor for seen/unseen classes (the attribute vector being the most widely used).
1 code implementation • CVPR 2019 • Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen
Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.
Ranked #13 on Visual Dialog on VisDial v0.9 val
no code implementations • 19 Oct 2018 • Zhiwu Lu, Jiechao Guan, Aoxue Li, Tao Xiang, An Zhao, Ji-Rong Wen
Specifically, we assume that each synthesised data point can belong to any unseen class, and the two most likely class candidates are exploited to learn a robust projection function in a competitive fashion.
no code implementations • NeurIPS 2018 • An Zhao, Mingyu Ding, Jiechao Guan, Zhiwu Lu, Tao Xiang, Ji-Rong Wen
This is made possible by learning a projection between a feature space and a semantic space (e.g., attribute space).
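A common baseline form of such a projection (a sketch assuming a simple ridge-regression mapping, not necessarily the model used in this paper) maps visual features to attribute vectors and labels a test image with the nearest unseen-class attribute vector:

```python
import numpy as np

def learn_projection(X, S, lam=1.0):
    """Ridge regression from visual features X (N x d) to class attribute vectors S (N x a):
    W = (X^T X + lam I)^{-1} X^T S."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)

def predict_unseen(x, W, unseen_attrs):
    """Project a test feature into attribute space and return the nearest unseen class
    (unseen_attrs: dict mapping class name -> attribute vector)."""
    s = x @ W
    return min(unseen_attrs, key=lambda c: np.linalg.norm(s - unseen_attrs[c]))
```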
no code implementations • 19 Oct 2018 • Aoxue Li, Zhiwu Lu, Jiechao Guan, Tao Xiang, Li-Wei Wang, Ji-Rong Wen
Inspired by the fact that an unseen class is not exactly 'unseen' if it belongs to the same superclass as a seen class, we propose a novel inductive ZSL model that leverages superclasses as the bridge between seen and unseen classes to narrow the domain gap.
no code implementations • 8 Oct 2017 • Yanlei Yu, Zhiwu Lu, Jiajun Liu, Guoping Zhao, Ji-Rong Wen, Kai Zheng
We propose a novel network representation learning framework called RUM (network Representation learning throUgh Multi-level structural information preservation).
no code implementations • 5 Sep 2017 • Yulei Niu, Zhiwu Lu, Ji-Rong Wen, Tao Xiang, Shih-Fu Chang
In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from objects and scenes to abstract concepts; 2) how to annotate an image with the optimal number of class labels.
no code implementations • 4 Jul 2017 • Aoxue Li, Zhiwu Lu, Li-Wei Wang, Tao Xiang, Xinqi Li, Ji-Rong Wen
In this paper, to address the two issues, we propose a two-phase framework for recognizing images from unseen fine-grained classes, i.e., zero-shot fine-grained classification.
no code implementations • 19 Feb 2015 • Zhenyong Fu, Zhiwu Lu
As one of the most important types of (weaker) supervised information in machine learning and pattern recognition, the pairwise constraint, which specifies whether a pair of data points occurs together, has recently received significant attention, especially with regard to the problem of pairwise constraint propagation.
no code implementations • 18 Jan 2015 • Zhiwu Lu, Li-Wei Wang, Ji-Rong Wen
This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks of the visual BOW model, which has been widely used for image classification.
no code implementations • 18 Jan 2015 • Zhiwu Lu, Li-Wei Wang
This paper presents a graph-based learning approach to pairwise constraint propagation on multi-view data.
no code implementations • 7 Mar 2014 • Zhiwu Lu, Zhen-Yong Fu, Tao Xiang, Li-Wei Wang, Ji-Rong Wen
By oversegmenting all the images into regions, we formulate noisily tagged image parsing as a weakly supervised sparse learning problem over all the regions, where the initial labels of each region are inferred from image-level labels.
no code implementations • 22 Sep 2011 • Zhiwu Lu, Horace H. S. Ip, Yuxin Peng
This paper presents a novel pairwise constraint propagation approach by decomposing the challenging constraint propagation problem into a set of independent semi-supervised learning subproblems which can be solved in quadratic time using label propagation based on k-nearest neighbor graphs.
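Each such subproblem can be handled with standard label propagation on the k-NN graph, roughly as sketched below (the affinity matrix W and the initial label matrix encoding the constraints are assumed given; this is the textbook iteration, not necessarily the paper's exact solver):

```python
import numpy as np

def label_propagation(W, Y, alpha=0.99, n_iter=50):
    """Label propagation on a symmetric k-NN affinity matrix W (n x n):
    F <- alpha * S @ F + (1 - alpha) * Y, with S = D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F
```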