1 code implementation • 8 Dec 2022 • Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou
As a starting point, we provide presets of 7 different modalities and 23 highly-diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data.
1 code implementation • 2 Nov 2022 • An Yang, Junshu Pan, Junyang Lin, Rui Men, Yichang Zhang, Jingren Zhou, Chang Zhou
The tremendous success of CLIP (Radford et al., 2021) has promoted the research and application of contrastive learning for vision-language pretraining.
Ranked #1 on Zero-shot Image Retrieval on MUGE Retrieval
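The entry above builds on CLIP-style contrastive vision-language pretraining. As a rough illustration only (not the paper's actual code), the symmetric image-text contrastive objective can be sketched as below; the encoders producing the feature tensors are assumed to exist elsewhere.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_features, text_features: (batch, dim) tensors from separate encoders.
    Matching pairs share the same row index; all other rows act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = image_features @ text_features.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```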
2 code implementations • 7 Feb 2022 • Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization.
Ranked #1 on Referring Expression Comprehension on RefCoco+ (using extra training data)
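The entry above casts heterogeneous tasks into a single sequence-to-sequence framework. A minimal sketch of the idea follows; the task names, instruction strings, and interface here are illustrative assumptions, not the paper's actual code or prompts.

```python
# Illustrative only: the instructions and task names below are hypothetical.
# The point is that every task becomes text-in / text-out (optionally with an
# image), so one model and one training loop can cover all of them.
TASK_INSTRUCTIONS = {
    "caption": "what does the image describe?",
    "vqa": "answer the question: {question}",
    "grounding": 'which region does the text "{text}" describe?',
}

def build_prompt(task, **fields):
    """Turn a task plus its fields into a single instruction string."""
    return TASK_INSTRUCTIONS[task].format(**fields)

# e.g. build_prompt("vqa", question="how many dogs are there?")
# A single encoder-decoder model then consumes (image, prompt) and emits the
# answer, the caption, or region coordinates serialized as location tokens.
```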
no code implementations • 31 May 2021 • An Yang, Junyang Lin, Rui Men, Chang Zhou, Le Jiang, Xianyan Jia, Ang Wang, Jie Zhang, Jiamang Wang, Yong Li, Di Zhang, Wei Lin, Lin Qu, Jingren Zhou, Hongxia Yang
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters at constant computation cost, and have thus become a trend in model scaling.
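The entry above relies on sparse expert routing: total parameters grow with the number of experts, while per-token compute stays roughly constant because each token visits only a few experts. A toy top-1 routing layer (a sketch of the general technique, not the paper's implementation) might look like this:

```python
import torch
import torch.nn as nn

class Top1MoELayer(nn.Module):
    """Toy mixture-of-experts layer with top-1 routing.

    Total parameters scale with num_experts, but each token is processed by a
    single expert, so the per-token FLOPs stay roughly constant.
    """
    def __init__(self, d_model, d_hidden, num_experts):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)  # routing probabilities
        expert_idx = gate.argmax(dim=-1)       # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so routing stays differentiable.
                out[mask] = expert(x[mask]) * gate[mask, i].unsqueeze(-1)
        return out
```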
no code implementations • NeurIPS 2021 • Zhu Zhang, Jianxin Ma, Chang Zhou, Rui Men, Zhikang Li, Ming Ding, Jie Tang, Jingren Zhou, Hongxia Yang
Conditional image synthesis aims to create an image according to multi-modal guidance in the form of textual descriptions, reference images, and image blocks to preserve, as well as their combinations.
1 code implementation • ACL 2021 • Shuhuai Ren, Junyang Lin, Guangxiang Zhao, Rui Men, An Yang, Jingren Zhou, Xu sun, Hongxia Yang
To bridge the semantic gap between the two modalities, previous studies mainly focus on word-region alignment at the object level, and neglect the matching between the linguistic relations among words and the visual relations among regions.
Ranked #3 on Image-to-Text Retrieval on COCO
no code implementations • 1 Mar 2021 • Junyang Lin, Rui Men, An Yang, Chang Zhou, Ming Ding, Yichang Zhang, Peng Wang, Ang Wang, Le Jiang, Xianyan Jia, Jie Zhang, Jianwei Zhang, Xu Zou, Zhikang Li, Xiaodong Deng, Jie Liu, Jinbao Xue, Huiling Zhou, Jianxin Ma, Jin Yu, Yong Li, Wei Lin, Jingren Zhou, Jie Tang, Hongxia Yang
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which consists of over 1.9TB of images and 292GB of texts that cover a wide range of domains.
no code implementations • ACL 2016 • Lili Mou, Rui Men, Ge Li, Yan Xu, Lu Zhang, Rui Yan, Zhi Jin
In this paper, we propose the TBCNN-pair model to recognize entailment and contradiction between two sentences.
Ranked #88 on Natural Language Inference on SNLI
no code implementations • 25 Oct 2015 • Lili Mou, Rui Men, Ge Li, Lu Zhang, Zhi Jin
This paper envisions an end-to-end program generation scenario using recurrent neural networks (RNNs): users can express their intention in natural language, and an RNN then automatically generates the corresponding code in a character-by-character fashion.
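The last entry generates programs one character at a time with an RNN. A minimal sketch of character-level sampling follows, assuming a generic recurrent decoder interface and vocabulary maps that are placeholders rather than the paper's actual setup.

```python
import torch

def generate_chars(model, char_to_id, id_to_char, prompt, max_len=200):
    """Sample text one character at a time from a trained char-level RNN.

    `model(input_ids, hidden)` is assumed to return (logits over characters,
    new hidden state); the vocabulary maps are hypothetical placeholders.
    """
    hidden = None
    generated = list(prompt)
    # Feed the prompt through the network to build up the hidden state.
    for ch in prompt:
        logits, hidden = model(torch.tensor([char_to_id[ch]]), hidden)
    # Then repeatedly sample the next character from the output distribution.
    for _ in range(max_len):
        probs = torch.softmax(logits.squeeze(), dim=-1)
        next_id = torch.multinomial(probs, num_samples=1).item()
        generated.append(id_to_char[next_id])
        logits, hidden = model(torch.tensor([next_id]), hidden)
    return "".join(generated)
```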