1 code implementation • 7 Sep 2024 • Sai Yang, Bin Hu, Bojun Zhou, Fan Liu, Xiaoxin Wu, Xinsong Zhang, Juping Gu, Jun Zhou
To circumvent this problem, we propose a new task of Power Line Aerial Image Restoration under Adverse Weather (PLAIR-AW), which aims to recover clean, high-quality images from images degraded by adverse weather, thereby improving detection performance on power line aerial images (PLAI).
1 code implementation • 12 Jan 2023 • Xinsong Zhang, Yan Zeng, Jipeng Zhang, Hang Li
X-FM has one language encoder, one vision encoder, and one fusion encoder, as well as a new training method (a minimal sketch of this three-encoder layout follows this entry).
Ranked #3 on Visual Reasoning on NLVR2 Test
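The X-FM entry above mentions a language encoder, a vision encoder, and a fusion encoder. Below is a minimal sketch of how such a three-encoder forward pass could be wired together; the class name, concatenation-based fusion, and transformer settings are illustrative assumptions, not the released X-FM implementation.

```python
import torch
import torch.nn as nn

class ThreeEncoderVLM(nn.Module):
    """Hypothetical three-encoder layout: language, vision, and fusion encoders."""
    def __init__(self, dim=768, heads=12, depth=2):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.language_encoder = nn.TransformerEncoder(make_layer(), depth)
        self.vision_encoder = nn.TransformerEncoder(make_layer(), depth)
        # The fusion encoder here simply attends over the concatenated sequences;
        # this is a simplification, not X-FM's actual fusion mechanism.
        self.fusion_encoder = nn.TransformerEncoder(make_layer(), depth)

    def forward(self, text_embeds, image_embeds):
        t = self.language_encoder(text_embeds)   # unimodal text representations
        v = self.vision_encoder(image_embeds)    # unimodal image representations
        fused = self.fusion_encoder(torch.cat([t, v], dim=1))  # cross-modal representations
        return t, v, fused

# Toy usage with pre-embedded inputs: batch of 2, 16 text tokens, 49 image patches.
model = ThreeEncoderVLM()
t, v, fused = model(torch.randn(2, 16, 768), torch.randn(2, 49, 768))
```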
2 code implementations • 22 Nov 2022 • Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou
Vision-language pre-training aims to learn alignments between vision and language from a large amount of data (a generic contrastive-alignment sketch follows this entry).
Ranked #1 on Cross-Modal Retrieval on Flickr30k (using extra training data)
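The entry above is about learning vision-language alignments from paired data. A common way to express such an alignment objective is a symmetric image-text contrastive (InfoNCE-style) loss over a batch of matched pairs; the sketch below illustrates that generic idea and is not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs (generic sketch)."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature             # scaled cosine similarities
    targets = torch.arange(len(img))                 # i-th image matches i-th caption
    loss_i2t = F.cross_entropy(logits, targets)      # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random 256-dimensional features for 8 image-text pairs.
loss = image_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```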
1 code implementation • 14 Oct 2022 • Tiannan Wang, Wangchunshu Zhou, Yan Zeng, Xinsong Zhang
Pre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks.
1 code implementation • 15 Jun 2022 • Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang
In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.
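DaVinci, as described above, is trained with prefix language modeling and prefix image modeling. The sketch below shows only the text-side prefix LM idea, where the prefix is visible as context and only the suffix tokens contribute to the next-token loss; the function name, masking scheme, and toy shapes are assumptions for illustration, and prefix image modeling would apply the same recipe to discretized image tokens.

```python
import torch
import torch.nn.functional as F

def prefix_lm_loss(logits, tokens, prefix_len):
    """Prefix LM sketch: the prefix is given as context, and only suffix tokens
    contribute to the next-token prediction loss.
    logits: (batch, seq_len, vocab) predictions for the next token at each position.
    tokens: (batch, seq_len) full sequence (prefix followed by suffix)."""
    shift_logits = logits[:, :-1, :]               # predict token t+1 from position t
    shift_labels = tokens[:, 1:].clone()
    shift_labels[:, : prefix_len - 1] = -100       # ignore targets inside the prefix
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Toy usage: batch of 2, sequence length 10, vocab 100, first 4 tokens form the prefix.
loss = prefix_lm_loss(torch.randn(2, 10, 100), torch.randint(0, 100, (2, 10)), prefix_len=4)
```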
1 code implementation • 1 Jun 2022 • Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang
To this end, the cross-view language modeling framework treats both multi-modal data (i.e., image-caption pairs) and multilingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.
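For the cross-view language modeling entry above, the contrastive part can be written like the InfoNCE sketch earlier on this page, while conditional masked language modeling masks tokens in one view and predicts them conditioned on the other view. The sketch below is a rough illustration of that second objective; the `model` callable, mask id, and vocabulary size are placeholders rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def conditional_mlm_loss(model, view_a_tokens, view_b_feats,
                         mask_prob=0.15, mask_id=103, vocab=30522):
    """Conditional MLM sketch: mask tokens in one view and predict them given the
    other view (e.g., image patches or a parallel sentence).
    `model(tokens, context)` is a hypothetical fusion model returning (B, L, vocab) logits."""
    mask = torch.rand(view_a_tokens.shape) < mask_prob
    corrupted = view_a_tokens.masked_fill(mask, mask_id)   # replace with [MASK]
    logits = model(corrupted, view_b_feats)                # predict conditioned on view B
    labels = view_a_tokens.masked_fill(~mask, -100)        # loss only on masked slots
    return F.cross_entropy(logits.reshape(-1, vocab), labels.reshape(-1), ignore_index=-100)

# Toy usage with a stand-in "model" that ignores the conditioning view.
toy_model = lambda tokens, ctx: torch.randn(tokens.size(0), tokens.size(1), 30522)
loss = conditional_mlm_loss(toy_model, torch.randint(0, 30522, (2, 12)), torch.randn(2, 49, 768))
```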
1 code implementation • 30 May 2022 • Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang
We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.
1 code implementation • 16 Nov 2021 • Yan Zeng, Xinsong Zhang, Hang Li
Most existing methods in vision-language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts (a sketch of this word-region alignment setup follows this entry).
Ranked #1 on Image Retrieval on Flickr30K 1K test (using extra training data)
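The entry above contrasts with methods that rely on object-centric features from an object detector and fine-grained word-region alignment. The sketch below illustrates that object-centric baseline in its simplest form (best-matching region per word); it is a generic illustration, not X-VLM's multi-grained method.

```python
import torch
import torch.nn.functional as F

def word_region_alignment(word_feats, region_feats):
    """Generic fine-grained alignment score between text tokens and detected object
    regions: each word is matched to its most similar region, and the scores are
    averaged over words."""
    w = F.normalize(word_feats, dim=-1)      # (num_words, dim)
    r = F.normalize(region_feats, dim=-1)    # (num_regions, dim)
    sims = w @ r.t()                         # word-region cosine similarities
    return sims.max(dim=1).values.mean()     # best region per word, averaged

# Toy usage: 12 word features against 36 detected-region features.
score = word_region_alignment(torch.randn(12, 256), torch.randn(36, 256))
```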
no code implementations • Findings of the Association for Computational Linguistics 2020 • Pengshuai Li, Xinsong Zhang, Weijia Jia, Wei Zhao
Distant supervision has been a widely used method for neural relation extraction because of the convenience of automatically labeling datasets.
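Distant supervision, as mentioned above, labels training data automatically by aligning knowledge-base triples with raw text. A minimal sketch of the standard heuristic (any sentence containing both entities of a triple is weakly labeled with the triple's relation) is shown below; the function name and data layout are illustrative.

```python
def distant_supervision_label(sentences, kb_triples):
    """Distant supervision sketch: a sentence mentioning both the head and the tail
    entity of a knowledge-base triple is weakly labeled with that triple's relation."""
    labeled = []
    for head, relation, tail in kb_triples:
        for sent in sentences:
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, relation))
    return labeled

# Toy usage: one knowledge-base triple and two candidate sentences.
triples = [("Barack Obama", "born_in", "Honolulu")]
sents = ["Barack Obama was born in Honolulu.", "Honolulu hosted the meeting."]
print(distant_supervision_label(sents, triples))
```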
no code implementations • Findings (ACL) 2021 • Xinsong Zhang, Pengshuai Li, Hang Li
In fact, both fine-grained and coarse-grained tokenizations have advantages and disadvantages for the learning of pre-trained language models.
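The entry above concerns combining fine-grained and coarse-grained tokenizations of the same input. The sketch below shows the two views such a model might consume; the character split and whitespace word split are stand-ins for real subword and phrase tokenizers.

```python
def multi_grained_tokenize(sentence):
    """Sketch of producing a fine-grained and a coarse-grained view of one input.
    Character and whitespace splits are illustrative stand-ins for real tokenizers."""
    fine = list(sentence.replace(" ", ""))   # fine-grained: individual characters
    coarse = sentence.split()                # coarse-grained: whitespace words
    return fine, coarse

fine, coarse = multi_grained_tokenize("pre trained language models")
print(fine[:6], coarse)
```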
no code implementations • NAACL 2019 • Pengshuai Li, Xinsong Zhang, Weijia Jia, Hai Zhao
Distant supervision has recently been widely used for relation extraction tasks that lack hand-labeled datasets.
no code implementations • 11 Nov 2018 • Xinsong Zhang, Pengshuai Li, Weijia Jia, Hai Zhao
Disclosing multiple overlapping relations in a sentence remains challenging.
no code implementations • EMNLP 2018 • Tianyi Liu, Xinsong Zhang, Wanhao Zhou, Weijia Jia
Extracting relations is critical for knowledge base completion and construction, in which distantly supervised methods are widely used to extract relational facts automatically from existing knowledge bases.
Ranked #1 on Relationship Extraction (Distant Supervised) on New York Times Corpus (Average Precision metric)