Human evaluation also shows that our model generates relevant and informative questions.
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).
The results demonstrate the effectiveness of our method on logical reasoning over KGs in both inductive and transductive settings.
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance.
Recent pre-trained language models (PLMs) equipped with foundation reasoning skills have shown remarkable performance on downstream complex tasks.
In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE.
1 code implementation • 7 Nov 2022 • Andrey Ignatov, Radu Timofte, Shuai Liu, Chaoyu Feng, Furui Bai, Xiaotao Wang, Lei Lei, Ziyao Yi, Yan Xiang, Zibin Liu, Shaoqing Li, Keming Shi, Dehui Kong, Ke Xu, Minsu Kwon, Yaqi Wu, Jiesi Zheng, Zhihao Fan, Xun Wu, Feng Zhang, Albert No, Minhyeok Cho, Zewen Chen, Xiaze Zhang, Ran Li, Juan Wang, Zhiming Wang, Marcos V. Conde, Ui-Jin Choi, Georgy Perevozchikov, Egor Ershov, Zheng Hui, Mengchuan Dong, Xin Lou, Wei Zhou, Cong Pang, Haina Qin, Mingxuan Cai
The role of mobile cameras has grown dramatically over the past few years, driving increasing research in automatic image quality enhancement and RAW photo processing.
In this paper, we propose an interpretable stepwise reasoning framework that incorporates both single-hop supporting-sentence identification and single-hop question generation at each intermediate step, using the inference of the current hop to inform the next until the final result is reached.
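The hop-by-hop loop described above can be sketched as follows. This is a minimal illustration only: the helper functions `identify_supporting_sentence` and `generate_single_hop_question` are hypothetical stand-ins (here simple word-overlap heuristics), not the paper's actual models.

```python
# Minimal sketch of stepwise multi-hop reasoning; all helper logic is
# a toy stand-in, not the paper's implementation.

def identify_supporting_sentence(context, question):
    # Toy heuristic: pick the sentence sharing the most words with the question.
    q_words = set(question.lower().split())
    return max(context, key=lambda s: len(q_words & set(s.lower().split())))

def generate_single_hop_question(sentence):
    # Placeholder single-hop question about the selected sentence.
    return f"What does this state: '{sentence}'?"

def stepwise_reasoning(context, question, num_hops=2):
    intermediate = question
    support = None
    for hop in range(num_hops):
        # Step 1: find the sentence that supports the current hop.
        support = identify_supporting_sentence(context, intermediate)
        # Step 2: turn it into a single-hop question that drives the next hop.
        intermediate = generate_single_hop_question(support)
        print(f"hop {hop + 1}: support={support!r}")
    return support  # the sentence grounding the final answer

context = [
    "Paris is the capital of France.",
    "France borders Spain.",
]
final = stepwise_reasoning(context, "Which country's capital is Paris?", num_hops=1)
```

Each hop thus produces an interpretable intermediate artifact (a supporting sentence and a sub-question) rather than a single opaque prediction.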
These two steps are iteratively performed in our framework for continuous learning.
In MVPTR, we follow the nested structure of both modalities to introduce concepts as high-level semantics.
We propose our TAiloring neGative Sentences with Discrimination and Correction (TAGS-DC) to generate synthetic sentences automatically as negative samples.
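The idea of automatically tailoring negative sentences can be illustrated with a generic word-replacement corruption; this is a simplified stand-in for the discrimination-and-correction pipeline, not the actual TAGS-DC procedure, and the replacement table is invented for the example.

```python
import random

# Illustrative sketch: mine a synthetic negative sentence by swapping one
# word in a matched sentence (generic stand-in, not the TAGS-DC method).

def make_negative(sentence, replacements, rng):
    words = sentence.split()
    # Positions where a corrupting replacement is available.
    idx = [i for i, w in enumerate(words) if w in replacements]
    i = rng.choice(idx)
    words[i] = replacements[words[i]]
    return " ".join(words)

rng = random.Random(0)
swap = {"dog": "cat", "red": "blue"}  # hypothetical replacement table
neg = make_negative("a red dog sleeps", swap, rng)
```

The resulting sentence differs from the original in exactly one word, giving a hard negative that is lexically close to the positive caption.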
We study the problem of coarse-grained response selection in retrieval-based dialogue systems.
Existing research for image text retrieval mainly relies on sentence-level supervision to distinguish matched and mismatched sentences for a query image.
Considering that theme concepts can be learned from both images and captions, we propose two settings for their representation learning based on TTN.
Logical reasoning over text requires understanding the critical logical information in the text and performing inference over it.
Ranked #7 on Reading Comprehension on ReClor
We therefore introduce a new layer, the dynamic mask attention network (DMAN), with a learnable mask matrix that models localness adaptively.
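The mechanism of adding a learnable mask to attention scores can be sketched as below. This is a simplified NumPy illustration under my own assumptions (single head, an additive mask of logits initialized to a local band); the shapes, names, and initialization are illustrative, not the exact DMAN layer.

```python
import numpy as np

# Sketch: scaled dot-product attention with an additive mask matrix.
# In a trained model the mask logits would be learnable parameters.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_attention(Q, K, V, mask_logits):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # standard scaled dot-product scores
    scores = scores + mask_logits   # additive bias; large negative entries
                                    # suppress attention to distant positions
    return softmax(scores) @ V

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Band-shaped initialization favoring a local window of radius 1.
offsets = np.subtract.outer(np.arange(T), np.arange(T))
mask_logits = np.where(np.abs(offsets) <= 1, 0.0, -1e9)

out = masked_attention(Q, K, V, mask_logits)
```

Because the mask enters the logits additively, gradient descent can reshape the band during training, which is what lets localness be modeled adaptively rather than fixed by hand.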
Ranked #11 on Machine Translation on WMT2014 English-German
In this paper, we focus on the problem of unsupervised image-sentence matching.
Commonsense generation aims at generating a plausible everyday scenario description based on a set of provided concepts.
Existing research usually employs a CNN-RNN architecture that views the generation as a sequential decision-making process, with the entire dataset vocabulary used as the decoding space.
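Decoding as sequential decision-making over the whole vocabulary can be sketched as a greedy loop; the scorer below is a hypothetical stand-in for CNN-RNN logits, invented purely to make the example runnable.

```python
# Toy sketch of decoding as sequential decision-making: at every step
# one word is chosen from the entire vocabulary (fake logits stand in
# for a real CNN-RNN scorer).

vocab = ["<eos>", "a", "dog", "runs", "fast"]

def fake_logits(prefix):
    # Hypothetical scorer preferring "a dog runs", then stopping.
    target = ["a", "dog", "runs"]
    if len(prefix) < len(target):
        return [1.0 if w == target[len(prefix)] else 0.0 for w in vocab]
    return [1.0 if w == "<eos>" else 0.0 for w in vocab]

def greedy_decode(max_len=10):
    prefix = []
    for _ in range(max_len):
        logits = fake_logits(prefix)
        # Greedy decision: argmax over the full vocabulary.
        word = vocab[max(range(len(vocab)), key=logits.__getitem__)]
        if word == "<eos>":
            break
        prefix.append(word)
    return prefix

print(greedy_decode())  # → ['a', 'dog', 'runs']
```

The point of the sketch is the search space: every step considers the full vocabulary, which is what restricting the decoding space (as motivated above) would change.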
Visual Question Generation (VQG) aims to ask natural questions about an image automatically.
This paper presents the UIR-Miner system for the emotion and sentiment analysis evaluation on Twitter at SemEval 2018.