Specifically, we first define ten types of relations for the ASTE task, and then adopt a biaffine attention module to embed these relations as an adjacency tensor over the words in a sentence.
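The biaffine scoring idea can be illustrated with a minimal numpy sketch: every word pair (i, j) gets a score under each of the r relation types, yielding an (n, n, r) adjacency tensor. The parameter shapes and the exact bilinear-plus-linear form here are illustrative assumptions, not the paper's precise architecture.

```python
import numpy as np

def biaffine_relation_tensor(H, U, W, b):
    """Score every word pair under each relation type.

    H: (n, d)    word representations
    U: (r, d, d) bilinear weights, one matrix per relation
    W: (r, 2*d)  linear weights on the concatenated pair
    b: (r,)      per-relation biases
    Returns an (n, n, r) adjacency tensor of relation scores.
    """
    n, d = H.shape
    # Bilinear term: h_i^T U_k h_j for all i, j, k.
    bilinear = np.einsum('id,kde,je->ijk', H, U, H)
    # Linear term on the concatenated pair [h_i; h_j].
    pair = np.concatenate(
        [np.repeat(H[:, None, :], n, axis=1),
         np.repeat(H[None, :, :], n, axis=0)], axis=-1)  # (n, n, 2d)
    linear = pair @ W.T                                  # (n, n, r)
    return bilinear + linear + b

rng = np.random.default_rng(0)
scores = biaffine_relation_tensor(rng.normal(size=(5, 8)),
                                  rng.normal(size=(10, 8, 8)),
                                  rng.normal(size=(10, 16)),
                                  np.zeros(10))
print(scores.shape)  # (5, 5, 10)
```

In practice the tensor would be produced from contextual encoder states and decoded into aspect-opinion-sentiment triplets; this sketch only shows the pairwise scoring step.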
Generalized zero-shot text classification aims to classify textual instances from both previously seen classes and incrementally emerging unseen classes.
A key component of the GSC-attention is grouped attention: token-level attention constrained within each input attribute, which enables our proposed model to capture both local and global context.
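The within-attribute constraint amounts to masking attention so that a token can only attend to tokens sharing its attribute group. Below is a minimal numpy sketch of that masking step (single head, no learned projections, and the global-context path omitted), as an assumption about the mechanism rather than the model's exact implementation.

```python
import numpy as np

def grouped_attention(Q, K, V, groups):
    """Softmax attention where each token attends only to tokens
    in its own attribute group.

    Q, K, V: (n, d) projected token representations
    groups:  (n,)   integer attribute id per token
    """
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                  # (n, n) raw scores
    mask = groups[:, None] == groups[None, :]      # within-group pairs
    scores = np.where(mask, scores, -np.inf)       # block cross-group attention
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
groups = np.array([0, 0, 1, 1, 1, 2])  # three input attributes
out = grouped_attention(X, X, X, groups)
print(out.shape)  # (6, 4)
```

A token that is alone in its group (here the last one) can only attend to itself, so its output equals its own value vector, which makes the masking easy to sanity-check.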
A reliable clustering algorithm for task-oriented dialogues can help developers analyze and define dialogue tasks efficiently.
We also introduce a new metric, Cross-Model Distance (CMD), for simultaneously evaluating image quality and image-text consistency.
It first models semantic, spatial, and implicit visual relations in images with three graph attention networks; question information is then used to guide the aggregation process of the three graphs. Further, our QD-GFN adopts an object-filtering mechanism to remove question-irrelevant objects from the image.
Most existing approaches to Visual Question Answering (VQA) answer questions directly. However, people usually decompose a complex question into a sequence of simple sub-questions and obtain the answer to the original question only after answering the sub-question sequence (SQS).
We identify new linguistic phenomena and interaction patterns in SSTOD that raise critical challenges for building dialog agents for the task.
Ranked #1 on SSTOD on SSD_NAME
Visual dialog has witnessed great progress since various vision-oriented goals were introduced into the conversation, notably GuessWhich and GuessWhat, where the sole image is visible to only one of, or to both of, the questioner and the answerer, respectively.
Keyphrases provide accurate information about document content: they are highly compact, concise, rich in meaning, and widely used for discourse comprehension, organization, and text retrieval.
To capture the diverse topic information of a conversation and outline salient facts for the captured topics, this work proposes two topic-aware contrastive learning objectives, namely coherence detection and sub-summary generation, which are expected to implicitly model topic changes and handle the information-scattering challenge in dialogue summarization.
Ranked #1 on Text Summarization on SAMSum Corpus
To enhance the VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns an entity-based questioning strategy from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and optimized especially for the VD setting.
To overcome these challenges, in this paper we propose a Dual Graph Convolutional Network (DualGCN) model that simultaneously considers the complementarity of syntax structures and semantic correlations.
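The dual-branch idea can be sketched as two GCN passes over the same token representations: one over a dependency-tree adjacency, one over a "semantic" adjacency induced by self-attention, with the branch outputs combined. The layer form (row-normalized adjacency, ReLU, concatenation) is a common GCN convention used here as an illustrative assumption, not DualGCN's exact formulation.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer with self-loops and row-normalized adjacency."""
    A_hat = A + np.eye(A.shape[0])                     # add self-loops
    A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(A_norm @ H @ W, 0.0)             # ReLU

def dual_gcn(H, A_syn, W_syn, W_sem, Wq, Wk):
    """Syntactic branch over the dependency adjacency A_syn, semantic
    branch over an attention-induced adjacency; outputs concatenated."""
    d = H.shape[1]
    scores = (H @ Wq) @ (H @ Wk).T / np.sqrt(d)
    A_sem = np.exp(scores - scores.max(axis=1, keepdims=True))
    A_sem /= A_sem.sum(axis=1, keepdims=True)          # soft semantic edges
    H_syn = gcn_layer(A_syn, H, W_syn)
    H_sem = gcn_layer(A_sem, H, W_sem)
    return np.concatenate([H_syn, H_sem], axis=-1)

rng = np.random.default_rng(2)
n, d = 5, 8
A_syn = (rng.random((n, n)) < 0.3).astype(float)  # stand-in dependency edges
out = dual_gcn(rng.normal(size=(n, d)), A_syn,
               rng.normal(size=(d, d)), rng.normal(size=(d, d)),
               rng.normal(size=(d, d)), rng.normal(size=(d, d)))
print(out.shape)  # (5, 16)
```

The point of the two adjacencies is complementarity: the syntactic branch follows hard parse edges, while the semantic branch follows soft, learned similarity.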
Experimental results show that our method achieves performance comparable to the original LXMERT model on all downstream tasks, and even outperforms the original model on the Image-Text Retrieval task.
In Reinforcement Learning, it is crucial to represent states and assign rewards based on the action-caused transitions of states.
We propose a novel task, Multi-Document Driven Dialogue (MD3), in which an agent can guess the target document that the user is interested in by leading a dialogue.
To tackle this problem, a weakly supervised cropping framework is proposed, in which the distribution dissimilarity between high-quality images and cropped images guides the training of the coordinate predictor, so that ground-truth cropping windows are not required.
Emotion Recognition in Conversations (ERC) is essential for building empathetic human-machine systems.
Ranked #6 on Emotion Recognition in Conversation on IEMOCAP
In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states.
A major challenge of multi-label text classification (MLTC) is to simultaneously exploit possible label differences and label correlations.
Ranked #1 on Multi-Label Text Classification on AAPD (Micro F1 metric)
In this paper, we propose a novel approach for KG entity typing which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge from KGs.
We present a new CNN model, named cycle CNN, which can directly use the real data from monochrome-color camera systems for training.
The task of paragraph image captioning aims to generate a coherent paragraph describing a given image.
Dialogue embeddings are learned by an LSTM in the middle of the network and updated by feeding in all turn embeddings.
In recommender systems, a user's ratings for most items are usually missing, and a critical problem is that in reality these missing ratings are often missing not at random (MNAR).
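A standard tool for handling MNAR feedback (shown here as general background, not necessarily this paper's method) is inverse-propensity scoring: observed errors are reweighted by the probability that each entry was observed, giving an unbiased estimate of the full-matrix error. The function below is a minimal numpy sketch assuming the propensities are known or already estimated.

```python
import numpy as np

def ips_mse(ratings, preds, observed, propensity):
    """Inverse-propensity-scored MSE over a partially observed matrix.

    ratings:    (u, i) true ratings (only observed entries are used)
    preds:      (u, i) model predictions
    observed:   (u, i) 0/1 mask of which entries were observed
    propensity: (u, i) P(entry is observed), assumed known/estimated
    """
    sq_err = (ratings - preds) ** 2
    return np.sum(observed * sq_err / propensity) / ratings.size

# Toy comparison against the naive estimator that ignores missingness.
rng = np.random.default_rng(3)
R = rng.integers(1, 6, size=(100, 50)).astype(float)
P = np.full(R.shape, 0.25)                       # uniform 25% observation rate
O = (rng.random(R.shape) < P).astype(float)
naive = np.sum(O * (R - 3.0) ** 2) / R.size      # biased toward zero
ips = ips_mse(R, np.full(R.shape, 3.0), O, P)    # reweighted estimate
```

With uniform propensity the IPS estimate is exactly the naive one rescaled by 1/0.25; with non-uniform, selection-dependent propensities the two estimators genuinely diverge, which is the MNAR problem the snippet refers to.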
This paper presents a strong baseline for real-world visual reasoning (GQA), which achieves 60.93% in the GQA 2019 challenge and won sixth place.
An alternative method is to adversarially train the classifier against a discriminator in a two-player game akin to generative adversarial networks (GANs), which can ensure that the classifier learns the true data distribution at the equilibrium of the game.
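The two-player game follows the familiar GAN-style minimax objective. In the notation below (ours, not necessarily the paper's), $G$ is the generating side of the game, $D$ the discriminator, and the equilibrium argument is that the objective is minimized exactly when the generated distribution matches $p_{\text{data}}$:

```latex
\min_{G}\;\max_{D}\;
\mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
+ \mathbb{E}_{z \sim p_{z}}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

At the optimum of the inner maximization, the value of the game reduces to a divergence between the data and model distributions, which is zero only at the claimed equilibrium.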
A chain of reasoning (CoR) is constructed for supporting multi-step and dynamic reasoning on changed relations and objects.
10 code implementations • 1 Jul 2013 • Ian J. Goodfellow, Dumitru Erhan, Pierre Luc Carrier, Aaron Courville, Mehdi Mirza, Ben Hamner, Will Cukierski, Yichuan Tang, David Thaler, Dong-Hyun Lee, Yingbo Zhou, Chetan Ramaiah, Fangxiang Feng, Ruifan Li, Xiaojie Wang, Dimitris Athanasakis, John Shawe-Taylor, Maxim Milakov, John Park, Radu Ionescu, Marius Popescu, Cristian Grozea, James Bergstra, Jingjing Xie, Lukasz Romaszko, Bing Xu, Zhang Chuang, Yoshua Bengio
The ICML 2013 Workshop on Challenges in Representation Learning focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge.
Ranked #1 on Facial Expression Recognition on FER2013 (using extra training data)