Inspired by the finding of (CITATION) that entities are the most informative elements of an image, we propose an explicit entity-level cross-modal learning approach that augments entity representations.
More importantly, it demonstrates that decoding a specific word from a large vocabulary based on its corresponding brain activity is feasible.
Medical entity normalization, which links medical mentions in the text to entities in knowledge bases, is an important research topic in medical natural language processing.
Based on these results, we recommend a block-wise cross-validation training method and an adequate data size for improving the performance of linear encoding models.
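To make the setup concrete, below is a minimal sketch of block-wise cross-validation for a linear encoding model, assuming time-aligned stimulus features `X` and voxel responses `Y`; the ridge regression, block count, and per-voxel correlation metric are illustrative choices, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import Ridge

def block_cv_score(X, Y, n_blocks=5, alpha=1.0):
    """Block-wise cross-validation for a linear encoding model.

    Contiguous time blocks are held out in turn, so temporally
    autocorrelated samples never straddle the train/test boundary
    (unlike a shuffled K-fold split).
    """
    T = X.shape[0]
    bounds = np.linspace(0, T, n_blocks + 1, dtype=int)
    scores = []
    for i in range(n_blocks):
        test = np.arange(bounds[i], bounds[i + 1])
        train = np.setdiff1d(np.arange(T), test)
        model = Ridge(alpha=alpha).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Mean per-voxel Pearson correlation on the held-out block.
        r = [np.corrcoef(pred[:, v], Y[test][:, v])[0, 1]
             for v in range(Y.shape[1])]
        scores.append(np.mean(r))
    return float(np.mean(scores))
```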
Existing zero-shot methods fail to align the speech and text modalities in a shared semantic space, resulting in performance far below that of supervised ST methods.
Therefore, we propose a novel role-interaction-enhanced method for role-oriented dialogue summarization.
Specifically, we assume that each learnable prompt token contributes differently to different instances, and we learn this contribution by computing a relevance score between the instance and each prompt token.
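A minimal sketch of this instance-dependent weighting, in PyTorch, is shown below; the scaled dot-product relevance score and softmax normalization are assumptions made for illustration, since the exact scoring function is not specified here.

```python
import torch
import torch.nn.functional as F

class InstanceAwarePrompt(torch.nn.Module):
    """Weights each learnable prompt token by its relevance to an instance.

    Illustrative sketch: relevance is a scaled dot product between the
    instance representation and each prompt embedding, normalized with a
    softmax; other scoring functions would fit the same scheme.
    """
    def __init__(self, n_tokens: int, dim: int):
        super().__init__()
        self.prompt = torch.nn.Parameter(torch.randn(n_tokens, dim) * 0.02)

    def forward(self, instance_repr: torch.Tensor) -> torch.Tensor:
        # instance_repr: (batch, dim), e.g. a pooled encoder state.
        scores = instance_repr @ self.prompt.T / self.prompt.shape[-1] ** 0.5
        weights = F.softmax(scores, dim=-1)             # (batch, n_tokens)
        # Scale each prompt token by its instance-specific contribution.
        return weights.unsqueeze(-1) * self.prompt      # (batch, n_tokens, dim)
```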
To address this problem, we propose a novel model, DyMen, which dynamically adjusts the subsequent linking target based on previously linked entities via reinforcement learning, enabling it to select link targets that make full use of previously linked information.
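For illustration only, the sketch below shows one way sequential linking could condition on previously linked entities; the GRU history state, dot-product scoring, and categorical sampling are hypothetical design choices, not necessarily those of DyMen.

```python
import torch

class SequentialLinker(torch.nn.Module):
    """Hypothetical sketch of sequential entity linking with history.

    Each step scores candidate entities against the current mention plus
    a running summary of previously linked entities; in an RL setup the
    sampled choices would be reinforced by a downstream linking reward.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.history = torch.nn.GRUCell(dim, dim)

    def forward(self, mentions, candidates):
        # mentions: (steps, dim); candidates: (steps, n_cand, dim)
        h = torch.zeros(1, mentions.shape[-1])
        picks = []
        for m, cand in zip(mentions, candidates):
            query = m.unsqueeze(0) + h                   # mention + history
            logits = (cand @ query.squeeze(0)) / cand.shape[-1] ** 0.5
            idx = torch.distributions.Categorical(logits=logits).sample()
            picks.append(int(idx))
            h = self.history(cand[idx].unsqueeze(0), h)  # update history
        return picks
```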
However, most existing studies have focused on discriminating which of two stimuli corresponds to a given brain image, which is far from directly generating text from neural activities.
Therefore, in this paper, we introduce a novel Chinese dataset for Customer Service Dialogue Summarization (CSDS).
We propose two strategies for the fine-tuning process: value-based and context-based augmentation.
Humans typically divide emotions into discrete categories, but the boundaries between these categories are genuinely difficult to distinguish and define clearly.
Previous studies combining knowledge graphs (KGs) with neural machine translation (NMT) have two problems: i) knowledge under-utilization: they focus only on entities that appear in both the KG and the training sentence pairs, leaving much of the knowledge in the KG unexploited.
Thus, we propose a multimodal selective gate network that models reciprocal relationships between textual and multi-level visual features, including a global image descriptor, activation grids, and object proposals, in order to select highlights of the event when encoding the source sentence.
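A simplified sketch of such a selective gate appears below; pooling each visual view to a single vector and using one sigmoid gate per view are simplifying assumptions made for illustration, not the paper's exact architecture.

```python
import torch

class MultimodalSelectiveGate(torch.nn.Module):
    """Illustrative gate over a text state and three visual views.

    Assumed inputs: a textual encoder state plus a global image
    descriptor, activation grids, and object proposals, all projected
    to a shared dimension; the gate decides how much of each visual
    view to pass into the source representation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.gate = torch.nn.Linear(4 * dim, 3)  # one gate per visual view

    def forward(self, text, global_img, grids, objects):
        # Pool spatial/object axes so every view is (batch, dim).
        grids, objects = grids.mean(dim=1), objects.mean(dim=1)
        joint = torch.cat([text, global_img, grids, objects], dim=-1)
        g = torch.sigmoid(self.gate(joint))      # (batch, 3)
        fused = (g[:, 0:1] * global_img
                 + g[:, 1:2] * grids
                 + g[:, 2:3] * objects)
        return text + fused  # gated visual highlights added to the text state
```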
The framework is based on language models and can be readily instantiated with different language model architectures.
The hierarchical attention adaptively aggregates low-hierarchy and high-hierarchy information, which helps balance the neighborhood information of counterpart entities and distinguish non-counterpart entities with similar structures.
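The sketch below illustrates one plausible form of this adaptive aggregation; treating hierarchy levels as stacked per-entity features (e.g., successive GNN layer outputs) and scoring them with a learned query is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

class HierarchicalAttention(torch.nn.Module):
    """Adaptively mixes low- and high-hierarchy entity representations.

    Assumes per-entity features from several levels; a learned query
    scores each level so the mixture can favor local neighborhood
    detail or broader structure for each entity.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.query = torch.nn.Linear(dim, 1, bias=False)

    def forward(self, levels: torch.Tensor) -> torch.Tensor:
        # levels: (n_entities, n_levels, dim), ordered low -> high hierarchy.
        weights = F.softmax(self.query(levels).squeeze(-1), dim=-1)
        return (weights.unsqueeze(-1) * levels).sum(dim=1)  # (n_entities, dim)
```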
We propose a touch-based editing method for translation, which is more flexible than traditional keyboard-and-mouse-based translation post-editing.
End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner.
Specifically, we introduce a selection module, independent of the translation module, that scores each candidate context sentence.
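A minimal sketch of such a standalone selection module follows; the bilinear scorer and the use of precomputed sentence embeddings are illustrative assumptions, since the scoring function is not specified here.

```python
import torch

class ContextSelector(torch.nn.Module):
    """Selection module kept separate from the translation model.

    Scores each candidate context sentence against the current source
    sentence; top-scoring candidates would then be passed to the
    translator. The bilinear scorer is an illustrative choice.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.bilinear = torch.nn.Bilinear(dim, dim, 1)

    def forward(self, source: torch.Tensor, candidates: torch.Tensor):
        # source: (dim,); candidates: (n_cand, dim) sentence embeddings.
        src = source.unsqueeze(0).expand_as(candidates)
        scores = self.bilinear(src, candidates).squeeze(-1)  # (n_cand,)
        return scores.argsort(descending=True)               # ranked indices
```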