To address this issue, we propose a novel flow-based generative framework that consists of multiple conditional affine coupling layers for learning unseen data generation.
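As a rough illustration of what a single conditional affine coupling layer computes, here is a minimal sketch. It is not the authors' architecture: the `scale_and_shift` function is a hypothetical toy stand-in for a learned network, and the names `coupling_forward` and `coupling_inverse` are assumptions made for this example. The sketch shows only the property that makes such layers useful in flows: the transform is exactly invertible given the same condition, and its log-determinant is cheap to compute.

```python
import math

def scale_and_shift(x1, cond):
    # Hypothetical stand-in for a learned network that maps the untouched
    # half of the input plus the condition to a log-scale s and a shift t.
    h = sum(x1) + sum(cond)
    return math.tanh(0.5 * h), 0.1 * h

def coupling_forward(x, cond):
    # Split the input; the first half passes through unchanged.
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    s, t = scale_and_shift(x1, cond)
    # Affine transform of the second half, conditioned on x1 and cond.
    y2 = [v * math.exp(s) + t for v in x2]
    # Log-determinant of the Jacobian: s is applied to every element of x2.
    log_det = s * len(x2)
    return x1 + y2, log_det

def coupling_inverse(y, cond):
    # y1 equals x1, so s and t can be recomputed exactly.
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    s, t = scale_and_shift(y1, cond)
    x2 = [(v - t) * math.exp(-s) for v in y2]
    return y1 + x2

x = [0.3, -1.2, 0.8, 2.0]
cond = [1.0, 0.0]  # e.g. a semantic/attribute vector (illustrative values)
y, log_det = coupling_forward(x, cond)
x_rec = coupling_inverse(y, cond)
```

Stacking several such layers, with the halves swapped between layers, is the usual way to let every dimension be transformed.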
The laborious and time-consuming manual annotation has become a real bottleneck in various practical scenarios.
In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification.
Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.
Inspired by recent success in unsupervised contrastive representation learning, we propose a novel denoised cross-video contrastive algorithm, aiming to enhance the feature discrimination ability of video snippets for accurate temporal action localization in the weakly-supervised setting.
In this paper, we present a novel method which leverages both visual and semantic modalities to distinguish seen and unseen categories.
As a by-product, a CapS dataset is constructed by augmenting the existing benchmark training set with additional image tags and captions.
Zero-shot learning (ZSL) aims to recognize unseen classes based on the knowledge of seen classes.
To address these problems, we investigate domain adaptive semantic segmentation without source data, which assumes that the model is pre-trained on the source domain and then adapted to the target domain without any further access to the source data.
A previous solution is test-time normalization, which substitutes the source statistics in BN layers with the target batch statistics.
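The idea of replacing stored source statistics with target batch statistics can be sketched for a single feature channel as follows. This is a simplified sketch in plain Python, not code from the paper; the function name `batch_norm_with_target_stats` and the sample values are assumptions for illustration.

```python
import math

def batch_norm_with_target_stats(batch, gamma=1.0, beta=0.0, eps=1e-5):
    # Instead of the running mean/variance collected on the source domain,
    # compute the statistics from the current target batch itself.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased variance, as in BN
    # Standard BN affine transform with the (unchanged) learned gamma and beta.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

target_batch = [4.0, 6.0, 8.0, 10.0]  # one channel's activations (toy values)
normalized = batch_norm_with_target_stats(target_batch)
```

The learned scale and shift parameters stay fixed; only the normalization statistics change, which is why the approach needs no source data.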
Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating whole-home electricity usage into the consumption of individual appliances, which is a typical application of data analysis.
Spectral approximation and variational inducing learning for the Gaussian process are two popular methods to reduce computational complexity.
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observable during training.
From prestack seismic gathers, anisotropic analysis and inversion were commonly applied to characterize the dominant orientations and relative intensities of fractures.
Unlike the traditional recommender system, the session-based recommender system introduces the concept of the session, i.e., a sequence of interactions between a user and multiple items within a period, to preserve the user's recent interest.
Then, for a moment candidate, we concatenate the representations of its starting, middle, and ending elements to form the final moment representation.
To our knowledge, our work is the first to produce calibrated predictions under different expertise levels for medical image segmentation.
Complex backgrounds and similar appearances between objects and their surroundings are generally recognized as challenging scenarios in Salient Object Detection (SOD).
Previous bi-classifier adversarial learning methods only focus on the similarity between the outputs of two distinct classifiers.
The security constraints of this method are constructed using only the input and output signal samples of the legitimate and eavesdropper channels, with the benefit that training the encoder is completely independent of the decoder.
On our created OR-ShARC dataset, MUDERN achieves the state-of-the-art performance, outperforming existing single-passage conversational machine reading models as well as a new multi-passage conversational machine reading baseline by a large margin.
Generalized zero-shot learning (GZSL) aims to classify samples under the assumption that some classes are not observable during training.
To address these issues, in this paper, we propose a novel framework that leverages dual variational autoencoders with a triplet loss to learn discriminative latent features and applies the entropy-based calibration to minimize the uncertainty in the overlapped area between the seen and unseen classes.
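For readers unfamiliar with the triplet loss mentioned here, a minimal sketch of the standard margin-based form follows. The Euclidean distance, the margin value, and the function names are assumptions for illustration; the source does not state which distance or margin the framework uses.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Penalize the triplet unless the negative is at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy latent features: the positive shares the anchor's class, the negative does not.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # negative already far away
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])  # negative closer than positive
```

A loss of zero means the triplet already satisfies the margin, so only violating triplets contribute gradient during training.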
In this work, we introduce a new ReID task, bird-view person ReID, which aims at searching for a person in a gallery of horizontal-view images with the query images taken from a bird's-eye view, i.e., an elevated view of an object from above.
Our bidirectional dynamic fusion strategy encourages the interaction of spatial and temporal information in a dynamic manner.
Ranked #12 on Video Polyp Segmentation on SUN-SEG-Easy (Unseen)
Specifically, we first pre-train robust item representations from item content data using a denoising auto-encoder instead of other deterministic deep learning frameworks; then we fine-tune the entire framework by adding a pairwise loss objective with discrete constraints; moreover, DPH aims to minimize a pairwise ranking loss that is consistent with the ultimate goal of recommendation.
Such three strategies are formulated into a unified framework to address the fairness issue and domain shift challenge.
Based on the learned EDU and entailment representations, we either reply to the user with our final decision ("yes/no/irrelevant") on the initial question, or generate a follow-up question to inquire for more information.
A voting strategy averages the probability distributions output from the classifiers and, given that some patches are more discriminative than others, a discrimination-based attention mechanism helps to weight each patch accordingly.
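The combination of averaging and attention weighting described here can be sketched as a weighted average of per-patch class distributions. This is a sketch under stated assumptions, not the paper's mechanism: the softmax over per-patch discrimination scores and the name `weighted_vote` are choices made for this example, and the scores themselves are toy values.

```python
import math

def weighted_vote(patch_probs, patch_scores):
    # Turn per-patch discrimination scores into attention weights (softmax).
    top = max(patch_scores)
    exps = [math.exp(s - top) for s in patch_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the per-patch class probability distributions.
    num_classes = len(patch_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, patch_probs))
        for c in range(num_classes)
    ]
    return fused, weights

patch_probs = [[0.9, 0.1], [0.5, 0.5]]  # two patches, two classes
uniform, _ = weighted_vote(patch_probs, [0.0, 0.0])   # equal scores: plain average
attended, w = weighted_vote(patch_probs, [2.0, 0.0])  # first patch judged more discriminative
```

With equal scores the result reduces to plain averaging; raising a patch's score shifts the fused distribution toward that patch's prediction.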
The explicitly extracted edge information goes together with saliency to give more emphasis to the salient regions and object boundaries.
Ranked #19 on RGB-D Salient Object Detection on NJU2K
Specifically, we design a complementary dual-level semantic transfer mechanism to efficiently discover the potential semantics of tags and seamlessly transfer them into binary hash codes.
Regarded as a combination of feature learning and target learning, the newly proposed networks provide great capacity for high-level feature extraction and in-depth data mining.
Additionally, we develop a fast discrete optimization algorithm to directly compute the binary hash codes with simple operations.
In this paper, we present a deep-learning-based method where a novel memory-oriented decoder is tailored for light field saliency detection.
In this paper, therefore, we study the item transition pattern by constructing a session graph and propose a novel model which collaboratively considers the sequence order and the latent order in the session graph for a session-based recommender system.
Question generation (QG) is the task of generating a question from a reference sentence and a specified answer within the sentence.
Thus, a multi-modal cycle-consistency loss between the synthesized semantic representations and the ground truth can be learned and leveraged to enforce the generated semantic features to approximate to the real distribution in semantic space.
Domain adaptation investigates the problem of cross-domain knowledge transfer where the labeled source domain and unlabeled target domain have distinctive data distributions.
Ranked #2 on Domain Adaptation on USPS-to-MNIST
An inevitable issue of such a paradigm is that the synthesized unseen features are biased toward seen references and incapable of reflecting the novelty and diversity of real unseen instances.
A natural language interface (NLI) to databases is an interface that translates a natural language question to a structured query that is executable by database management systems (DBMS).
Visual paragraph generation aims to automatically describe a given image from different perspectives and organize sentences in a coherent way.
This work, for the first time, formulates CSR as a ZSL problem, and a tailor-made ZSL method is proposed to handle CSR.
We analyze the histogram of the likelihoods of the input images using the generalized mean, which measures the model's accuracy as a function of the relative risk.
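The generalized (power) mean used in this kind of analysis can be sketched as below. The function name and the sample likelihoods are assumptions for illustration, and how the exponent `r` is tied to relative risk is not reproduced here; the sketch shows only how different exponents summarize the same likelihoods.

```python
import math

def generalized_mean(values, r):
    # Power mean M_r = (mean of v**r) ** (1/r); the limit r -> 0 is the geometric mean.
    n = len(values)
    if r == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** r for v in values) / n) ** (1.0 / r)

likelihoods = [0.25, 1.0]  # toy per-image likelihoods, must be positive
arithmetic = generalized_mean(likelihoods, 1)   # dominated by the well-modeled image
geometric = generalized_mean(likelihoods, 0)    # equivalent to averaging log-likelihoods
harmonic = generalized_mean(likelihoods, -1)    # dominated by the poorly modeled image
```

Because the power mean is non-decreasing in `r`, lower exponents give more weight to low-likelihood inputs and so yield a more conservative summary.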
We propose to dynamically learn the collaborative similarity structure, and further integrate it with the ultimate feature selection into a unified framework.
In this paper, we take advantage of generative adversarial networks (GANs) and propose a novel method, named leveraging invariant side GAN (LisGAN), which can directly generate unseen features from random noise conditioned on the semantic descriptions.
Ranked #3 on Generalized Zero-Shot Learning on SUN Attribute