To simultaneously achieve a higher compression rate and better enhancement performance for low-light images, we propose a novel image compression framework with joint optimization of low-light image enhancement.
Hence, we advocate that the key to better performance lies in meaningful latent modality structures rather than perfect modality alignment.
We implement the IAT in a mathematically invertible manner on a single-rate Invertible Neural Network (INN)-based model, and the quality level (QLevel) is fed into the IAT to generate scaling and bias tensors.
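The QLevel-conditioned modulation described above can be sketched as a learned affine transform whose scaling is kept strictly positive, so it remains exactly invertible. This is a minimal NumPy illustration under assumed shapes and weight names (`W1`, `W_scale`, `W_bias`, `iat_modulate` are hypothetical, not the paper's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sketch: a scalar quality level (QLevel) is mapped through a
# small transform to per-channel scaling and bias tensors that modulate
# the latent features of a single-rate compression model.
C = 8                                  # number of latent channels (assumed)
W1 = rng.normal(size=(1, 16))          # stand-ins for learned weights
W_scale = rng.normal(size=(16, C))
W_bias = rng.normal(size=(16, C))

def iat_modulate(latent, qlevel):
    """Affine modulation of latent features conditioned on qlevel."""
    h = np.tanh(np.array([[qlevel]]) @ W1)      # (1, 16) hidden code
    scale = np.exp(0.1 * (h @ W_scale))         # strictly positive => invertible
    bias = h @ W_bias
    return latent * scale + bias, (scale, bias)

def iat_invert(modulated, scale, bias):
    """Exact inverse, possible because the scaling never hits zero."""
    return (modulated - bias) / scale

x = rng.normal(size=(4, C))                     # latent batch
y, (s, b) = iat_modulate(x, qlevel=0.7)
x_rec = iat_invert(y, s, b)
print(np.allclose(x, x_rec))                    # exact reconstruction
```

Keeping the scale as an exponential is a common way to guarantee invertibility in INN-style affine couplings; the paper's exact parameterization may differ.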
Aligning signals from different modalities is an important step in vision-language representation learning as it affects the performance of later stages such as cross-modality fusion.
Besides CMA, TCL introduces an intra-modal contrastive objective to provide complementary benefits in representation learning.
Detecting oriented objects and estimating their rotation information is a crucial step in analyzing remote sensing images.
InfoNCE-based contrastive representation learners, such as SimCLR, have been tremendously successful in recent years.
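The InfoNCE objective mentioned above can be written as a cross-entropy over in-batch similarities, where each pair of augmented views is a positive and all other in-batch pairings serve as negatives (as in SimCLR). A minimal NumPy sketch (function name and temperature value are illustrative assumptions):

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss for a batch of positive pairs (z1[i], z2[i]).
    All other in-batch pairings act as negatives, as in SimCLR."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / temperature           # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 64))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))
random_ = info_nce(z, rng.normal(size=z.shape))
print(aligned < random_)  # aligned views incur a lower loss
```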
In many natural language processing applications, identifying predictive text can be as important as the predictions themselves.
There has been growing interest in representation learning for text data, based on theoretical arguments and empirical evidence.
The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former.
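The standard distillation term that transfers a teacher's knowledge into a compact student is the KL divergence between temperature-softened output distributions (Hinton et al.). This is a generic sketch of that classic objective, not the specific method of the paper summarized above:

```python
import numpy as np

def softmax(x, T=1.0):
    """Temperature-softened softmax along the last axis."""
    z = x / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Distillation loss: KL(teacher || student) over softened
    distributions, scaled by T^2 to keep gradient magnitudes stable."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T * T) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

rng = np.random.default_rng(0)
t = rng.normal(size=(8, 10))                       # teacher logits
print(kd_loss(t, t))                               # identical logits -> 0 loss
print(kd_loss(t + rng.normal(size=t.shape), t))    # mismatch -> positive loss
```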
Deep neural networks excel at comprehending complex visual signals, delivering performance on par with, or even superior to, that of human experts.
An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models, and can be used to evaluate and improve their robustness.
Cross-domain alignment between image objects and text sequences is key to many visual-language tasks, and it poses a fundamental challenge to both computer vision and natural language processing.
In GOT, cross-domain alignment is formulated as a graph matching problem by representing entities as nodes in a dynamically constructed graph.
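The graph-matching view of cross-domain alignment can be illustrated with a generic entropic optimal-transport solver: given a cost between image-object and text-entity features, Sinkhorn iterations yield a soft alignment matrix. This is a minimal OT sketch under assumed names and uniform marginals, not GOT's exact formulation:

```python
import numpy as np

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic-regularized optimal transport via Sinkhorn iterations.
    Returns a soft alignment (transport plan) between two entity sets."""
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a, b = np.ones(n) / n, np.ones(m) / m    # uniform marginals (assumed)
    v = np.ones(m) / m
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]       # transport plan

rng = np.random.default_rng(0)
img_nodes = rng.normal(size=(5, 16))         # e.g. image-object features
txt_nodes = img_nodes[[2, 0, 4]]             # text entities matching objects 2, 0, 4
cost = 1 - (img_nodes @ txt_nodes.T) / (
    np.linalg.norm(img_nodes, axis=1)[:, None]
    * np.linalg.norm(txt_nodes, axis=1)[None, :])
plan = sinkhorn(cost)
print(plan.argmax(axis=0))                   # each text entity's best image object
```

The plan's column-wise argmax recovers the planted correspondence; GOT additionally exploits graph structure (edges between entities), which this sketch omits.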
Inference, estimation, sampling and likelihood evaluation are four primary goals of probabilistic modeling.
We propose a novel graph-driven generative model, that unifies multiple heterogeneous learning tasks into the same framework.
This paper considers a novel variational formulation of network embeddings, with special focus on textual networks.
We propose a Leaked Motion Video Predictor (LMVP) to predict future frames by capturing the spatial and temporal dependencies from given inputs.
Constructing highly informative network embeddings is an important tool for network analysis.
Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation with latent variables.
Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE).
Sequence generation with reinforcement learning (RL) has received significant attention recently.
Variational autoencoders (VAEs) have received much attention recently as an end-to-end architecture for text generation.
However, the discrete nature of text hinders the application of GAN to text-generation tasks.
To assess the difference between real and synthetic data, Generative Adversarial Networks (GANs) are trained using a distribution discrepancy measure.
Recent advances on the scalability and flexibility of variational inference have made it successful at unravelling hidden patterns in complex data.
There has been recent interest in developing scalable Bayesian sampling methods such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD) for big-data analysis.
A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: ($i$) from observed data fed through the encoder to yield codes, and ($ii$) from latent codes drawn from a simple prior and propagated through the decoder to manifest data.
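The two symmetric joint forms described above can be written compactly; matching them in both KL directions suggests a symmetric objective. Notation here is assumed for illustration and is not quoted from the abstract:

```latex
\begin{align}
  q_\phi(x, z) &= q(x)\, q_\phi(z \mid x)
    && \text{(i) encoder path: data fed through the encoder} \\
  p_\theta(x, z) &= p(z)\, p_\theta(x \mid z)
    && \text{(ii) decoder path: prior codes through the decoder} \\
  \mathcal{L}_{\mathrm{sym}} &=
    \mathrm{KL}\!\left(q_\phi(x,z) \,\middle\|\, p_\theta(x,z)\right)
    + \mathrm{KL}\!\left(p_\theta(x,z) \,\middle\|\, q_\phi(x,z)\right)
\end{align}
```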
The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs.
A new form of the variational autoencoder (VAE) is proposed, based on the symmetric Kullback-Leibler divergence.
We investigate the non-identifiability issues associated with bidirectional adversarial training for joint distribution matching.
Distinct from normalizing flows and GANs, continuous-time flows (CTFs) can be adopted to achieve the above two goals in one framework, with theoretical guarantees.
This paper focuses on the formal analysis of a particular element of the security mechanisms for V2X found in many proposals: the revocation of malicious or misbehaving vehicles from the V2X system by invalidating their credentials.