We propose a novel paradigm of training with a decoder-only model for multimodal tasks, which is surprisingly effective in jointly learning of these disparate vision-language tasks.
Ranked #1 on Video Captioning on MSVD
In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits.
Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image.
The employment of convolutional neural networks has achieved unprecedented performance in the task of image restoration for a variety of degradation factors.
We study epidemic forecasting on real-world health data by a graph-structured recurrent neural network (GSRNN).
Ranking is a central task in machine learning and information retrieval.
We present a generic framework for spatio-temporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time.
The maximization of many of these metrics can be expressed as a constrained optimization problem, where the constraint is a function of the classifier's predictions.
In this paper we introduce, develop algorithms for, and investigate the properties of, a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty.