Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses.
We present an innovative approach to 3D Human Pose Estimation (3D-HPE) by integrating cutting-edge diffusion models, which have revolutionized diverse fields, but are relatively unexplored in 3D-HPE.
In this complex system, advances in conventional forecasting methods have been made using curated data, i. e., with the assumption of perfect maps, detection, and tracking.
Particle-based deep generative models, such as gradient flows and score-based diffusion models, have recently gained traction thanks to their striking performance.
We conduct a set of experiments on counterfactual explanation benchmarks for driving scenes, and we show that our method can be adapted beyond classification, e. g., to explain semantic segmentation models.
In this work, we address the problem of producing counterfactual explanations for high-quality images and complex scenes.
Learning-based trajectory prediction models have encountered great success, with the promise of leveraging contextual information in addition to motion history.
Designing video prediction models that account for the inherent uncertainty of the future is challenging.
Ranked #1 on Video Prediction on KTH 64x64 cond10 pred30
Object segmentation is a crucial problem that is usually solved by using supervised learning approaches over very large datasets composed of both images and corresponding object masks.
We assume that the distribution of the data is driven by two independent latent factors: the content, which represents the intrinsic features of an object, and the view, which stands for the settings of a particular observation of that object.