Instead of trying to process more frames at once, as most existing methods do, we propose to process videos in an online fashion and cache "memory" at each iteration.
Ranked #2 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)
In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.
Our goal is to develop a fast sampling method for DMs that uses far fewer steps while retaining high sample quality.
The validation and deployment of novel research ideas in the field of Deep Learning are often limited by the availability of efficient compute kernels for certain basic primitives.
RM-DS integrates Residual U-blocks into a deep supervision network to generate deep multi-scale resolution-maintenance features while learning global context information.
Semantic segmentation models classify pixels into a set of known ("in-distribution") visual classes.
Ranked #1 on Anomaly Detection on Road Anomaly (using extra training data)