Scene Classification (unified classes)
2 papers with code • 2 benchmarks • 2 datasets
Most implemented papers
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
We thus propose VIDAL-10M, a dataset of Video, Infrared, Depth, Audio and their corresponding Language.
Efficient Multi-Task RGB-D Scene Analysis for Indoor Environments
To evaluate our multi-task approach, we extend the annotations of the widely used RGB-D indoor datasets NYUv2 and SUNRGB-D for instance segmentation and orientation estimation.