Scene Understanding
514 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of analyzing an image or video to identify its contents and reason about the scene as a whole. For instance, the iPhone offers an accessibility feature that helps blind and low-vision users take photos by describing what the camera sees; this is an example of Scene Understanding.
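As a toy illustration of the accessibility use case above, the sketch below turns a per-pixel semantic segmentation map (the kind of output a scene-understanding model might produce) into a short natural-language description. The class names and label map here are hypothetical, not from any specific model.

```python
from collections import Counter

# Hypothetical class ids a segmentation model might emit
CLASS_NAMES = {0: "background", 1: "person", 2: "dog", 3: "tree"}

# A 4x6 "segmentation map": one class id per pixel
seg_map = [
    [3, 3, 0, 0, 1, 1],
    [3, 3, 0, 1, 1, 1],
    [0, 0, 2, 1, 1, 0],
    [0, 2, 2, 0, 0, 0],
]

def describe_scene(seg_map, min_fraction=0.1):
    """Describe classes covering at least `min_fraction` of the image."""
    pixels = [c for row in seg_map for c in row]
    counts = Counter(pixels)
    total = len(pixels)
    # List non-background classes by decreasing pixel coverage
    present = [CLASS_NAMES[c] for c, n in counts.most_common()
               if c != 0 and n / total >= min_fraction]
    if not present:
        return "No prominent objects detected."
    return "The scene contains: " + ", ".join(present) + "."

print(describe_scene(seg_map))  # → The scene contains: person, tree, dog.
```

A real system would get `seg_map` from a trained segmentation network and feed the description to a screen reader; the thresholding-and-listing step shown here is the same in spirit.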
Benchmarks
These leaderboards are used to track progress in Scene Understanding
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Latest papers
BACS: Background Aware Continual Semantic Segmentation
Besides the common problem of classical catastrophic forgetting in the continual learning setting, CSS suffers from the inherent ambiguity of the background, a phenomenon we refer to as the "background shift", since pixels labeled as background could correspond to future classes (forward background shift) or previous classes (backward background shift).
AccidentBlip2: Accident Detection With Multi-View MotionBlip2
We also extend our approach to a multi-vehicle cooperative system by deploying Motion Qformer on each vehicle and simultaneously inputting the inference-generated query into the MLP for autoregressive inference.
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation.
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
In the auto-labeling stage, we represent the surface of each instance as a signed distance field (SDF) and render its silhouette as an instance mask through our proposed instance-aware volumetric silhouette rendering.
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Visual scenes are naturally organized in a hierarchy, where a coarse semantic is recursively comprised of several fine details.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding.
Object Pose Estimation via the Aggregation of Diffusion Features
To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation.
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
In this work, we propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation to facilitate learning with multiple objects for the multiple sub-tasks in a unified manner.