Scene Segmentation
122 papers with code • 5 benchmarks • 7 datasets
Scene segmentation is the task of splitting a scene into its various object components.
Image adapted from Temporally coherent 4D reconstruction of complex dynamic scenes.
Libraries
Use these libraries to find Scene Segmentation models and implementationsLatest papers
Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation
Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors.
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Video-grounded dialogue understanding is a challenging problem that requires machine to perceive, parse and reason over situated semantics extracted from weakly aligned video and dialogues.
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Given the limitations of unidirectional attention in GPT models and their ability to generate coherent long paragraphs, we carefully sequence the word tokens before vision tokens, mimicking the human thought process of understanding the question to infer an answer from an image.
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
Although Domain Adaptation in Semantic Scene Segmentation has shown impressive improvement in recent years, the fairness concerns in the domain adaptation have yet to be well defined and addressed.
Self-positioning Point-based Transformer for Point Cloud Understanding
In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.
Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
Based on existing efforts, this work has two observations: (1) For different annotators, labeling highlight has uncertainty, which leads to inaccurate and time-consuming annotations.
Neural Implicit Vision-Language Feature Fields
In this work, we present a zero-shot volumetric open-vocabulary semantic scene segmentation method.
Semantic segmentation of surgical hyperspectral images under geometric domain shifts
According to a comprehensive validation on six different OOD data sets comprising 600 RGB and hyperspectral imaging (HSI) cubes from 33 pigs semantically annotated with 19 classes, we demonstrate a large performance drop of SOA organ segmentation networks applied to geometric OOD data.
Towards Surgical Context Inference and Translation to Gestures
We evaluate the performance of each stage of our method by comparing the results with the ground truth segmentation masks, the consensus context labels, and the gesture labels in the JIGSAWS dataset.
Paced-Curriculum Distillation with Prediction and Label Uncertainty for Image Segmentation
Purpose: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed to adapt the training progress.