To address the lack of continual learning methodologies in scene graph generation (SGG), we introduce the comprehensive Continual ScenE Graph Generation (CSEGG) dataset, along with 3 learning scenarios and 8 evaluation metrics.
Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the frequency with which replayed exemplars are interleaved with training data, the order in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer.
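The three curriculum aspects named above (interleaving frequency, replay order, exemplar selection) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: exemplars are hypothetical `(sample_id, difficulty)` pairs, `select_exemplars`, `build_schedule`, `replay_every` and `order_key` are made-up names, and "keep the highest-scoring exemplars" and "replay easy-to-hard" are just one possible choice for each aspect, not the study's actual strategies.

```python
from typing import Callable, List, Sequence, Tuple

Exemplar = Tuple[str, float]  # hypothetical (sample_id, difficulty) pair


def select_exemplars(pool: Sequence[Exemplar], k: int,
                     score: Callable[[Exemplar], float]) -> List[Exemplar]:
    """Selection strategy (assumed): keep the k exemplars with the highest score."""
    return sorted(pool, key=score, reverse=True)[:k]


def build_schedule(new_data: Sequence[str], buffer: Sequence[Exemplar],
                   replay_every: int,
                   order_key: Callable[[Exemplar], float]) -> List[Tuple[str, object]]:
    """Interleave replayed exemplars with new-task samples.

    replay_every: insert one replayed exemplar after every `replay_every` new samples.
    order_key: replayed exemplars are consumed in ascending order of this key.
    """
    replay_queue = sorted(buffer, key=order_key)
    schedule: List[Tuple[str, object]] = []
    r = 0
    for i, x in enumerate(new_data, start=1):
        schedule.append(("new", x))
        if replay_queue and i % replay_every == 0:
            # cycle through the ordered buffer if more replay slots than exemplars
            schedule.append(("replay", replay_queue[r % len(replay_queue)]))
            r += 1
    return schedule


# Keep the 3 hardest old-task exemplars, replay them easy-to-hard every 2 new samples.
old_pool = [("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.7)]
buffer = select_exemplars(old_pool, k=3, score=lambda e: e[1])
schedule = build_schedule(["n1", "n2", "n3", "n4", "n5"], buffer,
                          replay_every=2, order_key=lambda e: e[1])
print(schedule)
```

Changing `replay_every`, `order_key`, or `score` varies one aspect at a time while holding the other two fixed, which is the kind of controlled comparison the sentence above describes.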
However, SAM's vanilla mask generation method produces masks that carry no class-specific information, resulting in inferior counting accuracy.
To capture compositional entities of the scene, we propose cyclic walks between perceptual features extracted from CNNs or transformers and object entities.
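One way to read "cyclic walks" is as a cycle-consistency objective: walk from features to entities and back, and require the walk to return to its starting feature. The sketch below assumes a generic contrastive random-walk formulation (dot-product similarity, row-wise softmax with temperature `tau`, cross-entropy against the identity); the function names and these choices are illustrative and not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax_rows(m: Matrix, tau: float) -> Matrix:
    """Row-wise softmax with temperature tau (max-subtracted for stability)."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp((v - mx) / tau) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cycle_walk_loss(features: Matrix, entities: Matrix,
                    tau: float = 0.1) -> Tuple[float, Matrix]:
    """Walk feature -> entity -> feature; penalize walks that do not return to the start."""
    sim = [[sum(f_i * e_i for f_i, e_i in zip(f, e)) for e in entities] for f in features]
    sim_t = [list(col) for col in zip(*sim)]   # entities x features
    p_fe = softmax_rows(sim, tau)              # transition probabilities feature -> entity
    p_ef = softmax_rows(sim_t, tau)            # transition probabilities entity -> feature
    round_trip = matmul(p_fe, p_ef)            # feature -> entity -> feature
    # cross-entropy against the identity: walk i should end at feature i
    n = len(features)
    loss = -sum(math.log(round_trip[i][i] + 1e-12) for i in range(n)) / n
    return loss, round_trip


entities = [[1.0, 0.0], [0.0, 1.0]]
aligned, rt = cycle_walk_loss([[1.0, 0.0], [0.0, 1.0]], entities)
collapsed, _ = cycle_walk_loss([[1.0, 0.0], [1.0, 0.0]], entities)
print(aligned, collapsed)
```

When each feature binds to a distinct entity the round trip is nearly the identity and the loss is near zero; when two features are indistinguishable, the walk back from the entity splits between them and the loss rises to about log 2.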
For example, when we learn mathematics at school, we build upon our knowledge of addition to learn multiplication.
Target modulation is computed as patch-wise local relevance between the target and search images, whereas contextual modulation is applied in a global fashion.
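The local/global contrast in the sentence above can be made concrete with a small sketch. Here target modulation is assumed to be, for each search-image patch, the maximum cosine similarity to any target patch (a per-location map), while contextual modulation is assumed to be a single channel-wise gain vector applied identically at every location. Both choices and the function names are illustrative assumptions, not the model's actual layers.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv + 1e-12)


def target_modulation(target_patches: List[Vector],
                      search_patches: List[Vector]) -> List[float]:
    """Local: one relevance value per search patch (best match to any target patch)."""
    return [max(cosine(s, t) for t in target_patches) for s in search_patches]


def contextual_modulation(search_patches: List[Vector],
                          context_gain: Vector) -> List[Vector]:
    """Global: the same channel-wise gain is applied to every search patch."""
    return [[x * g for x, g in zip(p, context_gain)] for p in search_patches]


search = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relevance = target_modulation([[1.0, 0.0]], search)
modulated = contextual_modulation(search, [2.0, 0.5])
print(relevance, modulated)
```

The relevance map varies from patch to patch, whereas the contextual gain is computed once per image and does not depend on location.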
To better accommodate the object-centric nature of current downstream tasks such as object recognition and detection, various methods have been proposed to suppress contextual biases or disentangle objects from contexts.
23 Nov 2022 • Mengmi Zhang, Giorgia Dellaferrera, Ankur Sikarwar, Marcelo Armendariz, Noga Mudrik, Prachi Agrawal, Spandan Madan, Andrei Barbu, Haochen Yang, Tanishq Kumar, Meghna Sadwani, Stella Dellaferrera, Michele Pizzochero, Hanspeter Pfister, Gabriel Kreiman
To address this question, we turn to the Turing test and systematically benchmark current AI systems on their ability to imitate humans.
Continual learning has made tremendous progress in tackling the catastrophic forgetting problem of neural networks, maintaining good performance on old tasks while learning new ones.
However, continual learning (CL) on VQA involves more than the expansion of label sets (new answer sets).
Here we present SemanticDG (Semantic Domain Generalization): a benchmark of 15 photo-realistic domains that share the geometry, scene layout, and camera parameters of the popular 3D ScanNet dataset but have controlled domain shifts in lighting, materials, and viewpoints.
Remarkably, with only 25% of video frames annotated, our method still outperforms the base CL learners, which are trained with annotations on 100% of the video frames.
To elucidate the mechanisms responsible for asymmetry in visual search, we propose a computational model that takes a target and a search image as inputs and produces a sequence of eye movements until the target is found.
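The excerpt does not say how the model picks each fixation. A common mechanism, assumed here purely for illustration, is winner-take-all over a 2D attention map followed by inhibition of return: fixate the most active location, stop if it contains the target, otherwise suppress that neighborhood and repeat. `search_scanpath`, `ior_radius`, and `max_fixations` are hypothetical names, and the attention map is taken as a given input rather than computed from the target and search images.

```python
from typing import List, Tuple

Grid = List[List[float]]


def search_scanpath(attention: Grid, target_loc: Tuple[int, int],
                    ior_radius: int = 1, max_fixations: int = 20) -> List[Tuple[int, int]]:
    """Greedy fixation sequence with inhibition of return, stopping at the target."""
    att = [row[:] for row in attention]  # copy so the caller's map is untouched
    h, w = len(att), len(att[0])
    path: List[Tuple[int, int]] = []
    for _ in range(max_fixations):
        # winner-take-all: fixate the currently most active location
        r, c = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: att[p[0]][p[1]])
        path.append((r, c))
        if (r, c) == target_loc:
            break  # target found, the search terminates
        # inhibition of return: suppress the fixated neighborhood
        for i in range(max(0, r - ior_radius), min(h, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(w, c + ior_radius + 1)):
                att[i][j] = float("-inf")
    return path


attention = [[0.1, 0.9, 0.2],
             [0.3, 0.1, 0.8],
             [0.7, 0.2, 0.4]]
path = search_scanpath(attention, target_loc=(2, 0), ior_radius=0)
print(path)
```

Under this kind of scheme, search asymmetry would show up as different fixation counts when the roles of target and distractor are swapped, because the attention map, and hence the scanpath length, changes.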
CRUMB's memory blocks are tuned to enhance replay: a single feature map stored, reconstructed, and replayed by CRUMB mitigates forgetting during video stream learning more effectively than an entire image, even though it occupies only 3.6% as much memory.
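The storage idea in the sentence above, storing a compact code that can later be reconstructed into a feature map, can be sketched in a vector-quantization style. Assumptions: the memory blocks form a fixed codebook, each feature vector is split into `block_dim`-sized chunks, and each chunk is stored as the index of its nearest block. How CRUMB actually tunes its blocks is not shown here, and the sizes below are illustrative only, not an attempt to reproduce the 3.6% figure.

```python
from typing import List, Sequence

Vector = List[float]


def nearest_block(chunk: Sequence[float], blocks: Sequence[Vector]) -> int:
    """Index of the memory block closest (squared L2) to this chunk."""
    return min(range(len(blocks)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(chunk, blocks[k])))


def compress(feature_map: Sequence[Vector], blocks: Sequence[Vector],
             block_dim: int) -> List[List[int]]:
    """Store each spatial feature vector as a list of block indices."""
    codes = []
    for vec in feature_map:
        chunks = [vec[i:i + block_dim] for i in range(0, len(vec), block_dim)]
        codes.append([nearest_block(ch, blocks) for ch in chunks])
    return codes


def reconstruct(codes: List[List[int]], blocks: Sequence[Vector]) -> List[Vector]:
    """Rebuild approximate feature vectors by concatenating the indexed blocks."""
    return [[x for k in row for x in blocks[k]] for row in codes]


blocks = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]          # hypothetical codebook
feature_map = [[0.9, 1.1, 0.1, -0.1], [1.0, 0.1, 0.0, 0.0]]
codes = compress(feature_map, blocks, block_dim=2)
replayed = reconstruct(codes, blocks)
print(codes, replayed)
```

Only the small integer indices need to be kept in the replay buffer; the codebook is shared across all stored samples, which is where the memory saving of this family of schemes comes from.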
Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets.
Primates constantly explore their surroundings via saccadic eye movements that bring different parts of an image into high resolution.
To model the role of contextual information in visual recognition, we systematically investigated ten critical properties of where, when, and how context modulates recognition, including the amount of context, context and object resolution, geometrical structure of context, context congruence, and temporal dynamics of contextual modulation.
Without degrading performance on the initial tasks, our method learns novel concepts from a few training examples of each class in new tasks.
In each classification task, our method learns a set of variational prototypes with means and variances, such that embeddings of samples from the same class are represented by a prototypical distribution and class-representative prototypes are pushed apart.
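A minimal sketch of the variational-prototype idea described above, under stated assumptions: each class prototype is a diagonal Gaussian `(mean, variance)`, an embedding is assigned to the class under which it is most likely, prototypes are sampled with the reparameterization trick, and "separated apart" is modeled as a hinge penalty on prototype means that are closer than a margin. The class names, function names, and the hinge form are hypothetical, not the paper's exact objective.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Prototype = Tuple[Vector, Vector]  # (mean, diagonal variance), hypothetical representation


def gaussian_log_likelihood(z: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of z under N(mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(z, mean, var))


def classify(z: Vector, prototypes: Dict[str, Prototype]) -> str:
    """Assign z to the class whose prototypical distribution gives it the highest likelihood."""
    return max(prototypes, key=lambda c: gaussian_log_likelihood(z, *prototypes[c]))


def sample_prototype(mean: Vector, var: Vector, rng: random.Random) -> Vector:
    """Reparameterized sample: mean + sqrt(var) * eps, eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]


def separation_penalty(prototypes: Dict[str, Prototype], margin: float) -> float:
    """Hinge penalty on pairs of prototype means closer than `margin`."""
    means = [mean for mean, _ in prototypes.values()]
    penalty = 0.0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            penalty += max(0.0, margin - math.dist(means[i], means[j]))
    return penalty


protos = {"cat": ([0.0, 0.0], [1.0, 1.0]), "dog": ([3.0, 3.0], [0.5, 0.5])}
print(classify([0.2, -0.1], protos), classify([2.8, 3.1], protos))
print(separation_penalty(protos, margin=5.0))
```

The variance lets each class claim a region of embedding space rather than a single point, while the separation term keeps those regions from overlapping.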
Context reasoning is critical in a wide variety of applications where current inputs need to be interpreted in the light of previous experience and knowledge.
Egocentric spatial memory (ESM) refers to a memory system that encodes, stores, recognizes, and recalls spatial information about the environment from an egocentric perspective.
During exploration, our proposed ESM network model updates its belief about the global map from local observations using a recurrent neural network.
Through competition with the discriminator, the generator progressively improves the quality of the predicted future frames and thus anticipates future gaze better.
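The recurrent belief update described above can be illustrated with a deliberately simplified sketch. Assumptions: the belief is a 2D grid of values in [0, 1], the agent's pose is a pure translation giving where the local window sits in the global map (rotation ignored), and each observed cell is blended with the previous belief through a fixed gate, a hand-set stand-in for the learned gating of a recurrent cell such as a GRU. `update_belief`, `pose`, and `gate` are hypothetical names.

```python
from typing import List, Tuple

Grid = List[List[float]]


def update_belief(belief: Grid, local_obs: Grid, pose: Tuple[int, int],
                  gate: float = 0.5) -> Grid:
    """Write an egocentric local observation into the global belief map.

    pose: (row, col) of the local window's top-left corner in global coordinates.
    Observed cells are updated as b <- (1 - gate) * b + gate * obs; unobserved
    cells keep their previous belief.
    """
    new = [row[:] for row in belief]
    r0, c0 = pose
    for i, row in enumerate(local_obs):
        for j, obs in enumerate(row):
            r, c = r0 + i, c0 + j
            if 0 <= r < len(new) and 0 <= c < len(new[0]):  # ignore cells off the map
                new[r][c] = (1 - gate) * new[r][c] + gate * obs
    return new


belief = [[0.5] * 3 for _ in range(3)]   # uninformed prior over a 3x3 map
obs = [[1.0, 0.0]]                       # local 1x2 observation
step1 = update_belief(belief, obs, pose=(1, 1))
step2 = update_belief(step1, obs, pose=(1, 1))
print(step1[1], step2[1])
```

Repeated consistent observations push the belief toward the observed values, while cells the agent has not seen stay at the prior.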
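The generator/discriminator competition mentioned above is usually expressed with two opposing losses. The sketch assumes the standard GAN binary cross-entropy objective with the non-saturating generator loss, operating on discriminator outputs (probabilities) for real and generated future frames; the full objective used for gaze anticipation may include other terms, which this excerpt does not specify.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Push D(real future frames) toward 1 and D(generated frames) toward 0."""
    return -sum(math.log(r + EPS) + math.log(1 - f + EPS)
                for r, f in zip(d_real, d_fake)) / len(d_real)


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: push D(generated frames) toward 1."""
    return -sum(math.log(f + EPS) for f in d_fake) / len(d_fake)


print(discriminator_loss([0.9], [0.1]), generator_loss([0.1]), generator_loss([0.9]))
```

As the generator's frames become harder to tell apart from real ones, `d_fake` rises, the generator loss falls, and the discriminator must improve in turn.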