However, crop detection (e.g., apple detection in orchard environments) remains challenging due to a lack of large-scale datasets and the small relative size of the crops in the image.
In this work, we parallelize high-level features in deep networks so that class-specific features can be selectively skipped or selected, reducing inference costs.
Traffic light detection is a challenging problem in the context of self-driving cars and driver assistance systems.
This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information.
We investigate cross-quality knowledge distillation (CQKD), a knowledge distillation method in which knowledge from a teacher network trained on full-resolution images is transferred to a student network that takes low-resolution images as input.
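To make the teacher/student split concrete, here is a minimal sketch in plain Python. It assumes a standard Hinton-style distillation objective (temperature-scaled KL divergence to the teacher plus cross-entropy on the label) and simple average pooling to produce the low-resolution input; the sentence above does not specify the loss or the downsampling, so both are assumptions, and the helper names `downsample`, `softmax`, and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence

Image = List[List[float]]


def downsample(image: Image, factor: int) -> Image:
    """Average-pool a 2-D grayscale image by an integer factor.

    Assumed stand-in for producing the student's low-resolution input;
    trailing rows/columns that do not fill a full block are dropped.
    """
    h, w = len(image), len(image[0])
    out: Image = []
    for i in range(0, h - h % factor, factor):
        row = []
        for j in range(0, w - w % factor, factor):
            block = [image[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      label: int, temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Assumed Hinton-style KD loss:
    alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

In a training loop, `teacher_logits` would come from the frozen teacher applied to the full-resolution image and `student_logits` from the student applied to `downsample(image, factor)`; only the student's parameters would be updated. The `temperature ** 2` factor keeps the gradient scale of the soft-target term comparable across temperatures.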
In this paper, we present and evaluate a NeRF-based framework that is capable of rendering scenes in immersive VR, allowing users to freely move their heads to explore complex real-world scenes.
Our evaluation in an object proposal generation framework shows that our adapted AttentionMask system is robust to image degradations, generalizes well to unseen types of surgeries, and copes well with small instruments.
Since the apples are very small objects in such scenarios, we tackle this problem by adapting the object proposal generation system AttentionMask, which focuses on small objects.
In contrast, for depth information this domain gap is considerably smaller and easier to bridge.
We cleaned the MSR-VTT annotations by removing these problems and then tested several typical video captioning models on the cleaned dataset.
Precise segmentation of objects is an important problem in tasks like class-agnostic object proposal generation or instance segmentation.
We use depth information, represented as point clouds, as the input to both the deep networks and the geometry-based pose refinement, and we use separate networks for rotation and translation regression.
We propose a novel approach for class-agnostic object proposal generation, which is efficient and especially well-suited to detect small objects.
Inspired by this, we propose a deep convolutional neural network for low-level object attribute classification, called the Deep Attribute Network (DAN).
Rotation estimation of known rigid objects is important for robotic applications such as dexterous manipulation.
In application domains such as robotics, it is useful to represent the uncertainty related to the robot's belief about the state of its environment.
We address the problem of coordinating the actions of a team of robots with periodic communication capability executing an information-gathering task.
In this paper, we show that the seminal, biologically inspired saliency model by Itti et al. is still competitive with current state-of-the-art methods for salient object segmentation if some important adaptations are made.
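The core idea of the Itti et al. model is the center-surround difference: a location is salient when it differs from its neighborhood. The sketch below illustrates only that idea on an intensity image. It is a deliberate simplification, not the adapted model referred to above: the original uses Gaussian pyramids, color and orientation channels in addition to intensity, and a dedicated map normalization, whereas here box means at a single resolution stand in for the fine and coarse scales. The function names and default radii are hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]


def box_mean(img: Image, radius: int) -> Image:
    """Mean intensity over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def center_surround_saliency(img: Image, center_radius: int = 0,
                             surround_radii: Sequence[int] = (2, 4)) -> Image:
    """Sum |center - surround| over several surround scales, then rescale to [0, 1].

    Simplified, intensity-only stand-in for Itti-style center-surround maps.
    """
    h, w = len(img), len(img[0])
    center = box_mean(img, center_radius)
    saliency = [[0.0] * w for _ in range(h)]
    for radius in surround_radii:
        surround = box_mean(img, radius)
        for y in range(h):
            for x in range(w):
                saliency[y][x] += abs(center[y][x] - surround[y][x])
    peak = max(max(row) for row in saliency)
    if peak > 0:  # a uniform image has no contrast and stays all zeros
        saliency = [[v / peak for v in row] for row in saliency]
    return saliency
```

For example, a single bright pixel on a dark background receives the maximum saliency of 1.0, while a uniform image yields an all-zero map. Summing over several surround radii mimics, in a crude way, the across-scale combination of the original model.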