We propose a novel video inpainting algorithm that simultaneously hallucinates missing appearance and motion (optical flow) information, building upon the recent 'Deep Image Prior' (DIP), which exploits convolutional network architectures to enforce plausible texture in static images.
In this paper, we develop a multi-task motion-guided video salient object detection network, which learns to accomplish two sub-tasks using two sub-networks: one for salient object detection in still images and the other for motion saliency detection in optical flow images.
Based on rigid projective geometry, the estimated stereo depth is used to guide the camera motion estimation, and the depth and camera motion are used to guide the residual flow estimation.
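For a static scene, the rigid flow induced by camera motion follows directly from projective geometry: back-project each pixel using its depth, transform the 3-D point by the relative camera pose, and re-project. Below is a minimal numpy sketch of that computation; the function name `rigid_flow` and the specific interface are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced by camera motion over a rigid (static) scene.

    depth : (H, W) per-pixel depth in the first frame
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) from frame 1 to frame 2
    Returns an (H, W, 2) flow field (dx, dy).
    Note: a purely illustrative sketch, not the paper's code.
    """
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                      # back-project to rays
    pts = rays * depth[..., None]                        # 3-D points in frame 1
    pts2 = pts @ R.T + t                                 # transform to frame 2
    proj = pts2 @ K.T                                    # re-project to pixels
    uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]
```

The residual flow a network then has to estimate is the difference between the observed total flow and this rigid component, which isolates independently moving objects.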
We postulate that success on this task requires the network to learn semantic and geometric knowledge in the ego-centric view.
Interestingly, we also observe that optical flow is more informative than the RGB frames in videos, and overall, models using audio features are more accurate than those based on video features when making the final prediction of evoked emotions.
Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.
Prediction and interpolation for long-range video data involve the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting.
Thanks to better video quality and higher frame rates, the performance of multiple object tracking has greatly improved in recent years.
This paper presents a novel obstacle avoidance system for road robots equipped with an RGB-D sensor that captures the scene ahead.