Experimental result demonstrates that modern models including BERT contextual embedding, movie tag prediction systems, and relational networks, perform at most 37% of human performance (23. 97/64. 87) in terms of F1 score.
We proposed an end-to-end grasp detection network, Grasp Detection Network (GDN), cooperated with a novel coarse-to-fine (C2F) grasp representation design to detect diverse and accurate 6-DoF grasps based on point clouds.
How to efficiently utilize temporal information to recover videos in a consistent way is the main issue for video inpainting problems.
Ranked #5 on Video Inpainting on YouTube-VOS 2018 val
Free-form video inpainting is a very challenging task that could be widely used for video editing such as text removal.
Video object removal is a challenging task in video processing that often requires massive human efforts.