We present GateHUB (Gated History Unit with Background Suppression), which comprises a novel position-guided gated cross-attention mechanism that enhances or suppresses parts of the history according to how informative they are for current-frame prediction.
Ranked #1 on Online Action Detection on TVSeries
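A minimal sketch of gated cross-attention, assuming one scalar gate logit per history frame and assuming the gate enters as log(sigmoid(gate)) added to the attention logits. That makes each frame's softmax weight proportional to its gate, so gates near 0 suppress a frame. The function names and the gating form are illustrative assumptions, not GateHUB's published formulation. Learned projections and the position guidance are omitted.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)); avoids log(0) for very negative gates.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_cross_attention(query, keys, values, gate_logits):
    """Attend from the current frame (query) over history frames (keys/values).

    gate_logits: hypothetical per-frame informativeness scores. Adding
    log(sigmoid(g)) to a frame's logit multiplies its unnormalised weight
    by sigmoid(g), which lies in (0, 1).
    """
    d = len(query)
    logits = [
        dot(query, k) / math.sqrt(d) + log_sigmoid(g)
        for k, g in zip(keys, gate_logits)
    ]
    weights = softmax(logits)
    out = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return out, weights


# Two history frames with identical keys: the gate alone decides the mix.
out, weights = gated_cross_attention(
    [1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[1.0], [3.0]], [10.0, -10.0]
)
```

Because both keys match the query equally, nearly all weight goes to the frame whose gate is open, and the output stays close to that frame's value.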
We present a novel dynamic recommendation model that focuses on users who interacted in the past but have recently become relatively inactive.
This paper concerns the problem of multi-object tracking based on the min-cost flow (MCF) formulation, which is conventionally studied as an instance of a linear program.
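For reference, the conventional MCF problem can also be solved combinatorially. The sketch below is a generic successive-shortest-path solver with Bellman-Ford, so negative edge costs (for example, rewards for confident detections) are allowed as long as the initial graph has no negative cycles, which holds for the acyclic graphs used in tracking. It is a textbook solver chosen for illustration, not the method proposed in this paper. The class and method names are hypothetical.

```python
import math


class MinCostFlow:
    """Successive-shortest-path min-cost flow using Bellman-Ford."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # adj[u] -> list of edge ids
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets id e; its residual reverse edge gets id e ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t, max_flow):
        flow, total_cost = 0, 0
        while flow < max_flow:
            # Shortest path from s in the residual graph.
            dist = [math.inf] * self.n
            prev_edge = [-1] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):
                updated = False
                for u in range(self.n):
                    if dist[u] == math.inf:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v] = dist[u] + self.cost[e]
                            prev_edge[v] = e
                            updated = True
                if not updated:
                    break
            if dist[t] == math.inf:
                break  # no augmenting path left
            # Bottleneck capacity along the path.
            push = max_flow - flow
            v = t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            # Augment along the path and update residual capacities.
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost
```

In a tracking graph, each detection would typically become an in-node/out-node pair joined by a negative-cost edge, with source, sink, and transition edges between frames. Each unit of flow then traces one trajectory.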
OpenTAL is general enough to enable existing TAL models to operate in open-set scenarios, and experimental results on the THUMOS14 and ActivityNet1.3 benchmarks show the effectiveness of our method.
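One common way to make a closed-set classifier open-set aware is to estimate predictive uncertainty and reject high-uncertainty inputs as "unknown". The sketch below assumes evidential deep learning (Sensoy et al., 2018), where non-negative per-class evidence e_k gives alpha_k = e_k + 1, strength S = sum(alpha), and uncertainty u = K / S. The function name and the 0.5 threshold are hypothetical, and this is not presented as OpenTAL's exact procedure.

```python
def evidential_open_set(evidence, threshold=0.5):
    """Classify from per-class evidence, or reject as unknown (returns None).

    evidence: non-negative scores, e.g. ReLU of a classifier's logits.
    Returns (class_index or None, uncertainty).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]      # Dirichlet parameters
    strength = sum(alpha)
    probs = [a / strength for a in alpha]    # expected class probabilities
    uncertainty = k / strength               # 1.0 when there is no evidence at all
    if uncertainty > threshold:
        return None, uncertainty
    best = max(range(k), key=lambda i: probs[i])
    return best, uncertainty


label, u = evidential_open_set([18.0, 0.0, 0.0])  # strong evidence for class 0
```

With no evidence for any class, u reaches its maximum of 1 and the input is rejected; with 18 units of evidence for one of three classes, u = 3/21 and the prediction is kept.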
In many applications, it is essential to understand why a machine learning model makes the decisions it does, but this is inhibited by the black-box nature of state-of-the-art neural networks.
Unlike image data, video actions are more challenging to recognize in an open-set setting because of the uncertain temporal dynamics and static bias of human actions.
Traffic accident anticipation aims to accurately and promptly predict the occurrence of a future accident from dashcam videos, which is vital for a safety-guaranteed self-driving system.
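The "accurately and promptly" requirement is often encoded in the training loss. The sketch below assumes a per-frame exponential weighting of the kind used in several dashcam accident-anticipation works: for an accident video, the log-loss at frame t is weighted by exp(-max(0, (t_a - t) / fps)), so misses close to the accident frame t_a cost more than early ones, while normal videos use a plain negative log-loss. The function name, signature, and exact weighting are illustrative assumptions.

```python
import math


def anticipation_loss(probs, is_positive, accident_frame=None, fps=10.0):
    """Mean per-frame loss for one video.

    probs[t]: predicted probability that an accident will occur, at frame t.
    accident_frame: index of the accident frame (required for positive videos).
    """
    if is_positive and accident_frame is None:
        raise ValueError("positive videos need an accident_frame")
    eps = 1e-7
    loss = 0.0
    for t, p in enumerate(probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        if is_positive:
            # Frames far before the accident are down-weighted exponentially.
            gap_seconds = max(0.0, (accident_frame - t) / fps)
            loss += -math.exp(-gap_seconds) * math.log(p)
        else:
            loss += -math.log(1.0 - p)
    return loss / len(probs)
```

Under this weighting, missing the accident at the last frame before impact costs far more than a low score ten seconds earlier, which pushes the model toward confident predictions as the accident approaches.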
Our model explicitly anticipates both activity features and positions with two graph auto-encoders, aiming to learn a discriminative group representation for group activity prediction.
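As background on the building block, a minimal graph auto-encoder in the style of Kipf and Welling (2016) encodes node features with a GCN layer, Z = ReLU(Â X W) with Â = D^{-1/2}(A + I)D^{-1/2}, and decodes edges as sigmoid(Z Zᵀ). The sketch below uses a single encoder layer over actors in a scene. The layer count, weights, and function names are assumptions; the paper's two specific auto-encoders (for features and for positions) are not described here.

```python
import math


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_adjacency(adj):
    # Â = D^{-1/2} (A + I) D^{-1/2}; self-loops keep every degree >= 1.
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gae_forward(features, adj, weight):
    """One-layer GCN encoder plus inner-product decoder.

    features: n x f node features (e.g. one row per actor).
    adj:      n x n symmetric 0/1 adjacency between actors.
    weight:   f x d encoder weights (hypothetical, normally learned).
    Returns (embeddings Z, reconstructed edge probabilities).
    """
    a_norm = normalize_adjacency(adj)
    z = [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, features), weight)]
    z_t = [list(col) for col in zip(*z)]
    logits = matmul(z, z_t)  # Z Z^T; non-negative here because Z is ReLU'd
    recon = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]
    return z, recon


adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # three actors in a chain
feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
z, recon = gae_forward(feats, adj, [[1.0, -0.5], [0.2, 0.8]])
```

Training would compare `recon` against the observed adjacency (for example with binary cross-entropy) and update `weight`; that step is omitted here.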
The derived uncertainty-based ranking loss is found to significantly boost model performance by improving the quality of relational features.
Ranked #2 on Accident Anticipation on CCD
The accuracy of OCR is usually affected by the quality of the input document image, and various kinds of degradation in document images hamper OCR results.
Driven by rapid advances in computer vision and machine learning, video analysis tasks have been shifting from inferring the present state to predicting future states.
In this paper, we propose a novel residual dense network (RDN) to address this problem in image super-resolution (SR). We fully exploit the hierarchical features from all the convolutional layers.
Ranked #3 on Color Image Denoising on CBSD68 sigma50
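To illustrate how a residual dense block uses features from every preceding layer, the toy sketch below runs one block on a single pixel's feature vector. Layer c sees the block input concatenated with all earlier layer outputs (g0 + c·growth channels). A fusion step maps the full concatenation back to g0 channels, and a local residual adds the block input. As a simplifying assumption, the 3x3 convolutions of a real RDN are replaced by per-pixel linear maps (equivalent to 1x1 convolutions), and global feature fusion across blocks is omitted. All names and sizes are hypothetical.

```python
import random


def linear(x, weights, relu=True):
    out = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return [max(0.0, o) for o in out] if relu else out


def make_weights(out_dim, in_dim, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


class ToyRDB:
    """Residual dense block on one pixel's feature vector (1x1 'convs' only)."""

    def __init__(self, g0, growth, num_layers, rng):
        # Layer c (0-indexed) receives g0 + c * growth input channels.
        self.layers = [make_weights(growth, g0 + c * growth, rng) for c in range(num_layers)]
        # Local feature fusion: g0 + num_layers * growth channels -> g0 channels.
        self.fusion = make_weights(g0, g0 + num_layers * growth, rng)

    def __call__(self, x):
        features = list(x)  # running concatenation: input plus every layer output
        for w in self.layers:
            features = features + linear(features, w)
        fused = linear(features, self.fusion, relu=False)
        return [a + b for a, b in zip(x, fused)]  # local residual learning


block = ToyRDB(g0=4, growth=3, num_layers=5, rng=random.Random(0))
y = block([1.0, 0.5, -0.5, 2.0])
```

Because each layer's output is appended rather than overwritten, the fusion step has direct access to every intermediate feature, which is the property the dense connections are meant to provide.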
Rich heterogeneous RGB and depth data are effectively compressed and projected to a learned shared space, in order to reduce noise and capture useful information for recognition.