It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames, and then the following prediction network decodes the encoded features spatially while aggregating all the temporal information.
Ranked #2 on
Video Saliency Detection
on DHF1K
In this way, even though the overall video saliency quality is heavily dependent on its spatial branch, however, the performance of the temporal branch still matter.
SALIENT OBJECT DETECTION VIDEO SALIENCY DETECTION VIDEO SALIENT OBJECT DETECTION
Network failures continue to plague datacenter operators as their symptoms may not have direct correlation with where or why they occur.
3D FACE RECONSTRUCTION ASPECT-BASED SENTIMENT ANALYSIS CROSS-MODAL RETRIEVAL DOCUMENT IMAGE CLASSIFICATION DRUG DISCOVERY ENTITY LINKING FEDERATED LEARNING FEW-SHOT IMAGE CLASSIFICATION HAND POSE ESTIMATION IMAGE DENOISING MEDICAL IMAGE SEGMENTATION MULTI-LABEL CLASSIFICATION NAMED ENTITY RECOGNITION PAIN INTENSITY REGRESSION PEDESTRIAN ATTRIBUTE RECOGNITION SELF-SUPERVISED ACTION RECOGNITION SELF-SUPERVISED VIDEO RETRIEVAL SEMANTIC SEGMENTATION SEMI-SUPERVISED VIDEO OBJECT SEGMENTATION SINGLE IMAGE DERAINING SPEECH RECOGNITION SPOKEN LANGUAGE IDENTIFICATION TRAJECTORY PREDICTION UNSUPERVISED IMAGE CLASSIFICATION VIDEO SALIENCY DETECTION VIDEO SEMANTIC SEGMENTATION VIDEO SUMMARIZATION WEAKLY SUPERVISED OBJECT DETECTION NETWORKING AND INTERNET ARCHITECTURE
We propose the \textbf{AViNet} architecture for audiovisual saliency prediction.
Ranked #1 on
Video Saliency Detection
on DHF1K
ACTION RECOGNITION SALIENCY PREDICTION VIDEO SALIENCY DETECTION VIDEO SALIENCY PREDICTION