We first establish a causal graph to represent the relations among uploader, UGC, and tag, where the uploaders are identified as confounders that spuriously correlate UGC and tag selections.
Recommending appropriate tags to items can facilitate content organization, retrieval, consumption and other applications, where hybrid tag recommender systems have been utilized to integrate collaborative information and content information for better recommendations.
To better handle the challenges of complex and large motions, instead of aligning features at each scale separately, lower-scale motion information is used to guide the higher-scale motion estimation.
As with all observational studies, hidden confounders, which are factors that affect both item exposures and user ratings, lead to a systematic bias in the estimation.
By combining an object-level graph (OG) and a relation-level graph (RG), the proposed OR2G catches the attribute transitions of objects and reasons about the relationship transitions between objects simultaneously.
To solve the above problem of the existing methods, we propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
Moreover, the multimodal information is fused by the product-of-experts (PoE) principle, where the semantic information in visual and textual modalities of the micro-video are weighted according to their variance estimations such that the modality with a lower noise level is given more weights.
For a typical Scene Graph Generation (SGG) method, there is often a large gap in the performance of the predicates' head classes and tail classes.
To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video.
For the vehicle detection and tracking module, we adopted YOLOv5 and multi-scale tracking to localize the anomalies.
Moreover, by considering the fusion of collaborative and feature variables as a virtual communication channel from an information-theoretic perspective, we introduce a user-dependent channel to dynamically control the information allowed to be accessed from the feature embeddings.
During the learning process, different intermediate time step can be involved as a control variable by means of an extension of coord-conv trick, allowing the estimated components to vary with different input temporal information.
Traditional 3D mesh saliency detection algorithms and corresponding databases were proposed under several constraints such as providing limited viewing directions and not taking the subject's movement into consideration.
As an emerging type of user-generated content, micro-video drastically enriches people's entertainment experiences and social interactions.
The proposed resampler network generates content adaptive image resampling kernels that are applied to the original HR input to generate pixels on the downscaled image.
Ranked #1 on Image Super-Resolution on Set14 - 2x upscaling
Recently, with the prevalence of large-scale image dataset, the co-occurrence information among classes becomes rich, calling for a new way to exploit it to facilitate inference.
Given a sequence of historical purchased items for a user, we devise a novel hierarchical attention over attention mechanism to capture sequential patterns at both union-level and individual-level.
In this paper, we present an end-to-end deep learning based framework for 3D object detection from a single monocular image.
Ranked #12 on Vehicle Pose Estimation on KITTI Cars Hard
We demonstrate that this low-computation-complexity method can efficiently catch the characteristics of the frame.
In this paper, a new heat-map-based (HMB) algorithm is proposed for group activity recognition.
In this paper, we propose a new intra-and-inter-constraint-based video enhancement approach aiming to 1) achieve high intra-frame quality of the entire picture where multiple region-of-interests (ROIs) can be adaptively and simultaneously enhanced, and 2) guarantee the inter-frame quality consistencies among video frames.