The overlook of multiplex behavior relations can hardly recognize the multi-modal contextual signals across different types of interactions, which limit the feasibility of current recommendation methods.
Accurate forecasting of citywide traffic flow has been playing critical role in a variety of spatial-temporal mining applications, such as intelligent traffic control and public risk assessment.
To model relation heterogeneity, we design a metapath-guided heterogeneous graph neural network to aggregate feature embeddings from different types of meta-relations across users and items, em-powering SMIN to maintain dedicated representations for multi-faceted user- and item-wise dependencies.
Modern recommender systems often embed users and items into low-dimensional latent representations, based on their observed interactions.
While many recent efforts show the effectiveness of neural network-based social recommender systems, several important challenges have not been well addressed yet: (i) The majority of models only consider users' social connections, while ignoring the inter-dependent knowledge across items; (ii) Most of existing solutions are designed for singular type of user-item interactions, making them infeasible to capture the interaction heterogeneity; (iii) The dynamic nature of user-item interactions has been less explored in many social-aware recommendation techniques.
The learning process of intra- and inter-session transition dynamics are integrated, to preserve the underlying low- and high-level item relationships in a common latent space.
In particular: i) complex inter-dependencies across different types of user behaviors; ii) the incorporation of knowledge-aware item relations into the multi-behavior recommendation framework; iii) dynamic characteristics of multi-typed user-item interactions.
To address the challenges of communication and computation resource utilization, we propose an asynchronous stochastic quasi-Newton (AsySQN) framework for VFL, under which three algorithms, i. e. AsySQN-SGD, -SVRG and -SAGA, are proposed.
Those methods achieve limited performance as they suffer from the challenge in spatial frame alignment and the lack of useful information from similar LR neighbor frames.
Crime prediction is crucial for public safety and resource optimization, yet is very challenging due to two aspects: i) the dynamics of criminal patterns across time and space, crime events are distributed unevenly on both spatial and temporal domains; ii) time-evolving dependencies between different types of crimes (e. g., Theft, Robbery, Assault, Damage) which reveal fine-grained semantics of crimes.
We use this platform to demonstrate our research and development results on privacy preserving machine learning algorithms.
To promote the developments of object detection, tracking and counting algorithms in drone-captured videos, we construct a benchmark with a new drone-captured largescale dataset, named as DroneCrowd, formed by 112 video clips with 33, 600 HD frames in various scenarios.
We propose a new offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
In this paper, we advocate to revive latent variable modeling, essentially the power of representation learning, in the era of Transformers to enhance controllability without hurting state-of-the-art generation effectiveness.
Our paper is among the first ones by our knowledge to propose a model and to create datasets for the task of "outline to story".
Existing methods only focused on pig counting using single image, and its accuracy is challenged by several factors, including pig movements, occlusion and overlapping.
Existing fraud detection methods try to identify unexpected dense subgraphs and treat related nodes as suspicious.
This paper proposes a space-time multi-scale attention network (STANet) to solve density map estimation, localization and tracking in dense crowds of video clips captured by drones with arbitrary crowd density, perspective, and flight altitude.
The derived node representations can be used to serve various downstream tasks, such as node classification and node clustering.
Ranked #7 on Heterogeneous Node Classification on DBLP (PACT) 14k
no code implementations • • Dawei Du, Pengfei Zhu, Longyin Wen, Xiao Bian, Haibin Lin, QinGhua Hu, Tao Peng, Jiayu Zheng, Xinyao Wang, Yue Zhang, Liefeng Bo, Hailin Shi, Rui Zhu, Aashish Kumar, Aijin Li, Almaz Zinollayev, Anuar Askergaliyev, Arne Schumann, Binjie Mao, Byeongwon Lee, Chang Liu, Changrui Chen, Chunhong Pan, Chunlei Huo, Da Yu, Dechun Cong, Dening Zeng, Dheeraj Reddy Pailla, Di Li, Dong Wang, Donghyeon Cho, Dongyu Zhang, Furui Bai, George Jose, Guangyu Gao, Guizhong Liu, Haitao Xiong, Hao Qi, Haoran Wang, Heqian Qiu, Hongliang Li, Huchuan Lu, Ildoo Kim, Jaekyum Kim, Jane Shen, Jihoon Lee, Jing Ge, Jingjing Xu, Jingkai Zhou, Jonas Meier, Jun Won Choi, Junhao Hu, Junyi Zhang, Junying Huang, Kaiqi Huang, Keyang Wang, Lars Sommer, Lei Jin, Lei Zhang
Results of 33 object detection algorithms are presented.
Then we propose a novel loss function, named Adaptive Wing loss, that is able to adapt its shape to different types of ground truth heatmap pixels.
Ranked #12 on Face Alignment on 300W
Specifically, the temporal coherence branch pretrained in an adversarial fashion from unlabeled video data, is designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation.
Ranked #2 on Semi-Supervised Video Object Segmentation on YouTube
Taking this advantage, we are able to explore various types of networks for object detection, without suffering from the poor convergence.
Complex real-world signals, such as images, contain discriminative structures that differ in many aspects including scale, invariance, and data channel.
It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape or structure shared within a category, and the differences are in the details of the object parts.
Finding contours in natural images is a fundamental problem that serves as the basis of many tasks such as image segmentation and object recognition.
We highlight the kernel view of orientation histograms, and show that they are equivalent to a certain type of match kernels over image patches.
We describe twin Gaussian processes (TGP), a generic structured prediction method that uses Gaussian process (GP) priors on both covariates and responses, both multivariate, and estimates outputs by minimizing the Kullback-Leibler divergence between two GP modeled as normal distributions over finite index sets of training and testing examples, emphasizing the goal that similar inputs should produce similar percepts and this should hold, on average, between their marginal distributions.
Ranked #11 on 3D Human Pose Estimation on HumanEva-I
To model the nonlinear relationship between input features and outputs we propose Conditional Neural Fields (CNF), a new conditional probabilistic graphical model for sequence labeling.
To address this problem, we propose an efficient match kernel (EMK), which maps local features to a low dimensional feature space, average the resulting feature vectors to form a set-level feature, then apply a linear classifier.