The sparse transformer can reduce the computational complexity of the self-attention layers to $O(n)$, whilst still being a universal approximator of continuous sequence-to-sequence functions.
Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on his/her previous sequence of actions.
We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias.
We propose the multi-impulse exogenous function, for when the exogenous events are observed as event times, and the latent homogeneous Poisson process exogenous function, for when the exogenous events are presented as interval-censored volumes.
We introduce Radflow, a novel model that embodies three key ideas: a recurrent neural network to obtain node embeddings that depend on time, the aggregation of the flow of influence from neighboring nodes with multi-head attention, and the multi-layer decomposition of time series.
The collective attention on online items such as web pages, search terms, and videos reflects trends that are of social, cultural, and economic interest.
Most work on multi-document summarization has focused on generic summarization of information present in each individual document set.
The proof connects the well-known Stone-Weierstrass Theorem for function approximation, the uniform density of non-negative continuous functions using a transfer function, the formulation of the parameters of piecewise continuous functions as a dynamic system, and a recurrent neural network implementation for capturing the dynamics.
We address the first challenge by associating words in the caption with faces and objects in the image, via a multi-modal, multi-head attention mechanism.
This paper presents in-depth measurements on the effects of Twitter data sampling across different timescales and different subjects (entities, networks, and cascades).
We show that QP matches quantile functions rather than moments as in EP and has the same mean update but a smaller variance update than EP, thereby alleviating EP's tendency to over-estimate posterior variances.
In this paper, we first construct the Vevo network, a YouTube video network with 60,740 music videos interconnected by the recommendation links, and we collect their associated viewing dynamics.
In this paper, we discuss the learning of generalised policies for probabilistic and classical planning problems using Action Schema Networks (ASNets).
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups.
On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset.
We develop a model that learns to generate visually relevant styled captions from a large corpus of styled text without aligned images.
We find that results depend on the type of content being promoted: superusers are more successful in promoting Howto and Gaming videos, whereas the cohort of regular users is more influential for Activism videos.
We collect a large dataset of tweets during the 1st U.S. Presidential Debate in 2016 (#DebateNight) and we analyze its 1.5 million users from three perspectives: user influence, political behavior (partisanship and engagement) and botness.
Social and Information Networks
Images in the wild encapsulate rich knowledge about varied abstract concepts and cannot be sufficiently described with models built only using image-caption pairs containing selected objects.
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems.
The share of videos in internet traffic has been growing, so understanding how videos capture attention on a global scale is also of growing importance.
Social and Information Networks Human-Computer Interaction
This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time.
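As a rough illustration of what a Hawkes process models, the sketch below evaluates a conditional intensity in which each past event temporarily raises the rate of future events. It assumes the common exponential-decay kernel; the function name `hawkes_intensity` and the parameters `mu`, `alpha`, and `beta` (with their default values) are illustrative choices, not taken from the chapter itself.

```python
import math


def hawkes_intensity(t, event_times, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity of a Hawkes process at time t.

    Assumes an exponential kernel (an illustrative choice):
        lambda(t) = mu + sum over t_i < t of alpha * exp(-beta * (t - t_i))

    mu    -- background (baseline) event rate
    alpha -- jump in intensity caused by each past event
    beta  -- rate at which that jump decays
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - t_i))
        for t_i in event_times
        if t_i < t  # only events strictly before t contribute
    )
    return mu + excitation


if __name__ == "__main__":
    history = [1.0, 1.5, 4.0]  # hypothetical event times
    for t in (0.5, 1.6, 3.0, 4.1):
        print(f"t={t:.1f}  intensity={hawkes_intensity(t, history):.4f}")
```

With no past events the intensity equals the baseline `mu`; right after a burst of events it is elevated and then decays back towards `mu`, which is the self-exciting behavior the term "inter-dependent events" refers to.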
Knowledge graph construction consists of two tasks: extracting information from external resources (knowledge population) and inferring missing information through a statistical analysis on the extracted information (knowledge completion).
Modeling and predicting the popularity of online content is a significant problem for the practice of information dissemination, advertising, and consumption.
We design a system to describe an image with emotions, and present a model that automatically generates captions with positive or negative sentiments.
This paper proposes AutoRec, a novel autoencoder framework for collaborative filtering (CF).
Ranked #5 on Recommendation Systems on MovieLens 1M