Abstractive document summarization is a comprehensive task that involves both document understanding and summary generation, an area in which Transformer-based models have achieved state-of-the-art performance.
Since our plug-and-play framework can be applied to many meta-learning problems, we further instantiate it to the cases of few-shot classification and implicit meta generative modeling.
As a result, the backbone learns the shared knowledge among all clusters while modulated weights extract the cluster-specific features.
However, they often assume in the prior that the topics at each layer are independently drawn from the Dirichlet distribution, ignoring the dependencies between the topics both at the same layer and across different layers.
Reweighting adversarial data during training has been recently shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights.
As a generic tool, the improvement introduced by ASR-Norm is agnostic to the choice of ADA methods.
Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image.
Extensive large-scale experiments on standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark datasets in representation learning, but also provides interpretable contrastive weights, demonstrating the efficacy of the proposed doubly contrastive strategy.
We formulate two desired criteria for the space partitioner that aid the training of our mixture of generators: 1) to produce connected partitions and 2) to provide a proxy of distance between partitions and data samples, along with a direction for reducing that distance.
However, the quality of uncertainty estimation is highly dependent on the dropout probabilities.
In learning to discover novel classes (L2DNC), we are given labeled data from seen classes and unlabeled data from unseen classes, and we train clustering models for the unseen classes.
Moving beyond conventional Transformers that ignore longer-range word dependencies and contextualize their word representations at the segment level, the proposed method not only captures global semantic coherence of all segments and global word concurrence patterns, but also enriches the representation of each token by adapting it to its local context, which is not limited to the segment it resides in and can be flexibly defined according to the task.
Helmholtz stereopsis (HS) exploits the reciprocity principle of light propagation (i.e., Helmholtz reciprocity) for 3D reconstruction of surfaces with arbitrary reflectance.
The forward CT is the expected cost of moving a source data point to a target one, with their joint distribution defined by the product of the source probability density function (PDF) and a source-dependent conditional distribution, which is related to the target PDF via Bayes' theorem.
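The forward CT described above can be illustrated on finite samples. The following is a minimal sketch, not the authors' implementation: it assumes the source-dependent conditional takes the form pi(y_j | x_i) proportional to exp(-d(x_i, y_j)) over equally weighted target samples (one possible Bayes-rule construction with an empirical target distribution), and it uses squared distance for both the cost and d. The function name `forward_ct` and these choices are hypothetical.

```python
import math


def forward_ct(xs, ys, cost=lambda x, y: (x - y) ** 2, dist=None):
    """Empirical forward CT between 1-D source samples xs and target samples ys.

    Assumption (not from the source text): the conditional pi(y_j | x_i) is a
    softmax of -dist(x_i, y_j) over the target samples, i.e. Bayes' rule with
    a uniform empirical target distribution and likelihood exp(-dist).
    """
    if dist is None:
        dist = cost  # assumption: reuse the transport cost as the distance
    total = 0.0
    for x in xs:
        logits = [-dist(x, y) for y in ys]
        m = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - m) for l in logits]
        z = sum(weights)
        # Expected cost of moving x to a target drawn from pi(. | x)
        total += sum(w / z * cost(x, y) for w, y in zip(weights, ys))
    return total / len(xs)  # average over the empirical source distribution
```

For a single source point at 0 and targets at 0 and 1, the conditional places weight 1/(1+e) on the farther target, so the forward CT equals 1/(1+e).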
Self-supervised pre-training (SSP) employs random image transformations to generate training data for visual representation learning.
Incorporating the natural document-sentence-word structure into hierarchical Bayesian modeling, we propose convolutional Poisson gamma dynamical systems (PGDS) that introduce not only word-level probabilistic convolutions, but also sentence-level stochastic temporal transitions.
To analyze a collection of interconnected documents, relational topic models (RTMs) have been developed to describe both the link structure and document content, exploring their underlying relationships via a single-layer latent representation with limited expressive capability.
We show that the existing semi-implicit variational inference objective provably reduces information in the observed graph.
The splitting function at each node of CPT is based on the logical disjunction of a community of differently weighted probabilistic linear decision-makers, which also geometrically corresponds to a convex polytope in the covariate space.
Attention modules, as simple and effective tools, have not only enabled deep neural networks to achieve state-of-the-art results in many domains, but also enhanced their interpretability.
Leveraging well-established MCMC strategies, we propose MCMC-interactive variational inference (MIVI) to not only estimate the posterior in a time-constrained manner, but also facilitate the design of MCMC transitions.
We develop a recurrent gamma belief network (rGBN) for radar automatic target recognition (RATR) based on high-resolution range profile (HRRP), which characterizes the temporal dependence across the range cells of HRRP.
We use the generated random graph, whose number of nonzero-degree nodes is finite, to define both the sparsity pattern and dimension of the latent state transition matrix of a (generalized) linear dynamical system.
To improve the sample efficiency of policy-gradient based reinforcement learning algorithms, we propose implicit distributional actor-critic (IDAC) that consists of a distributional critic, built on two deep generator networks (DGNs), and a semi-implicit actor (SIA), powered by a flexible policy distribution.
Given a posterior sample of the global parameters, in order to efficiently infer the local latent representations of a document under DATM across all stochastic layers, we propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a Weibull distribution based stochastic downward generative model.
In high-dimensional statistics, variable selection is an optimization problem aiming to recover the latent sparse pattern from all possible covariate combinations.
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs) that generalizes existing stochastic regularization methods for training GNNs.
Semantic hashing has become a crucial component of fast similarity search in many large-scale information retrieval systems, in particular, for text data.
To this end, we propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
In this work, we propose learnable Bernoulli dropout (LBD), a new model-agnostic dropout scheme that considers the dropout rates as parameters jointly optimized with other model parameters.
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension, making it challenging to apply existing on-policy gradient based deep RL algorithms efficiently.
To stabilize this method, we adapt a policy gradient estimator to the contextual generation of categorical sequences; the estimator evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
To simultaneously capture syntax and global semantics from a text corpus, we propose a new larger-context recurrent neural network (RNN) based language model, which extracts recurrent hierarchical semantic structure via a dynamic deep topic model to guide natural language generation.
If this is not done, the meta-learner can ignore the task training data and learn a single model that performs all of the meta-training tasks zero-shot, but does not adapt effectively to new image classes.
We propose Bayesian nonparametric Weibull delegate racing (WDR) to explicitly model surviving under competing events and to interpret how the covariates accelerate or decelerate the event times.
Variational inference is used to approximate the posterior of the local variable, and semi-implicit structure is further introduced to enhance its expressiveness.
This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness.
Stochastic recurrent neural networks with latent random variables of complex dependency structures have been shown to be more successful in modeling sequential data than deterministic deep models.
In this work, we investigate semi-supervised learning (SSL) for image classification using adversarial training.
Representation learning over graph-structured data has mostly been studied in static graph settings, while efforts to model dynamic graphs are still scant.
Compared to VGAE, the graph latent representations derived by SIG-VAE are more interpretable, due to a more expressive generative model and more faithful inference enabled by the flexible semi-implicit construction.
For bidirectional joint image-text modeling, we develop variational hetero-encoder (VHE) randomized generative adversarial network (GAN), a versatile deep generative model that integrates a probabilistic text decoder, probabilistic image encoder, and GAN into a coherent end-to-end multi-modality learning framework.
For text analysis, one often resorts to a lossy representation that either completely ignores word order or embeds each word as a low-dimensional dense feature vector.
To address the challenge of backpropagating the gradient through categorical variables, we propose the augment-REINFORCE-swap-merge (ARSM) gradient estimator that is unbiased and has low variance.
Many applications, such as text modelling, high-throughput sequencing, and recommender systems, require analysing sparse, high-dimensional, and overdispersed discrete (count-valued or binary) data.
To extract and relate visual and linguistic concepts from images and textual descriptions for text-based zero-shot learning (ZSL), we develop variational hetero-encoder (VHE) that decodes text via a deep probabilistic topic model, the variational posterior of whose local latent variables is encoded from an image via a Weibull distribution based inference network.
We present a probabilistic framework for multi-label learning based on a deep generative model for the binary label vector associated with each observation.
In this paper, we introduce a novel concentric multi-spectral light field (CMSLF) design that is able to recover the shape and reflectance of surfaces with arbitrary material in one shot.
We avoid the need for spatial constancy of albedo; instead, we use a new measure for albedo similarity that is based on the albedo norm profile.
We develop deep Poisson-gamma dynamical systems (DPGDS) to model sequentially observed multivariate count data, improving previously proposed models by not only mining deep hierarchical latent structure from the data, but also capturing both first-order and long-range temporal dependencies.
Second, compared to the number of involved molecules and system complexity, the number of available samples for studying complex diseases, such as cancer, is often limited, especially considering disease heterogeneity.
We propose Lomax delegate racing (LDR) to explicitly model the mechanism of survival under competing risks and to interpret how the covariates accelerate or decelerate the time to event.
To backpropagate the gradients through stochastic binary layers, we propose the augment-REINFORCE-merge (ARM) estimator that is unbiased, exhibits low variance, and has low computational complexity.
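As an illustration of the ARM idea for a single binary variable, the sketch below is a hedged example rather than the authors' code. It assumes z ~ Bernoulli(sigmoid(phi)) and uses the univariate identity d/dphi E[f(z)] = E_u[(f(1[u > sigmoid(-phi)]) - f(1[u < sigmoid(phi)])) (u - 1/2)] with u ~ Uniform(0, 1). The function names and the test function `f` are hypothetical, and `exact_gradient` is included only as a closed-form check.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def arm_gradient(f, phi, num_samples=200_000, seed=0):
    """Monte Carlo ARM estimate of d/dphi E[f(z)], z ~ Bernoulli(sigmoid(phi))."""
    rng = random.Random(seed)
    lower = sigmoid(-phi)  # threshold for the first pseudo-action
    upper = sigmoid(phi)   # threshold for the second (antithetic) pseudo-action
    total = 0.0
    for _ in range(num_samples):
        u = rng.random()
        z1 = 1 if u > lower else 0
        z2 = 1 if u < upper else 0
        # The two evaluations share the same uniform draw u; their difference
        # is zero whenever z1 == z2, which keeps the variance low.
        total += (f(z1) - f(z2)) * (u - 0.5)
    return total / num_samples


def exact_gradient(f, phi):
    """Closed form for one binary variable: s * (1 - s) * (f(1) - f(0))."""
    s = sigmoid(phi)
    return s * (1.0 - s) * (f(1) - f(0))
```

Comparing `arm_gradient(f, phi)` with `exact_gradient(f, phi)` for a simple `f`, such as `lambda z: (z - 0.45) ** 2`, gives a quick sanity check that the estimator is unbiased in this one-dimensional setting.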
Semi-implicit variational inference (SIVI) is introduced to expand the commonly used analytic variational distribution family, by mixing the variational parameter with a flexible distribution.
Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model.
It is important to learn various types of classifiers given training data with noisy labels.
We present a general method for privacy-preserving Bayesian inference in Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences.
Next-generation sequencing (NGS) to profile temporal changes in living systems is gaining more attention for deriving better insights into the underlying biological mechanisms compared to traditional static sequencing experiments.
To train an inference network jointly with a deep generative topic model, making it both scalable to big corpora and fast in out-of-sample prediction, we develop Weibull hybrid autoencoding inference (WHAI) for deep latent Dirichlet allocation, which infers posterior samples via a hybrid of stochastic-gradient MCMC and autoencoding variational Bayes.
A nonparametric Bayesian sparse graph linear dynamical system (SGLDS) is proposed to model sequentially observed multivariate data.
It is challenging to develop stochastic gradient based scalable inference for deep discrete latent variable models (LVMs), due to the difficulties in not only computing the gradients, but also adapting the step sizes to different latent factors and hidden layers.
To model categorical response variables given their covariates, we propose a permuted and augmented stick-breaking (paSB) construction that one-to-one maps the observed categories to randomly permuted latent sticks.
To construct flexible nonlinear predictive distributions, the paper introduces a family of softplus function based regression models that convolve, stack, or combine both operations by convolving countably infinite stacked gamma distributions, whose scales depend on the covariates.
We introduce Bayesian Poisson Tucker decomposition (BPTD) for modeling country–country interaction event data.
In this paper, we present a novel LF sampling scheme by exploiting a special non-centric camera called the crossed-slit or XSlit camera.
A common approach to analyzing a covariate-sample count matrix, an element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood.
A gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors.
To infer multilayer deep representations of high-dimensional discrete and nonnegative real vectors, we propose an augmentable gamma belief network (GBN) that factorizes each of its hidden layers into the product of a sparse connection weight matrix and the nonnegative real hidden units of the next layer.
Example results on text analysis illustrate interesting relationships between the width of the first layer and the inferred network structure, and demonstrate that the PGBN, whose hidden units are imposed with correlated gamma priors, can add more layers to increase its performance gains over Poisson factor analysis, given the same limit on the width of the first layer.
A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link.
The beta-negative binomial process (BNBP), an integer-valued stochastic process, is employed to partition a count vector into a latent random count matrix.
We define a family of probability distributions for random count matrices with a potentially unbounded number of rows and columns.
The paper introduces the concept of a cluster structure to define a joint distribution of the sample size and its exchangeable random partitions.
By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework.
A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for mixture modeling and whose marginalization leads to an NB process for count modeling.
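The marginalization mentioned above has a simple scalar analogue that can be checked by simulation. The sketch below is an illustrative assumption-laden example, not the process-level construction: it draws a rate lambda ~ Gamma(r, scale = p / (1 - p)) and then a count n ~ Poisson(lambda), so that marginally n follows a negative binomial distribution with mean r p / (1 - p) and variance r p / (1 - p)^2. The names `r`, `p`, and both function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate rates."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= threshold:
            return k - 1


def sample_gamma_poisson(r, p, num_samples, seed=0):
    """Draw counts from the gamma-Poisson mixture (marginally NB(r, p))."""
    rng = random.Random(seed)
    scale = p / (1.0 - p)
    counts = []
    for _ in range(num_samples):
        lam = rng.gammavariate(r, scale)  # second argument is the scale
        counts.append(sample_poisson(lam, rng))
    return counts
```

With r = 2 and p = 0.5, the empirical mean and variance of the sampled counts should be close to 2 and 4, respectively; the variance exceeding the mean reflects the overdispersion that motivates NB count models.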
The beta process is employed as a prior for learning the dictionary, and this non-parametric method naturally infers an appropriate dictionary size.