We present a comprehensive study of sparse attention patterns in Transformer models.
In this work, we study a more challenging but practical problem, i.e., few-shot class-incremental learning for NER, where an NER model is trained with only a few labeled samples of the new classes, without forgetting knowledge of the old ones.
Rapid advancements of large language models (LLMs) have enabled the processing, understanding, and generation of human-like text, with increasing integration into systems that touch our social sphere.
Concurrently, the LM-guided traverser acts as a local navigator that gathers pertinent context to progressively approach the question and guarantee retrieval quality.
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot.
Instruction tuning unlocks the superior capability of Large Language Models (LLMs) to interact with humans.
With this framework, we develop two novel mutual-information-based loss functions, to (i) discover a proper prompt initialization for the downstream tasks and learn sufficient task-relevant information from prompt tokens and (ii) encourage the output representation from the pretrained language model to be more aware of the task-relevant information captured in the learnt prompt.
Remarkably, by incorporating conditional information from the powerful CLIP model, our method can boost the current SOTA accuracy by 10-20 absolute points in many cases.
However, generating images of a novel concept provided by the user's input image is still a challenging task.
In this paper, we focus on improving the prompt transfer from dialogue state tracking to dialogue summarization and propose Skeleton-Assisted Prompt Transfer (SAPT), which leverages skeleton generation as extra supervision that serves as a medium connecting the distinct source and target tasks, enabling the model to better exploit dialogue state information.
The DrugChat system consists of a graph neural network (GNN), a large language model (LLM), and an adaptor.
This repository offers a foundational framework for exploring federated fine-tuning of LLMs using heterogeneous instructions across diverse categories.
Demonstration-based learning has shown great potential in stimulating pretrained language models' ability in limited-data scenarios.
Significantly, it can even outperform the time- and resource-consuming fine-tuning method on sentiment classification tasks.
Non-negative matrix factorization (NMF) based topic modeling is widely used in natural language processing (NLP) to uncover hidden topics of short text documents.
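As a minimal sketch of the idea, the toy example below factorizes a small term-document count matrix with Lee–Seung multiplicative updates; the corpus, vocabulary size, and topic count are illustrative assumptions, not the setup of any particular paper.

```python
import numpy as np

# Rows = documents, columns = vocabulary terms (raw counts); two docs lean
# on terms 0-1 and two lean on terms 2-3, so two topics should fit well.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
], dtype=float)

rng = np.random.default_rng(0)
k = 2                                # assumed number of topics
W = rng.random((4, k)) + 0.1         # document-topic weights
H = rng.random((k, 4)) + 0.1         # topic-term weights

eps = 1e-9
for _ in range(500):                 # Lee-Seung multiplicative updates
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

err = np.linalg.norm(X - W @ H)      # Frobenius reconstruction error
```

The multiplicative form keeps both factors non-negative by construction, which is what makes the rows of `H` interpretable as topics.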
One of the major challenges in training text-to-image generation models is the need for a large number of high-quality text-image pairs.
Unfortunately, the lack of a large-scale terminology-definition dataset hinders progress toward definition generation.
Voice style transfer, also called voice conversion, seeks to modify one speaker's voice to generate speech as if it came from another (target) speaker.
Data augmentation has been widely used to improve deep neural networks in many research fields, such as computer vision.
Flexibility design problems are a class of problems that appear in strategic decision-making across industries, where the objective is to design a (e.g., manufacturing) network that affords flexibility and adaptivity.
In sequence-to-sequence models, classical optimal transport (OT) can be applied to semantically match generated sentences with target sentences.
An extension is further proposed to improve the OT learning, based on the structural and contextual information of the text sequences.
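To illustrate the underlying machinery, here is a toy entropy-regularized OT (Sinkhorn) computation matching two small sets of token embeddings; the embeddings, regularization strength, and iteration count are assumptions for illustration, not the paper's actual formulation or its extension.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=300):
    """Entropy-regularized OT plan via Sinkhorn scaling iterations."""
    K = np.exp(-cost / eps)          # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # match column marginals to b
        u = a / (K @ v)              # match row marginals to a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
gen = rng.normal(size=(3, 4))        # toy generated-token embeddings
tgt = rng.normal(size=(5, 4))        # toy target-token embeddings
# Pairwise squared Euclidean cost, normalized for numerical stability.
cost = ((gen[:, None, :] - tgt[None, :, :]) ** 2).sum(-1)
cost /= cost.max()
a = np.full(3, 1 / 3)                # uniform mass on generated tokens
b = np.full(5, 1 / 5)                # uniform mass on target tokens
plan = sinkhorn(cost, a, b)          # soft matching between the two sets
```

Each entry of `plan` is the amount of mass transported between a generated and a target token, i.e., a soft semantic alignment.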
The neural attention mechanism plays an important role in many natural language processing applications.
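For reference, scaled dot-product attention, the most common instance of the mechanism rather than any specific paper's variant, can be sketched in a few lines:

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V -- each query attends over all keys."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))   # 2 queries, dimension 8 (toy sizes)
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 8))   # 5 values
out, w = attention(Q, K, V)   # out: (2, 8), w: (2, 5) attention weights
```

The rows of `w` are probability distributions over the keys, which is what gives attention its interpretability in NLP applications.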
High-quality dialogue-summary paired data is expensive to produce and domain-sensitive, making abstractive dialogue summarization a challenging task.
We propose a novel framework for structured bandits, which we call an influence diagram bandit.
Model-Agnostic Meta-Learning (MAML) has been successfully employed in NLP applications, including few-shot text classification and multi-domain low-resource language generation.
Text-based interactive recommendation provides richer user feedback and has demonstrated advantages over traditional interactive recommender systems.
Auto-regressive text generation models usually focus on local fluency and may produce semantically inconsistent long text.
An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.
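A toy version of this estimation problem, assuming a small ergodic 3-state chain, compares the fixed point of pi = pi P with a Monte Carlo visit-frequency estimate:

```python
import numpy as np

# Assumed 3-state transition matrix (rows sum to 1); the chain is ergodic.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.1, 0.2, 0.7]])

# Power iteration: repeatedly apply pi <- pi P until the fixed point.
pi = np.ones(3) / 3
for _ in range(2000):
    pi = pi @ P

# Monte Carlo estimate: long-run fraction of time spent in each state.
rng = np.random.default_rng(0)
counts = np.zeros(3)
s = 0
for _ in range(100_000):
    s = rng.choice(3, p=P[s])
    counts[s] += 1
pi_mc = counts / counts.sum()
```

The gap between `pi` and `pi_mc` shrinks as the trajectory lengthens, which is exactly the estimation error such methods aim to control.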
Reinforcement learning (RL) has been widely studied for improving sequence-generation models.
In this paper, we focus on skeleton-based action generation and propose to model smooth and diverse transitions on a latent space of action sequences with much lower dimensionality.
Text-based interactive recommendation provides richer user preferences and has demonstrated advantages over traditional interactive recommender systems.
To address this, we propose a learning framework that improves collaborative filtering with a synthetic feedback loop (CF-SFL) to simulate the user feedback.
This paper considers a novel variational formulation of network embeddings, with special focus on textual networks.
In this work, we investigate the problem of figure captioning where the goal is to automatically generate a natural language description of the figure.
We propose a topic-guided variational auto-encoder (TGVAE) model for text generation.
Thompson sampling (TS) is a class of algorithms for sequential decision-making that requires maintaining a posterior distribution over a model.
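As a sketch of the idea, the classic Bernoulli-bandit instance of TS keeps a Beta posterior per arm, samples from each posterior, and pulls the argmax; the arm means and horizon below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.8]           # unknown to the algorithm
alpha = np.ones(3)                     # Beta(1, 1) priors per arm
beta = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(2000):
    theta = rng.beta(alpha, beta)      # one posterior sample per arm
    arm = int(np.argmax(theta))        # act greedily w.r.t. the sample
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward               # conjugate posterior update
    beta[arm] += 1 - reward
    pulls[arm] += 1
```

Sampling from the posterior (rather than taking its mean) is what balances exploration and exploitation: uncertain arms occasionally draw high samples and get tried.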
Sequence-to-sequence models are commonly trained via maximum likelihood estimation (MLE).
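Concretely, MLE training minimizes the negative log-likelihood of each gold next token under the model's softmax (teacher forcing); in this sketch the logits are random stand-ins for a real model's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
T, V = 4, 10                       # sequence length, vocabulary size (toy)
logits = rng.normal(size=(T, V))   # per-step scores a model would emit
targets = np.array([2, 5, 1, 7])   # gold token ids (illustrative)

# Log-softmax over the vocabulary at each step.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
# MLE loss: mean negative log-likelihood of the gold tokens.
nll = -log_probs[np.arange(T), targets].mean()
```

Minimizing `nll` with respect to the logits is exactly the cross-entropy objective used to train sequence-to-sequence models.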
Sequence generation with reinforcement learning (RL) has received significant attention recently.
With such theoretical guarantees, SPOS can be safely and effectively applied to both Bayesian DL and deep RL tasks.
Particle-optimization-based sampling (POS) is a recently developed effective sampling technique that interactively updates a set of particles.
Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate.
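A minimal sketch of this direct objective is REINFORCE on a two-action bandit with a softmax policy, where the parameters follow the likelihood-ratio gradient of the expected reward; the reward means and step size are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
reward_means = np.array([0.2, 0.8])    # assumed per-action mean rewards
theta = np.zeros(2)                    # policy logits (the parameters)

for _ in range(3000):
    p = np.exp(theta - theta.max())
    p /= p.sum()                       # softmax policy pi(a)
    a = rng.choice(2, p=p)
    r = reward_means[a] + 0.1 * rng.normal()   # noisy reward sample
    grad_logp = -p
    grad_logp[a] += 1.0                # gradient of log pi(a) w.r.t. theta
    theta += 0.1 * r * grad_logp       # REINFORCE ascent step

p = np.exp(theta - theta.max())
p /= p.sum()                           # final action probabilities
```

In expectation the update equals the gradient of the expected reward, so the policy concentrates on the higher-reward action.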
Particle-based variational inference methods (ParVIs) have gained attention in the Bayesian inference literature, for their capacity to yield flexible and accurate approximations.
Recent advances on the scalability and flexibility of variational inference have made it successful at unravelling hidden patterns in complex data.
There has been recent interest in developing scalable Bayesian sampling methods such as stochastic gradient MCMC (SG-MCMC) and Stein variational gradient descent (SVGD) for big-data analysis.
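As a sketch of the SVGD side, the following toy update pushes 1D particles toward a standard normal target with an RBF kernel; the bandwidth, step size, and particle count are assumptions, not tuned values from the literature.

```python
import numpy as np

def svgd_step(x, grad_logp, h=0.5, lr=0.1):
    """One SVGD update for 1D particles x given the target score grad_logp."""
    diff = x[:, None] - x[None, :]        # pairwise differences x_j - x_i
    K = np.exp(-diff**2 / (2 * h))        # RBF kernel matrix k(x_j, x_i)
    gradK = -diff * K / h                 # d k(x_j, x_i) / d x_j
    # Attraction (kernel-weighted scores) plus repulsion (kernel gradients).
    phi = (K @ grad_logp(x) + gradK.sum(axis=0)) / len(x)
    return x + lr * phi

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, size=50)          # particles start far from target
score = lambda z: -z                      # grad log density of N(0, 1)
for _ in range(500):
    x = svgd_step(x, score)
```

The repulsive term is what keeps the particles spread out, so they approximate the target distribution rather than collapsing onto its mode.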
Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications.