Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.
Large language models (LLMs) have shown a remarkable capacity for in-context learning (ICL), in which a new task is learned from just a few training examples without any explicit pre-training for that task.
Moreover, our model even outperforms the largest closed-source LLMs, Anthropic's Claude and Google's Bard, on HumanEval and HumanEval+.
Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern search engines (SEs).
We demonstrate that our PKG framework can enhance the performance of "black-box" LLMs on a range of domain knowledge-intensive tasks that require factual (+7.9%), tabular (+11.9%), medical (+3.0%), and multimodal (+8.1%) knowledge.
In this paper, we present an avenue for creating large amounts of instruction data with varying levels of complexity using LLMs instead of humans.
The conventional dense retrieval paradigm encodes images and texts into dense representations with dual-stream encoders; however, it suffers from low retrieval speed in large-scale retrieval scenarios.
Weakly-Supervised Video Grounding (WSVG) aims to localize events of interest in untrimmed videos with only video-level annotations.
To address this issue, we propose a novel sparse retrieval paradigm for ITR that exploits sparse representations in the vocabulary space for images and texts.
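To make the contrast with dense retrieval concrete, here is a minimal sketch of vocabulary-space sparse retrieval: items are stored as sparse bags of weighted terms in an inverted index, and a query only touches the postings of its non-zero terms instead of scoring every item. The encodings and item names below are illustrative assumptions, not the paper's learned representations.

```python
from collections import defaultdict

def build_inverted_index(items):
    """items: {item_id: {term: weight}} -> {term: [(item_id, weight), ...]}"""
    index = defaultdict(list)
    for item_id, terms in items.items():
        for term, w in terms.items():
            index[term].append((item_id, w))
    return index

def search(index, query_terms, top_k=2):
    """Sparse dot product via the inverted index; returns top_k (item_id, score)."""
    scores = defaultdict(float)
    for term, qw in query_terms.items():
        for item_id, w in index.get(term, []):
            scores[item_id] += qw * w
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

# Hypothetical sparse encodings; in the paper these would come from a
# learned encoder projecting images and texts into vocabulary space.
corpus = {
    "img_dog":  {"dog": 1.2, "grass": 0.5, "animal": 0.8},
    "img_cat":  {"cat": 1.3, "sofa": 0.6, "animal": 0.7},
    "img_city": {"street": 1.0, "car": 0.9, "night": 0.4},
}
index = build_inverted_index(corpus)
print(search(index, {"dog": 1.0, "animal": 0.5}))
```

Because scoring cost depends on posting-list lengths rather than collection size, this style of index is what buys the low latency claimed for sparse retrieval at scale.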
Unlike previous works, which rely on only one positive and several hard negatives as candidate passages, we create dark examples that all have moderate relevance to the query through mixing-up and masking in discrete space.
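The two discrete-space operations can be sketched as follows; the function names, ratios, and mask token are illustrative assumptions, not the paper's exact recipe:

```python
import random

def mix_up(pos_tokens, neg_tokens, mix_ratio=0.5, rng=random):
    """Splice ~mix_ratio of an unrelated passage's tokens into the positive,
    yielding a candidate of moderate (neither full nor zero) relevance."""
    out = list(pos_tokens)
    n = int(len(out) * mix_ratio)
    for i in rng.sample(range(len(out)), n):
        out[i] = neg_tokens[i % len(neg_tokens)]
    return out

def mask(tokens, mask_ratio=0.3, rng=random, mask_token="[MASK]"):
    """Replace ~mask_ratio of the positive's tokens with a mask placeholder,
    weakening its relevance without importing foreign content."""
    out = list(tokens)
    n = int(len(out) * mask_ratio)
    for i in rng.sample(range(len(out)), n):
        out[i] = mask_token
    return out

rng = random.Random(0)
pos = "the quick brown fox jumps over the lazy dog".split()
neg = "stock prices fell sharply after the earnings report".split()
print(mix_up(pos, neg, rng=rng))
print(mask(pos, rng=rng))
```

Both operations interpolate between a full positive and an unrelated passage, which is what gives the resulting "dark" candidates their intermediate relevance to the query.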
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto approach to improving a retriever by mimicking a heterogeneous yet powerful cross-encoder.
We demonstrate the effectiveness of the latent user intent modeling via offline analyses as well as live experiments on a large-scale industrial recommendation platform.
First, it is the largest multi-modal conversation dataset by number of dialogues, exceeding the previous largest by 88x.
no code implementations • 30 Sep 2022 • Konstantina Christakopoulou, Can Xu, Sai Zhang, Sriraj Badam, Trevor Potter, Daniel Li, Hao Wan, Xinyang Yi, Ya Le, Chris Berg, Eric Bencomo Dixon, Ed H. Chi, Minmin Chen
How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction?
In large-scale retrieval, the lexicon-weighting paradigm, which learns weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency.
The alignment is achieved by weakened knowledge distillation, which enlightens the retriever via two aspects: 1) a lexicon-augmented contrastive objective to challenge the dense encoder, and 2) a pair-wise rank-consistent regularization that aligns the dense model's behavior with that of the lexicon-aware model.
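A pair-wise rank-consistency term of this kind can be sketched as below: for every passage pair that the lexicon-aware model orders one way, the dense model is penalized (via a softplus on the reversed margin) when its own score difference disagrees. This is an illustrative formulation under assumed score inputs, not the paper's exact objective.

```python
import numpy as np

def rank_consistency_loss(dense_scores, lexicon_scores):
    """Mean softplus penalty over pairs where the dense model's ordering
    contradicts (or only weakly agrees with) the lexicon-aware ordering."""
    d = np.asarray(dense_scores, dtype=float)
    t = np.asarray(lexicon_scores, dtype=float)
    losses = []
    for i in range(len(d)):
        for j in range(len(d)):
            if t[i] > t[j]:  # lexicon model says passage i should outrank j
                # softplus(-(d_i - d_j)): near zero when the dense model
                # agrees with a wide margin, large when it flips the order
                losses.append(np.log1p(np.exp(-(d[i] - d[j]))))
    return float(np.mean(losses)) if losses else 0.0

# Dense scores that agree with the lexicon ranking incur a small loss...
print(rank_consistency_loss([3.0, 2.0, 1.0], [0.9, 0.5, 0.1]))
# ...while a flipped ranking incurs a much larger one.
print(rank_consistency_loss([1.0, 2.0, 3.0], [0.9, 0.5, 0.1]))
```

Because the penalty depends only on orderings, not absolute teacher scores, it is a deliberately "weakened" form of distillation: the dense model is nudged toward rank agreement without being forced to reproduce the other model's score scale.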
Recently, many efforts have been devoted to improving tag-aware recommendation systems (TRS) with Graph Convolutional Networks (GCNs), which have become the new state of the art for general recommendation.
This paper focuses on the data augmentation for low-resource NLP tasks where the training set is limited.
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind: it learns from moderate negatives and/or serves as an auxiliary module for a retriever.
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
(2) How to cohere with the context and preserve the knowledge when generating a stylized response.
To address the problem, we propose augmenting TExt Generation via Task-specific and Open-world Knowledge (TegTok) in a unified framework.
This paper focuses on data augmentation for low-resource Natural Language Understanding (NLU) tasks.
A straightforward solution is to resort to more diverse positives from a multi-augmenting strategy, but it remains an open question how to learn, without supervision, from diverse positives of uneven augmentation quality in the text domain.
Recurrent recommender systems have been successful in capturing the temporal dynamics in users' activity trajectories.
The sequence representation plays a key role in learning the degree of matching between the dialogue context and the response.
In such a low-resource setting, we devise a novel conversational agent, Divter, in order to isolate parameters that depend on multimodal dialogues from the entire generation model.
Furthermore, we propose to evaluate CRS models in an end-to-end manner, which reflects the overall performance of the entire system rather than that of individual modules, in contrast to the separate module-level evaluations used in previous work.
Second, only the items mentioned in the training corpus have a chance to be recommended in the conversation.
We study the problem of coarse-grained response selection in retrieval-based dialogue systems.
Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process.
Sequence-to-Sequence (S2S) neural text generation models, especially the pre-trained ones (e.g., BART and T5), have exhibited compelling performance on various natural language generation tasks.
Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction.
The retriever aims to retrieve a correlated image to the dialog from an image index, while the visual concept detector extracts rich visual knowledge from the image.
ProphetNet is a pre-training based natural language generation method which shows powerful performance on English text summarization and question generation tasks.
Organ transplantation is often the last resort for treating end-stage illness, but the probability of a successful transplantation depends greatly on compatibility between donors and recipients.
We study knowledge-grounded dialogue generation with pre-trained language models.
Generating responses in a desired style has great potential to extend the applications of open-domain dialogue systems, yet it is held back by the lack of parallel data for training.
While neural conversation models have shown great potential for generating informative and engaging responses by introducing external knowledge, learning such a model often requires knowledge-grounded dialogues, which are difficult to obtain.
Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques.
In such a low-resource setting, we devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model.
This paper describes the systems submitted by the Department of Electronic Engineering and the Institute of Microelectronics of Tsinghua University, together with TsingMicro Co. Ltd. (THUEE), to the NIST 2019 Speaker Recognition Evaluation CTS challenge.
Since paired data is no longer sufficient to train a neural generation model, we consider leveraging large-scale unpaired data, which is much easier to obtain, and propose response generation with both paired and unpaired data.
Currently, researchers have paid great attention to retrieval-based dialogue in the open domain.
We present a document-grounded matching network (DGMN) for response selection that can power a knowledge-aware retrieval-based chatbot system.
Our approach employs a mixture of models, each with a different temporal range.
Most current state-of-the-art text-independent speaker verification systems take probabilistic linear discriminant analysis (PLDA) as their backend classifiers.
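For readers unfamiliar with the PLDA backend, the standard two-covariance Gaussian formulation reduces speaker verification to a closed-form log-likelihood ratio over an embedding pair. The sketch below is a textbook version under assumed toy covariances, not THUEE's actual system:

```python
import numpy as np

def plda_score(x1, x2, B, W):
    """Log-likelihood ratio (up to a constant) that embeddings x1 and x2
    come from the same speaker, given between-class covariance B and
    within-class covariance W (two-covariance PLDA)."""
    S = B + W                        # total covariance of a single embedding
    S_inv = np.linalg.inv(S)
    M = np.linalg.inv(S - B @ S_inv @ B)
    Q = S_inv - M                    # quadratic (per-side) term
    P = S_inv @ B @ M                # cross term coupling the two sides
    return float(x1 @ Q @ x1 + x2 @ Q @ x2 + 2.0 * x1 @ P @ x2)

# Toy diagonal model: identity between- and within-class covariances.
B = np.eye(2)
W = np.eye(2)
same = plda_score(np.array([2.0, 1.0]), np.array([2.0, 1.0]), B, W)
diff = plda_score(np.array([2.0, 1.0]), np.array([-2.0, -1.0]), B, W)
print(same > diff)  # a matched pair scores higher than a mismatched one
```

In a real system, B and W would be estimated on training embeddings (e.g., x-vectors) after centering and length normalization; the trial score is then thresholded to accept or reject the same-speaker hypothesis.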
The 20 Questions (Q20) game is a well-known game that encourages deductive reasoning and creativity.
In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots.
We study open domain dialogue generation with dialogue acts designed to explain how people engage in social chat.
Conventional methods model open domain dialogue generation as a black box through end-to-end learning from large scale conversation data.
The task requires matching a response candidate with a conversation context, whose challenges include how to recognize important parts of the context, and how to model the relationships among utterances in the context.
We argue that the intermediate mapping, e.g., the boosting predictor, preserves the discriminant aspects of the data, and that by controlling the dimension of this mapping it is possible to obtain discriminant low-dimensional representations of the data.
Images have become one of the most popular types of media through which users convey their emotions within online social networks.