Search Results for author: Nan Duan

Found 180 papers, 95 papers with code

Analytical Reasoning of Text

1 code implementation Findings (NAACL) 2022 Wanjun Zhong, Siyuan Wang, Duyu Tang, Zenan Xu, Daya Guo, Yining Chen, Jiahai Wang, Jian Yin, Ming Zhou, Nan Duan

In this paper, we study the challenge of analytical reasoning of text and collect a new dataset consisting of questions from the Law School Admission Test from 1991 to 2016.

KFCNet: Knowledge Filtering and Contrastive Learning for Generative Commonsense Reasoning

no code implementations Findings (EMNLP) 2021 Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan

Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.

Contrastive Learning Text Generation

CULG: Commercial Universal Language Generation

no code implementations NAACL (ACL) 2022 Haonan Li, Yameng Huang, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan

Pre-trained language models (PLMs) have dramatically improved performance for many natural language processing (NLP) tasks in domains such as finance and healthcare.

Marketing Text Generation

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations16 Feb 2024 Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks.

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

no code implementations30 Jan 2024 Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, which disrupts the model's ability to capture the true semantic representation of visual scenes.

Vector Graphics

Voila-A: Aligning Vision-Language Models with User's Gaze Attention

no code implementations22 Dec 2023 Kun Yan, Lei Ji, Zeyu Wang, Yuntao Wang, Nan Duan, Shuai Ma

In this paper, we introduce gaze information, feasibly collected by AR or VR devices, as a proxy for human attention to guide VLMs and propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.

Competition-Level Problems are Effective LLM Evaluators

no code implementations4 Dec 2023 Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen

Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and the potential problem of recent data contamination.

PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion

1 code implementation3 Nov 2023 Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, Nan Duan

Recent evaluations of Large Language Models (LLMs) have centered around testing their zero-shot/few-shot capabilities for basic natural language tasks and their ability to translate instructions into tool APIs.

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

no code implementations12 Oct 2023 Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.

In-Context Learning Text Generation

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

1 code implementation29 Sep 2023 Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.

Ranked #8 on Math Word Problem Solving on MATH (using extra training data)

Arithmetic Reasoning Computational Efficiency +3

Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency

no code implementations29 Sep 2023 Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, Nan Duan

We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency across outputs from multiple perspectives.

Code Generation

LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models

1 code implementation18 Sep 2023 Zecheng Tang, Chenfei Wu, Juntao Li, Nan Duan

Graphic layout generation, a growing research field, plays a significant role in user engagement and information perception.

Code Completion Code Generation

ORES: Open-vocabulary Responsible Visual Synthesis

1 code implementation26 Aug 2023 Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content.

Image Generation Language Modelling

GameEval: Evaluating LLMs on Conversational Games

1 code implementation19 Aug 2023 Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.

Question Answering

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

no code implementations16 Aug 2023 Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.

Trajectory Modeling Video Generation

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

1 code implementation27 Jun 2023 Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

Code search is the task of finding program code that semantically matches a given natural language query.

Code Search Machine Translation +2

GroundNLQ @ Ego4D Natural Language Queries Challenge 2023

1 code implementation27 Jun 2023 Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou

Motivated by this, we leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations, and further fine-tune the model on annotated data.

Natural Language Queries

LongCoder: A Long-Range Pre-trained Language Model for Code Completion

1 code implementation26 Jun 2023 Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task.

Code Completion Language Modelling

CMMLU: Measuring massive multitask language understanding in Chinese

1 code implementation15 Jun 2023 Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin

As the capabilities of large language models (LLMs) continue to advance, evaluating their performance becomes increasingly crucial and challenging.

Large Language Model

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

1 code implementation31 May 2023 Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

With only 4M VLP data, ManagerTower achieves superior performance on various downstream VL tasks, notably 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K.

Representation Learning

Allies: Prompting Large Language Model with Beam Search

1 code implementation24 May 2023 Hao Sun, Xiao Liu, Yeyun Gong, Yan Zhang, Daxin Jiang, Linjun Yang, Nan Duan

With the advance of large language models (LLMs), the field of LLM applications has become increasingly popular, and the idea of constructing pipelines to accomplish complex tasks by stacking LLM API calls has come true.

Language Modelling Large Language Model +3

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

no code implementations24 May 2023 Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.

Fact Verification Multi-hop Question Answering +2
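The iterative synergy described in the Iter-RetGen abstract can be sketched as a simple loop in which each round's generation is folded back into the next retrieval query. This is a minimal illustration of the idea, not the authors' code: `retrieve` and `generate` are hypothetical stand-ins (a toy word-overlap retriever and a placeholder for an LLM call).

```python
def retrieve(query, corpus, k=2):
    # Toy lexical retriever: rank passages by word overlap with the query.
    def overlap(passage):
        return len(set(query.lower().split()) & set(passage.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(question, passages):
    # Placeholder for an LLM call; here we just concatenate the evidence.
    return f"Answer drafted from: {' | '.join(passages)}"

def iter_retgen(question, corpus, iterations=2):
    answer = ""
    for _ in range(iterations):
        # Augment the retrieval query with the previous generation, so that
        # retrieval can exploit entities surfaced by the model itself.
        docs = retrieve(f"{question} {answer}".strip(), corpus)
        answer = generate(question, docs)
    return answer
```

The key design point is that retrieval and generation alternate rather than running once each, letting the model's intermediate output guide later evidence gathering.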

Query Rewriting for Retrieval-Augmented Large Language Models

no code implementations23 May 2023 Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Furthermore, to better align the query to the frozen modules, we propose a trainable scheme for our pipeline.

Language Modelling Multiple-choice +1

Machine-Created Universal Language for Cross-lingual Transfer

1 code implementation22 May 2023 Yaobo Liang, Quanzhi Zhu, Junhe Zhao, Nan Duan

There are two primary approaches to addressing cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of various languages, and translate-test, which explicitly translates different languages into an intermediate language, such as English.

Cross-Lingual Transfer Test

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

2 code implementations19 May 2023 Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging.

Fact Checking Natural Questions +4

PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

no code implementations11 May 2023 Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Based on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful by improving the factuality, stability, and overall performance.

Abstractive Text Summarization

Code Execution with Pre-trained Language Models

1 code implementation8 May 2023 Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code.

Code Generation Code Search +2

Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models

1 code implementation23 Apr 2023 Jiashuo Sun, Yi Luo, Yeyun Gong, Chen Lin, Yelong Shen, Jian Guo, Nan Duan

By utilizing iterative bootstrapping, our approach enables LLMs to autonomously rectify errors, resulting in more precise and comprehensive reasoning chains.
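The bootstrapping loop described above, in which the LLM is repeatedly asked to rectify a reasoning chain until its final answer matches a known gold answer, can be sketched as follows. Everything here is a hypothetical illustration: `llm` is any prompt-to-text callable, and the `Answer:` marker convention is an assumption, not the paper's exact format.

```python
def extract_answer(chain):
    # Assume the final answer appears after the last "Answer:" marker.
    return chain.rsplit("Answer:", 1)[-1].strip()

def bootstrap_chain(question, gold_answer, llm, max_rounds=3):
    # Draft a reasoning chain, then repeatedly ask the model to rectify it
    # whenever the extracted final answer disagrees with the gold answer.
    chain = llm(f"Q: {question}\nThink step by step.")
    for _ in range(max_rounds):
        if extract_answer(chain) == gold_answer:
            return chain
        chain = llm(
            f"Q: {question}\nYour previous reasoning was incorrect:\n{chain}\n"
            "Please rectify it step by step."
        )
    return chain
```

Gold answers are available here because bootstrapping of this kind is used to build demonstration chains from labeled training data, not at inference time.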

Learning to Plan with Natural Language

1 code implementation20 Apr 2023 Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan

To obtain it, we propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive them from training error feedback.

Transfer Learning

Low-code LLM: Visual Programming over LLMs

1 code implementation17 Apr 2023 Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan

The proposed Low-code LLM framework consists of a Planning LLM that designs a structured planning workflow for complex tasks, which can be correspondingly edited and confirmed by users through low-code visual programming operations, and an Executing LLM that generates responses following the user-confirmed workflow.

Prompt Engineering

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

2 code implementations13 Apr 2023 Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan

Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam.

Decision Making Math

Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data

4 code implementations3 Apr 2023 Canwen Xu, Daya Guo, Nan Duan, Julian McAuley

Furthermore, we propose a new technique called Self-Distill with Feedback, to further improve the performance of the Baize models with feedback from ChatGPT.

Chatbot Language Modelling +1

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

no code implementations29 Mar 2023 Xingwei He, Zhenghao Lin, Yeyun Gong, A-Long Jin, Hang Zhang, Chen Lin, Jian Jiao, Siu Ming Yiu, Nan Duan, Weizhu Chen

To be more precise, we begin by creating prompts for every demonstrated example, which we subsequently utilize to prompt an LLM to provide an explanation for why the specific ground truth answer/label was chosen for that particular example.

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

no code implementations29 Mar 2023 Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.

Code Generation Common Sense Reasoning +1

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

2 code implementations8 Mar 2023 Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images, and 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models across multiple steps.

Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

1 code implementation3 Feb 2023 Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

Specifically, we propose a multilingual PLM called masked sentence model (MSM), which consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.

Relation Representation Learning +3

Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

no code implementations1 Feb 2023 Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly.

MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers

1 code implementation15 Dec 2022 Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan, Ji-Rong Wen

Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval methods for parameter initialization, and recent studies are exploring more effective pre-training tasks for further improving the quality of dense vectors.

Passage Retrieval Retrieval

APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning

1 code implementation14 Dec 2022 Jiashuo Sun, Hang Zhang, Chen Lin, Yeyun Gong, Jian Guo, Nan Duan

For the retriever, we adopt a number-aware negative sampling strategy to enable the retriever to be more discriminative on key numerical facts.

Conversational Question Answering

LEAD: Liberal Feature-based Distillation for Dense Retrieval

1 code implementation10 Dec 2022 Hao Sun, Xiao Liu, Yeyun Gong, Anlei Dong, Jingwen Lu, Yan Zhang, Linjun Yang, Rangan Majumder, Nan Duan

Knowledge distillation is often used to transfer knowledge from a strong teacher model to a relatively weak student model.

Document Ranking Knowledge Distillation +2
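The teacher-to-student transfer mentioned in the LEAD abstract is classically implemented as a KL divergence between temperature-softened output distributions. The sketch below shows that generic logit-based objective for illustration only; LEAD itself is a feature-based variant, so this is background, not the paper's method.

```python
import math

def softmax(logits, t=1.0):
    # Temperature-scaled softmax; higher t flattens the distribution.
    exps = [math.exp(l / t) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) over softened distributions: zero when the
    # student matches the teacher exactly, positive otherwise.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```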

CodeExp: Explanatory Code Document Generation

1 code implementation25 Nov 2022 Haotian Cui, Chenglong Wang, JunJie Huang, Jeevana Priya Inala, Todd Mytkowicz, Bo wang, Jianfeng Gao, Nan Duan

Our experiments show that (1) our refined training dataset lets models achieve better performance in the explanation generation tasks compared to larger unrefined data (15x larger), and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones.

Explanation Generation Text Generation

ReCo: Region-Controlled Text-to-Image Generation

no code implementations CVPR 2023 Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19.28% and +17.21% more accurate in generating images with correct object count and spatial relationship than the T2I model.


GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation

2 code implementations18 Nov 2022 Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen

We introduce GENIUS: a conditional text generation model using sketches as input, which can fill in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens).

Conditional Text Generation Data Augmentation +8

Execution-based Evaluation for Data Science Code Generation Models

1 code implementation17 Nov 2022 JunJie Huang, Chenglong Wang, Jipeng Zhang, Cong Yan, Haotian Cui, Jeevana Priya Inala, Colin Clement, Nan Duan, Jianfeng Gao

Code generation models can benefit data scientists' productivity by automatically generating code from context and text descriptions.

Code Generation Model Selection

Metric-guided Distillation: Distilling Knowledge from the Metric to Ranker and Retriever for Generative Commonsense Reasoning

no code implementations21 Oct 2022 Xingwei He, Yeyun Gong, A-Long Jin, Weizhen Qi, Hang Zhang, Jian Jiao, Bartuer Zhou, Biao Cheng, SM Yiu, Nan Duan

Commonsense generation aims to generate a realistic sentence describing a daily scene under the given concepts, which is very challenging, since it requires models to have relational reasoning and compositional generalization capabilities.

Relational Reasoning Re-Ranking +1

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

1 code implementation21 Oct 2022 Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen

Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.

Retrieval Text Retrieval
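The "new sampling probability distribution" in the SimANS abstract can be illustrated with a minimal sketch, assuming relevance scores from a retriever: negatives scored near the positive get the most mass, while both trivially easy negatives and likely false negatives are downweighted. The Gaussian-shaped weighting and the `a`, `b` hyperparameters below are one plausible instantiation, not a verified transcription of the paper.

```python
import math
import random

def simans_weights(neg_scores, pos_score, a=1.0, b=0.0):
    # Negatives scored close to the positive are "ambiguous" and receive the
    # highest probability; very easy negatives (score far below the positive)
    # and likely false negatives (score above it) are both downweighted.
    weights = [math.exp(-a * (s - pos_score - b) ** 2) for s in neg_scores]
    total = sum(weights)
    return [w / total for w in weights]

def sample_negative(negatives, neg_scores, pos_score, rng=random):
    # Draw one hard-but-not-false negative under the distribution above.
    probs = simans_weights(neg_scores, pos_score)
    return rng.choices(negatives, weights=probs, k=1)[0]
```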

Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers

no code implementations20 Oct 2022 Wanjun Zhong, Tingting Ma, Jiahai Wang, Jian Yin, Tiejun Zhao, Chin-Yew Lin, Nan Duan

This paper presents ReasonFormer, a unified reasoning framework for mirroring the modular and compositional reasoning process of humans in complex decision-making.

Decision Making

Soft-Labeled Contrastive Pre-training for Function-level Code Representation

1 code implementation18 Oct 2022 Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, Nan Duan

In this paper, we present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods to learn function-level Code Representation.

Sentiment-Aware Word and Sentence Level Pre-training for Sentiment Analysis

1 code implementation18 Oct 2022 Shuai Fan, Chen Lin, Haonan Li, Zhenghao Lin, Jinsong Su, Hang Zhang, Yeyun Gong, Jian Guo, Nan Duan

Most existing pre-trained language representation models (PLMs) are sub-optimal in sentiment analysis tasks, as they capture the sentiment information from word-level while under-considering sentence-level information.

Contrastive Learning Language Modelling +3

Mixed-modality Representation Learning and Pre-training for Joint Table-and-Text Retrieval in OpenQA

1 code implementation11 Oct 2022 JunJie Huang, Wanjun Zhong, Qian Liu, Ming Gong, Daxin Jiang, Nan Duan

However, training an effective dense table-text retriever is difficult due to the challenges of table-text discrepancy and data sparsity problem.

Open-Domain Question Answering Representation Learning +2

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

no code implementations10 Oct 2022 Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.

Vocal Bursts Intensity Prediction

PROD: Progressive Distillation for Dense Retrieval

1 code implementation27 Sep 2022 Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, Anlei Dong, Jian Jiao, Jingwen Lu, Daxin Jiang, Rangan Majumder, Nan Duan

It is common that a better teacher model results in a bad student via distillation due to the non-negligible gap between teacher and student.

Knowledge Distillation Natural Questions +1

CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding

1 code implementation22 Sep 2022 Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing-Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan

This paper tackles the emerging and challenging problem of long video temporal grounding (VTG), which localizes video moments related to a natural language (NL) query.

Contrastive Learning Video Grounding

Improving Task Generalization via Unified Schema Prompt

no code implementations5 Aug 2022 Wanjun Zhong, Yifan Gao, Ning Ding, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

Task generalization has been a long-standing challenge in Natural Language Processing (NLP).

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

1 code implementation20 Jul 2022 Chenfei Wu, Jian Liang, Xiaowei Hu, Zhe Gan, JianFeng Wang, Lijuan Wang, Zicheng Liu, Yuejian Fang, Nan Duan

In this paper, we present NUWA-Infinity, a generative model for infinite visual synthesis, which is defined as the task of generating arbitrarily-sized high-resolution images or long-duration videos.

Image Outpainting Text-to-Image Generation +1

Joint Generator-Ranker Learning for Natural Language Generation

2 code implementations28 Jun 2022 Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen

Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates.

Question Generation Question-Generation +2
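The generate-then-rank mechanism described above reduces to a few lines: sample several candidates from a generator, score each with a ranker, keep the best. The sketch below is a generic illustration of that pattern (the joint training that is JGR's actual contribution is not shown); `generator` and `ranker` are hypothetical callables.

```python
def generate_then_rank(prompt, generator, ranker, n=4):
    # The generator proposes n candidates (indexed so a deterministic
    # generator can still vary its output); the ranker scores each candidate
    # against the prompt and the best-scoring one is returned.
    candidates = [generator(prompt, i) for i in range(n)]
    return max(candidates, key=lambda c: ranker(prompt, c))
```

In practice the generator would be a sampled decoder and the ranker a cross-encoder scoring (prompt, candidate) pairs.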

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning

1 code implementation17 Jun 2022 Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years.

Representation Learning

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

1 code implementation7 Jun 2022 Ning Wu, Yaobo Liang, Houxing Ren, Linjun Shou, Nan Duan, Ming Gong, Daxin Jiang

On the multilingual sentence retrieval task Tatoeba, our model achieves new SOTA results among methods without using bilingual data.

Language Modelling Passage Retrieval +4

DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

no code implementations1 Jun 2022 Jie Shi, Chenfei Wu, Jian Liang, Xiang Liu, Nan Duan

Our work proposes a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.

Denoising Image Generation

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

no code implementations23 May 2022 Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan

To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertisements applications.

Question Generation Question-Generation +1

LogiGAN: Learning Logical Reasoning via Adversarial Pre-training

1 code implementation18 May 2022 Xinyu Pi, Wanjun Zhong, Yan Gao, Nan Duan, Jian-Guang Lou

We present LogiGAN, an unsupervised adversarial pre-training framework for improving logical reasoning abilities of language models.

Logical Reasoning Sentence

ProQA: Structural Prompt-based Pre-training for Unified Question Answering

1 code implementation NAACL 2022 Wanjun Zhong, Yifan Gao, Ning Ding, Yujia Qin, Zhiyuan Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

Furthermore, ProQA exhibits strong ability in both continual learning and transfer learning by taking advantage of the structural prompt.

Continual Learning Few-Shot Learning +2

Multi-View Document Representation Learning for Open-Domain Dense Retrieval

no code implementations ACL 2022 Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

Second, to prevent multi-view embeddings from collapsing to the same one, we further propose a global-local loss with annealed temperature to encourage the multiple viewers to better align with different potential queries.

Representation Learning Retrieval

Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure

no code implementations ACL 2022 Yuan Chai, Yaobo Liang, Nan Duan

Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.

Natural Language Inference Retrieval +2

ReACC: A Retrieval-Augmented Code Completion Framework

1 code implementation ACL 2022 Shuai Lu, Nan Duan, Hojae Han, Daya Guo, Seung-won Hwang, Alexey Svyatkovskiy

Code completion, which aims to predict the following code token(s) according to the code context, can improve the productivity of software development.

Code Completion Language Modelling +1

LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval

1 code implementation Findings (ACL) 2022 Canwen Xu, Daya Guo, Nan Duan, Julian McAuley

Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives.

Contrastive Learning Re-Ranking +3

UniXcoder: Unified Cross-Modal Pre-training for Code Representation

2 code implementations ACL 2022 Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin

Furthermore, we propose to utilize multi-modal contents to learn representation of code fragment with contrastive learning, and then align representations among programming languages using a cross-modal generation task.

Code Completion Code Search +1

NÜWA-LIP: Language Guided Image Inpainting with Defect-free VQGAN

no code implementations10 Feb 2022 Minheng Ni, Chenfei Wu, Haoyang Huang, Daxin Jiang, WangMeng Zuo, Nan Duan

Language guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping non-defective regions unchanged.

Image Inpainting

Reasoning over Hybrid Chain for Table-and-Text Open Domain QA

no code implementations15 Jan 2022 Wanjun Zhong, JunJie Huang, Qian Liu, Ming Zhou, Jiahai Wang, Jian Yin, Nan Duan

CARP utilizes a hybrid chain to model the explicit intermediate reasoning process across table and text for question answering.

Open-Domain Question Answering

Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering

no code implementations NeurIPS 2021 Weijiang Yu, Haoteng Zheng, Mengfei Li, Lei Ji, Lijun Wu, Nong Xiao, Nan Duan

To consider the interdependent knowledge between contextual clips into the network inference, we propose a Siamese Sampling and Reasoning (SiaSamRea) approach, which consists of a siamese sampling mechanism to generate sparse and similar clips (i.e., siamese clips) from the same video, and a novel reasoning strategy for integrating the interdependent knowledge between contextual clips into the network.

Multimodal Reasoning Question Answering +1

NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion

1 code implementation24 Nov 2021 Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan

To cover language, image, and video at the same time for different scenarios, a 3D transformer encoder-decoder framework is designed, which can not only deal with videos as 3D data but also adapt to texts and images as 1D and 2D data, respectively.

Text-to-Video Generation Video Generation +1

Adversarial Retriever-Ranker for dense text retrieval

1 code implementation ICLR 2022 Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen

To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.

Natural Questions Retrieval +2

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

1 code implementation Findings (NAACL) 2022 Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan

Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performances on a broad scope of vision-language tasks after finetuning.

Knowledge Distillation Object +1

Long-Range Modeling of Source Code Files with eWASH: Extended Window Access by Syntax Hierarchy

no code implementations EMNLP 2021 Colin B. Clement, Shuai Lu, Xiaoyu Liu, Michele Tufano, Dawn Drain, Nan Duan, Neel Sundaresan, Alexey Svyatkovskiy

While there are many efforts to extend the context window, we introduce an architecture-independent approach for leveraging the syntactic hierarchies of source code for incorporating entire file-level context into a fixed-length window.

Code Completion Code Generation +3

KFCNet: Knowledge Filtering and Contrastive Learning Network for Generative Commonsense Reasoning

no code implementations14 Sep 2021 Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan

Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with high-quality requirements on the output, such as commonsense generation and ad keyword generation.

Contrastive Learning Text Generation

Hybrid Reasoning Network for Video-based Commonsense Captioning

1 code implementation5 Aug 2021 Weijiang Yu, Jian Liang, Lei Ji, Lu Li, Yuejian Fang, Nong Xiao, Nan Duan

Firstly, we develop multi-commonsense learning for semantic-level reasoning by jointly training different commonsense types in a unified network, which encourages the interaction between the clues of multiple commonsense descriptions, event-wise captions and videos.


Control Image Captioning Spatially and Temporally

no code implementations ACL 2021 Kun Yan, Lei Ji, Huaishao Luo, Ming Zhou, Nan Duan, Shuai Ma

Moreover, the controllability and explainability of LoopCAG are validated by analyzing spatial and temporal sensitivity during the generation process.

Contrastive Learning Image Captioning +1

GEM: A General Evaluation Benchmark for Multimodal Tasks

1 code implementation Findings (ACL) 2021 Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

Compared with existing multimodal datasets such as MSCOCO and Flickr30K for image-language tasks, and YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering both image-language and video-language tasks, but is also labeled in multiple languages.

Learning to Complete Code with Sketches

no code implementations ICLR 2022 Daya Guo, Alexey Svyatkovskiy, Jian Yin, Nan Duan, Marc Brockschmidt, Miltiadis Allamanis

To evaluate models, we consider both ROUGE as well as a new metric, RegexAcc, that measures the success of generating completions that match long outputs with as few holes as possible.

Code Completion Code Generation +1

EL-Attention: Memory Efficient Lossless Attention for Generation

1 code implementation11 May 2021 Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang

Transformer models with multi-head attention require caching intermediate results for efficient inference in generation tasks.

Question Generation Question-Generation
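The memory saving rests on a simple algebraic identity: because the key projection is linear, it can be folded into the query side, so attention scores can be computed against the raw hidden states without materialising (or caching) per-head key projections. The numpy sketch below checks that identity on toy shapes; it is a minimal illustration of the underlying algebra, not the EL-Attention implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq = 8, 4, 5

H = rng.standard_normal((seq, d_model))    # hidden states: the only cache needed
Wq = rng.standard_normal((d_model, d_head))  # query head projection
Wk = rng.standard_normal((d_model, d_head))  # key head projection
q = rng.standard_normal((1, d_model))        # current decoding-step query

# Conventional path: project and cache per-head keys K = H @ Wk.
scores_cached = (q @ Wq) @ (H @ Wk).T

# Folded path: absorb Wk into the query and attend over raw H directly,
# so no per-head key cache is required.
scores_folded = (q @ Wq @ Wk.T) @ H.T

assert np.allclose(scores_cached, scores_folded)
```

Since `(q Wq)(H Wk)^T = (q Wq Wk^T) H^T`, both paths produce identical attention scores; the second stores one copy of `H` instead of per-head keys, trading a small extra query-side matmul for a much smaller cache.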

Poolingformer: Long Document Modeling with Pooling Attention

no code implementations10 May 2021 Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen

We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA.

GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions

1 code implementation30 Apr 2021 Chenfei Wu, Lun Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, Nan Duan

Generating videos from text is a challenging task due to its high computational requirements for training and infinite possible answers for evaluation.

Ranked #16 on Text-to-Video Generation on MSR-VTT (CLIPSIM metric)

Text-to-Video Generation Video Generation

CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval

5 code implementations18 Apr 2021 Huaishao Luo, Lei Ji, Ming Zhong, Yang Chen, Wen Lei, Nan Duan, Tianrui Li

In this paper, we propose a CLIP4Clip model to transfer the knowledge of the CLIP model to video-language retrieval in an end-to-end manner.

Retrieval Text Retrieval +4
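One of the parameter-free similarity calculators studied in this line of work can be sketched in a few lines: mean-pool the per-frame CLIP embeddings into a single video vector, then score it against the text embedding with cosine similarity. The snippet below is a toy sketch with made-up 2-D vectors standing in for real CLIP embeddings; `video_text_similarity` is a hypothetical helper name.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def video_text_similarity(frame_embs: np.ndarray, text_emb: np.ndarray) -> float:
    """Mean-pool per-frame embeddings into one video vector, then
    score it against the text embedding (parameter-free variant)."""
    video_emb = frame_embs.mean(axis=0)
    return cosine(video_emb, text_emb)

# Toy check: a text vector aligned with the frames scores higher
# than an orthogonal one.
frames = np.array([[1.0, 0.0], [0.9, 0.1]])
aligned = video_text_similarity(frames, np.array([1.0, 0.0]))
orthogonal = video_text_similarity(frames, np.array([0.0, 1.0]))
```

Ranking candidate videos by this score against a text query is all a retrieval system needs at inference time; heavier variants replace the mean-pooling with learned sequential or tight cross-modal interaction.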

AR-LSAT: Investigating Analytical Reasoning of Text

1 code implementation14 Apr 2021 Wanjun Zhong, Siyuan Wang, Duyu Tang, Zenan Xu, Daya Guo, Jiahai Wang, Jian Yin, Ming Zhou, Nan Duan

Analytical reasoning is an essential and challenging task that requires a system to analyze a scenario involving a set of particular circumstances and perform reasoning over it to make conclusions.

Syntax-Enhanced Pre-trained Model

1 code implementation ACL 2021 Zenan Xu, Daya Guo, Duyu Tang, Qinliang Su, Linjun Shou, Ming Gong, Wanjun Zhong, Xiaojun Quan, Nan Duan, Daxin Jiang

We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa.

Entity Typing Question Answering +1

Multi-level Alignment Pretraining for Multi-lingual Semantic Parsing

no code implementations COLING 2020 Bo Shao, Yeyun Gong, Weizhen Qi, Nan Duan, Xiaola Lin

In this paper, we present a multi-level alignment pretraining method in a unified architecture for multi-lingual semantic parsing.

Semantic Parsing Sentence

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

no code implementations Findings of the Association for Computational Linguistics 2020 Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou

This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism.

Abstractive Text Summarization Question Generation +1
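The future n-gram prediction objective can be illustrated by how its targets are built: at each position, the model is asked to predict the next n tokens simultaneously rather than only the next one. The helper below (a hypothetical name, written as a minimal sketch of the target construction, not ProphetNet's training code) pads positions that run past the end of the sequence.

```python
def future_ngram_targets(tokens: list, n: int = 2, pad: str = "<pad>") -> list:
    """For each position t, the targets are the next n tokens
    (future n-gram prediction); padding marks positions past
    the end of the sequence."""
    targets = []
    for t in range(len(tokens)):
        window = tokens[t + 1 : t + 1 + n]
        window += [pad] * (n - len(window))  # right-pad short windows
        targets.append(window)
    return targets

result = future_ngram_targets(["a", "b", "c"], n=2)
# result == [["b", "c"], ["c", "<pad>"], ["<pad>", "<pad>"]]
```

Predicting several future tokens at once discourages overfitting to strong local correlations and encourages planning ahead, which is the stated motivation for the n-stream self-attention that realises this objective efficiently.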

Machine Reasoning: Technology, Dilemma and Future

no code implementations EMNLP 2020 Nan Duan, Duyu Tang, Ming Zhou

Machine reasoning research aims to build interpretable AI systems that can solve problems or draw conclusions from what they are told (i.e., facts and observations) and already know (i.e., models, common sense and knowledge) under certain constraints.

Common Sense Reasoning

ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine

no code implementations21 Oct 2020 Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou

We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models.


Neural Deepfake Detection with Factual Structure of Text

1 code implementation EMNLP 2020 Wanjun Zhong, Duyu Tang, Zenan Xu, Ruize Wang, Nan Duan, Ming Zhou, Jiahai Wang, Jian Yin

To address this, we propose a graph-based model that utilizes the factual structure of a document for deepfake detection of text.

DeepFake Detection Face Swapping +1

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

1 code implementation EMNLP 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.

Data Augmentation Machine Reading Comprehension +6

No Answer is Better Than Wrong Answer: A Reflection Model for Document Level Machine Reading Comprehension

no code implementations Findings of the Association for Computational Linguistics 2020 Xuguang Wang, Linjun Shou, Ming Gong, Nan Duan, Daxin Jiang

The Natural Questions (NQ) benchmark set brings new challenges to Machine Reading Comprehension: the answers are not only at different levels of granularity (long and short), but also of richer types (including no-answer, yes/no, single-span and multi-span).

Machine Reading Comprehension Natural Questions

GraphCodeBERT: Pre-training Code Representations with Data Flow

1 code implementation ICLR 2021 Daya Guo, Shuo Ren, Shuai Lu, Zhangyin Feng, Duyu Tang, Shujie Liu, Long Zhou, Nan Duan, Alexey Svyatkovskiy, Shengyu Fu, Michele Tufano, Shao Kun Deng, Colin Clement, Dawn Drain, Neel Sundaresan, Jian Yin, Daxin Jiang, Ming Zhou

Instead of taking syntactic-level structure of code like abstract syntax tree (AST), we use data flow in the pre-training stage, which is a semantic-level structure of code that encodes the relation of "where-the-value-comes-from" between variables.

Clone Detection Code Completion +7
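The "where-the-value-comes-from" relation can be sketched concretely: for every assignment `x = expr`, draw an edge from each variable read in `expr` to `x`. The snippet below extracts such edges from a Python fragment with the standard `ast` module. It is a simplified sketch under that assumption; GraphCodeBERT's data flow is richer (it tracks individual variable occurrences, not just names).

```python
import ast

def value_flow_edges(code: str) -> set:
    """Rough "where-the-value-comes-from" edges: for each assignment
    `x = expr`, add (src, x) for every variable src read in expr.
    A simplified sketch of the data-flow idea, not the paper's graph."""
    edges = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            sources = [n.id for n in ast.walk(node.value)
                       if isinstance(n, ast.Name)]
            for tgt in targets:
                for src in sources:
                    edges.add((src, tgt))
    return edges

edges = value_flow_edges("a = 1\nb = a + 2\nc = a + b\n")
# edges == {("a", "b"), ("a", "c"), ("b", "c")}
```

Edges like these form a semantic-level graph that is stable under renaming and reformatting, which is why data flow is a lighter-weight pre-training signal than a full abstract syntax tree.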

Graph Neural News Recommendation with Unsupervised Preference Disentanglement

1 code implementation ACL 2020 Linmei Hu, Siyong Xu, Chen Li, Cheng Yang, Chuan Shi, Nan Duan, Xing Xie, Ming Zhou

Furthermore, the learned representations are disentangled with latent preference factors by a neighborhood routing algorithm, which can enhance expressiveness and interpretability.

Disentanglement News Recommendation

Evidence-Aware Inferential Text Generation with Vector Quantised Variational AutoEncoder

1 code implementation ACL 2020 Daya Guo, Duyu Tang, Nan Duan, Jian Yin, Daxin Jiang, Ming Zhou

Generating inferential texts about an event from different perspectives requires reasoning over the different contexts in which the event occurs.

Common Sense Reasoning Text Generation

M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training

1 code implementation CVPR 2021 Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Jianfeng Gao, Dongdong Zhang, Nan Duan

We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.

Image Captioning Image Retrieval +4

Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension

1 code implementation ACL 2020 Bo Zheng, Haoyang Wen, Yaobo Liang, Nan Duan, Wanxiang Che, Daxin Jiang, Ming Zhou, Ting Liu

Natural Questions is a new challenging machine reading comprehension benchmark with two-grained answers, which are a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer).

Graph Attention Machine Reading Comprehension +1

RikiNet: Reading Wikipedia Pages for Natural Question Answering

no code implementations ACL 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Nan Duan

The representations are then fed into the predictor to obtain the span of the short answer, the paragraph of the long answer, and the answer type in a cascaded manner.

Natural Language Understanding Natural Questions +1

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

no code implementations ACL 2020 Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, Daxin Jiang

Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages.

Boundary Detection Machine Reading Comprehension +2

Pre-training Text Representations as Meta Learning

no code implementations12 Apr 2020 Shangwen Lv, Yuechen Wang, Daya Guo, Duyu Tang, Nan Duan, Fuqing Zhu, Ming Gong, Linjun Shou, Ryan Ma, Daxin Jiang, Guihong Cao, Ming Zhou, Songlin Hu

In this work, we introduce a learning algorithm which directly optimizes a model's ability to learn text representations for effective learning of downstream tasks.

Language Modelling Meta-Learning +2

Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation

1 code implementation EMNLP 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Wei Liu, Yu Yan, Bo Shao, Daxin Jiang, Jiancheng Lv, Nan Duan

Furthermore, we propose a simple and effective method to mine the keyphrases of interest in a news article, and build the first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned triples of <news article, headline, keyphrase>.

Headline Generation Sentence

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

2 code implementations3 Apr 2020 Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.

Natural Language Understanding XLM-R

XGPT: Cross-modal Generative Pre-Training for Image Captioning

no code implementations3 Mar 2020 Qiaolin Xia, Haoyang Huang, Nan Duan, Dong-dong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Xin Liu, Ming Zhou

While many BERT-based cross-modal pre-trained models produce excellent results on downstream understanding tasks like image-text retrieval and VQA, they cannot be applied to generation tasks directly.

Data Augmentation Denoising +7

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

2 code implementations15 Feb 2020 Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Taroon Bharti, Ming Zhou

However, most of the existing multimodal models are pre-trained for understanding tasks, leading to a pretrain-finetune discrepancy for generation tasks.

Ranked #2 on Action Segmentation on COIN (using extra training data)

Action Segmentation Language Modelling +2