Search Results for author: Yaobo Liang

Found 26 papers, 13 papers with code

PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion

1 code implementation 6 Mar 2024 Zekai Zhang, Yiduo Guo, Yaobo Liang, Dongyan Zhao, Nan Duan

The growing dependence on Large Language Models (LLMs) for finishing user instructions necessitates a comprehensive understanding of their robustness to complex task completion in real-world situations.


Competition-Level Problems are Effective LLM Evaluators

no code implementations 4 Dec 2023 Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen

Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and the potential data contamination problem recently.

PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion

1 code implementation 3 Nov 2023 Yiduo Guo, Zekai Zhang, Yaobo Liang, Dongyan Zhao, Nan Duan

Recent evaluations of Large Language Models (LLMs) have centered around testing their zero-shot/few-shot capabilities for basic natural language tasks and their ability to translate instructions into tool APIs.

EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation

no code implementations 12 Oct 2023 Wang You, Wenshan Wu, Yaobo Liang, Shaoguang Mao, Chenfei Wu, Maosong Cao, Yuzhe Cai, Yiduo Guo, Yan Xia, Furu Wei, Nan Duan

In this paper, we propose a new framework called Evaluation-guided Iterative Plan Extraction for long-form narrative text generation (EIPE-text), which extracts plans from the corpus of narratives and utilizes the extracted plans to construct a better planner.

In-Context Learning, Text Generation

GameEval: Evaluating LLMs on Conversational Games

1 code implementation 19 Aug 2023 Dan Qiao, Chenfei Wu, Yaobo Liang, Juntao Li, Nan Duan

In this paper, we propose GameEval, a novel approach to evaluating LLMs through goal-driven conversational games, overcoming the limitations of previous methods.

Question Answering

Machine-Created Universal Language for Cross-lingual Transfer

1 code implementation 22 May 2023 Yaobo Liang, Quanzhi Zhu, Junhe Zhao, Nan Duan

There are two primary approaches to addressing cross-lingual transfer: multilingual pre-training, which implicitly aligns the hidden representations of various languages, and translate-test, which explicitly translates different languages into an intermediate language, such as English.

Cross-Lingual Transfer

Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast

no code implementations 19 May 2023 Yiduo Guo, Yaobo Liang, Dongyan Zhao, Bing Liu, Duan Nan

Existing research has shown that a multilingual pre-trained language model fine-tuned with one (source) language also performs well on downstream tasks for non-source languages, even though no fine-tuning is done on these languages.

Cross-Lingual Transfer, Language Modelling

Learning to Plan with Natural Language

1 code implementation 20 Apr 2023 Yiduo Guo, Yaobo Liang, Chenfei Wu, Wenshan Wu, Dongyan Zhao, Nan Duan

To obtain it, we propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive from training error feedback.

Transfer Learning

Low-code LLM: Graphical User Interface over Large Language Models

2 code implementations 17 Apr 2023 Yuzhe Cai, Shaoguang Mao, Wenshan Wu, Zehua Wang, Yaobo Liang, Tao Ge, Chenfei Wu, Wang You, Ting Song, Yan Xia, Jonathan Tien, Nan Duan, Furu Wei

By introducing this framework, we aim to bridge the gap between humans and LLMs, enabling more effective and efficient utilization of LLMs for complex tasks.

Prompt Engineering

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

2 code implementations 13 Apr 2023 Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan

Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92.5% accuracy on the English test of the Chinese national college entrance exam.

Decision Making, Math

TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

no code implementations 29 Mar 2023 Yaobo Liang, Chenfei Wu, Ting Song, Wenshan Wu, Yan Xia, Yu Liu, Yang Ou, Shuai Lu, Lei Ji, Shaoguang Mao, Yun Wang, Linjun Shou, Ming Gong, Nan Duan

On the other hand, there are also many existing models and systems (symbolic-based or neural-based) that can do some domain-specific tasks very well.

Code Generation, Common Sense Reasoning +1

Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval

1 code implementation 3 Feb 2023 Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

Specifically, we propose a multilingual PLM called masked sentence model (MSM), which consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.

Relation, Representation Learning +3

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval

1 code implementation 7 Jun 2022 Ning Wu, Yaobo Liang, Houxing Ren, Linjun Shou, Nan Duan, Ming Gong, Daxin Jiang

On the multilingual sentence retrieval task Tatoeba, our model achieves new SOTA results among methods without using bilingual data.

Language Modelling, Passage Retrieval +5

Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure

no code implementations ACL 2022 Yuan Chai, Yaobo Liang, Nan Duan

Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.

Natural Language Inference, Retrieval +3

Multi-View Document Representation Learning for Open-Domain Dense Retrieval

no code implementations ACL 2022 Shunyu Zhang, Yaobo Liang, Ming Gong, Daxin Jiang, Nan Duan

Second, to prevent multi-view embeddings from collapsing to the same one, we further propose a global-local loss with annealed temperature to encourage the multiple viewers to better align with different potential queries.

Representation Learning, Retrieval

Simpson's Bias in NLP Training

no code implementations 13 Mar 2021 Fei Yuan, Longtu Zhang, Huang Bojun, Yaobo Liang

In most machine learning tasks, we evaluate a model $M$ on a given data population $S$ by measuring a population-level metric $F(S;M)$.

Multi-class Classification, Sentence +1

GLOW : Global Weighted Self-Attention Network for Web Search

1 code implementation 10 Jul 2020 Xuan Shan, Chuanjie Liu, Yiqian Xia, Qi Chen, Yusi Zhang, Kaize Ding, Yaobo Liang, Angen Luo, Yuxiang Luo

Deep matching models aim to facilitate search engines retrieving more relevant documents by mapping queries and documents into semantic vectors in the first-stage retrieval.

Document Ranking, Information Retrieval +2

Document Modeling with Graph Attention Networks for Multi-grained Machine Reading Comprehension

1 code implementation ACL 2020 Bo Zheng, Haoyang Wen, Yaobo Liang, Nan Duan, Wanxiang Che, Daxin Jiang, Ming Zhou, Ting Liu

Natural Questions is a new challenging machine reading comprehension benchmark with two-grained answers, which are a long answer (typically a paragraph) and a short answer (one or more entities inside the long answer).

Graph Attention, Machine Reading Comprehension +1

Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension

no code implementations ACL 2020 Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, Daxin Jiang

Multilingual pre-trained models could leverage the training data from a rich source language (such as English) to improve performance on low resource languages.

Boundary Detection, Machine Reading Comprehension +2

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

2 code implementations 3 Apr 2020 Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.

Natural Language Understanding, XLM-R

Dense Procedure Captioning in Narrated Instructional Videos

no code implementations ACL 2019 Botian Shi, Lei Ji, Yaobo Liang, Nan Duan, Peng Chen, Zhendong Niu, Ming Zhou

Understanding narrated instructional videos is important for both research and real-world web applications.

Dense Captioning
