Search Results for author: Wenhu Chen

Found 92 papers, 53 papers with code

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

no code implementations • 29 May 2024 • Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, Noah Wang, Quehry Que, Ruibo Liu, Sine Liu, Shawn Guo, Soren Gao, Wangchunshu Zhou, Xinyue Zhang, Yizhi Zhou, YuBo Wang, Yuelin Bai, Yuhan Zhang, Yuxiang Zhang, Zenith Wang, Zhenzhu Yang, Zijian Zhao, Jiajun Zhang, Wanli Ouyang, Wenhao Huang, Wenhu Chen

To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided.

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

no code implementations • 29 May 2024 • Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation.

UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models

no code implementations • 16 May 2024 • Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin

Recently, Multi-Modal (MM) Large Language Models (LLMs) have unlocked many complex use-cases that require MM understanding (e.g., image captioning or visual question answering) and MM generation (e.g., text-guided image generation or editing) capabilities.

Image Captioning Image Generation +3

MAmmoTH2: Scaling Instructions from the Web

no code implementations • 6 May 2024 • Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen

Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 36.7% on MATH and from 36% to 68.4% on GSM8K without training on any in-domain data.

Chatbot GSM8K +1

MANTIS: Interleaved Multi-Image Instruction Tuning

no code implementations • 2 May 2024 • Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen

We further evaluate Mantis on single-image benchmarks and demonstrate that Mantis also maintains a strong single-image performance on par with CogVLM and Emu2.

MuPT: A Generative Symbolic Music Pretrained Transformer

no code implementations • 9 Apr 2024 • Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang

In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music.

Music Generation Music Modeling

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations • 5 Apr 2024 • Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modelling Large Language Model

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models

no code implementations • 4 Apr 2024 • Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi Li, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu

Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability.

Code Generation

Long-context LLMs Struggle with Long In-context Learning

1 code implementation • 2 Apr 2024 • Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

Our study reveals that long context understanding and reasoning is still a challenging task for the existing LLMs.

2k In-Context Learning +1

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis

no code implementations • 1 Apr 2024 • Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Wenhu Chen, Jie Fu, Ge Zhang

To address the aforementioned limitations, this paper undertakes a comprehensive comparison of model capabilities at various pretraining intermediate checkpoints.

Language Modelling Large Language Model

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

no code implementations • 28 Mar 2024 • Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

Image retrieval, i.e., finding desired images given a reference image, inherently encompasses rich, multi-faceted search intents that are difficult to capture solely using image-based measures.

Image Retrieval Implicit Relations +2

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

no code implementations • 26 Mar 2024 • Yuelin Bai, Xinrun Du, Yiming Liang, Yonggang Jin, Ziqiang Liu, Junting Zhou, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Wenhu Chen, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang

To bridge this gap, we introduce COIG-CQIA, a high-quality Chinese instruction tuning dataset.

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks

no code implementations • 21 Mar 2024 • Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

In the second stage, AnyV2V can plug in any existing image-to-video models to perform DDIM inversion and intermediate feature injection to maintain the appearance and motion consistency with the source video.

Image to Video Generation Style Transfer +1

Reward Guided Latent Consistency Distillation

no code implementations • 16 Mar 2024 • Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps.

Image Generation

DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning

no code implementations • 7 Mar 2024 • Xingwei Qu, Yiming Liang, Yucheng Wang, Tianyu Zheng, Tommy Yue, Lei Ma, Stephen W. Huang, Jiajun Zhang, Wenhu Chen, Chenghua Lin, Jie Fu, Ge Zhang

It has long been assumed that the sheer number of parameters in large language models (LLMs) drives in-context learning (ICL) capabilities, enabling remarkable performance improvements by leveraging task-specific demonstrations.

Few-Shot Learning In-Context Learning +1

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

no code implementations • 26 Feb 2024 • Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters.

ChatMusician: Understanding and Generating Music Intrinsically with LLM

1 code implementation • 25 Feb 2024 • Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo

It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the music is treated as a second language.

Text Generation

160

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

1 code implementation • 22 Feb 2024 • Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.

Code Generation

1,428

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models

no code implementations • 20 Feb 2024 • Yizhi Li, Ge Zhang, Xingwei Qu, Jiali Li, Zhaoqun Li, Zekun Wang, Hao Li, Ruibin Yuan, Yinghao Ma, Kai Zhang, Wangchunshu Zhou, Yiming Liang, Lei Zhang, Lei Ma, Jiajun Zhang, Zuowen Li, Stephen W. Huang, Chenghua Lin, Wenhu Chen, Jie Fu

The advancement of large language models (LLMs) has enhanced the ability to generalize across a wide range of unseen natural language processing (NLP) tasks through instruction-following.

Instruction Following

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

1 code implementation • 6 Feb 2024 • Weiming Ren, Harry Yang, Ge Zhang, Cong Wei, Xinrun Du, Stephen Huang, Wenhu Chen

To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation.

Image to Video Generation

161

Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

1 code implementation • 5 Feb 2024 • Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time.

Knowledge Graphs Math

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

1 code implementation • 24 Jan 2024 • Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines.

Benchmarking Image Captioning +3

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark

1 code implementation • 22 Jan 2024 • Ge Zhang, Xinrun Du, Bei Chen, Yiming Liang, Tongxu Luo, Tianyu Zheng, Kang Zhu, Yuyang Cheng, Chunpu Xu, Shuyue Guo, Haoran Zhang, Xingwei Qu, Junjie Wang, Ruibin Yuan, Yizhi Li, Zekun Wang, Yudong Liu, Yu-Hsuan Tsai, Fengji Zhang, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu

We introduce CMMMU, a new Chinese Massive Multi-discipline Multimodal Understanding benchmark designed to evaluate LMMs on tasks demanding college-level subject knowledge and deliberate reasoning in a Chinese context.

7,338

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

no code implementations • 13 Jan 2024 • Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.

4k Position

Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

1 code implementation • 12 Jan 2024 • Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Qi Jia, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang

In this paper, we introduce Kun, a novel approach for creating high-quality instruction-tuning datasets for large language models (LLMs) without relying on manual annotations.

Instruction Following Translation

Instruct-Imagen: Image Generation with Multi-modal Instruction

no code implementations • 3 Jan 2024 • Hexiang Hu, Kelvin C. K. Chan, Yu-Chuan Su, Wenhu Chen, Yandong Li, Kihyuk Sohn, Yang Zhao, Xue Ben, Boqing Gong, William Cohen, Ming-Wei Chang, Xuhui Jia

We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.

Image Generation Retrieval

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

no code implementations • 22 Dec 2023 • Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

We evaluate VIESCORE on seven prominent tasks in conditional image tasks and find: (1) VIESCORE (GPT4-v) achieves a high Spearman correlation of 0.3 with human evaluations, while the human-to-human correlation is 0.45.

Conditional Image Generation General Knowledge

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

no code implementations • 28 Nov 2023 • Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen

Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image.

Benchmarking Information Retrieval +2

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

2 code implementations • 27 Nov 2023 • Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.

Complex Query Answering Logical Reasoning +1

7,338

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation • 4 Oct 2023 • Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."

Decoder Image Generation

18,319

ImagenHub: Standardizing the evaluation of conditional image generation models

2 code implementations • 2 Oct 2023 • Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc.

Conditional Image Generation text-guided-image-editing

122

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

1 code implementation • 1 Oct 2023 • Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen

To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets, 2 held-out datasets and show that TIGERScore can achieve the open-source SoTA correlation with human ratings across these datasets and almost approaches GPT-4 evaluator.

Text Generation

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

1 code implementation • 15 Sep 2023 • Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored.

Caption Generation Language Modelling +1

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

286

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

no code implementations • 5 Sep 2023 • YuBo Wang, Xueguang Ma, Wenhu Chen

In this study, we present a system called LLMs Augmented with Medical Textbooks (LLM-AMT) designed to enhance the proficiency of LLMs in specialized domains.

Question Answering Retrieval

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

1 code implementation • 29 Jun 2023 • Le Zhuo, Ruibin Yuan, Jiahao Pan, Yinghao Ma, Yizhi Li, Ge Zhang, Si Liu, Roger Dannenberg, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenhu Chen, Wei Xue, Yike Guo

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal.

Automatic Lyrics Transcription Language Modelling +3

DreamEdit: Subject-driven Image Editing

no code implementations • 22 Jun 2023 • Tianle Li, Max Ku, Cong Wei, Wenhu Chen

In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Addition.

Image Generation Position

MARBLE: Music Audio Representation Benchmark for Universal Evaluation

1 code implementation • NeurIPS 2023 • Ruibin Yuan, Yinghao Ma, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Le Zhuo, Yiqi Liu, Jiawen Huang, Zeyue Tian, Binyue Deng, Ningzhi Wang, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Roger Dannenberg, Wenhu Chen, Gus Xia, Wei Xue, Si Liu, Shi Wang, Ruibo Liu, Yike Guo, Jie Fu

This is evident in the limited work on deep music representations, the scarcity of large-scale datasets, and the absence of a universal and community-driven benchmark.

Image Generation Information Retrieval +1

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

1 code implementation • NeurIPS 2023 • Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su

To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing.

text-guided-image-editing

257

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

1 code implementation • 31 May 2023 • Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghao Xiao, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Zili Wang, Yike Guo, Jie Fu

Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored.

Language Modelling Quantization +1

257

On the Risk of Misinformation Pollution with Large Language Models

1 code implementation • 23 May 2023 • Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems.

Misinformation Open-Domain Question Answering

Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models

no code implementations • 23 May 2023 • Alfonso Amayuelas, Liangming Pan, Wenhu Chen, William Wang

This paper investigates the capabilities of Large Language Models (LLMs) in the context of understanding their own knowledge and measuring their uncertainty.

Known Unknowns

EDIS: Entity-Driven Image Search over Multimodal Web Content

1 code implementation • 23 May 2023 • SiQi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Yang Wang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion.

Image Retrieval Retrieval

Interactive Natural Language Processing

no code implementations • 22 May 2023 • Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

TheoremQA: A Theorem-driven Question Answering dataset

1 code implementation • 21 May 2023 • Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.

Ranked #1 on Natural Questions on TheoremQA

Math Question Answering

152

Few-shot In-context Learning for Knowledge Base Question Answering

1 code implementation • 2 May 2023 • Tianle Li, Xueguang Ma, Alex Zhuang, Yu Gu, Yu Su, Wenhu Chen

On GrailQA and WebQSP, our model is also on par with other fully-trained models.

In-Context Learning Knowledge Base Question Answering

DePlot: One-shot visual language reasoning by plot-to-table translation

1 code implementation • 20 Dec 2022 • Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.

Ranked #2 on Factual Inconsistency Detection in Chart Captioning on CHOCOLATE-LLM

Chart Question Answering Factual Inconsistency Detection in Chart Captioning +3

126,923

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

2 code implementations • 22 Nov 2022 • Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets.

Math

1,031

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models

1 code implementation • 20 Nov 2022 • Xichen Pan, Pengda Qin, Yuhong Li, Hui Xue, Wenhu Chen

Conditioned diffusion models have demonstrated state-of-the-art text-to-image synthesis capacity.

Ranked #1 on Story Visualization on Pororo

Story Continuation Story Visualization

180

Large Language Models are few(1)-shot Table Reasoners

1 code implementation • 13 Oct 2022 • Wenhu Chen

Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS and found that LLMs are competent at complex reasoning over table structures, though these models are not pre-trained on any table corpus.

Fact Verification In-Context Learning

Explanations from Large Language Models Make Small Reasoners Better

no code implementations • 13 Oct 2022 • Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Integrating free-text explanations to in-context learning of large language models (LLM) is shown to elicit strong reasoning capabilities along with reasonable explanations.

Explanation Generation In-Context Learning +1

Controllable Dialogue Simulation with In-Context Learning

1 code implementation • 9 Oct 2022 • Zekun Li, Wenhu Chen, Shiyang Li, Hong Wang, Jing Qian, Xifeng Yan

Experimental results on the MultiWOZ dataset demonstrate that training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues under the challenging low-resource settings, with as few as 85 dialogues as a seed.

Data Augmentation In-Context Learning +2

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text

no code implementations • 6 Oct 2022 • Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, William W. Cohen

While language models store a massive amount of world knowledge implicitly in their parameters, even very large models often fail to encode information about rare entities and events, while incurring huge computational costs.

Open-Ended Question Answering Retrieval +2

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

no code implementations • 29 Sep 2022 • Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters.

Ranked #3 on Text-to-Image Generation on MS COCO

Retrieval Text Retrieval +1

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

no code implementations • 1 Jul 2022 • Wenhu Chen, William W. Cohen, Michiel de Jong, Nitish Gupta, Alessandro Presta, Pat Verga, John Wieting

In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking.

Entity Linking Position +2

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data

no code implementations • Findings (ACL) 2022 • Kai Nakamura, Sharon Levy, Yi-Lin Tuan, Wenhu Chen, William Yang Wang

A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities.

Response Generation Retrieval

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

no code implementations • 10 Apr 2022 • Wenhu Chen, Pat Verga, Michiel de Jong, John Wieting, William Cohen

Retrieval augmented language models have recently become the standard for knowledge intensive tasks.

Decoder Open-Domain Question Answering +2

Attacking Open-domain Question Answering by Injecting Misinformation

1 code implementation • 15 Oct 2021 • Liangming Pan, Wenhu Chen, Min-Yen Kan, William Yang Wang

We curate both human-written and model-generated false documents that we inject into the evidence corpus of QA models and assess the impact on the performance of these systems.

Misinformation Open-Domain Question Answering

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

no code implementations • Findings (EMNLP) 2021 • Shiyang Li, Semih Yavuz, Wenhu Chen, Xifeng Yan

Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as the major semi-supervised approaches to improve natural language understanding (NLU) tasks with massive amount of unlabeled data.

named-entity-recognition Named Entity Recognition +6

FinQA: A Dataset of Numerical Reasoning over Financial Data

1 code implementation • EMNLP 2021 • Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations.

Ranked #4 on Question Answering on FinQA

Question Answering

209

A Dataset for Answering Time-Sensitive Questions

1 code implementation • 13 Aug 2021 • Wenhu Chen, Xinyi Wang, William Yang Wang

Lots of facts can evolve with respect to time.

Benchmarking

Local Explanation of Dialogue Response Generation

1 code implementation • NeurIPS 2021 • Yi-Lin Tuan, Connor Pryor, Wenhu Chen, Lise Getoor, William Yang Wang

To gain insights into the reasoning process of a generation model, we propose a new method, local explanation of response generation (LERG) that regards the explanations as the mutual interaction of segments in input and output sentences.

Implicit Relations Response Generation +1

Counterfactual Maximum Likelihood Estimation for Training Deep Networks

1 code implementation • NeurIPS 2021 • Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues.

counterfactual Domain Generalization +2

A Systematic Investigation of KB-Text Embedding Alignment at Scale

1 code implementation • ACL 2021 • Vardaan Pahuja, Yu Gu, Wenhu Chen, Mehdi Bahrami, Lei Liu, Wei-Peng Chen, Yu Su

Knowledge bases (KBs) and text often contain complementary knowledge: KBs store structured knowledge that can support long range reasoning, while text stores more comprehensive and timely knowledge in an unstructured way.

Link Prediction

Zero-shot Fact Verification by Claim Generation

1 code implementation • ACL 2021 • Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

However, for each new domain that requires fact verification, creating a dataset by manually writing claims and linking them to their supporting evidence is expensive.

2k Fact Verification

Unsupervised Multi-hop Question Answering by Question Generation

1 code implementation • NAACL 2021 • Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

Obtaining training data for multi-hop question answering (QA) is time-consuming and resource-intensive.

Multi-hop Question Answering Question Answering +2

Open Question Answering over Tables and Text

1 code implementation • ICLR 2021 • Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.

Ranked #1 on Question Answering on OTT-QA

Open-Ended Question Answering Retrieval

143

Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN

no code implementations • 16 Oct 2020 • Yilin Shen, Wenhu Chen, Hongxia Jin

We design a Dirichlet Prior RNN to model high-order uncertainty by degenerating as softmax layer for RNN model training.

slot-filling Slot Filling +1

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation

1 code implementation • EMNLP 2020 • Wenhu Chen, Yu Su, Xifeng Yan, William Yang Wang

We propose a knowledge-grounded pre-training (KGPT), which consists of two parts, 1) a general knowledge-grounded generation model to generate knowledge-enriched text.

Ranked #8 on KG-to-Text Generation on WebNLG 2.0 (Unconstrained)

General Knowledge KG-to-Text Generation +1

146

Paper
Code

Logic2Text: High-Fidelity Natural Language Generation from Logical Forms

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhiyu Chen, Wenhu Chen, Hanwen Zha, Xiyou Zhou, Yunkai Zhang, Sairam Sundaresan, William Yang Wang

If only provided with the table, it is hard for existing models to produce controllable and high-fidelity logical generations.

Text Generation Vocal Bursts Intensity Prediction

Paper
Code

Logical Natural Language Generation from Open-Domain Tables

1 code implementation • ACL 2020 • Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang

To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset (Chen et al., 2019), which features a wide range of logical/symbolic inferences, as our testbed, and propose new automatic metrics to evaluate the fidelity of generation models w.r.t. logical inference.

Text Generation

164

Paper
Code

HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data

2 code implementations • Findings of the Association for Computational Linguistics 2020 • Wenhu Chen, Hanwen Zha, Zhiyu Chen, Wenhan Xiong, Hong Wang, William Wang

3) a hybrid model that combines heterogeneous information to find the answer.

Ranked #4 on Question Answering on HybridQA

Multi-hop Question Answering Question Answering

205

Paper
Code

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation • CVPR 2020 • Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

155

Paper
Code

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

2 code implementations • 8 Jan 2020 • Pengda Qin, Xin Wang, Wenhu Chen, Chunyun Zhang, Weiran Xu, William Yang Wang

Large-scale knowledge graphs (KGs) are becoming increasingly important in current information systems.

Relational Reasoning Zero-Shot Learning

Paper
Code

Meta Module Network for Compositional Visual Reasoning

1 code implementation • 8 Oct 2019 • Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

MORPH Visual Reasoning

Paper
Code

TabFact: A Large-scale Dataset for Table-based Fact Verification

1 code implementation • ICLR 2020 • Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED.

Ranked #10 on Table-based Fact Verification on TabFact

16k Fact Checking +4

355

Paper
Code

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

2 code implementations • NeurIPS 2019 • Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan

Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situations.

Ranked #27 on Image Generation on ImageNet 64x64 (Bits per dim metric)

Time Series Time Series Forecasting

1,930

Paper
Code

Global Textual Relation Embedding for Relational Understanding

1 code implementation • ACL 2019 • Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su

Pre-trained embeddings such as word embeddings and sentence embeddings are fundamental tools facilitating a wide range of downstream NLP tasks.

Ranked #142 on Action Classification on Kinetics-400

Action Classification Relation +3

Paper
Code

Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention

2 code implementations • ACL 2019 • Wenhu Chen, Jianshu Chen, Pengda Qin, Xifeng Yan, William Yang Wang

Semantically controlled neural response generation in limited domains has achieved great performance.

Ranked #5 on Data-to-Text Generation on MULTIWOZ 2.1

Data-to-Text Generation Inductive Bias +1

827

Paper
Code

Few-Shot NLG with Pre-Trained Language Model

2 code implementations • ACL 2020 • Zhiyu Chen, Harini Eavani, Wenhu Chen, Yinyin Liu, William Yang Wang

Neural-based end-to-end approaches to natural language generation (NLG) from structured data or knowledge are data-hungry, making their adoption for real-world applications difficult with limited data.

Few-Shot Learning Language Modelling +1

189

Paper
Code

How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection

1 code implementation • NAACL 2019 • Wenhu Chen, Yu Su, Yilin Shen, Zhiyu Chen, Xifeng Yan, William Wang

Under deep neural networks, a pre-defined vocabulary is required to vectorize text inputs.

General Classification text-classification +1

Paper
Code

A Variational Dirichlet Framework for Out-of-Distribution Detection

no code implementations • ICLR 2019 • Wenhu Chen, Yilin Shen, Hongxia Jin, William Wang

With the recent rapid development in deep learning, deep neural networks have been widely adopted in many real-life applications.

Out-of-Distribution Detection Variational Inference

Paper
Add Code

Approximate Distribution Matching for Sequence-to-Sequence Learning

no code implementations • 24 Aug 2018 • Wenhu Chen, Guanlin Li, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Then, we interpret sequence-to-sequence learning as learning a transductive model to transform the source local latent distributions to match their corresponding target distributions.

Image Captioning Machine Translation +1

Paper
Add Code

XL-NBT: A Cross-lingual Neural Belief Tracking Framework

1 code implementation • EMNLP 2018 • Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang

Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data.

Transfer Learning

Paper
Code

Generative Bridging Network for Neural Sequence Prediction

no code implementations • NAACL 2018 • Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Image Captioning +5

Paper
Add Code

Triangular Architecture for Rare Language Translation

no code implementations • ACL 2018 • Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

Neural Machine Translation (NMT) performs poorly on the low-resource language pair $(X, Z)$, especially when $Z$ is a rare language.

Machine Translation NMT +1

Paper
Add Code

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

2 code implementations • ACL 2018 • Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem.

Ranked #13 on Visual Storytelling on VIST

Image Captioning Visual Storytelling

136

Paper
Code

Variational Knowledge Graph Reasoning

no code implementations • NAACL 2018 • Wenhu Chen, Wenhan Xiong, Xifeng Yan, William Wang

Inferring missing links in knowledge graphs (KG) has attracted a lot of attention from the research community.

Knowledge Graphs Link Prediction +2

Paper
Add Code

Video Captioning via Hierarchical Reinforcement Learning

no code implementations • CVPR 2018 • Xin Wang, Wenhu Chen, Jiawei Wu, Yuan-Fang Wang, William Yang Wang

Video captioning is the task of automatically generating a textual description of the actions in a video.

Hierarchical Reinforcement Learning reinforcement-learning +2

Paper
Add Code

Generative Bridging Network in Neural Sequence Prediction

no code implementations • 28 Jun 2017 • Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Abstractive Text Summarization Language Modelling +2

Paper
Add Code

A Semi-supervised Framework for Image Captioning

1 code implementation • 16 Nov 2016 • Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

We here propose a novel way of using such textual data by artificially generating missing visual information.

Decoder Image Captioning +1

Paper
Code

Guided Alignment Training for Topic-Aware Neural Machine Translation

1 code implementation • AMTA 2016 • Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models.

Decoder Domain Adaptation +4

1,445

Paper
Code
