Search Results for author: Wenhu Chen

Found 116 papers, 70 papers with code

VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

no code implementations10 Apr 2025 Haozhe Wang, Chao Qu, Zuming Huang, Wei Chu, Fangzhen Lin, Wenhu Chen

By combining these two techniques, our model, VL-Rethinker, advances the state-of-the-art scores on MathVista and MathVerse to 80.4% and 63.5%, respectively.

Math Multimodal Reasoning

MoCha: Towards Movie-Grade Talking Character Synthesis

no code implementations30 Mar 2025 Cong Wei, Bo Sun, Haoyu Ma, Ji Hou, Felix Juefei-Xu, Zecheng He, Xiaoliang Dai, Luxin Zhang, Kunpeng Li, Tingbo Hou, Animesh Sinha, Peter Vajda, Wenhu Chen

We introduce Talking Characters, a more realistic task to generate talking character animations directly from speech and text.

Video Generation

Towards Trustworthy GUI Agents: A Survey

1 code implementation30 Mar 2025 Yucheng Shi, Wenhao Yu, Wenlin Yao, Wenhu Chen, Ninghao Liu

GUI agents, powered by large foundation models, can interact with digital interfaces, enabling various applications in web automation, mobile navigation, and software testing.

Decision Making Sequential Decision Making +2

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

no code implementations14 Mar 2025 Weiming Ren, Wentao Ma, Huan Yang, Cong Wei, Ge Zhang, Wenhu Chen

State-of-the-art transformer-based large multimodal models (LMMs) struggle to handle hour-long video inputs due to the quadratic complexity of the causal self-attention operations, leading to high computational costs during training and inference.

Mamba Token Reduction +1

VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

no code implementations13 Mar 2025 Yiming Jia, Jiachen Li, Xiang Yue, Bo Li, Ping Nie, Kai Zou, Wenhu Chen

Models fine-tuned on VisualWebInstruct demonstrate significant performance gains: (1) training from Llava-OV-mid shows 10-20% absolute point gains across benchmarks, (2) training from MAmmoTH-VL shows a 5% absolute gain.

Image Retrieval Math

ABC: Achieving Better Control of Multimodal Embeddings using VLMs

no code implementations1 Mar 2025 Benjamin Schneider, Florian Kerschbaum, Wenhu Chen

These tasks necessitate a multimodal embedding model, which outputs embeddings that combine visual and natural language input.

Image to text Image-to-Text Retrieval +2

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

no code implementations26 Feb 2025 Max Ku, Thomas Chong, Jonathan Leung, Krish Shah, Alvin Yu, Wenhu Chen

Our results reveal that agentic planning is essential for generating detailed long-form videos, and the o3-mini agent achieves a success rate of 93.8% and an overall score of 0.77.

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

no code implementations3 Feb 2025 Huaye Zeng, Dongfu Jiang, Haozhe Wang, Ping Nie, Xiaotong Chen, Wenhu Chen

Notably, we follow the R1-style training to start from Qwen2.5-Coder-base directly and show that our RL training can improve the model on HumanEval-plus by over 25% and MBPP-plus by 6% in merely 80 optimization steps.

HumanEval mbpp +3

PixelWorld: Towards Perceiving Everything as Pixels

no code implementations31 Jan 2025 Zhiheng Lyu, Xueguang Ma, Wenhu Chen

Existing foundation models typically process visual input as pixels and textual input as tokens, a paradigm that contrasts with human perception, where both modalities are processed in a unified manner.

Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

1 code implementation29 Jan 2025 YuBo Wang, Xiang Yue, Wenhu Chen

To validate the effectiveness of CFT, we construct multiple critique datasets (e.g., WebInstruct, MetaMath, NuminaMath), where GPT-4o serves as the teacher to generate critiques in the form of ([query; noisy response], critique).

Instruction Following Math +1
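
A minimal sketch of the critique data format described above, i.e., pairs of ([query; noisy response], critique). The `teacher_critique` helper and the field names are illustrative assumptions, not the released CFT pipeline.

```python
# Hedged sketch: assembling one Critique Fine-Tuning (CFT) training example.
# Input = query concatenated with a noisy candidate response; target = critique.
from typing import TypedDict


class CFTExample(TypedDict):
    input: str   # "[query; noisy response]" serialized as text
    target: str  # critique of the noisy response, written by a teacher model


def teacher_critique(query: str, noisy_response: str) -> str:
    """Placeholder for a teacher model call (e.g., GPT-4o) that critiques the response."""
    raise NotImplementedError


def build_cft_example(query: str, noisy_response: str) -> CFTExample:
    """Pack the (query, noisy response) pair and its critique into one training example."""
    return CFTExample(
        input=f"Query: {query}\nResponse: {noisy_response}",
        target=teacher_critique(query, noisy_response),
    )
```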

Aligning Instruction Tuning with Pre-training

no code implementations16 Jan 2025 Yiming Liang, Tianyu Zheng, Xinrun Du, Ge Zhang, Jiaheng Liu, Xingwei Qu, Wenqiang Zu, Xingrun Xing, Chujie Zheng, Lei Ma, Wenhu Chen, Guoyin Wang, Zhaoxiang Zhang, Wenhao Huang, Xiang Yue, Jiajun Zhang

Instruction tuning enhances large language models (LLMs) to follow human instructions across diverse tasks, relying on high-quality datasets to guide behavior.

Diversity

VISA: Retrieval Augmented Generation with Visual Source Attribution

no code implementations19 Dec 2024 Xueguang Ma, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Wenhu Chen, Jimmy Lin

Generation with source attribution is important for enhancing the verifiability of retrieval-augmented generation (RAG) systems.

Answer Generation RAG +1

MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale

no code implementations6 Dec 2024 Jarvis Guo, Tuney Zheng, Yuelin Bai, Bo Li, YuBo Wang, King Zhu, Yizhi Li, Graham Neubig, Wenhu Chen, Xiang Yue

To address these challenges, we introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales designed to elicit CoT reasoning.

Multimodal Reasoning Visual Question Answering

VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation

no code implementations1 Dec 2024 Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen

Current large multimodal models (LMMs) face significant challenges in processing and comprehending long-duration or high-resolution videos, which is mainly due to the lack of high-quality datasets.

Instruction Following Video Understanding

Harnessing Webpage UIs for Text-Rich Visual Understanding

no code implementations17 Oct 2024 Junpeng Liu, Tianyue Ou, YiFan Song, Yuxiao Qu, Wai Lam, Chenyan Xiong, Wenhu Chen, Graham Neubig, Xiang Yue

Text-rich visual understanding-the ability to process environments where dense textual content is integrated with visuals-is crucial for multimodal large language models (MLLMs) to interact effectively with structured environments.

document understanding Optical Character Recognition (OCR)

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

1 code implementation14 Oct 2024 Jiacheng Chen, Tianhao Liang, Sherman Siu, Zhengqing Wang, Kai Wang, YuBo Wang, Yuansheng Ni, Wang Zhu, Ziyan Jiang, Bohan Lyu, Dongfu Jiang, Xuan He, YuAn Liu, Hexiang Hu, Xiang Yue, Wenhu Chen

We evaluate a wide variety of frontier vision-language models on MEGA-Bench to understand their capabilities across these dimensions.

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

1 code implementation8 Oct 2024 Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model.

Video Alignment Video Generation

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks

no code implementations7 Oct 2024 Ziyan Jiang, Rui Meng, Xinyi Yang, Semih Yavuz, Yingbo Zhou, Wenhu Chen

Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models on both in-distribution and out-of-distribution datasets in MMEB.

Information Retrieval Language Modeling +7

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

2 code implementations4 Sep 2024 Xiang Yue, Tianyu Zheng, Yuansheng Ni, YuBo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig

This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark.

Optical Character Recognition (OCR)

LongIns: A Challenging Long-context Instruction-based Exam for LLMs

no code implementations25 Jun 2024 Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

To address these issues, we propose the LongIns benchmark dataset, a challenging long-context instruction-based exam for LLMs, which is built based on the existing instruction datasets.

16k 4k

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs

no code implementations21 Jun 2024 Ziyan Jiang, Xueguang Ma, Wenhu Chen

In order to alleviate the imbalance, we propose a new framework, LongRAG, consisting of a 'long retriever' and a 'long reader'.

4k Chunking +2
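
A rough sketch of the long-retriever / long-reader split described above, under the assumption that the retriever scores a small number of long retrieval units (e.g., grouped documents) and the reader is a long-context LLM; the function names are illustrative, not the released LongRAG code.

```python
# Hedged sketch of a LongRAG-style pipeline: retrieve a few *long* units,
# then hand the top-ranked ones to a long-context reader in a single prompt.
from typing import Callable, List


def long_rag_answer(
    question: str,
    retrieval_units: List[str],          # long units, e.g., whole or grouped documents
    score: Callable[[str, str], float],  # "long retriever": (question, unit) -> relevance
    read: Callable[[str], str],          # "long reader": a long-context LLM call
    top_k: int = 4,
) -> str:
    """Rank long retrieval units and ask the long-context reader to answer over the top-k."""
    ranked = sorted(retrieval_units, key=lambda unit: score(question, unit), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return read(prompt)
```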

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

no code implementations20 Jun 2024 Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, YuBo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks.

Unifying Multimodal Retrieval via Document Screenshot Embedding

no code implementations17 Jun 2024 Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards document screenshots as a unified input format, which does not require any content-extraction preprocessing and preserves all the information in a document (e.g., text, image, and layout).

Language Modelling Natural Questions +2

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

no code implementations16 Jun 2024 Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions.

Benchmarking Spatial Reasoning

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

2 code implementations3 Jun 2024 YuBo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, Wenhu Chen

In the age of large-scale language models, benchmarks like the Massive Multitask Language Understanding (MMLU) have been pivotal in pushing the boundaries of what AI can achieve in language comprehension and reasoning across diverse domains.

MMLU Multi-task Language Understanding

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

1 code implementation29 May 2024 Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation.

Video Generation

UniRAG: Universal Retrieval Augmentation for Large Vision Language Models

1 code implementation16 May 2024 Sahel Sharifymoghaddam, Shivani Upadhyay, Wenhu Chen, Jimmy Lin

Recently, Large Vision Language Models (LVLMs) have unlocked many complex use cases that require Multi-Modal (MM) understanding (e.g., image captioning or visual question answering) and MM generation (e.g., text-guided image generation or editing) capabilities.

Image Captioning Image Generation +3

MAmmoTH2: Scaling Instructions from the Web

no code implementations6 May 2024 Xiang Yue, Tuney Zheng, Ge Zhang, Wenhu Chen

Notably, MAmmoTH2-7B's (Mistral) performance increases from 11% to 36.7% on MATH and from 36% to 68.4% on GSM8K without training on any in-domain data.

Chatbot GSM8K +1

MANTIS: Interleaved Multi-Image Instruction Tuning

1 code implementation2 May 2024 Dongfu Jiang, Xuan He, Huaye Zeng, Cong Wei, Max Ku, Qian Liu, Wenhu Chen

We further evaluate Mantis on single-image benchmarks and demonstrate that Mantis also maintains a strong single-image performance on par with CogVLM and Emu2.

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

no code implementations5 Apr 2024 Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Wenhu Chen, Ge Zhang

In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs.

Language Modeling Language Modelling +1

Long-context LLMs Struggle with Long In-context Learning

2 code implementations2 Apr 2024 Tianle Li, Ge Zhang, Quy Duc Do, Xiang Yue, Wenhu Chen

We introduce a benchmark (LongICLBench) for long in-context learning in extreme-label classification using six datasets with 28 to 174 classes and input lengths from 2K to 50K tokens.

2k In-Context Learning +1

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

1 code implementation28 Mar 2024 Kai Zhang, Yi Luan, Hexiang Hu, Kenton Lee, Siyuan Qiao, Wenhu Chen, Yu Su, Ming-Wei Chang

MagicLens is built on a key novel insight: image pairs that naturally occur on the same web pages contain a wide range of implicit relations (e.g., inside view of), and we can make those implicit relations explicit by synthesizing instructions via foundation models.

Image Retrieval Implicit Relations +4

AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks

1 code implementation21 Mar 2024 Max Ku, Cong Wei, Weiming Ren, Harry Yang, Wenhu Chen

AnyV2V can leverage any existing image editing tools to support an extensive array of video editing tasks, including prompt-based editing, reference-based style transfer, subject-driven editing, and identity manipulation, which were unattainable by previous methods.

Image to Video Generation Style Transfer +1

Reward Guided Latent Consistency Distillation

no code implementations16 Mar 2024 Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), LCD facilitates the generation of high-fidelity images within merely 2 to 4 inference steps.

Image Generation

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding

no code implementations26 Feb 2024 Alex Zhuang, Ge Zhang, Tianyu Zheng, Xinrun Du, Junjie Wang, Weiming Ren, Stephen W. Huang, Jie Fu, Xiang Yue, Wenhu Chen

Utilizing this dataset, we train a series of models, referred to as StructLM, based on the Mistral and the CodeLlama model family, ranging from 7B to 34B parameters.

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

1 code implementation22 Feb 2024 Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter.

Code Generation HumanEval +1

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

1 code implementation6 Feb 2024 Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen

To verify the effectiveness of our method, we propose I2V-Bench, a comprehensive evaluation benchmark for I2V generation.

Image to Video Generation

Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

1 code implementation5 Feb 2024 Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time.

Knowledge Graphs Math

SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval

1 code implementation24 Jan 2024 Siwei Wu, Yizhi Li, Kang Zhu, Ge Zhang, Yiming Liang, Kaijing Ma, Chenghao Xiao, Haoran Zhang, Bohao Yang, Wenhu Chen, Wenhao Huang, Noura Al Moubayed, Jie Fu, Chenghua Lin

We further annotate the image-text pairs with two-level subset-subcategory hierarchy annotations to facilitate a more comprehensive evaluation of the baselines.

Benchmarking Image Captioning +3

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

no code implementations13 Jan 2024 Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources.

4k Position

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation

1 code implementation22 Dec 2023 Max Ku, Dongfu Jiang, Cong Wei, Xiang Yue, Wenhu Chen

In the rapidly advancing field of conditional image generation research, challenges such as limited explainability lie in effectively evaluating the performance and capabilities of various models.

Conditional Image Generation General Knowledge

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

no code implementations28 Nov 2023 Cong Wei, Yang Chen, Haonan Chen, Hexiang Hu, Ge Zhang, Jie Fu, Alan Ritter, Wenhu Chen

Existing information retrieval (IR) models often assume a homogeneous format, limiting their applicability to diverse user needs, such as searching for images with text descriptions, searching for a news article with a headline image, or finding a similar photo with a query image.

Benchmarking Information Retrieval +2

Kosmos-G: Generating Images in Context with Multimodal Large Language Models

1 code implementation4 Oct 2023 Xichen Pan, Li Dong, Shaohan Huang, Zhiliang Peng, Wenhu Chen, Furu Wei

These limitations keep them far from the ultimate goal of "image as a foreign language in image generation."

Decoder Image Generation

ImagenHub: Standardizing the evaluation of conditional image generation models

2 code implementations2 Oct 2023 Max Ku, Tianle Li, Kai Zhang, Yujie Lu, Xingyu Fu, Wenwen Zhuang, Wenhu Chen

Recently, a myriad of conditional image generation and editing models have been developed to serve different downstream tasks, including text-to-image generation, text-guided image editing, subject-driven image generation, control-guided image generation, etc.

Conditional Image Generation text-guided-image-editing

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks

1 code implementation1 Oct 2023 Dongfu Jiang, Yishan Li, Ge Zhang, Wenhao Huang, Bill Yuchen Lin, Wenhu Chen

To quantitatively assess our metric, we evaluate its correlation with human ratings on 5 held-in datasets and 2 held-out datasets, and show that TIGERScore achieves the open-source SoTA correlation with human ratings across these datasets, nearly approaching the GPT-4 evaluator.

All Text Generation

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

1 code implementation15 Sep 2023 Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains not well-explored.

Caption Generation Language Modelling +1

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation11 Sep 2023 Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering (Published in Findings of EMNLP 2024)

1 code implementation5 Sep 2023 YuBo Wang, Xueguang Ma, Wenhu Chen

In this study, we present a system called LLMs Augmented with Medical Textbooks (LLM-AMT) designed to enhance the proficiency of LLMs in specialized domains.

Question Answering Retrieval

DreamEdit: Subject-driven Image Editing

no code implementations22 Jun 2023 Tianle Li, Max Ku, Cong Wei, Wenhu Chen

In this work, we aspire to fill the void and propose two novel subject-driven sub-tasks, i.e., Subject Replacement and Subject Addition.

Image Generation Position

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

1 code implementation NeurIPS 2023 Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su

To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing.

text-guided-image-editing

EDIS: Entity-Driven Image Search over Multimodal Web Content

1 code implementation23 May 2023 SiQi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Yang Wang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion.

Image Retrieval Retrieval

On the Risk of Misinformation Pollution with Large Language Models

1 code implementation23 May 2023 Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems.

Misinformation Open-Domain Question Answering

Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models

1 code implementation23 May 2023 Alfonso Amayuelas, Kyle Wong, Liangming Pan, Wenhu Chen, William Wang

This paper investigates the capabilities of Large Language Models (LLMs) in the context of understanding their knowledge and uncertainty over questions.

Known Unknowns Open-Ended Question Answering

Interactive Natural Language Processing

no code implementations22 May 2023 Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

TheoremQA: A Theorem-driven Question Answering dataset

1 code implementation21 May 2023 Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.

Math Question Answering

DePlot: One-shot visual language reasoning by plot-to-table translation

1 code implementation20 Dec 2022 Fangyu Liu, Julian Martin Eisenschlos, Francesco Piccinno, Syrine Krichene, Chenxi Pang, Kenton Lee, Mandar Joshi, Wenhu Chen, Nigel Collier, Yasemin Altun

Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.

Chart Question Answering Factual Inconsistency Detection in Chart Captioning +3

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

2 code implementations22 Nov 2022 Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets.

Math
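
A minimal sketch of the Program-of-Thoughts idea combined with self-consistency decoding, as described above: the model emits a Python program instead of free-form reasoning, each sampled program is executed, and the final answer is a majority vote over the executed results. The `llm_complete` call and the prompt template are assumptions, not the paper's released code.

```python
# Hedged sketch: Program-of-Thoughts (PoT) prompting with self-consistency decoding.
from collections import Counter

POT_PROMPT = """Write a Python program that computes the answer to the question.
Store the final result in a variable named `ans`.

Question: {question}

# Python program:
"""


def llm_complete(prompt: str, temperature: float = 0.7) -> str:
    """Placeholder for any chat/completion API call that returns sampled text."""
    raise NotImplementedError


def run_program(program: str):
    """Execute a generated program and read back its `ans` variable."""
    scope: dict = {}
    try:
        exec(program, scope)  # caution: sandbox untrusted generated code in real use
        return scope.get("ans")
    except Exception:
        return None


def pot_self_consistency(question: str, n_samples: int = 10):
    """Sample several programs, execute each, and majority-vote over their answers."""
    answers = [run_program(llm_complete(POT_PROMPT.format(question=question)))
               for _ in range(n_samples)]
    answers = [str(a) for a in answers if a is not None]
    return Counter(answers).most_common(1)[0][0] if answers else None
```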

Large Language Models are few(1)-shot Table Reasoners

1 code implementation13 Oct 2022 Wenhu Chen

Specifically, we evaluated LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS and found that LLMs are competent at complex reasoning over table structures, though these models are not pre-trained on any table corpus.

Fact Verification In-Context Learning

Explanations from Large Language Models Make Small Reasoners Better

no code implementations13 Oct 2022 Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan

Integrating free-text explanations to in-context learning of large language models (LLM) is shown to elicit strong reasoning capabilities along with reasonable explanations.

Explanation Generation In-Context Learning +1

Controllable Dialogue Simulation with In-Context Learning

1 code implementation9 Oct 2022 Zekun Li, Wenhu Chen, Shiyang Li, Hong Wang, Jing Qian, Xifeng Yan

Experimental results on the MultiWOZ dataset demonstrate that training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues under the challenging low-resource settings, with as few as 85 dialogues as a seed.

Data Augmentation In-Context Learning +3

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text

no code implementations6 Oct 2022 Wenhu Chen, Hexiang Hu, Xi Chen, Pat Verga, William W. Cohen

While language Models store a massive amount of world knowledge implicitly in their parameters, even very large models often fail to encode information about rare entities and events, while incurring huge computational costs.

Open-Ended Question Answering RAG +3

Re-Imagen: Retrieval-Augmented Text-to-Image Generator

no code implementations29 Sep 2022 Wenhu Chen, Hexiang Hu, Chitwan Saharia, William W. Cohen

To further evaluate the capabilities of the model, we introduce EntityDrawBench, a new benchmark that evaluates image generation for diverse entities, from frequent to rare, across multiple object categories including dogs, foods, landmarks, birds, and characters.

Image-text Retrieval Text Retrieval +1

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

no code implementations1 Jul 2022 Wenhu Chen, William W. Cohen, Michiel de Jong, Nitish Gupta, Alessandro Presta, Pat Verga, John Wieting

In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking.

Entity Linking Position +2

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data

no code implementations Findings (ACL) 2022 Kai Nakamura, Sharon Levy, Yi-Lin Tuan, Wenhu Chen, William Yang Wang

A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities.

Response Generation Retrieval

Attacking Open-domain Question Answering by Injecting Misinformation

1 code implementation15 Oct 2021 Liangming Pan, Wenhu Chen, Min-Yen Kan, William Yang Wang

We curate both human-written and model-generated false documents that we inject into the evidence corpus of QA models and assess the impact on the performance of these systems.

Misinformation Open-Domain Question Answering

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding

no code implementations Findings (EMNLP) 2021 Shiyang Li, Semih Yavuz, Wenhu Chen, Xifeng Yan

Task-adaptive pre-training (TAPT) and Self-training (ST) have emerged as the major semi-supervised approaches to improve natural language understanding (NLU) tasks with massive amount of unlabeled data.

named-entity-recognition Named Entity Recognition +6

FinQA: A Dataset of Numerical Reasoning over Financial Data

1 code implementation EMNLP 2021 Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations.

Question Answering

Local Explanation of Dialogue Response Generation

1 code implementation NeurIPS 2021 Yi-Lin Tuan, Connor Pryor, Wenhu Chen, Lise Getoor, William Yang Wang

To gain insights into the reasoning process of a generation model, we propose a new method, local explanation of response generation (LERG) that regards the explanations as the mutual interaction of segments in input and output sentences.

Implicit Relations Response Generation +1

Counterfactual Maximum Likelihood Estimation for Training Deep Networks

1 code implementation NeurIPS 2021 Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang

Although deep learning models have driven state-of-the-art performance on a wide array of tasks, they are prone to spurious correlations that should not be learned as predictive clues.

counterfactual Domain Generalization +2

A Systematic Investigation of KB-Text Embedding Alignment at Scale

1 code implementation ACL 2021 Vardaan Pahuja, Yu Gu, Wenhu Chen, Mehdi Bahrami, Lei Liu, Wei-Peng Chen, Yu Su

Knowledge bases (KBs) and text often contain complementary knowledge: KBs store structured knowledge that can support long range reasoning, while text stores more comprehensive and timely knowledge in an unstructured way.

Link Prediction

Zero-shot Fact Verification by Claim Generation

1 code implementation ACL 2021 Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang

However, for each new domain that requires fact verification, creating a dataset by manually writing claims and linking them to their supporting evidence is expensive.

2k Fact Verification

Open Question Answering over Tables and Text

1 code implementation ICLR 2021 Wenhu Chen, Ming-Wei Chang, Eva Schlinger, William Wang, William W. Cohen

In open question answering (QA), the answer to a question is produced by retrieving and then analyzing documents that might contain answers to the question.

Open-Ended Question Answering Retrieval

Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN

no code implementations16 Oct 2020 Yilin Shen, Wenhu Chen, Hongxia Jin

We design a Dirichlet Prior RNN to model high-order uncertainty, which degenerates to a standard softmax layer during RNN model training.

slot-filling Slot Filling +1

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation

1 code implementation EMNLP 2020 Wenhu Chen, Yu Su, Xifeng Yan, William Yang Wang

We propose a knowledge-grounded pre-training (KGPT), which consists of two parts, 1) a general knowledge-grounded generation model to generate knowledge-enriched text.

General Knowledge KG-to-Text Generation +1

Logical Natural Language Generation from Open-Domain Tables

1 code implementation ACL 2020 Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang

To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset (Chen et al., 2019), featured with a wide range of logical/symbolic inferences, as our testbed, and propose new automatic metrics to evaluate the fidelity of generation models w.r.t. logical inference.

Text Generation

VIOLIN: A Large-Scale Dataset for Video-and-Language Inference

1 code implementation CVPR 2020 Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu

We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.

Meta Module Network for Compositional Visual Reasoning

1 code implementation8 Oct 2019 Wenhu Chen, Zhe Gan, Linjie Li, Yu Cheng, William Wang, Jingjing Liu

To design a more powerful NMN architecture for practical use, we propose Meta Module Network (MMN) centered on a novel meta module, which can take in function recipes and morph into diverse instance modules dynamically.

MORPH Visual Reasoning

TabFact: A Large-scale Dataset for Table-based Fact Verification

1 code implementation ICLR 2020 Wenhu Chen, Hongmin Wang, Jianshu Chen, Yunkai Zhang, Hong Wang, Shiyang Li, Xiyou Zhou, William Yang Wang

To this end, we construct a large-scale dataset called TabFact with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, which are labeled as either ENTAILED or REFUTED.

16k Fact Checking +4

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting

2 code implementations NeurIPS 2019 Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan

Time series forecasting is an important problem across many domains, including predictions of solar plant energy output, electricity consumption, and traffic jam situation.

Ranked #32 on Image Generation on ImageNet 64x64 (Bits per dim metric)

Time Series Time Series Forecasting

Global Textual Relation Embedding for Relational Understanding

1 code implementation ACL 2019 Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su

Pre-trained embeddings such as word embeddings and sentence embeddings are fundamental tools facilitating a wide range of downstream NLP tasks.

Action Classification Relation +3

Few-Shot NLG with Pre-Trained Language Model

2 code implementations ACL 2020 Zhiyu Chen, Harini Eavani, Wenhu Chen, Yinyin Liu, William Yang Wang

Neural-based end-to-end approaches to natural language generation (NLG) from structured data or knowledge are data-hungry, making their adoption for real-world applications difficult with limited data.

Few-Shot Learning Language Modeling +3

A Variational Dirichlet Framework for Out-of-Distribution Detection

no code implementations ICLR 2019 Wenhu Chen, Yilin Shen, Hongxia Jin, William Wang

With the recently rapid development in deep learning, deep neural networks have been widely adopted in many real-life applications.

Out-of-Distribution Detection Variational Inference

Approximate Distribution Matching for Sequence-to-Sequence Learning

no code implementations24 Aug 2018 Wenhu Chen, Guanlin Li, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

Then, we interpret sequence-to-sequence learning as learning a transductive model to transform the source local latent distributions to match their corresponding target distributions.

Image Captioning Machine Translation +1

XL-NBT: A Cross-lingual Neural Belief Tracking Framework

1 code implementation EMNLP 2018 Wenhu Chen, Jianshu Chen, Yu Su, Xin Wang, Dong Yu, Xifeng Yan, William Yang Wang

Then, we pre-train a state tracker for the source language as a teacher, which is able to exploit easy-to-access parallel data.

Transfer Learning

Generative Bridging Network for Neural Sequence Prediction

no code implementations NAACL 2018 Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Image Captioning +7

Triangular Architecture for Rare Language Translation

no code implementations ACL 2018 Shuo Ren, Wenhu Chen, Shujie Liu, Mu Li, Ming Zhou, Shuai Ma

Neural Machine Translation (NMT) performs poorly on the low-resource language pair (X, Z), especially when Z is a rare language.

Machine Translation NMT +1

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling

2 code implementations ACL 2018 Xin Wang, Wenhu Chen, Yuan-Fang Wang, William Yang Wang

Though impressive results have been achieved in visual captioning, the task of generating abstract stories from photo streams is still a little-tapped problem.

Image Captioning Reinforcement Learning +1

Variational Knowledge Graph Reasoning

no code implementations NAACL 2018 Wenhu Chen, Wenhan Xiong, Xifeng Yan, William Wang

Inferring missing links in knowledge graphs (KG) has attracted a lot of attention from the research community.

Knowledge Graphs Link Prediction +2

Generative Bridging Network in Neural Sequence Prediction

no code implementations28 Jun 2017 Wenhu Chen, Guanlin Li, Shuo Ren, Shujie Liu, Zhirui Zhang, Mu Li, Ming Zhou

In order to alleviate data sparsity and overfitting problems in maximum likelihood estimation (MLE) for sequence prediction tasks, we propose the Generative Bridging Network (GBN), in which a novel bridge module is introduced to assist the training of the sequence prediction model (the generator network).

Abstractive Text Summarization Language Modeling +4

A Semi-supervised Framework for Image Captioning

1 code implementation16 Nov 2016 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

We here propose a novel way of using such textual data by artificially generating missing visual information.

Decoder Image Captioning +1

Guided Alignment Training for Topic-Aware Neural Machine Translation

1 code implementation AMTA 2016 Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models.

Decoder Domain Adaptation +4
