Search Results for author: William Yang Wang

Found 270 papers, 146 papers with code

Counterfactual Vision-and-Language Navigation via Adversarial Path Sampler

no code implementations ECCV 2020 Tsu-Jui Fu, Xin Eric Wang, Matthew F. Peterson, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang

In particular, we present a model-agnostic adversarial path sampler (APS) that learns to sample challenging paths that force the navigator to improve based on the navigation performance.

counterfactual, Counterfactual Reasoning +2

BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations

no code implementations 13 Jan 2025 Weixi Feng, Chao Liu, Sifei Liu, William Yang Wang, Arash Vahdat, Weili Nie

In addition, we introduce a learnable module to interpolate text embeddings so that users can control semantics in specific frames and obtain smooth object transitions.

Object, Text-to-Video Generation +1

Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework

no code implementations 22 Dec 2024 Jundong Xu, Hao Fei, Meng Luo, Qian Liu, Liangming Pan, William Yang Wang, Preslav Nakov, Mong-Li Lee, Wynne Hsu

In the context of large language models (LLMs), current advanced reasoning methods have made impressive strides in various reasoning tasks.

Logical Reasoning

Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning

no code implementations 15 Dec 2024 Shengqiong Wu, Hao Fei, Liangming Pan, William Yang Wang, Shuicheng Yan, Tat-Seng Chua

Our framework systematically addresses potential issues in both visual and textual inputs by verifying and integrating perception-level information with cognition-level commonsense knowledge, ensuring more reliable outputs.

Hallucination

RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios

1 code implementation 12 Dec 2024 Ruiwen Zhou, Wenyue Hua, Liangming Pan, Sitao Cheng, Xiaobao Wu, En Yu, William Yang Wang

This paper introduces RuleArena, a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning.

Logical Reasoning, Long-Context Understanding

Embracing AI in Education: Understanding the Surge in Large Language Model Use by Secondary Students

no code implementations 27 Nov 2024 Tiffany Zhu, Kexun Zhang, William Yang Wang

The impressive essay writing and problem-solving capabilities of large language models (LLMs) like OpenAI's ChatGPT have opened up new avenues in education.

Language Modeling, Language Modelling +3

Disentangling Memory and Reasoning Ability in Large Language Models

1 code implementation 20 Nov 2024 Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, Yongfeng Zhang

Large Language Models (LLMs) have demonstrated strong performance in handling complex tasks requiring both extensive knowledge and reasoning abilities.

Decision Making, Retrieval

Scaling LLM Inference with Optimized Sample Compute Allocation

1 code implementation 29 Oct 2024 Kexun Zhang, Shang Zhou, Danqing Wang, William Yang Wang, Lei LI

To scale up inference efficiently with limited compute, it is crucial to find an optimal allocation for sample compute budgets: Which sampling configurations (model, temperature, language, etc.)

Code Generation
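
The allocation problem this snippet poses can be made concrete with a toy example. The sketch below greedily spends a compute budget on whichever sampling configuration offers the largest marginal gain in the probability of getting at least one correct sample; the greedy rule, the per-configuration success rates, and all names are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: allocate a sample budget across sampling configurations
# to maximize P(at least one sample is correct). Not the paper's method.

def allocate_samples(success_rates, costs, budget):
    """Greedy allocation: each step buys one more sample from the
    configuration with the best marginal coverage gain per unit cost."""
    fail_probs = [1.0] * len(success_rates)  # P(no correct sample yet), per config
    counts = [0] * len(success_rates)
    spent = 0.0
    while True:
        best, best_gain = None, 0.0
        for i, (p, c) in enumerate(zip(success_rates, costs)):
            if spent + c > budget:
                continue
            gain = fail_probs[i] * p / c  # marginal gain in coverage per cost
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break
        counts[best] += 1
        fail_probs[best] *= 1.0 - success_rates[best]
        spent += costs[best]
    return counts

# e.g., a cheap weak sampler vs. an expensive strong one
print(allocate_samples(success_rates=[0.05, 0.4], costs=[1.0, 8.0], budget=32))
```

Spending each unit of budget where the marginal gain in coverage per unit cost is highest is the standard greedy heuristic for this kind of objective.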

Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling

no code implementations 15 Oct 2024 Wenda Xu, Rujun Han, Zifeng Wang, Long T. Le, Dhruv Madeka, Lei LI, William Yang Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister

To address these limitations, we introduce Speculative Knowledge Distillation (SKD), a novel approach that leverages cooperation between student and teacher models to generate high-quality training data on-the-fly while aligning with the student's inference-time distribution.

Instruction Following, Knowledge Distillation +2
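
The interleaved sampling idea can be sketched in a few lines: the student proposes each next token on-policy, and the teacher keeps the proposal only when it is plausible under the teacher, otherwise substituting its own sample. The top-k acceptance rule and the `student`/`teacher` interfaces below are illustrative assumptions, not SKD's exact criterion.

```python
import numpy as np

def interleaved_sample(student, teacher, prompt_ids, max_len=64, top_k=25):
    """student/teacher: callables mapping a token-id list to a next-token
    probability vector (np.ndarray over the vocabulary); assumed interfaces."""
    tokens = list(prompt_ids)
    for _ in range(max_len):
        p_s = student(tokens)
        proposal = int(np.random.choice(len(p_s), p=p_s))
        p_t = teacher(tokens)
        if proposal in np.argsort(p_t)[-top_k:]:  # plausible under the teacher
            tokens.append(proposal)               # keep on-policy student token
        else:
            tokens.append(int(np.random.choice(len(p_t), p=p_t)))
    return tokens
```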

COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement

1 code implementation 12 Oct 2024 Yuxi Xie, Anirudh Goyal, Xiaobao Wu, Xunjian Yin, Xiao Xu, Min-Yen Kan, Liangming Pan, William Yang Wang

Our approach models multiple token dependencies within manageable context windows, enabling the model to perform iterative refinement internally during the generation process.

Code Generation, Computational Efficiency +3

Detecting Training Data of Large Language Models via Expectation Maximization

1 code implementation 10 Oct 2024 Gyuwan Kim, Yang Li, Evangelia Spiliopoulou, Jie Ma, Miguel Ballesteros, William Yang Wang

In this paper, we introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm, leveraging the duality that the estimates of these scores can be improved by each other.
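
A schematic of the alternation described here, with the duality made explicit: prefix scores are refined from the current membership estimates, and membership scores are refined from the new prefix scores. The normalization-based update rules and the precomputed gain matrix are illustrative assumptions, not EM-MIA's exact updates.

```python
import numpy as np

def em_mia(pairwise_gain, n_iters=10):
    """pairwise_gain[i, j]: how much prepending candidate prefix j raises the
    target LLM's likelihood of text i (assumed precomputed)."""
    n = pairwise_gain.shape[0]
    membership = np.full(n, 0.5)                   # initial membership scores
    for _ in range(n_iters):
        # refine prefix scores: a prefix is good if it helps likely members
        prefix = pairwise_gain.T @ membership
        prefix = (prefix - prefix.mean()) / (prefix.std() + 1e-8)
        # refine membership scores: members are helped most by good prefixes
        membership = pairwise_gain @ prefix
        membership = (membership - membership.min()) / (np.ptp(membership) + 1e-8)
    return membership                              # higher => likely member
```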

Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models

1 code implementation 10 Oct 2024 Sitao Cheng, Liangming Pan, Xunjian Yin, Xinyi Wang, William Yang Wang

To support this investigation, we introduce ECHOQA, a benchmark spanning scientific, factual, and commonsense knowledge.

Uncovering Factor Level Preferences to Improve Human-Model Alignment

no code implementations 9 Oct 2024 Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda Xu, Inha Cha, William Yang Wang, Alice Oh

Our factor level analysis reveals a substantial discrepancy between human and LLM preferences in generation tasks, whereas LLMs show strong alignment with human preferences in evaluation tasks.

Language Modelling, Large Language Model +2

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

1 code implementation 8 Oct 2024 Jiachen Li, Qian Long, Jian Zheng, Xiaofeng Gao, Robinson Piramuthu, Wenhu Chen, William Yang Wang

In this paper, we focus on enhancing a diffusion-based text-to-video (T2V) model during the post-training phase by distilling a highly capable consistency model from a pretrained T2V model.

Video Alignment, Video Generation

Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement

2 code implementations 6 Oct 2024 Xunjian Yin, Xinyi Wang, Liangming Pan, Xiaojun Wan, William Yang Wang

The rapid advancement of large language models (LLMs) has significantly enhanced the capabilities of AI-driven agents across various tasks.

Mathematical Reasoning, Meta-Learning

Human Bias in the Face of AI: The Role of Human Judgement in AI Generated Text Evaluation

no code implementations 29 Sep 2024 Tiffany Zhu, Iain Weissburg, Kexun Zhang, William Yang Wang

As AI advances in text generation, human trust in AI generated content remains constrained by biases that go beyond concerns of accuracy.

Text Generation

A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models

no code implementations 29 Aug 2024 Yi-Lin Tuan, William Yang Wang

Beyond maximum likelihood estimation (MLE), the standard language model (LM) objective that maximizes the probability of good examples, many studies have explored objectives that also penalize bad examples to improve the quality of the output distribution, including unlikelihood training, exponential maximizing average treatment effect (ExMATE), and direct preference optimization (DPO).

Language Modeling, Language Modelling
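
For reference, the objectives named in this snippet can be written out for a single (good, bad) example pair. These are the standard formulations (ExMATE omitted); `logp_*` denotes summed token log-probabilities, and pairing one good with one bad example is an illustrative simplification.

```python
import torch
import torch.nn.functional as F

def mle_loss(logp_good):
    return -logp_good  # maximize likelihood of the good example only

def unlikelihood_loss(logp_good, logp_bad):
    # additionally push probability mass off the bad example via log(1 - p_bad)
    p_bad = torch.exp(logp_bad).clamp(max=1.0 - 1e-6)
    return -logp_good - torch.log1p(-p_bad)

def dpo_loss(logp_good, logp_bad, ref_logp_good, ref_logp_bad, beta=0.1):
    # implicit reward margin relative to a frozen reference model
    margin = (logp_good - ref_logp_good) - (logp_bad - ref_logp_bad)
    return -F.logsigmoid(beta * margin)
```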

Can Editing LLMs Inject Harm?

1 code implementation 29 Jul 2024 Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

Then, we find that editing attacks can inject both types of misinformation into LLMs, and the effectiveness is particularly high for commonsense misinformation injection.

Fairness, General Knowledge +4

Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data

no code implementations 20 Jul 2024 Xinyi Wang, Antonis Antoniades, Yanai Elazar, Alfonso Amayuelas, Alon Albalak, Kexun Zhang, William Yang Wang

Furthermore, while model performance improves across all tasks as LLM size increases, only factual question answering shows an increase in memorization, whereas machine translation and reasoning tasks exhibit greater generalization, producing more novel outputs.

Language Modelling, Machine Translation +6

RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering

1 code implementation 19 Jul 2024 Rujun Han, Yuhao Zhang, Peng Qi, Yumo Xu, Jenyuan Wang, Lan Liu, William Yang Wang, Bonan Min, Vittorio Castelli

Question answering based on retrieval augmented generation (RAG-QA) is an important research topic in NLP and has a wide range of real-world applications.

Domain Generalization, Language Modeling +4

DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations

1 code implementation 8 Jul 2024 Luke Yoffe, Alfonso Amayuelas, William Yang Wang

To enhance Large Language Model (LLM) capabilities, multi-agent debates have been introduced, where multiple LLMs discuss solutions to a problem over several rounds of debate.

Language Modeling, Language Modelling +1

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

1 code implementation 2 Jul 2024 Qiucheng Wu, Handong Zhao, Michael Saxon, Trung Bui, William Yang Wang, Yang Zhang, Shiyu Chang

One understudied capability in VLMs is visual spatial planning -- the ability to comprehend the spatial arrangements of objects and devise action plans to achieve desired outcomes in visual scenes.

Investigating the Transferability of Code Repair for Low-Resource Programming Languages

no code implementations 21 Jun 2024 Kyle Wong, Alfonso Amayuelas, Liangming Pan, William Yang Wang

To explain this behavior, we perform a further analysis and find that contrary to preexisting beliefs, the correlation between reasoning ability and code correction ability is weak.

Code Generation, Code Repair

Global Human-guided Counterfactual Explanations for Molecular Properties via Reinforcement Learning

1 code implementation 19 Jun 2024 Danqing Wang, Antonis Antoniades, Kha-Dinh Luong, Edwin Zhang, Mert Kosan, Jiachen Li, Ambuj Singh, William Yang Wang, Lei LI

RLHEX provides a flexible framework to incorporate different human-designed principles into the counterfactual explanation generation process, aligning these explanations with domain expertise.

counterfactual, Counterfactual Explanation +3

BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment

1 code implementation 18 Jun 2024 Wenda Xu, Jiachen Li, William Yang Wang, Lei LI

Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets.

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

no code implementations 16 Jun 2024 Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions.

Benchmarking, Spatial Reasoning

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

1 code implementation 12 Jun 2024 Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, JianFeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics.

counterfactual, Future prediction +1

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

no code implementations 11 Jun 2024 Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth

We present a novel task and benchmark for evaluating the ability of text-to-image (T2I) generation models to produce images that align with commonsense in real life, which we call Commonsense-T2I.

Adversarial Text, Text-to-Image Generation +1

Unveiling the Impact of Coding Data Instruction Fine-Tuning on Large Language Models Reasoning

no code implementations 30 May 2024 Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold

First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales.

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback

1 code implementation 29 May 2024 Jiachen Li, Weixi Feng, Tsu-Jui Fu, Xinyi Wang, Sugato Basu, Wenhu Chen, William Yang Wang

In this work, we aim to break the quality bottleneck of a video consistency model (VCM) to achieve both fast and high-quality video generation.

Video Generation

From Text to Pixel: Advancing Long-Context Understanding in MLLMs

1 code implementation 23 May 2024 Yujie Lu, Xiujun Li, Tsu-Jui Fu, Miguel Eckstein, William Yang Wang

The rapid progress in Multimodal Large Language Models (MLLMs) has significantly advanced their ability to process and understand complex visual and textual information.

Language Modeling, Language Modelling +4

A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law

1 code implementation 2 May 2024 Zhiyu Zoey Chen, Jing Ma, Xinlu Zhang, Nan Hao, An Yan, Armineh Nourbakhsh, Xianjun Yang, Julian McAuley, Linda Petzold, William Yang Wang

In the fast-evolving domain of artificial intelligence, large language models (LLMs) such as GPT-3 and GPT-4 are revolutionizing the landscapes of finance, healthcare, and law: domains characterized by their reliance on professional expertise, challenging data acquisition, high-stakes, and stringent regulatory compliance.

Ethics

Automatic Layout Planning for Visually-Rich Documents with Instruction-Following Models

no code implementations 23 Apr 2024 Wanrong Zhu, Jennifer Healey, Ruiyi Zhang, William Yang Wang, Tong Sun

Recent advancements in instruction-following models have made user interactions with models more user-friendly and efficient, broadening their applicability.

Instruction Following

Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models

1 code implementation 11 Apr 2024 Haotian Zhang, Haoxuan You, Philipp Dufter, BoWen Zhang, Chen Chen, Hong-You Chen, Tsu-Jui Fu, William Yang Wang, Shih-Fu Chang, Zhe Gan, Yinfei Yang

While Ferret seamlessly integrates regional understanding into the Large Language Model (LLM) to facilitate its referring and grounding capability, it has certain limitations: it is constrained by the pre-trained, fixed visual encoder and fails to perform well on broader tasks.

Language Modeling, Language Modelling +2

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

1 code implementation 5 Apr 2024 Michael Saxon, Fatima Jahara, Mahsa Khoshnoodi, Yujie Lu, Aditya Sharma, William Yang Wang

With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness -- the semantic coherence of generated images to the prompts they were conditioned on.

Benchmarking

Towards Safety and Helpfulness Balanced Responses via Controllable Large Language Models

no code implementations 1 Apr 2024 Yi-Lin Tuan, Xilun Chen, Eric Michael Smith, Louis Martin, Soumya Batra, Asli Celikyilmaz, William Yang Wang, Daniel M. Bikel

As large language models (LLMs) become easily accessible nowadays, the trade-off between safety and helpfulness can significantly impact user experience.

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

no code implementations 17 Mar 2024 Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang

Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set.

Translation

Reward Guided Latent Consistency Distillation

no code implementations 16 Mar 2024 Jiachen Li, Weixi Feng, Wenhu Chen, William Yang Wang

By distilling a latent consistency model (LCM) from a pre-trained teacher latent diffusion model (LDM), latent consistency distillation (LCD) facilitates the generation of high-fidelity images within merely 2 to 4 inference steps.

Image Generation

AKEW: Assessing Knowledge Editing in the Wild

1 code implementation 29 Feb 2024 Xiaobao Wu, Liangming Pan, William Yang Wang, Anh Tuan Luu

Knowledge editing injects knowledge updates into language models to keep them correct and up-to-date.

counterfactual, knowledge editing +1

Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions

2 code implementations 28 Feb 2024 Kexun Zhang, Yee Man Choi, Zhenqiao Song, Taiqi He, William Yang Wang, Lei LI

On the contrary, we observe that 2000 endangered languages, though without a large corpus, have a grammar book or a dictionary.

Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

1 code implementation 5 Feb 2024 Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang

To understand how pre-training with a next-token prediction objective contributes to the emergence of such reasoning capability, we propose that we can view an LM as deriving new conclusions by aggregating indirect reasoning paths seen at pre-training time.

Knowledge Graphs, Math
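
The "aggregating indirect reasoning paths" view can be illustrated as a weighted random walk over facts seen at pre-training time: the plausibility of an unseen link between two entities is the total probability of short walks connecting them. The graph encoding and uniform edge weights below are illustrative assumptions, not the paper's formulation.

```python
from collections import defaultdict

def path_aggregation_score(edges, head, tail, max_hops=3):
    """edges: list of (src, dst) facts seen at 'pre-training' time."""
    adj = defaultdict(list)
    for s, d in edges:
        adj[s].append(d)
    frontier = {head: 1.0}        # probability mass of the walk, per node
    score = 0.0
    for _ in range(max_hops):
        nxt = defaultdict(float)
        for node, mass in frontier.items():
            outs = adj[node]
            for d in outs:
                nxt[d] += mass / len(outs)   # uniform over outgoing edges
        score += nxt.get(tail, 0.0)          # paths of this length reaching tail
        frontier = nxt
    return score

edges = [("socrates", "man"), ("man", "mortal"), ("man", "animal")]
print(path_aggregation_score(edges, "socrates", "mortal"))  # 0.5 > 0
```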

Weak-to-Strong Jailbreaking on Large Language Models

1 code implementation 30 Jan 2024 Xuandong Zhao, Xianjun Yang, Tianyu Pang, Chao Du, Lei LI, Yu-Xiang Wang, William Yang Wang

In this paper, we propose the weak-to-strong jailbreaking attack, an efficient method to attack aligned LLMs to produce harmful text.

Position: AI/ML Influencers Have a Place in the Academic Process

no code implementations 24 Jan 2024 Iain Xie Weissburg, Mehir Arora, Xinyi Wang, Liangming Pan, William Yang Wang

As the number of accepted papers at AI and ML conferences reaches into the thousands, it has become unclear how researchers access and read research publications.

Causal Inference, Diversity +1

Efficient Online Data Mixing For Language Model Pre-Training

1 code implementation 5 Dec 2023 Alon Albalak, Liangming Pan, Colin Raffel, William Yang Wang

The data used to pretrain large language models has a decisive impact on a model's downstream performance, which has led to a large body of work on data selection methods that aim to automatically determine the most suitable data to use for pretraining.

Language Modeling, Language Modelling +1
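
Online data mixing of this kind is naturally framed as bandit learning over data domains; a bare-bones sketch in that spirit follows. The reward (per-batch training loss) and the EXP3-style importance-weighted update are simplified assumptions, not the paper's exact formulation.

```python
import math
import random

class OnlineMixer:
    """Maintain sampling weights over data domains; upweight domains whose
    batches still yield high training loss (more left to learn)."""

    def __init__(self, n_domains, lr=0.1):
        self.w = [0.0] * n_domains
        self.lr = lr

    def probs(self):
        m = max(self.w)
        exp = [math.exp(v - m) for v in self.w]     # stable softmax
        z = sum(exp)
        return [e / z for e in exp]

    def sample_domain(self):
        return random.choices(range(len(self.w)), weights=self.probs())[0]

    def update(self, domain, train_loss):
        p = self.probs()[domain]
        self.w[domain] += self.lr * train_loss / p  # importance-weighted reward
```

A training loop would call `sample_domain()` to pick where the next batch comes from, then `update()` with that batch's loss.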

Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?

1 code implementation 29 Nov 2023 Xiujun Li, Yujie Lu, Zhe Gan, Jianfeng Gao, William Yang Wang, Yejin Choi

Recent multimodal large language models (MLLMs) have shown promising instruction following capabilities on vision-language tasks.

In-Context Learning, MM-Vet +1

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

no code implementations 2 Nov 2023 Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.

Image Generation, Image to text

A Survey on Detection of LLMs-Generated Content

1 code implementation 24 Oct 2023 Xianjun Yang, Liangming Pan, Xuandong Zhao, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng

The burgeoning capabilities of advanced large language models (LLMs) such as ChatGPT have led to an increase in synthetic content generation with implications across a variety of sectors, including media, cybersecurity, public discourse, and education.

Survey

ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models

1 code implementation 14 Oct 2023 Alex Mei, Sharon Levy, William Yang Wang

As large language models are integrated into society, robustness toward a suite of prompts is increasingly important to maintain reliability in a high-variance environment. Robustness evaluations must comprehensively encapsulate the various settings in which a user may invoke an intelligent system.

Red Teaming

Empowering Psychotherapy with Large Language Models: Cognitive Distortion Detection through Diagnosis of Thought Prompting

1 code implementation 11 Oct 2023 Zhiyu Chen, Yujie Lu, William Yang Wang

Mental illness remains one of the most critical public health issues of our time, due to the severe scarcity of professionals and the limited accessibility of their services.

Guiding Language Model Reasoning with Planning Tokens

no code implementations 9 Oct 2023 Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters.

Language Modeling, Language Modelling +1
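
The data-side half of this scheme is easy to picture: each chain-of-thought step gets a special planning token prepended before fine-tuning, and the embeddings of those tokens are learned. The keyword heuristic used to pick a plan below is a deliberately trivial stand-in for the paper's procedure for inferring plans.

```python
# Hypothetical augmentation: tag each reasoning step with a planning token.
PLAN_TOKENS = {"add": "<plan_add>", "subtract": "<plan_sub>", "other": "<plan_gen>"}

def add_planning_tokens(cot_steps):
    tagged = []
    for step in cot_steps:
        kind = "add" if "+" in step else "subtract" if "-" in step else "other"
        tagged.append(f"{PLAN_TOKENS[kind]} {step}")
    return " ".join(tagged)

print(add_planning_tokens(["3 + 4 = 7", "7 - 2 = 5"]))
# <plan_add> 3 + 4 = 7 <plan_sub> 7 - 2 = 5
```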

Zero-Shot Detection of Machine-Generated Codes

1 code implementation 8 Oct 2023 Xianjun Yang, Kexun Zhang, Haifeng Chen, Linda Petzold, William Yang Wang, Wei Cheng

We then modify the previous zero-shot text detection method, DetectGPT (Mitchell et al., 2023) by utilizing a surrogate white-box model to estimate the probability of the rightmost tokens, allowing us to identify code snippets generated by language models.

Language Modelling, Text Detection

Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models

no code implementations 4 Oct 2023 Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin

This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.

Safety Alignment

Guiding Instruction-based Image Editing via Multimodal Large Language Models

2 code implementations 29 Sep 2023 Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan

Extensive experimental results demonstrate that expressive instructions are crucial to instruction-based image editing, and our MGIE can lead to a notable improvement in automatic metrics and human evaluation while maintaining competitive inference efficiency.

Image Manipulation, Response Generation

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

1 code implementation 12 Jul 2023 Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action.

Decision Making, Natural Language Understanding +1

Multilingual Conceptual Coverage in Text-to-Image Models

1 code implementation 2 Jun 2023 Michael Saxon, William Yang Wang

We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.

Benchmarking

DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text

1 code implementation 27 May 2023 Xianjun Yang, Wei Cheng, Yue Wu, Linda Petzold, William Yang Wang, Haifeng Chen

However, this progress also presents a significant challenge in detecting the origin of a given text, and current research on detection methods lags behind the rapid evolution of LLMs.
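Divergent n-gram analysis can be sketched compactly: truncate the text, have the candidate model re-complete the prefix several times, and score the overlap between the regenerations and the original continuation. The `generate` callable, the whitespace tokenization, and the constants below are illustrative assumptions, not DNA-GPT's exact configuration.

```python
def ngrams(text, n=4):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def dna_gpt_score(text, generate, truncate_ratio=0.5, k=8, n=4):
    """generate: assumed callable that re-completes a prefix with the model."""
    toks = text.split()
    cut = int(len(toks) * truncate_ratio)
    prefix, original_tail = " ".join(toks[:cut]), " ".join(toks[cut:])
    ref = ngrams(original_tail, n)
    overlaps = []
    for _ in range(k):
        regen = generate(prefix)                 # model's own re-completion
        overlaps.append(len(ref & ngrams(regen, n)) / max(len(ref), 1))
    return sum(overlaps) / k                     # higher => likely model-written
```
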

LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

1 code implementation NeurIPS 2023 Weixi Feng, Wanrong Zhu, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang

When combined with a downstream image generation model, LayoutGPT outperforms text-to-image models/systems by 20-40% and achieves comparable performance as human users in designing visual layouts for numerical and spatial correctness.

Indoor Scene Synthesis, Text-to-Image Generation

ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers

1 code implementation NeurIPS 2023 Kexun Zhang, Danqing Wang, Jingtao Xia, William Yang Wang, Lei LI

To address these challenges, we propose ALGO, a framework that synthesizes Algorithmic programs with LLM-Generated Oracles to guide the generation and verify their correctness.

Code Generation
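
The oracle-verification loop reduces to a simple harness: an exhaustive but obviously-correct reference program (the LLM-generated oracle) arbitrates candidate programs on random inputs. The toy oracle, candidate, and input generator below are stand-ins for LLM outputs, not ALGO's actual components.

```python
import random

def verify_against_oracle(candidate, oracle, gen_input, trials=100):
    """Accept a candidate only if it agrees with the oracle on random inputs."""
    for _ in range(trials):
        x = gen_input()
        if candidate(x) != oracle(x):
            return False
    return True

# toy instance: the oracle is trivially correct brute force,
# the candidate is the (supposedly efficient) program under test
oracle = lambda xs: sorted(xs)
candidate = lambda xs: sorted(xs, reverse=True)[::-1]
gen_input = lambda: [random.randint(0, 9) for _ in range(5)]
print(verify_against_oracle(candidate, oracle, gen_input))  # True
```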

On the Risk of Misinformation Pollution with Large Language Models

1 code implementation 23 May 2023 Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, William Yang Wang

In this paper, we comprehensively investigate the potential misuse of modern Large Language Models (LLMs) for generating credible-sounding misinformation and its subsequent impact on information-intensive applications, particularly Open-Domain Question Answering (ODQA) systems.

Misinformation, Open-Domain Question Answering

EDIS: Entity-Driven Image Search over Multimodal Web Content

1 code implementation 23 May 2023 SiQi Liu, Weixi Feng, Tsu-Jui Fu, Wenhu Chen, William Yang Wang

Making image retrieval methods practical for real-world search applications requires significant progress in dataset scales, entity comprehension, and multimodal information fusion.

Image Retrieval, Retrieval

Let's Think Frame by Frame with VIP: A Video Infilling and Prediction Dataset for Evaluating Video Chain-of-Thought

1 code implementation 23 May 2023 Vaishnavi Himakunthala, Andy Ouyang, Daniel Rose, Ryan He, Alex Mei, Yujie Lu, Chinmay Sonar, Michael Saxon, William Yang Wang

Despite exciting recent results showing vision-language systems' capacity to reason about images using natural language, their capacity for video reasoning remains under-explored.

Descriptive, Video Prediction

INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

2 code implementations 23 May 2023 Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei LI

By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA, producing both a score for generated text and a human readable diagnostic report.

Text Generation

Fact-Checking Complex Claims with Program-Guided Reasoning

1 code implementation 22 May 2023 Liangming Pan, Xiaobao Wu, Xinyuan Lu, Anh Tuan Luu, William Yang Wang, Min-Yen Kan, Preslav Nakov

Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning.

Fact Checking, In-Context Learning

Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning

1 code implementation 20 May 2023 Liangming Pan, Alon Albalak, Xinyi Wang, William Yang Wang

We also introduce a self-refinement module, which utilizes the symbolic solver's error messages to revise symbolic formalizations.

Logical Reasoning
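
The self-refinement module described here is essentially a retry loop around the symbolic solver. A minimal sketch, assuming hypothetical `llm_formalize` and `solver` interfaces (not Logic-LM's actual API):

```python
def solve_with_refinement(problem, llm_formalize, solver, max_rounds=3):
    """llm_formalize(problem, feedback) -> symbolic program (str);
    solver(program) -> (ok: bool, result_or_error). Assumed interfaces."""
    feedback = None
    for _ in range(max_rounds):
        program = llm_formalize(problem, feedback)
        ok, result_or_error = solver(program)
        if ok:
            return result_or_error      # solver's verdict on the claim
        feedback = result_or_error      # error message guides the revision
    return None                         # caller may fall back, e.g., to direct CoT
```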

Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation

no code implementations 18 May 2023 Wanrong Zhu, Xinyi Wang, Yujie Lu, Tsu-Jui Fu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

We conduct a series of experiments to compare the common edits made by humans and GPT-k, evaluate the performance of GPT-k in prompting T2I, and examine factors that may influence this process.

Text Generation, Text-to-Image Generation

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

1 code implementation NeurIPS 2023 Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang

Existing automatic evaluation on text-to-image synthesis can only provide an image-text matching score, without considering the object-level compositionality, which results in poor correlation with human judgments.

Attribute, Image Generation +2

Data Augmentation for Diverse Voice Conversion in Noisy Environments

no code implementations 18 May 2023 Avani Tanna, Michael Saxon, Amr El Abbadi, William Yang Wang

Voice conversion (VC) models have demonstrated impressive few-shot conversion quality on the clean, native speech populations they're trained on.

Data Augmentation, Decoder +2

Multimodal Procedural Planning via Dual Text-Image Prompting

1 code implementation 2 May 2023 Yujie Lu, Pan Lu, Zhiyu Chen, Wanrong Zhu, Xin Eric Wang, William Yang Wang

The key challenges of MPP are to ensure the informativeness, temporal coherence, and accuracy of plans across modalities.

Image to text, Informativeness +1

Users are the North Star for AI Transparency

no code implementations 9 Mar 2023 Alex Mei, Michael Saxon, Shiyu Chang, Zachary C. Lipton, William Yang Wang

We conduct a broad literature survey, identifying many clusters of similar conceptions of transparency, tying each back to our north star with analysis of how it furthers or hinders our ideal AI transparency goals.

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

1 code implementation NeurIPS 2023 Alon Albalak, Colin Raffel, William Yang Wang

In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization.

Few-Shot Learning

SWING: Balancing Coverage and Faithfulness for Dialogue Summarization

1 code implementation 25 Jan 2023 Kung-Hsiang Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown

Missing information is a common issue of dialogue summarization where some information in the reference summaries is not covered in the generated summaries.

Natural Language Inference

CausalDialogue: Modeling Utterance-level Causality in Conversations

1 code implementation 20 Dec 2022 Yi-Lin Tuan, Alon Albalak, Wenda Xu, Michael Saxon, Connor Pryor, Lise Getoor, William Yang Wang

Despite their widespread adoption, neural conversation models have yet to exhibit natural chat capabilities with humans.

Dialogue Generation, Diversity

Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks

1 code implementation 19 Dec 2022 Kaiser Sun, Peng Qi, Yuhao Zhang, Lan Liu, William Yang Wang, Zhiheng Huang

We show that, with consistent tokenization, the model performs better in both in-domain and out-of-domain datasets, with a notable average of +1.7 F2 gain when a BART model is trained on SQuAD and evaluated on 8 QA datasets.

Extractive Question-Answering, Hallucination +1
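
The consistency issue is easy to reproduce: for extractive tasks, a target span tokenized on its own can get different ids than the same span inside the context, because BPE merges differ with a leading space. A small demonstration, with the tokenizer checkpoint chosen only as an example:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("facebook/bart-base")
context = "The model was trained on SQuAD."
answer = "SQuAD"

standalone = tok(answer, add_special_tokens=False).input_ids
in_context = tok(" " + answer, add_special_tokens=False).input_ids  # as it appears in text
print(standalone, in_context)  # the id sequences can differ
```

Training on the in-context ids keeps the generation target aligned with what the encoder actually saw.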

Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI

1 code implementation 19 Dec 2022 Alex Mei, Sharon Levy, William Yang Wang

Users' physical safety is an increasing concern as the market for intelligent systems continues to grow, where unconstrained systems may recommend users dangerous actions that can lead to serious injury.

Attribute

SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes

1 code implementation 19 Dec 2022 Wenda Xu, Xian Qian, Mingxuan Wang, Lei LI, William Yang Wang

In this paper, we propose SESCORE2, a self-supervised approach for training a model-based metric for text generation evaluation.

Dialogue Generation, Machine Translation +2

Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations

no code implementations 17 Dec 2022 Jifan Chen, Yuhao Zhang, Lan Liu, Rui Dong, Xinchi Chen, Patrick Ng, William Yang Wang, Zhiheng Huang

There has been great progress in unifying various table-to-text tasks using a single encoder-decoder model trained via multi-task learning (Xie et al., 2022).

Decoder, Multi-Task Learning

Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis

1 code implementation 9 Dec 2022 Weixi Feng, Xuehai He, Tsu-Jui Fu, Varun Jampani, Arjun Akula, Pradyumna Narayana, Sugato Basu, Xin Eric Wang, William Yang Wang

In this work, we improve the compositional skills of T2I models, specifically more accurate attribute binding and better image compositions.

Attribute, Image Generation

Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

no code implementations 29 Nov 2022 Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang

Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.

D4RL, Offline RL +3

Bridging the Training-Inference Gap for Dense Phrase Retrieval

no code implementations 25 Oct 2022 Gyuwan Kim, Jinhyuk Lee, Barlas Oguz, Wenhan Xiong, Yizhe Zhang, Yashar Mehdad, William Yang Wang

Building dense retrievers requires a series of standard procedures, including training and validating neural models and creating indexes for efficient search.

Open-Domain Question Answering, Passage Retrieval +1

WikiWhy: Answering and Explaining Cause-and-Effect Questions

no code implementations 21 Oct 2022 Matthew Ho, Aditya Sharma, Justin Chang, Michael Saxon, Sharon Levy, Yujie Lu, William Yang Wang

As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging.

Question Answering

An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding

no code implementations 21 Oct 2022 Josiah Ross, Luke Yoffe, Alon Albalak, William Yang Wang

Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency.

Transfer Learning

CPL: Counterfactual Prompt Learning for Vision and Language Models

no code implementations 19 Oct 2022 Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang

Prompt tuning is a new few-shot transfer learning technique that only tunes the learnable prompt for pre-trained vision and language models such as CLIP.

counterfactual, Image-text Retrieval +1

SafeText: A Benchmark for Exploring Physical Safety in Language Models

no code implementations 18 Oct 2022 Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang

Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe.

Text Generation

ULN: Towards Underspecified Vision-and-Language Navigation

1 code implementation 18 Oct 2022 Weixi Feng, Tsu-Jui Fu, Yujie Lu, William Yang Wang

Vision-and-Language Navigation (VLN) is a task to guide an embodied agent moving to a target position using language instructions.

Vision and Language Navigation

Mitigating Covertly Unsafe Text within Natural Language Systems

no code implementations 17 Oct 2022 Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang

An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences.

ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

1 code implementation 7 Oct 2022 Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

With the recent advance in large pre-trained language models, researchers have achieved record performances in NLP tasks that mostly focus on language pattern matching.

Conversational Question Answering

Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning

1 code implementation NeurIPS 2023 Zih-Yun Chiu, Yi-Lin Tuan, William Yang Wang, Michael C. Yip

In this work, we present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.

Deep Reinforcement Learning, reinforcement-learning +1

Dynamic Latent Separation for Deep Learning

no code implementations 7 Oct 2022 Yi-Lin Tuan, Zih-Yun Chiu, William Yang Wang

A core problem in machine learning is to learn expressive latent variables for model prediction on complex data that involves multiple sub-components in a flexible and interpretable fashion.

Deep Learning, Diversity +1

Anticipating the Unseen Discrepancy for Vision and Language Navigation

no code implementations 10 Sep 2022 Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

In this paper, we propose an Unseen Discrepancy Anticipating Vision and Language Navigation (DAVIS) that learns to generalize to unseen environments via encouraging test-time visual consistency.

Data Augmentation, Decision Making +3

Causal Balancing for Domain Generalization

1 code implementation 10 Jun 2022 Xinyi Wang, Michael Saxon, Jiachen Li, Hongyang Zhang, Kun Zhang, William Yang Wang

While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations.

Domain Generalization

Neuro-Symbolic Procedural Planning with Commonsense Prompting

no code implementations 6 Jun 2022 Yujie Lu, Weixi Feng, Wanrong Zhu, Wenda Xu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Procedural planning aims to implement complex high-level goals by decomposition into sequential simpler low-level steps.

Graph Sampling

Towards Understanding Gender-Seniority Compound Bias in Natural Language Generation

1 code implementation LREC 2022 Samhita Honnavalli, Aesha Parekh, Lily Ou, Sophie Groenwold, Sharon Levy, Vicente Ordonez, William Yang Wang

Our results show that GPT-2 amplifies bias by considering women as junior and men as senior more often than the ground truth in both domains.

Text Generation

FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue

1 code implementation 12 May 2022 Alon Albalak, Yi-Lin Tuan, Pegah Jandaghi, Connor Pryor, Luke Yoffe, Deepak Ramachandran, Lise Getoor, Jay Pujara, William Yang Wang

Task transfer, transferring knowledge contained in related tasks, holds the promise of reducing the quantity of labeled data required to fine-tune language models.

Dialogue Understanding, Domain Adaptation +1

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data

no code implementations Findings (ACL) 2022 Kai Nakamura, Sharon Levy, Yi-Lin Tuan, Wenhu Chen, William Yang Wang

A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities.

Response Generation, Retrieval

Imagination-Augmented Natural Language Understanding

1 code implementation NAACL 2022 Yujie Lu, Wanrong Zhu, Xin Eric Wang, Miguel Eckstein, William Yang Wang

Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations.

Natural Language Understanding

End-to-end Dense Video Captioning as Sequence Generation

no code implementations COLING 2022 Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut

Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.

Ranked #4 on Dense Video Captioning on ViTT (CIDEr metric, using extra training data)

Dense Video Captioning, Descriptive

Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains

1 code implementation 26 Jan 2022 Alon Albalak, Sharon Levy, William Yang Wang

Open-retrieval question answering systems are generally trained and tested on large datasets in well-established domains.

Question Answering, Retrieval +1

Relational Graph Learning for Grounded Video Description Generation

no code implementations 2 Dec 2021 Wenqiao Zhang, Xin Eric Wang, Siliang Tang, Haizhou Shi, Haocheng Shi, Jun Xiao, Yueting Zhuang, William Yang Wang

Such a setting can help explain the decisions of captioning models and prevents the model from hallucinating object words in its description.

Graph Learning, Hallucination +3

VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling

1 code implementation 24 Nov 2021 Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu

Further, unlike previous studies that found pre-training tasks on video inputs (e. g., masked frame modeling) not very effective, we design a new pre-training task, Masked Visual-token Modeling (MVM), for better video modeling.

Question Answering, Retrieval +5

MIC: Model-agnostic Integrated Cross-channel Recommenders

no code implementations 22 Oct 2021 Yujie Lu, Ping Nie, Shengyu Zhang, Ming Zhao, Ruobing Xie, William Yang Wang, Yi Ren

However, existing work is primarily built upon pre-defined retrieval channels, including User-CF (U2U), Item-CF (I2I), and Embedding-based Retrieval (U2I), and thus only accesses the limited correlations between users and items that follow from partial information about their latent interactions.

Recommendation Systems, Retrieval +2

Attacking Open-domain Question Answering by Injecting Misinformation

1 code implementation 15 Oct 2021 Liangming Pan, Wenhu Chen, Min-Yen Kan, William Yang Wang

We curate both human-written and model-generated false documents that we inject into the evidence corpus of QA models and assess the impact on the performance of these systems.

Misinformation, Open-Domain Question Answering

Self-Supervised Knowledge Assimilation for Expert-Layman Text Style Transfer

1 code implementation 6 Oct 2021 Wenda Xu, Michael Saxon, Misha Sra, William Yang Wang

This is a particularly notable issue in the medical domain, where layman are often confused by medical text online.

Language Modelling, Self-Supervised Learning +2

A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space

1 code implementation EMNLP 2021 Alex Jones, William Yang Wang, Kyle Mahowald

We verify some of our linguistic findings by looking at the effect of morphological segmentation on English-Inuktitut alignment, in addition to examining the effect of word order agreement on isomorphism for 66 zero-shot language pairs from a different corpus.

Retrieval, Sentence

D-REX: Dialogue Relation Extraction with Explanations

1 code implementation NLP4ConvAI (ACL) 2022 Alon Albalak, Varun Embar, Yi-Lin Tuan, Lise Getoor, William Yang Wang

Existing research studies on cross-sentence relation extraction in long-form multi-party conversations aim to improve relation extraction without considering the explainability of such methods.

Dialog Relation Extraction, Relation +3

FinQA: A Dataset of Numerical Reasoning over Financial Data

1 code implementation EMNLP 2021 Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations.

Question Answering

Neural Stylistic Response Generation with Disentangled Latent Variables

no code implementations ACL 2021 Qingfu Zhu, Wei-Nan Zhang, Ting Liu, William Yang Wang

Generating open-domain conversational responses in the desired style usually suffers from the lack of p