Search Results for author: Huan Sun

Found 68 papers, 51 papers with code

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

1 code implementation 17 Sep 2024 Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun

In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments.

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

1 code implementation 4 Sep 2024 Xiang Yue, Tianyu Zheng, Yuansheng Ni, YuBo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig

This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark.

Optical Character Recognition (OCR)

Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization

1 code implementation 23 May 2024 Boshi Wang, Xiang Yue, Yu Su, Huan Sun

The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs

1 code implementation 11 Apr 2024 Zeyi Liao, Huan Sun

Moreover, we utilize those successful suffixes as training data to learn a generative model, named AmpleGCG, which captures the distribution of adversarial suffixes given a harmful query and enables the rapid generation of hundreds of suffixes for any harmful queries in seconds.

Safety Alignment

Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents

no code implementations 5 Apr 2024 Harsh Kohli, Huan Sun

The rapid progress of large language models (LLMs) has seen them excel and frequently surpass human performance on standard benchmarks.

Multiple-choice Navigate

AttributionBench: How Hard is Automatic Attribution Evaluation?

1 code implementation 23 Feb 2024 Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun

Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence.

Binary Classification Language Modelling +1

A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models

1 code implementation 18 Feb 2024 Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun

Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy.

When is Tree Search Useful for LLM Planning? It Depends on the Discriminator

1 code implementation 16 Feb 2024 Ziru Chen, Michael White, Raymond Mooney, Ali Payani, Yu Su, Huan Sun

In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method.

Mathematical Reasoning Re-Ranking +2

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

1 code implementation 15 Feb 2024 Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun

There is a surprisingly large gap between the speed and scale of their development and deployment and our understanding of their safety risks.

LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset

1 code implementation 14 Feb 2024 Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, Huan Sun

Using SMolInstruct, we fine-tune a set of open-source LLMs, among which, we find that Mistral serves as the best base model for chemistry tasks.

Drug Discovery

eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

1 code implementation 13 Feb 2024 Bo Peng, Xinyi Ling, Ziru Chen, Huan Sun, Xia Ning

Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce.

Domain Generalization

GPT-4V(ision) is a Generalist Web Agent, if Grounded

1 code implementation 3 Jan 2024 Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su

The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering.

Image Captioning Question Answering +1

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

1 code implementation 15 Nov 2023 Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun

The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward.

Ethics Fairness +3

TableLlama: Towards Open Large Generalist Models for Tables

no code implementations 15 Nov 2023 Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun

Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

1 code implementation 11 Sep 2023 Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.

Math Mathematical Reasoning

AgentBench: Evaluating LLMs as Agents

1 code implementation 7 Aug 2023 Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang

We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting.

Decision Making Instruction Following

Biomedical Language Models are Robust to Sub-optimal Tokenization

1 code implementation 30 Jun 2023 Bernal Jiménez Gutiérrez, Huan Sun, Yu Su

As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise.

Entity Linking Language Modelling +4

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing

1 code implementation NeurIPS 2023 Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su

To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing.

text-guided-image-editing

Mind2Web: Towards a Generalist Agent for the Web

1 code implementation NeurIPS 2023 Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su

We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website.

Federated Learning for Semantic Parsing: Task Formulation, Evaluation Setup, New Algorithms

1 code implementation 26 May 2023 Tianshu Zhang, Changchang Liu, Wei-Han Lee, Yu Su, Huan Sun

By leveraging data from multiple clients, the FL paradigm can be especially beneficial for clients that have little training data to develop a data-hungry neural semantic parser on their own.

Federated Learning Text-To-SQL

Exploring Chain-of-Thought Style Prompting for Text-to-SQL

no code implementations 23 May 2023 Chang-You Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, Huan Sun

Thus, we systematically study how to enhance LLMs' reasoning ability through chain of thought (CoT) style prompting, including the original chain-of-thought prompting (Wei et al., 2022b) and least-to-most prompting (Zhou et al., 2023).

In-Context Learning SQL Parsing +1

Error Detection for Text-to-SQL Semantic Parsing

1 code implementation 23 May 2023 Shijie Chen, Ziru Chen, Huan Sun, Yu Su

Despite remarkable progress in text-to-SQL semantic parsing in recent years, the performance of existing parsers is still far from perfect.

Language Modelling Text-To-SQL

Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate

no code implementations 22 May 2023 Boshi Wang, Xiang Yue, Huan Sun

Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks.

Benchmarking Math +1

Text-to-SQL Error Correction with Language Models of Code

1 code implementation 22 May 2023 Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun

Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code.

SQL Parsing Text-To-SQL

Automatic Evaluation of Attribution by Large Language Models

1 code implementation 10 May 2023 Xiang Yue, Boshi Wang, Ziru Chen, Kai Zhang, Yu Su, Huan Sun

We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.

Fact Checking Language Modelling +3

Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

no code implementations 6 Mar 2023 Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks.

Transfer Learning

Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters

2 code implementations 20 Dec 2022 Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun

Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

1 code implementation 25 Oct 2022 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.

Language Modelling Text Generation

Bootstrapping a User-Centered Task-Oriented Dialogue System

no code implementations 11 Jul 2022 Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun

We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.

Data Augmentation Dialogue Management +2

$\mathsf{G^2Retro}$ as a Two-Step Graph Generative Models for Retrosynthesis Prediction

1 code implementation 10 Jun 2022 Ziqi Chen, Oluwatosin R. Ayinde, James R. Fuchs, Huan Sun, Xia Ning

It first predicts the reaction centers in the target molecules (products), identifies the synthons needed to assemble the products, and transforms these synthons into reactants.

Retrosynthesis Vocal Bursts Valence Prediction

Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again

1 code implementation 16 Mar 2022 Bernal Jiménez Gutiérrez, Nikolas McNeal, Clay Washington, You Chen, Lang Li, Huan Sun, Yu Su

In this paper, we present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction.

In-Context Learning Model Selection +5

Iteratively Prompt Pre-trained Language Models for Chain of Thought

1 code implementation 16 Mar 2022 Boshi Wang, Xiang Deng, Huan Sun

While Pre-trained Language Models (PLMs) internalize a great amount of world knowledge, they have been shown incapable of recalling this knowledge to solve tasks requiring complex & multi-step reasoning.

World Knowledge

Synthetic Question Value Estimation for Domain Adaptation of Question Answering

1 code implementation ACL 2022 Xiang Yue, Ziyu Yao, Huan Sun

Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.

Domain Adaptation Question Answering

DOM-LM: Learning Generalizable Representations for HTML Documents

1 code implementation 25 Jan 2022 Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun

We argue that the text and HTML structure together convey important semantics of the content and therefore warrant a special treatment for their representation learning.

Attribute Attribute Extraction +3

TopNet: Learning from Neural Topic Model to Generate Long Stories

no code implementations 14 Dec 2021 Yazheng Yang, Boyuan Pan, Deng Cai, Huan Sun

In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model).

Decoder Story Generation

Towards Transparent Interactive Semantic Parsing via Step-by-Step Correction

1 code implementation Findings (ACL) 2022 Lingbo Mo, Ashley Lewis, Huan Sun, Michael White

In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps.

Question Answering Semantic Parsing

ReasonBERT: Pre-trained to Reason with Distant Supervision

1 code implementation EMNLP 2021 Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun

We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts.

Extractive Question-Answering Question Answering +1

Differential Privacy for Text Analytics via Natural Text Sanitization

1 code implementation Findings (ACL) 2021 Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow

The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.

Language Modelling Privacy Preserving

Learning Structural Edits via Incremental Tree Transformations

1 code implementation ICLR 2021 Ziyu Yao, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig

To show the unique benefits of modeling tree edits directly, we further propose a novel edit encoder for learning to represent edits, as well as an imitation learning method that allows the editor to be more robust.

Imitation Learning

CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering

2 code implementations 30 Oct 2020 Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun

Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.

Domain Adaptation Question Answering +2

Structure-Grounded Pretraining for Text-to-SQL

no code implementations NAACL 2021 Xiang Deng, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson

Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set Spider-Realistic based on Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation.

Text-To-SQL

COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval

1 code implementation EMNLP 2021 Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun

For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.

16k Retrieval

Learning a Cost-Effective Annotation Policy for Question Answering

1 code implementation EMNLP 2020 Bernhard Kratzwald, Stefan Feuerriegel, Huan Sun

State-of-the-art question answering (QA) relies upon large amounts of training data for which labeling is time consuming and thus expensive.

Question Answering

Energy Efficiency Optimization in IRS-Enhanced mmWave Systems with Lens Antenna Array

no code implementations 2 Jul 2020 Yazheng Wang, Hancheng Lu, Dan Zhao, Huan Sun

To address this problem, we propose an intelligent reflect surface (IRS) enhanced multi-user mmWave communication system with lens antenna array.

Blocking

Joint Passive Beamforming and User Association Optimization for IRS-assisted mmWave Systems

no code implementations 2 Jul 2020 Dan Zhao, Hancheng Lu, Yazheng Wang, Huan Sun

Considering the impact of IRS on user association, we formulate a sum rate maximization problem by jointly optimizing the passive beamforming at IRS and user association, which is an intractable non-convex problem.

TURL: Table Understanding through Representation Learning

1 code implementation 26 Jun 2020 Xiang Deng, Huan Sun, Alyssa Lees, You Wu, Cong Yu

In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables.

Cell Entity Annotation Columns Property Annotation +3

Rationalizing Medical Relation Prediction from Corpus-level Statistics

1 code implementation ACL 2020 Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun

Nowadays, the interpretability of machine learning models is becoming increasingly important, especially in the medical domain.

Decision Making Relation

An Imitation Game for Learning Semantic Parsers from User Interaction

1 code implementation EMNLP 2020 Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su

Despite the widely successful applications, bootstrapping and fine-tuning semantic parsers are still a tedious process with challenges such as costly data annotation and privacy risks.

Imitation Learning Text-To-SQL

Practical Annotation Strategies for Question Answering Datasets

no code implementations 6 Mar 2020 Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel

Here, remarkably, annotating a stratified subset with only 1.2% of the original training set achieves 97.7% of the performance as if the complete dataset was annotated.

Question Answering

Easy-to-Hard: Leveraging Simple Questions for Complex Question Generation

no code implementations 5 Dec 2019 Jie Zhao, Xiang Deng, Huan Sun

This paper makes one of the first efforts toward automatically generating complex questions from knowledge graphs.

Data Augmentation Knowledge Graphs +2

An End-to-End Framework for Cold Question Routing in Community Question Answering Services

no code implementations 22 Nov 2019 Jiankai Sun, Jie Zhao, Huan Sun, Srinivasan Parthasarathy

Routing newly posted questions (a.k.a. cold questions) to potential answerers with the suitable expertise in Community Question Answering sites (CQAs) is an important and challenging task.

Community Question Answering Graph Embedding

Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study

2 code implementations IJCNLP 2019 Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih

As a promising paradigm, interactive semantic parsing has been shown to improve both semantic parsing accuracy and user confidence in the results.

Text-To-SQL

Automatic Table completion using Knowledge Base

no code implementations 20 Sep 2019 Bortik Bandyopadhyay, Xiang Deng, Goonmeet Bajaj, Huan Sun, Srinivasan Parthasarathy

In this work, we propose to resolve a new type of heterogeneous query viz: tabular query, which contains a natural language query description, column names of the desired table, and an example row.

Decision Making

Leveraging 2-hop Distant Supervision from Table Entity Pairs for Relation Extraction

1 code implementation IJCNLP 2019 Xiang Deng, Huan Sun

Given two entities, distant supervision exploits sentences that directly mention them for predicting their semantic relation.

Relation Relation Extraction

Reinforced Dynamic Reasoning for Conversational Question Generation

1 code implementation ACL 2019 Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun

This paper investigates a new task named Conversational Question Generation (CQG) which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs).

Decoder Question Answering +3

SurfCon: Synonym Discovery on Privacy-Aware Clinical Data

1 code implementation 21 Jun 2019 Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun

To solve the problem, we propose a new framework SurfCon that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information, and the global context information for synonym discovery.

Graph Embedding on Biomedical Networks: Methods, Applications, and Evaluations

4 code implementations 12 Jun 2019 Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun

Our experimental results demonstrate that the recent graph embedding methods achieve promising results and deserve more attention in the future biomedical graph analysis.

Graph Embedding Link Prediction +2

CoaCor: Code Annotation for Code Retrieval with Reinforcement Learning

1 code implementation 13 Mar 2019 Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun

In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called "CoaCor"), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others.

reinforcement-learning Reinforcement Learning +2

StaQC: A Systematically Mined Question-Code Dataset from Stack Overflow

1 code implementation 26 Mar 2018 Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, Huan Sun

In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them).

Retrieval

An End-to-End Deep Framework for Answer Triggering with a Novel Group-Level Objective

no code implementations EMNLP 2017 Jie Zhao, Yu Su, Ziyu Guan, Huan Sun

Given a question and a set of answer candidates, answer triggering determines whether the candidate set contains any correct answers.

Multiple Instance Learning Question Answering
