no code implementations • NAACL (SUKI) 2022 • Lingbo Mo, Zhen Wang, Jie Zhao, Huan Sun
More fine-grained analyses on transfer behaviors reveal the types of transferred knowledge and transfer patterns.
1 code implementation • 17 Sep 2024 • Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun
In this work, we narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments.
1 code implementation • 4 Sep 2024 • Xiang Yue, Tianyu Zheng, Yuansheng Ni, YuBo Wang, Kai Zhang, Shengbang Tong, Yuxuan Sun, Botao Yu, Ge Zhang, Huan Sun, Yu Su, Wenhu Chen, Graham Neubig
This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark.
1 code implementation • 23 May 2024 • Boshi Wang, Xiang Yue, Yu Su, Huan Sun
The levels of generalization also vary across reasoning types: when faced with out-of-distribution examples, transformers fail to systematically generalize for composition but succeed for comparison.
1 code implementation • 11 Apr 2024 • Zeyi Liao, Huan Sun
Moreover, we utilize those successful suffixes as training data to learn a generative model, named AmpleGCG, which captures the distribution of adversarial suffixes given a harmful query and enables the rapid generation of hundreds of suffixes for any harmful queries in seconds.
no code implementations • 5 Apr 2024 • Harsh Kohli, Huan Sun
The rapid progress of large language models (LLMs) has seen them excel and frequently surpass human performance on standard benchmarks.
1 code implementation • 23 Feb 2024 • Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun
Modern generative search engines enhance the reliability of large language model (LLM) responses by providing cited evidence.
1 code implementation • 18 Feb 2024 • Jaylen Jones, Lingbo Mo, Eric Fosler-Lussier, Huan Sun
Counter narratives - informed responses to hate speech contexts designed to refute hateful claims and de-escalate encounters - have emerged as an effective hate speech intervention strategy.
1 code implementation • 16 Feb 2024 • Ziru Chen, Michael White, Raymond Mooney, Ali Payani, Yu Su, Huan Sun
In this paper, we examine how large language models (LLMs) solve multi-step problems under a language agent framework with three components: a generator, a discriminator, and a planning method.
1 code implementation • 15 Feb 2024 • Lingbo Mo, Zeyi Liao, Boyuan Zheng, Yu Su, Chaowei Xiao, Huan Sun
There is a surprisingly large gap between the speed and scale of their development and deployment and our understanding of their safety risks.
1 code implementation • 14 Feb 2024 • Botao Yu, Frazier N. Baker, Ziqi Chen, Xia Ning, Huan Sun
Using SMolInstruct, we fine-tune a set of open-source LLMs, among which, we find that Mistral serves as the best base model for chemistry tasks.
1 code implementation • 13 Feb 2024 • Bo Peng, Xinyi Ling, Ziru Chen, Huan Sun, Xia Ning
Both the ECInstruct dataset and the eCeLLM models show great potential in empowering versatile and effective LLMs for e-commerce.
1 code implementation • 3 Jan 2024 • Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
The recent development of large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering.
4 code implementations • CVPR 2024 • Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.
1 code implementation • 15 Nov 2023 • Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
The rapid progress in open-source Large Language Models (LLMs) is significantly driving AI development forward.
no code implementations • 15 Nov 2023 • Tianshu Zhang, Xiang Yue, Yifei Li, Huan Sun
Towards that end, we construct TableInstruct, a new dataset with a variety of realistic tables and tasks, for instruction tuning and evaluating LLMs.
1 code implementation • 11 Sep 2023 • Xiang Yue, Xingwei Qu, Ge Zhang, Yao Fu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen
The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset.
1 code implementation • 7 Aug 2023 • Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, Shudan Zhang, Xiang Deng, Aohan Zeng, Zhengxiao Du, Chenhui Zhang, Sheng Shen, Tianjun Zhang, Yu Su, Huan Sun, Minlie Huang, Yuxiao Dong, Jie Tang
We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting.
1 code implementation • 29 Jul 2023 • Lingbo Mo, Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Sunit Singh, Samuel Stevens, Chang-You Tai, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We introduce TacoBot, a user-centered task-oriented digital assistant designed to guide users through complex real-world tasks with multiple steps.
1 code implementation • 30 Jun 2023 • Bernal Jiménez Gutiérrez, Huan Sun, Yu Su
As opposed to general English, many concepts in biomedical terminology have been designed in recent history by biomedical professionals with the goal of being precise and concise.
1 code implementation • NeurIPS 2023 • Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, Yu Su
To address this issue, we introduce MagicBrush (https://osu-nlp-group.github.io/MagicBrush/), the first large-scale, manually annotated dataset for instruction-guided real image editing that covers diverse scenarios: single-turn, multi-turn, mask-provided, and mask-free editing.
1 code implementation • NeurIPS 2023 • Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, Yu Su
We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website.
1 code implementation • 26 May 2023 • Tianshu Zhang, Changchang Liu, Wei-Han Lee, Yu Su, Huan Sun
By leveraging data from multiple clients, the FL paradigm can be especially beneficial for clients that have little training data to develop a data-hungry neural semantic parser on their own.
no code implementations • 23 May 2023 • Chang-You Tai, Ziru Chen, Tianshu Zhang, Xiang Deng, Huan Sun
Thus, we systematically study how to enhance LLMs' reasoning ability through chain of thought (CoT) style prompting, including the original chain-of-thought prompting (Wei et al., 2022b) and least-to-most prompting (Zhou et al., 2023).
1 code implementation • 23 May 2023 • Shijie Chen, Ziru Chen, Huan Sun, Yu Su
Despite remarkable progress in text-to-SQL semantic parsing in recent years, the performance of existing parsers is still far from perfect.
no code implementations • 22 May 2023 • Boshi Wang, Xiang Yue, Huan Sun
Large language models (LLMs) such as ChatGPT and GPT-4 have shown impressive performance in complex reasoning tasks.
1 code implementation • 22 May 2023 • Ziru Chen, Shijie Chen, Michael White, Raymond Mooney, Ali Payani, Jayanth Srinivasa, Yu Su, Huan Sun
Thus, we propose a novel representation for SQL queries and their edits that adheres more closely to the pre-training corpora of language models of code.
1 code implementation • 10 May 2023 • Xiang Yue, Boshi Wang, Ziru Chen, Kai Zhang, Yu Su, Huan Sun
We manually curate a set of test examples covering 12 domains from a generative search engine, New Bing.
no code implementations • 6 Mar 2023 • Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim
Prompt tuning, in which a base pretrained model is adapted to each task via conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks.
2 code implementations • 20 Dec 2022 • Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs).
1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.
no code implementations • 11 Jul 2022 • Shijie Chen, Ziru Chen, Xiang Deng, Ashley Lewis, Lingbo Mo, Samuel Stevens, Zhen Wang, Xiang Yue, Tianshu Zhang, Yu Su, Huan Sun
We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks.
1 code implementation • 10 Jun 2022 • Ziqi Chen, Oluwatosin R. Ayinde, James R. Fuchs, Huan Sun, Xia Ning
It first predicts the reaction centers in the target molecules (products), identifies the synthons needed to assemble the products, and transforms these synthons into reactants.
1 code implementation • 16 Mar 2022 • Bernal Jiménez Gutiérrez, Nikolas McNeal, Clay Washington, You Chen, Lang Li, Huan Sun, Yu Su
In this paper, we present the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with fine-tuning smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction.
1 code implementation • 16 Mar 2022 • Boshi Wang, Xiang Deng, Huan Sun
While Pre-trained Language Models (PLMs) internalize a great amount of world knowledge, they have been shown incapable of recalling this knowledge to solve tasks requiring complex and multi-step reasoning.
1 code implementation • ACL 2022 • Xiang Yue, Ziyu Yao, Huan Sun
Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models.
1 code implementation • 25 Jan 2022 • Xiang Deng, Prashant Shiralkar, Colin Lockard, Binxuan Huang, Huan Sun
We argue that the text and HTML structure together convey important semantics of the content and therefore warrant a special treatment for their representation learning.
Ranked #2 on Attribute Extraction on SWDE
no code implementations • 14 Dec 2021 • Yazheng Yang, Boyuan Pan, Deng Cai, Huan Sun
In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model).
1 code implementation • Findings (ACL) 2022 • Lingbo Mo, Ashley Lewis, Huan Sun, Michael White
In this work, we investigate an interactive semantic parsing framework that explains the predicted logical form step by step in natural language and enables the user to make corrections through natural-language feedback for individual steps.
1 code implementation • EMNLP 2021 • Xiang Deng, Yu Su, Alyssa Lees, You Wu, Cong Yu, Huan Sun
We present ReasonBert, a pre-training method that augments language models with the ability to reason over long-range relations and multiple, possibly hybrid contexts.
Ranked #1 on Semantic Parsing on GraphQuestions
1 code implementation • Findings (ACL) 2021 • Xiang Yue, Minxin Du, Tianhao Wang, Yaliang Li, Huan Sun, Sherman S. M. Chow
The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.
1 code implementation • ICLR 2021 • Ziyu Yao, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig
To show the unique benefits of modeling tree edits directly, we further propose a novel edit encoder for learning to represent edits, as well as an imitation learning method that allows the editor to be more robust.
2 code implementations • 30 Oct 2020 • Xiang Yue, Xinliang Frederick Zhang, Ziyu Yao, Simon Lin, Huan Sun
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts.
no code implementations • NAACL 2021 • Xiang Deng, Ahmed Hassan Awadallah, Christopher Meek, Oleksandr Polozov, Huan Sun, Matthew Richardson
Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set Spider-Realistic based on Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation.
1 code implementation • EMNLP 2021 • Xinliang Frederick Zhang, Heming Sun, Xiang Yue, Simon Lin, Huan Sun
For evaluation, we introduce Query Bank and Relevance Set, where the former contains 1,236 human-paraphrased queries while the latter contains ~32 human-annotated FAQ items for each query.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jie Zhao, Huan Sun
Code retrieval is a key task aiming to match natural and programming languages.
1 code implementation • EMNLP 2020 • Bernhard Kratzwald, Stefan Feuerriegel, Huan Sun
State-of-the-art question answering (QA) relies upon large amounts of training data for which labeling is time-consuming and thus expensive.
no code implementations • 2 Jul 2020 • Yazheng Wang, Hancheng Lu, Dan Zhao, Huan Sun
To address this problem, we propose an intelligent reflecting surface (IRS) enhanced multi-user mmWave communication system with a lens antenna array.
no code implementations • 2 Jul 2020 • Dan Zhao, Hancheng Lu, Yazheng Wang, Huan Sun
Considering the impact of IRS on user association, we formulate a sum rate maximization problem by jointly optimizing the passive beamforming at IRS and user association, which is an intractable non-convex problem.
1 code implementation • 26 Jun 2020 • Xiang Deng, Huan Sun, Alyssa Lees, You Wu, Cong Yu
In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables.
Ranked #1 on Column Type Annotation on WikipediaGS-CTA
1 code implementation • ACL 2020 • Zhen Wang, Jennifer Lee, Simon Lin, Huan Sun
Nowadays, the interpretability of machine learning models is becoming increasingly important, especially in the medical domain.
1 code implementation • EMNLP 2020 • Ziyu Yao, Yiqi Tang, Wen-tau Yih, Huan Sun, Yu Su
Despite the widely successful applications, bootstrapping and fine-tuning semantic parsers are still a tedious process with challenges such as costly data annotation and privacy risks.
1 code implementation • ACL 2020 • Xiang Yue, Bernal Jimenez Gutierrez, Huan Sun
In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task.
no code implementations • 6 Mar 2020 • Bernhard Kratzwald, Xiang Yue, Huan Sun, Stefan Feuerriegel
Here, remarkably, annotating a stratified subset with only 1.2% of the original training set achieves 97.7% of the performance as if the complete dataset was annotated.
no code implementations • 5 Dec 2019 • Jie Zhao, Xiang Deng, Huan Sun
This paper makes one of the first efforts toward automatically generating complex questions from knowledge graphs.
no code implementations • 22 Nov 2019 • Jiankai Sun, Jie Zhao, Huan Sun, Srinivasan Parthasarathy
Routing newly posted questions (a.k.a. cold questions) to potential answerers with suitable expertise in Community Question Answering sites (CQAs) is an important and challenging task.
2 code implementations • IJCNLP 2019 • Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih
As a promising paradigm, interactive semantic parsing has been shown to improve both semantic parsing accuracy and user confidence in the results.
no code implementations • 20 Sep 2019 • Bortik Bandyopadhyay, Xiang Deng, Goonmeet Bajaj, Huan Sun, Srinivasan Parthasarathy
In this work, we propose to resolve a new type of heterogeneous query, viz. the tabular query, which contains a natural language query description, column names of the desired table, and an example row.
1 code implementation • IJCNLP 2019 • Xiang Deng, Huan Sun
Given two entities, distant supervision exploits sentences that directly mention them for predicting their semantic relation.
1 code implementation • ACL 2019 • Boyuan Pan, Hao Li, Ziyu Yao, Deng Cai, Huan Sun
This paper investigates a new task named Conversational Question Generation (CQG), which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs).
1 code implementation • 21 Jun 2019 • Zhen Wang, Xiang Yue, Soheil Moosavinasab, Yungui Huang, Simon Lin, Huan Sun
To solve the problem, we propose a new framework, SurfCon, that leverages two important types of information in the privacy-aware clinical data, i.e., the surface form information and the global context information, for synonym discovery.
4 code implementations • 12 Jun 2019 • Xiang Yue, Zhen Wang, Jingong Huang, Srinivasan Parthasarathy, Soheil Moosavinasab, Yungui Huang, Simon M. Lin, Wen Zhang, Ping Zhang, Huan Sun
Our experimental results demonstrate that recent graph embedding methods achieve promising results and deserve more attention in future biomedical graph analysis.
1 code implementation • 13 Mar 2019 • Ziyu Yao, Jayavardhan Reddy Peddamail, Huan Sun
In this work, we investigate a novel perspective of Code annotation for Code retrieval (hence called 'CoaCor'), where a code annotation model is trained to generate a natural language annotation that can represent the semantic meaning of a given code snippet and can be leveraged by a code retrieval model to better distinguish relevant code snippets from others.
1 code implementation • 21 Aug 2018 • Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, Huan Sun
Given a text description, most existing semantic parsers synthesize a program in one shot.
1 code implementation • 26 Mar 2018 • Ziyu Yao, Daniel S. Weld, Wei-Peng Chen, Huan Sun
In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them).
no code implementations • EMNLP 2017 • Jie Zhao, Yu Su, Ziyu Guan, Huan Sun
Given a question and a set of answer candidates, answer triggering determines whether the candidate set contains any correct answers.
2 code implementations • NAACL 2018 • Yu Su, Honglei Liu, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan
We study the problem of textual relation embedding with distant supervision.