no code implementations • 10 Feb 2025 • Xuehang Guo, Xingyao Wang, Yangyi Chen, Sha Li, Chi Han, Manling Li, Heng Ji
Besides substantial performance gaps among agents (from Llama-3.1 agent <= 3.33% to Claude-3.5-Sonnet >= 28.18%), their consistently low collaboration willingness (<= 4.86%) suggests fundamental limitations of existing LLMs in CSE.
1 code implementation • 8 Jan 2025 • Run Luo, Ting-En Lin, Haonan Zhang, Yuchuan Wu, Xiong Liu, Min Yang, Yongbin Li, Longze Chen, Jiaming Li, Lei Zhang, Yangyi Chen, Hamid Alinejad-Rokny, Fei Huang
In the alignment phase, a pre-trained speech model is further trained on text-image tasks to generalize from vision to speech in a (near) zero-shot manner, outperforming models trained on tri-modal datasets.
no code implementations • 11 Oct 2024 • Yangyi Chen, Binxuan Huang, Yifan Gao, Zhengyang Wang, Jingfeng Yang, Heng Ji
Our two-stage approach consists of first estimating a function that maps computational resources (e.g., FLOPs) to the pre-training loss using a series of sampling models, followed by mapping the pre-training loss to downstream task performance after the critical "emergent phase".
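A minimal sketch of such a two-stage fit, with made-up measurements and illustrative functional forms (a power law for compute-to-loss, a logistic curve for loss-to-accuracy) that are assumptions for the example, not the paper's reported parameterization:

```python
# Illustrative two-stage fit: FLOPs -> pre-training loss -> task performance.
# All functional forms and numbers below are assumptions for the sketch.
import numpy as np
from scipy.optimize import curve_fit

def loss_from_flops(flops_e18, a, b, c):
    # Stage 1: compute (in units of 1e18 FLOPs) to pre-training loss, power law.
    return a * flops_e18 ** (-b) + c

def perf_from_loss(loss, k, loss0, top):
    # Stage 2: loss to downstream accuracy; the logistic shape stays near zero
    # before the "emergent phase" and rises once the loss drops past loss0.
    return top / (1.0 + np.exp(k * (loss - loss0)))

# Hypothetical measurements from a series of small sampling models.
flops_e18 = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
losses = np.array([3.2, 2.9, 2.6, 2.4, 2.2])
accs = np.array([0.02, 0.05, 0.18, 0.35, 0.52])

stage1, _ = curve_fit(loss_from_flops, flops_e18, losses, p0=[1.4, 0.3, 1.8])
stage2, _ = curve_fit(perf_from_loss, losses, accs, p0=[5.0, 2.3, 0.8])

# Extrapolate to a larger compute budget (1e21 FLOPs = 1000 in these units).
pred_loss = loss_from_flops(1000.0, *stage1)
pred_acc = perf_from_loss(pred_loss, *stage2)
print(f"predicted loss={pred_loss:.2f}, predicted accuracy={pred_acc:.2f}")
```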
1 code implementation • 8 Jul 2024 • Yangyi Chen, Xingyao Wang, Hao Peng, Heng Ji
We present SOLO, a single transformer for Scalable visiOn-Language mOdeling.
1 code implementation • 31 May 2024 • Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao
Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications.
2 code implementations • 1 Feb 2024 • Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji
LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools).
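A hedged sketch of the code-as-action alternative: the agent emits a Python block that is executed directly, so it can compose arbitrary operations in a single action. `query_llm` is a hypothetical placeholder for an actual chat-completion client, and sandboxing of the executed code is omitted.

```python
import io
import re
import contextlib

def query_llm(messages):  # hypothetical LLM call
    raise NotImplementedError("plug in your chat-completion client here")

def extract_code(text):
    """Pull the first ```python ...``` block out of the model's reply."""
    match = re.search(r"```python\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else None

def run_agent(task, max_turns=5):
    messages = [{"role": "user",
                 "content": f"Solve the task by writing Python code.\nTask: {task}"}]
    namespace = {}  # persistent interpreter state across turns
    for _ in range(max_turns):
        reply = query_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        code = extract_code(reply)
        if code is None:  # no code block -> treat the reply as the final answer
            return reply
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)  # NOTE: no sandboxing in this sketch
            observation = buffer.getvalue() or "(no output)"
        except Exception as exc:
            observation = f"Error: {exc}"  # errors are fed back so the agent can self-debug
        messages.append({"role": "user", "content": f"Execution result:\n{observation}"})
    return "Max turns reached"
```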
1 code implementation • 22 Nov 2023 • Yangyi Chen, Xingyao Wang, Manling Li, Derek Hoiem, Heng Ji
We adopt a weakly-supervised approach to directly generate visual event structures from captions for ViStruct training, capitalizing on abundant image-caption pairs from the web.
no code implementations • CVPR 2024 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences.
1 code implementation • 16 Nov 2023 • Hanning Zhang, Shizhe Diao, Yong Lin, Yi R. Fung, Qing Lian, Xingyao Wang, Yangyi Chen, Heng Ji, Tong Zhang
This approach is formalized by first identifying the disparity between the knowledge encompassed by the pre-trained parameters and the knowledge contained in the instruction tuning data.
1 code implementation • 16 Nov 2023 • Genglin Liu, Xingyao Wang, Lifan Yuan, Yangyi Chen, Hao Peng
Can large language models (LLMs) express their uncertainty in situations where they lack sufficient parametric knowledge to generate reasonable responses?
1 code implementation • 29 Sep 2023 • Lifan Yuan, Yangyi Chen, Xingyao Wang, Yi R. Fung, Hao Peng, Heng Ji
It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks.
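A rough illustration of the retrieval step, using TF-IDF similarity as a stand-in for whatever embedding model is actually used; the toolset and task below are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toolset: name -> natural-language description of the tool.
toolset = {
    "count_objects": "Count how many objects of a given category appear in an image.",
    "solve_equation": "Solve a symbolic equation and return the roots.",
    "lookup_table": "Look up a value in a tabular dataset by row and column keys.",
}

def retrieve_tools(task, toolset, k=2):
    names = list(toolset)
    corpus = [toolset[n] for n in names] + [task]
    vectors = TfidfVectorizer().fit_transform(corpus)
    # Rank tools by similarity between the task and each tool description.
    scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()
    ranked = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked[:k]]

print(retrieve_tools("How many dogs are in the photo?", toolset))
# The retrieved tool code would then be placed in the LLM's context for the task.
```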
1 code implementation • 19 Sep 2023 • Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
However, current evaluation protocols often emphasize benchmark performance with single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users.
1 code implementation • 8 Sep 2023 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.
1 code implementation • 21 Jul 2023 • Yangyi Chen, Xingyao Wang, Heng Ji
In this work, we consider the practical scenario in which we need to effectively utilize training samples to make PLMs both task-solvers and self-calibrators.
1 code implementation • 7 Jun 2023 • Lifan Yuan, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Fangyuan Zou, Xingyi Cheng, Heng Ji, Zhiyuan Liu, Maosong Sun
Then we introduce BOSS, a Benchmark suite for Out-of-distribution robustneSS evaluation covering 5 tasks and 20 datasets.
1 code implementation • 29 May 2023 • Yangyi Chen, Hongcheng Gao, Ganqu Cui, Lifan Yuan, Dehan Kong, Hanlu Wu, Ning Shi, Bo Yuan, Longtao Huang, Hui Xue, Zhiyuan Liu, Maosong Sun, Heng Ji
In our experiments, we conduct a robustness evaluation of RoBERTa models to demonstrate the effectiveness of our evaluation framework, and further justify the design of each component in the framework.
2 code implementations • 31 Oct 2022 • Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji
We observe a consistent change in calibration performance across six factors.
2 code implementations • 19 Oct 2022 • Yangyi Chen, Hongcheng Gao, Ganqu Cui, Fanchao Qi, Longtao Huang, Zhiyuan Liu, Maosong Sun
We discuss the deficiencies in previous work and suggest that research on Security-oriented adversarial NLP (SoadNLP) should: (1) evaluate methods on security tasks to demonstrate real-world concerns; (2) consider real-world attackers' goals, instead of developing impractical methods.
1 code implementation • 17 Jun 2022 • Ganqu Cui, Lifan Yuan, Bingxiang He, Yangyi Chen, Zhiyuan Liu, Maosong Sun
However, we highlight two issues in previous backdoor learning evaluations: (1) The differences between real-world scenarios (e.g., releasing poisoned datasets or models) are neglected, and we argue that each scenario has its own constraints and concerns and thus requires specific evaluation protocols; (2) The evaluation metrics only consider whether the attacks can flip the models' predictions on poisoned samples and retain performance on benign samples, but ignore that poisoned samples should also be stealthy and semantic-preserving.
1 code implementation • Findings (NAACL) 2022 • Lei Xu, Yangyi Chen, Ganqu Cui, Hongcheng Gao, Zhiyuan Liu
The prompt-based learning paradigm bridges the gap between pre-training and fine-tuning, and works effectively under the few-shot setting.
1 code implementation • 28 Oct 2021 • Lifan Yuan, Yichi Zhang, Yangyi Chen, Wei Wei
In this paper, we instantiate our framework with an attack algorithm named Textual Projected Gradient Descent (T-PGD).
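For intuition, here is a minimal PGD-on-embeddings sketch of the continuous optimization at the heart of a T-PGD-style attack. Mapping the perturbed embeddings back to discrete tokens (which the actual method handles with a masked language model) is omitted, and the victim `model` is assumed to accept an embedding tensor directly.

```python
import torch

def embedding_pgd(model, embeddings, label, epsilon=0.05, alpha=0.01, steps=20):
    """model: maps an embedding tensor (1, seq_len, dim) to class logits;
    label: (1,) LongTensor with the gold class to move away from."""
    delta = torch.zeros_like(embeddings, requires_grad=True)
    for _ in range(steps):
        logits = model(embeddings + delta)
        loss = torch.nn.functional.cross_entropy(logits, label)
        loss.backward()
        with torch.no_grad():
            # Ascend the loss and project the perturbation back into the L-inf ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    return (embeddings + delta).detach()
```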
1 code implementation • 15 Oct 2021 • Yangyi Chen, Fanchao Qi, Hongcheng Gao, Zhiyuan Liu, Maosong Sun
In this paper, we find two simple tricks that can make existing textual backdoor attacks much more harmful.
1 code implementation • EMNLP 2021 • Fanchao Qi, Yangyi Chen, Xurui Zhang, Mukai Li, Zhiyuan Liu, Maosong Sun
In this paper, we make the first attempt to conduct adversarial and backdoor attacks based on text style transfer, which is aimed at altering the style of a sentence while preserving its meaning.
1 code implementation • EMNLP 2021 • Yangyi Chen, Jin Su, Wei Wei
Furthermore, we propose a reinforcement-learning-based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm to further reduce the number of queries.
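A generic behavior-cloning sketch of this idea: the attack agent's policy is trained with supervised learning to imitate (state, action) pairs collected from the expert algorithm. The state encoder, action space size, and data format below are placeholders, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AttackPolicy(nn.Module):
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, state):
        return self.net(state)  # logits over candidate attack actions

def behavior_cloning(policy, expert_batches, epochs=3, lr=1e-4):
    """expert_batches yields (states, expert_actions) tensors from expert demonstrations."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, expert_actions in expert_batches:
            loss = loss_fn(policy(states), expert_actions)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```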
2 code implementations • ACL 2021 • Fanchao Qi, Mukai Li, Yangyi Chen, Zhengyan Zhang, Zhiyuan Liu, Yasheng Wang, Maosong Sun
As far as we know, almost all existing textual backdoor attack methods insert additional contents into normal samples as triggers, which causes the trigger-embedded samples to be detected and the backdoor attacks to be blocked without much effort.
1 code implementation • Findings (ACL) 2021 • Fanchao Qi, Yangyi Chen, Fengyu Wang, Zhiyuan Liu, Xiao Chen, Maosong Sun
We use this method to build an English SKB and a French SKB, and conduct comprehensive evaluations from both intrinsic and extrinsic perspectives.
2 code implementations • EMNLP 2021 • Fanchao Qi, Yangyi Chen, Mukai Li, Yuan Yao, Zhiyuan Liu, Maosong Sun
Nevertheless, there are few studies on defending against textual backdoor attacks.