no code implementations • 25 Apr 2025 • Yusen Zhang, Wenliang Zheng, Aashrith Madasu, Peng Shi, Ryo Kamoi, Hao Zhou, Zhuoyang Zou, Shu Zhao, Sarkar Snigdha Sarathi Das, Vipul Gupta, Xiaoxin Lu, Nan Zhang, Ranran Haoran Zhang, Avitej Iyer, Renze Lou, Wenpeng Yin, Rui Zhang
HRScene incorporates 25 real-world datasets and 2 synthetic diagnostic datasets with resolutions ranging from 1,024 $\times$ 1,024 to 35,503 $\times$ 26,627.
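Resolutions at that scale far exceed typical vision-language model input sizes, so evaluation pipelines generally tile the image first. A minimal sketch of that kind of tiling preprocessing (illustrative only, not HRScene's actual pipeline):

```python
from PIL import Image

# Allow PIL to open very large images (a 35,503 x 26,627 image would
# otherwise trip its decompression-bomb guard).
Image.MAX_IMAGE_PIXELS = None

def tile_image(path: str, tile: int = 1024):
    """Split a high-resolution image into tile x tile crops."""
    img = Image.open(path)
    w, h = img.size
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            yield img.crop((left, top, min(left + tile, w), min(top + tile, h)))
```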
no code implementations • 29 Oct 2024 • Renze Lou, Hanzi Xu, Sijia Wang, Jiangshu Du, Ryo Kamoi, Xiaoxin Lu, Jian Xie, Yuxuan Sun, Yusen Zhang, Jihyun Janice Ahn, Hongchao Fang, Zhuoyang Zou, Wenchao Ma, Xi Li, Kai Zhang, Congying Xia, Lifu Huang, Wenpeng Yin
Numerous studies have assessed the proficiency of AI systems, particularly large language models (LLMs), in facilitating everyday tasks such as email writing, question answering, and creative content generation.
no code implementations • 13 Aug 2024 • Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei LI, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong
For instance, a group of open-source SWE agents, with a maximum individual resolve rate of 27.3% on SWE-Bench Lite, can achieve a 34.3% resolve rate with DEI, a 25% relative improvement that beats most closed-source solutions.
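A quick sanity check on that figure, using only the two resolve rates quoted above:

```python
base, ensemble = 27.3, 34.3  # resolve rates (%) quoted above
relative_gain = (ensemble - base) / base * 100
print(f"{relative_gain:.1f}% relative improvement")  # ~25.6%, i.e. the quoted ~25%
```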
1 code implementation • 24 Jun 2024 • Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, Jing Gu, Haoran Li, Kangda Wei, ZiHao Wang, Lu Cheng, Surangika Ranathunga, Meng Fang, Jie Fu, Fei Liu, Ruihong Huang, Eduardo Blanco, Yixin Cao, Rui Zhang, Philip S. Yu, Wenpeng Yin
This study focuses on LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting with paper (meta-)reviewing and how recognizable that assistance is.
1 code implementation • 23 Jun 2024 • Hanzi Xu, Renze Lou, Jiangshu Du, Vahid Mahzoon, Elmira Talebianaraki, Zhuoan Zhou, Elizabeth Garrison, Slobodan Vucetic, Wenpeng Yin
We define this task as Classify-w/o-Gold and propose it as a new testbed for LLMs.
no code implementations • 10 Jun 2024 • Xi Li, Yusen Zhang, Renze Lou, Chen Wu, Jiaqi Wang
Large Language Models (LLMs), especially those accessed via APIs, have demonstrated impressive capabilities across various domains.
1 code implementation • 4 Apr 2024 • Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang
This work introduces ReaLMistake, the first error detection benchmark consisting of objective, realistic, and diverse errors made by LLMs.
2 code implementations • 2 Feb 2024 • Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su
Are these language agents capable of planning in more complex settings that are out of the reach of prior AI agents?
no code implementations • 31 Jan 2024 • Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin
Mathematical reasoning serves as a cornerstone for assessing the fundamental cognitive capabilities of human intelligence.
1 code implementation • 5 Jan 2024 • Lin Sun, Kai Zhang, Qingyuan Li, Renze Lou
Multimodal information extraction (MIE) has gained significant attention as the popularity of multimedia content increases.
no code implementations • 5 Dec 2023 • Renze Lou, Kai Zhang, Jian Xie, Yuxuan Sun, Janice Ahn, Hanzi Xu, Yu Su, Wenpeng Yin
In the realm of large language models (LLMs), enhancing instruction-following capability often involves curating expansive training data.
1 code implementation • 4 Aug 2023 • Renze Lou, Wenpeng Yin
This work proposes a challenging yet more realistic setting for zero-shot cross-task generalization: zero-shot instruction following, presuming the existence of a paragraph-style task definition while no demonstrations exist.
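In practice, such a setup amounts to prompting with the task definition alone, with no in-context demonstrations. A minimal sketch of what that looks like, where the prompt template and function name are illustrative rather than taken from the paper:

```python
def build_zero_shot_prompt(task_definition: str, test_input: str) -> str:
    """Compose a prompt from a paragraph-style task definition alone:
    no demonstrations (in-context examples) are included."""
    return (
        f"Task definition:\n{task_definition}\n\n"
        f"Input:\n{test_input}\n\n"
        "Output:"
    )

prompt = build_zero_shot_prompt(
    task_definition=(
        "Given a sentence, classify its sentiment as 'positive' or "
        "'negative'. Respond with exactly one of the two labels."
    ),
    test_input="The film was a complete waste of two hours.",
)
print(prompt)
```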
1 code implementation • 22 May 2023 • Jian Xie, Kai Zhang, Jiangjie Chen, Renze Lou, Yu Su
By providing external information to large language models (LLMs), tool augmentation (including retrieval augmentation) has emerged as a promising solution for addressing the limitations of LLMs' static parametric memory.
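A minimal sketch of the retrieval-augmentation pattern described here, assuming a toy in-memory corpus and a stub `llm_generate` standing in for any real chat-completion client (both are placeholders, not the paper's implementation):

```python
CORPUS = [
    "The Eiffel Tower was completed in 1889.",
    "Mount Everest is 8,848.86 meters tall.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Toy lexical retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    return sorted(
        CORPUS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def llm_generate(prompt: str) -> str:
    """Stub for an LLM call; echoes the prompt so the sketch runs offline."""
    return prompt

def answer_with_retrieval(query: str) -> str:
    """Prepend retrieved context so the model need not rely solely on its
    static parametric memory."""
    context = "\n".join(retrieve(query))
    return llm_generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

print(answer_with_retrieval("When was the Eiffel Tower completed?"))
```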
1 code implementation • 18 Mar 2023 • Renze Lou, Kai Zhang, Wenpeng Yin
This survey paper summarizes and provides insights into current research on instruction following, particularly by answering the following questions: (i) What is a task instruction, and what instruction types exist?
1 code implementation • 3 Mar 2023 • Xiaojie Gu, Renze Lou, Lin Sun, Shangxin Li
Conversational Causal Emotion Entailment (C2E2) is a task that aims at recognizing the causes corresponding to a target emotion in a conversation.
1 code implementation • 1 Jun 2022 • Yutong Wang, Renze Lou, Kai Zhang, MaoYan Chen, Yujiu Yang
To address these problems, in this work, we propose a novel learning framework named MORE (Metric learning-based Open Relation Extraction).
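The metric-learning component can be illustrated with a generic triplet objective over relation embeddings (a rough sketch of the idea only; MORE's full framework is more involved than this):

```python
import torch
import torch.nn.functional as F

def triplet_relation_loss(anchor, positive, negative, margin=1.0):
    """Pull same-relation instance embeddings together and push
    different-relation embeddings apart -- the generic metric-learning
    objective, not MORE's exact loss."""
    pos_dist = F.pairwise_distance(anchor, positive)
    neg_dist = F.pairwise_distance(anchor, negative)
    return F.relu(pos_dist - neg_dist + margin).mean()

# Toy usage with random 128-d "relation embeddings"
a, p, n = (torch.randn(4, 128) for _ in range(3))
print(triplet_relation_loss(a, p, n))
```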
no code implementations • EMNLP 2021 • Weicheng Ma, Renze Lou, Kai Zhang, Lili Wang, Soroush Vosoughi
Compared to AUTOSEM, a strong baseline method, GradTS improves the performance of MT-DNN with a bert-base-cased backbone model by 0.33% to 17.93% on 8 natural language understanding (NLU) tasks in the GLUE benchmarks.
no code implementations • ACL 2021 • Weicheng Ma, Kai Zhang, Renze Lou, Lili Wang, Soroush Vosoughi
Through extensive experiments, we show that (1) pruning a number of attention heads in a multi-lingual Transformer-based model has, in general, positive effects on its performance in cross-lingual and multi-lingual tasks and (2) the attention heads to be pruned can be ranked using gradients and identified with a few trial experiments.
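Together, the two findings suggest a simple recipe: score each head by gradient signal, then mask the lowest-ranked heads. A toy PyTorch fragment of that recipe, under the assumption of one gate scalar per head (gradient magnitude as importance captures the general idea; the paper's exact procedure differs in detail):

```python
import torch

# Toy setup: one "gate" scalar per attention head; gradients w.r.t. these
# gates serve as head-importance scores.
num_layers, num_heads = 12, 12
gates = torch.ones(num_layers, num_heads, requires_grad=True)

# Stand-in for a real forward pass: any scalar loss that depends on the gates.
loss = (gates * torch.randn(num_layers, num_heads)).sum()
loss.backward()

importance = gates.grad.abs().flatten()  # gradient magnitude as importance
k = 36                                   # e.g. prune the 36 lowest-ranked heads
prune_idx = importance.argsort()[:k]     # indices of the least important heads
mask = torch.ones_like(importance)
mask[prune_idx] = 0.0                    # zero out pruned heads
head_mask = mask.view(num_layers, num_heads)
print(f"kept {int(head_mask.sum())} of {num_layers * num_heads} heads")
```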