1 code implementation • 17 Apr 2025 • Mehmet Hamza Erol, Batu El, Mirac Suzgun, Mert Yuksekgonul, James Zou
We find that innovations in lightweight, large, and reasoning models have been essential for pushing the frontier in basic quantitative, knowledge-intensive, and complex quantitative tasks, respectively.
1 code implementation • 13 Apr 2025 • Nitya Thakkar, Mert Yuksekgonul, Jake Silberg, Animesh Garg, Nanyun Peng, Fei Sha, Rose Yu, Carl Vondrick, James Zou
The results show that 27% of reviewers who received feedback updated their reviews, and over 12,000 feedback suggestions from the agent were incorporated by those reviewers.
1 code implementation • 10 Apr 2025 • Mirac Suzgun, Mert Yuksekgonul, Federico Bianchi, Dan Jurafsky, James Zou
Overall, our findings present DC as a promising approach for augmenting LMs with persistent memory, bridging the divide between isolated inference events and the cumulative, experience-driven learning characteristic of human cognition.
no code implementations • 24 Feb 2025 • Bariscan Kurtkaya, Fatih Dinc, Mert Yuksekgonul, Marta Blanco-Pozo, Ege Cirakman, Mark Schnitzer, Yucel Yemez, Hidenori Tanaka, Peng Yuan, Nina Miolane
Short-term memory is essential for cognitive processing, yet our understanding of its neural mechanisms remains unclear.
1 code implementation • 7 Feb 2025 • Wanjia Zhao, Mert Yuksekgonul, Shirley Wu, James Zou
A key challenge in optimizing multi-agent systems is acquiring suitable training data for specialized agents.
no code implementations • 4 Jan 2025 • Fatih Dinc, Ege Cirakman, Yiqi Jiang, Mert Yuksekgonul, Mark J. Schnitzer, Hidenori Tanaka
Abrupt learning is commonly observed in neural networks, where long plateaus in network performance are followed by rapid convergence to a desirable solution.
2 code implementations • 11 Jun 2024 • Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, Zhi Huang, Carlos Guestrin, James Zou
Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from 51% to 55%, yields a 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity.
Ranked #5 on GPQA
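The TextGrad entry above centers on treating natural-language feedback from an LLM as a "gradient" that drives iterative text optimization. A minimal, self-contained sketch of that loop is below; deterministic stub functions stand in for the LLM critic and rewriter, and the names (`stub_critic`, `stub_optimizer`, `textual_gradient_descent`) are illustrative, not the actual TextGrad API.

```python
# Sketch of the textual-gradient loop: a critic produces natural-language
# feedback on a candidate text (the "backward" pass), and an optimizer step
# rewrites the text using that feedback. Stubs replace real LLM calls.

def stub_critic(text: str) -> str:
    """Stand-in for an LLM critic that returns textual feedback."""
    if "concise" not in text:
        return "Make the answer concise."
    return "No further feedback."

def stub_optimizer(text: str, feedback: str) -> str:
    """Stand-in for an LLM that rewrites text to apply the feedback."""
    if feedback == "Make the answer concise.":
        return text + " (concise)"
    return text

def textual_gradient_descent(text: str, steps: int = 3) -> str:
    """Iterate feedback -> rewrite until the critic is satisfied."""
    for _ in range(steps):
        feedback = stub_critic(text)        # "backward": compute textual gradient
        if feedback == "No further feedback.":
            break
        text = stub_optimizer(text, feedback)  # "step": apply the gradient
    return text

print(textual_gradient_descent("Draft answer."))  # → Draft answer. (concise)
```

In the real framework the critic and optimizer roles are played by LLM calls, and the optimized variable can be a prompt, a solution, or any other text artifact.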
1 code implementation • 8 Feb 2024 • Federico Bianchi, Patrick John Chia, Mert Yuksekgonul, Jacopo Tagliabue, Dan Jurafsky, James Zou
We develop NegotiationArena: a flexible framework for evaluating and probing the negotiation abilities of LLM agents.
no code implementations • 10 Nov 2023 • Angela Zhang, Mert Yuksekgonul, Joshua Guild, James Zou, Joseph C. Wu
One early application has been to medicine, where LLMs have been investigated to streamline clinical workflows and facilitate clinical analysis and decision-making.
1 code implementation • 24 Oct 2023 • Marah I Abdin, Suriya Gunasekar, Varun Chandrasekaran, Jerry Li, Mert Yuksekgonul, Rahee Ghosh Peshawaria, Ranjita Naik, Besmira Nushi
Motivated by rising concerns around factual incorrectness and hallucinations of LLMs, we present KITAB, a new dataset for measuring constraint satisfaction abilities of language models.
no code implementations • 11 Oct 2023 • Ranjita Naik, Varun Chandrasekaran, Mert Yuksekgonul, Hamid Palangi, Besmira Nushi
Large language models (LLMs) are documented to struggle in settings that require complex reasoning.
1 code implementation • 26 Sep 2023 • Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi
We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text.
1 code implementation • 1 May 2023 • Shirley Wu, Mert Yuksekgonul, Linjun Zhang, James Zou
Deep neural networks often rely on spurious correlations to make predictions, which hinders generalization beyond training environments.
2 code implementations • 6 Apr 2023 • Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou
In this study, we evaluate the performance of several widely-used GPT detectors using writing samples from native and non-native English writers.
1 code implementation • bioRxiv 2023 • Zhi Huang, Federico Bianchi, Mert Yuksekgonul, Thomas Montine, James Zou
This is the largest public dataset for pathology images annotated with natural text.
no code implementations • 1 Feb 2023 • Roxana Daneshjou, Mert Yuksekgonul, Zhuo Ran Cai, Roberto Novoa, James Zou
To provide a medical dataset densely annotated by domain experts with annotations useful across multiple disease processes, we developed SkinCon: a skin disease dataset densely annotated by dermatologists.
3 code implementations • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
1 code implementation • 4 Oct 2022 • Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou
ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO & Flickr30k-Order, to test for order sensitivity.
no code implementations • 31 May 2022 • Mert Yuksekgonul, Maggie Wang, James Zou
When concept annotations are not available on the training data, we show that PCBM can transfer concepts from other datasets or from natural language descriptions of concepts via multimodal models.
1 code implementation • 24 Jun 2021 • Abubakar Abid, Mert Yuksekgonul, James Zou
Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases.
1 code implementation • ICML UDL 2020 • Alexander Mathis, Thomas Biasi, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, Mackenzie Weygandt Mathis
Neural networks are highly effective tools for pose estimation.
1 code implementation • 2 Oct 2019 • Mert Yuksekgonul, Ozgur Emre Sivrikaya, Mustafa Gokce Baydogan
In this work, we propose a simple model that generates permutation-invariant, maximally predictive prototypes from a given dataset, yielding an interpretable solution and concrete insights into the nature of the problem and its solution.