no code implementations • 16 Apr 2025 • Siyan Zhao, Devaansh Gupta, Qinqing Zheng, Aditya Grover
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL).
1 code implementation • 13 Feb 2025 • Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, Kaixiang Lin
We introduce PrefEval, a benchmark for evaluating LLMs' ability to infer, memorize and adhere to user preferences in a long-context conversational setting.
1 code implementation • 17 Dec 2024 • Hritik Bansal, Daniel Israel, Siyan Zhao, Shufan Li, Tung Nguyen, Aditya Grover
To address these gaps, we present MedMax, the first large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models.
no code implementations • 22 Oct 2024 • Zhenyuan Yang, Zhengliang Liu, Jing Zhang, Cen Lu, Jiaxin Tai, Tianyang Zhong, Yiwei Li, Siyan Zhao, Teng Yao, Qing Liu, Jinlin Yang, Qixin Liu, Zhaowei Li, Kexin Wang, Longjun Ma, Dajiang Zhu, Yudan Ren, Bao Ge, Wei Zhang, Ning Qiang, Tuo Zhang, Tianming Liu
This study examines the capabilities of advanced Large Language Models (LLMs), particularly the o1 model, in the context of literary analysis.
no code implementations • 15 Oct 2024 • Eric Hanchen Jiang, Zhi Zhang, Dinghuai Zhang, Andrew Lizarraga, Chenheng Xu, Yasi Zhang, Siyan Zhao, Zhengjie Xu, Peiyu Yu, Yuer Tang, Deqian Kong, Ying Nian Wu
Advancements in reinforcement learning have led to the development of sophisticated models capable of learning complex decision-making tasks.
1 code implementation • 17 Jun 2024 • Siyan Zhao, Tung Nguyen, Aditya Grover
In-context learning is a key paradigm in large language models (LLMs) that enables them to generalize to new tasks and domains when simply prompted with a few exemplars, without explicit parameter updates.
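To make the paradigm concrete, here is a minimal few-shot prompting sketch, assuming a small local Hugging Face model; this is an illustration of in-context learning in general, not the paper's experimental setup:

```python
# Minimal sketch of in-context learning via few-shot prompting.
# No parameters are updated; the exemplars in the prompt define the task.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# A few labeled exemplars, then a query the model completes in context.
prompt = (
    "Review: The food was cold and bland. Sentiment: negative\n"
    "Review: Absolutely loved the service! Sentiment: positive\n"
    "Review: Best pasta I've had in years. Sentiment:"
)

output = generator(prompt, max_new_tokens=3, do_sample=False)
print(output[0]["generated_text"])
```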
1 code implementation • 15 Apr 2024 • Siyan Zhao, Daniel Israel, Guy Van den Broeck, Aditya Grover
In this work, we highlight the following pitfall of prefilling: for batches with highly varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length.
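As a rough illustration of this pitfall (a hypothetical sketch with made-up prompt lengths, not the paper's implementation), the fraction of prefill compute spent on padding can be estimated directly from the token counts in a batch:

```python
# Sketch: estimate compute wasted by padding a prefill batch to max length.
prompt_lengths = [512, 37, 1024, 96]  # hypothetical token counts per prompt

batch_size = len(prompt_lengths)
max_len = max(prompt_lengths)

padded_tokens = batch_size * max_len  # tokens actually processed
useful_tokens = sum(prompt_lengths)   # tokens carrying real content
wasted_fraction = 1 - useful_tokens / padded_tokens

print(f"Padded tokens:  {padded_tokens}")   # 4096
print(f"Useful tokens:  {useful_tokens}")   # 1669
print(f"Wasted compute: {wasted_fraction:.1%}")  # ~59% spent on padding
```

The more the prompt lengths vary within a batch, the larger this wasted fraction grows, which is exactly the regime the paper targets.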
1 code implementation • 17 Oct 2023 • Siyan Zhao, John Dang, Aditya Grover
We introduce Group Preference Optimization (GPO), an alignment framework that steers language models toward the preferences of individual groups in a few-shot manner.