1 code implementation • 21 May 2025 • Tong Zheng, Lichang Chen, Simeng Han, R. Thomas McCoy, Heng Huang
To fill this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, truth-table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning.
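For readers unfamiliar with the idea, "systematically enumerating logical cases" can be illustrated with a generic truth-table entailment check. This is a minimal sketch of classical propositional truth-table reasoning, not the MoT implementation. The functions `assignments` and `entails` are hypothetical names introduced only for illustration.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]

def assignments(variables: List[str]) -> Iterator[Assignment]:
    """Yield every truth assignment over the variables (2**n rows)."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))

def entails(variables: List[str], premises: List[Formula], conclusion: Formula) -> bool:
    """Premises entail the conclusion iff every row that makes all premises
    true also makes the conclusion true (hypothetical helper, illustration only)."""
    return all(
        conclusion(row)
        for row in assignments(variables)
        if all(p(row) for p in premises)
    )

# Modus ponens: from p and (p -> q), conclude q.
mp_premises = [lambda r: r["p"], lambda r: (not r["p"]) or r["q"]]
print(entails(["p", "q"], mp_premises, lambda r: r["q"]))  # True

# Affirming the consequent: q and (p -> q) do not entail p.
ac_premises = [lambda r: r["q"], lambda r: (not r["p"]) or r["q"]]
print(entails(["p", "q"], ac_premises, lambda r: r["p"]))  # False
```

Because every case is checked, the second example exposes the counter-row p = False, q = True. That is the kind of case a free-form natural-language argument can overlook.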
no code implementations • 16 May 2025 • Simeng Han, Stephen Xia, Grant Zhang, Howard Dai, Chen Liu, Lichang Chen, Hoang Huy Nguyen, Hongyuan Mei, Jiayuan Mao, R. Thomas McCoy
Accuracy remains a standard metric for evaluating AI systems, but it offers limited insight into how models arrive at their solutions.
1 code implementation • 26 Feb 2025 • Wei Xiong, Hanning Zhang, Chenlu Ye, Lichang Chen, Nan Jiang, Tong Zhang
We study self-rewarding reasoning large language models (LLMs), which can simultaneously generate step-by-step reasoning and evaluate the correctness of their outputs at inference time, without external feedback.
no code implementations • 16 Oct 2024 • Lichang Chen, Hexiang Hu, Mingda Zhang, YiWen Chen, Zifeng Wang, Yandong Li, Pranav Shyam, Tianyi Zhou, Heng Huang, Ming-Hsuan Yang, Boqing Gong
To address this, OmnixR offers two evaluation variants: (1) synthetic subset: a synthetic dataset generated automatically by translating text into multiple modalities, namely audio, images, video, and hybrids (Omnify).
no code implementations • 20 Sep 2024 • Tianqi Liu, Wei Xiong, Jie Ren, Lichang Chen, Junru Wu, Rishabh Joshi, Yang Gao, Jiaming Shen, Zhen Qin, Tianhe Yu, Daniel Sohn, Anastasiia Makarova, Jeremiah Liu, YuAn Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh
Our RRM improves the performance of a pairwise reward model trained on Gemma-2-9b-it on RewardBench, increasing accuracy from 80.61% to 84.15%.
no code implementations • 18 Sep 2024 • Xuanchang Zhang, Wei Xiong, Lichang Chen, Tianyi Zhou, Heng Huang, Tong Zhang
In this work, we extend the study of biases in preference learning beyond the commonly recognized length bias, offering a comprehensive analysis of a wider range of format biases.
no code implementations • 11 Jun 2024 • Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang
In this paper, we propose a more efficient data exploration strategy for online preference tuning (OPTune), which does not rely on human-curated or pre-collected teacher responses but dynamically samples informative responses for on-policy preference alignment.
no code implementations • 30 May 2024 • Xinlu Zhang, Zhiyu Zoey Chen, Xi Ye, Xianjun Yang, Lichang Chen, William Yang Wang, Linda Ruth Petzold
First, coding data tuning enhances the overall reasoning capabilities of LLMs across different model families and scales.
no code implementations • CVPR 2024 • Tianyu Luan, Zhong Li, Lele Chen, Xuan Gong, Lichang Chen, Yi Xu, Junsong Yuan
Then, we calculate the Area Under the Curve (AUC) difference between the two spectra, so that every frequency band, whether it captures the overall or the detailed shape, is weighted equitably.
1 code implementation • 19 Feb 2024 • Ruibo Chen, Yihan Wu, Lichang Chen, Guodong Liu, Qi He, Tianyi Xiong, Chenxi Liu, Junfeng Guo, Heng Huang
In the first stage, we devise a scoring network to evaluate the difficulty of training instructions, which is co-trained with the VLM.
1 code implementation • 16 Feb 2024 • Ming Li, Jiuhai Chen, Lichang Chen, Tianyi Zhou
To examine DEBATUNE, we curate the largest dataset of debate topics so far, which covers 710 controversial topics and corresponding arguments for each topic.
2 code implementations • 15 Feb 2024 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou
This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data.
1 code implementation • 11 Feb 2024 • Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng Huang, Mohammad Shoeybi, Bryan Catanzaro
In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs.
no code implementations • 27 Oct 2023 • Ruibo Chen, Tianyi Xiong, Yihan Wu, Guodong Liu, Zhengmian Hu, Lichang Chen, Yanshuo Chen, Chenxi Liu, Heng Huang
This technical report examines the application of GPT-4 Vision (GPT-4V) to COVID-19 image classification, using in-context learning to improve diagnostic processes.
9 code implementations • CVPR 2024 • Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou
Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench
1 code implementation • 23 Oct 2023 • Xinlu Zhang, Chenxin Tian, Xianjun Yang, Lichang Chen, Zekun Li, Linda Ruth Petzold
Instruction-finetuning (IFT) has become crucial in aligning Large Language Models (LLMs) with diverse human needs and has shown great potential in medical applications.
2 code implementations • 18 Oct 2023 • Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou
Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation.
3 code implementations • 23 Aug 2023 • Ming Li, Yong Zhang, Zhitao Li, Jiuhai Chen, Lichang Chen, Ning Cheng, Jianzong Wang, Tianyi Zhou, Jing Xiao
In the realm of Large Language Models (LLMs), the balance between instruction data quality and quantity is a focal point.
1 code implementation • 31 Jul 2023 • Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, Hongxia Jin
To demonstrate the threat, we propose a simple method to perform VPI by poisoning the model's instruction tuning data, which proves highly effective in steering the LLM.
3 code implementations • 17 Jul 2023 • Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
Large language models (LLMs) strengthen instruction-following capability through instruction-finetuning (IFT) on supervised instruction/response data.
2 code implementations • 5 Jun 2023 • Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, Tianyi Zhou
Large language models (LLMs) are instruction followers, but it can be challenging to find the best instruction for different situations, especially for black-box LLMs, for which backpropagation is not possible.
1 code implementation • 23 May 2023 • Wentao Bao, Lichang Chen, Heng Huang, Yu Kong
The compositional zero-shot learning (CZSL) task aims to recognize unseen compositional visual concepts (e.g., sliced tomatoes) when the model is trained only on seen compositions (e.g., sliced potatoes and red tomatoes).
no code implementations • 3 May 2023 • Lichang Chen, Minhao Cheng, Heng Huang
Backdoor learning has become an emerging research area for building trustworthy machine learning systems.
no code implementations • 3 May 2023 • Lichang Chen, Heng Huang, Minhao Cheng
To address this critical problem, we first visualize the loss landscape of vanilla prompt tuning and find that it is precipitous: a slight change in the input data can cause a large fluctuation in the loss.
no code implementations • 12 Apr 2023 • Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules.
no code implementations • 6 Apr 2023 • Jiuhai Chen, Lichang Chen, Heng Huang, Tianyi Zhou
However, it is not clear whether CoT is still effective on more recent instruction finetuned (IFT) LLMs such as ChatGPT.
no code implementations • 14 Mar 2023 • Jiuhai Chen, Lichang Chen, Chen Zhu, Tianyi Zhou
Moreover, ICL (with and without CoT) using only one correct demo significantly outperforms the all-demo ICL adopted in most previous work, indicating that LLMs struggle to find the correct demo(s) for input queries, a weakness that is difficult to evaluate on biased datasets.
no code implementations • 9 Jul 2021 • Yiqun Lin, Lichang Chen, Haibin Huang, Chongyang Ma, Xiaoguang Han, Shuguang Cui
Sampling, grouping, and aggregation are three important components in the multi-scale analysis of point clouds.
no code implementations • ECCV 2020 • Lichang Chen, Guosheng Lin, Shijie Wang, Qingyao Wu
Scene graphs, a vital tool for bridging the gap between the language and image domains, have been widely adopted in cross-modal tasks such as VQA.