Search Results for author: Xiaoran Fan

Found 18 papers, 12 papers with code

Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations

1 code implementation · 19 Mar 2025 · Shuo Li, Jiajun Sun, Guodong Zheng, Xiaoran Fan, Yujiong Shen, Yi Lu, Zhiheng Xi, Yuming Yang, Wenming Tan, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang

Recently, multimodal large language models (MLLMs) have demonstrated remarkable performance in vision-language tasks.

Distill Visual Chart Reasoning Ability from LLMs to MLLMs

2 code implementations · 24 Oct 2024 · wei he, Zhiheng Xi, Wanxu Zhao, Xiaoran Fan, Yiwen Ding, Zifei Shan, Tao Gui, Qi Zhang, Xuanjing Huang

Specifically, we employ text-based synthesizing techniques to construct chart-plotting code and produce ReachQA, a dataset containing 3k reasoning-intensive charts and 20k Q&A pairs to enhance both recognition and reasoning abilities.

Multimodal Reasoning · Visual Reasoning
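The excerpt above describes synthesizing chart-plotting code and Q&A pairs from the same underlying data. A minimal sketch of that idea is below; the function name and the toy question template are hypothetical illustrations, not the ReachQA pipeline itself:

```python
import random

def synthesize_chart_example(seed):
    """Generate matplotlib plotting code and a matching Q&A pair from the
    same underlying data, so the answer is known by construction."""
    rng = random.Random(seed)
    labels = ["A", "B", "C", "D"]
    values = [rng.randint(1, 9) for _ in labels]
    # plotting code is emitted as text, the way a text-based synthesizer would
    code = (
        "import matplotlib.pyplot as plt\n"
        f"plt.bar({labels!r}, {values!r})\n"
        "plt.savefig('chart.png')\n"
    )
    tallest = labels[values.index(max(values))]
    qa = {"question": "Which bar is the tallest?", "answer": tallest}
    return code, qa

code, qa = synthesize_chart_example(0)
print(qa["answer"])
```

Because the answer is derived from the same values that the emitted code plots, every synthesized example is labeled correctly by construction.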

Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs

no code implementations · 15 Oct 2024 · Shuo Li, Tao Ji, Xiaoran Fan, Linsheng Lu, Leyi Yang, Yuming Yang, Zhiheng Xi, Rui Zheng, Yuran Wang, Xiaohui Zhao, Tao Gui, Qi Zhang, Xuanjing Huang

Our findings indicate that the ability to prevent sycophancy is predominantly observed in higher layers of the model.

Hallucination

Leveraging Foundation Models for Zero-Shot IoT Sensing

1 code implementation · 29 Jul 2024 · Dinghao Xue, Xiaoran Fan, Tao Chen, Guohao Lan, Qun Song

To address this, zero-shot learning (ZSL) aims to classify data of unseen classes with the help of semantic information.

Data Augmentation · Generalized Zero-Shot Learning
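The excerpt above states the core ZSL idea: classify unseen classes by comparing against semantic information. A minimal sketch under common assumptions (cosine similarity against per-class attribute vectors; the class names and attributes are hypothetical, not from the paper):

```python
import numpy as np

def zero_shot_classify(feature, class_semantics):
    """Assign a sample to the unseen class whose semantic vector is most
    similar (by cosine similarity) to the sample's feature embedding."""
    names = list(class_semantics)
    mat = np.stack([class_semantics[n] for n in names])
    # normalize class vectors and the query so dot products are cosines
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = feature / np.linalg.norm(feature)
    scores = mat @ q
    return names[int(np.argmax(scores))]

# toy semantic space: attributes like [is_airborne, is_waterborne]
semantics = {
    "drone":  np.array([1.0, 0.0]),
    "vessel": np.array([0.0, 1.0]),
}
print(zero_shot_classify(np.array([0.9, 0.1]), semantics))
```

No training data for "drone" or "vessel" is needed; the semantic vectors alone mediate the decision, which is what lets ZSL handle unseen classes.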

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

1 code implementation · 8 Feb 2024 · Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, wei he, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models.

GSM8K · reinforcement-learning · +1
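The reverse-curriculum idea above can be sketched as a schedule of RL start states that slide from the end of a demonstration back toward its beginning, so a sparse outcome reward stays reachable early in training. This is a toy illustration under that assumption, not the authors' R$^3$ implementation:

```python
def reverse_curriculum_starts(demonstration, n_stages):
    """Yield one start state per curriculum stage, moving from near the
    end of a demonstration (easy: goal is close) back to its beginning
    (hard: the full task), using only an outcome reward at the end."""
    T = len(demonstration)
    for stage in range(n_stages):
        # stage 0 starts closest to the goal; later stages start earlier
        idx = max(0, T - 1 - stage * (T - 1) // max(1, n_stages - 1))
        yield demonstration[idx]

demo = ["s0", "s1", "s2", "s3", "s4"]
print(list(reverse_curriculum_starts(demo, 3)))  # ['s4', 's2', 's0']
```

Each stage needs only a success/failure signal at the final state, which is how outcome supervision can stand in for step-by-step process supervision.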

RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning

1 code implementation · 16 Jan 2024 · Junjie Ye, Yilong Wu, Songyang Gao, Caishuang Huang, Sixian Li, Guanyu Li, Xiaoran Fan, Qi Zhang, Tao Gui, Xuanjing Huang

To bridge this gap, we introduce RoTBench, a multi-level benchmark for evaluating the robustness of LLMs in tool learning.

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

1 code implementation · 1 Jan 2024 · Junjie Ye, Guanyu Li, Songyang Gao, Caishuang Huang, Yilong Wu, Sixian Li, Xiaoran Fan, Shihan Dou, Tao Ji, Qi Zhang, Tao Gui, Xuanjing Huang

Existing evaluations of tool learning primarily focus on validating the alignment of selected tools for large language models (LLMs) with expected outcomes.

LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin

1 code implementation · 15 Dec 2023 · Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, ShiLiang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks.

Language Modelling · Multi-Task Learning · +1

RealBehavior: A Framework for Faithfully Characterizing Foundation Models' Human-like Behavior Mechanisms

no code implementations · 17 Oct 2023 · Enyu Zhou, Rui Zheng, Zhiheng Xi, Songyang Gao, Xiaoran Fan, Zichu Fei, Jingting Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors.

ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

2 code implementations · 7 Nov 2022 · Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

In this paper, we extend the pretraining method for cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing.

Representation Learning · Speech Representation Learning · +4

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

no code implementations · 21 Jul 2022 · Boyang xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations.

Action Recognition · Video Classification · +1

PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

no code implementations · CVPR 2022 · Zhijian Yang, Xiaoran Fan, Volkan Isler, Hyun Soo Park

Based on this insight, we introduce a time-invariant transfer function called pose kernel -- the impulse response of audio signals induced by the body pose.

Regression
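The pose kernel above is described as a time-invariant transfer function: the received audio is the emitted signal convolved with the body-pose-dependent impulse response. A toy numerical sketch of that relationship (the signal values are invented for illustration):

```python
import numpy as np

# A time-invariant transfer function means the received signal is the
# emitted signal convolved with a fixed impulse response (the pose kernel).
emitted = np.array([1.0, 0.0, 0.0, 0.0])   # a unit impulse
pose_kernel = np.array([0.5, 0.3, 0.1])    # hypothetical impulse response

received = np.convolve(emitted, pose_kernel)
print(received)  # the impulse response itself, zero-padded
```

Driving the system with a unit impulse recovers the kernel directly, which is exactly why the impulse response fully characterizes a time-invariant channel.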

AuraSense: Robot Collision Avoidance by Full Surface Proximity Detection

no code implementations · 10 Aug 2021 · Xiaoran Fan, Riley Simmons-Edler, Daewon Lee, Larry Jackel, Richard Howard, Daniel Lee

In this paper, we introduce the phenomenon of the Leaky Surface Wave (LSW), a novel sensing modality, and present AuraSense, a proximity detection system using the LSW.

Collision Avoidance
