Search Results for author: Yichen Guo

Found 8 papers, 4 papers with code

Mitigating Hallucinations via Inter-Layer Consistency Aggregation in Large Vision-Language Models

no code implementations • 18 May 2025 • Kai Tang, Jinhao You, Xiuqi Ge, Hanze Li, Yichen Guo, Xiande Huang

Despite the impressive capabilities of Large Vision-Language Models (LVLMs), they remain susceptible to hallucinations, that is, generating content that is inconsistent with the input image.

Hallucination, MME

STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference

no code implementations • 18 May 2025 • Yichen Guo, Hanze Li, Zonghao Zhang, Jinhao You, Kai Tang, Xiande Huang

Although large vision-language models (LVLMs) leverage rich visual token representations to achieve strong performance on multimodal tasks, these tokens also introduce significant computational overhead during inference.

Token Reduction

DINN360: Deformable Invertible Neural Network for Latitude-Aware 360° Image Rescaling

1 code implementation • CVPR 2023 • Yichen Guo, Mai Xu, Lai Jiang, Leonid Sigal, Yunjin Chen

To alleviate this issue, we propose the first attempt at 360° image rescaling, which refers to downscaling a 360° image to a visually valid low-resolution (LR) counterpart and then upscaling to a high-resolution (HR) 360° image given the LR variant.

Image Rescaling, valid

DAQE: Enhancing the Quality of Compressed Images by Exploiting the Inherent Characteristic of Defocus

1 code implementation • 20 Nov 2022 • Qunliang Xing, Mai Xu, Xin Deng, Yichen Guo

Image defocus is inherent in the physics of image formation caused by the optical aberration of lenses, providing plentiful information on image quality.

Does Text Attract Attention on E-Commerce Images: A Novel Saliency Prediction Dataset and Method

1 code implementation • CVPR 2022 • Lai Jiang, Yifei Li, Shengxi Li, Mai Xu, Se Lei, Yichen Guo, Bo Huang

E-commerce images play a central role in attracting people's attention in online retailing and shopping, and accurate attention prediction is of significant importance for both customers and retailers, although research on this problem has yet to start.

Multi-Task Learning, Saliency Prediction

Blind VQA on 360° Video via Progressively Learning from Pixels, Frames and Video

1 code implementation • 18 Nov 2021 • Li Yang, Mai Xu, Shengxi Li, Yichen Guo, Zulin Wang

When assessing the quality of 360° video, humans tend to perceive its quality degradation from the viewport-based spatial distortion of each spherical frame to motion artifacts across adjacent frames, ending with the video-level quality score, i.e., a progressive quality assessment paradigm.

Visual Question Answering (VQA)

LAU-Net: Latitude Adaptive Upscaling Network for Omnidirectional Image Super-Resolution

no code implementations • CVPR 2021 • Xin Deng, Hao Wang, Mai Xu, Yichen Guo, Yuhang Song, Li Yang

In addition, we propose a deep reinforcement learning scheme with a latitude adaptive reward, in order to automatically select optimal upscaling factors for different latitude bands.

Deep Reinforcement Learning, Image Super-Resolution
