1 code implementation • 10 Mar 2025 • Yingzhe Peng, Gongrui Zhang, Miaosen Zhang, Zhiyuan You, Jie Liu, Qipeng Zhu, Kai Yang, Xingzhong Xu, Xin Geng, Xu Yang
Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter models, where architectural constraints limit reasoning capacity and modality alignment.
no code implementations • CVPR 2025 • Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, Chao Dong
This loss captures intra-dataset relationships, facilitating co-training across multiple IQA datasets.
no code implementations • CVPR 2025 • Zixuan Chen, Yujin Wang, Xin Cai, Zhiyuan You, Zheming Lu, Fan Zhang, Shi Guo, Tianfan Xue
In this work, we propose UltraFusion, the first exposure fusion technique that can merge inputs with a 9-stop exposure difference.
1 code implementation • 22 Dec 2024 • Jinyu Zhang, Zhiyuan You, Jize Wang, Xinyi Le
Nonetheless, training-free methods for DIE encounter two primary challenges: (1) understanding the complex relationship between layout and textual elements in VRDs, and (2) providing accurate guidance to pre-trained models.
1 code implementation • 23 Oct 2024 • Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong
Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations.
no code implementations • 26 Sep 2024 • Xin Cai, Zhiyuan You, Hailong Zhang, Wentao Liu, Jinwei Gu, Tianfan Xue
By conditioning on the low-frequency content retrieved in the first stage, the diffusion model effectively reconstructs the high-frequency details that are typically lost in the lensless imaging process, while also maintaining image fidelity.
1 code implementation • 29 Jul 2024 • JinFan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong
Based on the causal effect theory, the proposed diagnostic tool can refresh our common knowledge and bring a deeper understanding of low-level vision models.
1 code implementation • 29 May 2024 • Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue
We introduce a ground-truth-informed dataset construction approach to enhance data quality, and scale up the dataset to 495K under the brief-detail joint framework.
2 code implementations • 14 Dec 2023 • Zhiyuan You, Zheyuan Li, Jinjin Gu, Zhenfei Yin, Tianfan Xue, Chao Dong
To build the DepictQA model, we establish a hierarchical task framework, and collect a multi-modal IQA training dataset.
no code implementations • 5 Sep 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Lei Cui, Yu Zheng, Xinyi Le
Second, CNNs tend to reconstruct both normal samples and anomalies well, making the two hard to distinguish.
1 code implementation • 8 Jun 2022 • Zhiyuan You, Lei Cui, Yujun Shen, Kai Yang, Xin Lu, Yu Zheng, Xinyi Le
For example, when learning a unified model for 15 categories in MVTec-AD, we surpass the second competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%).
Ranked #13 on Multi-class Anomaly Detection on MVTec AD
1 code implementation • 22 Jan 2022 • Zhiyuan You, Kai Yang, Wenhan Luo, Xin Lu, Lei Cui, Xinyi Le
This work studies the problem of few-shot object counting, which counts the number of exemplar objects (i.e., described by one or several support images) occurring in the query image.
Ranked #2 on Object Counting on CARPK