1 code implementation • 29 May 2025 • Yuanxin Liu, Kun Ouyang, HaoNing Wu, Yi Liu, Lin Sui, Xinhao Li, Yan Zhong, Y. Charles, Xinyu Zhou, Xu Sun
Recent studies have shown that long chain-of-thought (CoT) reasoning can significantly enhance the performance of large language models (LLMs) on complex tasks.
1 code implementation • 22 May 2025 • HaoNing Wu, Xiao Huang, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie
Multimodal large language models (MLLMs) have achieved impressive success in question-answering tasks, yet their capabilities for spatial understanding are less explored.
no code implementations • 6 May 2025 • Jiayuan Rao, Zifeng Li, HaoNing Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Recent advancements in AI-driven soccer understanding have demonstrated rapid progress, yet existing research predominantly focuses on isolated or narrow tasks.
1 code implementation • 10 Apr 2025 • Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin
We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).
1 code implementation • 2 Apr 2025 • Yiting Lu, Xin Li, HaoNing Wu, Bingchen Li, Weisi Lin, Zhibo Chen
To mitigate this, we propose a new paradigm for perception-oriented instruction tuning, i.e., Q-Adapt, which aims to eliminate the conflicts and achieve synergy between these two EIQA tasks when adapting LMMs, resulting in enhanced multi-faceted explanations of IQA.
2 code implementations • 2 Apr 2025 • Kun Ouyang, Yuanxin Liu, HaoNing Wu, Yi Liu, Hao Zhou, Jie Zhou, Fandong Meng, Xu Sun
Motivated by the success of Reinforcement Learning with Verifiable Reward (RLVR) in unlocking LLM reasoning abilities, this work aims to improve MLLMs in video spatial reasoning through the RLVR paradigm.
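As a rough illustration of the RLVR idea, the sketch below implements a rule-based verifiable reward that checks a model's answer against a reference after simple normalization; the paper's actual reward rules and answer formats are not given here, so treat the matching logic as an assumption.

```python
# A minimal sketch of a rule-based verifiable reward, assuming answers are
# short strings (e.g. "left", "3 meters") that can be checked exactly.

def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the normalized prediction matches the reference, else 0.0."""
    normalize = lambda s: " ".join(s.lower().strip().rstrip(".").split())
    return 1.0 if normalize(model_answer) == normalize(ground_truth) else 0.0

# In the RLVR paradigm this reward is the only training signal --
# no learned reward model is needed.
assert verifiable_reward("Left ", "left") == 1.0
assert verifiable_reward("right", "left") == 0.0
```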
1 code implementation • CVPR 2025 • Chunyi Li, Yuan Tian, Xiaoyue Ling, ZiCheng Zhang, Haodong Duan, HaoNing Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, Guangtao Zhai
Considering the huge gap between human and machine visual systems, this paper is the first to propose the topic of Image Quality Assessment for Machine Vision.
no code implementations • 12 Mar 2025 • Linli Yao, HaoNing Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu Sun, Junnan Li
Despite recent advances in Video Large Language Models (VideoLLMs), effectively understanding long-form videos remains a significant challenge.
1 code implementation • 12 Mar 2025 • ZiCheng Zhang, HaoNing Wu, Ziheng Jia, Weisi Lin, Guangtao Zhai
With this joint learning framework and corresponding training strategy, we develop Q-SiT, the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini.
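For intuition on how one model can both score and interpret, here is a minimal sketch of the score readout common to this line of work (e.g., Q-Align): softmax the logits of five rating words and take their expectation. Whether Q-SiT uses exactly these words and anchor values is an assumption.

```python
import numpy as np

# Five discrete rating words with illustrative anchor scores.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]
LEVEL_VALUES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def logits_to_score(level_logits: np.ndarray) -> float:
    """Convert logits over the five rating words into a scalar quality score."""
    p = np.exp(level_logits - level_logits.max())
    p /= p.sum()                       # softmax over rating words
    return float(p @ LEVEL_VALUES)     # probability-weighted expectation

print(logits_to_score(np.array([0.1, 0.2, 1.5, 2.0, 0.3])))  # ~3.43
```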
no code implementations • 10 Mar 2025 • Yan Yang, Dongxu Li, HaoNing Wu, Bei Chen, Liu Liu, Liyuan Pan, Junnan Li
Solving expert-level multimodal tasks is a key milestone towards general intelligence.
no code implementations • CVPR 2025 • ZiCheng Zhang, Ziheng Jia, HaoNing Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
With the rising interest in research on Large Multi-modal Models (LMMs) for video understanding, many studies have emphasized general video comprehension capabilities, neglecting the systematic exploration into video quality understanding.
1 code implementation • 4 Dec 2024 • HaoNing Wu, Ziheng Zhao, Ya Zhang, Weidi Xie, Yanfeng Wang
Medical image segmentation has recently demonstrated impressive progress with deep neural networks, yet the heterogeneous modalities and scarcity of mask annotations limit the development of segmentation models on unannotated modalities.
1 code implementation • CVPR 2025 • Jiayuan Rao, HaoNing Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie
As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world.
no code implementations • CVPR 2025 • Ziyang Luo, HaoNing Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li
To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles.
1 code implementation • 6 Nov 2024 • Ziheng Jia, ZiCheng Zhang, Jiaying Qian, HaoNing Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min
To address this gap, we introduce the VQA2 Instruction Dataset - the first visual question answering instruction dataset that focuses on video quality assessment.
1 code implementation • 8 Oct 2024 • Dongxu Li, Yudong Liu, HaoNing Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Fan Zhou, Chengen Huang, Yanpeng Li, Chongyan Zhu, Xiaoyi Ren, Chao Li, Yifan Ye, Peng Liu, Lihuan Zhang, Hanshu Yan, Guoyin Wang, Bei Chen, Junnan Li
Information comes in diverse modalities.
Ranked #11 on Video Question Answering on TVBench
1 code implementation • 7 Oct 2024 • Chunyi Li, Jianbo Zhang, ZiCheng Zhang, HaoNing Wu, Yuan Tian, Wei Sun, Guo Lu, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
However, various corruptions in the real world mean that images will not be as ideal as in simulations, presenting significant challenges for the practical application of LMMs.
no code implementations • 30 Sep 2024 • ZiCheng Zhang, Ziheng Jia, HaoNing Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
With the rising interest in research on Large Multi-modal Models (LMMs) for video understanding, many studies have emphasized general video comprehension capabilities, neglecting the systematic exploration into video quality understanding.
no code implementations • 15 Sep 2024 • Yinan Sun, ZiCheng Zhang, HaoNing Wu, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min
However, these models also exhibit hallucinations, which limit their reliability as AI systems, especially in tasks involving low-level visual perception and understanding.
2 code implementations • 10 Sep 2024 • King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang
However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.
1 code implementation • 20 Aug 2024 • HaoNing Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang
Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.
1 code implementation • 24 Jul 2024 • Chaofeng Chen, Sensen Yang, HaoNing Wu, Liang Liao, ZiCheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content.
1 code implementation • 24 Jul 2024 • Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, HaoNing Wu, J. H. Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin
Moreover, we explored the ability of LVLMs to perceive image sequences within the context of our multi-image association task.
1 code implementation • 22 Jul 2024 • HaoNing Wu, Dongxu Li, Bei Chen, Junnan Li
In addition, our results indicate that performance on the benchmark improves only when models are capable of processing more frames, positioning LongVideoBench as a valuable benchmark for evaluating future-generation long-context LMMs.
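This claim is testable with a simple protocol: evaluate the same model at increasing frame budgets and watch accuracy as a function of budget. A minimal sketch follows; `answer_fn` is a stand-in for any LMM and is not part of the benchmark's API.

```python
import numpy as np

def uniform_frame_indices(num_frames_total: int, budget: int) -> np.ndarray:
    """Pick `budget` frame indices spread evenly across the whole video."""
    return np.linspace(0, num_frames_total - 1, budget).round().astype(int)

def accuracy_at_budget(samples, answer_fn, budget: int) -> float:
    """samples: iterable of (video_length, question, ground_truth) triples."""
    correct = 0
    for video_len, question, gt in samples:
        idx = uniform_frame_indices(video_len, budget)
        correct += int(answer_fn(idx, question) == gt)
    return correct / len(samples)

# Long-context LMMs should show accuracy rising with the budget, e.g.
# accuracy_at_budget(samples, model, 8) < accuracy_at_budget(samples, model, 256).
```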
1 code implementation • 26 Jun 2024 • Jiayuan Rao, HaoNing Wu, Chang Liu, Yanfeng Wang, Weidi Xie
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audience's viewing experience.
1 code implementation • 13 Jun 2024 • Chunyi Li, Xiele Wu, HaoNing Wu, Donghui Feng, ZiCheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged.
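A minimal sketch of the Image-Text-Image loop, assuming two user-supplied callables for the MLLM captioner and the text-to-image generator (both names are placeholders, not APIs from the paper):

```python
def cross_modality_compress(image, caption_fn, generate_fn):
    """Compress an image to text, then reconstruct an image from that text."""
    caption = caption_fn(image)                # encode: image -> short text
    bitstream = caption.encode("utf-8")        # the transmitted payload
    reconstruction = generate_fn(caption)      # decode: text -> image
    return reconstruction, len(bitstream) * 8  # reconstruction + payload bits

# At a few hundred bits per caption, the paradigm reaches bitrates far below
# pixel codecs, trading pixel fidelity for semantic fidelity.
```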
1 code implementation • 5 Jun 2024 • ZiCheng Zhang, HaoNing Wu, Chunyi Li, Yingjie Zhou, Wei Sun, Xiongkuo Min, Zijian Chen, Xiaohong Liu, Weisi Lin, Guangtao Zhai
How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models.
1 code implementation • 29 May 2024 • Hanwei Zhu, HaoNing Wu, Yixuan Li, ZiCheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang
Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with converted single-image quality scores for inference, surpassing state-of-the-art IQA models across diverse scenarios.
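One way to make the comparison-to-score conversion concrete: treat the model's win probabilities against anchor images of known quality as Bradley-Terry observations and solve for the test image's latent score. The anchor setup and bisection solver below are illustrative assumptions, not the paper's exact procedure.

```python
import math

def score_from_comparisons(win_probs, anchor_scores, lo=0.0, hi=10.0):
    """Solve sum_j (sigmoid(s - q_j) - p_j) = 0 for the test image's score s."""
    residual = lambda s: sum(
        1.0 / (1.0 + math.exp(-(s - q))) - p
        for p, q in zip(win_probs, anchor_scores)
    )
    for _ in range(60):                 # bisection on a monotone residual
        mid = (lo + hi) / 2.0
        lo, hi = (lo, mid) if residual(mid) > 0 else (mid, hi)
    return (lo + hi) / 2.0

# Beating a mid-quality anchor often but losing to a high-quality one
# lands the score in between:
print(round(score_from_comparisons([0.9, 0.2], [3.0, 7.0]), 2))  # ~5.47
```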
1 code implementation • 14 May 2024 • Wei Sun, Weixia Zhang, Yanwei Jiang, HaoNing Wu, ZiCheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai
We employ the fidelity loss to train the model in a learning-to-rank manner to mitigate inconsistencies in quality scores in the portrait image quality assessment dataset PIQ.
Ranked #1 on Face Image Quality Assessment on PIQ23
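The fidelity loss referenced here has a standard closed form: for a pair with ground-truth preference probability p and predicted probability p̂, L = 1 − √(p·p̂) − √((1−p)(1−p̂)). A minimal sketch (the pairing logic around it is common practice and may differ in detail):

```python
import numpy as np

def fidelity_loss(p_pred: np.ndarray, p_true: np.ndarray) -> float:
    """L = 1 - sqrt(p * p_hat) - sqrt((1 - p) * (1 - p_hat)), averaged over pairs."""
    eps = 1e-8  # numerical guard for probabilities at 0 or 1
    loss = 1.0 - np.sqrt(p_pred * p_true + eps) - np.sqrt((1 - p_pred) * (1 - p_true) + eps)
    return float(loss.mean())

# Perfect agreement gives (near-)zero loss; disagreement is penalized smoothly.
print(fidelity_loss(np.array([0.9, 0.5]), np.array([0.9, 0.5])))  # ~0.0
print(fidelity_loss(np.array([0.1]), np.array([0.9])))            # ~0.4
```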
1 code implementation • 14 May 2024 • Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai
Motivated by previous research that leverages pre-trained features extracted from various computer vision models as feature representations for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model handle the complex distortions and diverse content of social media videos.
1 code implementation • 6 May 2024 • Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, ZiCheng Zhang, Chunyi Li, HaoNing Wu, Guangtao Zhai
Equally important to the VEC is the Video Quality Assessment (VQA).
1 code implementation • 29 Apr 2024 • Chunyi Li, HaoNing Wu, Hongkun Hao, ZiCheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai
Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly.
1 code implementation • 28 Apr 2024 • ZiCheng Zhang, HaoNing Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai
Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored.
no code implementations • 25 Apr 2024 • Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, HaoNing Wu, Yixuan Gao, Yuqin Cao, ZiCheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, Jianquan Yang, Weigang Wang, Xi Fang, Xiaoxin Lv, Jun Yan, Tianwu Zhi, Yabin Zhang, Yaohui Li, Yang Li, Jingwen Xu, Jianzhao Liu, Yiting Liao, Junlin Li, Zihao Yu, Yiting Lu, Xin Li, Hossein Motamednia, S. Farhad Hosseini-Benvidi, Fengbin Guan, Ahmad Mahmoudi-Aznaveh, Azadeh Mansouri, Ganzorig Gankhuyag, Kihwan Yoon, Yifang Xu, Haotian Fan, Fangyuan Kong, Shiling Zhao, Weifeng Dong, Haibing Yin, Li Zhu, Zhiling Wang, Bingchen Huang, Avinab Saha, Sandeep Mishra, Shashank Gupta, Rajesh Sureddi, Oindrila Saha, Luigi Celona, Simone Bianco, Paolo Napoletano, Raimondo Schettini, Junfeng Yang, Jing Fu, Wei zhang, Wenzhi Cao, Limei Liu, Han Peng, Weijun Yuan, Zhan Li, Yihang Cheng, Yifan Deng, Haohui Li, Bowen Qu, Yao Li, Shuqing Luo, Shunzhou Wang, Wei Gao, Zihao Lu, Marcos V. Conde, Xinrui Wang, Zhibo Chen, Ruling Liao, Yan Ye, Qiulin Wang, Bing Li, Zhaokun Zhou, Miao Geng, Rui Chen, Xin Tao, Xiaoyu Liang, Shangkun Sun, Xingyuan Ma, Jiaze Li, Mengduo Yang, Haoran Xu, Jie zhou, Shiding Zhu, Bohan Yu, Pengfei Chen, Xinrui Xu, Jiabin Shen, Zhichao Duan, Erfan Asadi, Jiahe Liu, Qi Yan, Youran Qu, Xiaohui Zeng, Lele Wang, Renjie Liao
A total of 196 participants have registered in the video track.
1 code implementation • 24 Apr 2024 • Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, ZiCheng Zhang, HaoNing Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Wei Sun, Yuqin Cao, Yanwei Jiang, Jun Jia, Zhichao Zhang, Zijian Chen, Weixia Zhang, Xiongkuo Min, Steve Göring, Zihao Qi, Chen Feng
The performance of the top-5 submissions is reviewed and provided here as a survey of diverse deep models for efficient video quality assessment of user-generated content.
1 code implementation • 17 Apr 2024 • Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei LI, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Fangyuan Kong, Haotian Fan, Yifang Xu, Haoran Xu, Mengduo Yang, Jie zhou, Jiaze Li, Shijie Wen, Mai Xu, Da Li, Shunyu Yao, Jiazhi Du, WangMeng Zuo, Zhibo Li, Shuai He, Anlong Ming, Huiyuan Fu, Huadong Ma, Yong Wu, Fie Xue, Guozhi Zhao, Lina Du, Jie Guo, Yu Zhang, huimin zheng, JunHao Chen, Yue Liu, Dulan Zhou, Kele Xu, Qisheng Xu, Tao Sun, Zhixiang Ding, Yuhang Hu
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on KVQ, a dataset collected from the popular short-form video platform, i.e., the Kuaishou/Kwai platform.
no code implementations • 4 Apr 2024 • Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, ZiCheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, HaoNing Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media.
1 code implementation • 18 Mar 2024 • Tengchuan Kou, Xiaohong Liu, ZiCheng Zhang, Chunyi Li, HaoNing Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu
Based on T2VQA-DB, we propose a novel transformer-based model for subjective-aligned Text-to-Video Quality Assessment (T2VQA).
no code implementations • 26 Feb 2024 • HaoNing Wu, Hanwei Zhu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin
Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses.
2 code implementations • 26 Feb 2024 • Chunyi Li, Guo Lu, Donghui Feng, HaoNing Wu, ZiCheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang
With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic.
1 code implementation • 11 Feb 2024 • ZiCheng Zhang, HaoNing Wu, Erli Zhang, Guangtao Zhai, Weisi Lin
To this end, we design benchmark settings to emulate human language responses related to low-level vision: the low-level visual perception (A1) via visual question answering related to low-level attributes (e.g., clarity, lighting); and the low-level visual description (A2), on evaluating MLLMs for low-level text descriptions.
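For concreteness, an (A1) perception probe can be posed as a multiple-choice question like the sketch below; the exact prompt template used by Q-Bench may differ.

```python
def build_a1_prompt(question: str, options: list[str]) -> str:
    """Format a low-level perception question as a multiple-choice prompt."""
    letters = "ABCD"
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    return (f"{question}\n" + "\n".join(lines) +
            "\nAnswer with the option's letter from the given choices directly.")

print(build_a1_prompt(
    "How is the clarity of this image?",
    ["Very blurry", "Slightly blurry", "Sharp"],
))
```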
1 code implementation • 16 Jan 2024 • Yipo Huang, Quan Yuan, Xiangfei Sheng, Zhichao Yang, HaoNing Wu, Pengfei Chen, Yuzhe Yang, Leida Li, Weisi Lin
An obvious obstacle lies in the absence of a specific benchmark to evaluate the effectiveness of MLLMs on aesthetic perception.
no code implementations • 15 Jan 2024 • Jay Zhangjie Wu, Guian Fang, HaoNing Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, YuChao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou
Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore on offering a better metric for text-to-video generation.
1 code implementation • 2 Jan 2024 • Chunyi Li, HaoNing Wu, ZiCheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai
With the rapid evolution of Text-to-Image (T2I) models in recent years, their unsatisfactory generation results have become a challenge.
1 code implementation • CVPR 2024 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.
1 code implementation • CVPR 2024 • Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, HaoNing Wu, Qiong Yan, Weisi Lin
Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNNs and a local distortion injector to inject the local distortion features into ViT.
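A minimal sketch of the extractor/injector idea, assuming cross-attention as the fusion mechanism; the paper's exact injector design, layer sizes, and number of heads may differ.

```python
import torch
import torch.nn as nn

class LocalDistortionInjector(nn.Module):
    def __init__(self, vit_dim=768, cnn_dim=256, heads=8):
        super().__init__()
        self.proj = nn.Linear(cnn_dim, vit_dim)          # align CNN channels to ViT width
        self.attn = nn.MultiheadAttention(vit_dim, heads, batch_first=True)

    def forward(self, vit_tokens, cnn_feat):
        # cnn_feat: (B, C, H, W) -> (B, H*W, C) sequence of local descriptors
        b, c, h, w = cnn_feat.shape
        local = self.proj(cnn_feat.flatten(2).transpose(1, 2))
        injected, _ = self.attn(vit_tokens, local, local)  # tokens query local cues
        return vit_tokens + injected                       # residual injection

tokens = torch.randn(2, 197, 768)      # ViT tokens (CLS + 14x14 patches)
feat = torch.randn(2, 256, 14, 14)     # CNN distortion features
print(LocalDistortionInjector()(tokens, feat).shape)  # torch.Size([2, 197, 768])
```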
1 code implementation • 28 Dec 2023 • HaoNing Wu, ZiCheng Zhang, Weixia Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Yixuan Gao, Annan Wang, Erli Zhang, Wenxiu Sun, Qiong Yan, Xiongkuo Min, Guangtao Zhai, Weisi Lin
The explosion of visual content available online underscores the requirement for an accurate machine assessor to robustly evaluate scores across diverse types of visual contents.
Ranked #1 on Video Quality Assessment on LIVE-FB LSVQ
no code implementations • 23 Dec 2023 • ZiCheng Zhang, HaoNing Wu, Zhongpeng Ji, Chunyi Li, Erli Zhang, Wei Sun, Xiaohong Liu, Xiongkuo Min, Fengyu Sun, Shangling Jui, Weisi Lin, Guangtao Zhai
Recent advancements in Multi-modality Large Language Models (MLLMs) have demonstrated remarkable capabilities in complex high-level vision tasks.
1 code implementation • 9 Dec 2023 • Zijian Chen, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang
In this paper, we take the first step to benchmark and assess the visual naturalness of AI-generated images.
1 code implementation • 9 Dec 2023 • Chaofeng Chen, Shangchen Zhou, Liang Liao, HaoNing Wu, Wenxiu Sun, Qiong Yan, Weisi Lin
Distortion removal involves simple HQ token prediction with LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network.
1 code implementation • 27 Nov 2023 • Chaofeng Chen, Annan Wang, HaoNing Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin
While fine-tuning the U-Net can partially improve performance, it still suffers from the suboptimal text encoder.
2 code implementations • CVPR 2024 • HaoNing Wu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin
Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, in which a single model can respond to a broad range of natural human instructions.
1 code implementation • 25 Sep 2023 • HaoNing Wu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin
To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment.
no code implementations • 23 Aug 2023 • Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, HaoNing Wu, Qiong Yan, Weisi Lin
Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.
1 code implementation • 6 Aug 2023 • Chaofeng Chen, Jiadi Mo, Jingwen Hou, HaoNing Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin
Our approach to IQA involves the design of a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner.
Ranked #11 on Video Quality Assessment on MSU SR-QA Dataset
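A minimal sketch of top-down propagation in this spirit: deep, semantic features are upsampled and merged into shallower ones before a pooled quality head. The channel widths and 1x1 lateral fusion are illustrative, not CFANet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, dims=(512, 256, 64), width=64):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(d, width, 1) for d in dims)
        self.head = nn.Linear(width, 1)    # regress a quality score

    def forward(self, feats):              # feats ordered deepest -> shallowest
        x = self.lateral[0](feats[0])
        for lat, f in zip(self.lateral[1:], feats[1:]):
            # upsample coarse semantics and merge into finer low-level features
            x = F.interpolate(x, size=f.shape[-2:], mode="nearest") + lat(f)
        return self.head(x.mean(dim=(2, 3)))  # pooled features -> score

feats = [torch.randn(1, 512, 7, 7), torch.randn(1, 256, 14, 14), torch.randn(1, 64, 56, 56)]
print(TopDownFusion()(feats).shape)  # torch.Size([1, 1])
```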
no code implementations • 19 Jul 2023 • Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, Yusheng Zhang, Rongyu Zhang, Hang Shi, Qihang Xu, Longan Xiao, Zhiliang Ma, Mirko Agarla, Luigi Celona, Claudio Rota, Raimondo Schettini, Zhiwei Huang, Yanan Li, Xiaotao Wang, Lei Lei, Hongye Liu, Wei Hong, Ironhead Chuang, Allen Lin, Drake Guan, Iris Chen, Kae Lou, Willy Huang, Yachun Tasi, Yvonne Kao, Haotian Fan, Fangyuan Kong, Shiqi Zhou, Hao liu, Yu Lai, Shanshan Chen, Wenqi Wang, HaoNing Wu, Chaofeng Chen, Chunzheng Zhu, Zekun Guo, Shiling Zhao, Haibing Yin, Hongkui Wang, Hanene Brachemi Meftah, Sid Ahmed Fezza, Wassim Hamidouche, Olivier Déforges, Tengfei Shi, Azadeh Mansouri, Hossein Motamednia, Amir Hossein Bakhtiari, Ahmad Mahmoudi Aznaveh
61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions.
1 code implementation • 6 Jul 2023 • ZiCheng Zhang, Wei Sun, Yingjie Zhou, HaoNing Wu, Chunyi Li, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans.
1 code implementation • 24 Jun 2023 • HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya Zhang, Yanfeng Wang
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
1 code implementation • 7 Jun 2023 • Chunyi Li, ZiCheng Zhang, HaoNing Wu, Wei Sun, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
With the rapid advancements of the text-to-image generative model, AI-generated images (AGIs) have been widely applied to entertainment, education, social media, etc.
1 code implementation • 1 Jun 2023 • Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie
Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.
1 code implementation • 22 May 2023 • HaoNing Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate to specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g., sharpness of a video).
2 code implementations • 28 Apr 2023 • HaoNing Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin
The proliferation of videos collected during in-the-wild natural settings has pushed the development of effective Video Quality Assessment (VQA) methodologies.
2 code implementations • 26 Feb 2023 • HaoNing Wu, Liang Liao, Jingwen Hou, Chaofeng Chen, Erli Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
Recent learning-based video quality assessment (VQA) algorithms are expensive to implement due to the cost of data collection of human quality opinions, and are less robust across various scenarios due to the biases of these opinions.
3 code implementations • ICCV 2023 • HaoNing Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
In light of this, we propose the Disentangled Objective Video Quality Evaluator (DOVER) to learn the quality of UGC videos based on the two perspectives.
Ranked #1 on Video Quality Assessment on YouTube-UGC
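The two-perspective readout can be summarized in a few lines: score each perspective separately, then fuse. The equal fusion weights below are placeholders, not the paper's learned weighting.

```python
def dover_style_score(aesthetic: float, technical: float,
                      w_aesthetic: float = 0.5, w_technical: float = 0.5) -> float:
    """Fuse per-perspective scores (each in [0, 1]) into an overall score."""
    return w_aesthetic * aesthetic + w_technical * technical

# A video can look pleasant (high aesthetic) yet heavily compressed (low technical):
print(dover_style_score(aesthetic=0.8, technical=0.3))  # 0.55
```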
4 code implementations • 11 Oct 2022 • HaoNing Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Jinwei Gu, Weisi Lin
On the other hand, existing practices, such as resizing and cropping, will change the quality of original videos due to the loss of details and contents, and are therefore harmful to quality assessment.
Ranked #2 on Video Quality Assessment on KoNViD-1k (using extra training data)
1 code implementation • 8 Jul 2022 • Liang Liao, Kangmin Xu, HaoNing Wu, Chaofeng Chen, Wenxiu Sun, Qiong Yan, Weisi Lin
Experiments show that the perceptual representation in the HVS is an effective way of predicting subjective temporal quality, and thus TPQI can, for the first time, achieve comparable performance to the spatial quality metric and be even more effective in assessing videos with large temporal variations.
4 code implementations • 6 Jul 2022 • HaoNing Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin
Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations.
Ranked #4 on Video Quality Assessment on LIVE-VQC (using extra training data)
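A minimal sketch of fragment sampling as described for FAST-VQA: split the frame into a uniform grid, take one raw-resolution mini-patch per cell, and splice the patches back in their original order so native textures survive without resizing. The 7x7 grid of 32x32 patches matches the common setting, but treat the sizes here as illustrative.

```python
import numpy as np

def sample_fragments(frame: np.ndarray, grid: int = 7, patch: int = 32) -> np.ndarray:
    """Splice one randomly placed mini-patch per grid cell into a small canvas.

    Assumes each grid cell is at least `patch` pixels on a side.
    """
    h, w, c = frame.shape
    rng = np.random.default_rng(0)  # seeded for reproducibility
    out = np.empty((grid * patch, grid * patch, c), dtype=frame.dtype)
    for i in range(grid):
        for j in range(grid):
            y0 = i * h // grid + rng.integers(0, max(h // grid - patch, 1))
            x0 = j * w // grid + rng.integers(0, max(w // grid - patch, 1))
            out[i*patch:(i+1)*patch, j*patch:(j+1)*patch] = \
                frame[y0:y0+patch, x0:x0+patch]
    return out

print(sample_fragments(np.zeros((720, 1280, 3), dtype=np.uint8)).shape)  # (224, 224, 3)
```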
1 code implementation • 20 Jun 2022 • HaoNing Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin
Based on the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method to tackle these two issues.
Ranked #6 on Video Quality Assessment on KoNViD-1k
1 code implementation • CVPR 2022 • Baisong Guo, Xiaoyun Zhang, HaoNing Wu, Yu Wang, Ya Zhang, Yan-Feng Wang
Previous super-resolution (SR) approaches often formulate SR as a regression problem with pixel-wise restoration, which leads to blurry and unrealistic SR outputs.
no code implementations • 25 Sep 2020 • Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, WangMeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou, Zhipeng Luo, Yuehan Yao, Zhenyu Xu, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Tongtong Zhao, Shanshan Zhao, Yoseob Han, Byung-Hoon Kim, JaeHyun Baek, HaoNing Wu, Dejia Xu, Bo Zhou, Wei Guan, Xiaobo Li, Chen Ye, Hao Li, Yukai Shi, Zhijing Yang, Xiaojun Yang, Haoyu Zhong, Xin Li, Xin Jin, Yaojun Wu, Yingxue Pang, Sen Liu, Zhi-Song Liu, Li-Wen Wang, Chu-Tak Li, Marie-Paule Cani, Wan-Chi Siu, Yuanbo Zhou, Rao Muhammad Umer, Christian Micheloni, Xiaofeng Cong, Rajat Gupta, Keon-Hee Ahn, Jun-Hyuk Kim, Jun-Ho Choi, Jong-Seok Lee, Feras Almasri, Thomas Vandamme, Olivier Debeir
This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020.