Search Results for author: HaoNing Wu

Found 70 papers, 55 papers with code

VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

1 code implementation29 May 2025 Yuanxin Liu, Kun Ouyang, HaoNing Wu, Yi Liu, Lin Sui, Xinhao Li, Yan Zhong, Y. Charles, Xinyu Zhou, Xu sun

Recent studies have shown that long chain-of-thought (CoT) reasoning can significantly enhance the performance of large language models (LLMs) on complex tasks.

Video Understanding

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

1 code implementation22 May 2025 HaoNing Wu, Xiao Huang, Yaohui Chen, Ya zhang, Yanfeng Wang, Weidi Xie

Multimodal large language models (MLLMs) have achieved impressive success in question-answering tasks, yet their capabilities for spatial understanding are less explored.

Motion Estimation Question Answering +1

Multi-Agent System for Comprehensive Soccer Understanding

no code implementations6 May 2025 Jiayuan Rao, Zifeng Li, HaoNing Wu, Ya zhang, Yanfeng Wang, Weidi Xie

Recent advancements in AI-driven soccer understanding have demonstrated rapid progress, yet existing research predominantly focuses on isolated or narrow tasks.

Kimi-VL Technical Report

1 code implementation10 Apr 2025 Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, HaoNing Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen, Zongyu Lin

We present Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities - all while activating only 2.8B parameters in its language decoder (Kimi-VL-A3B).

Long-Context Understanding Mathematical Reasoning +4
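
The Kimi-VL entry highlights a sparse Mixture-of-Experts design in which only about 2.8B language-decoder parameters are active per token. Purely as a hedged illustration of that general idea, and not Kimi-VL's actual architecture, the sketch below shows top-k expert routing in PyTorch; the hidden size, expert count, and top_k value are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy Mixture-of-Experts layer: only top_k experts run per token,
    so the number of *active* parameters is a fraction of the total."""
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                               # x: (tokens, dim)
        gate = F.softmax(self.router(x), dim=-1)        # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)    # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 512)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 512])
```

Because only top_k of the experts run per token, the active parameter count stays at roughly top_k / num_experts of the total expert parameters, which is the rough intuition behind a small "activated" parameter budget.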

Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning

1 code implementation2 Apr 2025 Yiting Lu, Xin Li, HaoNing Wu, Bingchen Li, Weisi Lin, Zhibo Chen

To mitigate this, we propose a new paradigm for perception-oriented instruction tuning, i.e., Q-Adapt, which aims to eliminate the conflicts and achieve synergy between these two EIQA tasks when adapting LMMs, resulting in enhanced multi-faceted explanations of IQA.

Attribute Image Quality Assessment +2

SpaceR: Reinforcing MLLMs in Video Spatial Reasoning

2 code implementations2 Apr 2025 Kun Ouyang, Yuanxin Liu, HaoNing Wu, Yi Liu, Hao Zhou, Jie zhou, Fandong Meng, Xu sun

Motivated by the success of Reinforcement Learning with Verifiable Reward (RLVR) in unlocking LLM reasoning abilities, this work aims to improve MLLMs in video spatial reasoning through the RLVR paradigm.

MME Spatial Reasoning +2

Image Quality Assessment: From Human to Machine Preference

1 code implementation CVPR 2025 Chunyi Li, Yuan Tian, Xiaoyue Ling, ZiCheng Zhang, Haodong Duan, HaoNing Wu, Ziheng Jia, Xiaohong Liu, Xiongkuo Min, Guo Lu, Weisi Lin, Guangtao Zhai

Considering the huge gap between human and machine visual systems, this paper is the first to propose the topic of Image Quality Assessment for Machine Vision.

Image Quality Assessment

Generative Frame Sampler for Long Video Understanding

no code implementations12 Mar 2025 Linli Yao, HaoNing Wu, Kun Ouyang, Yuanxing Zhang, Caiming Xiong, Bei Chen, Xu sun, Junnan Li

Despite recent advances in Video Large Language Models (VideoLLMs), effectively understanding long-form videos remains a significant challenge.

Video Understanding

Teaching LMMs for Image Quality Scoring and Interpreting

1 code implementation12 Mar 2025 ZiCheng Zhang, HaoNing Wu, Ziheng Jia, Weisi Lin, Guangtao Zhai

With this joint learning framework and corresponding training strategy, we develop Q-SiT, the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini.

Descriptive Image Quality Assessment +2

Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs

no code implementations CVPR 2025 ZiCheng Zhang, Ziheng Jia, HaoNing Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

With the rising interest in research on Large Multi-modal Models (LMMs) for video understanding, many studies have emphasized general video comprehension capabilities, neglecting the systematic exploration into video quality understanding.

Multiple-choice Video Generation +1

MRGen: Diffusion-based Controllable Data Engine for MRI Segmentation towards Unannotated Modalities

1 code implementation4 Dec 2024 HaoNing Wu, Ziheng Zhao, Ya zhang, Weidi Xie, Yanfeng Wang

Medical image segmentation has recently demonstrated impressive progress with deep neural networks, yet the heterogeneous modalities and scarcity of mask annotations limit the development of segmentation models on unannotated modalities.

Image Generation Image Segmentation +4

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

no code implementations CVPR 2025 Ziyang Luo, HaoNing Wu, Dongxu Li, Jing Ma, Mohan Kankanhalli, Junnan Li

To further streamline our evaluation, we introduce VideoAutoBench as an auxiliary benchmark, where human annotators label winners in a subset of VideoAutoArena battles.

Chatbot Multiple-choice +2

VQA$^2$: Visual Question Answering for Video Quality Assessment

1 code implementation6 Nov 2024 Ziheng Jia, ZiCheng Zhang, Jiaying Qian, HaoNing Wu, Wei Sun, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min

To address this gap, we introduce the VQA2 Instruction Dataset - the first visual question answering instruction dataset that focuses on video quality assessment.

Question Answering Video Quality Assessment +1

R-Bench: Are your Large Multimodal Model Robust to Real-world Corruptions?

1 code implementation7 Oct 2024 Chunyi Li, Jianbo Zhang, ZiCheng Zhang, HaoNing Wu, Yuan Tian, Wei Sun, Guo Lu, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

However, various corruptions in the real world mean that images will not be as ideal as in simulations, presenting significant challenges for the practical application of LMMs.

Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs

no code implementations30 Sep 2024 ZiCheng Zhang, Ziheng Jia, HaoNing Wu, Chunyi Li, Zijian Chen, Yingjie Zhou, Wei Sun, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

With the rising interest in research on Large Multi-modal Models (LMMs) for video understanding, many studies have emphasized general video comprehension capabilities, neglecting the systematic exploration into video quality understanding.

Benchmarking Multiple-choice +2

Explore the Hallucination on Low-level Perception for MLLMs

no code implementations15 Sep 2024 Yinan Sun, ZiCheng Zhang, HaoNing Wu, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min

However, these models also exhibit hallucinations, which limit their reliability as AI systems, especially in tasks involving low-level visual perception and understanding.

Hallucination Question Answering +1

LIME: Less Is More for MLLM Evaluation

2 code implementations10 Sep 2024 King Zhu, Qianbo Zang, Shian Jia, Siwei Wu, Feiteng Fang, Yizhi Li, Shawn Gavin, Tuney Zheng, Jiawei Guo, Bo Li, HaoNing Wu, Xingwei Qu, Jian Yang, Zachary Liu, Xiang Yue, J. H. Liu, Chenghua Lin, Min Yang, Shiwen Ni, Wenhao Huang, Ge Zhang

However, many of these benchmarks include overly simple or uninformative samples, complicating the effective distinction of different MLLMs' performance.

Image Captioning Question Answering +1

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

1 code implementation20 Aug 2024 HaoNing Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya zhang, Yanfeng Wang

Diffusion models have emerged as frontrunners in text-to-image generation, but their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic deviations and object replication.

Denoising Scheduling +2

Q-Ground: Image Quality Grounding with Large Multi-modality Models

1 code implementation24 Jul 2024 Chaofeng Chen, Sensen Yang, HaoNing Wu, Liang Liao, ZiCheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content.

Image Quality Assessment

LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

1 code implementation22 Jul 2024 HaoNing Wu, Dongxu Li, Bei Chen, Junnan Li

In addition, our results indicate that performance on the benchmark improves only when models are capable of processing more frames, positioning LongVideoBench as a valuable benchmark for evaluating future-generation long-context LMMs.

Multiple-choice Question Answering +2

MatchTime: Towards Automatic Soccer Game Commentary Generation

1 code implementation26 Jun 2024 Jiayuan Rao, HaoNing Wu, Chang Liu, Yanfeng Wang, Weidi Xie

Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience.

CMC-Bench: Towards a New Paradigm of Visual Signal Compression

1 code implementation13 Jun 2024 Chunyi Li, Xiele Wu, HaoNing Wu, Donghui Feng, ZiCheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged.

Image Compression Image to text

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

1 code implementation5 Jun 2024 ZiCheng Zhang, HaoNing Wu, Chunyi Li, Yingjie Zhou, Wei Sun, Xiongkuo Min, Zijian Chen, Xiaohong Liu, Weisi Lin, Guangtao Zhai

How to accurately and efficiently assess AI-generated images (AIGIs) remains a critical challenge for generative models.

Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare

1 code implementation29 May 2024 Hanwei Zhu, HaoNing Wu, Yixuan Li, ZiCheng Zhang, Baoliang Chen, Lingyu Zhu, Yuming Fang, Guangtao Zhai, Weisi Lin, Shiqi Wang

Extensive experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training with the converted single-image quality score used for inference, surpassing state-of-the-art IQA models across diverse scenarios.

Dual-Branch Network for Portrait Image Quality Assessment

1 code implementation14 May 2024 Wei Sun, Weixia Zhang, Yanwei Jiang, HaoNing Wu, ZiCheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai

We employ the fidelity loss to train the model in a learning-to-rank manner to mitigate inconsistencies in quality scores in the portrait image quality assessment dataset PIQ.

Face Image Quality Assessment Image Quality Assessment +3
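
The Dual-Branch Network entry above trains with a fidelity loss in a learning-to-rank manner. As a minimal, hedged sketch of one common pairwise fidelity-loss formulation (Thurstone-style preference probabilities), not necessarily the exact variant used on PIQ, consider:

```python
import torch

def fidelity_loss(pred_i, pred_j, mos_i, mos_j, eps=1e-8):
    """Pairwise fidelity loss for learning-to-rank quality prediction.

    Ground-truth and predicted preference probabilities follow a
    Thurstone-style model: p = Phi((q_i - q_j) / sqrt(2))."""
    normal = torch.distributions.Normal(0.0, 1.0)
    p = normal.cdf((mos_i - mos_j) / (2.0 ** 0.5))        # target preference probability
    p_hat = normal.cdf((pred_i - pred_j) / (2.0 ** 0.5))  # predicted preference probability
    loss = 1.0 - torch.sqrt(p * p_hat + eps) - torch.sqrt((1.0 - p) * (1.0 - p_hat) + eps)
    return loss.mean()

# toy batch of predicted scores and ground-truth MOS (shapes are hypothetical)
pred = torch.randn(8, requires_grad=True)
mos = torch.rand(8) * 5
print(fidelity_loss(pred[:4], pred[4:], mos[:4], mos[4:]))
```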

Enhancing Blind Video Quality Assessment with Rich Quality-aware Features

1 code implementation14 May 2024 Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai

Motivated by previous research that leverages pre-trained features extracted from various computer vision models as the feature representation for BVQA, we further explore rich quality-aware features from pre-trained blind image quality assessment (BIQA) and BVQA models as auxiliary features to help the BVQA model handle complex distortions and the diverse content of social media videos.

Video Quality Assessment
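
The entry above fuses auxiliary quality-aware features from pretrained BIQA/BVQA models with the main BVQA representation. The toy sketch below illustrates one straightforward fusion-and-regression head under that assumption; the feature dimensions and pooling scheme are hypothetical and simpler than the paper's.

```python
import torch
import torch.nn as nn

class FeatureFusionBVQA(nn.Module):
    """Toy BVQA head that fuses base spatiotemporal features with
    auxiliary quality-aware features from frozen pretrained IQA/VQA models."""
    def __init__(self, base_dim=768, aux_dim=512):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(base_dim + aux_dim, 256), nn.ReLU(), nn.Linear(256, 1)
        )

    def forward(self, base_feats, aux_feats):
        # base_feats: (batch, frames, base_dim)  -- from the main BVQA backbone
        # aux_feats:  (batch, frames, aux_dim)   -- from frozen quality-aware extractors
        fused = torch.cat([base_feats, aux_feats], dim=-1).mean(dim=1)  # temporal pooling
        return self.regressor(fused).squeeze(-1)                        # predicted quality

scores = FeatureFusionBVQA()(torch.randn(2, 16, 768), torch.randn(2, 16, 512))
print(scores.shape)  # torch.Size([2])
```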

G-Refine: A General Quality Refiner for Text-to-Image Generation

1 code implementation29 Apr 2024 Chunyi Li, HaoNing Wu, Hongkun Hao, ZiCheng Zhang, Tengchuan Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Based on the mechanisms of the Human Visual System (HVS) and syntax trees, the first two indicators can respectively identify the perception and alignment deficiencies, and the last module can apply targeted quality enhancement accordingly.

Text to Image Generation Text-to-Image Generation

LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

1 code implementation28 Apr 2024 ZiCheng Zhang, HaoNing Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored.

Point Cloud Quality Assessment

NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

no code implementations25 Apr 2024 Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, HaoNing Wu, Yixuan Gao, Yuqin Cao, ZiCheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, Jianquan Yang, Weigang Wang, Xi Fang, Xiaoxin Lv, Jun Yan, Tianwu Zhi, Yabin Zhang, Yaohui Li, Yang Li, Jingwen Xu, Jianzhao Liu, Yiting Liao, Junlin Li, Zihao Yu, Yiting Lu, Xin Li, Hossein Motamednia, S. Farhad Hosseini-Benvidi, Fengbin Guan, Ahmad Mahmoudi-Aznaveh, Azadeh Mansouri, Ganzorig Gankhuyag, Kihwan Yoon, Yifang Xu, Haotian Fan, Fangyuan Kong, Shiling Zhao, Weifeng Dong, Haibing Yin, Li Zhu, Zhiling Wang, Bingchen Huang, Avinab Saha, Sandeep Mishra, Shashank Gupta, Rajesh Sureddi, Oindrila Saha, Luigi Celona, Simone Bianco, Paolo Napoletano, Raimondo Schettini, Junfeng Yang, Jing Fu, Wei zhang, Wenzhi Cao, Limei Liu, Han Peng, Weijun Yuan, Zhan Li, Yihang Cheng, Yifan Deng, Haohui Li, Bowen Qu, Yao Li, Shuqing Luo, Shunzhou Wang, Wei Gao, Zihao Lu, Marcos V. Conde, Xinrui Wang, Zhibo Chen, Ruling Liao, Yan Ye, Qiulin Wang, Bing Li, Zhaokun Zhou, Miao Geng, Rui Chen, Xin Tao, Xiaoyu Liang, Shangkun Sun, Xingyuan Ma, Jiaze Li, Mengduo Yang, Haoran Xu, Jie zhou, Shiding Zhu, Bohan Yu, Pengfei Chen, Xinrui Xu, Jiabin Shen, Zhichao Duan, Erfan Asadi, Jiahe Liu, Qi Yan, Youran Qu, Xiaohui Zeng, Lele Wang, Renjie Liao

A total of 196 participants have registered in the video track.

Image Quality Assessment Image Restoration +2

AIGIQA-20K: A Large Database for AI-Generated Image Quality Assessment

no code implementations4 Apr 2024 Chunyi Li, Tengchuan Kou, Yixuan Gao, Yuqin Cao, Wei Sun, ZiCheng Zhang, Yingjie Zhou, Zhichao Zhang, Weixia Zhang, HaoNing Wu, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai

With the rapid advancements in AI-Generated Content (AIGC), AI-Generated Images (AIGIs) have been widely applied in entertainment, education, and social media.

Image Quality Assessment

Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment

1 code implementation18 Mar 2024 Tengchuan Kou, Xiaohong Liu, ZiCheng Zhang, Chunyi Li, HaoNing Wu, Xiongkuo Min, Guangtao Zhai, Ning Liu

Based on T2VQA-DB, we propose a novel transformer-based model for subjective-aligned Text-to-Video Quality Assessment (T2VQA).

Language Modeling Language Modelling +3

Towards Open-ended Visual Quality Comparison

no code implementations26 Feb 2024 HaoNing Wu, Hanwei Zhu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Chunyi Li, Annan Wang, Wenxiu Sun, Qiong Yan, Xiaohong Liu, Guangtao Zhai, Shiqi Wang, Weisi Lin

Comparative settings (e.g., pairwise choice, listwise ranking) have been adopted by a wide range of subjective studies for image quality assessment (IQA), as they inherently standardize the evaluation criteria across different observers and offer more clear-cut responses.

Image Quality Assessment

MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model

2 code implementations26 Feb 2024 Chunyi Li, Guo Lu, Donghui Feng, HaoNing Wu, ZiCheng Zhang, Xiaohong Liu, Guangtao Zhai, Weisi Lin, Wenjun Zhang

With the evolution of storage and communication protocols, ultra-low bitrate image compression has become a highly demanding topic.

Decoder Image Compression +1

Q-Bench+: A Benchmark for Multi-modal Foundation Models on Low-level Vision from Single Images to Pairs

1 code implementation11 Feb 2024 ZiCheng Zhang, HaoNing Wu, Erli Zhang, Guangtao Zhai, Weisi Lin

To this end, we design benchmark settings to emulate human language responses related to low-level vision: the low-level visual perception (A1) via visual question answering related to low-level attributes (e.g., clarity, lighting); and the low-level visual description (A2), on evaluating MLLMs for low-level text descriptions.

Image Quality Assessment Question Answering +1

Q-Refine: A Perceptual Quality Refiner for AI-Generated Image

1 code implementation2 Jan 2024 Chunyi Li, HaoNing Wu, ZiCheng Zhang, Hongkun Hao, Kaiwei Zhang, Lei Bai, Xiaohong Liu, Xiongkuo Min, Weisi Lin, Guangtao Zhai

With the rapid evolution of Text-to-Image (T2I) models in recent years, their unsatisfactory generation results have become a challenge.

Image Quality Assessment

Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation CVPR 2024 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation but still struggle to generate image sequences coherently.

Text to Image Generation Text-to-Image Generation +1

Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

1 code implementation CVPR 2024 Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, HaoNing Wu, Qiong Yan, Weisi Lin

Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNNs and a local distortion injector to inject the local distortion features into ViT.

Image Quality Assessment Inductive Bias +1

Exploring the Naturalness of AI-Generated Images

1 code implementation9 Dec 2023 Zijian Chen, Wei Sun, HaoNing Wu, ZiCheng Zhang, Jun Jia, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

In this paper, we take the first step to benchmark and assess the visual naturalness of AI-generated images.

Iterative Token Evaluation and Refinement for Real-World Super-Resolution

1 code implementation9 Dec 2023 Chaofeng Chen, Shangchen Zhou, Liang Liao, HaoNing Wu, Wenxiu Sun, Qiong Yan, Weisi Lin

Distortion removal involves simple HQ token prediction with LQ images, while texture generation uses a discrete diffusion model to iteratively refine the distortion removal output with a token refinement network.

Image Super-Resolution Texture Synthesis

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

1 code implementation27 Nov 2023 Chaofeng Chen, Annan Wang, HaoNing Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin

While fine-tuning the U-Net can partially improve performance, it still suffers from the suboptimal text encoder.

reinforcement-learning Reinforcement Learning

Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models

2 code implementations CVPR 2024 HaoNing Wu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Kaixin Xu, Chunyi Li, Jingwen Hou, Guangtao Zhai, Geng Xue, Wenxiu Sun, Qiong Yan, Weisi Lin

Multi-modality foundation models, as represented by GPT-4V, have brought a new paradigm for low-level visual perception and understanding tasks, as they can respond to a broad range of natural human instructions within a single model.

Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision

1 code implementation25 Sep 2023 HaoNing Wu, ZiCheng Zhang, Erli Zhang, Chaofeng Chen, Liang Liao, Annan Wang, Chunyi Li, Wenxiu Sun, Qiong Yan, Guangtao Zhai, Weisi Lin

To address this gap, we present Q-Bench, a holistic benchmark crafted to systematically evaluate potential abilities of MLLMs on three realms: low-level visual perception, low-level visual description, and overall visual quality assessment.

Image Quality Assessment

Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment

no code implementations23 Aug 2023 Kangmin Xu, Liang Liao, Jing Xiao, Chaofeng Chen, HaoNing Wu, Qiong Yan, Weisi Lin

Further, we propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.

Image Quality Assessment Inductive Bias +1

TOPIQ: A Top-down Approach from Semantics to Distortions for Image Quality Assessment

1 code implementation6 Aug 2023 Chaofeng Chen, Jiadi Mo, Jingwen Hou, HaoNing Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin

Our approach to IQA involves the design of a heuristic coarse-to-fine network (CFANet) that leverages multi-scale features and progressively propagates multi-level semantic information to low-level representations in a top-down manner.

Local Distortion Video Quality Assessment
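
TOPIQ's abstract describes a coarse-to-fine network (CFANet) that propagates multi-level semantic information top-down into low-level representations. The following is a hedged toy block showing one way such top-down gating could look; it is not the actual CFANet module, and the channel and spatial sizes are made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    """Toy coarse-to-fine block: high-level semantic features are upsampled
    and used to modulate (gate) lower-level distortion-sensitive features."""
    def __init__(self, low_ch=256, high_ch=512):
        super().__init__()
        self.gate = nn.Conv2d(high_ch, low_ch, kernel_size=1)   # semantics -> attention map
        self.proj = nn.Conv2d(low_ch, low_ch, kernel_size=3, padding=1)

    def forward(self, low_feat, high_feat):
        # low_feat: (B, 256, H, W), high_feat: (B, 512, H/2, W/2)
        high_up = F.interpolate(high_feat, size=low_feat.shape[-2:],
                                mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.gate(high_up))     # where semantics say quality matters
        return self.proj(low_feat * attn)            # semantically re-weighted low-level features

out = TopDownFusion()(torch.randn(1, 256, 56, 56), torch.randn(1, 512, 28, 28))
print(out.shape)  # torch.Size([1, 256, 56, 56])
```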

Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation

1 code implementation6 Jul 2023 ZiCheng Zhang, Wei Sun, Yingjie Zhou, HaoNing Wu, Chunyi Li, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans.

Boost Video Frame Interpolation via Motion Adaptation

1 code implementation24 Jun 2023 HaoNing Wu, Xiaoyun Zhang, Weidi Xie, Ya zhang, Yanfeng Wang

Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.

Motion Estimation Video Frame Interpolation

AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment

1 code implementation7 Jun 2023 Chunyi Li, ZiCheng Zhang, HaoNing Wu, Wei Sun, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

With the rapid advancements of the text-to-image generative model, AI-generated images (AGIs) have been widely applied to entertainment, education, social media, etc.

Image Quality Assessment

Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models

1 code implementation1 Jun 2023 Chang Liu, HaoNing Wu, Yujie Zhong, Xiaoyun Zhang, Yanfeng Wang, Weidi Xie

Generative models have recently exhibited exceptional capabilities in text-to-image generation, but still struggle to generate image sequences coherently.

Story Visualization Style Transfer +3

Towards Explainable In-the-Wild Video Quality Assessment: A Database and a Language-Prompted Approach

1 code implementation22 May 2023 HaoNing Wu, Erli Zhang, Liang Liao, Chaofeng Chen, Jingwen Hou, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Though subjective studies have collected overall quality scores for these videos, how the abstract quality scores relate to specific factors is still obscure, hindering VQA methods from more concrete quality evaluations (e.g., the sharpness of a video).

Video Quality Assessment Visual Question Answering (VQA)

Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment

2 code implementations28 Apr 2023 HaoNing Wu, Liang Liao, Annan Wang, Chaofeng Chen, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin

The proliferation of videos collected during in-the-wild natural settings has pushed the development of effective Video Quality Assessment (VQA) methodologies.

Video Quality Assessment Visual Question Answering (VQA)

Exploring Opinion-unaware Video Quality Assessment with Semantic Affinity Criterion

2 code implementations26 Feb 2023 HaoNing Wu, Liang Liao, Jingwen Hou, Chaofeng Chen, Erli Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Recent learning-based video quality assessment (VQA) algorithms are expensive to implement due to the cost of data collection of human quality opinions, and are less robust across various scenarios due to the biases of these opinions.

Video Quality Assessment Visual Question Answering (VQA)
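
The opinion-unaware entry above relies on a semantic affinity criterion rather than human opinion scores. A minimal sketch in that spirit, assuming the openai CLIP package and a positive/negative quality prompt pair, is shown below; the actual prompts, frame sampling, and any additional temporal or naturalness indices in the paper may differ.

```python
import torch
import clip                      # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
# Hypothetical prompt pair; the paper's actual prompts may differ.
prompts = clip.tokenize(["a high quality photo", "a low quality photo"]).to(device)

def semantic_affinity_score(frame_paths):
    """Zero-shot, opinion-unaware quality index: softmax affinity of each frame
    to a positive vs. negative quality prompt, averaged over sampled frames."""
    scores = []
    with torch.no_grad():
        text_feat = model.encode_text(prompts)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        for path in frame_paths:
            img = preprocess(Image.open(path)).unsqueeze(0).to(device)
            img_feat = model.encode_image(img)
            img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
            logits = 100.0 * img_feat @ text_feat.T             # CLIP-style scaled similarity
            scores.append(logits.softmax(dim=-1)[0, 0].item())  # prob. of "high quality"
    return sum(scores) / len(scores)
```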

Neighbourhood Representative Sampling for Efficient End-to-end Video Quality Assessment

4 code implementations11 Oct 2022 HaoNing Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Jinwei Gu, Weisi Lin

On the other hand, existing practices, such as resizing and cropping, will change the quality of the original videos due to the loss of details and content, and are therefore harmful to quality assessment.

Ranked #2 on Video Quality Assessment on KoNViD-1k (using extra training data)

Video Quality Assessment Visual Question Answering (VQA)

Exploring the Effectiveness of Video Perceptual Representation in Blind Video Quality Assessment

1 code implementation8 Jul 2022 Liang Liao, Kangmin Xu, HaoNing Wu, Chaofeng Chen, Wenxiu Sun, Qiong Yan, Weisi Lin

Experiments show that the perceptual representation in the HVS is an effective way of predicting subjective temporal quality, and thus TPQI can, for the first time, achieve comparable performance to the spatial quality metric and be even more effective in assessing videos with large temporal variations.

Video Quality Assessment Visual Question Answering (VQA)

FAST-VQA: Efficient End-to-end Video Quality Assessment with Fragment Sampling

4 code implementations6 Jul 2022 HaoNing Wu, Chaofeng Chen, Jingwen Hou, Liang Liao, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Consisting of fragments and FANet, the proposed FrAgment Sample Transformer for VQA (FAST-VQA) enables efficient end-to-end deep VQA and learns effective video-quality-related representations.

Ranked #4 on Video Quality Assessment on LIVE-VQC (using extra training data)

Video Quality Assessment
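
FAST-VQA's abstract refers to fragments: small patches sampled on a spatial grid at native resolution and spliced into a compact input. The toy function below sketches that sampling idea; the grid size, patch size, and temporal-alignment details are assumptions rather than the paper's exact configuration.

```python
import torch

def sample_fragments(video, grid=7, patch=32):
    """Toy spatial fragment sampling: split each frame into a grid x grid layout,
    crop one small patch at a random location inside every cell at native
    resolution, and splice the patches back into a compact 'fragment' frame."""
    # video: (frames, channels, H, W); the same crop locations are reused across frames
    t, c, h, w = video.shape
    cell_h, cell_w = h // grid, w // grid
    out = torch.zeros(t, c, grid * patch, grid * patch)
    for i in range(grid):
        for j in range(grid):
            y = i * cell_h + torch.randint(0, max(cell_h - patch, 1), (1,)).item()
            x = j * cell_w + torch.randint(0, max(cell_w - patch, 1), (1,)).item()
            out[:, :, i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                video[:, :, y:y + patch, x:x + patch]
    return out

frames = torch.rand(8, 3, 720, 1280)
print(sample_fragments(frames).shape)  # torch.Size([8, 3, 224, 224])
```

Sampling at native resolution keeps the local distortions and textures that resizing would smooth away, which is the same motivation stated in the Neighbourhood Representative Sampling entry above.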

DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment

1 code implementation20 Jun 2022 HaoNing Wu, Chaofeng Chen, Liang Liao, Jingwen Hou, Wenxiu Sun, Qiong Yan, Weisi Lin

Based on the prominent time-series modeling ability of transformers, we propose a novel and effective transformer-based VQA method to tackle these two issues.

Time Series Analysis Video Quality Assessment +1

LAR-SR: A Local Autoregressive Model for Image Super-Resolution

1 code implementation CVPR 2022 Baisong Guo, Xiaoyun Zhang, HaoNing Wu, Yu Wang, Ya zhang, Yan-Feng Wang

Previous super-resolution (SR) approaches often formulate SR as a regression problem and pixel-wise restoration, which leads to blurry and unrealistic SR outputs.

Image Super-Resolution
