Search Results for author: Zhengzhong Tu

Found 38 papers, 23 papers with code

STAMP: Scalable Task And Model-agnostic Collaborative Perception

1 code implementation · 24 Jan 2025 · Xiangbo Gao, Runsheng Xu, Jiachen Li, Ziran Wang, Zhiwen Fan, Zhengzhong Tu

Perception is crucial for autonomous driving, but single-agent perception is often constrained by sensors' physical limitations, leading to degraded performance under severe occlusion, adverse weather conditions, and when detecting distant objects.

Autonomous Driving

HFMF: Hierarchical Fusion Meets Multi-Stream Models for Deepfake Detection

1 code implementation · 10 Jan 2025 · Anant Mehta, Bryant McArthur, Nagarjuna Kolloju, Zhengzhong Tu

The first component of our approach integrates vision Transformers and convolutional nets through a hierarchical feature fusion mechanism.

DeepFake Detection, Face Swapping, +2 more

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

1 code implementation · 19 Dec 2024 · Shuo Xing, Hongyuan Hua, Xiangbo Gao, Shenzhe Zhu, Renjie Li, Kexin Tian, Xiaopeng Li, Heng Huang, Tianbao Yang, Zhangyang Wang, Yang Zhou, Huaxiu Yao, Zhengzhong Tu

Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs -- an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems.

Autonomous Driving, Benchmarking, +4 more

OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving

1 code implementation · 19 Dec 2024 · Shuo Xing, Chengyuan Qian, Yuping Wang, Hongyuan Hua, Kexin Tian, Yang Zhou, Zhengzhong Tu

Furthermore, OpenEMMA demonstrates effectiveness, generalizability, and robustness across a variety of challenging driving scenarios, offering a more efficient and effective approach to autonomous driving.

Autonomous Driving

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

no code implementations · 6 Dec 2024 · Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu

Recent advances in large-scale text-to-image (T2I) diffusion models have enabled a variety of downstream applications, including style customization, subject-driven personalization, and conditional generation.

Video Quality Assessment: A Comprehensive Survey

1 code implementation · 4 Dec 2024 · Qi Zheng, Yibo Fan, Leilei Huang, Tianyu Zhu, Jiaming Liu, Zhijian Hao, Shuo Xing, Chia-Ju Chen, Xiongkuo Min, Alan C. Bovik, Zhengzhong Tu

Numerous deep learning-based VQA models have been developed, with progress in this direction driven by the creation of content-diverse, large-scale, human-labeled databases that supply ground-truth psychometric video quality data.

Benchmarking, Survey, +2 more

MWFormer: Multi-Weather Image Restoration Using Degradation-Aware Transformers

1 code implementation · 26 Nov 2024 · Ruoxi Zhu, Zhengzhong Tu, Jiaming Liu, Alan C. Bovik, Yibo Fan

Moreover, MWFormer can be tuned at application time for either a single type of weather restoration or hybrid weather restoration without any retraining, offering greater controllability than existing methods.

Contrastive Learning, Image Restoration

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

1 code implementation · 25 Nov 2024 · Hanhui Wang, Yihua Zhang, Ruizheng Bai, Yue Zhao, Sijia Liu, Zhengzhong Tu

Recent advancements in diffusion models have made generative image editing more accessible, enabling creative edits but raising ethical concerns, particularly regarding malicious edits to human portraits that threaten privacy and identity security.

Privacy Preserving

DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection

1 code implementation · 12 Nov 2024 · Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, Yue Zhao

Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection.

Optical Flow Estimation, Out-of-Distribution Detection, +1 more

Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models

1 code implementation · 4 Oct 2024 · Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen

Specifically, using LLaVA-34B, our proposed dynamic contrastive decoding improves average accuracy by 2.24%.

AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results

1 code implementation · 21 Aug 2024 · Maksim Smirnov, Aleksandr Gushchin, Anastasia Antsiferova, Dmitry Vatolin, Radu Timofte, Ziheng Jia, ZiCheng Zhang, Wei Sun, Jiaying Qian, Yuqin Cao, Yinan Sun, Yuxin Zhu, Xiongkuo Min, Guangtao Zhai, Kanjar De, Qing Luo, Ao-Xiang Zhang, Peng Zhang, Haibo Lei, Linyan Jiang, Yaqing Li, Wenhui Meng, Zhenzhong Chen, Zhengxue Cheng, Jiahao Xiao, Jun Xu, Chenlong He, Qi Zheng, Ruoxi Zhu, Min Li, Yibo Fan, Zhengzhong Tu

The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts.

Image Manipulation, valid, +3 more

4K4DGen: Panoramic 4D Generation at 4K Resolution

no code implementations · 19 Jun 2024 · Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

Subsequently, we propose Dynamic Panoramic Lifting to elevate the panoramic video into a 4D immersive environment while preserving spatial and temporal consistency.

4k

Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving

no code implementations · CVPR 2024 · Jinlong Li, Baolu Li, Zhengzhong Tu, Xinyu Liu, Qing Guo, Felix Juefei-Xu, Runsheng Xu, Hongkai Yu

Vision-centric perception systems for autonomous driving have gained considerable attention recently due to their cost-effectiveness and scalability, especially compared to LiDAR-based systems.

Autonomous Driving

Bigger is not Always Better: Scaling Properties of Latent Diffusion Models

no code implementations · 1 Apr 2024 · Kangfu Mei, Zhengzhong Tu, Mauricio Delbracio, Hossein Talebi, Vishal M. Patel, Peyman Milanfar

We study the scaling properties of latent diffusion models (LDMs) with an emphasis on their sampling efficiency.

V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions

no code implementations · 17 Mar 2024 · Baolu Li, Jinlong Li, Xinyu Liu, Runsheng Xu, Zhengzhong Tu, Jiacheng Guo, Xiaopeng Li, Hongkai Yu

In this paper, we propose a domain generalization approach, named V2X-DGW, for LiDAR-based 3D object detection in multi-agent perception systems under adverse weather conditions.

3D Object Detection, Domain Generalization, +2 more

SPIRE: Semantic Prompt-Driven Image Restoration

no code implementations · 18 Dec 2023 · Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process.

Deblurring, Denoising, +2 more

CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation

1 code implementation · CVPR 2024 · Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar

Our conditional-task learning and distillation approach outperforms previous distillation methods, achieving a new state of the art in producing high-quality images with very few steps (e.g., 1-4) across multiple tasks, including super-resolution, text-guided image editing, and depth-to-image generation.

Image Enhancement, Super-Resolution, +1 more

MULLER: Multilayer Laplacian Resizer for Vision

1 code implementation · ICCV 2023 · Zhengzhong Tu, Peyman Milanfar, Hossein Talebi

Specifically, we select a state-of-the-art vision Transformer, MaxViT, as the baseline and show that, when trained with MULLER, MaxViT gains up to 0.6% top-1 accuracy, and reaches similar top-1 accuracy at 36% lower inference cost on ImageNet-1k, compared with the standard training scheme.

Image Classification, Image Quality Assessment, +2 more

CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

2 code implementations · 5 Jul 2022 · Runsheng Xu, Zhengzhong Tu, Hao Xiang, Wei Shao, Bolei Zhou, Jiaqi Ma

Extensive experiments on the V2V perception dataset OPV2V demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation.

3D Object Detection, Autonomous Driving, +2 more

Pik-Fix: Restoring and Colorizing Old Photos

1 code implementation · 4 May 2022 · Runsheng Xu, Zhengzhong Tu, Yuanqi Du, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Alan Bovik, Hongkai Yu

Our proposed framework consists of three modules: a restoration sub-network that conducts restoration from degradations, a similarity network that performs color histogram matching and color transfer, and a colorization subnet that learns to predict the chroma elements of images conditioned on chromatic reference signals.

Colorization

MaxViT: Multi-Axis Vision Transformer

14 code implementations · 4 Apr 2022 · Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li

We also show that our proposed model expresses strong generative modeling capability on ImageNet, demonstrating the superior potential of MaxViT blocks as a universal vision module.

Image Classification, Object Detection, +1 more

Perceptual Quality Assessment of UGC Gaming Videos

no code implementations · 31 Mar 2022 · Xiangxu Yu, Zhengzhong Tu, Neil Birkbeck, Yilin Wang, Balu Adsumilli, Alan C. Bovik

In recent years, with the vigorous development of the video game industry, the proportion of gaming videos on major video websites like YouTube has dramatically increased.

Video Quality Assessment, Visual Question Answering (VQA)

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

1 code implementation · 20 Mar 2022 · Runsheng Xu, Hao Xiang, Zhengzhong Tu, Xin Xia, Ming-Hsuan Yang, Jiaqi Ma

In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improve the perception performance of autonomous vehicles.

3D Object Detection, Autonomous Vehicles, +1 more

ROMNet: Renovate the Old Memories

no code implementations · 5 Feb 2022 · Runsheng Xu, Zhengzhong Tu, Yuanqi Du, Xiaoyu Dong, Jinlong Li, Zibo Meng, Jiaqi Ma, Hongkai Yu

Renovating the memories in old photos is an intriguing research topic in computer vision.

Colorization

MAXIM: Multi-Axis MLP for Image Processing

3 code implementations · CVPR 2022 · Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, Yinxiao Li

In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks.

Deblurring, Image Deblurring, +6 more

FAVER: Blind Quality Prediction of Variable Frame Rate Videos

1 code implementation · 5 Jan 2022 · Qi Zheng, Zhengzhong Tu, Pavan C. Madhusudana, Xiaoyang Zeng, Alan C. Bovik, Yibo Fan

Video quality assessment (VQA) remains an important and challenging problem that affects many applications at the widest scales.

Cloud Computing, Video Quality Assessment, +1 more

Predicting Eye Fixations Under Distortion Using Bayesian Observers

no code implementations · 6 Feb 2021 · Zhengzhong Tu

Visual attention is an essential factor that affects how humans perceive visual signals.

Blocking

Regression or Classification? New Methods to Evaluate No-Reference Picture and Video Quality Models

no code implementations · 30 Jan 2021 · Zhengzhong Tu, Chia-Ju Chen, Li-Heng Chen, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

Video and image quality assessment has long been framed as a regression problem, which requires predicting a continuous quality score given an input stimulus.

General Classification, Image Quality Assessment, +2 more

RAPIQUE: Rapid and Accurate Video Quality Prediction of User Generated Content

1 code implementation · 26 Jan 2021 · Zhengzhong Tu, Xiangxu Yu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

However, these models are either incapable of predicting, or inefficient at predicting, the quality of complex and diverse UGC videos in practical applications.

Video Quality Assessment

Adaptive Debanding Filter

1 code implementation · 22 Sep 2020 · Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Banding artifacts, which manifest as staircase-like color bands on pictures or video frames, are a common distortion caused by compression of low-textured smooth regions.

Quantization

UGC-VQA: Benchmarking Blind Video Quality Assessment for User Generated Content

5 code implementations · 29 May 2020 · Zhengzhong Tu, Yilin Wang, Neil Birkbeck, Balu Adsumilli, Alan C. Bovik

Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms.

Benchmarking, Feature Selection, +2 more

BBAND Index: A No-Reference Banding Artifact Predictor

no code implementations · 27 Feb 2020 · Zhengzhong Tu, Jessie Lin, Yilin Wang, Balu Adsumilli, Alan C. Bovik

Banding artifact, or false contouring, is a common video compression impairment that tends to appear on large flat regions in encoded videos.

Video Compression

Fitness Done Right: a Real-time Intelligent Personal Trainer for Exercise Correction

no code implementations · 30 Oct 2019 · Yun Chen, Yiyue Chen, Zhengzhong Tu

Finally, the pose error detection module computes the values of key features for the two poses, which are then used to generate correction advice.
