Search Results for author: Xinchao Wang

Found 218 papers, 135 papers with code

Hallucinating Visual Instances in Total Absentia

no code implementations ECCV 2020 Jiayan Qiu, Yiding Yang, Xinchao Wang, DaCheng Tao

This seemingly minor difference in fact makes HVITA a much more challenging task, as the restoration algorithm would have to not only infer the category of the object in total absentia, but also hallucinate an object whose appearance is consistent with the background.

Hallucination Image Inpainting +1

Collaboration by Competition: Self-coordinated Knowledge Amalgamation for Multi-talent Student Learning

no code implementations ECCV 2020 Sihui Luo, Wenwen Pan, Xinchao Wang, Dazhou Wang, Haihong Tang, Mingli Song

To this end, we propose a self-coordinated knowledge amalgamation network (SOKA-Net) for learning the multi-talent student model.

Discrete Diffusion in Large Language and Multimodal Models: A Survey

1 code implementation16 Jun 2025 Runpeng Yu, Qi Li, Xinchao Wang

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs).

Denoising

Test3R: Learning to Reconstruct 3D at Test Time

1 code implementation16 Jun 2025 Yuheng Yuan, Qiuhong Shen, Shizun Wang, Xingyi Yang, Xinchao Wang

Extensive experiments demonstrate that our technique significantly outperforms previous state-of-the-art methods on the 3D reconstruction and multi-view depth estimation tasks.

3D Reconstruction Depth Estimation

Diversity-Guided MLP Reduction for Efficient Large Vision Transformers

1 code implementation10 Jun 2025 Chengchao Shen, Hourun Zhu, Gongfan Fang, Jianxin Wang, Xinchao Wang

To this end, we focus on the recoverability of the compressed models and propose a Diversity-Guided MLP Reduction (DGMR) method to significantly reduce the parameters of large vision transformers with only negligible performance degradation.

Diversity

PixelThink: Towards Efficient Chain-of-Pixel Reasoning

no code implementations29 May 2025 Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang

Existing reasoning segmentation approaches typically fine-tune multimodal large language models (MLLMs) using image-text pairs and corresponding mask labels.

Reasoning Segmentation reinforcement-learning +2

Minute-Long Videos with Dual Parallelisms

1 code implementation27 May 2025 Zeqing Wang, Bowen Zheng, Xingyi Yang, Zhenxiong Tan, Yuecong Xu, Xinchao Wang

Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos.

Denoising Video Generation

Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps

no code implementations24 May 2025 Sicheng Feng, Song Wang, Shuyi Ouyang, Lingdong Kong, Zikai Song, Jianke Zhu, Huan Wang, Xinchao Wang

Multimodal large language models (MLLMs) have recently achieved significant progress in visual tasks, including semantic scene understanding and text-image alignment, with reasoning variants enhancing performance on complex tasks involving mathematics and logic.

Scene Understanding Spatial Reasoning +1

VeriThinker: Learning to Verify Makes Reasoning Model Efficient

1 code implementation23 May 2025 Zigeng Chen, Xinyin Ma, Gongfan Fang, Ruonan Yu, Xinchao Wang

Extensive experiments validate that VeriThinker substantially reduces reasoning chain lengths while maintaining or even slightly improving accuracy.

model

Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

1 code implementation22 May 2025 Runpeng Yu, Xinyin Ma, Xinchao Wang

We observe that training with a purely discrete diffusion approach leads to significant training instability, suboptimal performance, and severe length bias issues.

Language Modeling Language Modelling +2

dKV-Cache: The Cache for Diffusion Language Models

2 code implementations21 May 2025 Xinyin Ma, Runpeng Yu, Gongfan Fang, Xinchao Wang

Our approach is motivated by the observation that different tokens have distinct representation dynamics throughout the diffusion process.

Code Generation Denoising

Thinkless: LLM Learns When to Think

1 code implementation19 May 2025 Gongfan Fang, Xinyin Ma, Xinchao Wang

Reasoning Language Models, capable of extended chain-of-thought reasoning, have demonstrated remarkable performance on tasks requiring complex logical inference.

GSM8K Math

Top-Down Compression: Revisit Efficient Vision Token Projection for Visual Instruction Tuning

no code implementations17 May 2025 Bonan Li, ZiCheng Zhang, Songhua Liu, Weihao Yu, Xinchao Wang

Visual instruction tuning aims to enable large language models to comprehend the visual world, with a pivotal challenge lying in establishing an effective vision-to-language projection.

Efficient Reasoning Models: A Survey

1 code implementation15 Apr 2025 Sicheng Feng, Gongfan Fang, Xinyin Ma, Xinchao Wang

Reasoning models have demonstrated remarkable progress in solving complex and logic-intensive tasks by generating extended Chain-of-Thoughts (CoTs) prior to arriving at a final answer.

Knowledge Distillation Model Compression +1

Ultra-Resolution Adaptation with Ease

1 code implementation20 Mar 2025 Ruonan Yu, Songhua Liu, Zhenxiong Tan, Xinchao Wang

Text-to-image diffusion models have achieved remarkable progress in recent years.

2k 4k +1

1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering

no code implementations20 Mar 2025 Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang

(Q1) Short-Lifespan Gaussians: 4DGS uses a large portion of Gaussians with short temporal span to represent scene dynamics, leading to an excessive number of Gaussians.

POSTA: A Go-to Framework for Customized Artistic Poster Generation

no code implementations CVPR 2025 Haoyu Chen, Xiaojie Xu, Wenbo Li, Jingjing Ren, Tian Ye, Songhua Liu, Ying-Cong Chen, Lei Zhu, Xinchao Wang

To train our models, we develop the PosterArt dataset, comprising high-quality artistic posters annotated with layout, typography, and pixel-level stylized text segmentation.

Text Segmentation

OminiControl2: Efficient Conditioning for Diffusion Transformers

1 code implementation11 Mar 2025 Zhenxiong Tan, Qiaochu Xue, Xingyi Yang, Songhua Liu, Xinchao Wang

Fine-grained control of text-to-image diffusion transformer models (DiT) remains a critical challenge for practical deployment.

Conditional Image Generation Denoising

PE3R: Perception-Efficient 3D Reconstruction

1 code implementation10 Mar 2025 Jie Hu, Shizun Wang, Xinchao Wang

PE3R employs a feed-forward architecture to enable rapid 3D semantic field reconstruction.

3D Reconstruction Zero-shot Generalization

Understanding Dataset Distillation via Spectral Filtering

no code implementations3 Mar 2025 Deyu Bo, Songhua Liu, Xinchao Wang

To address this limitation, we further propose Curriculum Frequency Matching (CFM), which gradually adjusts the filter parameter to cover both low- and high-frequency information of the FFC and FLC matrices.

Dataset Distillation Feature Correlation

Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling

no code implementations27 Feb 2025 Hanyang Kong, Xingyi Yang, Xinchao Wang

In response, we introduce Efficient Dynamic Gaussian Splatting (EDGS), which represents dynamic scenes via sparse time-variant attribute modeling.

Attribute

GraphBridge: Towards Arbitrary Transfer Learning in GNNs

1 code implementation26 Feb 2025 Li Ju, Xingyi Yang, Qi Li, Xinchao Wang

Empirical validation, conducted over 16 datasets representative of these scenarios, confirms the framework's capacity for task- and domain-agnostic transfer learning within graph-like data, marking a significant advancement in the field of GNNs.

Transfer Learning

Introducing Visual Perception Token into Multimodal Large Language Model

1 code implementation24 Feb 2025 Runpeng Yu, Xinyin Ma, Xinchao Wang

The Region Selection Token explicitly identifies specific regions in an image that require further perception, while the Vision Re-Encoding Token uses its hidden states as control signals to guide additional visual perception processes.

Language Modeling Language Modelling +3

CoT-Valve: Length-Compressible Chain-of-Thought Tuning

1 code implementation13 Feb 2025 Xinyin Ma, Guangnian Wan, Runpeng Yu, Gongfan Fang, Xinchao Wang

Moreover, we show that this property is valuable for compressing the reasoning chain.

GSM8K

Seeing World Dynamics in a Nutshell

1 code implementation5 Feb 2025 Qiuhong Shen, Xuanyu Yi, Mingbao Lin, Hanwang Zhang, Shuicheng Yan, Xinchao Wang

We consider the problem of efficiently representing casually captured monocular videos in a spatially- and temporally-coherent manner.

Video Reconstruction

Event-based Video Super-Resolution via State Space Models

no code implementations CVPR 2025 Zeyu Xiao, Xinchao Wang

In this paper, we introduce MamEVSR, a Mamba-based network for event-based VSR that leverages the selective state space model, Mamba.

Mamba State Space Models +1

Diffusion Model is Effectively Its Own Teacher

no code implementations CVPR 2025 Xinyin Ma, Runpeng Yu, Songhua Liu, Gongfan Fang, Xinchao Wang

We further validate the effectiveness of our method on text-to-image diffusion models, such as Stable Diffusion, and also observe notable improvement in image quality.

model

Generative Sparse-View Gaussian Splatting

no code implementations CVPR 2025 Hanyang Kong, Xingyi Yang, Xinchao Wang

Novel view synthesis from limited observations remains a significant challenge due to the lack of information in under-sampled regions, often resulting in noticeable artifacts.

Novel View Synthesis

CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation

no code implementations CVPR 2025 Bonan Li, ZiCheng Zhang, Xingyi Yang, Xinchao Wang

To further enhance cross-view consistency and alleviate content drift, CoSER rapidly scans all views in a spiral bidirectional manner to capture holistic information and then scores each point based on semantic material.

3D Generation Text to 3D

Remix-DiT: Mixing Diffusion Transformers for Multi-Expert Denoising

1 code implementation7 Dec 2024 Gongfan Fang, Xinyin Ma, Xinchao Wang

The goal of Remix-DiT is to craft N diffusion experts for different denoising timesteps, yet without the need for expensive training of N independent models.

Denoising
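
Below is a minimal, hypothetical sketch of the mixing idea suggested by the Remix-DiT entry above: instead of training N independent experts, each timestep-specific expert is formed as a learned combination of a small set of shared basis weights. The MixedExpertLinear module, the number of bases/experts, and the expert-to-timestep assignment are illustrative assumptions, not the released implementation.

```python
# A sketch under stated assumptions: experts are learned mixtures of K shared
# basis weight tensors, so N timestep experts cost far less than N full models.
import torch
import torch.nn as nn

class MixedExpertLinear(nn.Module):
    def __init__(self, dim: int, num_bases: int = 4, num_experts: int = 8):
        super().__init__()
        # K shared basis weight matrices (hypothetical single linear layer).
        self.bases = nn.Parameter(torch.randn(num_bases, dim, dim) * 0.02)
        # One learnable mixing vector per expert (i.e., per timestep bin).
        self.mix_logits = nn.Parameter(torch.zeros(num_experts, num_bases))

    def forward(self, x: torch.Tensor, expert_id: int) -> torch.Tensor:
        coeff = self.mix_logits[expert_id].softmax(dim=-1)     # (num_bases,)
        weight = torch.einsum("k,kio->io", coeff, self.bases)  # mixed weight
        return x @ weight

layer = MixedExpertLinear(dim=64)
out = layer(torch.randn(2, 64), expert_id=3)  # expert for one timestep bin
```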

One-shot Federated Learning via Synthetic Distiller-Distillate Communication

1 code implementation6 Dec 2024 Junyuan Zhang, Songhua Liu, Xinchao Wang

Additionally, they may encounter scalability issues with complex datasets due to inherent two-step information loss: first, during local training (from data to model), and second, when transferring knowledge to the server model (from model to inversed data).

Data-free Knowledge Distillation Federated Learning

TinyFusion: Diffusion Transformers Learned Shallow

1 code implementation CVPR 2025 Gongfan Fang, Kunjun Li, Xinyin Ma, Xinchao Wang

In this work, we present TinyFusion, a depth pruning method designed to remove redundant layers from diffusion transformers via end-to-end learning.

Image Generation

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

1 code implementation CVPR 2025 Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang

The large model serves as the 'drafter', specializing in generating low-frequency content at smaller scales, while the smaller model serves as the 'refiner', solely focusing on predicting high-frequency details at larger scales.

Image Generation Zero-shot Generalization

OminiControl: Minimal and Universal Control for Diffusion Transformer

2 code implementations22 Nov 2024 Zhenxiong Tan, Songhua Liu, Xingyi Yang, Qiaochu Xue, Xinchao Wang

In this paper, we introduce OminiControl, a highly versatile and parameter-efficient framework that integrates image conditions into pre-trained Diffusion Transformer (DiT) models.

HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters

no code implementations2 Nov 2024 Yujie Mo, Runpeng Yu, Xiaofeng Zhu, Xinchao Wang

We further design a label-propagated contrastive loss and two self-supervised losses to optimize dual adapters and incorporate unlabeled nodes as potential labeled data.

Adversarial Training: A Survey

no code implementations19 Oct 2024 Mengnan Zhao, Lihe Zhang, Jingwen Ye, Huchuan Lu, BaoCai Yin, Xinchao Wang

Adversarial training (AT) refers to integrating adversarial examples -- inputs altered with imperceptible perturbations that can significantly impact model predictions -- into the training process.

Survey

Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching

no code implementations10 Oct 2024 Ruonan Yu, Songhua Liu, Jingwen Ye, Xinchao Wang

Addressing these concerns, this paper introduces Teddy, a Taylor-approximated dataset distillation framework designed to handle large-scale dataset and enhance efficiency.

Dataset Distillation

Poison-splat: Computation Cost Attack on 3D Gaussian Splatting

1 code implementation10 Oct 2024 Jiahao Lu, Yifan Zhang, Qiuhong Shen, Xinchao Wang, Shuicheng Yan

However, in this work, we reveal a significant security vulnerability that has been largely overlooked in 3DGS: the computation cost of training 3DGS could be maliciously tampered with by poisoning the input data.

3DGS

MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models

1 code implementation26 Sep 2024 Gongfan Fang, Hongxu Yin, Saurav Muralidharan, Greg Heinrich, Jeff Pool, Jan Kautz, Pavlo Molchanov, Xinchao Wang

This approach facilitates end-to-end training on large-scale datasets and offers two notable advantages: 1) High-quality Masks - our method effectively scales to large datasets and learns accurate masks; 2) Transferability - the probabilistic modeling of mask distribution enables the transfer learning of sparsity across domains or tasks.

Large Language Model Model Compression +2
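
As a rough illustration of the learnable semi-structured sparsity described in the MaskLLM entry above, the sketch below samples one of the six possible 2:4 masks per group of four weights through a Gumbel-softmax over learnable logits. This is a heavily simplified assumption about the paper's probabilistic mask modeling; the module name and group layout are made up for the example.

```python
# A heavily simplified sketch: each group of 4 weights keeps exactly 2 of them,
# chosen by a Gumbel-softmax over the C(4,2) = 6 candidate masks.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

# All 6 binary masks that keep exactly 2 of 4 positions.
CANDIDATES = torch.tensor(
    [[1.0 if i in combo else 0.0 for i in range(4)]
     for combo in itertools.combinations(range(4), 2)])         # (6, 4)

class Learnable24Mask(nn.Module):
    def __init__(self, num_groups: int, tau: float = 1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_groups, 6))  # per-group categorical
        self.tau = tau

    def forward(self, weight: torch.Tensor) -> torch.Tensor:    # weight: (num_groups, 4)
        probs = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        mask = probs @ CANDIDATES        # exactly two ones per group (straight-through)
        return weight * mask

w = torch.randn(8, 4)                    # 8 groups of 4 weights
masked = Learnable24Mask(num_groups=8)(w)
```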

Attention Prompting on Image for Large Vision-Language Models

1 code implementation25 Sep 2024 Runpeng Yu, Weihao Yu, Xinchao Wang

To fill this gap, in this work, we propose a new prompting technique named Attention Prompting on Image, which simply overlays a text-query-guided attention heatmap on the original input image and effectively enhances LVLM on various tasks.

MM-Vet Visual Prompting
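
A minimal sketch of the overlay step described in the Attention Prompting on Image entry above: a given text-query-guided attention heatmap is normalized, resized, and alpha-blended onto the input image. How the heatmap is produced is out of scope here, and the function name and blending weight are illustrative.

```python
# A sketch of the overlay only; the heatmap is assumed to be given.
import torch
import torch.nn.functional as F

def overlay_heatmap(image: torch.Tensor, heatmap: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """image: (3, H, W) in [0, 1]; heatmap: (h, w) unnormalized attention scores."""
    hm = heatmap - heatmap.min()
    hm = hm / (hm.max() + 1e-8)                                  # normalize to [0, 1]
    hm = F.interpolate(hm[None, None], size=image.shape[1:],
                       mode="bilinear", align_corners=False)[0]  # (1, H, W)
    return (1 - alpha) * image + alpha * hm                      # blend over channels

prompted = overlay_heatmap(torch.rand(3, 224, 224), torch.rand(14, 14))
```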

Vista3D: Unravel the 3D Darkside of a Single Image

1 code implementation18 Sep 2024 Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang

We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts.

3D Generation Diversity

Kolmogorov-Arnold Transformer

1 code implementation16 Sep 2024 Xingyi Yang, Xinchao Wang

In this paper, we introduce the Kolmogorov-Arnold Transformer (KAT), a novel architecture that replaces MLP layers with Kolmogorov-Arnold Network (KAN) layers to enhance the expressiveness and performance of the model.

Image Classification

FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally

1 code implementation12 Sep 2024 Qiuhong Shen, Xingyi Yang, Xinchao Wang

Extensive experiments demonstrate the efficiency and robustness of our method in segmenting various scenes, and its superior performance in downstream tasks such as object removal and inpainting.

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

no code implementations12 Sep 2024 Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang

The Layout-to-Image (L2I) task was introduced to address the positioning challenges by incorporating bounding boxes as spatial control signals, but it still falls short in generating precise instance features.

Text to Image Generation Text-to-Image Generation

LinFusion: 1 GPU, 1 Minute, 16K Image

1 code implementation3 Sep 2024 Songhua Liu, Weihao Yu, Zhenxiong Tan, Xinchao Wang

Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance.

16k Causal Inference +1

Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation

1 code implementation26 Aug 2024 Yushun Tang, Shuoshuo Chen, Zhihe Lu, Xinchao Wang, Zhihai He

Specifically, we introduce an extra token, referred to as the domain shift token, at each layer of the transformer network.

Prediction Test-time Adaptation

Focus on Neighbors and Know the Whole: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation

no code implementations23 Aug 2024 Bonan Li, ZiCheng Zhang, Xingyi Yang, Xinchao Wang

To further enhance cross-view consistency and alleviate content drift, CoSER rapidly scans all views in a spiral bidirectional manner to capture holistic information and then scores each point based on semantic material.

3D Generation Text to 3D

Heavy Labels Out! Dataset Distillation with Label Space Lightening

no code implementations15 Aug 2024 Ruonan Yu, Songhua Liu, Zigeng Chen, Jingwen Ye, Xinchao Wang

Extensive experiments demonstrate that with only about 0.003% of the original storage required for a complete set of soft labels, we achieve comparable performance to current state-of-the-art dataset distillation methods on large-scale datasets.

Dataset Distillation

MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

1 code implementation1 Aug 2024 Weihao Yu, Zhengyuan Yang, Lingfeng Ren, Linjie Li, JianFeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang

Using MM-Vet v2 to benchmark large multimodal models, we found that Claude 3.5 Sonnet is the best model with a score of 71.8, slightly outperforming GPT-4o which scored 71.0.

Math MM-Vet +3

Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

1 code implementation22 Jul 2024 Yihao Ai, Yifei Qi, Bo wang, Yu Cheng, Xinchao Wang, Robby T. Tan

Our primary novelty lies in leveraging two complementary-teacher networks to generate more reliable pseudo labels, enabling our model to achieve competitive performance on extremely low-light images without the need for training with low-light ground truths.

2D Human Pose Estimation Pose Estimation

Encapsulating Knowledge in One Prompt

1 code implementation16 Jul 2024 Qi Li, Runpeng Yu, Xinchao Wang

This paradigm encapsulates knowledge from various models into a solitary prompt without altering the original models or requiring access to the training data, which enables us to achieve efficient and convenient knowledge transfer in more realistic scenarios.

Transfer Learning

LiteFocus: Accelerated Diffusion Inference for Long Audio Synthesis

1 code implementation15 Jul 2024 Zhenxiong Tan, Xinyin Ma, Gongfan Fang, Xinchao Wang

Latent diffusion models have shown promising results in audio generation, making notable advancements over traditional methods.

Audio Generation Audio Synthesis

Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach

1 code implementation9 Jul 2024 Taolin Zhang, Jiawang Bai, Zhihe Lu, Dongze Lian, Genping Wang, Xinchao Wang, Shu-Tao Xia

The synthesized query equipped with task-specific knowledge serves to extract the useful features for downstream tasks from the intermediate representations of the pre-trained model in a query-only manner.

Transfer Learning

Isomorphic Pruning for Vision Models

1 code implementation5 Jul 2024 Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang

For instance, we improve the accuracy of DeiT-Tiny from 74.52% to 77.50% by pruning an off-the-shelf DeiT-Base model.

Video-Infinity: Distributed Long Video Generation

no code implementations24 Jun 2024 Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang

Specifically, we propose two coherent mechanisms: Clip parallelism and Dual-scope attention.

Video Generation

Neural Lineage

no code implementations CVPR 2024 Runpeng Yu, Xinchao Wang

Given a well-behaved neural network, is it possible to identify its parent, based on which it was tuned?

Few-Shot Anomaly Detection via Category-Agnostic Registration Learning

1 code implementation13 Jun 2024 Chaoqin Huang, Haoyan Guan, Aofan Jiang, Ya zhang, Michael Spratling, Xinchao Wang, Yanfeng Wang

At test time, an image and its corresponding support set, consisting of a few normal images from the same category, are supplied, and anomalies are identified by comparing the registered features of the test image to its corresponding support image features.

Anomaly Detection Representation Learning

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

2 code implementations11 Jun 2024 Zigeng Chen, Xinyin Ma, Gongfan Fang, Zhenxiong Tan, Xinchao Wang

To address this, we introduce AsyncDiff, a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices.

Denoising

Compositional Video Generation as Flow Equalization

1 code implementation10 Jun 2024 Xingyi Yang, Xinchao Wang

Despite the promising results, a significant challenge remains: these models struggle to fully grasp complex compositional interactions between multiple concepts and actions.

Video Editing Video Generation

MVGamba: Unify 3D Content Generation as State Space Sequence Modeling

2 code implementations10 Jun 2024 Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors.

3D Generation Attribute

Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching

1 code implementation3 Jun 2024 Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang

In this study, we make an interesting and somehow surprising observation: the computation of a large proportion of layers in the diffusion transformer, through introducing a caching mechanism, can be readily removed even without updating the model parameters.

Denoising
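
As a toy illustration of the caching idea in the Learning-to-Cache entry above, the wrapper below recomputes a layer only at selected denoising steps and otherwise reuses its cached output. The cache schedule, the layer, and the loop are placeholders, not the paper's learned caching policy.

```python
# A toy sketch: reuse a layer's cached output on steps where it is not recomputed.
import torch
import torch.nn as nn

class CachedLayer(nn.Module):
    def __init__(self, layer: nn.Module):
        super().__init__()
        self.layer = layer
        self.cache = None

    def forward(self, x: torch.Tensor, recompute: bool) -> torch.Tensor:
        if recompute or self.cache is None:
            self.cache = self.layer(x)   # full computation, result stored
        return self.cache                # otherwise reuse the cached activation

layer = CachedLayer(nn.Sequential(nn.Linear(64, 64), nn.GELU()))
x = torch.randn(2, 16, 64)
for step in range(10):                   # toy stand-in for a reverse-diffusion loop
    out = layer(x, recompute=(step % 2 == 0))   # illustrative fixed schedule
```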

GFlow: Recovering 4D World from Monocular Video

no code implementations28 May 2024 Shizun Wang, Xingyi Yang, Qiuhong Shen, Zhenxiang Jiang, Xinchao Wang

To solve this, we introduce GFlow, a new framework that utilizes only 2D priors (depth and optical flow) to lift a video to a 4D scene, as a flow of 3D Gaussians through space and time.

4D reconstruction Novel View Synthesis +1

MambaOut: Do We Really Need Mamba for Vision?

4 code implementations CVPR 2025 Weihao Yu, Xinchao Wang

For vision tasks, as image classification does not align with either characteristic, we hypothesize that Mamba is not necessary for this task; detection and segmentation tasks are also not autoregressive, yet they adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks.

image-classification Image Classification +4

Distilled Datamodel with Reverse Gradient Matching

no code implementations CVPR 2024 Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang

To investigate the impact of changes in training data on a pre-trained model, a common approach is leave-one-out retraining.

Ungeneralizable Examples

no code implementations CVPR 2024 Jingwen Ye, Xinchao Wang

The training of contemporary deep learning models heavily relies on publicly available data, posing a risk of unauthorized access to online data and raising concerns about data privacy.

MindBridge: A Cross-Subject Brain Decoding Framework

1 code implementation CVPR 2024 Shizun Wang, Songhua Liu, Zhenxiong Tan, Xinchao Wang

Currently, brain decoding is confined to a per-subject-per-model paradigm, limiting its applicability to the same individual for whom the decoding model is trained.

Brain Decoding Data Augmentation +2

Hash3D: Training-free Acceleration for 3D Generation

1 code implementation CVPR 2025 Xingyi Yang, Xinchao Wang

The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models.

3D Generation Image to 3D +1

Unsegment Anything by Simulating Deformation

1 code implementation CVPR 2024 Jiahao Lu, Xingyi Yang, Xinchao Wang

Foundation segmentation models, while powerful, pose a significant risk: they enable users to effortlessly extract any objects from any digital content with a single click, potentially leading to copyright infringement or malicious misuse.

Segmentation

Relation Rectification in Diffusion Model

no code implementations CVPR 2024 Yinwei Wu, Xingyi Yang, Xinchao Wang

Despite their exceptional generative abilities, large text-to-image diffusion models, much like skilled but careless artists, often struggle with accurately depicting visual relationships between objects.

model Relation

Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction

2 code implementations27 Mar 2024 Qiuhong Shen, Zike Wu, Xuanyu Yi, Pan Zhou, Hanwang Zhang, Shuicheng Yan, Xinchao Wang

We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed.

3D Generation 3DGS +3

Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

1 code implementation CVPR 2024 Chaoqin Huang, Aofan Jiang, Jinghao Feng, Ya zhang, Xinchao Wang, Yanfeng Wang

Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains.

Anomaly Classification Anomaly Segmentation

Through the Dual-Prism: A Spectral Perspective on Graph Data Augmentation for Graph Classification

1 code implementation18 Jan 2024 Yutong Xia, Runpeng Yu, Yuxuan Liang, Xavier Bresson, Xinchao Wang, Roger Zimmermann

Graph Neural Networks have become the preferred tool to process graph data, with their efficacy being boosted through graph data augmentation techniques.

Data Augmentation Graph Classification

Mutual-modality Adversarial Attack with Semantic Perturbation

no code implementations20 Dec 2023 Jingwen Ye, Ruonan Yu, Songhua Liu, Xinchao Wang

Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.

Adversarial Attack

DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators

no code implementations14 Dec 2023 Hanyang Kong, Dongze Lian, Michael Bi Mi, Xinchao Wang

We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating unbounded flythrough scenes from textual prompts.

Image Generation Perpetual View Generation +1

SlimSAM: 0.1% Data Makes Segment Anything Slim

2 code implementations8 Dec 2023 Zigeng Chen, Gongfan Fang, Xinyin Ma, Xinchao Wang

To address this challenging trade-off, we introduce SlimSAM, a novel data-efficient SAM compression method that achieves superior performance with extremely less training data.

Generator Born from Classifier

no code implementations NeurIPS 2023 Runpeng Yu, Xinchao Wang

In this paper, we make a bold attempt toward an ambitious task: given a pre-trained classifier, we aim to reconstruct an image generator, without relying on any data samples.

Image Generation

DeepCache: Accelerating Diffusion Models for Free

3 code implementations CVPR 2024 Xinyin Ma, Gongfan Fang, Xinchao Wang

Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities.

Denoising Image Generation

Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

1 code implementation28 Nov 2023 Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang

However, performance advancements are limited when relying solely on intricate algorithmic designs for a single model, even one exhibiting strong performance, e.g., CLIP-ViT-B/16.

Prompt Engineering

GraphAdapter: Tuning Vision-Language Models With Dual Knowledge Graph

1 code implementation NeurIPS 2023 Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang

To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph.

Transfer Learning

Priority-Centric Human Motion Generation in Discrete Latent Space

no code implementations ICCV 2023 Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang

We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence.

Motion Generation

SG-Former: Self-guided Transformer with Evolving Token Reallocation

1 code implementation ICCV 2023 Sucheng Ren, Xingyi Yang, Songhua Liu, Xinchao Wang

At the heart of our approach is to utilize a significance map, which is estimated through hybrid-scale self-attention and evolves itself during training, to reallocate tokens based on the significance of each region.

Diffusion Model as Representation Learner

1 code implementation ICCV 2023 Xingyi Yang, Xinchao Wang

In this paper, we conduct an in-depth investigation of the representation power of DPMs, and propose a novel knowledge transfer method that leverages the knowledge acquired by generative DPMs for recognition tasks.

Denoising image-classification +5

Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey

1 code implementation18 Aug 2023 Xin Li, Yulin Ren, Xin Jin, Cuiling Lan, Xingrui Wang, Wenjun Zeng, Xinchao Wang, Zhibo Chen

Image restoration (IR) has been an indispensable and challenging task in the low-level vision field, which strives to improve the subjective quality of images distorted by various forms of degradation.

Deblurring Image Restoration +2

Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

1 code implementation ICCV 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

To work with auxiliary tasks, we propose a novel auxiliary-adapted transformer, which can handle incomplete, corrupted motion data and achieve coordinate recovery via capturing spatial-temporal dependencies.

Human motion prediction Human Pose Forecasting +1

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation4 Aug 2023 Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math MM-Vet +1

PseudoCal: A Source-Free Approach to Unsupervised Uncertainty Calibration in Domain Adaptation

no code implementations14 Jul 2023 Dapeng Hu, Jian Liang, Xinchao Wang, Chuan-Sheng Foo

The conventional in-domain calibration method, temperature scaling (TempScal), encounters challenges due to domain distribution shifts and the absence of labeled target domain data.

Unsupervised Domain Adaptation

Distribution Shift Inversion for Out-of-Distribution Prediction

1 code implementation CVPR 2023 Runpeng Yu, Songhua Liu, Xingyi Yang, Xinchao Wang

Machine learning society has witnessed the emergence of a myriad of Out-of-Distribution (OoD) algorithms, which address the distribution shift between the training and the testing distribution by searching for a unified predictor or invariant feature representation.

Domain Generalization Prediction

Evolving Knowledge Mining for Class Incremental Segmentation

1 code implementation3 Jun 2023 Zhihe Lu, Shuicheng Yan, Xinchao Wang

In this paper, we for the first time investigate the efficient multi-grained knowledge reuse for CISS, and propose a novel method, Evolving kNowleDge minING (ENDING), employing a frozen backbone.

Class-Incremental Semantic Segmentation Knowledge Distillation

LLM-Pruner: On the Structural Pruning of Large Language Models

2 code implementations NeurIPS 2023 Xinyin Ma, Gongfan Fang, Xinchao Wang

With LLM being a general-purpose task solver, we explore its compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM.

Text Generation zero-shot-classification +1

Structural Pruning for Diffusion Models

2 code implementations NeurIPS 2023 Gongfan Fang, Xinyin Ma, Xinchao Wang

Generative modeling has recently undergone remarkable advancements, primarily propelled by the transformative implications of Diffusion Probabilistic Models (DPMs).

Can SAM Boost Video Super-Resolution?

no code implementations11 May 2023 Zhihe Lu, Zeyu Xiao, Jiawang Bai, Zhiwei Xiong, Xinchao Wang

To use the SAM-based prior, we propose a simple yet effective module -- SAM-guidEd refinEment Module (SEEM), which can enhance both alignment and fusion procedures by the utilization of semantic information.

Optical Flow Estimation Video Super-Resolution

Deep Graph Reprogramming

no code implementations CVPR 2023 Yongcheng Jing, Chongbin Yuan, Li Ju, Yiding Yang, Xinchao Wang, DaCheng Tao

In this paper, we explore a novel model reusing task tailored for graph neural networks (GNNs), termed as "deep graph reprogramming".

3D Object Recognition Action Recognition +1

Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

no code implementations CVPR 2023 Hao Tang, Songhua Liu, Tianwei Lin, Shaoli Huang, Fu Li, Dongliang He, Xinchao Wang

On the other hand, different from the vanilla version, we adopt a learnable scaling operation on content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring the stylization quality.

Meta-Learning Style Transfer

Segment Anything in Non-Euclidean Domains: Challenges and Opportunities

no code implementations23 Apr 2023 Yongcheng Jing, Xinchao Wang, DaCheng Tao

The recent work known as Segment Anything (SA) has made significant strides in pushing the boundaries of semantic segmentation into the era of foundation models.

Image Inpainting object-detection +2

Anything-3D: Towards Single-view Anything Reconstruction in the Wild

1 code implementation19 Apr 2023 Qiuhong Shen, Xingyi Yang, Xinchao Wang

3D reconstruction from a single-RGB image in unconstrained real-world scenarios presents numerous challenges due to the inherent diversity and complexity of objects and environments.

3D Reconstruction Diversity +1

Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate

1 code implementation19 Apr 2023 Songhua Liu, Jingwen Ye, Xinchao Wang

Existing approaches either apply the holistic style of the style image in a global manner, or migrate local colors and textures of the style image to the content counterparts in a pre-defined way.

Style Transfer

InceptionNeXt: When Inception Meets ConvNeXt

13 code implementations CVPR 2024 Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.

Image Classification Semantic Segmentation

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

2 code implementations CVPR 2023 Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle.

Human Pose Forecasting motion prediction +3

Partial Network Cloning

1 code implementation CVPR 2023 Jingwen Ye, Songhua Liu, Xinchao Wang

Unlike prior methods that update all or at least part of the parameters in the target network throughout the knowledge transfer process, PNC conducts partial parametric "cloning" from a source network and then injects the cloned module to the target, without modifying its parameters.

Transfer Learning

DepGraph: Towards Any Structural Pruning

1 code implementation CVPR 2023 Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang

Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks.

Network Pruning Neural Network Compression
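
To make the notion of "structurally-grouped parameters" in the DepGraph entry above concrete, here is a hand-rolled sketch for the simplest case of two chained convolutions: pruning output channels of the first layer forces pruning the matching input channels of the next. DepGraph automates this dependency resolution for arbitrary architectures; the helper below only illustrates the coupling and is not the library's API.

```python
# A minimal sketch of grouped pruning for a Conv-Conv pair (illustrative only).
import torch
import torch.nn as nn

def prune_conv_pair(conv1: nn.Conv2d, conv2: nn.Conv2d, keep: torch.Tensor):
    """Keep only the output channels of conv1 listed in `keep`, plus the
    corresponding input channels of conv2."""
    new1 = nn.Conv2d(conv1.in_channels, len(keep), conv1.kernel_size,
                     conv1.stride, conv1.padding, bias=conv1.bias is not None)
    new1.weight.data = conv1.weight.data[keep].clone()
    if conv1.bias is not None:
        new1.bias.data = conv1.bias.data[keep].clone()

    new2 = nn.Conv2d(len(keep), conv2.out_channels, conv2.kernel_size,
                     conv2.stride, conv2.padding, bias=conv2.bias is not None)
    new2.weight.data = conv2.weight.data[:, keep].clone()   # matching input channels
    if conv2.bias is not None:
        new2.bias.data = conv2.bias.data.clone()
    return new1, new2

conv1, conv2 = nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 32, 3, padding=1)
# Keep the 8 output channels of conv1 with the largest L1 weight norm.
scores = conv1.weight.data.abs().sum(dim=(1, 2, 3))
keep = torch.topk(scores, k=8).indices.sort().values
conv1, conv2 = prune_conv_pair(conv1, conv2, keep)
print(conv2(conv1(torch.randn(1, 3, 32, 32))).shape)  # torch.Size([1, 32, 32, 32])
```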

Dataset Distillation: A Comprehensive Review

1 code implementation17 Jan 2023 Ruonan Yu, Songhua Liu, Xinchao Wang

Recent success of deep learning is largely attributed to the sheer amount of data used for training deep neural networks. Despite the unprecedented success, the massive data, unfortunately, significantly increases the burden on storage and transmission and further gives rise to a cumbersome model training process.

Dataset Condensation Dataset Distillation

Slimmable Dataset Condensation

no code implementations CVPR 2023 Songhua Liu, Jingwen Ye, Runpeng Yu, Xinchao Wang

In this paper, we explore the problem of slimmable dataset condensation, to extract a smaller synthetic dataset given only previous condensation results.

Dataset Condensation Dataset Distillation

Few-Shot Dataset Distillation via Translative Pre-Training

1 code implementation ICCV 2023 Songhua Liu, Xinchao Wang

We pre-train the translator on some large datasets like ImageNet so that it requires only a limited number of adaptation steps on the target dataset.

Dataset Distillation

Diffusion Probabilistic Model Made Slim

no code implementations CVPR 2023 Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms.

Image Generation model +1

AvatarGen: A 3D Generative Model for Animatable Human Avatars

1 code implementation26 Nov 2022 Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng

Specifically, we decompose the generative 3D human synthesis into pose-guided mapping and canonical representation with predefined human pose and shape, such that the canonical representation can be explicitly driven to different poses and shapes with the guidance of a 3D parametric human model SMPL.

Human Animation

Task Residual for Tuning Vision-Language Models

1 code implementation CVPR 2023 Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, Xinchao Wang

Large-scale vision-language models (VLMs) pre-trained on billion-level data have learned general visual representations and broad visual concepts.

Transfer Learning

Dataset Factorization for Condensation

1 code implementation NIPS 2022 Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline.

Dataset Distillation Diversity +2

Dataset Distillation via Factorization

3 code implementations30 Oct 2022 Songhua Liu, Kai Wang, Xingyi Yang, Jingwen Ye, Xinchao Wang

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline.

Dataset Distillation Hallucination +1

MetaFormer Baselines for Vision

8 code implementations24 Oct 2022 Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization Image Classification

Deep Model Reassembly

1 code implementation24 Oct 2022 Xingyi Yang, Daquan Zhou, Songhua Liu, Jingwen Ye, Xinchao Wang

Given a collection of heterogeneous models pre-trained from distinct sources and with diverse architectures, the goal of DeRy, as its name implies, is to first dissect each model into distinctive building blocks, and then selectively reassemble the derived blocks to produce customized networks under both the hardware resource and performance constraints.

model Transfer Learning

Reachability-Aware Laplacian Representation in Reinforcement Learning

no code implementations24 Oct 2022 Kaixin Wang, Kuangqi Zhou, Jiashi Feng, Bryan Hooi, Xinchao Wang

In Reinforcement Learning (RL), Laplacian Representation (LapRep) is a task-agnostic state representation that encodes the geometry of the environment.

reinforcement-learning Reinforcement Learning +1

Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning

1 code implementation17 Oct 2022 Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang

With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy compared to full fine-tuning, while fine-tuning only about 0.3M parameters.

image-classification Image Classification +1
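
A minimal sketch of the scaling-and-shifting idea behind the SSF entry above, assuming each frozen backbone feature is modulated by a learnable per-channel scale and shift and only these small tensors are trained. The dimensions and the wrapped layer are illustrative.

```python
# A sketch: frozen backbone features are modulated by learnable scale/shift only.
import torch
import torch.nn as nn

class SSF(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))   # scale, initialized to identity
        self.beta = nn.Parameter(torch.zeros(dim))   # shift, initialized to zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (..., dim)
        return x * self.gamma + self.beta

# Usage: wrap a frozen linear layer with an SSF modulation.
backbone_layer = nn.Linear(768, 768)
for p in backbone_layer.parameters():
    p.requires_grad = False                          # backbone stays frozen
ssf = SSF(768)                                       # only ~2 * 768 trainable params
features = ssf(backbone_layer(torch.randn(4, 768)))
```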

Training Spiking Neural Networks with Local Tandem Learning

1 code implementation10 Oct 2022 Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li

The LTL rule follows the teacher-student learning approach by mimicking the intermediate feature representations of a pre-trained ANN.

Attention Diversification for Domain Generalization

1 code implementation9 Oct 2022 Rang Meng, Xianfeng Li, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Mingli Song, Di Xie, ShiLiang Pu

Under this guidance, a novel Attention Diversification framework is proposed, in which Intra-Model and Inter-Model Attention Diversification Regularization are collaborated to reassign appropriate attention to diverse task-related features.

Domain Generalization

Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons

no code implementations25 Aug 2022 Yu Cheng, Yihao Ai, Bo wang, Xinchao Wang, Robby T. Tan

In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons, and unlike the top-down methods, do not rely on human detection.

2D Pose Estimation Human Detection +1

AvatarGen: a 3D Generative Model for Animatable Human Avatars

1 code implementation1 Aug 2022 Jianfeng Zhang, Zihang Jiang, Dingdong Yang, Hongyi Xu, Yichun Shi, Guoxian Song, Zhongcong Xu, Xinchao Wang, Jiashi Feng

Unsupervised generation of clothed virtual humans with various appearance and animatable poses is important for creating 3D human avatars and other AR/VR applications.

3D Human Reconstruction

Federated Selective Aggregation for Knowledge Amalgamation

1 code implementation27 Jul 2022 Donglin Xie, Ruonan Yu, Gongfan Fang, Jie Song, Zunlei Feng, Xinchao Wang, Li Sun, Mingli Song

The goal of FedSA is to train a student model for a new task with the help of several decentralized teachers, whose pre-training tasks and data are different and agnostic.

Learning Graph Neural Networks for Image Style Transfer

no code implementations24 Jul 2022 Yongcheng Jing, Yining Mao, Yiding Yang, Yibing Zhan, Mingli Song, Xinchao Wang, DaCheng Tao

To this end, we develop an elaborated GNN model with content and style local patches as the graph vertices.

Image Stylization

Learning with Recoverable Forgetting

1 code implementation17 Jul 2022 Jingwen Ye, Yifang Fu, Jie Song, Xingyi Yang, Songhua Liu, Xin Jin, Mingli Song, Xinchao Wang

Life-long learning aims at learning a sequence of tasks without forgetting the previously acquired knowledge.

General Knowledge Transfer Learning

DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

1 code implementation13 Jul 2022 Songhua Liu, Jingwen Ye, Sucheng Ren, Xinchao Wang

Prior approaches, despite the promising results, have relied on either estimating dense attention to compute per-point matching, which is limited to only coarse scales due to the quadratic memory cost, or fixing the number of correspondences to achieve linear complexity, which lacks flexibility.

Face Generation Style Transfer

Factorizing Knowledge in Neural Networks

1 code implementation4 Jul 2022 Xingyi Yang, Jingwen Ye, Xinchao Wang

The core idea of KF lies in the modularization and assemblability of knowledge: given a pretrained network model as input, KF aims to decompose it into several factor networks, each of which handles only a dedicated task and maintains task-specific knowledge factorized from the source network.

Disentanglement Transfer Learning

Slimmable Domain Adaptation

1 code implementation CVPR 2022 Rang Meng, WeiJie Chen, Shicai Yang, Jie Song, Luojun Lin, Di Xie, ShiLiang Pu, Xinchao Wang, Mingli Song, Yueting Zhuang

In this paper, we introduce a simple framework, Slimmable Domain Adaptation, to improve cross-domain generalization with a weight-sharing model bank, from which models of different capacities can be sampled to accommodate different accuracy-efficiency trade-offs.

Domain Generalization Unsupervised Domain Adaptation

Learning Domain Adaptive Object Detection with Probabilistic Teacher

2 code implementations13 Jun 2022 Meilin Chen, WeiJie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, ShiLiang Pu

In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchors can be regarded as learnable parameters.

Object object-detection +1

Inception Transformer

4 code implementations25 May 2022 Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.

image-classification Image Classification

Tyger: Task-Type-Generic Active Learning for Molecular Property Prediction

no code implementations23 May 2022 Kuangqi Zhou, Kaixin Wang, Jiashi Feng, Jian Tang, Tingyang Xu, Xinchao Wang

However, existing best deep AL methods are mostly developed for a single type of learning task (e.g., single-label classification), and hence may not perform well in molecular property prediction that involves various task types.

Active Learning Drug Discovery +3

Prompting to Distill: Boosting Data-Free Knowledge Distillation via Reinforced Prompt

no code implementations16 May 2022 Xinyin Ma, Xinchao Wang, Gongfan Fang, Yongliang Shen, Weiming Lu

Data-free knowledge distillation (DFKD) conducts knowledge distillation via eliminating the dependence of original training data, and has recently achieved impressive results in accelerating pre-trained language models.

Data-free Knowledge Distillation

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

1 code implementation ACL 2022 Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li

In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances.

Cultural Vocal Bursts Intensity Prediction Diversity +1

Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels

1 code implementation30 Apr 2022 Kai Wang, Xiangyu Peng, Shuo Yang, Jianfei Yang, Zheng Zhu, Xinchao Wang, Yang You

This paradigm, however, is prone to significant degeneration under heavy label noise, as the number of clean samples is too small for conventional methods to behave well.

Learning with noisy labels

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

1 code implementation CVPR 2022 Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang

Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses.

3D Human Pose Estimation Hallucination

Point2Seq: Detecting 3D Objects as Sequences

1 code implementation CVPR 2022 Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei zhang, Xiaogang Wang, Xinchao Wang

We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.

3D Object Detection Decoder +2

CAFE: Learning to Condense Dataset by Aligning Features

2 code implementations CVPR 2022 Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, Yang You

Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one.

Dataset Condensation

Geometric Structure Preserving Warp for Natural Image Stitching

1 code implementation CVPR 2022 Peng Du, Jifeng Ning, Jiguang Cui, Shaoli Huang, Xinchao Wang, Jiaxin Wang

Further, an optimized GES energy term is presented to reasonably determine the weights of the sampling points on the geometric structure, and the term is added into the Global Similarity Prior (GSP) stitching model called GES-GSP to achieve a smooth transition between local alignment and geometric structure preservation.

Edge Detection Image Stitching

PONet: Robust 3D Human Pose Estimation via Learning Orientations Only

no code implementations21 Dec 2021 Jue Wang, Shaoli Huang, Xinchao Wang, DaCheng Tao

Conventional 3D human pose estimation relies on first detecting 2D body keypoints and then solving the 2D to 3D correspondence problem. Despite the promising results, this learning paradigm is highly dependent on the quality of the 2D keypoint detector, which is inevitably fragile to occlusions and out-of-image absences. In this paper, we propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only, hence bypassing the error-prone keypoint detector in the absence of image evidence.

3D Human Pose Estimation

Up to 100$\times$ Faster Data-free Knowledge Distillation

2 code implementations12 Dec 2021 Gongfan Fang, Kanya Mo, Xinchao Wang, Jie Song, Shitao Bei, Haofei Zhang, Mingli Song

At the heart of our approach is a novel strategy to reuse the shared common features in training data so as to synthesize different data instances.

Data-free Knowledge Distillation

Safe Distillation Box

1 code implementation5 Dec 2021 Jingwen Ye, Yining Mao, Jie Song, Xinchao Wang, Cheng Jin, Mingli Song

In other words, all users may employ a model in SDB for inference, but only authorized users get access to KD from the model.

Knowledge Distillation

Shunted Self-Attention via Multi-Scale Token Aggregation

1 code implementation CVPR 2022 Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang

This novel merging scheme enables the self-attention to learn relationships between objects with different sizes and simultaneously reduces the token numbers and the computational cost.
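
One simplified, hypothetical reading of the token-merging scheme in the Shunted Self-Attention entry above: keys and values are spatially downsampled at two different rates and attended jointly, reducing token count while keeping both coarse and fine context. The two pooling rates and all shapes are illustrative, not the official configuration.

```python
# A sketch of multi-scale key/value aggregation for self-attention (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleKVAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.q, self.kv = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def _downsample(self, x, hw, rate):         # x: (B, N, C) with N = H * W
        h, w = hw
        x = x.transpose(1, 2).reshape(x.size(0), -1, h, w)
        x = F.avg_pool2d(x, rate)               # merge rate x rate token windows
        return x.flatten(2).transpose(1, 2)

    def forward(self, x, hw):
        kv_coarse = self._downsample(self.kv(x), hw, rate=4)   # fewer, larger tokens
        kv_fine = self._downsample(self.kv(x), hw, rate=2)     # more, smaller tokens
        kv = torch.cat([kv_coarse, kv_fine], dim=1)
        out, _ = self.attn(self.q(x), kv, kv, need_weights=False)
        return out

m = MultiScaleKVAttention(dim=64)
print(m(torch.randn(2, 16 * 16, 64), hw=(16, 16)).shape)  # torch.Size([2, 256, 64])
```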

MetaFormer Is Actually What You Need for Vision

18 code implementations CVPR 2022 Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.

Image Classification Object Detection +2
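
A minimal sketch of the general MetaFormer abstraction from the entry above (Norm, token mixer, Norm, channel MLP, with residuals), instantiated with the simplest token mixer of spatial average pooling as in the PoolFormer model the paper proposes. Hyperparameters and the GroupNorm choice are illustrative.

```python
# A sketch of a MetaFormer-style block with a pooling token mixer (illustrative).
import torch
import torch.nn as nn

class PoolingTokenMixer(nn.Module):
    """Token mixer: spatial average pooling minus identity."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x):                # x: (B, C, H, W)
        return self.pool(x) - x          # subtract identity; residual is added outside

class MetaFormerBlock(nn.Module):
    """General MetaFormer block: Norm -> TokenMixer -> Norm -> ChannelMLP."""
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)   # channel norm for (B, C, H, W) inputs
        self.mixer = PoolingTokenMixer()
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1), nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

block = MetaFormerBlock(dim=64)
print(block(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```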

Meta Clustering Learning for Large-scale Unsupervised Person Re-identification

no code implementations19 Nov 2021 Xin Jin, Tianyu He, Xu Shen, Tongliang Liu, Xinchao Wang, Jianqiang Huang, Zhibo Chen, Xian-Sheng Hua

Unsupervised Person Re-identification (U-ReID) with pseudo labeling recently reaches a competitive performance compared to fully-supervised ReID methods based on modern clustering algorithms.

Clustering Unsupervised Person Re-Identification

MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

no code implementations27 Oct 2021 Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity.

Diversity Emotion Classification +2

Unleash the Potential of Adaptation Models via Dynamic Domain Labels

no code implementations29 Sep 2021 Xin Jin, Tianyu He, Xu Shen, Songhua Wu, Tongliang Liu, Xinchao Wang, Jianqiang Huang, Zhibo Chen, Xian-Sheng Hua

In this paper, we propose an embarrassingly simple yet highly effective adversarial domain adaptation (ADA) method for effectively training models for alignment.

Domain Adaptation Memorization

How Well Does Self-Supervised Pre-Training Perform with Streaming ImageNet?

no code implementations NeurIPS Workshop ImageNet_PPF 2021 Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng

Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.

Self-Supervised Learning

Meta-Aggregator: Learning to Aggregate for 1-bit Graph Neural Networks

no code implementations ICCV 2021 Yongcheng Jing, Yiding Yang, Xinchao Wang, Mingli Song, DaCheng Tao

In this paper, we study a novel meta aggregation scheme towards binarizing graph neural networks (GNNs).

Structure-Aware Feature Generation for Zero-Shot Learning

no code implementations16 Aug 2021 Lianbo Zhang, Shaoli Huang, Xinchao Wang, Wei Liu, DaCheng Tao

In this paper, we introduce a novel structure-aware feature generation scheme, termed as SA-GAN, to explicitly account for the topological structure in learning both the latent space and the generative networks.

Attribute Generative Adversarial Network +1

Boundary Knowledge Translation based Reference Semantic Segmentation

no code implementations1 Aug 2021 Lechao Cheng, Zunlei Feng, Xinchao Wang, Ya Jie Liu, Jie Lei, Mingli Song

In this paper, we introduce a novel Reference semantic segmentation Network (Ref-Net) to conduct visual boundary knowledge translation.

Segmentation Semantic Segmentation +1

Edge-competing Pathological Liver Vessel Segmentation with Limited Labels

1 code implementation1 Aug 2021 Zunlei Feng, Zhonghua Wang, Xinchao Wang, Xiuming Zhang, Lechao Cheng, Jie Lei, Yuexuan Wang, Mingli Song

The diagnosis of MVI requires discovering the vessels that contain hepatocellular carcinoma cells and counting their number in each vessel, which depends heavily on the doctor's experience and is largely subjective and time-consuming.

Segmentation whole slide images

Visual Boundary Knowledge Translation for Foreground Segmentation

1 code implementation1 Aug 2021 Zunlei Feng, Lechao Cheng, Xinchao Wang, Xiang Wang, Yajie Liu, Xiangtong Du, Mingli Song

To this end, we propose a Translation Segmentation Network (Trans-Net), which comprises a segmentation network and two boundary discriminators.

Foreground Segmentation Image Segmentation +3

Tree-Like Decision Distillation

no code implementations CVPR 2021 Jie Song, Haofei Zhang, Xinchao Wang, Mengqi Xue, Ying Chen, Li Sun, DaCheng Tao, Mingli Song

Knowledge distillation pursues a diminutive yet well-behaved student network by harnessing the knowledge learned by a cumbersome teacher model.

Decision Making Knowledge Distillation

Turning Frequency to Resolution: Video Super-Resolution via Event Cameras

no code implementations CVPR 2021 Yongcheng Jing, Yiding Yang, Xinchao Wang, Mingli Song, DaCheng Tao

To this end, we propose an Event-based VSR framework (E-VSR), of which the key component is an asynchronous interpolation (EAI) module that reconstructs a high-frequency (HF) video stream with uniform and tiny pixel displacements between neighboring frames from an event stream.

Video Super-Resolution

Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

no code implementations CVPR 2021 Yiding Yang, Zhou Ren, Haoxiang Li, Chunluan Zhou, Xinchao Wang, Gang Hua

In this paper, we propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame, and hence may serve as a robust estimation even in challenging scenarios including occlusion.

Graph Neural Network Multi-Person Pose Estimation +2

Contrastive Model Inversion for Data-Free Knowledge Distillation

2 code implementations18 May 2021 Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song

In this paper, we propose Contrastive Model Inversion (CMI), where the data diversity is explicitly modeled as an optimizable objective, to alleviate the mode collapse issue.

Contrastive Learning Data-free Knowledge Distillation +2
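
As a minimal sketch of the model-inversion setting that CMI operates in, the loop below optimizes synthetic inputs so that a frozen teacher labels them confidently, with a simple total-variation prior for smoothness. It omits the paper's contrastive diversity objective, and the backbone choice, resolution, and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Frozen "teacher" network whose knowledge we want to invert into synthetic images.
teacher = resnet18(num_classes=10).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Synthetic batch optimized directly in pixel space.
x = torch.randn(8, 3, 32, 32, requires_grad=True)
targets = torch.arange(8) % 10
opt = torch.optim.Adam([x], lr=0.05)

for step in range(100):
    opt.zero_grad()
    logits = teacher(x)
    # Classification term pulls each sample toward its target class;
    # a total-variation prior keeps the images locally smooth.
    cls = F.cross_entropy(logits, targets)
    tv = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() + \
         (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    (cls + 1e-2 * tv).backward()
    opt.step()
```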

KDExplainer: A Task-oriented Attention Model for Explaining Knowledge Distillation

1 code implementation10 May 2021 Mengqi Xue, Jie Song, Xinchao Wang, Ying Chen, Xingen Wang, Mingli Song

Knowledge distillation (KD) has recently emerged as an efficacious scheme for learning compact deep neural networks (DNNs).

Knowledge Distillation Mixture-of-Experts +1

How Well Does Self-Supervised Pre-Training Perform with Streaming Data?

no code implementations ICLR 2022 Dapeng Hu, Shipeng Yan, Qizhengqiu Lu, Lanqing Hong, Hailin Hu, Yifan Zhang, Zhenguo Li, Xinchao Wang, Jiashi Feng

Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained.

Representation Learning Self-Supervised Learning

Online Multiple Object Tracking with Cross-Task Synergy

1 code implementation CVPR 2021 Song Guo, Jingya Wang, Xinchao Wang, DaCheng Tao

On the other hand, such reliable embeddings can boost identity-awareness through memory aggregation, hence strengthening attention modules and suppressing drifts.

Multiple Object Tracking Object +1

Training Generative Adversarial Networks in One Stage

1 code implementation CVPR 2021 Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin Li, Jie Song, Mingli Song

Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort.

Data-free Knowledge Distillation Image Generation
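
For contrast with the one-stage scheme above, the conventional two-stage (alternating) GAN update it seeks to collapse looks roughly like the loop below; the tiny MLP generator and discriminator and the synthetic "real" data are placeholders, not the architectures used in the paper.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))   # noise -> 2-D sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))    # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    real = torch.randn(32, 2) * 0.5 + 2.0          # stand-in "real" data
    z = torch.randn(32, 16)

    # Stage 1: update the discriminator on real vs. detached fake samples.
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(G(z).detach()), torch.zeros(32, 1))
    d_loss.backward()
    opt_d.step()

    # Stage 2: update the generator to fool the (fixed) discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(G(z)), torch.ones(32, 1))
    g_loss.backward()
    opt_g.step()
```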

SPAGAN: Shortest Path Graph Attention Network

1 code implementation10 Jan 2021 Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, DaCheng Tao

SPAGAN therefore allows for a more informative and intact exploration of the graph structure, and further a more effective aggregation of information from distant neighbors into the center node, as compared to node-based GCN methods.

Graph Attention
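
A minimal sketch of the path-aware ingredient, assuming a NetworkX graph: gather nodes within a few hops of a center node via shortest-path lengths and aggregate their features with distance-based weights. SPAGAN's learned path-attention coefficients are replaced here by a crude 1/d weighting, so this only illustrates the data flow, not the method.

```python
import networkx as nx
import numpy as np

# Toy graph and random node features.
G = nx.karate_club_graph()
feats = np.random.randn(G.number_of_nodes(), 8)

def path_aware_aggregate(G, feats, center, cutoff=3):
    """Average features of nodes within `cutoff` hops, down-weighted by shortest-path length."""
    dists = nx.single_source_shortest_path_length(G, center, cutoff=cutoff)
    weights, gathered = [], []
    for node, d in dists.items():
        if node == center:
            continue
        weights.append(1.0 / d)          # crude distance-based weight, not learned attention
        gathered.append(feats[node])
    w = np.array(weights)
    w = w / w.sum()
    return (w[:, None] * np.stack(gathered)).sum(axis=0)

print(path_aware_aggregate(G, feats, center=0).shape)  # (8,)
```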

Self-Born Wiring for Neural Trees

no code implementations ICCV 2021 Ying Chen, Feng Mao, Jie Song, Xinchao Wang, Huiqiong Wang, Mingli Song

Neural trees aim at integrating deep neural networks and decision trees so as to bring the best of the two worlds, including representation learning from the former and faster inference from the latter.

Representation Learning

Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition

1 code implementation ICCV 2021 Shaoli Huang, Xinchao Wang, DaCheng Tao

Learning mid-level representation for fine-grained recognition is easily dominated by a limited number of highly discriminative patterns, degrading its robustness and generalization capability.

Material Recognition Scene Recognition

Overcoming Catastrophic Forgetting in Graph Neural Networks

1 code implementation10 Dec 2020 Huihui Liu, Yiding Yang, Xinchao Wang

Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.

Continual Learning
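
For orientation, a generic regularization-based continual-learning penalty (EWC-style parameter anchoring, not the topology-aware method proposed in this paper) can be sketched as follows; the Fisher importance weights below are dummies and would normally be estimated on the previous task.

```python
import torch
import torch.nn as nn

def ewc_penalty(model, old_params, fisher):
    """Quadratic penalty that anchors parameters important for previous tasks."""
    loss = 0.0
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return loss

model = nn.Linear(4, 3)
# After finishing the previous task: snapshot parameters and (here, fake) importance weights.
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))
task_loss = nn.functional.cross_entropy(model(x), y)
total = task_loss + 0.4 * ewc_penalty(model, old_params, fisher)
total.backward()
```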

Progressive Network Grafting for Few-Shot Knowledge Distillation

2 code implementations9 Dec 2020 Chengchao Shen, Xinchao Wang, Youtan Yin, Jie Song, Sihui Luo, Mingli Song

In this paper, we investigate the practical few-shot knowledge distillation scenario, where we assume only a few samples without human annotations are available for each category.

Knowledge Distillation Model Compression +1

SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data

2 code implementations9 Dec 2020 Shaoli Huang, Xinchao Wang, DaCheng Tao

As the main discriminative information of a fine-grained image usually resides in subtle regions, methods along this line are prone to heavy label noise in fine-grained recognition.

Fine-Grained Image Classification Semantic Composition +1
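
As a point of comparison, a plain CutMix-style augmentation that mixes labels by pasted-box area is sketched below; SnapMix instead derives the label weights from CAM-based semantic proportions, which this sketch does not implement, and the image sizes here are arbitrary.

```python
import torch

def cutmix(images, labels, num_classes, alpha=1.0):
    """Paste a random box from a shuffled batch and mix one-hot labels by box area."""
    B, _, H, W = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(B)
    cut_h, cut_w = int(H * (1 - lam) ** 0.5), int(W * (1 - lam) ** 0.5)
    cy, cx = torch.randint(0, H, (1,)).item(), torch.randint(0, W, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)
    images = images.clone()
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    area = (y2 - y1) * (x2 - x1) / float(H * W)      # actual pasted fraction
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    mixed_labels = (1 - area) * one_hot + area * one_hot[perm]
    return images, mixed_labels

imgs, lbls = torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 2, 3])
mixed_imgs, mixed_lbls = cutmix(imgs, lbls, num_classes=5)
```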

One-sample Guided Object Representation Disassembling

no code implementations NeurIPS 2020 Zunlei Feng, Yongming He, Xinchao Wang, Xin Gao, Jie Lei, Cheng Jin, Mingli Song

In this paper, we introduce the One-sample Guided Object Representation Disassembling (One-GORD) method, which only requires one annotated sample for each object category to learn disassembled object representation from unannotated images.

Data Augmentation image-classification +2

Learning Propagation Rules for Attribution Map Generation

no code implementations ECCV 2020 Yiding Yang, Jiayan Qiu, Mingli Song, DaCheng Tao, Xinchao Wang

Prior gradient-based attribution-map methods rely on handcrafted propagation rules for the non-linear/activation layers during the backward pass, so as to produce gradients of the input and then the attribution map.
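
The handcrafted propagation rules discussed above generalize plain gradient back-propagation; a vanilla gradient-times-input attribution baseline (not the learned rules of the paper) looks like this, with an arbitrary torchvision backbone standing in for the network under inspection.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10).eval()
x = torch.randn(1, 3, 64, 64, requires_grad=True)

# Backpropagate the score of the predicted class to the input pixels.
logits = model(x)
pred = logits.argmax(dim=1).item()
logits[0, pred].backward()

# Gradient x input, summed over channels, gives a simple attribution map.
attribution = (x.grad * x).sum(dim=1).squeeze(0).abs()
print(attribution.shape)  # torch.Size([64, 64])
```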

Factorizable Graph Convolutional Networks

1 code implementation NeurIPS 2020 Yiding Yang, Zunlei Feng, Mingli Song, Xinchao Wang

In this paper, we introduce a novel graph convolutional network (GCN), termed as factorizable graph convolutional network (FactorGCN), that explicitly disentangles such intertwined relations encoded in a graph.

Graph Classification Graph Regression +1

Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets

no code implementations18 Jul 2020 Weihong Ren, Xinchao Wang, Jiandong Tian, Yandong Tang, Antoni B. Chan

State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating per-frame outputs of object detectors.

Cell Tracking Multi-Object Tracking +1
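
To make the tracking-by-detection paradigm mentioned above concrete, a minimal IoU-plus-Hungarian association step is sketched below; the paper itself replaces this with network flows on crowd density maps, so this is only the baseline setting, and the box values are toy inputs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Match existing track boxes to new detections by maximizing total IoU."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_thresh]

tracks = [(10, 10, 50, 50), (100, 100, 150, 150)]
dets = [(105, 102, 148, 155), (12, 8, 48, 52)]
print(associate(tracks, dets))  # [(0, 1), (1, 0)]
```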

Impression Space from Deep Template Network

no code implementations10 Jul 2020 Gongfan Fang, Xinchao Wang, Haofei Zhang, Jie Song, Mingli Song

This network is referred to as the Template Network because its filters will be used as templates to reconstruct images from the impression.

Image Generation Translation

Disassembling Object Representations without Labels

no code implementations3 Apr 2020 Zunlei Feng, Xinchao Wang, Yongming He, Yike Yuan, Xin Gao, Mingli Song

In this paper, we study a new representation-learning task, which we termed as disassembling object representations.

General Classification Generative Adversarial Network +3

Learning Oracle Attention for High-fidelity Face Completion

no code implementations CVPR 2020 Tong Zhou, Changxing Ding, Shaowen Lin, Xinchao Wang, DaCheng Tao

While recent works adopted the attention mechanism to learn the contextual relations among elements of the face, they have largely overlooked the disastrous impacts of inaccurate attention scores; in addition, they fail to pay sufficient attention to key facial components, the completion results of which largely determine the authenticity of a face image.

Facial Inpainting Vocal Bursts Intensity Prediction

Distilling Knowledge from Graph Convolutional Networks

1 code implementation CVPR 2020 Yiding Yang, Jiayan Qiu, Mingli Song, DaCheng Tao, Xinchao Wang

To enable the knowledge transfer from the teacher GCN to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher.

Knowledge Distillation Transfer Learning
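
A hedged sketch of the "local structure preserving" flavor of GCN distillation: match teacher and student pairwise similarities along the graph's edges. The paper's actual module differs in how it encodes topology, and the edge list and feature widths below are toy values.

```python
import torch
import torch.nn.functional as F

def structure_preserving_loss(student_feats, teacher_feats, edge_index):
    """Match teacher/student cosine similarities along the graph's edges."""
    src, dst = edge_index                                # shape: (2, num_edges)
    s_sim = F.cosine_similarity(student_feats[src], student_feats[dst], dim=1)
    t_sim = F.cosine_similarity(teacher_feats[src], teacher_feats[dst], dim=1)
    return F.mse_loss(s_sim, t_sim)

# Toy graph with 5 nodes and 4 edges (one direction listed per edge).
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
student = torch.randn(5, 16, requires_grad=True)
teacher = torch.randn(5, 32)  # teacher may have a different width
loss = structure_preserving_loss(student, teacher, edge_index)
loss.backward()
```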

DEPARA: Deep Attribution Graph for Deep Knowledge Transferability

1 code implementation CVPR 2020 Jie Song, Yixin Chen, Jingwen Ye, Xinchao Wang, Chengchao Shen, Feng Mao, Mingli Song

In this paper, we propose the DEeP Attribution gRAph (DEPARA) to investigate the transferability of knowledge learned from PR-DNNs.

Model Selection Transfer Learning

Data-Free Adversarial Distillation

3 code implementations23 Dec 2019 Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer.

Knowledge Distillation Model Compression +2
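
A minimal sketch of the data-free adversarial distillation loop described above, with toy MLP stand-ins for the generator, teacher, and student: the generator seeks inputs that maximize teacher-student disagreement, and the student then imitates the teacher on those inputs. The losses, architectures, and learning rates are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

for step in range(100):
    z = torch.randn(64, 16)

    # Generator step: synthesize inputs on which student and teacher disagree most.
    opt_g.zero_grad()
    x = generator(z)
    g_loss = -nn.functional.l1_loss(student(x), teacher(x).detach())
    g_loss.backward()
    opt_g.step()

    # Student step: imitate the teacher on the freshly generated inputs.
    opt_s.zero_grad()
    x = generator(z).detach()
    s_loss = nn.functional.l1_loss(student(x), teacher(x).detach())
    s_loss.backward()
    opt_s.step()
```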

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

1 code implementation26 Nov 2019 Ya Zhao, Rui Xu, Xinchao Wang, Peng Hou, Haihong Tang, Mingli Song

In this paper, we propose a new method, termed as Lip by Speech (LIBS), of which the goal is to strengthen lip reading by learning from speech recognizers.

Knowledge Distillation Lipreading +3

Dynamic Instance Normalization for Arbitrary Style Transfer

no code implementations16 Nov 2019 Yongcheng Jing, Xiao Liu, Yukang Ding, Xinchao Wang, Errui Ding, Mingli Song, Shilei Wen

Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way.

Style Transfer
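
To illustrate the normalize-then-restyle idea that dynamic instance normalization generalizes, here is an AdaIN-style sketch where the affine parameters come directly from the style features' channel statistics; DIN instead predicts them with a learned, dynamic network, which this sketch does not include.

```python
import torch

def adaptive_instance_norm(content, style, eps=1e-5):
    """Normalize content features per channel, then re-scale/shift with style statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

content_feats = torch.randn(1, 64, 32, 32)
style_feats = torch.randn(1, 64, 32, 32)
out = adaptive_instance_norm(content_feats, style_feats)
print(out.shape)  # torch.Size([1, 64, 32, 32])
```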

Deep Model Transferability from Attribution Maps

2 code implementations NeurIPS 2019 Jie Song, Yixin Chen, Xinchao Wang, Chengchao Shen, Mingli Song

Exploring the transferability between heterogeneous tasks sheds light on their intrinsic interconnections, and consequently enables knowledge transfer from one task to another so as to reduce the training effort of the latter.

model Transfer Learning

Customizing Student Networks From Heterogeneous Teachers via Adaptive Knowledge Amalgamation

2 code implementations ICCV 2019 Chengchao Shen, Mengqi Xue, Xinchao Wang, Jie Song, Li Sun, Mingli Song

To this end, we introduce a dual-step strategy that first extracts the task-specific knowledge from the heterogeneous teachers sharing the same sub-task, and then amalgamates the extracted knowledge to build the student network.

Knowledge Amalgamation from Heterogeneous Networks by Common Feature Learning

2 code implementations24 Jun 2019 Sihui Luo, Xinchao Wang, Gongfan Fang, Yao Hu, Dapeng Tao, Mingli Song

An increasing number of well-trained deep networks have been released online by researchers and developers, enabling the community to reuse them in a plug-and-play way without accessing the training annotations.

One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation

1 code implementation5 Jun 2019 Chenhong Zhou, Changxing Ding, Xinchao Wang, Zhentai Lu, DaCheng Tao

The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation.

 Ranked #1 on Brain Tumor Segmentation on BRATS-2015 (using extra training data)

Brain Tumor Segmentation Image Segmentation +2

Amalgamating Filtered Knowledge: Learning Task-customized Student from Multi-task Teachers

1 code implementation28 May 2019 Jingwen Ye, Xinchao Wang, Yixin Ji, Kairi Ou, Mingli Song

Many well-trained Convolutional Neural Network (CNN) models have now been released online by developers for the sake of effortless reproduction.

Not All Parts Are Created Equal: 3D Pose Estimation by Modelling Bi-directional Dependencies of Body Parts

no code implementations20 May 2019 Jue Wang, Shaoli Huang, Xinchao Wang, DaCheng Tao

We model parts with higher DOFs like the elbows, as dependent components of the corresponding parts with lower DOFs like the torso, of which the 3D locations can be more reliably estimated.

3D Pose Estimation All +1

Amalgamating Knowledge towards Comprehensive Classification

1 code implementation7 Nov 2018 Chengchao Shen, Xinchao Wang, Jie Song, Li Sun, Mingli Song

We propose in this paper to study a new model-reusing task, which we term as knowledge amalgamation.

Classification General Classification
