Search Results for author: Guang Dai

Found 28 papers, 6 papers with code

Unbiased General Annotated Dataset Generation

no code implementations 14 Dec 2024 Dengyang Jiang, Haoyu Wang, Lei Zhang, Wei Wei, Guang Dai, Mengmeng Wang, Jingdong Wang, Yanning Zhang

Pre-training backbone networks on a general annotated dataset (e.g., ImageNet) that comprises numerous manually collected images with category annotations has proven to be indispensable for enhancing the generalization capacity of downstream visual tasks.

Dataset Generation Image Generation

Visual Object Tracking across Diverse Data Modalities: A Review

no code implementations 13 Dec 2024 Mengmeng Wang, Teli Ma, Shuo Xin, Xiaojun Hou, Jiazheng Xing, Guang Dai, Jingdong Wang, Yong liu

Specifically, we first review three types of mainstream single-modal VOT, including RGB, thermal infrared and point cloud tracking.

Visual Object Tracking

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

no code implementations 24 Oct 2024 Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong liu, Feng Tian, Guang Dai, Jingdong Wang, Qianying Wang

To resolve this, we introduce the Logistic Schedule, a novel noise schedule designed to eliminate singularities, improve inversion stability, and provide a better noise space for image editing.

On the Risk of Evidence Pollution for Malicious Social Text Detection in the Era of LLMs

no code implementations 16 Oct 2024 Herun Wan, Minnan Luo, Zhixiong Su, Guang Dai, Xiang Zhao

To mitigate its negative impact, we propose three defense strategies from both the data and model sides, including machine-generated text detection, a mixture of experts, and parameter updating.

Text Detection

Flipped Classroom: Aligning Teacher Attention with Student in Generalized Category Discovery

no code implementations 29 Sep 2024 Haonan Lin, Wenbin An, Jiahao Wang, Yan Chen, Feng Tian, Mengmeng Wang, Guang Dai, Qianying Wang, Jingdong Wang

Recent advancements have shown promise in applying traditional Semi-Supervised Learning strategies to the task of Generalized Category Discovery (GCD).

SpotActor: Training-Free Layout-Controlled Consistent Image Generation

no code implementations 7 Sep 2024 Jiahao Wang, Caixia Yan, Weizhan Zhang, Haonan Lin, Mengmeng Wang, Guang Dai, Tieliang Gong, Hao Sun, Jingdong Wang

For these issues, we pioneer a novel task, Layout-to-Consistent-Image (L2CI) generation, which produces consistent and compositional images in accordance with the given layout conditions and text prompts.

Image Generation object-detection +1

Disentangled Noisy Correspondence Learning

no code implementations 10 Aug 2024 Zhuohang Dang, Minnan Luo, Jihong Wang, Chengyou Jia, Haochen Han, Herun Wan, Guang Dai, Xiaojun Chang, Jingdong Wang

Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy.

cross-modal alignment Cross-Modal Retrieval +2

Timestep-Aware Correction for Quantized Diffusion Models

no code implementations 4 Jul 2024 Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong liu, Jingdong Wang

This accumulation of error becomes particularly problematic in low-precision scenarios, leading to significant distortions in the generated images.

Attribute Noise Estimation +1

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

1 code implementation 18 Jun 2024 Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, Qianying Wang, Guang Dai, Ping Chen, Shijian Lu

To this end, we propose Assembly of Global and Local Attention (AGLA), a training-free and plug-and-play approach that mitigates object hallucinations by exploring an ensemble of global features for response generation and local features for visual discrimination simultaneously.

Object Response Generation +1

Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

no code implementations 28 May 2024 Hao Di, Haishan Ye, Yueling Zhang, Xiangyu Chang, Guang Dai, Ivor W. Tsang

Variance reduction techniques are designed to decrease the sampling variance, thereby accelerating convergence rates of first-order (FO) and zeroth-order (ZO) optimization methods.

OneActor: Consistent Character Generation via Cluster-Conditioned Guidance

no code implementations 16 Apr 2024 Jiahao Wang, Caixia Yan, Haonan Lin, Weizhan Zhang, Mengmeng Wang, Tieliang Gong, Guang Dai, Hao Sun

To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance.

Consistent Character Generation Denoising +1

TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On

1 code implementation 1 Apr 2024 Jiazheng Xing, Chao Xu, Yijie Qian, Yang Liu, Guang Dai, Baigui Sun, Yong liu, Jingdong Wang

However, the clothing identity uncontrollability and training inefficiency of existing diffusion-based methods, which struggle to maintain the identity even with full parameter training, are significant limitations that hinder the widespread applications.

Virtual Try-on

DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation

no code implementations CVPR 2024 Haonan Lin, Mengmeng Wang, Yan Chen, Wenbin An, Yuzhe Yao, Guang Dai, Qianying Wang, Yong liu, Jingdong Wang

While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centered images, novel challenges arise with a nuanced task of "identity fine editing": precisely modifying specific features of a subject while maintaining its inherent identity and context.

Denoising Face Generation

Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer

no code implementations 23 Feb 2024 Yanjun Zhao, Sizhe Dang, Haishan Ye, Guang Dai, Yi Qian, Ivor W. Tsang

Fine-tuning large language models (LLMs) with classic first-order optimizers entails prohibitive GPU memory due to the backpropagation process.

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

no code implementations 22 Jan 2024 Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong liu

In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability.

Action Recognition Decoder +1

Disentangled Representation Learning with Transmitted Information Bottleneck

no code implementations 3 Nov 2023 Zhuohang Dang, Minnan Luo, Chengyou Jia, Guang Dai, Jihong Wang, Xiaojun Chang, Jingdong Wang

Encoding only the task-related information from the raw data, i.e., disentangled representation learning, can greatly contribute to the robustness and generalizability of models.

Disentanglement Variational Inference

SUBP: Soft Uniform Block Pruning for 1xN Sparse CNNs Multithreading Acceleration

1 code implementation 10 Oct 2023 Jingyang Xiang, Siqi Li, Jun Chen, Shipeng Bai, Yukai Ma, Guang Dai, Yong liu

To overcome them, this paper proposes a novel Soft Uniform Block Pruning (SUBP) approach to train a uniform 1×N sparse structured network from scratch.

PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement

no code implementations 20 Sep 2023 Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Jingdong Wang

Dominant Person Search methods aim to localize and recognize query persons in a unified network, which jointly optimizes two sub-tasks, i.e., pedestrian detection and Re-IDentification (ReID).

Denoising Pedestrian Detection +2

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

no code implementations 21 Aug 2023 Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong liu

This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold.

Second-order methods

SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-form Layout-to-Image Generation

no code implementations 20 Aug 2023 Chengyou Jia, Minnan Luo, Zhuohang Dang, Guang Dai, Xiaojun Chang, Mengmeng Wang, Jingdong Wang

Despite significant progress in Text-to-Image (T2I) generative models, even lengthy and complex text descriptions still struggle to convey detailed controls.

Diversity Layout-to-Image Generation

MA-FSAR: Multimodal Adaptation of CLIP for Few-Shot Action Recognition

no code implementations 3 Aug 2023 Jiazheng Xing, Chao Xu, Mengmeng Wang, Guang Dai, Baigui Sun, Yong liu, Jingdong Wang, Jian Zhao

To tackle these issues, we introduce MA-FSAR, a framework that employs the Parameter-Efficient Fine-Tuning (PEFT) technique to enhance the CLIP visual encoder in terms of action-related temporal and semantic representations.

Few-Shot action recognition Few Shot Action Recognition +1

Learning Discretized Neural Networks under Ricci Flow

no code implementations 7 Feb 2023 Jun Chen, Hanwen Chen, Mengmeng Wang, Guang Dai, Ivor W. Tsang, Yong liu

By introducing a partial differential equation on metrics, i.e., the Ricci flow, we establish the dynamical stability and convergence of the LNE metric with the $L^2$-norm perturbation.

Optimal Scoring for Unsupervised Learning

no code implementations NeurIPS 2009 Zhihua Zhang, Guang Dai

We are often interested in casting classification and clustering problems in a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria.

Clustering General Classification +1
