Search Results for author: Jiajun Liang

Found 35 papers, 22 papers with code

Asymmetric Decision-Making in Online Knowledge Distillation: Unifying Consensus and Divergence

no code implementations 9 Mar 2025 Zhaowei Chen, Borui Zhao, Yuchen Ge, Yuhao Chen, RenJie Song, Jiajun Liang

Building on these findings, we propose Asymmetric Decision-Making (ADM) to enhance feature consensus learning for student models while continuously promoting feature diversity in teacher models.

Decision Making Knowledge Distillation +2

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

no code implementations 6 Mar 2025 Shen Zhang, Yaning Tan, Siyuan Liang, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions.

Optimizing Knowledge Distillation in Transformers: Enabling Multi-Head Attention without Alignment Barriers

no code implementations 11 Feb 2025 Zhaodong Bing, Linze Li, Jiajun Liang

Knowledge distillation (KD) in transformers often faces challenges due to misalignment in the number of attention heads between teacher and student models.

Image Classification Image Generation +2

Improving Video Generation with Human Feedback

no code implementations 23 Jan 2025 Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang

Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist.

Video Generation

1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024

1 code implementation 28 Sep 2024 Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang

Multi-view egocentric hand tracking is a challenging task and plays a critical role in VR interaction.

Position

Cascade Prompt Learning for Vision-Language Model Adaptation

2 code implementations 26 Sep 2024 Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li

Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.

General Knowledge Image Classification +3

Revisiting Prompt Pretraining of Vision-Language Models

no code implementations 10 Sep 2024 Zhenyuan Chen, Lingfeng Yang, Shuo Chen, Zhaowei Chen, Jiajun Liang, Xiang Li

To address the above issues, in this paper, we propose a general framework termed Revisiting Prompt Pretraining (RPP), which aims to improve fitting and generalization ability in two respects: prompt structure and prompt supervision.

MegActor-$\Sigma$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

2 code implementations 27 Aug 2024 Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

To address this issue, we introduce MegActor-$\Sigma$: a mixed-modal conditional diffusion transformer (DiT), which can flexibly inject audio and visual modality control signals into portrait animation.

Portrait Animation

Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and Theory

no code implementations 9 Jul 2024 Jiajun Liang, Qian Zhang, Wei Deng, Qifan Song, Guang Lin

This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification.

Federated Learning parameter estimation +1

MegActor: Harness the Power of Raw Video for Vivid Portrait Animation

2 code implementations 31 May 2024 Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan

Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks, they are seldom the subject of research in the field of portrait animation.

Portrait Animation Style Transfer +1

Towards RGB-NIR Cross-modality Image Registration and Beyond

no code implementations 30 May 2024 Huadong Li, Shichao Dong, Jin Wang, Rong Fu, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji

This paper focuses on the area of RGB (visible)-NIR (near-infrared) cross-modality image registration, which is crucial for many downstream vision tasks to fully leverage the complementary information present in visible and infrared images.

Image Registration

MCSDNet: Mesoscale Convective System Detection Network via Multi-scale Spatiotemporal Information

1 code implementation 26 Apr 2024 Jiajun Liang, Baoquan Zhang, Yunming Ye, Xutao Li, Chuyao Luo, Xukai Fu

Unlike previous models, MCSDNet targets multi-frame detection and leverages multi-scale spatiotemporal information for the detection of MCS regions in remote sensing imagery (RSI).

FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability

no code implementations 6 Dec 2023 Linze Li, Sunqi Fan, Hengjun Pu, Zhaodong Bing, Yao Tang, Tianzhu Ye, Tong Yang, Liangyu Chen, Jiajun Liang

Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models, delivering substantial improvements over the original outcomes in terms of facial fidelity, text-to-image editability, and video motion.

Face Model Video Generation

Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion

1 code implementation 1 Dec 2023 Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji

To this end, we revisit the task of radar-camera depth completion and present a new method with sparse LiDAR supervision to outperform previous dense LiDAR supervision methods in both accuracy and speed.

Depth Completion Depth Estimation +2

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

no code implementations 29 Nov 2023 Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang

Extensive experiments demonstrate that our approach can address object duplication and heavy computation issues, achieving state-of-the-art performance on higher-resolution image synthesis tasks.

Attribute Image Generation +1

Boosting Generalization with Adaptive Style Techniques for Fingerprint Liveness Detection

no code implementations 20 Oct 2023 Kexin Zhu, Bo Lin, Yang Qiu, Adam Yule, Yao Tang, Jiajun Liang

We introduce a high-performance fingerprint liveness feature extraction technique that secured first place in the LivDet 2023 Fingerprint Representation Challenge.

Style Transfer

DOT: A Distillation-Oriented Trainer

1 code implementation ICCV 2023 Borui Zhao, Quan Cui, RenJie Song, Jiajun Liang

In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss.

Knowledge Distillation

Cumulative Spatial Knowledge Distillation for Vision Transformers

1 code implementation ICCV 2023 Borui Zhao, RenJie Song, Jiajun Liang

(2) Distilling knowledge from CNN limits the network convergence in the later training period since ViT's capability of integrating global information is suppressed by CNN's local-inductive-bias supervision.

Inductive Bias Knowledge Distillation +1

Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers

1 code implementation CVPR 2023 Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang

Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy.

Efficient ViTs

DarkVisionNet: Low-Light Imaging via RGB-NIR Fusion with Deep Inconsistency Prior

1 code implementation 13 Mar 2023 Shuangping Jin, Bingbing Yu, Minhao Jing, Yi Zhou, Jiajun Liang, Renhe Ji

To handle this, we propose a new RGB-NIR fusion algorithm called Dark Vision Net (DVN) with two technical novelties: Deep Structure and Deep Inconsistency Prior (DIP).

SSIM

SimpleDG: Simple Domain Generalization Baseline without Bells and Whistles

1 code implementation 26 Oct 2022 Zhi Lv, Bo Lin, Siyuan Liang, Lihua Wang, Mochen Yu, Yao Tang, Jiajun Liang

We present a simple domain generalization baseline, which wins second place in both the common context generalization track and the hybrid context generalization track respectively in NICO CHALLENGE 2022.

Domain Generalization

Efficient One Pass Self-distillation with Zipf's Label Smoothing

1 code implementation 26 Jul 2022 Jiajun Liang, Linze Li, Zhaodong Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan

This paper proposes an efficient self-distillation method named Zipf's Label Smoothing (Zipf's LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters.

Explaining Deepfake Detection by Analysing Image Matching

1 code implementation 20 Jul 2022 Shichao Dong, Jin Wang, Jiajun Liang, Haoqiang Fan, Renhe Ji

Besides the supervision of binary labels, deepfake detection models implicitly learn artifact-relevant visual concepts through the FST-Matching (i.e., the matching of fake, source, and target images) in the training set.

DeepFake Detection Face Swapping +1

Decoupled Knowledge Distillation

1 code implementation CVPR 2022 Borui Zhao, Quan Cui, RenJie Song, Yiyu Qiu, Jiajun Liang

To provide a novel viewpoint to study logit distillation, we reformulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD).

Image Classification Knowledge Distillation +1

Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

1 code implementation 8 Mar 2022 Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, RenJie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie

This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification.

Image Classification Transfer Learning

Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information

1 code implementation CVPR 2022 Lingfeng Yang, Xiang Li, RenJie Song, Borui Zhao, Juntian Tao, Shihao Zhou, Jiajun Liang, Jian Yang

Therefore, it is helpful to leverage additional information, e.g., the locations and dates for data shooting, which can be easily accessible but rarely exploited.

Fine-Grained Image Classification

Sharp Impossibility Results for Hyper-graph Testing

no code implementations NeurIPS 2021 Jiashun Jin, Tracy Ke, Jiajun Liang

In a broad Degree-Corrected Mixed-Membership (DCMM) setting, we test whether a non-uniform hypergraph has only one community or has multiple communities.

Information Theoretic Limits of Exact Recovery in Sub-hypergraph Models for Community Detection

no code implementations 29 Jan 2021 Jiajun Liang, Chuyang Ke, Jean Honorio

Our bounds are tight and pertain to the community detection problems in various models such as the planted hypergraph stochastic block model, the planted densest sub-hypergraph model, and the planted multipartite hypergraph model.

Community Detection Stochastic Block Model

Scene Text Recognition from Two-Dimensional Perspective

no code implementations 18 Sep 2018 Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition Semantic Segmentation +4
