no code implementations • 9 Mar 2025 • Zhaowei Chen, Borui Zhao, Yuchen Ge, Yuhao Chen, RenJie Song, Jiajun Liang
Building on these findings, we propose Asymmetric Decision-Making (ADM) to enhance feature consensus learning for student models while continuously promoting feature diversity in teacher models.
no code implementations • 6 Mar 2025 • Shen Zhang, Yaning Tan, Siyuan Liang, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang
Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions.
no code implementations • 11 Feb 2025 • Zhaodong Bing, Linze Li, Jiajun Liang
Knowledge distillation (KD) in transformers often faces challenges due to misalignment in the number of attention heads between teacher and student models.
no code implementations • 23 Jan 2025 • Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, Mingwu Zheng, Xiele Wu, Qiulin Wang, Wenyu Qin, Menghan Xia, Xintao Wang, Xiaohong Liu, Fei Yang, Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
Video generation has achieved significant advances through rectified flow techniques, but issues like unsmooth motion and misalignment between videos and prompts persist.
1 code implementation • 28 Sep 2024 • Minqiang Zou, Zhi Lv, Riqiang Jin, Tian Zhan, Mochen Yu, Yao Tang, Jiajun Liang
Multi-view egocentric hand tracking is a challenging task and plays a critical role in VR interaction.
2 code implementations • 26 Sep 2024 • Ge Wu, Xin Zhang, Zheng Li, Zhaowei Chen, Jiajun Liang, Jian Yang, Xiang Li
Prompt learning has surfaced as an effective approach to enhance the performance of Vision-Language Models (VLMs) like CLIP when applied to downstream tasks.
no code implementations • 10 Sep 2024 • Zhenyuan Chen, Lingfeng Yang, Shuo Chen, Zhaowei Chen, Jiajun Liang, Xiang Li
To address the above issues, in this paper, we propose a general framework termed Revisiting Prompt Pretraining (RPP), which aims to improve fitting and generalization ability from two aspects: prompt structure and prompt supervision.
2 code implementations • 27 Aug 2024 • Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang
To address this issue, we introduce MegActor-$\Sigma$: a mixed-modal conditional diffusion transformer (DiT), which can flexibly inject audio and visual modality control signals into portrait animation.
no code implementations • 9 Jul 2024 • Jiajun Liang, Qian Zhang, Wei Deng, Qifan Song, Guang Lin
This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification.
2 code implementations • 31 May 2024 • Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan
Although raw driving videos contain richer information on facial expressions than intermediate representations such as landmarks in the field of portrait animation, they are seldom the subject of research.
no code implementations • 30 May 2024 • Huadong Li, Shichao Dong, Jin Wang, Rong Fu, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji
This paper focuses on the area of RGB (visible)-NIR (near-infrared) cross-modality image registration, which is crucial for many downstream vision tasks to fully leverage the complementary information present in visible and infrared images.
1 code implementation • 26 Apr 2024 • Jiajun Liang, Baoquan Zhang, Yunming Ye, Xutao Li, Chuyao Luo, Xukai Fu
Unlike previous models, MCSDNet targets multi-frame detection and leverages multi-scale spatiotemporal information for the detection of MCS regions in remote sensing imagery (RSI).
2 code implementations • 25 Mar 2024 • Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Zheng Liu, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao
A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation.
1 code implementation • CVPR 2024 • Zhishan Zhou, Shihao Zhou, Zhi Lv, Minqiang Zou, Yao Tang, Jiajun Liang
3D hand pose estimation has found broad application in areas such as gesture recognition and human-machine interaction tasks.
Ranked #3 on 3D Hand Pose Estimation on DexYCB
no code implementations • 6 Dec 2023 • Linze Li, Sunqi Fan, Hengjun Pu, Zhaodong Bing, Yao Tang, Tianzhu Ye, Tong Yang, Liangyu Chen, Jiajun Liang
Our method's efficacy has been validated on multiple representative DreamBooth and LoRA models, delivering substantial improvements over the original outcomes in terms of facial fidelity, text-to-image editability, and video motion.
1 code implementation • 1 Dec 2023 • Huadong Li, Minhao Jing, Jiajun Liang, Haoqiang Fan, Renhe Ji
To this end, we revisit the task of radar-camera depth completion and present a new method with sparse LiDAR supervision to outperform previous dense LiDAR supervision methods in both accuracy and speed.
no code implementations • 29 Nov 2023 • Shen Zhang, Zhaowei Chen, Zhenyu Zhao, Yuhao Chen, Yao Tang, Jiajun Liang
Extensive experiments demonstrate that our approach can address object duplication and heavy computation issues, achieving state-of-the-art performance on higher-resolution image synthesis tasks.
no code implementations • 20 Oct 2023 • Kexin Zhu, Bo Lin, Yang Qiu, Adam Yule, Yao Tang, Jiajun Liang
We introduce a high-performance fingerprint liveness feature extraction technique that secured first place in the LivDet 2023 Fingerprint Representation Challenge.
1 code implementation • 7 Oct 2023 • Zhishan Zhou, Zhi Lv, Shihao Zhou, Minqiang Zou, Tong Wu, Mochen Yu, Yao Tang, Jiajun Liang
This report introduces our work for the Egocentric 3D Hand Pose Estimation workshop.
1 code implementation • ICCV 2023 • Borui Zhao, Quan Cui, RenJie Song, Jiajun Liang
In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss.
1 code implementation • ICCV 2023 • Borui Zhao, RenJie Song, Jiajun Liang
(2) Distilling knowledge from CNN limits the network convergence in the later training period since ViT's capability of integrating global information is suppressed by CNN's local-inductive-bias supervision.
1 code implementation • CVPR 2023 • Siyuan Wei, Tianzhu Ye, Shen Zhang, Yao Tang, Jiajun Liang
Experiments on various transformers demonstrate the effectiveness of our method, while analysis experiments prove our higher robustness to the errors of the token pruning policy.
Ranked #1 on Efficient ViTs on ImageNet-1K (with DeiT-S)
1 code implementation • CVPR 2023 • Yuhao Chen, Xin Tan, Borui Zhao, Zhaowei Chen, RenJie Song, Jiajun Liang, Xuequan Lu
ANL introduces the additional negative pseudo-label for all unlabeled data to leverage low-confidence examples.
1 code implementation • 13 Mar 2023 • Shuangping Jin, Bingbing Yu, Minhao Jing, Yi Zhou, Jiajun Liang, Renhe Ji
To handle this, we propose a new RGB-NIR fusion algorithm called Dark Vision Net (DVN) with two technical novelties: Deep Structure and Deep Inconsistency Prior (DIP).
1 code implementation • CVPR 2023 • Shichao Dong, Jin Wang, Renhe Ji, Jiajun Liang, Haoqiang Fan, Zheng Ge
In this paper, we analyse the generalization ability of binary classifiers for the task of deepfake detection.
1 code implementation • 26 Oct 2022 • Zhi Lv, Bo Lin, Siyuan Liang, Lihua Wang, Mochen Yu, Yao Tang, Jiajun Liang
We present a simple domain generalization baseline, which won second place in both the common context generalization track and the hybrid context generalization track of NICO CHALLENGE 2022.
1 code implementation • 26 Jul 2022 • Jiajun Liang, Linze Li, Zhaodong Bing, Borui Zhao, Yao Tang, Bo Lin, Haoqiang Fan
This paper proposes an efficient self-distillation method named Zipf's Label Smoothing (Zipf's LS), which uses the on-the-fly prediction of a network to generate soft supervision that conforms to Zipf distribution without using any contrastive samples or auxiliary parameters.
1 code implementation • 20 Jul 2022 • Shichao Dong, Jin Wang, Jiajun Liang, Haoqiang Fan, Renhe Ji
Besides the supervision of binary labels, deepfake detection models implicitly learn artifact-relevant visual concepts through the FST-Matching (i.e., the matching of fake, source, and target images) in the training set.
1 code implementation • CVPR 2022 • Borui Zhao, Quan Cui, RenJie Song, Yiyu Qiu, Jiajun Liang
To provide a novel viewpoint to study logit distillation, we reformulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD).
1 code implementation • 8 Mar 2022 • Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, RenJie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie
This work simultaneously considers the discriminability and transferability properties of deep representations in the typical supervised learning task, i.e., image classification.
1 code implementation • CVPR 2022 • Lingfeng Yang, Xiang Li, RenJie Song, Borui Zhao, Juntian Tao, Shihao Zhou, Jiajun Liang, Jian Yang
Therefore, it is helpful to leverage additional information, e.g., the locations and dates of data shooting, which are easily accessible but rarely exploited.
no code implementations • NeurIPS 2021 • Jiashun Jin, Tracy Ke, Jiajun Liang
In a broad Degree-Corrected Mixed-Membership (DCMM) setting, we test whether a non-uniform hypergraph has only one community or has multiple communities.
no code implementations • 29 Jan 2021 • Jiajun Liang, Chuyang Ke, Jean Honorio
Our bounds are tight and pertain to the community detection problems in various models such as the planted hypergraph stochastic block model, the planted densest sub-hypergraph model, and the planted multipartite hypergraph model.
no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.
Ranked #32 on Scene Text Recognition on SVT
31 code implementations • CVPR 2017 • Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang
Previous approaches for scene text detection have already achieved promising performance across various benchmarks.
Ranked #3 on Scene Text Detection on COCO-Text
Curved Text Detection
Optical Character Recognition (OCR)