no code implementations • 15 Dec 2024 • Hao Shao, Shulun Wang, Yang Zhou, Guanglu Song, Dailan He, Shuo Qin, Zhuofan Zong, Bingqi Ma, Yu Liu, Hongsheng Li
Our approach effectively mitigates key challenges in video face swapping, including temporal flickering, identity preservation, and robustness to occlusions and pose variations.
no code implementations • 12 Dec 2024 • Zhuofan Zong, Dongzhi Jiang, Bingqi Ma, Guanglu Song, Hao Shao, Dazhong Shen, Yu Liu, Hongsheng Li
To effectively exploit consistent visual elements within multiple images, we leverage the multi-image comprehension and instruction-following capabilities of the multimodal large language model (MLLM), prompting it to capture consistent visual elements based on the instruction.
no code implementations • 9 Dec 2024 • Yunpeng Liu, Boxiao Liu, Yi Zhang, Xingzhong Hou, Guanglu Song, Yu Liu, Haihang You
Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity of this curriculum, then ensure that the curriculum maintains consistent learning complexity across different timesteps by having the teacher model iterate more steps when the noise intensity is low.
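As a rough illustration of the quantity this entry builds on, the sketch below computes the standard PSNR between a reference signal and an estimate. It shows only the textbook definition, not the paper's learning-complexity metric or its step-selection rule; the function name `psnr`, the flat-sequence input format, and the `max_value` default are assumptions made for this example.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Standard Peak Signal-to-Noise Ratio in dB for two equal-length sequences.

    Hypothetical helper for illustration only; `max_value` is the assumed
    peak signal value (1.0 for data normalised to [0, 1]).
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the two signals.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the estimate is closer to the reference, so a pair of signals that differ only slightly yields a large value.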
no code implementations • 2 Oct 2024 • Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan
Multimodal task specification is essential for enhanced robotic performance, where \textit{Cross-modality Alignment} enables the robot to holistically understand complex task instructions.
no code implementations • 19 Sep 2024 • Dongzhi Jiang, Renrui Zhang, Ziyu Guo, Yanmin Wu, Jiayi Lei, Pengshuo Qiu, Pan Lu, Zehui Chen, Chaoyou Fu, Guanglu Song, Peng Gao, Yu Liu, Chunyuan Li, Hongsheng Li
We further present an error analysis revealing that current LMMs still struggle to fully grasp multimodal search tasks, and conduct an ablation study indicating the potential of scaling test-time computation for AI search engines.
no code implementations • 17 Jun 2024 • Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu
To deal with this issue, we propose a novel framework to fully harness the capabilities of LLMs.
1 code implementation • 28 May 2024 • Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Xiaogang Wang, Hongsheng Li
In this paper, we identify three key flaws in the current design of Latent Consistency Models (LCMs).
no code implementations • 1 May 2024 • Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li
In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise.
1 code implementation • 19 Apr 2024 • Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu
In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts.
1 code implementation • CVPR 2024 • Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu
Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space.
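For readers unfamiliar with the CFG scale, the following minimal sketch shows the usual way the scale combines an unconditional and a text-conditioned noise prediction. It applies one global scale to every element, which is the standard formulation the sentence above refers to, not this paper's contribution; the function name `cfg_combine` and the plain-list inputs are assumptions made for this example.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance with a single global scale.

    Hypothetical helper for illustration: inputs are flat lists of noise
    predictions; the result is eps_uncond + scale * (eps_cond - eps_uncond).
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # Move from the unconditional prediction toward (and past) the
    # conditional one; scale = 1 returns the conditional prediction.
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

With `scale = 0` the result is the unconditional prediction, and larger values push every position of the image equally hard toward the text condition.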
2 code implementations • 4 Apr 2024 • Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li
We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm.
1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions.
1 code implementation • 20 Mar 2024 • Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li
We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.
1 code implementation • 19 Mar 2024 • Linjiang Huang, Rongyao Fang, Aiping Zhang, Guanglu Song, Si Liu, Yu Liu, Hongsheng Li
In this study, we delve into the generation of high-resolution images from pre-trained diffusion models, addressing persistent challenges, such as repetitive patterns and structural distortions, that emerge when models are applied beyond their trained resolutions.
1 code implementation • 1 Feb 2024 • Fu-Yun Wang, Zhaoyang Huang, Weikang Bian, Xiaoyu Shi, Keqiang Sun, Guanglu Song, Yu Liu, Hongsheng Li
This paper introduces an effective method for computation-efficient personalized style video generation without requiring access to any personalized video data.
no code implementations • 25 Oct 2023 • Manyuan Zhang, Bingqi Ma, Guanglu Song, Yunxiao Wang, Hongsheng Li, Yu Liu
During the COVID-19 coronavirus epidemic, almost everyone is wearing masks, which poses a huge challenge for deep learning-based face recognition algorithms.
no code implementations • ICCV 2023 • Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
We observe that different regions of interest in the visual feature map are suitable for performing query classification and box localization tasks, even for the same object.
1 code implementation • NeurIPS 2023 • Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo
Text-to-image generation has recently witnessed remarkable achievements.
Ranked #11 on Text-to-Image Generation on MS COCO
1 code implementation • 29 May 2023 • Fu-Yun Wang, Wenshuo Chen, Guanglu Song, Han-Jia Ye, Yu Liu, Hongsheng Li
To address this challenge, we introduce a novel paradigm dubbed as Gen-L-Video, capable of extending off-the-shelf short video diffusion models for generating and editing videos comprising hundreds of frames with diverse semantic segments without introducing additional training, all while preserving content consistency.
1 code implementation • ICCV 2023 • Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu
The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning.
Ranked #3 on 3D Object Detection on nuScenes Camera Only
no code implementations • ICCV 2023 • Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang
Bridging this semantic gap now requires case-by-case algorithm design which is time-consuming and heavily relies on experienced adjustment.
no code implementations • ICCV 2023 • Shanshan Lao, Guanglu Song, Boxiao Liu, Yu Liu, Yujiu Yang
In MKD, random patches of the input image are masked, and the corresponding missing feature is recovered by forcing it to imitate the output of the teacher.
5 code implementations • ICCV 2023 • Zhuofan Zong, Guanglu Song, Yu Liu
This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training the multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN.
Ranked #1 on Instance Segmentation on COCO test-dev (using extra training data)
1 code implementation • 22 Nov 2022 • Linjiang Huang, Kaixin Lu, Guanglu Song, Liang Wang, Si Liu, Yu Liu, Hongsheng Li
In this paper, we present a novel training scheme, namely Teach-DETR, to learn better DETR-based detectors from versatile teacher detectors.
1 code implementation • 20 Oct 2022 • Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo
To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch size, enabling several benefits more appealing than prior arts.
no code implementations • 29 Aug 2022 • Manyuan Zhang, Guanglu Song, Yu Liu, Hongsheng Li
To eliminate the bias of single-aspect research and provide an overall understanding of the face recognition model design, we first carefully design the search space for each aspect, then a comprehensive search method is introduced to jointly search optimal data cleaning, architecture, and loss function design.
1 code implementation • 18 Aug 2022 • Jianming Liang, Guanglu Song, Biao Leng, Yu Liu
The method, called UniHead, views different visual perception tasks as the dispersible points learning via the transformer encoder architecture.
no code implementations • 8 Aug 2022 • Bingqi Ma, Guanglu Song, Boxiao Liu, Yu Liu
To better understand this, we reformulate the noise type of each class in a more fine-grained manner as N-identities|K^C-clusters.
2 code implementations • 12 Jul 2022 • Jihao Liu, Xin Huang, Guanglu Song, Hongsheng Li, Yu Liu
Finally, we integrate configurable operators and DSMs into a unified search space and search with a Reinforcement Learning-based search algorithm to fully explore the optimal combination of the operators.
Ranked #12 on Neural Architecture Search on ImageNet
7 code implementations • 24 Jan 2022 • Kunchang Li, Yali Wang, Junhao Zhang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao
Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing it to tackle both redundancy and dependency for efficient and effective representation learning.
Ranked #162 on Image Classification on ImageNet
2 code implementations • 12 Jan 2022 • Kunchang Li, Yali Wang, Peng Gao, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao
For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy respectively.
1 code implementation • 24 Nov 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu
Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
no code implementations • ICCV 2021 • Boxiao Liu, Shenghan Zhang, Guanglu Song, Haihang You, Yu Liu
In this paper, we first quantitatively define the uniformity of the sampled data for training, providing a unified view for methods that learn from biased data.
Ranked #1 on Face Verification on IJB-C (training dataset metric)
no code implementations • 8 Oct 2021 • Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu
Recently, transformer and multi-layer perceptron (MLP) architectures have achieved impressive results on various vision tasks.
Ranked #254 on Image Classification on ImageNet
3 code implementations • ICLR 2022 • Kunchang Li, Yali Wang, Gao Peng, Guanglu Song, Yu Liu, Hongsheng Li, Yu Qiao
For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.8% and 71.4% top-1 accuracy respectively.
Ranked #9 on Action Recognition on Something-Something V1
no code implementations • 29 Sep 2021 • Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu
Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation.
no code implementations • 25 May 2021 • Jihao Liu, Ming Zhang, Yangting Sun, Boxiao Liu, Guanglu Song, Yu Liu, Hongsheng Li
Further, an architecture knowledge pool together with a block similarity function is proposed to utilize parameter knowledge and reduce the searching time by 2 times.
no code implementations • ICCV 2021 • Boxiao Liu, Guanglu Song, Manyuan Zhang, Haihang You, Yu Liu
When collaborated with the popular ArcFace on million-level data representation learning, we found that the switchable manner in SKH can effectively eliminate the gradient conflict generated by real-world label noise on a single K-class hyperplane.
no code implementations • ECCV 2020 • Manyuan Zhang, Guanglu Song, Hang Zhou, Yu Liu
We show the discriminability knowledge has good properties that can be distilled by a light-weight distillation network and can be generalized on the unseen target set.
2 code implementations • 16 Jun 2020 • Siyu Chen, Junting Pan, Guanglu Song, Manyuan Zhang, Hao Shao, Ziyi Lin, Jing Shao, Hongsheng Li, Yu Liu
This technical report introduces our winning solution to the spatio-temporal action localization track, AVA-Kinetics Crossover, in ActivityNet Challenge 2020.
2 code implementations • CVPR 2020 • Guanglu Song, Yu Liu, Xiaogang Wang
The ``shared head for classification and localization'' (sibling head), firstly denominated in Fast RCNN~\cite{girshick2015fast}, has been leading the fashion of the object detection community in the past five years.
Ranked #80 on Object Detection on COCO test-dev
no code implementations • 17 Mar 2020 • Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
The small receptive field and capacity of minimal neural networks limit their performance when using them to be the backbone of detectors.
2 code implementations • 17 Mar 2020 • Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
Given such good instance bounding box, we further design a simple instance-level semantic segmentation pipeline and achieve the 1st place on the segmentation challenge.
1 code implementation • 12 Mar 2020 • Manyuan Zhang, Hao Shao, Guanglu Song, Yu Liu, Junjie Yan
In this technical report, we briefly introduce the solutions of our team 'Efficient' for the Multi-Moments in Time challenge in ICCV 2019.
1 code implementation • 2 Sep 2019 • Yu Liu, Guanglu Song, Manyuan Zhang, Jihao Liu, Yucong Zhou, Junjie Yan
Large scale face recognition is challenging especially when the computational budget is limited.
no code implementations • ECCV 2018 • Yu Liu, Guanglu Song, Jing Shao, Xiao Jin, Xiaogang Wang
It is inspired by the observation that the weights in the classification layer (called \textit{anchors}) converge to the central direction of each class in hyperspace.
no code implementations • CVPR 2018 • Guanglu Song, Yu Liu, Ming Jiang, Yujie Wang, Junjie Yan, Biao Leng
Fully convolutional neural network (FCN) has been dominating the game of face detection task for a few years with its congenital capability of sliding-window-searching with shared kernels, which boiled down all the redundant calculation, and most recent state-of-the-art methods such as Faster-RCNN, SSD, YOLO and FPN use FCN as their backbone.
no code implementations • 23 Nov 2017 • Guanglu Song, Biao Leng, Yu Liu, Congrui Hetang, Shaofan Cai
One of the major restrictions on the performance of video-based person re-id is partial noise caused by occlusion, blur and illumination.