no code implementations • 14 Dec 2023 • Hanyang Kong, Dongze Lian, Michael Bi Mi, Xinchao Wang
We introduce DreamDrone, an innovative method for generating unbounded flythrough scenes from textual prompts.
1 code implementation • 6 Nov 2023 • Shuo Wang, Jing Li, Zibo Zhao, Dongze Lian, Binbin Huang, Xiaomei Wang, Zhengxin Li, Shenghua Gao
Holistic scene understanding includes semantic segmentation, surface normal estimation, object boundary detection, depth estimation, etc.
1 code implementation • NeurIPS 2023 • Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang
To mitigate that, we propose an effective adapter-style tuning strategy, dubbed GraphAdapter, which performs the textual adapter by explicitly modeling the dual-modality structure knowledge (i.e., the correlation of different semantics/classes in textual and visual modalities) with a dual knowledge graph.
no code implementations • ICCV 2023 • Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence.
1 code implementation • ICCV 2023 • Daquan Zhou, Kai Wang, Jianyang Gu, Xiangyu Peng, Dongze Lian, Yifan Zhang, Yang You, Jiashi Feng
Extensive experiments demonstrate that DQ is able to generate condensed small datasets for training unseen network architectures with state-of-the-art compression ratios for lossless model training.
no code implementations • 24 Jul 2023 • Jiaben Chen, Yichen Zhu, Dongze Lian, Jiaqi Yang, Yifu Wang, Renrui Zhang, Xinhang Liu, Shenhan Qian, Laurent Kneip, Shenghua Gao
We therefore propose to incorporate RGB information in an event-guided optical flow refinement strategy.
1 code implementation • ICCV 2023 • Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
1 code implementation • CVPR 2023 • Sixun Dong, Huazhang Hu, Dongze Lian, Weixin Luo, Yicheng Qian, Shenghua Gao
Sequential video understanding, as an emerging video understanding task, has drawn considerable attention from researchers because of its goal-oriented nature.
1 code implementation • CVPR 2023 • Jiaben Chen, Renrui Zhang, Dongze Lian, Jiaqi Yang, Ziyao Zeng, Jianbo Shi
To generalize to a new instrument or event class, drawing inspiration from the text-prompt design, we insert an additional query as an audio prompt while freezing the attention mechanism.
1 code implementation • 17 Oct 2022 • Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang
With the proposed SSF, our model obtains 2.46% (90.72% vs. 88.54%) and 11.48% (73.10% vs. 65.57%) performance improvement on FGVC and VTAB-1k in terms of Top-1 accuracy compared to full fine-tuning, while fine-tuning only about 0.3M parameters.
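The improvement figures quoted for SSF match relative gains over the full fine-tuning baseline rather than absolute differences in percentage points. This reading is inferred from the numbers and is not stated in the snippet itself. A minimal check, where the helper name `relative_gain` is purely illustrative:

```python
def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Top-1 accuracies quoted in the abstract snippet (SSF vs. full fine-tuning).
fgvc = relative_gain(90.72, 88.54)
vtab = relative_gain(73.10, 65.57)

print(f"FGVC:    {fgvc:.2f}%")   # prints "FGVC:    2.46%"
print(f"VTAB-1k: {vtab:.2f}%")   # prints "VTAB-1k: 11.48%"
```

The absolute differences would be 2.18 and 7.53 percentage points respectively, which is why the relative interpretation is assumed here.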
1 code implementation • CVPR 2022 • Huazhang Hu, Sixun Dong, Yiqun Zhao, Dongze Lian, Zhengxin Li, Shenghua Gao
Existing methods focus on performing repetitive action counting in short videos, which makes them ill-suited to longer videos in more realistic scenarios.
Ranked #2 on Repetitive Action Counting on RepCount
1 code implementation • CVPR 2022 • Yicheng Qian, Weixin Luo, Dongze Lian, Xu Tang, Peilin Zhao, Shenghua Gao
In this paper, we propose a novel sequence verification task that aims to distinguish positive video pairs performing the same action sequence from negative ones with step-level transformations but still conducting the same task.
2 code implementations • ICLR 2022 • Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao
Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures.
Ranked #13 on Semantic Segmentation on DensePASS
1 code implementation • CVPR 2021 • Binbin Huang, Dongze Lian, Weixin Luo, Shenghua Gao
Then we combine the contextual information from the landmark feature convolution module with the target's visual features for grounding.
1 code implementation • ICCV 2021 • Yanyu Xu, Ziming Zhong, Dongze Lian, Jing Li, Zhengxin Li, Xinxing Xu, Shenghua Gao
To fully leverage the data captured from different scenes with different view angles while reducing the annotation cost, this paper studies a novel crowd counting setting, i.e., only using partial annotations in each image as training data.
1 code implementation • ICLR 2020 • Dongze Lian, Yin Zheng, Yintao Xu, Yanxiong Lu, Leyu Lin, Peilin Zhao, Junzhou Huang, Shenghua Gao
Recently, Neural Architecture Search (NAS) has been successfully applied to multiple artificial intelligence areas and shows better performance compared with hand-designed networks.
1 code implementation • 4 Jul 2019 • Dongze Lian, Zehao Yu, Shenghua Gao
Our two-stage solution to gaze following has two merits: i) it mimics human behavior in gaze following and is therefore more psychologically plausible; ii) besides using a heatmap to supervise the output of our network, we can also leverage gaze direction to facilitate the training of the gaze direction pathway, so our network can be trained more robustly.
no code implementations • CVPR 2019 • Dongze Lian, Jing Li, Jia Zheng, Weixin Luo, Shenghua Gao
Specifically, to improve the robustness of detection-based approaches for small/tiny heads, we leverage the density map to improve the head/non-head classification in the detection network, where the density map serves as the probability of a pixel being a head.
1 code implementation • CVPR 2019 • Zehao Yu, Jia Zheng, Dongze Lian, Zihan Zhou, Shenghua Gao
In the first stage, we train a CNN to map each pixel to an embedding space where pixels from the same plane instance have similar embeddings.
Ranked #1 on Plane Instance Segmentation on NYU Depth v2
no code implementations • ECCV 2018 • Hao Cheng, Dongze Lian, Shenghua Gao, Yanlin Geng
Inspired by the pioneering work of information bottleneck principle for Deep Neural Networks (DNNs) analysis, we design an information plane based framework to evaluate the capability of DNNs for image classification tasks, which not only helps understand the capability of DNNs, but also helps us choose a neural network which leads to higher classification accuracy more efficiently.
1 code implementation • CVPR 2018 • Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao
To predict a future frame with higher quality for normal events, other than the commonly used appearance (spatial) constraints on intensity and gradient, we also introduce a motion (temporal) constraint in video prediction by enforcing the optical flow between predicted frames and ground truth frames to be consistent, and this is the first work that introduces a temporal constraint into the video prediction task.
1 code implementation • 28 Dec 2017 • Wen Liu, Weixin Luo, Dongze Lian, Shenghua Gao
To predict a future frame with higher quality for normal events, other than the commonly used appearance (spatial) constraints on intensity and gradient, we also introduce a motion (temporal) constraint in video prediction by enforcing the optical flow between predicted frames and ground truth frames to be consistent, and this is the first work that introduces a temporal constraint into the video prediction task.
Ranked #2 on Traffic Accident Detection on SA