no code implementations • 20 Nov 2024 • Hang Zhou, Xiaoxu Zheng, Yunhe Wang, Michael Bi Mi, Deyi Xiong, Kai Han
Recurrent neural networks (RNNs) that are capable of modeling long-distance dependencies are widely used in various speech tasks, e.g., keyword spotting (KWS) and speech enhancement (SE).
1 code implementation • 18 Sep 2024 • Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang
We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts.
1 code implementation • 5 Jul 2024 • Gongfan Fang, Xinyin Ma, Michael Bi Mi, Xinchao Wang
For instance, we improve the accuracy of DeiT-Tiny from 74.52% to 77.50% by pruning an off-the-shelf DeiT-Base model.
1 code implementation • 3 Jun 2024 • Xinyin Ma, Gongfan Fang, Michael Bi Mi, Xinchao Wang
In this study, we make an interesting and somewhat surprising observation: by introducing a caching mechanism, the computation of a large proportion of layers in the diffusion transformer can be readily removed, even without updating the model parameters.
1 code implementation • 20 Jan 2024 • Nhat M. Hoang, Kehong Gong, Chuan Guo, Michael Bi Mi
Specifically, we separate the denoising objectives of a diffusion model into two stages: obtaining conditional rough motion approximations in the initial $T-T^*$ steps by learning the noisy annotated motions, followed by the unconditional refinement of these preliminary motions during the last $T^*$ steps using unannotated motions.
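The two-stage split described above can be illustrated with a minimal sketch. It only labels each of the $T$ denoising steps by stage, assuming the steps are listed in the order they are executed; the function name `denoising_stages` and the string labels are hypothetical and are not taken from the authors' code.

```python
def denoising_stages(T: int, T_star: int) -> list[str]:
    """Label each denoising step, in execution order, with its stage.

    The first T - T_star steps are conditional (rough motion approximation);
    the last T_star steps are unconditional (refinement).
    Names and labels are illustrative assumptions.
    """
    if not 0 <= T_star <= T:
        raise ValueError("T_star must satisfy 0 <= T_star <= T")
    return ["conditional"] * (T - T_star) + ["unconditional"] * T_star


if __name__ == "__main__":
    # Example: 10 steps in total, the last 3 used for refinement.
    for step, stage in enumerate(denoising_stages(10, 3)):
        print(step, stage)
```

With `T = 10` and `T_star = 3`, the sketch assigns seven conditional steps followed by three unconditional ones.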
no code implementations • 15 Jan 2024 • Xin Yang, Wending Yan, Yuan Yuan, Michael Bi Mi, Robby T. Tan
They struggle to acquire new knowledge while also retaining previously learned knowledge. To address these problems, we propose a semantic segmentation method for multiple adverse weather conditions that incorporates adaptive knowledge acquisition, pseudolabel blending, and weather composition replay.
no code implementations • 29 Dec 2023 • Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan
Further, to ensure that different regions remain distinguishable, we introduce a region-level contrastive clustering loss that pulls similar regions across images closer together.
no code implementations • 14 Dec 2023 • Hanyang Kong, Dongze Lian, Michael Bi Mi, Xinchao Wang
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating unbounded flythrough scenes from textual prompts.
no code implementations • 1 Dec 2023 • Kerui Gu, Zhihao LI, Shiyong Liu, Jianzhuang Liu, Songcen Xu, Youliang Yan, Michael Bi Mi, Kenji Kawaguchi, Angela Yao
Estimating 3D rotations is a common procedure for 3D computer vision.
Ranked #16 on 3D Human Pose Estimation on 3DPW
no code implementations • ICCV 2023 • Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence.
1 code implementation • ICCV 2023 • Ming Nie, Yujing Xue, Chunwei Wang, Chaoqiang Ye, Hang Xu, Xinge Zhu, Qingqiu Huang, Michael Bi Mi, Xinchao Wang, Li Zhang
Recently, polar-based representation has shown promising properties in perceptual tasks.
1 code implementation • CVPRW 2023 • Marcos V. Conde, Manuel Kolmet, Tim Seizinger, Tom E. Bishop, Radu Timofte, Xiangyu Kong, Dafeng Zhang, Jinlong Wu, Fan Wang, Juewen Peng, Zhiyu Pan, Chengxin Liu, Xianrui Luo, Huiqiang Sun, Liao Shen, Zhiguo Cao, Ke Xian, Chaowei Liu, Zigeng Chen, Xingyi Yang, Songhua Liu, Yongcheng Jing, Michael Bi Mi, Xinchao Wang, Zhihao Yang, Wenyi Lian, Siyuan Lai, Haichuan Zhang, Trung Hoang, Amirsaeed Yazdani, Vishal Monga, Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao, Jens Sjölund, Thomas B. Schön, Yuxuan Zhao, Baoliang Chen, Yiqing Xu, JiXiang Niu
We present the new Bokeh Effect Transformation Dataset (BETD), and review the proposed solutions for this novel task at the NTIRE 2023 Bokeh Effect Transformation Challenge.
no code implementations • CVPR 2023 • Ziwei Yu, Chen Li, Linlin Yang, Xiaoxu Zheng, Michael Bi Mi, Gim Hee Lee, Angela Yao
However, the reconstructed meshes are prone to artifacts and do not appear as plausible hand shapes.
1 code implementation • CVPR 2024 • Kai Xu, Ziwei Yu, Xin Wang, Michael Bi Mi, Angela Yao
We show that bilinear interpolation inherently attenuates high-frequency information while an MLP-based coordinate network can approximate more frequencies.
Ranked #2 on Video Super-Resolution on REDS4- 4x upscaling
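The attenuation claim can be checked in a one-dimensional analogue. The sketch below is not the paper's analysis; it assumes a sampled sinusoid $\cos(\omega n)$ and linear interpolation halfway between two samples, where the interpolated value equals the true value scaled by $\cos(\omega/2)$. Bilinear interpolation applies the same linear rule along each axis. The function names are hypothetical.

```python
import math


def midpoint_gain(omega: float) -> float:
    """Amplitude gain of linear interpolation halfway between two samples
    for a sinusoid with angular frequency omega (radians per sample)."""
    return math.cos(omega / 2.0)


def interpolate_midpoint(omega: float, n: int) -> tuple[float, float]:
    """Return (interpolated, true) values of cos(omega * t) at t = n + 0.5."""
    left = math.cos(omega * n)
    right = math.cos(omega * (n + 1))
    interpolated = 0.5 * (left + right)  # linear interpolation at the midpoint
    true_value = math.cos(omega * (n + 0.5))
    return interpolated, true_value


if __name__ == "__main__":
    for omega in (0.1, 1.0, 3.0):
        interpolated, true_value = interpolate_midpoint(omega, n=2)
        print(omega, midpoint_gain(omega), interpolated, true_value)
```

The gain approaches 1 for low frequencies and falls toward 0 as $\omega$ approaches $\pi$ (the Nyquist limit), which is the sense in which linear interpolation suppresses high-frequency content.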
1 code implementation • ICCV 2023 • Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang
We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities.
Ranked #3 on Motion Synthesis on AIST++
1 code implementation • CVPR 2023 • Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang
Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks.
no code implementations • 25 Jan 2023 • Kerui Gu, Linlin Yang, Michael Bi Mi, Angela Yao
Experimental results on both the human body and hand benchmarks show that BCIR is faster to train and more accurate than the original integral regression, making it competitive with state-of-the-art detection methods.
1 code implementation • 21 Jan 2023 • Shihao Zhang, Linlin Yang, Michael Bi Mi, Xiaoxu Zheng, Angela Yao
In computer vision, it is often observed that formulating regression problems as classification tasks yields better performance.
Ranked #19 on Crowd Counting on ShanghaiTech B
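As a generic illustration of casting regression as classification (not the method proposed in the paper), a continuous target can be discretized into equal-width bins and a predicted class decoded back to its bin center. The helper names, the value range, and the bin count below are all assumptions chosen for the example.

```python
def to_class(y: float, low: float, high: float, num_bins: int) -> int:
    """Map a continuous target in [low, high] to an equal-width bin index."""
    width = (high - low) / num_bins
    index = int((y - low) / width)
    # Clamp so that y == high (or out-of-range values) stays in a valid bin.
    return min(max(index, 0), num_bins - 1)


def to_value(index: int, low: float, high: float, num_bins: int) -> float:
    """Decode a bin index back to the center of its bin."""
    width = (high - low) / num_bins
    return low + (index + 0.5) * width


if __name__ == "__main__":
    label = to_class(3.7, low=0.0, high=10.0, num_bins=5)
    print(label, to_value(label, low=0.0, high=10.0, num_bins=5))
```

For in-range targets, decoding the true class recovers the value to within half a bin width, so the bin count trades classification difficulty against quantization error.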
1 code implementation • 24 Nov 2022 • Xin Yang, Michael Bi Mi, Yuan Yuan, Xin Wang, Robby T. Tan
In our DA framework, we retain the depth and background information during the domain feature alignment.
2 code implementations • CVPR 2022 • Fan Yan, Ming Nie, Xinyue Cai, Jianhua Han, Hang Xu, Zhen Yang, Chaoqiang Ye, Yanwei Fu, Michael Bi Mi, Li Zhang
We present ONCE-3DLanes, a real-world autonomous driving dataset with lane layout annotation in 3D space.
1 code implementation • CVPR 2022 • Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses.
Ranked #42 on 3D Human Pose Estimation on MPI-INF-3DHP
1 code implementation • CVPR 2022 • Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei zhang, Xiaogang Wang, Xinchao Wang
We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.