no code implementations • 7 Apr 2025 • Karan Dalal, Daniel Koceja, Gashon Hussein, Jiarui Xu, Yue Zhao, Youjin Song, Shihao Han, Ka Chun Cheung, Jan Kautz, Carlos Guestrin, Tatsunori Hashimoto, Sanmi Koyejo, Yejin Choi, Yu Sun, Xiaolong Wang
We have only experimented with one-minute videos due to resource constraints, but the approach can be extended to longer videos and more complex stories.
no code implementations • 23 Mar 2025 • Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li
In this paper, we introduce M3Net, a novel multimodal and multi-task network that simultaneously tackles detection, segmentation, and 3D occupancy prediction for autonomous driving and achieves superior performance than single task model.
no code implementations • 21 Jan 2025 • Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu
We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures.
no code implementations • 31 Oct 2024 • Xiufeng Huang, RuiQi Li, Yiu-ming Cheung, Ka Chun Cheung, Simon See, Renjie Wan
3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets.
no code implementations • 30 Oct 2024 • Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan
Single-view 3D reconstruction methods like Triplane Gaussian Splatting (TGS) have enabled high-quality 3D model generation from just a single image input within seconds.
no code implementations • 18 Jul 2024 • Xiufeng Huang, Ka Chun Cheung, Simon See, Renjie Wan
While approaches like CopyRNeRF have been introduced to embed binary messages into NeRF models as digital signatures for copyright protection, the process of recolorization can remove these binary messages.
no code implementations • 15 Jul 2024 • Xingzhi Zhou, Xin Dong, Chunhao Li, Yuning Bai, Yulong Xu, Ka Chun Cheung, Simon See, Xinpeng Song, Runshun Zhang, Xuezhong Zhou, Nevin L. Zhang
However, this task faces limitations due to the scarcity of high-quality clinical datasets and the complex relationship between symptoms and herbs.
no code implementations • 10 Jul 2024 • Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan
Neural Radiance Fields (NeRFs) have become a key method for 3D scene representation.
1 code implementation • 25 Jun 2024 • Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu
To address this limitation, we introduce $\textbf{MIGU}$ ($\textbf{M}$agn$\textbf{I}$tude-based $\textbf{G}$radient $\textbf{U}$pdating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers.
no code implementations • CVPR 2024 • Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu
Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to limited spatial awareness of the vision encoder, and the use of coarse-grained training data that lacks detailed, region-specific captions.
no code implementations • 29 Jan 2024 • Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
For the first stage, we propose a diffusion-based motion field predictor, which focuses on deducing the trajectories of the reference image's pixels.
no code implementations • 26 Jan 2024 • Xingzhi Zhou, Zhiliang Tian, Ka Chun Cheung, Simon See, Nevin L. Zhang
Test-time domain adaptation effectively adjusts the source domain model to accommodate unseen domain shifts in a target domain during inference.
no code implementations • CVPR 2024 • Sixing Yan, William K. Cheung, Ivor W. Tsang, Keith Chiu, Terence M. Tong, Ka Chun Cheung, Simon See
Automatic radiology report generation using deep learning models has been recently explored and found promising.
1 code implementation • ICCV 2023 • Ziyuan Luo, Qing Guo, Ka Chun Cheung, Simon See, Renjie Wan
Neural Radiance Fields (NeRF) have the potential to be a major representation of media.
1 code implementation • ICCV 2023 • Xuesong Chen, Shaoshuai Shi, Chao Zhang, Benjin Zhu, Qiang Wang, Ka Chun Cheung, Simon See, Hongsheng Li
3D multi-object tracking (MOT) is vital for many applications including autonomous driving vehicles and service robots.
1 code implementation • ICCV 2023 • Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directional optical flows for the center frame in a three-frame manner.
1 code implementation • CVPR 2023 • Xiaoyu Shi, Zhaoyang Huang, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li
FlowFormer introduces a transformer architecture into optical flow estimation and achieves state-of-the-art performance.
no code implementations • 16 Nov 2022 • Yihang Gao, Ka Chun Cheung, Michael K. Ng
Physics-informed neural networks (PINNs) have attracted significant attention for solving partial differential equations (PDEs) in recent years because they alleviate the curse of dimensionality that appears in traditional methods.
no code implementations • 22 Oct 2022 • Dongkyu Lee, Ka Chun Cheung, Nevin L. Zhang
Furthermore, inspired by recent work in bridging label smoothing and knowledge distillation, our work utilizes self-knowledge as a prior label distribution in softening target labels, and presents theoretical support for the regularization effect by knowledge distillation and the dynamic smoothing parameter.
no code implementations • 22 Oct 2022 • Dongkyu Lee, Zhiliang Tian, Yingxiu Zhao, Ka Chun Cheung, Nevin L. Zhang
The question is answered in our work with the concept of model calibration; we view a teacher model not only as a source of knowledge but also as a gauge to detect miscalibration of a student.
no code implementations • 19 Sep 2022 • Zhaoyang Huang, Xiaokun Pan, Weihong Pan, Weikang Bian, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li
We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker.
1 code implementation • 10 Aug 2022 • Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, Hongsheng Li
With the integration, MSDI-Net can handle various and complicated blurry patterns adaptively.
Ranked #23 on
Image Deblurring
on GoPro
1 code implementation • CVPR 2023 • Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li
In this study, we propose a simple yet effective framework for video restoration.
Ranked #2 on
Deblurring
on GoPro
(using extra training data)
1 code implementation • 12 May 2022 • Xuesong Chen, Shaoshuai Shi, Benjin Zhu, Ka Chun Cheung, Hang Xu, Hongsheng Li
Accurate and reliable 3D detection is vital for many applications including autonomous driving vehicles and service robots.
1 code implementation • 30 Mar 2022 • Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
We introduce optical Flow transFormer, dubbed as FlowFormer, a transformer-based neural network architecture for learning optical flow.
Ranked #4 on
Optical Flow Estimation
on KITTI 2015 (train)
no code implementations • 29 Sep 2021 • Dongkyu Lee, Ka Chun Cheung, Nevin Zhang
Overconfidence has been shown to impair generalization and calibration of a neural network.
no code implementations • 27 Apr 2021 • Yixiao Ge, Xiao Zhang, Ching Lam Choi, Ka Chun Cheung, Peipei Zhao, Feng Zhu, Xiaogang Wang, Rui Zhao, Hongsheng Li
In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network.
no code implementations • 7 Apr 2021 • Zhaoyang Huang, Xiaokun Pan, Runsen Xu, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li
However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks.
1 code implementation • 20 Nov 2019 • Shaohuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See
Distributed stochastic gradient descent (SGD) algorithms are widely deployed in training large-scale deep learning models, while the communication overhead among workers becomes the new system bottleneck.