no code implementations • 16 Jan 2025 • Shiu-hong Kao, Xiao Li, Jinglu Wang, Chi-Keung Tang, Yu-Wing Tai, Yan Lu
Training 3D reconstruction models with 2D visual data traditionally requires prior knowledge of camera poses for the training samples, a process that is both time-consuming and prone to errors.
no code implementations • 4 Oct 2024 • Zixuan Wang, Chi-Keung Tang, Yu-Wing Tai
For video-to-audio (VTA) tasks, most existing methods require training a timestamp detector to synchronize video events with the generated audio, a process that can be tedious and time-consuming.
no code implementations • 25 Sep 2024 • Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
Cinematographers adeptly capture the essence of the world, crafting compelling visual narratives through intricate camera movements.
no code implementations • 8 Jun 2024 • Jianmeng Liu, Yichen Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang
Recent conditional 3D completion works have mainly relied on CLIP or BERT to encode textual information, which cannot support complex instruction.
1 code implementation • CVPR 2024 • Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang, Pedro Miraldo, Suhas Lohit, Moitreya Chatterjee
Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering.
Ranked #1 on Novel View Synthesis.
no code implementations • 27 May 2024 • Qi Wu, Yubo Zhao, Yifan Wang, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
This is accomplished by encoding and quantizing motions into discrete tokens that align with the language model's vocabulary.
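The entry above mentions quantizing motions into discrete tokens aligned with the language model's vocabulary. Below is a minimal sketch of such a vector-quantization step, assuming a VQ-style codebook and made-up token names such as `<motion_17>`; the codebook size, feature dimension, and token format are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch (not the paper's implementation): vector-quantizing motion
# features into discrete tokens that can extend an LLM vocabulary.
# Codebook size, feature dimension, and token naming are assumptions.
import torch

codebook_size, dim = 512, 256
codebook = torch.randn(codebook_size, dim)            # learned VQ codebook (random here)

def motion_to_tokens(motion_feats: torch.Tensor) -> list[str]:
    """motion_feats: (T, dim) per-frame motion features -> discrete token strings."""
    dists = torch.cdist(motion_feats, codebook)        # (T, codebook_size) distances
    ids = dists.argmin(dim=1)                          # nearest codebook entry per frame
    # Map each code index to a special token the LM vocabulary is extended with.
    return [f"<motion_{i}>" for i in ids.tolist()]

tokens = motion_to_tokens(torch.randn(8, dim))
print(tokens[:4])  # e.g. ['<motion_401>', '<motion_17>', ...]
```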
no code implementations • 25 May 2024 • Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang
C3LLM adapts the Large Language Model (LLM) structure as a bridge for aligning different modalities, synthesizing the given conditional information, and performing multimodal generation in a discrete manner.
1 code implementation • 5 Jan 2024 • Hao Zhang, Yu-Wing Tai, Chi-Keung Tang
However, achieving simultaneously multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.
no code implementations • 30 Dec 2023 • Han Jiang, Haosen Sun, Ruoxuan Li, Chi-Keung Tang, Yu-Wing Tai
The second and remaining problem is thus 3D multi-view consistency among all completed images, now guided by the seed images and their 3D proxies.
no code implementations • 5 Dec 2023 • Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang
This paper explores promptable NeRF generation (e.g., from a text prompt or a single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus bypassing complex intermediate steps while providing full 3D generation with conditional control.
1 code implementation • 3 Dec 2023 • Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang
The main issues are: 1) how to perform direct and accurate user control in editing; 2) how to execute edits such as changing shape, expression, and layout without introducing unsightly distortion or artifacts in the edited content; and 3) how to maintain the spatio-temporal consistency of the video after editing.
no code implementations • CVPR 2024 • Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai
Recently, the Segment Anything Model (SAM) has showcased remarkable capabilities of zero-shot segmentation, while NeRF (Neural Radiance Fields) has gained popularity as a method for various 3D problems beyond novel view synthesis.
no code implementations • CVPR 2024 • Juntao Zhang, Yuehuai Liu, Yu-Wing Tai, Chi-Keung Tang
Specifically, C3Net first aligns the conditions from multiple modalities to the same semantic latent space using modality-specific encoders based on contrastive training.
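The entry above mentions aligning multi-modal conditions into one semantic latent space with modality-specific encoders and contrastive training. The sketch below shows a generic CLIP-style symmetric contrastive (InfoNCE) loss between two paired modality embeddings; the embedding dimension, batch size, and temperature are assumptions, not C3Net's actual configuration.

```python
# Minimal sketch of contrastive alignment between two modality-specific encoders
# (a generic symmetric InfoNCE loss; dimensions and temperature are placeholders).
import torch
import torch.nn.functional as F

def contrastive_align(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07):
    """z_a, z_b: (B, D) embeddings of paired conditions from two modalities."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))               # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = contrastive_align(torch.randn(4, 128), torch.randn(4, 128))
```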
1 code implementation • 27 Nov 2023 • Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang
Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved segmentation stability for SAM across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08M) and fast adaptation (within 1 training epoch).
1 code implementation • 27 Nov 2023 • Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
This paper presents InceptionHuman, a prompt-to-NeRF framework that allows easy control via a combination of prompts in different modalities (e.g., text, pose, edge, segmentation map, etc.) as inputs to generate photorealistic 3D humans.
no code implementations • ICCV 2023 • Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed.
1 code implementation • ICCV 2023 • Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains.
1 code implementation • 3 Jul 2023 • Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu
The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models.
no code implementations • 7 Jun 2023 • Yanan Sun, Zihan Zhong, Qi Fan, Chi-Keung Tang, Yu-Wing Tai
Our thorough studies validate that models pre-trained as such can learn rich representations of both modalities, improving their ability to understand how images and text relate to each other.
3 code implementations • NeurIPS 2023 • Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
HQ-SAM is only trained on the introduced dataset of 44k masks, which takes only 4 hours on 8 GPUs.
Ranked #1 on Zero-Shot Instance Segmentation on LVIS v1.0 val
2 code implementations • NeurIPS 2023 • Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang
The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.
2 code implementations • 28 May 2023 • Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang
We believe this paradigm will open up new avenues in the dynamics of distillation and pave the way for efficient dataset distillation.
no code implementations • 24 May 2023 • Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang
Novel view synthesis via Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) typically necessitates dense observations with hundreds of input images to circumvent artifacts.
no code implementations • 22 May 2023 • Han Jiang, Ruoxuan Li, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang
No significant work has been done to directly merge two partially overlapping scenes using NeRF representations.
1 code implementation • ICCV 2023 • Yichen Liu, Benran Hu, Junkai Huang, Yu-Wing Tai, Chi-Keung Tang
This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance-NeRF.
1 code implementation • CVPR 2023 • Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
A consistency loss is then enforced on the found matches.
no code implementations • 26 Mar 2023 • Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
This paper analyzes the NeRF's struggles in such settings and proposes Clean-NeRF for accurate 3D reconstruction and novel view rendering in complex scenes.
1 code implementation • CVPR 2023 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai
Instead, our method resorts to spatial and temporal sparsity for solving general UHR matting.
no code implementations • 22 Nov 2022 • Shengnan Liang, Yichen Liu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang
We present ONeRF, a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.
2 code implementations • CVPR 2023 • Benran Hu, Junkai Huang, Yichen Liu, Yu-Wing Tai, Chi-Keung Tang
This paper presents the first significant object detection framework, NeRF-RPN, which directly operates on NeRF.
1 code implementation • 21 Nov 2022 • Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang
This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).
no code implementations • 21 Nov 2022 • Changlin Li, Guangyang Wu, Yanan Sun, Xin Tao, Chi-Keung Tang, Yu-Wing Tai
The learned deformable kernel is then used to convolve the input frames and predict the interpolated frame.
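The entry above describes convolving the input frames with a learned deformable kernel to predict the interpolated frame. The sketch below shows a simplified kernel-based interpolation step (per-pixel adaptive convolution without the deformable offsets); the kernel size and the way kernels are predicted and normalized are assumptions.

```python
# Minimal sketch of kernel-based frame interpolation: each output pixel is obtained
# by convolving local patches of the two input frames with a learned per-pixel
# kernel. Deformable offsets are omitted; kernels are assumed normalized per pixel.
import torch
import torch.nn.functional as F

def interpolate_with_kernels(frame0, frame1, kernels, k=5):
    """frame0, frame1: (B, C, H, W); kernels: (B, 2*k*k, H, W), normalized per pixel."""
    def gather_patches(frame):
        patches = F.unfold(frame, kernel_size=k, padding=k // 2)   # (B, C*k*k, H*W)
        b, _, hw = patches.shape
        return patches.view(b, frame.size(1), k * k, hw)
    p0, p1 = gather_patches(frame0), gather_patches(frame1)
    b, _, h, w = frame0.shape
    k0, k1 = kernels.view(b, 2, k * k, h * w).unbind(dim=1)        # per-pixel kernels
    out = (p0 * k0.unsqueeze(1)).sum(2) + (p1 * k1.unsqueeze(1)).sum(2)
    return out.view(b, -1, h, w)                                   # interpolated frame
```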
no code implementations • 8 Nov 2022 • Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai
Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observing target domain data during training.
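The entry above describes perturbing the channel statistics of source-domain features to synthesize latent styles. Below is a minimal sketch of one such perturbation (normalizing features and re-styling them with randomly jittered per-channel mean and standard deviation); the noise scale and where this is inserted in the network are assumptions, not the paper's exact formulation.

```python
# Minimal sketch of perturbing channel statistics to synthesize latent styles
# (noise scale and placement in the network are assumptions).
import torch

def perturb_channel_stats(feat: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """feat: (B, C, H, W) source-domain feature map."""
    mu = feat.mean(dim=(2, 3), keepdim=True)           # per-channel mean
    std = feat.std(dim=(2, 3), keepdim=True) + 1e-6    # per-channel std
    normalized = (feat - mu) / std
    # Re-style with randomly perturbed statistics to simulate unseen domains.
    new_mu = mu * (1 + sigma * torch.randn_like(mu))
    new_std = std * (1 + sigma * torch.randn_like(std))
    return normalized * new_std + new_mu
```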
no code implementations • 2 Oct 2022 • Xinhang Liu, Jiaben Chen, Huai Yu, Yu-Wing Tai, Chi-Keung Tang
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss, enabling an unsupervised partitioning of a scene into salient or meaningful regions corresponding to different object instances.
1 code implementation • 8 Aug 2022 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang
Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).
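The entry above describes BCNet's bilayer decoupling of occluders and occludees. The sketch below is a schematic of that idea with two mask branches, where the occludee branch is conditioned on the occluder branch's features; plain convolutions stand in for the paper's actual head design, so treat this as an illustration rather than the implementation.

```python
# Schematic sketch of the bilayer idea: one branch predicts occluder masks, and a
# second branch, conditioned on the occluder branch's features, predicts the
# (possibly occluded) target instance. Plain convolutions are placeholders.
import torch
import torch.nn as nn

class BilayerMaskHead(nn.Module):
    def __init__(self, in_ch: int = 256):
        super().__init__()
        self.occluder_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.occluder_mask = nn.Conv2d(in_ch, 1, 1)
        self.occludee_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.occludee_mask = nn.Conv2d(in_ch, 1, 1)

    def forward(self, roi_feat: torch.Tensor):
        occ_feat = self.occluder_branch(roi_feat)
        occluder = self.occluder_mask(occ_feat)
        # The bottom layer sees both the RoI features and the occluder's features.
        occludee = self.occludee_mask(self.occludee_branch(roi_feat + occ_feat))
        return occluder, occludee
```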
1 code implementation • 28 Jul 2022 • Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.
Ranked #1 on Video Instance Segmentation on HQ-YTVIS
1 code implementation • 23 Jul 2022 • Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang
Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar to one another than to pixels of different objects of the same class, we propose a novel self-support matching strategy to alleviate this problem: query prototypes, collected from high-confidence query predictions, are used to match the query features. A minimal sketch of this matching appears below.
Ranked #13 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot)
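As referenced above, here is a minimal sketch of self-support matching: a query prototype is pooled from high-confidence query predictions and matched back against the query features. The confidence threshold and the masked-average-pooling choice are assumptions.

```python
# Minimal sketch of self-support matching: pool a prototype from high-confidence
# query predictions, then match it back against the query features.
import torch
import torch.nn.functional as F

def self_support_match(query_feat, init_pred, thresh: float = 0.8):
    """query_feat: (C, H, W); init_pred: (H, W) initial foreground probabilities."""
    conf_mask = (init_pred > thresh).float()                      # high-confidence pixels
    denom = conf_mask.sum().clamp(min=1.0)
    proto = (query_feat * conf_mask).sum(dim=(1, 2)) / denom      # (C,) self-support prototype
    sim = F.cosine_similarity(query_feat,                         # match prototype to query
                              proto.view(-1, 1, 1), dim=0)        # (H, W) refined score map
    return sim

scores = self_support_match(torch.randn(64, 32, 32), torch.rand(32, 32))
```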
3 code implementations • 30 May 2022 • Peng Zheng, Huazhu Fu, Deng-Ping Fan, Qi Fan, Jie Qin, Yu-Wing Tai, Chi-Keung Tang, Luc van Gool
In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes.
Ranked #1 on Co-Salient Object Detection on CoCA
1 code implementation • CVPR 2022 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai
A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality.
1 code implementation • CVPR 2022 • Xinpeng Liu, Yong-Lu Li, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi-Keung Tang
Human-Object Interaction (HOI) detection plays a core role in activity understanding.
no code implementations • 15 Feb 2022 • Mu-Ruei Tseng, Abhishek Gupta, Chi-Keung Tang, Yu-Wing Tai
All training and testing 3D skeletons in HAA4D are globally aligned to the same global space using a deep alignment model, so that each skeleton faces the negative z-direction.
1 code implementation • CVPR 2022 • Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents image regions as a quadtree (a toy sketch of such a decomposition follows below).
Ranked #1 on Instance Segmentation on BDD100K val
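As referenced above, the following toy sketch illustrates a quadtree decomposition over a map of "incoherent" (error-prone) pixels, so that refinement only has to visit sparse leaf cells instead of a dense tensor; the incoherence criterion, the square power-of-two map, and the minimum cell size are simplifying assumptions.

```python
# Toy sketch of quadtree decomposition over an incoherent-region map: cells that
# contain incoherent pixels are split recursively, leaving only sparse leaf cells
# to refine. Incoherence criterion and minimum cell size are placeholders.
import numpy as np

def quadtree_cells(incoherent: np.ndarray, x=0, y=0, size=None, min_size=4):
    """incoherent: (H, W) boolean map; returns (x, y, size) leaf cells to refine."""
    size = incoherent.shape[0] if size is None else size
    block = incoherent[y:y + size, x:x + size]
    if not block.any():
        return []                                    # coherent cell: nothing to refine
    if size <= min_size:
        return [(x, y, size)]                        # small incoherent cell: refine it
    half = size // 2                                 # otherwise split into 4 children
    return (quadtree_cells(incoherent, x, y, half, min_size) +
            quadtree_cells(incoherent, x + half, y, half, min_size) +
            quadtree_cells(incoherent, x, y + half, half, min_size) +
            quadtree_cells(incoherent, x + half, y + half, half, min_size))

mask = np.zeros((64, 64), dtype=bool); mask[30:34, 30:34] = True
print(len(quadtree_cells(mask)))
```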
no code implementations • ICCV 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang
To facilitate this new research, we construct the first large-scale video object inpainting benchmark YouTube-VOI to provide realistic occlusion scenarios with both occluded and visible object masks available.
1 code implementation • NeurIPS 2021 • Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.
Ranked #1 on Video Instance Segmentation on BDD100K val
Multi-Object Tracking and Segmentation, Multiple Object Track and Segmentation, +3
3 code implementations • NeurIPS 2021 • Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang
This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation (a generic memory-readout sketch follows below).
Ranked #7 on Video Object Segmentation on YouTube-VOS 2019
Semantic Segmentation, Semi-Supervised Video Object Segmentation, +1
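As referenced above, the sketch below shows a generic memory-readout step for space-time correspondences in VOS: affinities between the query frame's keys and memory keys select which memory values to propagate. The dot-product similarity and the feature shapes are assumptions rather than the paper's exact formulation.

```python
# Generic sketch of space-time correspondence readout for VOS: affinities between
# query keys and memory keys weight the memory values propagated to the query frame.
import torch

def memory_readout(query_key, mem_key, mem_value):
    """query_key: (Ck, HW_q); mem_key: (Ck, THW_m); mem_value: (Cv, THW_m)."""
    affinity = mem_key.t() @ query_key                 # (THW_m, HW_q) similarities
    affinity = torch.softmax(affinity, dim=0)          # normalize over memory locations
    return mem_value @ affinity                        # (Cv, HW_q) propagated features

out = memory_readout(torch.randn(64, 900), torch.randn(64, 3600), torch.randn(512, 3600))
```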
1 code implementation • 30 Apr 2021 • Qi Fan, Chi-Keung Tang, Yu-Wing Tai
We introduce Few-Shot Video Object Detection (FSVOD), with three contributions to the real-world visual learning challenge posed by our highly diverse and dynamic world: 1) FSVOD-500, a large-scale video dataset comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating feature representations of the target video object, which can be highly dynamic; and 3) a strategically improved Temporal Matching Network (TMN+) that matches representative query tube features with better discriminative ability, thus achieving higher diversity.
1 code implementation • CVPR 2021 • Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai
Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, due to the inherent technical challenges of reasoning over the temporal domain and the lack of large-scale video matting datasets.
1 code implementation • CVPR 2021 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai
Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.
1 code implementation • CVPR 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang
Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.
Ranked #1 on Instance Segmentation on KINS
5 code implementations • CVPR 2021 • Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang
We present Modular interactive VOS (MiVOS) framework which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.
Ranked #1 on Interactive Video Object Segmentation on DAVIS 2017 (using extra training data)
Interactive Video Object Segmentation, Semantic Segmentation, +2
1 code implementation • 17 Nov 2020 • Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang
Despite the excellent progress that has been made, action recognition performance still relies heavily on specific datasets, which are difficult to extend with new action classes due to labor-intensive labeling.
no code implementations • ICCV 2021 • Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang
We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames.
Ranked #1 on Action Recognition on HAA500
no code implementations • 27 Aug 2020 • Ji Liu, Heshan Liu, Mang-Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
We propose a novel pose-guided appearance transfer network for transferring a given reference appearance to a target pose at an unprecedented image resolution (1024 × 1024), given an image of the reference person and an image of the target person respectively.
1 code implementation • ECCV 2020 • Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang
GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass.
Ranked #1 on Autonomous Driving on ApolloCar3D
1 code implementation • ECCV 2020 • Qi Fan, Lei Ke, Wenjie Pei, Chi-Keung Tang, Yu-Wing Tai
We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.
Ranked #81 on Instance Segmentation on COCO test-dev
no code implementations • CVPR 2020 • Xuhua Huang, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang
Significant progress has been made in Video Object Segmentation (VOS), the video object tracking task at its finest level.
Ranked #71 on Semi-Supervised Video Object Segmentation on DAVIS 2016
1 code implementation • CVPR 2020 • Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng
End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.
Ranked #13 on Weakly-supervised 3D Human Pose Estimation on Human3.6M
1 code implementation • 8 May 2020 • Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang
Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.
2 code implementations • CVPR 2020 • Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang
In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.
Ranked #1 on Semantic Segmentation on BIG (using extra training data)
no code implementations • 12 Oct 2019 • Yao Xiao, Dan Meng, Cewu Lu, Chi-Keung Tang
The long-standing challenges for offline handwritten Chinese character recognition (HCCR) are twofold: Chinese characters can be very diverse and complicated while similarly looking, and cursive handwriting (due to increased writing speed and infrequent pen lifting) makes strokes and even characters connected together in a flowing manner.
3 code implementations • CVPR 2020 • Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai
To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations.
Ranked #23 on Few-Shot Object Detection on MS-COCO (10-shot)
no code implementations • 2 Aug 2019 • Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang
Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.
1 code implementation • CVPR 2020 • Xiang Li, Tianhan Wei, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang
In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to only 5.
Ranked #21 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)
no code implementations • 24 Jul 2019 • Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang
Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions.
1 code implementation • ICCV 2019 • Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang
Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details.
no code implementations • 1 Feb 2018 • Zheng Wu, Ruiheng Chang, Jiaxu Ma, Cewu Lu, Chi-Keung Tang
We propose a novel approach for instance segmentation given an image of a homogeneous object cluster (HOC).
no code implementations • 26 Nov 2017 • Boyu Liu, Yanzhao Wang, Yu-Wing Tai, Chi-Keung Tang
We introduce a one-shot learning approach for video object tracking.
1 code implementation • ECCV 2018 • Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang
In state-of-the-art deep HDR imaging, input images are first aligned using optical flows before merging, which are still error-prone due to occlusion and large motions.
1 code implementation • ECCV 2018 • Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang
We train a generative adversarial network, i.e., a contextual GAN, to learn the joint distribution of a sketch and the corresponding image by using joint images.
no code implementations • ECCV 2018 • Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang
In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage.
Ranked #5 on Human action generation on NTU RGB+D 2D
no code implementations • ICCV 2017 • Yongyi Lu, Cewu Lu, Chi-Keung Tang
Video object detection is a fundamental tool for many applications.
no code implementations • ECCV 2018 • Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang
We are interested in attribute-guided face generation: given a low-res face input image and an attribute vector that can be extracted from a high-res image (attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes.
no code implementations • CVPR 2018 • Cewu Lu, Hao Su, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas Guibas
Important high-level vision tasks such as human-object interaction, image captioning and robotic manipulation require rich semantic descriptions of objects at part level.
7 code implementations • 12 Jul 2016 • Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang
We alternate the pruning and retraining to further reduce zero activations in a network.
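The entry above describes alternating pruning and retraining to reduce zero activations. Below is a sketch of one data-driven pruning round based on the fraction of zero (post-ReLU) activations per channel; the threshold, the loop structure, and the helpers `collect_activations`, `remove_channels`, and `retrain` are hypothetical.

```python
# Sketch of a prune-then-retrain round driven by zero activations: channels whose
# post-ReLU responses are zero most of the time are pruned, then the network is
# retrained before the next round. Threshold and helpers are hypothetical.
import torch

def zero_fraction_per_channel(activations: torch.Tensor) -> torch.Tensor:
    """activations: (N, C, H, W) post-ReLU feature maps -> (C,) fraction of zeros."""
    return (activations == 0).float().mean(dim=(0, 2, 3))

def channels_to_keep(activations, threshold: float = 0.9):
    frac = zero_fraction_per_channel(activations)
    return torch.nonzero(frac < threshold).flatten()   # indices of channels to keep

# for _ in range(3):                                   # alternate pruning and retraining
#     acts = collect_activations(model, calib_loader)  # hypothetical helper
#     keep = channels_to_keep(acts)
#     model = remove_channels(model, keep)             # hypothetical helper
#     retrain(model)                                   # hypothetical helper
```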
no code implementations • 19 Jan 2016 • Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gerard Medioni
We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation.
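The entry above concerns a closed-form solution to tensor voting. The snippet below is only a naive illustration of accumulating a structure-aware tensor at a point from Gaussian-weighted second-order votes of its neighbors; it is not the paper's closed-form solution, and the decay parameter and plain outer-product vote are simplifications of full tensor voting.

```python
# Naive illustration only (not the paper's closed-form solution): accumulate a
# structure-aware tensor at one point by summing Gaussian-weighted outer products
# of unit displacement directions to its neighbors.
import numpy as np

def structure_tensor(points: np.ndarray, idx: int, sigma: float = 1.0) -> np.ndarray:
    """points: (N, d) point set; returns a (d, d) tensor at points[idx]."""
    d = points.shape[1]
    K = np.zeros((d, d))
    for j, q in enumerate(points):
        if j == idx:
            continue
        v = q - points[idx]
        r = np.linalg.norm(v)
        if r < 1e-12:
            continue
        w = np.exp(-(r ** 2) / (sigma ** 2))           # distance-based decay
        K += w * np.outer(v / r, v / r)                # second-order vote
    return K

T = structure_tensor(np.random.rand(50, 3), idx=0)
print(np.linalg.eigvalsh(T))  # eigen-spectrum hints at local structure saliency
```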
no code implementations • ICCV 2015 • Cewu Lu, Shu Liu, Jiaya Jia, Chi-Keung Tang
Closed contour is an important objectness indicator.
no code implementations • ICCV 2015 • Cewu Lu, Yongyi Lu, Hao Chen, Chi-Keung Tang
In the testing phase, sliding CNN models are applied, producing a set of response maps that can be effectively filtered by the learned co-presence prior so as to output the final bounding boxes for localizing an object.
no code implementations • CVPR 2015 • Yao Xiao, Cewu Lu, Efstratios Tsougenis, Yongyi Lu, Chi-Keung Tang
Distance metric plays a key role in grouping superpixels to produce object proposals for object detection.
no code implementations • 22 Sep 2014 • Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang
We participated in the object detection track of ILSVRC 2014 and received the fourth place among the 38 teams.
no code implementations • CVPR 2014 • Cewu Lu, Jiaya Jia, Chi-Keung Tang
We propose a binary range-sample feature in depth.
no code implementations • CVPR 2014 • Cewu Lu, Di Lin, Jiaya Jia, Chi-Keung Tang
Given a single outdoor image, this paper proposes a collaborative learning approach for labeling it as either sunny or cloudy.
no code implementations • CVPR 2014 • Yao Xiao, Efstratios Tsougenis, Chi-Keung Tang
We present the first automatic method to remove shadows from single RGB-D images.