Search Results for author: Chi-Keung Tang

Found 75 papers, 40 papers with code

Segment Anything Meets Point Tracking

1 code implementation 3 Jul 2023 Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models.

Interactive Video Object Segmentation Object +5

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

2 code implementations CVPR 2020 Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang

In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.

 Ranked #1 on Semantic Segmentation on BIG (using extra training data)

4k Land Cover Classification +3

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

1 code implementation CVPR 2021 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.

Amodal Instance Segmentation Boundary Detection +4

Occlusion-Aware Instance Segmentation via BiLayer Network Architectures

1 code implementation 8 Aug 2022 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).

Instance Segmentation Segmentation +2

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

5 code implementations CVPR 2021 Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

We present Modular interactive VOS (MiVOS) framework which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.

 Ranked #1 on Interactive Video Object Segmentation on DAVIS 2017 (using extra training data)

Interactive Video Object Segmentation Semantic Segmentation +2

Few-Shot Video Object Detection

1 code implementation 30 Apr 2021 Qi Fan, Chi-Keung Tang, Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the real-world visual learning challenge in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity.

Few-Shot Video Object Detection Object +2

Cascaded deep monocular 3D human pose estimation with evolutionary training data

1 code implementation CVPR 2020 Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.

Data Augmentation Monocular 3D Human Pose Estimation +3

Semantic Image Matting

1 code implementation CVPR 2021 Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.

Semantic Image Matting Transparent objects

Deep High Dynamic Range Imaging with Large Foreground Motions

1 code implementation ECCV 2018 Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang

In state-of-the-art deep HDR imaging, input images are first aligned using optical flows before merging, which are still error-prone due to occlusion and large motions.

Translation Vocal Bursts Intensity Prediction

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

1 code implementation ICCV 2019 Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details.

Style Transfer

GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector

2 code implementations 30 May 2022 Peng Zheng, Huazhu Fu, Deng-Ping Fan, Qi Fan, Jie Qin, Yu-Wing Tai, Chi-Keung Tang, Luc van Gool

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes.

Co-Salient Object Detection Object +2

Deep Video Matting via Spatio-Temporal Alignment and Aggregation

1 code implementation CVPR 2021 Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai

Despite the significant progress made by deep learning in natural image matting, there has been so far no representative work on deep learning for video matting due to the inherent technical challenges in reasoning temporal domain and lack of large-scale video matting datasets.

Image Matting Optical Flow Estimation +1

Human Instance Matting via Mutual Guidance and Multi-Instance Refinement

1 code implementation CVPR 2022 Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality.

Image Matting Instance Segmentation +1

Self-Support Few-Shot Semantic Segmentation

1 code implementation 23 Jul 2022 Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang

Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those to different objects of same class, we propose a novel self-support matching strategy to alleviate this problem, which uses query prototypes to match query features, where the query prototypes are collected from high-confidence query predictions.

Few-Shot Semantic Segmentation Segmentation +1

Image Generation from Sketch Constraint Using Contextual GAN

1 code implementation ECCV 2018 Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We train a generated adversarial network, i.e., contextual GAN, to learn the joint distribution of sketch and the corresponding image by using joint images.

Image-to-Image Translation Translation

Stable Segment Anything Model

1 code implementation 27 Nov 2023 Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08 M) and fast adaptation (by 1 training epoch).

Segmentation

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations NeurIPS 2023 Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

Instance Neural Radiance Field

1 code implementation ICCV 2023 Yichen Liu, Benran Hu, Junkai Huang, Yu-Wing Tai, Chi-Keung Tang

This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance-NeRF.

3D Instance Segmentation Panoptic Segmentation +1

DragVideo: Interactive Drag-style Video Editing

1 code implementation 3 Dec 2023 Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang

The main issues are: 1) how to perform direct and accurate user control in editing; 2) how to execute editings like changing shape, expression, and layout without unsightly distortion and artifacts to the edited content; and 3) how to maintain spatio-temporal consistency of video after editing.

Video Editing Video Generation

Video Mask Transfiner for High-Quality Video Instance Segmentation

1 code implementation 28 Jul 2022 Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.

Instance Segmentation Semantic Segmentation +2

One-Shot Object Detection without Fine-Tuning

1 code implementation 8 May 2020 Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.

Metric Learning Object +2

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation 5 Jan 2024 Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, achieving simultaneously multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.

Video Editing

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation 21 Nov 2022 Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

Semi-Supervised Few-Shot Atomic Action Recognition

1 code implementation 17 Nov 2020 Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang

Although excellent progress has been made, the performance on action recognition still heavily relies on specific datasets, which are difficult to extend to new action classes due to labor-intensive labeling.

Atomic action recognition

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

1 code implementation 28 May 2023 Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Our method consistently enhances the distillation algorithms, even on much larger-scale and more heterogeneous datasets, e.g., ImageNet-1K and Kinetics-400.

Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images

1 code implementation 27 Nov 2023 Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper presents Deceptive-Human, a novel Prompt-to-NeRF framework capitalizing on state-of-the-art control diffusion models (e.g., ControlNet) to generate a high-quality controllable 3D human NeRF.

Density Estimation

Deep Video Generation, Prediction and Completion of Human Action Sequences

no code implementations ECCV 2018 Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang

In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage.

Human action generation Video Generation +1

Attribute-Guided Face Generation Using Conditional CycleGAN

no code implementations ECCV 2018 Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang

We are interested in attribute-guided face generation: given a low-res face input image, an attribute vector that can be extracted from a high-res image (attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes.

Attribute Face Generation +2

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

no code implementations CVPR 2018 Cewu Lu, Hao Su, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas Guibas

Important high-level vision tasks such as human-object interaction, image captioning and robotic manipulation require rich semantic descriptions of objects at part level.

Human-Object Interaction Detection Image Captioning +1

A Closed-Form Solution to Tensor Voting: Theory and Applications

no code implementations 19 Jan 2016 Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gerard Medioni

We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation.

Stereo Matching Stereo Matching Hand

1-HKUST: Object Detection in ILSVRC 2014

no code implementations 22 Sep 2014 Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang

We participated in the object detection track of ILSVRC 2014 and received the fourth place among the 38 teams.

Object object-detection +3

Two-Class Weather Classification

no code implementations CVPR 2014 Cewu Lu, Di Lin, Jiaya Jia, Chi-Keung Tang

Given a single outdoor image, this paper proposes a collaborative learning approach for labeling it as either sunny or cloudy.

Classification General Classification +1

Square Localization for Efficient and Accurate Object Detection

no code implementations ICCV 2015 Cewu Lu, Yongyi Lu, Hao Chen, Chi-Keung Tang

In the testing phase, sliding CNN models are applied which produces a set of response maps that can be effectively filtered by the learned co-presence prior to output the final bounding boxes for localizing an object.

Object object-detection +2

StableNet: Semi-Online, Multi-Scale Deep Video Stabilization

no code implementations 24 Jul 2019 Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang

Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions.

Video Stabilization

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking

no code implementations 2 Aug 2019 Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang

Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.

Video Object Tracking Visual Tracking

Template-Instance Loss for Offline Handwritten Chinese Character Recognition

no code implementations 12 Oct 2019 Yao Xiao, Dan Meng, Cewu Lu, Chi-Keung Tang

The long-standing challenges for offline handwritten Chinese character recognition (HCCR) are twofold: Chinese characters can be very diverse and complicated while similarly looking, and cursive handwriting (due to increased writing speed and infrequent pen lifting) makes strokes and even characters connected together in a flowing manner.

Offline Handwritten Chinese Character Recognition

Pose-Guided High-Resolution Appearance Transfer via Progressive Training

no code implementations 27 Aug 2020 Ji Liu, Heshan Liu, Mang-Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

We propose a novel pose-guided appearance transfer network for transferring a given reference appearance to a target pose at unprecedented image resolution (1024 × 1024), given an image of the reference person and an image of the target person, respectively.

Video Generation Vocal Bursts Intensity Prediction

HAA500: Human-Centric Atomic Action Dataset with Curated Videos

no code implementations ICCV 2021 Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang

We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames.

Action Classification Action Recognition

Occlusion-Aware Video Object Inpainting

no code implementations ICCV 2021 Lei Ke, Yu-Wing Tai, Chi-Keung Tang

To facilitate this new research, we construct the first large-scale video object inpainting benchmark YouTube-VOI to provide realistic occlusion scenarios with both occluded and visible object masks available.

Object Texture Synthesis +1

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

no code implementations 15 Feb 2022 Mu-Ruei Tseng, Abhishek Gupta, Chi-Keung Tang, Yu-Wing Tai

All training and testing 3D skeletons in HAA4D are globally aligned, using a deep alignment model to the same global space, making each skeleton face the negative z-direction.

Atomic action recognition

Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation

no code implementations 2 Oct 2022 Xinhang Liu, Jiaben Chen, Huai Yu, Yu-Wing Tai, Chi-Keung Tang

The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss, enabling an unsupervised partitioning of a scene into salient or meaningful regions corresponding to different object instances.

3D Object Editing Object +2

Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

no code implementations 8 Nov 2022 Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalizes well even without observations of target domain data in training.

Autonomous Driving Domain Generalization

H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions

no code implementations 21 Nov 2022 Changlin Li, Guangyang Wu, Yanan Sun, Xin Tao, Chi-Keung Tang, Yu-Wing Tai

The learnt deformable kernel is then utilized in convolving the input frames for predicting the interpolated frame.

Video Frame Interpolation

ONeRF: Unsupervised 3D Object Segmentation from Multiple Views

no code implementations 22 Nov 2022 Shengnan Liang, Yichen Liu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We present ONeRF, a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.

3D scene Editing Object +1

Clean-NeRF: Reformulating NeRF to account for View-Dependent Observations

no code implementations 26 Mar 2023 Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper analyzes the NeRF's struggles in such settings and proposes Clean-NeRF for accurate 3D reconstruction and novel view rendering in complex scenes.

3D Reconstruction Density Estimation +3

Registering Neural Radiance Fields as 3D Density Images

no code implementations 22 May 2023 Han Jiang, Ruoxuan Li, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang

No significant work has been done to directly merge two partially overlapping scenes using NeRF representations.

Contrastive Learning

Deceptive-NeRF: Enhancing NeRF Reconstruction using Pseudo-Observations from Diffusion Models

no code implementations 24 May 2023 Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang

We introduce Deceptive-NeRF, a novel methodology for few-shot NeRF reconstruction, which leverages diffusion models to synthesize plausible pseudo-observations to improve the reconstruction.

UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks

no code implementations 7 Jun 2023 Yanan Sun, Zihan Zhong, Qi Fan, Chi-Keung Tang, Yu-Wing Tai

Our thorough studies validate that models pre-trained as such can learn rich representations of both modalities, improving their ability to understand how images and text relate to each other.

Semantic Segmentation

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

no code implementations ICCV 2023 Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed.

Action Recognition Temporal Action Localization

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

no code implementations 29 Nov 2023 Juntao Zhang, Yuehuai Liu, Yu-Wing Tai, Chi-Keung Tang

Specifically, C3Net first aligns the conditions from multi-modalities to the same semantic latent space using modality-specific encoders based on contrastive training.

multimodal generation

SANeRF-HQ: Segment Anything for NeRF in High Quality

no code implementations 3 Dec 2023 Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai

Recently, the Segment Anything Model (SAM) has showcased remarkable capabilities of zero-shot segmentation, while NeRF (Neural Radiance Fields) has gained popularity as a method for various 3D problems beyond novel view synthesis.

Novel View Synthesis Object +4

Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent

no code implementations 5 Dec 2023 Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

This paper explores promptable NeRF generation (e.g., text prompt or single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus undoing complex intermediate steps while providing full 3D generation with conditional control.

3D Generation 3D Reconstruction

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

no code implementations 30 Dec 2023 Han Jiang, Haosen Sun, Ruoxuan Li, Chi-Keung Tang, Yu-Wing Tai

Second and the remaining problem is thus 3D multiview consistency among all completed images, now guided by the seed images and their 3D proxies.
