Search Results for author: Chi-Keung Tang

Found 75 papers, 40 papers with code

Segment Anything in High Quality

2 code implementations • NeurIPS 2023 • Lei Ke, Mingqiao Ye, Martin Danelljan, Yifan Liu, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

HQ-SAM is only trained on the introduced dataset of 44k masks, which takes only 4 hours on 8 GPUs.

Ranked #1 on Zero-Shot Instance Segmentation on LVIS v1.0 val

Zero-Shot Instance Segmentation Zero Shot Segmentation

13,402

Segment Anything Meets Point Tracking

1 code implementation • 3 Jul 2023 • Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models.

Interactive Video Object Segmentation Object +5

900

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

2 code implementations • CVPR 2020 • Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang

In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.

Ranked #1 on Semantic Segmentation on BIG (using extra training data)

4k Land Cover Classification +3

786

Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

3 code implementations • NeurIPS 2021 • Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation.

Ranked #7 on Video Object Segmentation on YouTube-VOS 2019

Semantic Segmentation Semi-Supervised Video Object Segmentation +1

521

Mask Transfiner for High-Quality Instance Segmentation

2 code implementations • CVPR 2022 • Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree.

Ranked #1 on Instance Segmentation on BDD100K val

Instance Segmentation Segmentation +2

519

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

1 code implementation • CVPR 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.

Ranked #1 on Instance Segmentation on KINS

Amodal Instance Segmentation Boundary Detection +4

507

Occlusion-Aware Instance Segmentation via BiLayer Network Architectures

1 code implementation • 8 Aug 2022 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Unlike previous instance segmentation methods, we model image formation as a composition of two overlapping layers, and propose Bilayer Convolutional Network (BCNet), where the top layer detects occluding objects (occluders) and the bottom layer infers partially occluded instances (occludees).

Instance Segmentation Segmentation +2

507

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

5 code implementations • CVPR 2021 • Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

We present the Modular interactive VOS (MiVOS) framework, which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.

Ranked #1 on Interactive Video Object Segmentation on DAVIS 2017 (using extra training data)

Interactive Video Object Segmentation Semantic Segmentation +2

445

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

7 code implementations • 12 Jul 2016 • Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang

We alternate the pruning and retraining to further reduce zero activations in a network.

Efficient Neural Network

406

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

3 code implementations • CVPR 2020 • Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai

To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations.

Ranked #21 on Few-Shot Object Detection on MS-COCO (10-shot)

Few-Shot Object Detection Object +2

375

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation

1 code implementation • NeurIPS 2021 • Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.

Ranked #1 on Video Instance Segmentation on BDD100K val

Multi-Object Tracking and Segmentation Multiple Object Track and Segmentation +3

358

Mask-Free Video Instance Segmentation

1 code implementation • CVPR 2023 • Lei Ke, Martin Danelljan, Henghui Ding, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

A consistency loss is then enforced on the found matches.

Ranked #1 on Video Instance Segmentation on Youtube-VIS (trained with no video masks)

Instance Segmentation Optical Flow Estimation +4

349

Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation

1 code implementation • ECCV 2020 • Qi Fan, Lei Ke, Wenjie Pei, Chi-Keung Tang, Yu-Wing Tai

We propose to learn the underlying class-agnostic commonalities that can be generalized from mask-annotated categories to novel categories.

Ranked #79 on Instance Segmentation on COCO test-dev

Instance Segmentation Segmentation +1

344

Few-Shot Video Object Detection

1 code implementation • 30 Apr 2021 • Qi Fan, Chi-Keung Tang, Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the real-world visual learning challenge in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) to generate high-quality video tube proposals for aggregating feature representation for the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity.

Few-Shot Video Object Detection Object +2

344

Cascaded deep monocular 3D human pose estimation with evolutionary training data

1 code implementation • CVPR 2020 • Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.

Ranked #13 on Weakly-supervised 3D Human Pose Estimation on Human3.6M

Data Augmentation Monocular 3D Human Pose Estimation +3

327

FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

1 code implementation • CVPR 2020 • Xiang Li, Tianhan Wei, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

In this paper, we are interested in few-shot object segmentation where the number of annotated training examples is limited to only 5.

Ranked #20 on Few-Shot Semantic Segmentation on FSS-1000 (5-shot)

Few-Shot Semantic Segmentation Object +2

268

Semantic Image Matting

1 code implementation • CVPR 2021 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.

Ranked #1 on Semantic Image Matting on Semantic Image Matting Dataset

Semantic Image Matting Transparent objects

214

NeRF-RPN: A general framework for object detection in NeRFs

2 code implementations • CVPR 2023 • Benran Hu, Junkai Huang, Yichen Liu, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant object detection framework, NeRF-RPN, which directly operates on NeRF.

object-detection Object Detection

210

Deep High Dynamic Range Imaging with Large Foreground Motions

1 code implementation • ECCV 2018 • Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang

In state-of-the-art deep HDR imaging, input images are first aligned using optical flows before merging, which are still error-prone due to occlusion and large motions.

Translation Vocal Bursts Intensity Prediction

180

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

1 code implementation • ICCV 2019 • Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details.

Style Transfer

177

GCoNet+: A Stronger Group Collaborative Co-Salient Object Detector

2 code implementations • 30 May 2022 • Peng Zheng, Huazhu Fu, Deng-Ping Fan, Qi Fan, Jie Qin, Yu-Wing Tai, Chi-Keung Tang, Luc van Gool

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes.

Ranked #1 on Co-Salient Object Detection on CoCA

Co-Salient Object Detection Object +2

143

GSNet: Joint Vehicle Pose and Shape Reconstruction with Geometrical and Scene-aware Supervision

1 code implementation • ECCV 2020 • Lei Ke, Shichao Li, Yanan Sun, Yu-Wing Tai, Chi-Keung Tang

GSNet utilizes a unique four-way feature extraction and fusion scheme and directly regresses 6DoF poses and shapes in a single forward pass.

Ranked #1 on Autonomous Driving on ApolloCar3D

3D Car Instance Understanding 3D Pose Estimation +11

133

Deep Video Matting via Spatio-Temporal Alignment and Aggregation

1 code implementation • CVPR 2021 • Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai

Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, owing to the inherent technical challenges of reasoning in the temporal domain and the lack of large-scale video matting datasets.

Image Matting Optical Flow Estimation +1

Human Instance Matting via Mutual Guidance and Multi-Instance Refinement

1 code implementation • CVPR 2022 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

A new instance matting metric called instance matting quality (IMQ) is proposed, which addresses the absence of a unified and fair means of evaluation emphasizing both instance recognition and matting quality.

Image Matting Instance Segmentation +1

Cascade-DETR: Delving into High-Quality Universal Object Detection

1 code implementation • ICCV 2023 • Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains.

Object object-detection +2

Self-Support Few-Shot Semantic Segmentation

1 code implementation • 23 Jul 2022 • Qi Fan, Wenjie Pei, Yu-Wing Tai, Chi-Keung Tang

Motivated by the simple Gestalt principle that pixels belonging to the same object are more similar than those belonging to different objects of the same class, we propose a novel self-support matching strategy to alleviate this problem, which uses query prototypes to match query features, where the query prototypes are collected from high-confidence query predictions.

Ranked #12 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot)

Few-Shot Semantic Segmentation Segmentation +1

Image Generation from Sketch Constraint Using Contextual GAN

1 code implementation • ECCV 2018 • Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We train a generative adversarial network, i.e., a contextual GAN, to learn the joint distribution of a sketch and the corresponding image by using joint images.

Image-to-Image Translation Translation

Stable Segment Anything Model

1 code implementation • 27 Nov 2023 • Qi Fan, Xin Tao, Lei Ke, Mingqiao Ye, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu-Wing Tai, Chi-Keung Tang

Thus, our solution, termed Stable-SAM, offers several advantages: 1) improved SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08 M) and fast adaptation (by 1 training epoch).

Segmentation

FaceDNeRF: Semantics-Driven Face Reconstruction, Prompt Editing and Relighting with Diffusion Models

2 code implementations • NeurIPS 2023 • Hao Zhang, Yanbo Xu, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

The ability to create high-quality 3D faces from a single image has become increasingly important with wide applications in video conferencing, AR/VR, and advanced video editing in movie industries.

3D Face Reconstruction Video Editing +1

Instance Neural Radiance Field

1 code implementation • ICCV 2023 • Yichen Liu, Benran Hu, Junkai Huang, Yu-Wing Tai, Chi-Keung Tang

This paper presents one of the first learning-based NeRF 3D instance segmentation pipelines, dubbed Instance Neural Radiance Field, or Instance-NeRF.

3D Instance Segmentation Panoptic Segmentation +1

Ultrahigh Resolution Image/Video Matting With Spatio-Temporal Sparsity

1 code implementation • CVPR 2023 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

Instead, our method resorts to spatial and temporal sparsity for solving general UHR matting.

Image Matting Video Matting

DragVideo: Interactive Drag-style Video Editing

1 code implementation • 3 Dec 2023 • Yufan Deng, Ruida Wang, Yuhao Zhang, Yu-Wing Tai, Chi-Keung Tang

The main issues are: 1) how to perform direct and accurate user control in editing; 2) how to execute edits such as changing shape, expression, and layout without unsightly distortion and artifacts to the edited content; and 3) how to maintain spatio-temporal consistency of video after editing.

Video Editing Video Generation

Video Mask Transfiner for High-Quality Video Instance Segmentation

1 code implementation • 28 Jul 2022 • Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

While Video Instance Segmentation (VIS) has seen rapid progress, current approaches struggle to predict high-quality masks with accurate boundary details.

Ranked #1 on Video Instance Segmentation on HQ-YTVIS

Instance Segmentation Semantic Segmentation +2

One-Shot Object Detection without Fine-Tuning

1 code implementation • 8 May 2020 • Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.

Metric Learning Object +2

FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF

1 code implementation • 5 Jan 2024 • Hao Zhang, Yu-Wing Tai, Chi-Keung Tang

However, achieving simultaneously multi-view consistency and temporal coherence while editing video sequences remains a formidable challenge.

Video Editing

FLNeRF: 3D Facial Landmarks Estimation in Neural Radiance Fields

1 code implementation • 21 Nov 2022 • Hao Zhang, Tianyuan Dai, Yu-Wing Tai, Chi-Keung Tang

This paper presents the first significant work on directly predicting 3D face landmarks on neural radiance fields (NeRFs).

Semi-Supervised Few-Shot Atomic Action Recognition

1 code implementation • 17 Nov 2020 • Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang

Although excellent progress has been made, performance on action recognition still relies heavily on specific datasets, which are difficult to extend to new action classes due to labor-intensive labeling.

Atomic action recognition

Distill Gold from Massive Ores: Efficient Dataset Distillation via Critical Samples Selection

1 code implementation • 28 May 2023 • Yue Xu, Yong-Lu Li, Kaitong Cui, Ziyu Wang, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

Our method consistently enhances the distillation algorithms, even on much larger-scale and more heterogeneous datasets, e.g., ImageNet-1K and Kinetics-400.

Deceptive-Human: Prompt-to-NeRF 3D Human Generation with 3D-Consistent Synthetic Images

1 code implementation • 27 Nov 2023 • Shiu-hong Kao, Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper presents Deceptive-Human, a novel Prompt-to-NeRF framework capitalizing on state-of-the-art control diffusion models (e.g., ControlNet) to generate a high-quality controllable 3D human NeRF.

Density Estimation

Interactiveness Field in Human-Object Interactions

1 code implementation • CVPR 2022 • Xinpeng Liu, Yong-Lu Li, Xiaoqian Wu, Yu-Wing Tai, Cewu Lu, Chi-Keung Tang

Human-Object Interaction (HOI) detection plays a core role in activity understanding.

Human-Object Interaction Detection Object

Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters

no code implementations • 1 Feb 2018 • Zheng Wu, Ruiheng Chang, Jiaxu Ma, Cewu Lu, Chi-Keung Tang

We propose a novel approach for instance segmentation given an image of homogeneous object cluster (HOC).

Instance Segmentation One-Shot Learning +1

Deep Video Generation, Prediction and Completion of Human Action Sequences

no code implementations • ECCV 2018 • Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang

In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage.

Ranked #5 on Human action generation on NTU RGB+D 2D

Human action generation Video Generation +1

MAVOT: Memory-Augmented Video Object Tracking

no code implementations • 26 Nov 2017 • Boyu Liu, Yanzhao Wang, Yu-Wing Tai, Chi-Keung Tang

We introduce a one-shot learning approach for video object tracking.

Object One-Shot Learning +2

Attribute-Guided Face Generation Using Conditional CycleGAN

no code implementations • ECCV 2018 • Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang

We are interested in attribute-guided face generation: given a low-res face input image, an attribute vector that can be extracted from a high-res image (attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes.

Attribute Face Generation +2

Beyond Holistic Object Recognition: Enriching Image Understanding with Part States

no code implementations • CVPR 2018 • Cewu Lu, Hao Su, Yongyi Lu, Li Yi, Chi-Keung Tang, Leonidas Guibas

Important high-level vision tasks such as human-object interaction, image captioning and robotic manipulation require rich semantic descriptions of objects at part level.

Human-Object Interaction Detection Image Captioning +1

A Closed-Form Solution to Tensor Voting: Theory and Applications

no code implementations • 19 Jan 2016 • Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gerard Medioni

We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation.

Stereo Matching Stereo Matching Hand

1-HKUST: Object Detection in ILSVRC 2014

no code implementations • 22 Sep 2014 • Cewu Lu, Hao Chen, Qifeng Chen, Hei Law, Yao Xiao, Chi-Keung Tang

We participated in the object detection track of ILSVRC 2014 and received the fourth place among the 38 teams.

Object object-detection +3

Range-Sample Depth Feature for Action Recognition

no code implementations • CVPR 2014 • Cewu Lu, Jiaya Jia, Chi-Keung Tang

We propose a binary range-sample feature in depth.

Action Recognition Temporal Action Localization

Shadow Removal from Single RGB-D Images

no code implementations • CVPR 2014 • Yao Xiao, Efstratios Tsougenis, Chi-Keung Tang

We present the first automatic method to remove shadows from single RGB-D images.

Intrinsic Image Decomposition Shadow Removal

Two-Class Weather Classification

no code implementations • CVPR 2014 • Cewu Lu, Di Lin, Jiaya Jia, Chi-Keung Tang

Given a single outdoor image, this paper proposes a collaborative learning approach for labeling it as either sunny or cloudy.

Classification General Classification +1

Complexity-Adaptive Distance Metric for Object Proposals Generation

no code implementations • CVPR 2015 • Yao Xiao, Cewu Lu, Efstratios Tsougenis, Yongyi Lu, Chi-Keung Tang

Distance metric plays a key role in grouping superpixels to produce object proposals for object detection.

Object object-detection +2

Contour Box: Rejecting Object Proposals Without Explicit Closed Contours

no code implementations • ICCV 2015 • Cewu Lu, Shu Liu, Jiaya Jia, Chi-Keung Tang

Closed contour is an important objectness indicator.

Object

Square Localization for Efficient and Accurate Object Detection

no code implementations • ICCV 2015 • Cewu Lu, Yongyi Lu, Hao Chen, Chi-Keung Tang

In the testing phase, sliding CNN models are applied, producing a set of response maps that can be effectively filtered by the learned co-presence prior to output the final bounding boxes for localizing an object.

Object object-detection +2

Online Video Object Detection Using Association LSTM

no code implementations • ICCV 2017 • Yongyi Lu, Cewu Lu, Chi-Keung Tang

Video object detection is a fundamental tool for many applications.

Object object-detection +1

StableNet: Semi-Online, Multi-Scale Deep Video Stabilization

no code implementations • 24 Jul 2019 • Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang

Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions.

Video Stabilization

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking

no code implementations • 2 Aug 2019 • Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang

Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.

Video Object Tracking Visual Tracking

Template-Instance Loss for Offline Handwritten Chinese Character Recognition

no code implementations • 12 Oct 2019 • Yao Xiao, Dan Meng, Cewu Lu, Chi-Keung Tang

The long-standing challenges for offline handwritten Chinese character recognition (HCCR) are twofold: Chinese characters can be very diverse and complicated while looking similar, and cursive handwriting (due to increased writing speed and infrequent pen lifting) makes strokes and even characters connected together in a flowing manner.

Offline Handwritten Chinese Character Recognition

Fast Video Object Segmentation With Temporal Aggregation Network and Dynamic Template Matching

no code implementations • CVPR 2020 • Xuhua Huang, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang

Significant progress has been made in Video Object Segmentation (VOS), the video object tracking task at its finest level.

Ranked #71 on Semi-Supervised Video Object Segmentation on DAVIS 2016

Object One-Shot Learning +6

Pose-Guided High-Resolution Appearance Transfer via Progressive Training

no code implementations • 27 Aug 2020 • Ji Liu, Heshan Liu, Mang-Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

We propose a novel pose-guided appearance transfer network for transferring a given reference appearance to a target pose in unprecedented image resolution (1024 * 1024), given respectively an image of the reference and target person.

Video Generation Vocal Bursts Intensity Prediction

HAA500: Human-Centric Atomic Action Dataset with Curated Videos

no code implementations • ICCV 2021 • Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang

We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames.

Ranked #1 on Action Recognition on HAA500

Action Classification Action Recognition

Occlusion-Aware Video Object Inpainting

no code implementations • ICCV 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang

To facilitate this new research, we construct the first large-scale video object inpainting benchmark YouTube-VOI to provide realistic occlusion scenarios with both occluded and visible object masks available.

Object Texture Synthesis +1

HAA4D: Few-Shot Human Atomic Action Recognition via 3D Spatio-Temporal Skeletal Alignment

no code implementations • 15 Feb 2022 • Mu-Ruei Tseng, Abhishek Gupta, Chi-Keung Tang, Yu-Wing Tai

All training and testing 3D skeletons in HAA4D are globally aligned, using a deep alignment model to the same global space, making each skeleton face the negative z-direction.

Atomic action recognition

Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation

no code implementations • 2 Oct 2022 • Xinhang Liu, Jiaben Chen, Huai Yu, Yu-Wing Tai, Chi-Keung Tang

The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss, enabling an unsupervised partitioning of a scene into salient or meaningful regions corresponding to different object instances.

3D Object Editing Object +2

Normalization Perturbation: A Simple Domain Generalization Method for Real-World Domain Shifts

no code implementations • 8 Nov 2022 • Qi Fan, Mattia Segu, Yu-Wing Tai, Fisher Yu, Chi-Keung Tang, Bernt Schiele, Dengxin Dai

Thus, we propose to perturb the channel statistics of source domain features to synthesize various latent styles, so that the trained deep model can perceive diverse potential domains and generalize well even without observations of target domain data in training.

Autonomous Driving Domain Generalization

H-VFI: Hierarchical Frame Interpolation for Videos with Large Motions

no code implementations • 21 Nov 2022 • Changlin Li, Guangyang Wu, Yanan Sun, Xin Tao, Chi-Keung Tang, Yu-Wing Tai

The learnt deformable kernel is then utilized in convolving the input frames for predicting the interpolated frame.

Video Frame Interpolation

ONeRF: Unsupervised 3D Object Segmentation from Multiple Views

no code implementations • 22 Nov 2022 • Shengnan Liang, Yichen Liu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We present ONeRF, a method that automatically segments and reconstructs object instances in 3D from multi-view RGB images without any additional manual annotations.

3D scene Editing Object +1

Clean-NeRF: Reformulating NeRF to account for View-Dependent Observations

no code implementations • 26 Mar 2023 • Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

This paper analyzes the NeRF's struggles in such settings and proposes Clean-NeRF for accurate 3D reconstruction and novel view rendering in complex scenes.

3D Reconstruction Density Estimation +3

Registering Neural Radiance Fields as 3D Density Images

no code implementations • 22 May 2023 • Han Jiang, Ruoxuan Li, Haosen Sun, Yu-Wing Tai, Chi-Keung Tang

No significant work has been done to directly merge two partially overlapping scenes using NeRF representations.

Contrastive Learning

Deceptive-NeRF: Enhancing NeRF Reconstruction using Pseudo-Observations from Diffusion Models

no code implementations • 24 May 2023 • Xinhang Liu, Jiaben Chen, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang

We introduce Deceptive-NeRF, a novel methodology for few-shot NeRF reconstruction, which leverages diffusion models to synthesize plausible pseudo-observations to improve the reconstruction.

UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks

no code implementations • 7 Jun 2023 • Yanan Sun, Zihan Zhong, Qi Fan, Chi-Keung Tang, Yu-Wing Tai

Our thorough studies validate that models pre-trained as such can learn rich representations of both modalities, improving their ability to understand how images and text relate to each other.

Semantic Segmentation

EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding

no code implementations • ICCV 2023 • Yue Xu, Yong-Lu Li, Zhemin Huang, Michael Xu Liu, Cewu Lu, Yu-Wing Tai, Chi-Keung Tang

With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets such as Ego4D and EPIC-KITCHENS have been proposed.

Action Recognition Temporal Action Localization

C3Net: Compound Conditioned ControlNet for Multimodal Content Generation

no code implementations • 29 Nov 2023 • Juntao Zhang, Yuehuai Liu, Yu-Wing Tai, Chi-Keung Tang

Specifically, C3Net first aligns the conditions from multi-modalities to the same semantic latent space using modality-specific encoders based on contrastive training.

multimodal generation

SANeRF-HQ: Segment Anything for NeRF in High Quality

no code implementations • 3 Dec 2023 • Yichen Liu, Benran Hu, Chi-Keung Tang, Yu-Wing Tai

Recently, the Segment Anything Model (SAM) has showcased remarkable capabilities of zero-shot segmentation, while NeRF (Neural Radiance Fields) has gained popularity as a method for various 3D problems beyond novel view synthesis.

Novel View Synthesis Object +4

Prompt2NeRF-PIL: Fast NeRF Generation via Pretrained Implicit Latent

no code implementations • 5 Dec 2023 • Jianmeng Liu, Yuyao Zhang, Zeyuan Meng, Yu-Wing Tai, Chi-Keung Tang

This paper explores promptable NeRF generation (e.g., text prompt or single image prompt) for direct conditioning and fast generation of NeRF parameters for the underlying 3D scenes, thus undoing complex intermediate steps while providing full 3D generation with conditional control.

3D Generation 3D Reconstruction

Inpaint4DNeRF: Promptable Spatio-Temporal NeRF Inpainting with Generative Diffusion Models

no code implementations • 30 Dec 2023 • Han Jiang, Haosen Sun, Ruoxuan Li, Chi-Keung Tang, Yu-Wing Tai

The second and remaining problem is thus 3D multiview consistency among all completed images, now guided by the seed images and their 3D proxies.
