Search Results for author: Yu-Wing Tai

Found 73 papers, 34 papers with code

Transcoded Video Restoration by Temporal Spatial Auxiliary Network

1 code implementation • 15 Dec 2021 • Li Xu, Gang He, Jinjia Zhou, Jie Lei, Weiying Xie, Yunsong Li, Yu-Wing Tai

On most video platforms, such as YouTube and TikTok, the videos being played have usually undergone multiple encodings, such as hardware encoding by recording devices, software encoding by video-editing apps, and single or multiple transcodings by video application servers.

Video Editing Video Restoration

NeRF-SR: High-Quality Neural Radiance Fields using Super-Sampling

no code implementations • 3 Dec 2021 • Chen Wang, Xian Wu, Yuan-Chen Guo, Song-Hai Zhang, Yu-Wing Tai, Shi-Min Hu

We present NeRF-SR, a solution for high-resolution (HR) novel view synthesis with mostly low-resolution (LR) inputs.

Novel View Synthesis

Mask Transfiner for High-Quality Instance Segmentation

no code implementations • 26 Nov 2021 • Lei Ke, Martin Danelljan, Xia Li, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu

Instead of operating on regular dense tensors, our Mask Transfiner decomposes and represents the image regions as a quadtree.

Instance Segmentation Semantic Segmentation
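The quadtree idea can be illustrated with a generic region quadtree over a binary mask. This is a minimal sketch, not the Mask Transfiner construction: it splits the whole mask (assumed square, with a power-of-two side) into quadrants until each node is uniform, whereas Mask Transfiner builds its tree only over selected image regions and processes the nodes with a transformer. `QuadNode`, `build_quadtree`, and `count_leaves` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Mask = List[List[int]]


@dataclass
class QuadNode:
    y: int                       # top-left row of this square region
    x: int                       # top-left column of this square region
    size: int                    # side length in pixels
    value: Optional[int] = None  # mask value, set only on leaves
    children: List["QuadNode"] = field(default_factory=list)


def is_uniform(mask: Mask, y: int, x: int, size: int) -> bool:
    """True if every pixel in the square region has the same value."""
    first = mask[y][x]
    return all(mask[r][c] == first
               for r in range(y, y + size)
               for c in range(x, x + size))


def build_quadtree(mask: Mask, y: int = 0, x: int = 0,
                   size: Optional[int] = None, min_size: int = 1) -> QuadNode:
    """Recursively split a square, power-of-two-sized mask into four quadrants."""
    if size is None:
        size = len(mask)
    node = QuadNode(y, x, size)
    if size <= min_size or is_uniform(mask, y, x, size):
        # Leaf: with min_size > 1 a non-uniform leaf simply keeps its top-left value.
        node.value = mask[y][x]
        return node
    half = size // 2
    for dy in (0, half):
        for dx in (0, half):
            node.children.append(build_quadtree(mask, y + dy, x + dx, half, min_size))
    return node


def count_leaves(node: QuadNode) -> int:
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)


example = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(example)
print(count_leaves(tree))  # 7: three uniform quadrants plus four pixels of the mixed one
```

Uniform areas collapse into single leaves, so the node count tracks boundary detail rather than image area, which is the general motivation for a quadtree over a dense grid.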

Occlusion-Aware Video Object Inpainting

no code implementations • ICCV 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang

To facilitate this new research, we construct the first large-scale video object inpainting benchmark YouTube-VOI to provide realistic occlusion scenarios with both occluded and visible object masks available.

Texture Synthesis Video Inpainting

Few-Shot Video Object Detection

1 code implementation • 30 Apr 2021 • Qi Fan, Chi-Keung Tang, Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to visual learning in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating feature representations of the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity.

Few-Shot Video Object Detection Video Object Detection

Deep Video Matting via Spatio-Temporal Alignment and Aggregation

1 code implementation • CVPR 2021 • Yanan Sun, Guanzhi Wang, Qiao Gu, Chi-Keung Tang, Yu-Wing Tai

Despite the significant progress made by deep learning in natural image matting, there has so far been no representative work on deep learning for video matting, owing to the inherent technical challenges of reasoning over the temporal domain and the lack of large-scale video matting datasets.

Image Matting Matting +2

Semantic Image Matting

1 code implementation • CVPR 2021 • Yanan Sun, Chi-Keung Tang, Yu-Wing Tai

Specifically, we consider and learn 20 classes of matting patterns, and propose to extend the conventional trimap to semantic trimap.

Matting Semantic Image Matting +1
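Semantic Image Matting extends the conventional trimap, which marks each pixel as definite foreground, definite background, or unknown. The sketch below derives such a conventional trimap from a ground-truth alpha matte, a common way to build training inputs; the 0/128/255 encoding and the dilation parameter `unknown_radius` are assumptions, and the class information the paper adds to form its semantic trimap is not reproduced. `alpha_to_trimap` is a hypothetical name.

```python
from typing import List

BG, UNKNOWN, FG = 0, 128, 255  # common 8-bit trimap encoding (an assumption)


def alpha_to_trimap(alpha: List[List[float]], unknown_radius: int = 1) -> List[List[int]]:
    """alpha == 1 -> FG, alpha == 0 -> BG, dilated band around 0 < alpha < 1 -> UNKNOWN."""
    h, w = len(alpha), len(alpha[0])
    trimap = [[FG if a >= 1.0 else BG for a in row] for row in alpha]
    for y in range(h):
        for x in range(w):
            if 0.0 < alpha[y][x] < 1.0:
                # Widen the unknown band by unknown_radius pixels (square window).
                for yy in range(max(0, y - unknown_radius), min(h, y + unknown_radius + 1)):
                    for xx in range(max(0, x - unknown_radius), min(w, x + unknown_radius + 1)):
                        trimap[yy][xx] = UNKNOWN
    return trimap


row = [[0.0, 0.0, 0.5, 1.0, 1.0]]
print(alpha_to_trimap(row, unknown_radius=0))  # [[0, 0, 128, 255, 255]]
print(alpha_to_trimap(row, unknown_radius=1))  # [[0, 128, 128, 128, 255]]
```

A wider `unknown_radius` gives a matting network more context around the boundary at the cost of fewer definite labels.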

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

1 code implementation • CVPR 2021 • Lei Ke, Yu-Wing Tai, Chi-Keung Tang

Segmenting highly-overlapping objects is challenging, because typically no distinction is made between real object contours and occlusion boundaries.

Amodal Instance Segmentation Boundary Detection +4

Group Collaborative Learning for Co-Salient Object Detection

1 code implementation • CVPR 2021 • Qi Fan, Deng-Ping Fan, Huazhu Fu, Chi Keung Tang, Ling Shao, Yu-Wing Tai

We present a novel group collaborative learning framework (GCoNet) capable of detecting co-salient objects in real time (16 ms) by simultaneously mining consensus representations at the group level based on two necessary criteria: 1) intra-group compactness, to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module; 2) inter-group separability, to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module conditioning the inconsistent consensus.

Co-Salient Object Detection Salient Object Detection

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

3 code implementations • CVPR 2021 • Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

We present Modular interactive VOS (MiVOS) framework which decouples interaction-to-mask and mask propagation, allowing for higher generalizability and better performance.

Ranked #1 on Interactive Video Object Segmentation on DAVIS 2017 (using extra training data)

Interactive Video Object Segmentation Semantic Segmentation +2

PRIN/SPRIN: On Extracting Point-wise Rotation Invariant Features

2 code implementations • 24 Feb 2021 • Yang You, Yujing Lou, Ruoxi Shi, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Weiming Wang, Cewu Lu

Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation invariant features for each point.

3D Feature Matching Data Augmentation

Semi-Supervised Few-Shot Atomic Action Recognition

1 code implementation • 17 Nov 2020 • Xiaoyuan Ni, Sizhe Song, Yu-Wing Tai, Chi-Keung Tang

Although excellent progress has been made, action recognition performance still relies heavily on specific datasets, which are difficult to extend to new action classes because labeling is labor-intensive.

Action Recognition

HAA500: Human-Centric Atomic Action Dataset with Curated Videos

no code implementations • ICCV 2021 • Jihoon Chung, Cheng-hsin Wuu, Hsuan-ru Yang, Yu-Wing Tai, Chi-Keung Tang

We contribute HAA500, a manually annotated human-centric atomic action dataset for action recognition on 500 classes with over 591K labeled frames.

Action Classification Action Recognition

Pose-Guided High-Resolution Appearance Transfer via Progressive Training

no code implementations • 27 Aug 2020 • Ji Liu, Heshan Liu, Mang-Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

We propose a novel pose-guided appearance transfer network for transferring a given reference appearance to a target pose at unprecedented image resolution (1024 × 1024), given an image of the reference person and an image of the target person.

Video Generation

Fully Convolutional Networks for Continuous Sign Language Recognition

no code implementations • ECCV 2020 • Ka Leong Cheng, Zhaoyang Yang, Qifeng Chen, Yu-Wing Tai

Continuous sign language recognition (SLR) is a challenging task that requires learning on both spatial and temporal dimensions of signing frame sequences.

Sign Language Recognition

Dense Hybrid Recurrent Multi-view Stereo Net with Dynamic Consistency Checking

2 code implementations • ECCV 2020 • Jian-Feng Yan, Zizhuang Wei, Hongwei Yi, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

In this paper, we propose an efficient and effective dense hybrid recurrent multi-view stereo net with dynamic consistency checking, namely $D^{2}$HC-RMVSNet, for accurate dense point cloud reconstruction.

Point cloud reconstruction

Dive Deeper Into Box for Object Detection

no code implementations • ECCV 2020 • Ran Chen, Yong Liu, Mengdan Zhang, Shu Liu, Bei Yu, Yu-Wing Tai

Anchor-free methods have defined the new frontier in state-of-the-art object detection research, and accurate bounding box estimation is key to their success.

Object Detection

Cascaded deep monocular 3D human pose estimation with evolutionary training data

1 code implementation • CVPR 2020 • Shichao Li, Lei Ke, Kevin Pratama, Yu-Wing Tai, Chi-Keung Tang, Kwang-Ting Cheng

End-to-end deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation, yet these models may fail for unseen poses with limited and fixed training data.

Data Augmentation Monocular 3D Human Pose Estimation +2

One-Shot Object Detection without Fine-Tuning

1 code implementation • 8 May 2020 • Xiang Li, Lin Zhang, Yau Pun Chen, Yu-Wing Tai, Chi-Keung Tang

Deep learning has revolutionized object detection thanks to large-scale datasets, but their object categories are still arguably very limited.

Metric Learning One-Shot Object Detection

CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement

1 code implementation • CVPR 2020 • Ho Kei Cheng, Jihoon Chung, Yu-Wing Tai, Chi-Keung Tang

In this paper, we propose a novel approach to address the high-resolution segmentation problem without using any high-resolution training data.

Ranked #1 on Semantic Segmentation on BIG (using extra training data)

Scene Parsing Semantic Segmentation

Learning Video Object Segmentation from Unlabeled Videos

1 code implementation • CVPR 2020 • Xiankai Lu, Wenguan Wang, Jianbing Shen, Yu-Wing Tai, David Crandall, Steven C. H. Hoi

We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos, unlike most existing methods which rely heavily on extensive annotated data.

Representation Learning Semantic Segmentation +3

Spatial-Scale Aligned Network for Fine-Grained Recognition

no code implementations • 5 Jan 2020 • Lizhao Gao, Hai-Hua Xu, Chong Sun, Junling Liu, Yu-Wing Tai

Existing approaches for fine-grained visual recognition focus on learning marginal region-based representations while neglecting the spatial and scale misalignments, leading to inferior performance.

Fine-Grained Visual Recognition

Pyramid Multi-view Stereo Net with Self-adaptive View Aggregation

1 code implementation • ECCV 2020 • Hongwei Yi, Zizhuang Wei, Mingyu Ding, Runze Zhang, Yisong Chen, Guoping Wang, Yu-Wing Tai

In this paper, we propose an effective and efficient pyramid multi-view stereo (MVS) net with self-adaptive view aggregation for accurate and complete dense point cloud reconstruction.

3D Point Cloud Reconstruction Depth Estimation +1

Reflective Decoding Network for Image Captioning

no code implementations • ICCV 2019 • Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

State-of-the-art image captioning methods mostly focus on improving visual features; less attention has been paid to utilizing the inherent properties of language to boost captioning performance.

Image Captioning

Cross-Domain Adaptation for Animal Pose Estimation

no code implementations • ICCV 2019 • Jinkun Cao, Hongyang Tang, Hao-Shu Fang, Xiaoyong Shen, Cewu Lu, Yu-Wing Tai

Therefore, the easily available human pose dataset, which is of a much larger scale than our labeled animal dataset, provides important prior knowledge to boost performance on animal pose estimation.

Animal Pose Estimation Domain Adaptation

Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

3 code implementations • CVPR 2020 • Qi Fan, Wei Zhuo, Chi-Keung Tang, Yu-Wing Tai

To train our network, we contribute a new dataset that contains 1000 categories of various objects with high-quality annotations.

Few-Shot Object Detection

SF-Net: Structured Feature Network for Continuous Sign Language Recognition

no code implementations • 4 Aug 2019 • Zhaoyang Yang, Zhenmei Shi, Xiaoyong Shen, Yu-Wing Tai

The proposed SF-Net extracts features in a structured manner and gradually encodes information at the frame level, the gloss level and the sentence level into the feature representation.

Sign Language Recognition

DAWN: Dual Augmented Memory Network for Unsupervised Video Object Tracking

no code implementations • 2 Aug 2019 • Zhenmei Shi, Haoyang Fang, Yu-Wing Tai, Chi-Keung Tang

Our Dual Augmented Memory Network (DAWN) is unique in remembering both target and background, and using an improved attention LSTM memory to guide the focus on memorized features.

Video Object Tracking Visual Tracking

StableNet: Semi-Online, Multi-Scale Deep Video Stabilization

no code implementations • 24 Jul 2019 • Chia-Hung Huang, Hang Yin, Yu-Wing Tai, Chi-Keung Tang

Video stabilization algorithms are of greater importance nowadays with the prevalence of hand-held devices which unavoidably produce videos with undesirable shaky motions.

Affine Transformation Video Stabilization

Landmark Assisted CycleGAN for Cartoon Face Generation

no code implementations • 2 Jul 2019 • Ruizheng Wu, Xiaodong Gu, Xin Tao, Xiaoyong Shen, Yu-Wing Tai, Jiaya Jia

In this paper, we are interested in generating a cartoon face of a person by using unpaired training data between real faces and cartoon faces.

Face Generation

Memory-Attended Recurrent Network for Video Captioning

1 code implementation • CVPR 2019 • Wenjie Pei, Jiyuan Zhang, Xiangrong Wang, Lei Ke, Xiaoyong Shen, Yu-Wing Tai

Typical techniques for video captioning follow the encoder-decoder framework, which can only focus on one source video being processed.

Video Captioning

LADN: Local Adversarial Disentangling Network for Facial Makeup and De-Makeup

1 code implementation • ICCV 2019 • Qiao Gu, Guanzhi Wang, Mang Tik Chiu, Yu-Wing Tai, Chi-Keung Tang

Central to our method are multiple and overlapping local adversarial discriminators in a content-style disentangling network for achieving local detail transfer between facial images, with the use of asymmetric loss functions for dramatic makeup styles with high-frequency details.

Style Transfer

Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

1 code implementation • 23 Nov 2018 • Yang You, Yujing Lou, Qi Liu, Yu-Wing Tai, Lizhuang Ma, Cewu Lu, Weiming Wang

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown.

3D Feature Matching Data Augmentation

Physics-Based Generative Adversarial Models for Image Restoration and Beyond

no code implementations • 2 Aug 2018 • Jinshan Pan, Jiangxin Dong, Yang Liu, Jiawei Zhang, Jimmy Ren, Jinhui Tang, Yu-Wing Tai, Ming-Hsuan Yang

We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, and image deraining).

Deblurring Image Deblurring +3

Pairwise Body-Part Attention for Recognizing Human-Object Interactions

1 code implementation • ECCV 2018 • Hao-Shu Fang, Jinkun Cao, Yu-Wing Tai, Cewu Lu

We propose a new pairwise body-part attention model that can learn to focus on crucial parts and their correlations for HOI recognition.

Human-Object Interaction Detection

Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer

1 code implementation • CVPR 2018 • Hao-Shu Fang, Guansong Lu, Xiaolin Fang, Jianwen Xie, Yu-Wing Tai, Cewu Lu

In this paper, we present a novel method to generate synthetic human part segmentation data using easily-obtained human keypoint annotations.

Ranked #4 on Human Part Segmentation on PASCAL-Part (using extra training data)

Human Parsing Human Part Segmentation +2

Image Generation from Sketch Constraint Using Contextual GAN

1 code implementation • ECCV 2018 • Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, Chi-Keung Tang

We train a generative adversarial network, i.e., a contextual GAN, to learn the joint distribution of a sketch and the corresponding image by using joint images.

Image-to-Image Translation Translation

Deep High Dynamic Range Imaging with Large Foreground Motions

1 code implementation • ECCV 2018 • Shangzhe Wu, Jiarui Xu, Yu-Wing Tai, Chi-Keung Tang

In state-of-the-art deep HDR imaging, input images are first aligned using optical flow before merging, but the flow is still error-prone under occlusion and large motions.


Deep Video Generation, Prediction and Completion of Human Action Sequences

no code implementations • ECCV 2018 • Haoye Cai, Chunyan Bai, Yu-Wing Tai, Chi-Keung Tang

In the second stage, a skeleton-to-image network is trained, which is used to generate a human action video given the complete human pose sequence generated in the first stage.

Human action generation Video Generation +1

Adversarial Attacks Beyond the Image Space

no code implementations • CVPR 2019 • Xiaohui Zeng, Chenxi Liu, Yu-Siang Wang, Weichao Qiu, Lingxi Xie, Yu-Wing Tai, Chi Keung Tang, Alan L. Yuille

Though image-space adversaries can be interpreted as per-pixel albedo change, we verify that they cannot be well explained along these physically meaningful dimensions, which often have a non-local effect.

Question Answering Visual Question Answering

Image Dehazing using Bilinear Composition Loss Function

no code implementations • 1 Oct 2017 • Hui Yang, Jinshan Pan, Qiong Yan, Wenxiu Sun, Jimmy Ren, Yu-Wing Tai

In this paper, we introduce a bilinear composition loss function to address the problem of image dehazing.

Image Dehazing
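A composition loss of this kind re-composes the predicted outputs into a hazy image and compares the result with the observed input. A minimal sketch, assuming the standard atmospheric scattering model I = J·t + A·(1 − t), which is bilinear in the haze-free image J and the transmission t, with a known global atmospheric light A and a plain mean squared error; the exact loss terms and weighting used in the paper are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def compose_hazy(j: np.ndarray, t: np.ndarray, a: float) -> np.ndarray:
    """Atmospheric scattering model: I = J * t + A * (1 - t)."""
    return j * t + a * (1.0 - t)


def composition_loss(j_pred: np.ndarray, t_pred: np.ndarray, a: float, hazy: np.ndarray) -> float:
    """Mean squared error between the re-composed image and the observed hazy input."""
    return float(np.mean((compose_hazy(j_pred, t_pred, a) - hazy) ** 2))


j = np.array([0.2, 0.4, 0.6])  # haze-free intensities
t = np.array([0.5, 0.5, 1.0])  # transmission (1.0 means no haze)
hazy = compose_hazy(j, t, a=1.0)
print(composition_loss(j, t, 1.0, hazy))  # 0.0
```

Because the model is bilinear, a wrong J can be compensated by a wrong t, so a loss on the composition alone constrains the product rather than each factor separately.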

Attribute-Guided Face Generation Using Conditional CycleGAN

no code implementations • ECCV 2018 • Yongyi Lu, Yu-Wing Tai, Chi-Keung Tang

We are interested in attribute-guided face generation: given a low-res face image and an attribute vector that can be extracted from a high-res image (the attribute image), our new method generates a high-res face image for the low-res input that satisfies the given attributes.

Face Generation Face Swapping +1

A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation

1 code implementation • CVPR 2017 • Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon

In this paper, we introduce robust and synergetic hand-crafted features and a simple but efficient deep feature from a convolutional neural network (CNN) architecture for defocus estimation.

Defocus Estimation Image Generation

RMPE: Regional Multi-person Pose Estimation

5 code implementations • ICCV 2017 • Hao-Shu Fang, Shuqin Xie, Yu-Wing Tai, Cewu Lu

In this paper, we propose a novel regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes.

Human Detection Multi-Person Pose Estimation

Refining Geometry from Depth Sensors using IR Shading Images

no code implementations • 18 Aug 2016 • Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

To resolve the ambiguity in our model between the normals and distances, we utilize an initial 3D mesh from the Kinect fusion and multi-view information to reliably estimate surface details that were not captured and reconstructed by the Kinect fusion.

Network Trimming: A Data-Driven Neuron Pruning Approach towards Efficient Deep Architectures

4 code implementations • 12 Jul 2016 • Hengyuan Hu, Rui Peng, Yu-Wing Tai, Chi-Keung Tang

We alternate the pruning and retraining to further reduce zero activations in a network.
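Network Trimming scores each neuron by how often its post-ReLU activation is zero, a statistic the paper calls Average Percentage of Zeros (APoZ), and prunes high-scoring neurons. A minimal sketch of the scoring step, assuming one layer's activations are given as a plain list of per-example vectors; the threshold `0.6` is a hypothetical value, not the paper's setting, and the function names are illustrative.

```python
from typing import List, Sequence


def apoz(activations: Sequence[Sequence[float]]) -> List[float]:
    """Average Percentage of Zeros for each neuron of one layer.

    activations[i][j] is the post-ReLU output of neuron j on validation example i.
    """
    n_examples = len(activations)
    zero_counts = [0] * len(activations[0])
    for row in activations:
        for j, value in enumerate(row):
            if value == 0.0:
                zero_counts[j] += 1
    return [count / n_examples for count in zero_counts]


def neurons_to_prune(activations: Sequence[Sequence[float]], threshold: float = 0.6) -> List[int]:
    """Indices of neurons whose APoZ exceeds a (hypothetical) threshold."""
    return [j for j, score in enumerate(apoz(activations)) if score > threshold]


layer_outputs = [
    [0.0, 1.2, 0.0],
    [0.0, 0.3, 2.0],
    [0.0, 0.0, 0.0],
    [0.5, 0.9, 0.0],
]
print(apoz(layer_outputs))              # [0.75, 0.25, 0.75]
print(neurons_to_prune(layer_outputs))  # [0, 2]
```

Following the alternating scheme described above, a full loop would remove the selected neurons, retrain, recompute APoZ on the retrained network, and repeat; that outer loop depends on the training framework and is omitted.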

Efficient and Robust Color Consistency for Community Photo Collections

no code implementations • CVPR 2016 • Jaesik Park, Yu-Wing Tai, Sudipta N. Sinha, In So Kweon

We present a robust low-rank matrix factorization method to estimate the unknown parameters of this model.

Deep Saliency with Encoded Low level Distance Map and High Level Features

2 code implementations • CVPR 2016 • Gayoung Lee, Yu-Wing Tai, Junmo Kim

Recent advances in saliency detection have utilized deep learning to obtain high level features to detect salient regions in a scene.

Saliency Detection

Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

no code implementations • 13 Feb 2016 • Jimmy Ren, Yongtao Hu, Yu-Wing Tai, Chuan Wang, Li Xu, Wenxiu Sun, Qiong Yan

This task requires not only collective perception over both visual and auditory signals but also robustness to severe quality degradations and unconstrained content variations.

Speaker Identification

RGB-Guided Hyperspectral Image Upsampling

no code implementations • ICCV 2015 • Hyeokhyen Kwon, Yu-Wing Tai

By contrast, the latest imaging sensors capture an RGB image at a resolution several times higher than that of a hyperspectral image.

Fast Randomized Singular Value Thresholding for Low-rank Optimization

no code implementations • 1 Sep 2015 • Tae-Hyun Oh, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

The problems related to NNM, or WNNM, can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT), or Weighted SVT, but they suffer from high computational cost of Singular Value Decomposition (SVD) at each iteration.
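Singular Value Thresholding is the closed-form proximal operator of τ‖X‖*: take an SVD, soft-threshold the singular values by τ, and recompose. A minimal sketch with NumPy's full SVD, which is exactly the per-iteration cost the abstract identifies; the optional `weights` argument shrinks each σᵢ by τ·wᵢ (weighted SVT), and, as an assumption carried over from the WNNM literature, that closed form is only valid for non-decreasing weights. `svt` is a hypothetical name.

```python
import numpy as np


def svt(y: np.ndarray, tau: float, weights=None) -> np.ndarray:
    """Singular Value Thresholding: proximal operator of tau * nuclear norm.

    With `weights`, singular value sigma_i is shrunk by tau * w_i (weighted SVT);
    the closed form assumes the weights are non-decreasing.
    """
    u, s, vt = np.linalg.svd(y, full_matrices=False)  # s is sorted in descending order
    w = np.ones_like(s) if weights is None else np.asarray(weights, dtype=float)
    s_shrunk = np.maximum(s - tau * w, 0.0)           # soft-threshold, clipped at zero
    return (u * s_shrunk) @ vt                        # scale columns of U, then recompose


y = np.diag([3.0, 1.0, 0.5])
print(np.linalg.matrix_rank(svt(y, 1.0)))  # 1: singular values 1.0 and 0.5 are zeroed
```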


Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization

no code implementations • CVPR 2015 • Tae-Hyun Oh, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

The problems related to NNM (or WNNM) can be solved iteratively by applying a closed-form proximal operator, called Singular Value Thresholding (SVT) (or Weighted SVT), but they suffer from high computational cost to compute a Singular Value Decomposition (SVD) at each iteration.
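The bottleneck named here is the full SVD in every iteration. One standard way to reduce it, shown as a generic sketch rather than the paper's algorithm, is to approximate the dominant singular subspace with a randomized range finder and apply the soft-threshold only inside that subspace. The parameters `rank`, `oversample`, and `seed` and the name `randomized_svt` are assumptions for illustration.

```python
import numpy as np


def randomized_svt(y: np.ndarray, tau: float, rank: int,
                   oversample: int = 5, seed: int = 0) -> np.ndarray:
    """Approximate SVT using a randomized range finder instead of a full SVD.

    Matches exact SVT (up to round-off, with probability one) when
    rank(y) <= rank + oversample; otherwise directions outside the sampled
    subspace are discarded.
    """
    rng = np.random.default_rng(seed)
    m, n = y.shape
    k = min(rank + oversample, m, n)
    omega = rng.standard_normal((n, k))        # random test matrix
    q, _ = np.linalg.qr(y @ omega)             # m x k orthonormal basis for the sampled range
    b = q.T @ y                                # small k x n projection of y
    ub, s, vt = np.linalg.svd(b, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)        # same soft-threshold as exact SVT
    return ((q @ ub) * s_shrunk) @ vt


rng = np.random.default_rng(1)
low_rank = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 200))
approx = randomized_svt(low_rank, tau=1.0, rank=4)
```

For an m × n input the expensive SVD now acts on a k × n matrix instead of the full one, which pays off when k is much smaller than min(m, n).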


Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation

no code implementations • CVPR 2015 • Hyeokhyen Kwon, Yu-Wing Tai, Stephen Lin

Depth maps captured by consumer-level depth cameras such as Kinect are usually degraded by noise, missing values, and quantization.

Dictionary Learning Quantization

Partial Sum Minimization of Singular Values in Robust PCA: Algorithm and Applications

no code implementations • 4 Mar 2015 • Tae-Hyun Oh, Yu-Wing Tai, Jean-Charles Bazin, Hyeongwoo Kim, In So Kweon

Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering underlying low-rank structure of clean data corrupted with sparse noise/outliers.

Edge Detection
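As the title indicates, partial sum minimization penalizes only the singular values after the N largest, so that, when the target rank is known, the dominant singular values are not biased toward zero as they are under the full nuclear norm. A minimal sketch of the corresponding proximal step, assuming the number of preserved singular values `n_keep` is known in advance; it is one building block, not the paper's full RPCA algorithm, and `partial_svt` is a hypothetical name.

```python
import numpy as np


def partial_svt(y: np.ndarray, tau: float, n_keep: int) -> np.ndarray:
    """Proximal step for the partial sum of singular values beyond the first n_keep.

    The n_keep largest singular values are left untouched; only the
    remaining ones are soft-thresholded by tau.
    """
    u, s, vt = np.linalg.svd(y, full_matrices=False)  # descending singular values
    s_new = s.copy()
    s_new[n_keep:] = np.maximum(s[n_keep:] - tau, 0.0)
    return (u * s_new) @ vt


y = np.diag([5.0, 2.0, 0.8])
print(np.allclose(partial_svt(y, 1.0, n_keep=1), np.diag([5.0, 1.0, 0.0])))  # True
```

With `n_keep=0` the step reduces to ordinary singular value thresholding, which would have shrunk the leading value from 5.0 to 4.0.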

Exploiting Shading Cues in Kinect IR Images for Geometry Refinement

no code implementations • CVPR 2014 • Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon

To resolve ambiguity in our model between normals and distance, we utilize an initial 3D mesh from the Kinect fusion and multi-view information to reliably estimate surface details that were not reconstructed by the Kinect fusion.

Salient Region Detection via High-Dimensional Color Transform

no code implementations • CVPR 2014 • Jiwhan Kim, Dongyoon Han, Yu-Wing Tai, Junmo Kim

By mapping a low dimensional RGB color to a feature vector in a high-dimensional color space, we show that we can linearly separate the salient regions from the background by finding an optimal linear combination of color coefficients in the high-dimensional color space.
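The core step described here is a linear fit in a lifted color space: map each RGB color to a longer feature vector, then solve for coefficients whose linear combination separates salient pixels from background. A minimal sketch, assuming a hypothetical feature map (linear, squared, and pairwise-product RGB terms plus a bias) and ordinary least squares on hand-labeled pixels; the paper's actual feature construction and source of labels are not reproduced, and the function names are illustrative.

```python
import numpy as np


def high_dim_features(rgb: np.ndarray) -> np.ndarray:
    """Map N x 3 RGB values in [0, 1] to N x 10 features (a hypothetical choice):
    linear, squared, and pairwise-product terms plus a constant bias column."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([r, g, b, r * r, g * g, b * b, r * g, r * b, g * b, np.ones_like(r)], axis=1)


def fit_coefficients(rgb: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Least-squares weights so that features @ weights approximates the labels
    (1 = salient, 0 = background)."""
    coeffs, *_ = np.linalg.lstsq(high_dim_features(rgb), labels, rcond=None)
    return coeffs


def saliency_scores(rgb: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Each pixel's score is a linear combination of its high-dimensional features."""
    return high_dim_features(rgb) @ coeffs


pixels = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([1.0, 0.0, 1.0, 0.0])
coeffs = fit_coefficients(pixels, labels)
scores = saliency_scores(pixels, coeffs)
```

The lift is what makes a linear separator useful: colors that are not linearly separable in RGB can become separable once nonlinear terms are added as extra dimensions.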

Calibrating a Non-isotropic Near Point Light Source using a Plane

no code implementations • CVPR 2014 • Jaesik Park, Sudipta N. Sinha, Yasuyuki Matsushita, Yu-Wing Tai, In So Kweon

We show that a non-isotropic near point light source rigidly attached to a camera can be calibrated using multiple images of a weakly textured planar scene.

Shading-Based Shape Refinement of RGB-D Images

no code implementations • CVPR 2013 • Lap-Fai Yu, Sai-Kit Yeung, Yu-Wing Tai, Stephen Lin

We present a shading-based shape refinement algorithm which uses a noisy, incomplete depth map from Kinect to help resolve ambiguities in shape-from-shading.
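Shape-from-shading relies on a shading model that links surface normals to observed intensity; it is ambiguous because each pixel gives one intensity for a normal with two degrees of freedom, which is why a depth prior helps. A minimal sketch of that forward model, assuming a Lambertian surface with uniform albedo, a single distant light, and unit pixel spacing: it computes normals from a depth map by finite differences and renders the resulting shading. It is a generic model, not the paper's refinement algorithm, and the function names are hypothetical.

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Unit surface normals (H x W x 3) from a depth map via finite differences."""
    dz_dy, dz_dx = np.gradient(depth)  # derivatives along rows, then along columns
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def lambertian_shading(depth: np.ndarray, light, albedo: float = 1.0) -> np.ndarray:
    """Rendered intensity albedo * max(0, n . l) for a distant light direction l."""
    l = np.asarray(light, dtype=float)
    l = l / np.linalg.norm(l)
    return albedo * np.clip(normals_from_depth(depth) @ l, 0.0, None)


ramp = np.tile(np.arange(4.0), (4, 1))                  # plane with slope 1 along x
print(lambertian_shading(ramp, light=[0.0, 0.0, 1.0]))  # every entry is 1/sqrt(2)
```

A refinement method of this family adjusts the depth so that the rendered shading matches the observed image while staying close to the noisy input depth.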
