Search Results for author: Chong Luo

Found 24 papers, 11 papers with code

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

no code implementations 13 Dec 2022 Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang

Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.

Instance Segmentation Semantic Segmentation +2

TridentSE: Guiding Speech Enhancement with 32 Global Tokens

no code implementations 24 Oct 2022 Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo

In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.

Speech Enhancement

An Anchor-Free Detector for Continuous Speech Keyword Spotting

no code implementations 9 Aug 2022 Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech.

Keyword Spotting object-detection +1

RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion

no code implementations 28 Jun 2022 Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo

In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.

Peripheral Vision Transformer

1 code implementation 14 Jun 2022 Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho

We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn to partition the visual field into diverse peripheral regions given training data.

Image Classification

Retriever: Learning Content-Style Representation as a Token-Level Bipartite Graph

2 code implementations ICLR 2022 Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng

Last, an innovative link attention module serves as the decoder to reconstruct data from the decomposed content and style, with the help of the linking keys.

Quantization Style Transfer +1

Make It Move: Controllable Image-to-Video Generation with Text Descriptions

1 code implementation CVPR 2022 Yaosi Hu, Chong Luo, Zhenzhong Chen

With both controllable appearance and motion, TI2V aims at generating videos from a static image and a text description.

Image to Video Generation

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration

1 code implementation 12 Sep 2021 Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.

speech editing Voice Conversion

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

1 code implementation 30 Aug 2021 Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.

Self-Supervised Visual Representations Learning by Contrastive Mask Prediction

no code implementations ICCV 2021 Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.

Representation Learning Self-Supervised Learning

Unsupervised Visual Representation Learning by Tracking Patches in Video

1 code implementation CVPR 2021 Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong

The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.

Action Classification Action Recognition +1

VAE^2: Preventing Posterior Collapse of Variational Video Predictions in the Wild

no code implementations 28 Jan 2021 Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng

We believe that VAE$^2$ is also applicable to other stochastic sequence prediction problems where the training data lack stochasticity.

Video Prediction

Spatiotemporal Fusion in 3D CNNs: A Probabilistic View

no code implementations CVPR 2020 Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Based on the probability space, we further generate new fusion strategies which achieve state-of-the-art performance on four well-known action recognition datasets.

Action Recognition Action Recognition In Videos +1

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

4 code implementations Applications of Artificial Intelligence Conference 2019 Dacheng Yin, Chong Luo, Zhiwei Xiong, Wen-Jun Zeng

We discover that the two streams should communicate with each other, and this is crucial to phase prediction.

Sound Audio and Speech Processing

Posterior-Guided Neural Architecture Search

1 code implementation 23 Jun 2019 Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng

Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient.

Image Classification Neural Architecture Search

Towards a Better Match in Siamese Network Based Visual Object Tracker

no code implementations 5 Sep 2018 Anfeng He, Chong Luo, Xinmei Tian, Wen-Jun Zeng

Recently, Siamese network based trackers have received tremendous interest for their fast tracking speed and high performance.

Visual Object Tracking
