no code implementations • 13 Dec 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang
Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
no code implementations • 24 Oct 2022 • Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo
In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.
no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.
Ranked #1 on Zero-Shot Video Retrieval on MSR-VTT
no code implementations • 9 Aug 2022 • Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo
Continuous Speech Keyword Spotting (CSKWS) is the task of detecting predefined keywords in continuous speech.
no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo
In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.
1 code implementation • 14 Jun 2022 • Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho
We propose incorporating peripheral position encoding into the multi-head self-attention layers so that the network learns to partition the visual field into diverse peripheral regions given training data.
2 code implementations • ICLR 2022 • Dacheng Yin, Xuanchi Ren, Chong Luo, Yuwang Wang, Zhiwei Xiong, Wenjun Zeng
Last, an innovative link attention module serves as the decoder to reconstruct data from the decomposed content and style, with the help of the linking keys.
1 code implementation • 26 Jan 2022 • Guangting Wang, Yucheng Zhao, Chuanxin Tang, Chong Luo, Wenjun Zeng
It can even be replaced by a zero-parameter operation.
Ranked #66 on Object Detection on COCO minival (APM metric)
1 code implementation • CVPR 2022 • Yaosi Hu, Chong Luo, Zhenzhong Chen
With both controllable appearance and motion, TI2V aims at generating videos from a static image and a text description.
2 code implementations • 12 Sep 2021 • Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng
Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module.
Ranked #326 on Image Classification on ImageNet
1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.
1 code implementation • 30 Aug 2021 • Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
no code implementations • ICCV 2021 • Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.
1 code implementation • CVPR 2021 • Guangting Wang, Yizhou Zhou, Chong Luo, Wenxuan Xie, Wenjun Zeng, Zhiwei Xiong
The proxy task is to estimate the position and size of the image patch in a sequence of video frames, given only the target bounding box in the first frame.
no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
no code implementations • 28 Jan 2021 • Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng
We believe that VAE$^2$ is also applicable to other stochastic sequence prediction problems where the training data lack stochasticity.
no code implementations • CVPR 2020 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng
Based on the probability space, we further generate new fusion strategies that achieve state-of-the-art performance on four well-known action recognition datasets.
no code implementations • CVPR 2020 • Guangting Wang, Chong Luo, Xiaoyan Sun, Zhiwei Xiong, Wen-Jun Zeng
We propose a principled three-step approach to build a high-performance tracker.
4 code implementations • Applications of Artificial Intelligence Conference 2019 • Dacheng Yin, Chong Luo, Zhiwei Xiong, Wen-Jun Zeng
We discover that the two streams should communicate with each other, and this is crucial to phase prediction.
Sound • Audio and Speech Processing
1 code implementation • 23 Jun 2019 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng
Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient.
no code implementations • CVPR 2019 • Guangting Wang, Chong Luo, Zhiwei Xiong, Wen-Jun Zeng
The two stages are connected in series as the input proposals of the FM stage are generated by the CM stage.
no code implementations • 2 Dec 2018 • Stephen H. Bach, Daniel Rodriguez, Yintao Liu, Chong Luo, Haidong Shao, Cassandra Xia, Souvik Sen, Alexander Ratner, Braden Hancock, Houman Alborzi, Rahul Kuchhal, Christopher Ré, Rob Malkin
Labeling training data is one of the most costly bottlenecks in developing machine learning-based applications.
no code implementations • 5 Sep 2018 • Anfeng He, Chong Luo, Xinmei Tian, Wen-Jun Zeng
Recently, Siamese-network-based trackers have received tremendous interest for their fast tracking speed and high performance.
Ranked #9 on Visual Object Tracking on VOT2017/18
1 code implementation • CVPR 2018 • Anfeng He, Chong Luo, Xinmei Tian, Wen-Jun Zeng
SA-Siam is composed of a semantic branch and an appearance branch.
Ranked #1 on Visual Object Tracking on OTB-50