no code implementations • 5 Dec 2024 • Longtao Zheng, Yifan Zhang, Hanzhong Guo, Jiachun Pan, Zhenxiong Tan, Jiahao Lu, Chuanxin Tang, Bo An, Shuicheng Yan
Recent advances in video diffusion models have unlocked new potential for realistic audio-driven talking video generation.
no code implementations • CVPR 2024 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo
We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.
no code implementations • 12 Apr 2023 • Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo
Filler words like ``um" or ``uh" are common in spontaneous speech.
1 code implementation • CVPR 2023 • Yucheng Zhao, Chong Luo, Chuanxin Tang, Dongdong Chen, Noel Codella, Zheng-Jun Zha
We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.
no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang
Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
Ranked #1 on
Semi-Supervised Video Object Segmentation
on Long Video Dataset
(using extra training data)
no code implementations • 24 Oct 2022 • Dacheng Yin, Zhiyuan Zhao, Chuanxin Tang, Zhiwei Xiong, Chong Luo
In this paper, we present TridentSE, a novel architecture for speech enhancement, which is capable of efficiently capturing both global information and local details.
no code implementations • 9 Aug 2022 • Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo
Continuous Speech Keyword Spotting (CSKWS) is a task to detect predefined keywords in a continuous speech.
no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo
In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.
2 code implementations • 26 Jan 2022 • Guangting Wang, Yucheng Zhao, Chuanxin Tang, Chong Luo, Wenjun Zeng
It can be even replaced by a zero-parameter operation.
Ranked #79 on
Object Detection
on COCO minival
(APM metric)
1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.
2 code implementations • 12 Sep 2021 • Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng
Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module.
Ranked #427 on
Image Classification
on ImageNet
1 code implementation • 30 Aug 2021 • Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.