no code implementations • 14 Aug 2024 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
The field of autonomous driving increasingly demands high-quality annotated video training data.
no code implementations • 28 May 2024 • Yifan Bai, Dongming Wu, Yingfei Liu, Fan Jia, Weixin Mao, Ziheng Zhang, Yucheng Zhao, Jianbing Shen, Xing Wei, Tiancai Wang, Xiangyu Zhang
Despite its simplicity, Atlas demonstrates superior performance in both 3D detection and ego planning tasks on the nuScenes dataset, showing that a 3D-tokenized LLM is key to reliable autonomous driving.
no code implementations • 28 Mar 2024 • Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang
Autonomous driving progress relies on large-scale annotated datasets.
no code implementations • 22 Jan 2024 • Jinghuai Yao, Puyuan Du, Yucheng Zhao, YuBo Wang
Visible (VIS) imagery of satellites has various important applications in meteorology, including monitoring Tropical Cyclones (TCs).
no code implementations • 17 Jan 2024 • Shuo Wang, Fan Jia, Yingfei Liu, Yucheng Zhao, Zehui Chen, Tiancai Wang, Chi Zhang, Xiangyu Zhang, Feng Zhao
This paper introduces the Stream Query Denoising (SQD) strategy as a novel approach for temporal modeling in high-definition map (HD-map) construction.
no code implementations • CVPR 2024 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang
This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.
no code implementations • 22 Nov 2023 • Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang
Based on the vision-action pairs, we construct a general world model for autonomous driving, termed ADriver-I, built on an MLLM and a diffusion model.
no code implementations • 20 Nov 2023 • Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang
Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent.
no code implementations • 12 Apr 2023 • Zhiyuan Zhao, Lijun Wu, Chuanxin Tang, Dacheng Yin, Yucheng Zhao, Chong Luo
Filler words like "um" or "uh" are common in spontaneous speech.
1 code implementation • CVPR 2023 • Yucheng Zhao, Chong Luo, Chuanxin Tang, Dongdong Chen, Noel Codella, Zheng-Jun Zha
We believe that the concept of streaming video model and the implementation of S-ViT are solid steps towards a unified deep learning architecture for video understanding.
no code implementations • CVPR 2023 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang
Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank.
Ranked #1 on Semi-Supervised Video Object Segmentation on Long Video Dataset (using extra training data)
no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.
Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)
no code implementations • 28 Jun 2022 • Dacheng Yin, Chuanxin Tang, Yanqing Liu, Xiaoqiang Wang, Zhiyuan Zhao, Yucheng Zhao, Zhiwei Xiong, Sheng Zhao, Chong Luo
In the proposed paradigm, global and local factors in speech are explicitly decomposed and separately manipulated to achieve high speaker similarity and continuous prosody.
1 code implementation • 14 Jun 2022 • Juhong Min, Yucheng Zhao, Chong Luo, Minsu Cho
We propose to incorporate peripheral position encoding into the multi-head self-attention layers to let the network learn, from the training data, to partition the visual field into diverse peripheral regions.
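The mechanism described above can be illustrated with a minimal sketch: a position-dependent bias, indexed by the Euclidean distance between query and key locations, is added to the attention logits. This is a simplified stand-in for the paper's peripheral position encoding (the distance bucketing, function names, and shapes here are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

def peripheral_attention(q, k, v, coords, ring_bias):
    """Single-head attention over a token grid where a learned bias,
    indexed by the Euclidean distance between query and key positions,
    is added to the attention logits. Bucketing distances into "rings"
    lets the model treat near (central) and far (peripheral) tokens
    differently."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                            # (N, N)
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    rings = np.minimum(dist.astype(int), len(ring_bias) - 1)
    logits = logits + ring_bias[rings]                       # peripheral bias
    a = np.exp(logits - logits.max(axis=-1, keepdims=True))
    a = a / a.sum(axis=-1, keepdims=True)                    # row-wise softmax
    return a @ v

# Toy 3x3 token grid with 8-dim features
ys, xs = np.meshgrid(np.arange(3), np.arange(3), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(float)
q = rng.standard_normal((9, 8))
k = rng.standard_normal((9, 8))
v = rng.standard_normal((9, 8))
bias = rng.standard_normal(4)    # one learned scalar per distance ring
out = peripheral_attention(q, k, v, coords, bias)
```

Because the bias depends only on relative distance, the same few learned scalars are shared across all query positions, which is what makes the encoding cheap compared with a full positional table.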
2 code implementations • 26 Jan 2022 • Guangting Wang, Yucheng Zhao, Chuanxin Tang, Chong Luo, Wenjun Zeng
It can be even replaced by a zero-parameter operation.
Ranked #79 on Object Detection on COCO minival (APM metric)
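A zero-parameter replacement for the token-mixing step can be sketched as a partial spatial shift: a small fraction of channels is moved one pixel in each of the four directions, with zero padding at the vacated border. This is a hedged illustration of the general idea (the shift ratio, layout, and function name are assumptions, not the paper's exact operation):

```python
import numpy as np

def partial_shift(x, ratio=1 / 12):
    """Zero-parameter token mixing: shift a small fraction of channels
    one pixel in each of the four spatial directions, zero-padding the
    vacated border. x has shape (C, H, W); untouched channels pass
    through unchanged."""
    c = x.shape[0]
    g = int(c * ratio)                           # channels per direction
    out = x.copy()
    out[0 * g:1 * g, :, 1:] = x[0 * g:1 * g, :, :-1]   # shift right
    out[0 * g:1 * g, :, :1] = 0
    out[1 * g:2 * g, :, :-1] = x[1 * g:2 * g, :, 1:]   # shift left
    out[1 * g:2 * g, :, -1:] = 0
    out[2 * g:3 * g, 1:, :] = x[2 * g:3 * g, :-1, :]   # shift down
    out[2 * g:3 * g, :1, :] = 0
    out[3 * g:4 * g, :-1, :] = x[3 * g:4 * g, 1:, :]   # shift up
    out[3 * g:4 * g, -1:, :] = 0
    return out
```

The operation has no learnable weights and negligible FLOPs; subsequent channel MLPs do all the learning, which is what makes the comparison with attention-based mixing interesting.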
2 code implementations • 12 Sep 2021 • Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng
Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module.
Ranked #400 on Image Classification on ImageNet
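The sparse MLP idea above can be sketched as axis-wise token mixing: instead of one dense MLP over all H·W tokens, one weight matrix mixes tokens along the height axis and another along the width axis, and the branches are fused by a pointwise projection. This is a minimal numpy sketch under that assumption (branch structure and names are illustrative, not the paper's exact module):

```python
import numpy as np

rng = np.random.default_rng(0)

def smlp_mix(x, wh, ww, wp):
    """Sparse-MLP token mixing on x of shape (H, W, C): one matrix
    mixes tokens along the height axis, another along the width axis;
    the two branches plus an identity branch are concatenated
    channel-wise and fused by a pointwise projection."""
    h = np.einsum("hwc,gh->gwc", x, wh)          # mix along H
    w = np.einsum("hwc,gw->hgc", x, ww)          # mix along W
    cat = np.concatenate([x, h, w], axis=-1)     # (H, W, 3C)
    return cat @ wp                              # back to (H, W, C)

H, W, C = 4, 4, 8
x = rng.standard_normal((H, W, C))
wh = rng.standard_normal((H, H))                 # H*H params, not (H*W)^2
ww = rng.standard_normal((W, W))
wp = rng.standard_normal((3 * C, C))
y = smlp_mix(x, wh, ww, wp)
```

The key point is the parameter count: axis-wise mixing costs H² + W² weights rather than the (H·W)² of a dense token-mixing MLP, while every token still influences every other token after two blocks.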
1 code implementation • 12 Sep 2021 • Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng
Given a piece of speech and its transcript text, text-based speech editing aims to generate speech that can be seamlessly inserted into the given speech by editing the transcript.
1 code implementation • 30 Aug 2021 • Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
Convolutional neural networks (CNNs) are the dominant deep neural network (DNN) architecture for computer vision.
no code implementations • ICCV 2021 • Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.
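A contrastive objective of this kind can be sketched with a generic InfoNCE loss: the feature predicted for a masked region (query) should match its ground-truth feature (positive) against features from other locations (negatives). This is a standard formulation, not necessarily MaskCo's exact loss; names and temperature are assumptions:

```python
import numpy as np

def info_nce(query, positive, negatives, tau=0.1):
    """Generic InfoNCE contrastive loss: each query should score its
    positive higher than every negative under a temperature-scaled
    cosine similarity. query/positive: (B, D); negatives: (K, D)."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    q, pos, neg = norm(query), norm(positive), norm(negatives)
    l_pos = np.sum(q * pos, axis=-1, keepdims=True) / tau   # (B, 1)
    l_neg = q @ neg.T / tau                                 # (B, K)
    logits = np.concatenate([l_pos, l_neg], axis=1)
    logits -= logits.max(axis=1, keepdims=True)             # stability
    # cross-entropy with the positive always at index 0
    return float(np.mean(-logits[:, 0]
                         + np.log(np.exp(logits).sum(axis=1))))
```

Minimizing this loss pulls the masked-region prediction toward its true feature while pushing it away from features of other regions, which is the contrastive half of the mask-prediction task.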
1 code implementation • 15 Apr 2021 • Haojin Yang, Zhen Shen, Yucheng Zhao
Deep convolutional neural networks (CNNs) have achieved astonishing results in a large variety of applications.
Ranked #892 on Image Classification on ImageNet
no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.