1 code implementation • 12 Sep 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu
In two-stream methods for action recognition, the two streams' predictions are always fused by weighted averaging.
no code implementations • CVPR 2018 • Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan
Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.
no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang
Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.
1 code implementation • 11 Nov 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu
From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.
no code implementations • 10 May 2018 • Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li
We find that all types of modeling units achieve similar character error rates (CER) in the CTC model, and that the Chinese character attention model performs better than the syllable attention model.
no code implementations • 13 Jul 2018 • Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu
Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resource.
no code implementations • 31 Oct 2018 • Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li
Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a big issue.
no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu
The foreground mask obtained by moving object detection is unshaped and cannot be directly used in most subsequent processes.
no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu
Real-time motion detection in non-stationary scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resource.
no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du
On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively.
Ranked #1 on Action Recognition on UTD-MHAD
2 code implementations • 2 Aug 2019 • Kun Han, Junwen Chen, Hui Zhang, Haiyang Xu, Yiping Peng, Yun Wang, Ning Ding, Hui Deng, Yonghu Gao, Tingwei Guo, Yi Zhang, Yahao He, Baochang Ma, Yu-Long Zhou, Kangli Zhang, Chao Liu, Ying Lyu, Chenxi Wang, Cheng Gong, Yunbo Wang, Wei Zou, Hui Song, Xiangang Li
In this paper we present DELTA, a deep learning based language technology platform.
Ranked #3 on Text Classification on Yahoo! Answers
no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang
Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of Re-ID feature.
no code implementations • 24 Aug 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu
Firstly, the relationship between the camera pose estimation error and bias values of map points is derived based on the optimized function in VSLAM.
no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang
In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).
no code implementations • 13 Sep 2019 • Zheng Zhu, Hongxuan Ma, Wei Zou
Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.
1 code implementation • 19 Sep 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma
This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.
no code implementations • 24 Sep 2019 • Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang
In navigation and visual servoing, it is common to calculate relative pose from feature points on markers, so keeping markers in the camera's view is an important problem.
no code implementations • 22 Oct 2019 • Ruixiong Zhang, Wei Zou, Xiangang Li
To utilize the acoustic event information to improve the performance of ASC tasks, we present the cross-task pre-training mechanism which utilizes acoustic event information from the pre-trained AED model for ASC tasks.
1 code implementation • 22 Oct 2019 • Dongwei Jiang, Xiaoning Lei, Wubo Li, Ne Luo, Yuxuan Hu, Wei Zou, Xiangang Li
Speech recognition technologies are gaining enormous popularity in various industrial applications.
no code implementations • 23 Oct 2019 • Wubo Li, Wei Zou, Xiangang Li
Multimodal approaches provide more promising performance than unimodal ones in most tasks.
1 code implementation • ACL 2020 • Wei Zou, Shu-Jian Huang, Jun Xie, Xin-yu Dai, Jia-Jun Chen
Neural machine translation systems tend to fail on less decent inputs despite their significant efficacy, which may significantly harm the credibility of these systems. Fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance.
1 code implementation • 20 May 2020 • Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li
In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension on streaming model, and how to better transfer learned knowledge from pre-training stage to downstream tasks.
no code implementations • 19 Oct 2020 • Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li
This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.
no code implementations • 21 Oct 2020 • Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li
Audio Visual Scene-aware Dialog (AVSD) is a task to generate responses when discussing a given video.
1 code implementation • 27 Oct 2020 • Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li
Self-supervised visual pretraining has shown significant progress recently.
no code implementations • 26 Apr 2021 • Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, Shuaijiang Zhao, Xiaoning Lei, Wei Zou, Xiangang Li
Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.
2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.
Ranked #1 on Speech Recognition on GigaSpeech
no code implementations • 8 Mar 2022 • Snehasish Roy Chowdhury, Ramesh Arumugam, Wei Zou, V. K. Chandrasekar, D. V. Senthilkumar
Nevertheless, at the local scale, the spread of the inhomogeneous steady states increases up to a critical value of the limiting factor, favoring the metacommunity persistence, and then starts decreasing for further decrease in the limiting factor with varying local interaction.
no code implementations • 19 Apr 2022 • Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li
In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).
no code implementations • 19 Apr 2022 • Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li
This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge.
1 code implementation • 1 Aug 2022 • Hu Su, Yonghao He, Rui Jiang, Jiabin Zhang, Wei Zou, Bin Fan
The dynamic smooth label is assigned to supervise the classification branch.
1 code implementation • 17 Aug 2022 • Goutham Rajendran, Wei Zou
Therefore, the models we develop for various tasks should be robust to such kinds of noisy data, which led to the thriving field of robust machine learning.
1 code implementation • 28 Jul 2023 • Cheng Wen, Xianghui Sun, Shuaijiang Zhao, Xiaoquan Fang, Liangyu Chen, Wei Zou
This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation.
no code implementations • 27 Oct 2023 • Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui
The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the necessity for engaging the slow thinking model based on the complexity of the complete response.
no code implementations • 5 Jan 2024 • Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui
This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents.
1 code implementation • 12 Jan 2024 • Shuaijie She, Wei Zou, ShuJian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen
To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language.
1 code implementation • 12 Feb 2024 • Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia
We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts.
1 code implementation • NeurIPS 2023 • Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng
Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips.
no code implementations • 5 Mar 2024 • Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou
Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues.
1 code implementation • 28 Mar 2024 • Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia
Moreover, we compare our MMCert with a state-of-the-art certified defense extended from unimodal models.
1 code implementation • 13 Apr 2024 • Wei Zou, Ziyuan Zhuang, ShuJian Huang, Jia Liu, Jiajun Chen
Paraphrase generation aims to produce high-quality and diverse utterances of a given text.