Search Results for author: Wei Zou

Found 41 papers, 17 papers with code

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

1 code implementation • 12 Sep 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

In two-stream methods for action recognition, the predictions of the two streams are always fused by weighted averaging.

Action Classification Action Recognition +3
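The weighted averaging scheme mentioned above is the baseline this paper sets out to replace. Below is a minimal sketch of that baseline. It assumes each stream outputs per-class logits. The function names and the default weights are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average_fusion(rgb_logits, flow_logits, w_rgb=1.0, w_flow=1.0):
    """Fuse RGB-stream and optical-flow-stream predictions with fixed weights.

    The weights are hypothetical hyperparameters. The output is a probability
    vector because each stream's softmax sums to 1 and the weights are normalized.
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    total_w = w_rgb + w_flow
    return [(w_rgb * a + w_flow * b) / total_w for a, b in zip(p_rgb, p_flow)]


def predict(rgb_logits, flow_logits, w_rgb=1.0, w_flow=1.0):
    """Return the index of the class with the highest fused score."""
    fused = weighted_average_fusion(rgb_logits, flow_logits, w_rgb, w_flow)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # The two streams disagree, so the fixed weights decide which one wins.
    print(predict([2.0, 0.0], [0.0, 2.0], w_rgb=1.0, w_flow=3.0))  # prints 1
```

The weights are chosen once and then applied to every video. A learned gating network, as the paper's title suggests, is one way to make the fusion depend on the input instead.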

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

no code implementations • CVPR 2018 • Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan

Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.

Optical Flow Estimation

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural network (CNN)-based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

End-to-end Video-level Representation Learning for Action Recognition

1 code implementation • 11 Nov 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

From frame/clip-level feature learning to video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.

Action Recognition Optical Flow Estimation +2

A comparable study of modeling units for end-to-end Mandarin speech recognition

no code implementations • 10 May 2018 • Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li

We find that all types of modeling units achieve comparable character error rates (CER) in the CTC model, and that the Chinese character attention model outperforms the syllable attention model.

speech-recognition Speech Recognition
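CER is conventionally defined as the character-level edit (Levenshtein) distance between the hypothesis and the reference transcript, divided by the number of reference characters. Below is a minimal sketch of that standard definition. It is not the paper's scoring script, and the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum number of substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete a reference character
                cur[j - 1] + 1,          # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute (free if the characters match)
            )
        prev = cur
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of six reference characters.
    print(character_error_rate("今天天气很好", "今天天器很好"))
```

Iterating over a Python `str` yields Unicode code points, so each Chinese character counts as one unit, which matches character-level scoring for Mandarin.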

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

no code implementations • 13 Jul 2018 • Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic backgrounds, changing foreground appearance and limited computational resources.

Moving Object Detection object-detection +1

Towards End-to-End Code-Switching Speech Recognition

no code implementations • 31 Oct 2018 • Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li

Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a major issue.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Optical Flow Based Online Moving Foreground Analysis

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

The foreground mask obtained by moving object detection is unshaped and cannot be used directly in most subsequent processing steps.

Clustering Moving Object Detection +2

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic backgrounds, changing foreground appearance and limited computational resources.

Motion Detection Motion Detection In Non-Stationary Scenes +1

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance, with top-1 accuracies of 97.2% and 94.3% in the cross-view and cross-subject settings, respectively.

Ranked #1 on Action Recognition on UTD-MHAD

Action Recognition Multimodal Activity Recognition +3

DELTA: A DEep learning based Language Technology plAtform

2 code implementations • 2 Aug 2019 • Kun Han, Junwen Chen, HUI ZHANG, Haiyang Xu, Yiping Peng, Yun Wang, Ning Ding, Hui Deng, Yonghu Gao, Tingwei Guo, Yi Zhang, Yahao He, Baochang Ma, Yu-Long Zhou, Kangli Zhang, Chao Liu, Ying Lyu, Chenxi Wang, Cheng Gong, Yunbo Wang, Wei Zou, Hui Song, Xiangang Li

In this paper we present DELTA, a deep learning based language technology platform.

Ranked #3 on Text Classification on Yahoo! Answers

Abstractive Text Summarization Intent Detection +9

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of the multi-task network (MTN), we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is used to infer the occlusion state and thereby make better use of Re-ID features.

Human Detection Multi-Person Pose Estimation +3

Camera Pose Correction in SLAM Based on Bias Values of Map Points

no code implementations • 24 Aug 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu

First, the relationship between the camera pose estimation error and the bias values of map points is derived from the objective function optimized in VSLAM.

feature selection Pose Estimation

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking +1

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

no code implementations • 13 Sep 2019 • Zheng Zhu, Hongxuan Ma, Wei Zou

Human following on mobile robots has seen significant advances owing to its potential for real-world applications.

Optical Flow Estimation Visual Tracking

EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras

1 code implementation • 19 Sep 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma

This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.

Pose Estimation
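A method that covers both camera types has to handle the fact that they map a ray's angle from the optical axis to image radius differently. The sketch below contrasts the pinhole model (r = f·tan θ) with the equidistant fish-eye model (r = f·θ). Using the equidistant model is an assumption, since fish-eye lenses are also described by other projection models. The `project` helper is hypothetical and is not EPOSIT itself.

```python
import math


def project(point_cam, f, cx, cy, model="pinhole"):
    """Project a 3-D point in camera coordinates to pixel coordinates.

    model="pinhole":     r = f * tan(theta), valid only in front of the camera.
    model="equidistant": r = f * theta, a common fish-eye model (assumed here).
    theta is the angle between the incoming ray and the optical (z) axis.
    """
    x, y, z = point_cam
    rho = math.hypot(x, y)        # distance from the optical axis
    theta = math.atan2(rho, z)    # angle from the optical axis, in [0, pi]
    if model == "pinhole":
        if z <= 0:
            raise ValueError("pinhole model cannot image points at or behind z = 0")
        r = f * math.tan(theta)
    elif model == "equidistant":
        r = f * theta
    else:
        raise ValueError(f"unknown model: {model}")
    if rho == 0.0:
        return (cx, cy)           # a point on the optical axis maps to the principal point
    # Place the point at image radius r along its azimuth direction.
    return (cx + r * x / rho, cy + r * y / rho)


if __name__ == "__main__":
    p = (1.0, 0.0, 1.0)  # 45 degrees off-axis
    print(project(p, 100.0, 0.0, 0.0, "pinhole"))      # x is approximately 100
    print(project(p, 100.0, 0.0, 0.0, "equidistant"))  # x is approximately 78.54
```

The equidistant model still gives a finite radius at θ = 90° and beyond, where the pinhole model breaks down. This is why a single pinhole-only solver cannot be applied directly to wide-angle fish-eye images.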

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

no code implementations • 24 Sep 2019 • Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang

In navigation and visual servoing, the relative pose is commonly calculated from feature points on markers, so keeping the markers within the camera's view is an important problem.

Position
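One simple way to state this requirement is that every marker feature point, projected through the camera model, must land inside the image. Below is a minimal sketch under a pinhole-camera assumption, with points already expressed in the camera frame. The helper names, the pixel `margin` parameter and the intrinsics in the example are hypothetical. This is not the paper's formulation of the field-of-view constraint.

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a camera-frame 3-D point with a pinhole model; None if it is behind the camera."""
    x, y, z = point_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def markers_in_view(points_cam, fx, fy, cx, cy, width, height, margin=0.0):
    """True if every marker point projects inside the image, at least `margin` px from the border."""
    for point in points_cam:
        uv = project_pinhole(point, fx, fy, cx, cy)
        if uv is None:
            return False
        u, v = uv
        if not (margin <= u <= width - margin and margin <= v <= height - margin):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical 640x480 camera and a 0.2 m square marker 1 m in front of it.
    intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    corners = [(-0.1, -0.1, 1.0), (0.1, -0.1, 1.0), (0.1, 0.1, 1.0), (-0.1, 0.1, 1.0)]
    print(markers_in_view(corners, width=640, height=480, **intrinsics))  # True
    shifted = [(x + 0.6, y, z) for x, y, z in corners]
    print(markers_in_view(shifted, width=640, height=480, **intrinsics))  # False
```

A nonzero `margin` keeps the marker away from the image border. This leaves some slack before the camera or robot motion pushes a corner out of view.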

Cross-task pre-training for on-device acoustic scene classification

no code implementations • 22 Oct 2019 • Ruixiong Zhang, Wei Zou, Xiangang Li

To improve performance on acoustic scene classification (ASC) tasks, we present a cross-task pre-training mechanism that transfers acoustic event information from a pre-trained acoustic event detection (AED) model to ASC tasks.

Acoustic Scene Classification Classification +3

Improving Transformer-based Speech Recognition Using Unsupervised Pre-training

1 code implementation • 22 Oct 2019 • Dongwei Jiang, Xiaoning Lei, Wubo Li, Ne Luo, Yuxuan Hu, Wei Zou, Xiangang Li

Speech recognition technologies are gaining enormous popularity in various industrial applications.

speech-recognition Speech Recognition +1

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

no code implementations • 23 Oct 2019 • Wubo Li, Wei Zou, Xiangang Li

Multimodal approaches outperform unimodal ones in most tasks.

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

1 code implementation • ACL 2020 • Wei Zou, Shu-Jian Huang, Jun Xie, Xin-yu Dai, Jia-Jun Chen

Despite their significant efficacy, neural machine translation systems tend to fail on lower-quality inputs, which may seriously harm the credibility of these systems; understanding how and when neural systems fail in such cases is critical for industrial maintenance.

Machine Translation Translation

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

1 code implementation • 20 May 2020 • Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li

In this paper, we conduct a further study on masked predictive coding (MPC) and focus on three important aspects: the effect of the speaking style of the pre-training data, its extension to streaming models, and how to better transfer knowledge learned in the pre-training stage to downstream tasks.

speech-recognition Speech Recognition +2

DiDiSpeech: A Large Scale Mandarin Speech Corpus

no code implementations • 19 Oct 2020 • Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.

Audio and Speech Processing

TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog

no code implementations • 21 Oct 2020 • Wubo Li, Dongwei Jiang, Wei Zou, Xiangang Li

Audio Visual Scene-aware Dialog (AVSD) is the task of generating responses in a dialog about a given video.

Machine Translation Multi-Task Learning +2

Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning

1 code implementation • 27 Oct 2020 • Dongwei Jiang, Wubo Li, Miao Cao, Wei Zou, Xiangang Li

Self-supervised visual pretraining has shown significant progress recently.

Representation Learning Speech Emotion Recognition +2

Semantic Data Augmentation for End-to-End Mandarin Speech Recognition

no code implementations • 26 Apr 2021 • Jianwei Sun, Zhiyuan Tang, Hengxin Yin, Wei Wang, Xi Zhao, Shuaijiang Zhao, Xiaoning Lei, Wei Zou, Xiangang Li

Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

Ranked #1 on Speech Recognition on GigaSpeech

Sentence speech-recognition +1

Role of limiting dispersal on metacommunity stability and persistence

no code implementations • 8 Mar 2022 • Snehasish Roy Chowdhury, Ramesh Arumugam, Wei Zou, V. K. Chandrasekar, D. V. Senthilkumar

Nevertheless, at the local scale, the spread of the inhomogeneous steady states increases up to a critical value of the limiting factor, favoring metacommunity persistence, and then decreases as the limiting factor is reduced further, with varying local interaction.

Time Domain Adversarial Voice Conversion for ADD 2022

no code implementations • 19 Apr 2022 • Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).

Voice Conversion

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

no code implementations • 19 Apr 2022 • Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge\cite{Yi2022ADD}.

Voice Conversion

DSLA: Dynamic smooth label assignment for efficient anchor-free object detection

1 code implementation • 1 Aug 2022 • Hu Su, Yonghao He, Rui Jiang, Jiabin Zhang, Wei Zou, Bin Fan

The dynamic smooth label is assigned to supervise the classification branch.

Classification object-detection +1

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

1 code implementation • 17 Aug 2022 • Goutham Rajendran, Wei Zou

Therefore, the models we develop for various tasks should be robust to such noisy data, a need that has led to the thriving field of robust machine learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
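A common way to produce noisy test data for ASR is to add a noise signal at a chosen signal-to-noise ratio (SNR). The sketch below scales the noise so that 10·log10(P_speech / P_noise) equals the target SNR in dB. It is a generic illustration on plain Python lists and not necessarily the perturbation protocol used in the paper. The function names and the sine-tone stand-in for speech are hypothetical.

```python
import math
import random


def mean_power(signal):
    """Average power of a sampled signal."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(speech, noise, target_snr_db):
    """Return speech + scaled noise so that the resulting SNR equals target_snr_db.

    The noise is tiled or truncated to the length of the speech signal.
    """
    if not noise:
        raise ValueError("noise must be non-empty")
    noise = [noise[i % len(noise)] for i in range(len(speech))]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Solve 10*log10(p_speech / (scale**2 * p_noise)) = target_snr_db for scale.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Measure the SNR of a noisy signal against its clean reference."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(mean_power(speech) / mean_power(residual))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for speech: a 1-second 440 Hz tone sampled at 16 kHz.
    speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    noisy = add_noise_at_snr(speech, noise, target_snr_db=5.0)
    print(round(measured_snr_db(speech, noisy), 3))  # approximately 5.0
```

Sweeping the target SNR downward and recording the error rate at each level is one straightforward way to chart how quickly a model degrades.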

ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation

1 code implementation • 28 Jul 2023 • Cheng Wen, Xianghui Sun, Shuaijiang Zhao, Xiaoquan Fang, Liangyu Chen, Wei Zou

This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation.

Language Modelling

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

no code implementations • 27 Oct 2023 • Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

The fast thinking model serves as the primary interface for external interactions and initial response generation, and it evaluates whether the slow thinking model needs to be engaged based on the complexity of the complete response.

Response Generation

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

no code implementations • 5 Jan 2024 • Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture that enhances the integration of Large Language Models (LLMs) such as GPT-4 into conversational agents.

MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

1 code implementation • 12 Jan 2024 • Shuaijie She, Wei Zou, ShuJian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language.

Mathematical Reasoning

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

1 code implementation • 12 Feb 2024 • Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia

We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts.

Hallucination Retrieval

Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

1 code implementation • NeurIPS 2023 • Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips.

Contrastive Learning

Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

no code implementations • 5 Mar 2024 • Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues.

Pseudo Label

MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

1 code implementation • 28 Mar 2024 • Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia

Moreover, we compare our MMCert with a state-of-the-art certified defense extended from unimodal models.

Emotion Recognition Road Segmentation

Enforcing Paraphrase Generation via Controllable Latent Diffusion

1 code implementation • 13 Apr 2024 • Wei Zou, Ziyuan Zhuang, ShuJian Huang, Jia Liu, Jiajun Chen

Paraphrase generation aims to produce high-quality and diverse utterances of a given text.

Paraphrase Generation
