Search Results for author: Wei Zou

Found 41 papers, 17 papers with code

Enforcing Paraphrase Generation via Controllable Latent Diffusion

1 code implementation • 13 Apr 2024 • Wei Zou, Ziyuan Zhuang, ShuJian Huang, Jia Liu, Jiajun Chen

Paraphrase generation aims to produce high-quality and diverse utterances of a given text.

Paraphrase Generation

MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

1 code implementation • 28 Mar 2024 • Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia

Moreover, we compare our MMCert with a state-of-the-art certified defense extended from unimodal models.

Emotion Recognition Road Segmentation

Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

no code implementations • 5 Mar 2024 • Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues.

Pseudo Label

PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models

1 code implementation • 12 Feb 2024 • Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia

We formulate knowledge poisoning attacks as an optimization problem, whose solution is a set of poisoned texts.

Hallucination Retrieval

MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization

1 code implementation • 12 Jan 2024 • Shuaijie She, Wei Zou, ShuJian Huang, Wenhao Zhu, Xiang Liu, Xiang Geng, Jiajun Chen

To enhance reasoning abilities in non-dominant languages, we propose a Multilingual-Alignment-as-Preference Optimization framework (MAPO), aiming to align the reasoning processes in other languages with the dominant language.

Mathematical Reasoning

From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models

no code implementations • 5 Jan 2024 • Na Liu, Liangyu Chen, Xiaoyu Tian, Wei Zou, Kaijiang Chen, Ming Cui

This paper introduces RAISE (Reasoning and Acting through Scratchpad and Examples), an advanced architecture enhancing the integration of Large Language Models (LLMs) like GPT-4 into conversational agents.

DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking

no code implementations • 27 Oct 2023 • Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui

The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the necessity for engaging the slow thinking model based on the complexity of the complete response.

Response Generation

ChatHome: Development and Evaluation of a Domain-Specific Language Model for Home Renovation

1 code implementation • 28 Jul 2023 • Cheng Wen, Xianghui Sun, Shuaijiang Zhao, Xiaoquan Fang, Liangyu Chen, Wei Zou

This paper presents the development and evaluation of ChatHome, a domain-specific language model (DSLM) designed for the intricate field of home renovation.

Language Modelling

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

1 code implementation • 17 Aug 2022 • Goutham Rajendran, Wei Zou

Therefore, the models we develop for various tasks should be robust to such kinds of noisy data, which led to the thriving field of robust machine learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Time Domain Adversarial Voice Conversion for ADD 2022

no code implementations • 19 Apr 2022 • Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).

Voice Conversion

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

no code implementations • 19 Apr 2022 • Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge.

Voice Conversion

Role of limiting dispersal on metacommunity stability and persistence

no code implementations • 8 Mar 2022 • Snehasish Roy Chowdhury, Ramesh Arumugam, Wei Zou, V. K. Chandrasekar, D. V. Senthilkumar

Nevertheless, at the local scale, the spread of the inhomogeneous steady states increases up to a critical value of the limiting factor, favoring the metacommunity persistence, and then starts decreasing for further decrease in the limiting factor with varying local interaction.

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

2 code implementations • 13 Jun 2021 • Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

Sentence speech-recognition +1

DiDiSpeech: A Large Scale Mandarin Speech Corpus

no code implementations • 19 Oct 2020 • Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.

Audio and Speech Processing

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

1 code implementation • 20 May 2020 • Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li

In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension on streaming model, and how to better transfer learned knowledge from pre-training stage to downstream tasks.

speech-recognition Speech Recognition +2

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

1 code implementation • ACL 2020 • Wei Zou, Shu-Jian Huang, Jun Xie, Xin-yu Dai, Jia-Jun Chen

Neural machine translation systems tend to fail on less decent inputs despite their significant efficacy, which may significantly harm the credibility of these systems; fathoming how and when neural-based systems fail in such cases is critical for industrial maintenance.

Machine Translation Translation

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

no code implementations • 23 Oct 2019 • Wubo Li, Wei Zou, Xiangang Li

Multimodality provides more promising performance than unimodality in most tasks.

Cross-task pre-training for on-device acoustic scene classification

no code implementations • 22 Oct 2019 • Ruixiong Zhang, Wei Zou, Xiangang Li

To utilize the acoustic event information to improve the performance of ASC tasks, we present the cross-task pre-training mechanism which utilizes acoustic event information from the pre-trained AED model for ASC tasks.

Acoustic Scene Classification Classification +3

The Field-of-View Constraint of Markers for Mobile Robot with Pan-Tilt Camera

no code implementations • 24 Sep 2019 • Hongxuan Ma, Wei Zou, Zheng Zhu, Siyang Sun, Zhaobing Kang

In the field of navigation and visual servo, it is common to calculate relative pose by feature points on markers, so keeping markers in the camera's view is an important problem.

Position

EPOSIT: An Absolute Pose Estimation Method for Pinhole and Fish-Eye Cameras

1 code implementation • 19 Sep 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu, Chi Zhang, Hongxuan Ma

This paper presents a generic 6DOF camera pose estimation method, which can be used for both the pinhole camera and the fish-eye camera.

Pose Estimation

Human Following for Wheeled Robot with Monocular Pan-tilt Camera

no code implementations • 13 Sep 2019 • Zheng Zhu, Hongxuan Ma, Wei Zou

Human following on mobile robots has witnessed significant advances due to its potential for real-world applications.

Optical Flow Estimation Visual Tracking

High Performance Visual Object Tracking with Unified Convolutional Networks

no code implementations • 26 Aug 2019 • Zheng Zhu, Wei Zou, Guan Huang, Dalong Du, Chang Huang

In this paper, we propose an end-to-end framework to learn the convolutional features and perform the tracking process simultaneously, namely, a unified convolutional tracker (UCT).

Object Visual Object Tracking +1

Camera Pose Correction in SLAM Based on Bias Values of Map Points

no code implementations • 24 Aug 2019 • Zhaobing Kang, Wei Zou, Zheng Zhu

Firstly, the relationship between the camera pose estimation error and bias values of map points is derived based on the optimized function in VSLAM.

feature selection Pose Estimation

FastPose: Towards Real-time Pose Estimation and Tracking via Scale-normalized Multi-task Networks

no code implementations • 15 Aug 2019 • Jiabin Zhang, Zheng Zhu, Wei Zou, Peng Li, Yanwei Li, Hu Su, Guan Huang

Given the results of MTN, we adopt an occlusion-aware Re-ID feature strategy in the pose tracking module, where pose information is utilized to infer the occlusion state to make better use of Re-ID feature.

Human Detection Multi-Person Pose Estimation +3

Action Machine: Rethinking Action Recognition in Trimmed Videos

no code implementations • 14 Dec 2018 • Jiagang Zhu, Wei Zou, Liang Xu, Yiming Hu, Zheng Zhu, Manyu Chang, Jun-Jie Huang, Guan Huang, Dalong Du

On NTU RGB-D, Action Machine achieves state-of-the-art performance with top-1 accuracies of 97.2% and 94.3% on cross-view and cross-subject respectively.

Action Recognition Multimodal Activity Recognition +3

An Efficient Optical Flow Based Motion Detection Method for Non-stationary Scenes

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Real-time motion detection in non-stationary scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resource.

Motion Detection Motion Detection In Non-Stationary Scenes +1

Optical Flow Based Online Moving Foreground Analysis

no code implementations • 18 Nov 2018 • Junjie Huang, Wei Zou, Zheng Zhu, Jiagang Zhu

Obtained by moving object detection, the foreground mask result is unshaped and cannot be directly used in most subsequent processes.

Clustering Moving Object Detection +2

Towards End-to-End Code-Switching Speech Recognition

no code implementations • 31 Oct 2018 • Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li

Code-switching speech recognition has attracted increasing interest recently, but the need for expert linguistic knowledge has always been a big issue.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Optical Flow Based Real-time Moving Object Detection in Unconstrained Scenes

no code implementations • 13 Jul 2018 • Junjie Huang, Wei Zou, Jiagang Zhu, Zheng Zhu

Real-time moving object detection in unconstrained scenes is a difficult task due to dynamic background, changing foreground appearance and limited computational resource.

Moving Object Detection object-detection +1

A comparable study of modeling units for end-to-end Mandarin speech recognition

no code implementations • 10 May 2018 • Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li

We find that all types of modeling units achieve comparable character error rates (CER) in the CTC model, and that the Chinese character attention model performs better than the syllable attention model.

speech-recognition Speech Recognition

End-to-end Video-level Representation Learning for Action Recognition

1 code implementation • 11 Nov 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

From the frame/clip-level feature learning to the video-level representation building, deep learning methods in action recognition have developed rapidly in recent years.

Action Recognition Optical Flow Estimation +2

UCT: Learning Unified Convolutional Networks for Real-time Visual Tracking

no code implementations • 10 Nov 2017 • Zheng Zhu, Guan Huang, Wei Zou, Dalong Du, Chang Huang

Convolutional neural networks (CNN) based tracking approaches have shown favorable performance in recent benchmarks.

Real-Time Visual Tracking

End-to-end Flow Correlation Tracking with Spatial-temporal Attention

no code implementations • CVPR 2018 • Zheng Zhu, Wei Wu, Wei Zou, Junjie Yan

Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance in recent tracking benchmarks.

Optical Flow Estimation

Learning Gating ConvNet for Two-Stream based Methods in Action Recognition

1 code implementation • 12 Sep 2017 • Jiagang Zhu, Wei Zou, Zheng Zhu

For the two-stream style methods in action recognition, the two streams' predictions are always fused by a weighted averaging scheme.

Action Classification Action Recognition +3
