Search Results for author: Yu Wu

Found 132 papers, 56 papers with code

D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy

no code implementations • 6 Apr 2024 • Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu

The boom of Generative AI brings opportunities entangled with risks and concerns.

Improving Bird's Eye View Semantic Segmentation by Task Decomposition

no code implementations • 2 Apr 2024 • Tianhao Zhao, Yongcan Chen, Yu Wu, Tianyang Liu, Bo Du, Peilun Xiao, Shi Qiu, Hongda Yang, Guozhen Li, Yi Yang, Yutian Lin

In the first stage, we train a BEV autoencoder to reconstruct the BEV segmentation maps from a corrupted, noisy latent representation, which encourages the decoder to learn fundamental knowledge of typical BEV patterns.

Autonomous Driving Bird's-Eye View Semantic Segmentation +2

Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning

no code implementations • 1 Apr 2024 • Rongjie Li, Yu Wu, Xuming He

Generative vision-language models (VLMs) have shown impressive performance in zero-shot vision-language tasks like image captioning and visual question answering.

Image Captioning Instruction Following +5

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

1 code implementation • 18 Feb 2024 • Long Qian, Juncheng Li, Yu Wu, Yaobo Ye, Hao Fei, Tat-Seng Chua, Yueting Zhuang, Siliang Tang

Large Language Models (LLMs) demonstrate remarkable proficiency in comprehending and handling text-based tasks.

Language Modelling Large Language Model

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

1 code implementation • 20 Dec 2023 • Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

We present the Decoupled VIdeo Segmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS).

Ranked #1 on Video Semantic Segmentation on VSPW

Contrastive Learning Denoising +6

Joint Trading and Scheduling among Coupled Carbon-Electricity-Heat-Gas Industrial Clusters

no code implementations • 20 Dec 2023 • Dafeng Zhu, Bo Yang, Yu Wu, Haoran Deng, ZhaoYang Dong, Kai Ma, Xinping Guan

This paper presents a carbon-energy coupling management framework for an industrial park, where the carbon flow model accompanying multi-energy flows is adopted to track and suppress carbon emissions on the user side.

energy trading Management +1

DETER: Detecting Edited Regions for Deterring Generative Manipulations

no code implementations • 16 Dec 2023 • Sai Wang, Ye Zhu, Ruoyu Wang, Amaya Dharmasiri, Olga Russakovsky, Yu Wu

While face swapping and attribute editing are performed on similar face regions such as eyes and nose, the inpainting operation can be performed on random image regions, removing the spurious correlations of previous datasets.

Attribute Face Swapping +1

A Survey on Trustworthy Edge Intelligence: From Security and Reliability To Transparency and Sustainability

no code implementations • 27 Oct 2023 • Xiaojie Wang, Beibei Wang, Yu Wu, Zhaolong Ning, Song Guo, Fei Richard Yu

Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial Intelligence (AI) to push the capabilities of AI to the network edge for real-time, efficient and secure intelligent decision-making and computation.

Decision Making Edge-computing

SIRe-IR: Inverse Rendering for BRDF Reconstruction with Shadow and Illumination Removal in High-Illuminance Scenes

1 code implementation • 19 Oct 2023 • ZiYi Yang, Yanzhen Chen, Xinyu Gao, Yazhen Yuan, Yu Wu, Xiaowei Zhou, Xiaogang Jin

Implicit neural representation has opened up new possibilities for inverse rendering.

Inverse Rendering

Unseen Image Synthesis with Diffusion Models

no code implementations • 13 Oct 2023 • Ye Zhu, Yu Wu, Zhiwei Deng, Olga Russakovsky, Yan Yan

While the current trend in the generative field is scaling up towards larger models and more training data for generalized domain representations, we go the opposite direction in this work by synthesizing unseen domain images without additional training.

Denoising Image Generation

1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

1 code implementation • 28 Aug 2023 • Tao Zhang, Xingye Tian, Yikang Zhou, Yu Wu, Shunping Ji, Cilin Yan, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

Video instance segmentation is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving.

Autonomous Driving Denoising +6

Vision-Based Human Pose Estimation via Deep Learning: A Survey

no code implementations • 26 Aug 2023 • Gongjin Lan, Yu Wu, Fei Hu, Qi Hao

In this article, we provide an up-to-date and in-depth overview of the deep learning approaches in vision-based HPE.

Pose Estimation

WavMark: Watermarking for Audio Generation

no code implementations • 24 Aug 2023 • Guangyu Chen, Yu Wu, Shujie Liu, Tao Liu, Xiaoyong Du, Furu Wei

Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism.

Audio Generation

Grounded Image Text Matching with Mismatched Relation Reasoning

no code implementations • ICCV 2023 • Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, Xuming He

This paper introduces Grounded Image Text Matching with Mismatched Relation (GITM-MR), a novel visual-linguistic joint task that evaluates the relation understanding capabilities of transformer-based pre-trained models.

Image-text matching Relation +2

UDAMA: Unsupervised Domain Adaptation through Multi-discriminator Adversarial Training with Noisy Labels Improves Cardio-fitness Prediction

1 code implementation • 31 Jul 2023 • Yu Wu, Dimitris Spathis, Hong Jia, Ignacio Perez-Pozuelo, Tomas Gonzales, Soren Brage, Nicholas Wareham, Cecilia Mascolo

However, most healthcare datasets with high-quality (gold-standard) labels are small-scale, as directly collecting ground truth is often costly and time-consuming.

Transfer Learning Unsupervised Domain Adaptation

On decoder-only architecture for speech-to-text and large language model integration

no code implementations • 8 Jul 2023 • Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language.

Language Modelling Large Language Model +1

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

no code implementations • 28 Jun 2023 • Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Unlike these methods, this work proposes two zero-shot ASR domain adaptation methods that use LLaMA, a 7-billion-parameter large language model (LLM), and require only a domain-specific text prompt.

Domain Adaptation Language Modelling +3

Accelerating Transducers through Adjacent Token Merging

no code implementations • 28 Jun 2023 • Yuang Li, Yu Wu, Jinyu Li, Shujie Liu

Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embeddings at a high frame rate.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
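
The title names adjacent token merging as the way to cut the encoder's high frame rate. A minimal sketch of the general idea follows, assuming a greedy rule that averages runs of neighboring frames whose cosine similarity exceeds a hypothetical `threshold`; the paper's exact merging criterion and the layers at which merging happens are not given in this entry.

```python
import numpy as np


def merge_adjacent_tokens(x: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Greedily average runs of similar adjacent frames (illustrative sketch).

    x: (T, D) encoder embeddings. Returns a (T', D) array with T' <= T.
    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    if len(x) == 0:
        return x
    sums = [x[0].astype(np.float64)]  # running sum of each merged group
    counts = [1]                      # number of frames in each group
    for t in range(1, len(x)):
        a, b = x[t - 1], x[t]
        sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        if sim > threshold:
            # Similar to the previous frame: fold it into the current group.
            sums[-1] = sums[-1] + b
            counts[-1] += 1
        else:
            # Dissimilar: start a new group, i.e. a new output token.
            sums.append(b.astype(np.float64))
            counts.append(1)
    return np.stack([s / n for s, n in zip(sums, counts)])
```

Each merged group becomes a single token, so later layers (and the transducer) see a shorter sequence, which is where a speed-up would come from.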

Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation

1 code implementation • 14 Jun 2023 • Ruoyu Wang, Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu

In this work, we investigate the properties of diffusion (physics) within diffusion (machine learning) and propose our Cyclic One-Way Diffusion (COW) method to control the direction of the diffusion phenomenon in a pre-trained, frozen diffusion model for versatile customization scenarios in which the low-level pixel information from the conditioning must be preserved.

Denoising Image Generation

1st Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

1 code implementation • 7 Jun 2023 • Tao Zhang, Xingye Tian, Haoran Wei, Yu Wu, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan

In this report, we validate the effectiveness of the decoupling strategy in video panoptic segmentation.

Autonomous Driving Segmentation +2

DVIS: Decoupled Video Instance Segmentation Framework

1 code implementation • ICCV 2023 • Tao Zhang, Xingye Tian, Yu Wu, Shunping Ji, Xuebo Wang, Yuan Zhang, Pengfei Wan

The efficacy of the decoupling strategy relies on two crucial elements: 1) attaining precise long-term alignment outcomes via frame-by-frame association during tracking, and 2) the effective utilization of temporal information predicated on the aforementioned accurate alignment outcomes during refinement.

Ranked #3 on Video Panoptic Segmentation on VIPSeg

Autonomous Driving Instance Segmentation +5

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

no code implementations • 31 May 2023 • Huiqiang Jiang, Li Lyna Zhang, Yuang Li, Yu Wu, Shijie Cao, Ting Cao, Yuqing Yang, Jinyu Li, Mao Yang, Lili Qiu

In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

no code implementations • 25 May 2023 • Tianrui Wang, Long Zhou, Ziqiang Zhang, Yu Wu, Shujie Liu, Yashesh Gaur, Zhuo Chen, Jinyu Li, Furu Wei

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities.

Language Modelling Multi-Task Learning +3

Click-Feedback Retrieval

no code implementations • 28 Apr 2023 • Zeyu Wang, Yu Wu

In this work, we study a setting where the feedback is provided through users clicking liked and disliked search results.

Retrieval

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World

1 code implementation • ICCV 2023 • Qifan Yu, Juncheng Li, Yu Wu, Siliang Tang, Wei Ji, Yueting Zhuang

Based on that, we further introduce a novel Entangled cross-modal prompt approach for open-world predicate scene graph generation (Epic), where models can generalize to unseen predicates in a zero-shot manner.

Graph Generation Language Modelling +1

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling

1 code implementation • 7 Mar 2023 • Ziqiang Zhang, Long Zhou, Chengyi Wang, Sanyuan Chen, Yu Wu, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

We propose a cross-lingual neural codec language model, VALL-E X, for cross-lingual speech synthesis.

In-Context Learning Language Modelling +3

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

6 code implementations • 5 Jan 2023 • Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

In addition, we find that VALL-E can preserve the speaker's emotion and the acoustic environment of the acoustic prompt in synthesis.

In-Context Learning Language Modelling +2

Learning To Segment Every Referring Object Point by Point

1 code implementation • CVPR 2023 • Mengxue Qu, Yu Wu, Yunchao Wei, Wu Liu, Xiaodan Liang, Yao Zhao

Extensive experiments show that our model achieves 52.06% accuracy (versus 58.93% in the fully supervised setting) on RefCOCO+@testA when using only 1% of the mask annotations.

Object Referring Expression +1

Good Is Bad: Causality Inspired Cloth-Debiasing for Cloth-Changing Person Re-Identification

1 code implementation • CVPR 2023 • Zhengwei Yang, Meng Lin, Xian Zhong, Yu Wu, Zheng Wang

Entangled representation of clothing and identity (ID)-intrinsic clues are potentially concomitant in conventional person Re-IDentification (ReID).

Cloth-Changing Person Re-Identification

Generative Graph Neural Networks for Link Prediction

1 code implementation • 31 Dec 2022 • Xingping Xian, Tao Wu, Xiaoke Ma, Shaojie Qiao, Yabin Shao, Chao Wang, Lin Yuan, Yu Wu

Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated.

Link Prediction

How to Share: Balancing Layer and Chain Sharing in Industrial Microservice Deployment

no code implementations • 29 Dec 2022 • Yuxiang Liu, Bo Yang, Yu Wu, Cailian Chen, Xinping Guan

However, due to the limited resources of edge servers, it is difficult to meet the optimization goals of the two methods at the same time.

Edge-computing

BEATs: Audio Pre-Training with Acoustic Tokenizers

2 code implementations • 18 Dec 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Daniel Tompkins, Zhuo Chen, Furu Wei

In the first iteration, we use random projection as the acoustic tokenizer to train an audio SSL model in a mask and label prediction manner.

Ranked #1 on Audio Classification on Balanced Audio Set

Audio Classification Self-Supervised Learning
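
The snippet says the first BEATs iteration uses random projection as the acoustic tokenizer. A minimal sketch of a random-projection quantizer of that general kind follows: a frozen random matrix projects each frame, and the label is the nearest entry of a frozen random codebook after L2 normalization. The normalization choice, the function name, and the sizes in the usage lines are assumptions for illustration, not the paper's settings.

```python
import numpy as np


def random_projection_tokenizer(feats: np.ndarray, proj: np.ndarray,
                                codebook: np.ndarray) -> np.ndarray:
    """Assign each frame a discrete label via a frozen random projection (sketch).

    feats: (T, D) acoustic features; proj: (D, K) random matrix;
    codebook: (V, K) random vectors. Neither proj nor codebook is trained.
    Returns (T,) integer labels in [0, V).
    """
    eps = 1e-8
    z = feats @ proj
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    cb = codebook / (np.linalg.norm(codebook, axis=1, keepdims=True) + eps)
    # Squared L2 distance between every projected frame and every code vector.
    dists = ((z[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)  # (T, V)
    return dists.argmin(axis=1)


rng = np.random.default_rng(0)
proj = rng.standard_normal((80, 16))        # illustrative sizes only
codebook = rng.standard_normal((1024, 16))
labels = random_projection_tokenizer(rng.standard_normal((100, 80)), proj, codebook)
```

The resulting labels would then serve as targets for the mask-and-predict objective described in the snippet.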

Artificial Intelligence Security Competition (AISC)

no code implementations • 7 Dec 2022 • Yinpeng Dong, Peng Chen, Senyou Deng, Lianji L, Yi Sun, Hanyu Zhao, Jiaxing Li, Yunteng Tan, Xinyu Liu, Yangyi Dong, Enhui Xu, Jincai Xu, Shu Xu, Xuelin Fu, Changfeng Sun, Haoliang Han, Xuchong Zhang, Shen Chen, Zhimin Sun, Junyi Cao, Taiping Yao, Shouhong Ding, Yu Wu, Jian Lin, Tianpeng Wu, Ye Wang, Yu Fu, Lin Feng, Kangkang Gao, Zeyu Liu, Yuanzhe Pang, Chengqi Duan, Huipeng Zhou, Yajie Wang, Yuhang Zhao, Shangbo Wu, Haoran Lyu, Zhiyu Lin, YiFei Gao, Shuang Li, Haonan Wang, Jitao Sang, Chen Ma, Junhao Zheng, Yijia Li, Chao Shen, Chenhao Lin, Zhichao Cui, Guoshuai Liu, Huafeng Shi, Kun Hu, Mengxin Zhang

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems.

Autonomous Driving Face Recognition +1

Turning Silver into Gold: Domain Adaptation with Noisy Labels for Wearable Cardio-Respiratory Fitness Prediction

no code implementations • 20 Nov 2022 • Yu Wu, Dimitris Spathis, Hong Jia, Ignacio Perez-Pozuelo, Tomas I. Gonzales, Soren Brage, Nicholas Wareham, Cecilia Mascolo

Deep learning models have shown great promise in various healthcare applications.

Unsupervised Domain Adaptation

Exploring WavLM on Speech Enhancement

no code implementations • 18 Nov 2022 • Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Interest in self-supervised learning approaches for end-to-end speech encoding has surged in recent years, as they have achieved great success.

Self-Supervised Learning Speech Enhancement +2

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

no code implementations • 17 Nov 2022 • Xun Gong, Yu Wu, Jinyu Li, Shujie Liu, Rui Zhao, Xie Chen, Yanmin Qian

This motivates us to leverage the factorized neural transducer structure, containing a real language model, the vocabulary predictor.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Speech separation with large-scale self-supervised learning

no code implementations • 9 Nov 2022 • Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez

Compared with a supervised baseline and with the WavLM-based SS model that uses feature embeddings from the previously released WavLM trained on 94K hours, our proposed model obtains 15.9% and 11.2% relative word error rate (WER) reductions, respectively, on a simulated far-field speech mixture test set.

Self-Supervised Learning Speech Separation

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

no code implementations • 5 Nov 2022 • Peidong Wang, Eric Sun, Jian Xue, Yu Wu, Long Zhou, Yashesh Gaur, Shujie Liu, Jinyu Li

In this paper, we propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Two-Stream Network for Sign Language Recognition and Translation

1 code implementation • 2 Nov 2022 • Yutong Chen, Ronglai Zuo, Fangyun Wei, Yu Wu, Shujie Liu, Brian Mak

RGB videos, however, are raw signals with substantial visual redundancy, leading the encoder to overlook the key information for sign language understanding.

Ranked #1 on Sign Language Translation on RWTH-PHOENIX-Weather 2014 T

Sign Language Recognition Sign Language Translation +2

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

no code implementations • 24 Oct 2022 • Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu Wu, Zhuo Chen, Jayant Gupchup, Ross Cutler

Meetings are an essential form of communication for all types of organizations, and remote collaboration systems have been much more widely used since the COVID-19 pandemic.

STAR: Zero-Shot Chinese Character Recognition with Stroke- and Radical-Level Decompositions

no code implementations • 16 Oct 2022 • Jinshan Zeng, Ruiying Xu, Yu Wu, Hongwei Li, Jiaxing Lu

The proposed method consists of a training stage and an inference stage.

Foundation Transformers

4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei

A big convergence of model architectures across language, vision, speech, and multimodal tasks is emerging.

Language Modelling Machine Translation +1

Vision+X: A Survey on Multimodal Learning in the Light of Data

no code implementations • 5 Oct 2022 • Ye Zhu, Yu Wu, Nicu Sebe, Yan Yan

We are perceiving and communicating with the world in a multisensory manner, where different information sources are sophisticatedly processed and interpreted by separate parts of the human brain to constitute a complex, yet harmonious and unified sensing system.

Representation Learning

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

1 code implementation • 30 Sep 2022 • Ziqiang Zhang, Sanyuan Chen, Long Zhou, Yu Wu, Shuo Ren, Shujie Liu, Zhuoyuan Yao, Xun Gong, LiRong Dai, Jinyu Li, Furu Wei

In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation.

Language Modelling speech-recognition +1

SiRi: A Simple Selective Retraining Mechanism for Transformer-based Visual Grounding

1 code implementation • 27 Jul 2022 • Mengxue Qu, Yu Wu, Wu Liu, Qiqi Gong, Xiaodan Liang, Olga Russakovsky, Yao Zhao, Yunchao Wei

In particular, SiRi conveys an important principle for visual grounding research, i.e., a better-initialized vision-language encoder helps the model converge to a better local minimum, improving performance accordingly.

Visual Grounding

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

no code implementations • 29 Jun 2022 • Kaixuan Huang, Yu Wu, Xuezhou Zhang, Shenyinying Tu, Qingyun Wu, Mengdi Wang, Huazheng Wang

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes.

Model-based Reinforcement Learning reinforcement-learning +1

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

no code implementations • 21 Jun 2022 • Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

1 code implementation • 15 Jun 2022 • Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan

Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis.

Contrastive Learning Denoising +2

Longitudinal cardio-respiratory fitness prediction through wearables in free-living environments

1 code implementation • 6 May 2022 • Dimitris Spathis, Ignacio Perez-Pozuelo, Tomas I. Gonzales, Yu Wu, Soren Brage, Nicholas Wareham, Cecilia Mascolo

Cardiorespiratory fitness is an established predictor of metabolic disease and mortality.

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei

Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even though the pre-training objective is designed for speech recognition.

Self-Supervised Learning Speaker Recognition +3

Ultra Fast Speech Separation Model with Teacher Student Learning

no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu

In this paper, an ultra-fast speech separation Transformer model is proposed that achieves both better performance and higher efficiency through teacher-student learning (T-S learning).

Computational Efficiency Speech Separation

A collaborative decomposition-based evolutionary algorithm integrating normal and penalty-based boundary intersection for many-objective optimization

no code implementations • 14 Apr 2022 • Yu Wu, Jianle Wei, Weiqin Ying, Yanqi Lan, Zhen Cui, Zhenyu Wang

On the other hand, the parallel reference lines of the parallel decomposition methods including the normal boundary intersection (NBI) might result in poor diversity because of under-sampling near the boundaries for MaOPs with concave frontiers.

Evolutionary Algorithms
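
The title pairs normal boundary intersection with penalty-based boundary intersection (PBI). For reference, a sketch of the standard PBI scalarizing function as commonly used in decomposition-based algorithms such as MOEA/D (minimization, ideal point z*): g = d1 + theta * d2, where d1 is the distance along the subproblem's weight direction and d2 the distance from that reference line. The default `theta = 5.0` is an assumption, and the paper's collaborative NBI/PBI scheme is not reproduced here.

```python
import numpy as np


def pbi(f, weight, ideal, theta: float = 5.0) -> float:
    """Penalty-based boundary intersection value of objective vector f (minimization).

    weight: direction vector of the subproblem; ideal: ideal point z*.
    theta: penalty on the distance from the reference line (assumed default).
    """
    f, w, z = (np.asarray(v, dtype=float) for v in (f, weight, ideal))
    w_unit = w / np.linalg.norm(w)
    diff = f - z
    d1 = abs(float(diff @ w_unit))                  # progress along the weight direction
    d2 = float(np.linalg.norm(diff - d1 * w_unit))  # deviation from the reference line
    return d1 + theta * d2
```

Smaller values are better; a point lying on the reference line pays no penalty, while off-line points are penalized in proportion to `theta`.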

Quantized GAN for Complex Music Generation from Dance Videos

1 code implementation • 1 Apr 2022 • Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos.

Music Generation

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

1 code implementation • 30 Mar 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

The proposed speaker embedding, named t-vector, is extracted synchronously with the t-SOT ASR model, enabling joint execution of speaker identification (SID) or speaker diarization (SD) with the multi-talker transcription with low latency.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

1 code implementation • 2 Feb 2022 • Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka

This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
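
t-SOT represents overlapping speakers as a single token sequence. A minimal sketch of that serialization follows, assuming two virtual output channels, a known timestamp per token, and a channel-change token written `<cc>` (the spelling is our own choice): tokens from all channels are sorted by time, and the special token is inserted wherever consecutive tokens switch channel. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

CC = "<cc>"  # channel-change token; this spelling is an assumption


def serialize_tsot(tokens: List[Tuple[float, int, str]]) -> List[str]:
    """Merge (time, channel, token) triples into one chronological stream.

    A CC token is emitted whenever two consecutive tokens belong to
    different virtual output channels.
    """
    out: List[str] = []
    prev: Optional[int] = None
    for _, channel, tok in sorted(tokens, key=lambda item: item[0]):
        if prev is not None and channel != prev:
            out.append(CC)
        out.append(tok)
        prev = channel
    return out


def deserialize_tsot(stream: List[str]) -> List[List[str]]:
    """Split a two-channel stream back into per-channel token lists.

    Assumes exactly two virtual channels and that the first token is on channel 0.
    """
    channels: List[List[str]] = [[], []]
    current = 0
    for tok in stream:
        if tok == CC:
            current = 1 - current  # with two channels, a change always toggles
        else:
            channels[current].append(tok)
    return channels
```

Because the stream is ordered by time, a single-output model can emit it left to right, which is what keeps the format compatible with streaming recognition.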

Multi-Query Video Retrieval

1 code implementation • 10 Jan 2022 • Zeyu Wang, Yu Wu, Karthik Narasimhan, Olga Russakovsky

Retrieving target videos based on text descriptions is a task of great practical value and has received increasing attention over the past few years.

Retrieval Video Retrieval

Learning To Learn by Jointly Optimizing Neural Architecture and Weights

no code implementations • CVPR 2022 • Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Longhui Wei, Yueting Zhuang, Qi Tian

Existing NAS-based meta-learning methods apply a two-stage strategy, i.e., first searching architectures and then re-training meta-weights on the searched architecture.

Meta-Learning

Large-Scale Video Panoptic Segmentation in the Wild: A Benchmark

1 code implementation • CVPR 2022 • Jiaxu Miao, Xiaohan Wang, Yu Wu, Wei Li, Xu Zhang, Yunchao Wei, Yi Yang

In contrast, our large-scale VIdeo Panoptic Segmentation in the Wild (VIPSeg) dataset provides 3,536 videos and 84,750 frames with pixel-level panoptic annotations, covering a wide range of real-world scenarios and categories.

Segmentation Video Panoptic Segmentation

Self-Supervised Learning for speech recognition with Intermediate layer supervision

1 code implementation • 16 Dec 2021 • Chengyi Wang, Yu Wu, Sanyuan Chen, Shujie Liu, Jinyu Li, Yao Qian, Zhenglu Yang

Recent pioneering work found that pre-trained speech models can solve full-stack speech processing tasks because they use the bottom layers to learn speaker-related information and the top layers to encode content-related information.

Language Modelling Self-Supervised Learning +2

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

5 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Self-supervised learning (SSL) has achieved great success in speech recognition, while other speech processing tasks have received only limited exploration.

Denoising Self-Supervised Learning +3

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

3 code implementations • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

3 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

We integrate the proposed methods into the HuBERT framework.

Data Augmentation Multi-Task Learning +5

Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

no code implementations • 11 Oct 2021 • Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu

In this paper we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

no code implementations • 6 Oct 2021 • Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong

ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cell2State: Learning Cell State Representations From Barcoded Single-Cell Gene-Expression Transitions

no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang

Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.

Dimensionality Reduction

Contrastive Video-Language Segmentation

no code implementations • 29 Sep 2021 • Chen Liang, Yawei Luo, Yu Wu, Yi Yang

We focus on the problem of segmenting a particular object referred to by a natural-language sentence in video content, which is at the core of formulating a precise vision-language relation.

Contrastive Learning Relation +2

Knowledge Enhanced Fine-Tuning for Better Handling Unseen Entities in Dialogue Generation

1 code implementation • EMNLP 2021 • Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang

To deal with this problem, instead of introducing the knowledge base as input, we force the model to learn a better semantic representation by predicting the information in the knowledge base based only on the input context.

Dialogue Generation Retrieval

Detecting Speaker Personas from Conversational Texts

1 code implementation • EMNLP 2021 • Jia-Chen Gu, Zhen-Hua Ling, Yu Wu, Quan Liu, Zhigang Chen, Xiaodan Zhu

This is a many-to-many semantic matching task because both contexts and personas in SPD are composed of multiple sentences.

Flexible Clustered Federated Learning for Client-Level Data Distribution Shift

1 code implementation • 22 Aug 2021 • Moming Duan, Duo Liu, Xinyuan Ji, Yu Wu, Liang Liang, Xianzhang Chen, Yujuan Tan

Federated Learning (FL) enables multiple participating devices to collaboratively contribute to a global neural network model while keeping their training data local.

Federated Learning
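
The snippet summarizes the basic federated setting, in which only model parameters, never raw data, reach the server. For orientation, a sketch of the widely used FedAvg aggregation step follows; it is the generic baseline, not the clustered, distribution-shift-aware method the paper proposes, and the function name `fedavg` is our own.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Server-side aggregation: average client parameters weighted by local data size.

    client_weights: list of equally shaped parameter arrays (one per client).
    client_sizes: number of local training examples per client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    # Contract the client axis with the size weights; works for any parameter shape.
    return np.tensordot(sizes, stacked, axes=1) / sizes.sum()
```

Clients with more local data pull the global model further toward their own update, which is one reason client-level distribution shift becomes a problem for a single global model.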

UniSpeech at scale: An Empirical Study of Pre-training Method on Large-Scale Speech Recognition Dataset

no code implementations • 12 Jul 2021 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei

Recently, there has been vast interest in self-supervised learning (SSL), where the model is pre-trained on large-scale unlabeled data and then fine-tuned on a small labeled dataset.

Self-Supervised Learning speech-recognition +1

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

no code implementations • 5 Jul 2021 • Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li

Speech separation has been successfully applied as a frontend processing module of conversation transcription systems thanks to its ability to handle overlapped speech and its flexibility to combine with downstream tasks such as automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Saying the Unseen: Video Descriptions via Dialog Agents

1 code implementation • 26 Jun 2021 • Ye Zhu, Yu Wu, Yi Yang, Yan Yan

Current vision and language tasks usually take complete visual data (e.g., raw images or videos) as input; however, practical scenarios often include situations where part of the visual information becomes inaccessible for various reasons, e.g., a restricted view from a fixed camera or intentional vision blocking for security concerns.

Transfer Learning

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

no code implementations • CVPR 2021 • Jiaxu Miao, Yunchao Wei, Yu Wu, Chen Liang, Guangrui Li, Yi Yang

To the best of our knowledge, our VSPW is the first attempt to tackle the challenging video scene parsing task in the wild by considering diverse scenarios.

4k Scene Parsing

Exploring Heterogeneous Clues for Weakly-Supervised Audio-Visual Video Parsing

no code implementations • CVPR 2021 • Yu Wu, Yi Yang

Previous works take the overall event labels to supervise both audio and visual model predictions.

Contrastive Learning

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.

Language Modelling speech-recognition +1
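
The MWER objective described above can be sketched in a few lines. The sketch assumes the common N-best formulation: fused hypothesis scores are renormalized with a softmax over the list, and the mean number of word errors is subtracted as a baseline. It also assumes shallow fusion of the form log P_E2E + λ log P_LM. The function name, the `lm_weight` parameter and the inputs are illustrative. The paper's exact fusion variant and training recipe are not reproduced here.

```python
import math
from typing import Sequence


def mwer_loss(
    e2e_logprobs: Sequence[float],
    lm_logprobs: Sequence[float],
    word_errors: Sequence[int],
    lm_weight: float = 0.3,
) -> float:
    """Expected relative word errors over an N-best list.

    Hypotheses are scored with an assumed shallow-fusion form:
    log P_e2e(y | x) + lm_weight * log P_lm(y).
    """
    fused = [a + lm_weight * b for a, b in zip(e2e_logprobs, lm_logprobs)]
    # Renormalize the fused scores over the N-best list (max-shifted softmax).
    top = max(fused)
    exps = [math.exp(s - top) for s in fused]
    total = sum(exps)
    posteriors = [e / total for e in exps]
    # Subtract the mean error count as a baseline to reduce variance.
    mean_err = sum(word_errors) / len(word_errors)
    return sum(p * (w - mean_err) for p, w in zip(posteriors, word_errors))


# Two hypotheses: the first has 0 word errors, the second has 2.
print(mwer_loss([-1.0, -1.5], [-2.0, -1.0], [0, 2]))
```

In this sketch the LM score shapes the posterior that the loss is computed under. The E2E model is therefore trained against the same fused scores it would be decoded with.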

Template-Based Named Entity Recognition Using BART

1 code implementation • Findings (ACL) 2021 • Leyang Cui, Yu Wu, Jian Liu, Sen Yang, Yue Zhang

To address the issue, we propose a template-based method for NER, treating NER as a language model ranking problem in a sequence-to-sequence framework, where original sentences and statement templates filled with candidate named entity spans are regarded as the source sequence and the target sequence, respectively.

Few-shot NER Language Modelling +2
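
The template-ranking idea can be shown with a small sketch. It makes three assumptions:

- Positive templates take the form "<span> is a <type> entity".
- The negative template takes the form "<span> is not a named entity".
- `score(source, target)` is a hypothetical callable. It stands in for a sequence-to-sequence model's likelihood of the target given the source (BART, in the paper's setting).

No real model is loaded, and the toy scorer below exists only to exercise the ranking loop.

```python
from typing import Callable, Dict, List, Tuple


def classify_spans(
    tokens: List[str],
    entity_types: List[str],
    score: Callable[[str, str], float],
    max_span_len: int = 3,
) -> Dict[Tuple[int, int], str]:
    """Assign each candidate span the label whose filled template scores highest.

    `score(source, target)` is a hypothetical stand-in for a seq2seq model's
    log-likelihood of `target` given `source`. The label "O" means the
    negative template ("... is not a named entity") won.
    """
    source = " ".join(tokens)
    labels: Dict[Tuple[int, int], str] = {}
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_span_len, len(tokens)) + 1):
            span = " ".join(tokens[start:end])
            # One positive template per entity type plus one negative template.
            templates = {t: f"{span} is a {t} entity" for t in entity_types}
            templates["O"] = f"{span} is not a named entity"
            labels[(start, end)] = max(
                templates, key=lambda label: score(source, templates[label])
            )
    return labels


# Toy scorer for illustration only: it prefers one hand-picked template.
def toy_score(source: str, target: str) -> float:
    if target == "Paris is a location entity":
        return 1.0
    return 0.5 if target.endswith("is not a named entity") else 0.0


print(classify_spans(["visit", "Paris"], ["person", "location"], toy_score))
# {(0, 1): 'O', (0, 2): 'O', (1, 2): 'location'}
```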

Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation

no code implementations • 2 Jun 2021 • Chen Liang, Yu Wu, Tianfei Zhou, Wenguan Wang, Zongxin Yang, Yunchao Wei, Yi Yang

Referring video object segmentation (RVOS) aims to segment video objects with the guidance of natural language reference.

Object One-shot visual object segmentation +3

Cooperative Path Planning of UAVs & UGVs for a Persistent Surveillance Task in Urban Environments

no code implementations • IEEE Internet of Things Journal 2021 • Yu Wu, Shaobo Wu, Xinting Hu

The hybrid EDA-GA algorithm can greatly improve the performance of EDA and GA in terms of the quality and the stability of solutions.

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

no code implementations • 31 Mar 2021 • Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Transcribing meetings containing overlapped speech with only a single distant microphone (SDM) has been one of the most challenging problems for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

no code implementations • 19 Mar 2021 • Chen Liang, Yu Wu, Yawei Luo, Yi Yang

Text-based video segmentation is a challenging task that segments out the natural language referred objects in videos.

Ranked #4 on Referring Expression Segmentation on J-HMDB (Precision@0.9 metric)

Object Referring Expression Segmentation +4

Learning Audio-Visual Correlations from Variational Cross-Modal Generation

no code implementations • 5 Feb 2021 • Ye Zhu, Yu Wu, Hugo Latapie, Yi Yang, Yan Yan

People can easily imagine the potential sound while seeing an event.

Retrieval

UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data

3 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang

In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.

Multi-Task Learning Representation Learning +3

Learning to Anticipate Egocentric Actions by Imagination

no code implementations • 13 Jan 2021 • Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, Fei Wu

We further improve ImagineRNN by residual anticipation, i.e., changing its target to predicting the feature difference of adjacent frames instead of the frame content.

Ranked #3 on Action Anticipation on EPIC-KITCHENS-55 (Unseen test set (S2))

Action Anticipation Autonomous Driving +1
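
Residual anticipation, as summarized above, changes the regression target from the next frame's feature to the difference between adjacent frame features. Below is a minimal sketch of that target construction, assuming each frame feature is a plain list of floats. The ImagineRNN model itself is not shown, and the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def residual_targets(features: Sequence[Vector]) -> List[Vector]:
    """Regression targets f[t+1] - f[t] for every pair of adjacent frames."""
    return [
        [nxt_v - cur_v for cur_v, nxt_v in zip(cur, nxt)]
        for cur, nxt in zip(features, features[1:])
    ]


def anticipate(current: Vector, predicted_delta: Vector) -> Vector:
    """Turn a predicted difference back into an anticipated next-frame feature."""
    return [c + d for c, d in zip(current, predicted_delta)]


frames = [[0.0, 1.0], [1.0, 1.5], [3.0, 1.0]]
deltas = residual_targets(frames)
print(deltas)                            # [[1.0, 0.5], [2.0, -0.5]]
print(anticipate(frames[1], deltas[1]))  # [3.0, 1.0]
```

With difference targets, the predictor only has to model how the features change between frames. In this sketch a perfectly predicted difference reconstructs the next frame exactly.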

Connection-Adaptive Meta-Learning

no code implementations • 1 Jan 2021 • Yadong Ding, Yu Wu, Chengyue Huang, Siliang Tang, Yi Yang, Yueting Zhuang

In this paper, we aim to obtain better meta-learners by co-optimizing the architecture and meta-weights simultaneously.

Meta-Learning

Formality Style Transfer with Shared Latent Space

1 code implementation • COLING 2020 • Yunli Wang, Yu Wu, Lili Mou, Zhoujun Li, WenHan Chao

Conventional approaches for formality style transfer borrow models from neural machine translation, which typically requires massive parallel data for training.

Formality Style Transfer Machine Translation +2

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

1 code implementation • 23 Oct 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li

With its strong modeling capacity that comes from a multi-head and multi-layer structure, Transformer is a very powerful model for learning a sequential representation and has been successfully applied to speech separation recently.

Speech Separation

Developing Real-time Streaming Transformer Transducer for Speech Recognition on Large-scale Dataset

no code implementations • 22 Oct 2020 • Xie Chen, Yu Wu, Zhenghao Wang, Shujie Liu, Jinyu Li

Recently, Transformer based end-to-end models have achieved great success in many areas including speech recognition.

speech-recognition Speech Recognition

Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents

1 code implementation • ECCV 2020 • Ye Zhu, Yu Wu, Yi Yang, Yan Yan

With rising concerns about AI systems that have direct access to abundant sensitive information, researchers seek to develop more reliable AI that relies on implicit information sources.

Video Description

Continuous Speech Separation with Conformer

1 code implementation • 13 Aug 2020 • Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou

Continuous speech separation plays a vital role in complex speech-related tasks such as conversation transcription.

Ranked #1 on Speech Separation on LibriCSS (using extra training data)

Speech Separation

On Commonsense Cues in BERT for Solving Commonsense Tasks

no code implementations • Findings (ACL) 2021 • Leyang Cui, Sijie Cheng, Yu Wu, Yue Zhang

We quantitatively investigate the presence of structural commonsense cues in BERT when solving commonsense tasks, and the importance of such cues for the model prediction.

Sentiment Analysis Sentiment Classification

A Retrieve-and-Rewrite Initialization Method for Unsupervised Machine Translation

1 code implementation • ACL 2020 • Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

The commonly used framework for unsupervised machine translation builds initial translation models of both translation directions, and then performs iterative back-translation to jointly boost their translation performance.

NMT Retrieval +3

Abnormal activity capture from passenger flow of elevator based on unsupervised learning and fine-grained multi-label recognition

no code implementations • 29 Jun 2020 • Chunhua Jia, Wenhai Yi, Yu Wu, Hui Huang, Lei Zhang, Leilei Wu

We present a workflow that aims to capture residents' abnormal activities through the elevator passenger flow in multi-storey residential buildings.

Anomaly Detection Clustering +2

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition

1 code implementation • 28 May 2020 • Jinyu Li, Yu Wu, Yashesh Gaur, Chengyi Wang, Rui Zhao, Shujie Liu

Among the three E2E models, transformer-AED achieved the best accuracy in both streaming and non-streaming modes.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Curriculum Pre-training for End-to-End Speech Translation

no code implementations • ACL 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang

End-to-end speech translation poses a heavy burden on the encoder, because it has to transcribe, understand, and learn cross-lingual semantics simultaneously.

speech-recognition Speech Recognition +1

MuTual: A Dataset for Multi-Turn Dialogue Reasoning

1 code implementation • ACL 2020 • Leyang Cui, Yu Wu, Shujie Liu, Yue Zhang, Ming Zhou

Non-task-oriented dialogue systems have achieved great success in recent years thanks to widely accessible conversation data and the development of deep learning techniques.

Task-Oriented Dialogue Systems

Unsupervised Person Re-identification via Softened Similarity Learning

1 code implementation • CVPR 2020 • Yutian Lin, Lingxi Xie, Yu Wu, Chenggang Yan, Qi Tian

Person re-identification (re-ID) is an important topic in computer vision.

Clustering General Classification +2

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

no code implementations • 23 Mar 2020 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou

The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.

Audio and Speech Processing

Symbiotic Attention with Privileged Information for Egocentric Action Recognition

no code implementations • 8 Feb 2020 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang

Due to the large action vocabulary in egocentric video datasets, recent studies usually utilize a two-branch structure for action recognition, i.e., one branch for verb classification and the other branch for noun classification.

Ranked #4 on Egocentric Activity Recognition on EGTEA

Action Recognition Egocentric Activity Recognition +5

Semantic Mask for Transformer based End-to-End Speech Recognition

1 code implementation • 6 Dec 2019 • Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

The attention-based encoder-decoder model has achieved impressive results for both automatic speech recognition (ASR) and text-to-speech (TTS) tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Harnessing Pre-Trained Neural Networks with Rules for Formality Style Transfer

no code implementations • IJCNLP 2019 • Yunli Wang, Yu Wu, Lili Mou, Zhoujun Li, WenHan Chao

Formality text style transfer plays an important role in various NLP applications, such as non-native speaker assistants and child education.

Formality Style Transfer Style Transfer

Unsupervised Context Rewriting for Open Domain Conversation

no code implementations • IJCNLP 2019 • Kun Zhou, Kai Zhang, Yu Wu, Shujie Liu, Jingsong Yu

Context modeling plays a pivotal role in open domain conversation.

Reinforcement Learning (RL) Response Generation +1

Dual Attention Matching for Audio-Visual Event Localization

no code implementations • ICCV 2019 • Yu Wu, Linchao Zhu, Yan Yan, Yi Yang

The duration of these segments is usually short, so the visual and acoustic features of each segment may not be well aligned.

audio-visual event localization

Gated Channel Transformation for Visual Recognition

3 code implementations • CVPR 2020 • Zongxin Yang, Linchao Zhu, Yu Wu, Yi Yang

This lightweight layer incorporates a simple l2 normalization, which makes our transformation unit applicable at the operator level without adding many parameters.

General Classification Image Classification +5
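
Below is a rough sketch of a channel gate built around an l2 normalization, in the spirit of the description above. The exact form is our assumption, not a verified copy of the GCT layer:

- a scaled l2 norm per channel;
- normalization across channels, rescaled by sqrt(C);
- a `1 + tanh(...)` gate.

The parameter names `alpha`, `gamma` and `beta` are also assumptions. In this form each channel carries only three scalar parameters, which matches the low parameter overhead claimed above.

```python
import math
from typing import List

Channel = List[float]


def l2_channel_gate(
    x: List[Channel],
    alpha: List[float],
    gamma: List[float],
    beta: List[float],
    eps: float = 1e-5,
) -> List[Channel]:
    """Gate each channel by a score derived from l2 norms (assumed form).

    x holds C channels, each flattened to a list of activations; alpha, gamma
    and beta hold one scalar per channel.
    """
    num_channels = len(x)
    # Embedding: scaled l2 norm of each channel.
    s = [a * math.sqrt(sum(v * v for v in ch) + eps) for a, ch in zip(alpha, x)]
    # Normalization: divide by the l2 norm across channels, rescale by sqrt(C).
    norm = math.sqrt(sum(v * v for v in s) + eps)
    s_hat = [math.sqrt(num_channels) * v / norm for v in s]
    # Gating: 1 + tanh(...) equals 1 when gamma = beta = 0 (identity map).
    return [
        [v * (1.0 + math.tanh(g * sh + b)) for v in ch]
        for ch, sh, g, b in zip(x, s_hat, gamma, beta)
    ]
```

Setting `gamma` and `beta` to zero makes the unit an identity mapping. This is one convenient way to insert such a gate into an existing network without changing its initial behavior.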

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

no code implementations • 17 Sep 2019 • Chengyi Wang, Yu Wu, Shujie Liu, Zhenglu Yang, Ming Zhou

End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model.

Multi-Task Learning Translation

Explicit Cross-lingual Pre-training for Unsupervised Machine Translation

no code implementations • IJCNLP 2019 • Shuo Ren, Yu Wu, Shujie Liu, Ming Zhou, Shuai Ma

Pre-training has proven to be effective in unsupervised machine translation due to its ability to model deep context information in cross-lingual scenarios.

Language Modelling Translation +1

Cascaded Revision Network for Novel Object Captioning

1 code implementation • 6 Aug 2019 • Qianyu Feng, Yu Wu, Hehe Fan, Chenggang Yan, Yi Yang

By this novel cascaded captioning-revising mechanism, CRN can accurately describe images with unseen objects.

Image Captioning Object +3

Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019

no code implementations • 22 Jun 2019 • Xiaohan Wang, Yu Wu, Linchao Zhu, Yi Yang

In this report, we present the Baidu-UTS submission to the EPIC-Kitchens Action Recognition Challenge in CVPR 2019.

Action Recognition Object +2

Revisiting EmbodiedQA: A Simple Baseline and Beyond

no code implementations • 8 Apr 2019 • Yu Wu, Lu Jiang, Yi Yang

In this paper, we empirically study this problem and introduce 1) a simple yet effective baseline that achieves promising performance; 2) an easier and more practical setting for EmbodiedQA where an agent has a chance to adapt the trained model to a new environment before it actually answers users' questions.

Embodied Question Answering Question Answering

Auto-ReID: Searching for a Part-aware ConvNet for Person Re-Identification

3 code implementations • ICCV 2019 • Ruijie Quan, Xuanyi Dong, Yu Wu, Linchao Zhu, Yi Yang

We propose to automatically search for a CNN architecture that is specifically suitable for the reID task.

Ranked #9 on Person Re-Identification on CUHK03 detected

Classification General Classification +3

Text Morphing

no code implementations • 30 Sep 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

In this paper, we introduce a novel natural language generation task, termed text morphing, which targets generating intermediate sentences that are fluent and smooth with respect to the two input sentences.

Sentence Text Generation

Neural Melody Composition from Lyrics

no code implementations • 12 Sep 2018 • Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou

In this paper, we study a novel task that learns to compose music from natural language.

Keyphrase Generation with Correlation Constraints

no code implementations • EMNLP 2018 • Jun Chen, Xiao-Ming Zhang, Yu Wu, Zhao Yan, Zhoujun Li

In this paper, we study automatic keyphrase generation.

Keyphrase Generation

Towards Explainable and Controllable Open Domain Dialogue Generation with Dialogue Acts

no code implementations • 19 Jul 2018 • Can Xu, Wei Wu, Yu Wu

We study open domain dialogue generation with dialogue acts designed to explain how people engage in social chat.

Dialogue Generation reinforcement-learning +2

Dictionary-Guided Editing Networks for Paraphrase Generation

no code implementations • 21 Jun 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou

An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct.

Paraphrase Generation Sentence

Response Generation by Context-aware Prototype Editing

3 code implementations • 19 Jun 2018 • Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou

Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses.

Informativeness Response Generation +1

Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning

no code implementations • CVPR 2018 • Yu Wu, Yutian Lin, Xuanyi Dong, Yan Yan, Wanli Ouyang, Yi Yang

We focus on one-shot learning for video-based person re-identification (re-ID).

One-Shot Learning Pedestrian Detection +1

Learning Matching Models with Weak Supervision for Response Selection in Retrieval-based Chatbots

no code implementations • ACL 2018 • Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou

We propose a method that can leverage unlabeled data to learn a matching model for response selection in retrieval-based chatbots.

Retrieval

Decoupled Novel Object Captioner

1 code implementation • 11 Apr 2018 • Yu Wu, Linchao Zhu, Lu Jiang, Yi Yang

Thus, the sequence model can be decoupled from the novel object descriptions.

Image Captioning Novel Concepts +2

Towards Interpretable Chit-chat: Open Domain Dialogue Generation with Dialogue Acts

no code implementations • ICLR 2018 • Wei Wu, Can Xu, Yu Wu, Zhoujun Li

Conventional methods model open domain dialogue generation as a black box through end-to-end learning from large scale conversation data.

Dialogue Generation Response Generation

Neural Response Generation with Dynamic Vocabularies

no code implementations • 30 Nov 2017 • Yu Wu, Wei Wu, Dejian Yang, Can Xu, Zhoujun Li, Ming Zhou

We study response generation for open domain conversation in chatbots.

Response Generation

A Sequential Matching Framework for Multi-turn Response Selection in Retrieval-based Chatbots

no code implementations • CL 2019 • Yu Wu, Wei Wu, Chen Xing, Can Xu, Zhoujun Li, Ming Zhou

The task requires matching a response candidate with a conversation context, whose challenges include how to recognize important parts of the context, and how to model the relationships among utterances in the context.

Retrieval

Beihang-MSRA at SemEval-2017 Task 3: A Ranking System with Neural Matching Features for Community Question Answering

no code implementations • SEMEVAL 2017 • Wenzheng Feng, Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou

This paper presents the system in SemEval-2017 Task 3, Community Question Answering (CQA).

Answer Selection Community Question Answering +4

Non-Convex Weighted Lp Nuclear Norm based ADMM Framework for Image Restoration

no code implementations • 24 Apr 2017 • Zhiyuan Zha, Xinggan Zhang, Yu Wu, Qiong Wang, Lan Tang

Since the matrix formed by nonlocal similar patches in a natural image is of low rank, the nuclear norm minimization (NNM) has been widely used in various image processing studies.

Compressive Sensing Deblurring +3

Non-Convex Weighted Lp Minimization based Group Sparse Representation Framework for Image Denoising

no code implementations • 5 Apr 2017 • Qiong Wang, Xinggan Zhang, Yu Wu, Lan Tang, Zhiyuan Zha

Nonlocal image representation or group sparsity has attracted considerable interest in various low-level vision tasks and has led to several state-of-the-art image denoising techniques, such as BM3D and LSSC.

Image Denoising

Improving Person Re-identification by Attribute and Identity Learning

2 code implementations • 21 Mar 2017 • Yutian Lin, Liang Zheng, Zhedong Zheng, Yu Wu, Zhilan Hu, Chenggang Yan, Yi Yang

Person re-identification (re-ID) and attribute recognition share the common goal of learning pedestrian descriptions.

Ranked #75 on Person Re-Identification on DukeMTMC-reID

Attribute Person Recognition +2

Hierarchical Recurrent Attention Network for Response Generation

1 code implementation • 25 Jan 2017 • Chen Xing, Wei Wu, Yu Wu, Ming Zhou, YaLou Huang, Wei-Ying Ma

With the word level attention, hidden vectors of a word level encoder are synthesized as utterance vectors and fed to an utterance level encoder to construct hidden representations of the context.

Response Generation

Sequential Matching Network: A New Architecture for Multi-turn Response Selection in Retrieval-based Chatbots

3 code implementations • ACL 2017 • Yu Wu, Wei Wu, Chen Xing, Ming Zhou, Zhoujun Li

Existing work either concatenates the utterances in a context or matches a response with a highly abstract context vector at the end, which may lose the relationships among utterances or important contextual information.

Ranked #7 on Conversational Response Selection on RRS

Conversational Response Selection Retrieval

Knowledge Enhanced Hybrid Neural Network for Text Matching

no code implementations • 15 Nov 2016 • Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou

Long texts pose a big challenge to semantic matching due to their complicated semantic and syntactic structures.

Question Answering Text Matching

Detecting Context Dependent Messages in a Conversational Environment

no code implementations • COLING 2016 • Chaozhuo Li, Yu Wu, Wei Wu, Chen Xing, Zhoujun Li, Ming Zhou

While automatic response generation for building chatbot systems has drawn a lot of attention recently, there is limited understanding of when we need to consider the linguistic context of an input text in the generation process.

Chatbot Response Generation

Topic Aware Neural Response Generation

1 code implementation • 21 Jun 2016 • Chen Xing, Wei Wu, Yu Wu, Jie Liu, YaLou Huang, Ming Zhou, Wei-Ying Ma

We consider incorporating topic information into the sequence-to-sequence framework to generate informative and interesting responses for chatbots.

Response Generation

Response Selection with Topic Clues for Retrieval-based Chatbots

1 code implementation • 30 Apr 2016 • Yu Wu, Wei Wu, Zhoujun Li, Ming Zhou

The message vector, the response vector, and the two topic vectors are fed to neural tensors to calculate a matching score.

Retrieval
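
The snippet above mentions neural tensors for computing a matching score. Below is a minimal sketch of a single neural tensor layer in the form popularized by Socher et al. (2013). It scores one pair of vectors as h_k = tanh(x^T W_k y + V_k [x; y] + b_k) and score = u · h. How the paper combines the message, response and topic vectors is not reproduced here. The function names and shapes are assumptions.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear(x: Vector, w: Matrix, y: Vector) -> float:
    """Compute x^T W y."""
    return sum(x[i] * w[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def neural_tensor_score(
    x: Vector,
    y: Vector,
    slices: List[Matrix],
    v: List[Vector],
    b: Vector,
    u: Vector,
) -> float:
    """score = u . h, with h_k = tanh(x^T W_k y + V_k [x; y] + b_k)."""
    xy = x + y  # list concatenation gives [x; y]
    hidden = []
    for w_k, v_k, b_k in zip(slices, v, b):
        linear = sum(a * z for a, z in zip(v_k, xy))
        hidden.append(math.tanh(bilinear(x, w_k, y) + linear + b_k))
    return sum(u_k * h_k for u_k, h_k in zip(u, hidden))
```

Each tensor slice W_k captures one multiplicative interaction between the two vectors. This is what distinguishes the layer from a plain feed-forward layer over the concatenation.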

Learning Fair Representations

2 code implementations • International Conference on Machine Learning 2013 • Rich Zemel, Yu Wu, Kevin Swersky, Toni Pitassi, Cynthia Dwork

We propose a learning algorithm for fair classification that achieves both group fairness (the proportion of members in a protected group receiving positive classification is identical to the proportion in the population as a whole), and individual fairness (similar individuals should be treated similarly).

Classification Fairness +1
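
The group-fairness condition quoted above (often called statistical parity) can be checked directly from a classifier's predictions. Below is a minimal sketch, assuming binary predictions and one boolean protected-attribute flag per individual. The function names are hypothetical. The sketch only measures the condition and does not implement the paper's learning algorithm.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int]) -> float:
    """Fraction of individuals classified positive (label 1)."""
    if not preds:
        raise ValueError("empty group")
    return sum(preds) / len(preds)


def statistical_parity_gap(preds: Sequence[int], protected: Sequence[bool]) -> float:
    """Protected-group positive rate minus whole-population positive rate.

    A gap of 0.0 means the group-fairness condition holds exactly.
    """
    group = [p for p, s in zip(preds, protected) if s]
    return positive_rate(group) - positive_rate(preds)


print(statistical_parity_gap([1, 0, 1, 0], [True, True, False, False]))  # 0.0
print(statistical_parity_gap([1, 1, 0, 0], [True, True, False, False]))  # 0.5
```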
