Search Results for author: Qian Chen

Found 142 papers, 68 papers with code

Question Generation Model Based on Iterative Message Passing and Sliding Windows Hierarchical Attention

no code implementations CCL 2021 Qian Chen, Xiaoying Gao, Suge Wang, Xin Guo

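“The knowledge graph question generation task is to generate relevant questions from a given knowledge graph. Current knowledge graph question generation models mainly use RNN- or Transformer-based encoders to encode knowledge graph subgraphs, but this approach loses explicit graph structure information, and the decoder neglects the importance of local information to nodes. This paper proposes an iterative message passing graph encoder to encode subgraphs and capture their explicit graph structure information. In addition, we use a sliding window attention mechanism to improve the RNN decoder, raising the importance of subgraph local information to nodes. Experimental results on the WQ and PQ datasets show that our proposed model outperforms the KTG model on the BLEU4 metric by 2.16 and 15.44 points, respectively, demonstrating the effectiveness of the model.”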

Question Generation

UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook

no code implementations 27 Feb 2025 Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li

The emergence of audio language models is empowered by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language model paradigms.

Language Modeling Language Modelling

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

no code implementations 24 Feb 2025 Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao

Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs.

GSM8K

Ten Challenging Problems in Federated Foundation Models

no code implementations 14 Feb 2025 Tao Fan, Hanlin Gu, Xuemei Cao, Chee Seng Chan, Qian Chen, Yiqiang Chen, Yihui Feng, Yang Gu, Jiaxiang Geng, Bing Luo, Shuoling Liu, Win Kent Ong, Chao Ren, Jiaqi Shao, Chuan Sun, Xiaoli Tang, Hong Xi Tae, Yongxin Tong, Shuyue Wei, Fan Wu, Wei Xi, Mingcong Xu, He Yang, Xin Yang, Jiangpeng Yan, Hao Yu, Han Yu, Teng Zhang, Yifei Zhang, Xiaojin Zhang, Zhenzhe Zheng, Lixin Fan, Qiang Yang

This paper provides a comprehensive summary of the ten challenging problems inherent in FedFMs, encompassing foundational theory, utilization of private data, continual learning, unlearning, Non-IID and graph data, bidirectional knowledge transfer, incentive mechanism design, game mechanism design, model watermarking, and efficiency.

Continual Learning Federated Learning +2

Optimal Algorithms in Linear Regression under Covariate Shift: On the Importance of Precondition

no code implementations 13 Feb 2025 Yuanshi Liu, Haihan Zhang, Qian Chen, Cong Fang

For (i), we prove that the optimal estimator can be simply a certain linear transformation of the best estimator for the source distribution.

When GNNs meet symmetry in ILPs: an orbit-based feature augmentation approach

1 code implementation 24 Jan 2025 Qian Chen, Lei LI, Qian Li, Jianghua Wu, Akang Wang, Ruoyu Sun, Xiaodong Luo, Tsung-Hui Chang, Qingjiang Shi

In this work, we investigate the properties of permutation equivariance and invariance in GNNs, particularly in relation to the inherent symmetry of ILP formulations.

V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer

1 code implementation 9 Jan 2025 Hangzhou He, Lei Zhu, Xinliang Zhang, Shuang Zeng, Qian Chen, Yanye Lu

Concept Bottleneck Models (CBMs) offer inherent interpretability by initially translating images into human-comprehensible concepts, followed by a linear combination of these concepts for classification.

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

1 code implementation 2 Jan 2025 Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang

As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.

FedMeld: A Model-dispersal Federated Learning Framework for Space-ground Integrated Networks

no code implementations 23 Dec 2024 Qian Chen, Xianhao Chen, Kaibin Huang

By decomposing the problem into sequential SC and MR subproblems without compromising the optimality, we derive the round interval solution in a closed form and the mixing ratio in a semi-closed form to achieve the optimal latency-accuracy tradeoff.

Federated Learning

Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data

1 code implementation 19 Dec 2024 Shuang Li, Qian Chen, Chulhong Kim, Seongwook Choi, Yibing Wang, Yu Zhang, Changhui Li

However, the quality of 3D PAI is often degraded due to reconstruction artifacts caused by the sparse and angle-limited configuration of detector arrays.

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models

1 code implementation 13 Dec 2024 Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou

By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-parity naturalness, minimal response latency, and virtually lossless synthesis quality in the streaming mode.

In-Context Learning Quantization +1

4D SlingBAG: spatial-temporal coupled Gaussian ball for large-scale dynamic 3D photoacoustic iterative reconstruction

1 code implementation 5 Dec 2024 Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li

However, for existing IR algorithms, multi-frame 3D reconstruction leads to extremely high memory consumption and prolonged computation time, with limited consideration of the spatial-temporal continuity between data frames.

3D Reconstruction

Concept Based Continuous Prompts for Interpretable Text Classification

1 code implementation 2 Dec 2024 Qian Chen, Dongyang Li, Xiaofeng He

Continuous prompts have become widely adopted for augmenting performance across a wide range of natural language tasks.

Text Classification

DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach

no code implementations 13 Nov 2024 Xin Tang, Qian Chen, Wenjie Weng, Binhan Liao, Jiacheng Wang, Xianbin Cao, Xiaohuan Li

Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT).

Denoising Multi-agent Reinforcement Learning

DECRL: A Deep Evolutionary Clustering Jointed Temporal Knowledge Graph Representation Learning Approach

no code implementations 30 Oct 2024 Qian Chen, Ling Chen

To this end, we propose a Deep Evolutionary Clustering jointed temporal knowledge graph Representation Learning approach (DECRL).

Clustering Graph Representation Learning

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

1 code implementation 23 Oct 2024 Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan, Zhihao Du, Shiliang Zhang

However, achieving low latency and natural interactions in full-duplex dialogue systems remains a significant challenge, especially considering human conversation dynamics such as interruptions, backchannels, and overlapping speech.

Large Language Model Spoken Dialogue Systems

Machine Unlearning in Forgettability Sequence

no code implementations 9 Oct 2024 Junjie Chen, Qian Chen, Jian Lou, XiaoYu Zhang, Kai Wu, Zilong Wang

Machine unlearning (MU) is becoming a promising paradigm to achieve the "right to be forgotten", where the training trace of any chosen data points could be eliminated, while maintaining the model utility on general testing samples after unlearning.

Machine Unlearning

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

no code implementations 9 Oct 2024 Xin Zhang, Xiang Lyu, Zhihao Du, Qian Chen, Dong Zhang, Hangrui Hu, Chaohong Tan, Tianyu Zhao, Yuxuan Wang, Bin Zhang, Heng Lu, Yaqian Zhou, Xipeng Qiu

Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions.

Response Generation

Unified Audio Event Detection

no code implementations 13 Sep 2024 Yidi Jiang, Ruijie Tao, Wen Huang, Qian Chen, Wen Wang

Sound Event Detection (SED) detects regions of sound events, while Speaker Diarization (SD) segments speech conversations attributed to individual speakers.

Event Detection Sound Event Detection +2

Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis

1 code implementation 12 Sep 2024 Qian Chen, Shihao Shu, Xiangzhi Bai

However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images.

Novel View Synthesis

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

1 code implementation 29 Aug 2024 Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao

Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.

Language Modeling Language Modelling

Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts

no code implementations 19 Aug 2024 Jiaqing Liu, Chong Deng, Qinglin Zhang, Shilin Zhou, Qian Chen, Hai Yu, Wen Wang

To improve readability, we propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors and also transfer the informal text into the formal style with content preserved, utilizing contexts and auxiliary information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Multimodal Fusion and Coherence Modeling for Video Topic Segmentation

no code implementations 1 Aug 2024 Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang

In this work, we improve supervised VTS by thoroughly exploring multimodal fusion and multimodal coherence modeling.

Contrastive Learning Scene Segmentation +2

Efficient Sampling for Data-Driven Frequency Stability Constraint via Forward-Mode Automatic Differentiation

1 code implementation 21 Jul 2024 Wangkun Xu, Qian Chen, Pudong Ge, Zhongda Chu, Fei Teng

Encoding frequency stability constraints in the operation problem is challenging due to its complex dynamics.

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

no code implementations 7 Jul 2024 Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.

Language Modelling Large Language Model +6

Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

no code implementations 2 Jul 2024 Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

Through in-depth understanding of the latest advancements and practical applications, this survey provides valuable resources for researchers, paving the way for sustained progress in this field.

Knowledge Distillation Survey

Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation

no code implementations 19 Jun 2024 Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu

However, the incorrect pseudo-labels may corrupt the learned feature and lead to a new problem that the better the model is trained on the old task, the poorer the model performs on the new tasks.

Continual Learning Image Segmentation +2

Skip-Layer Attention: Bridging Abstract and Detailed Dependencies in Transformers

no code implementations 17 Jun 2024 Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies.

Diversity Language Modeling +1

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

2 code implementations 17 Jun 2024 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang

SDPN assigns the representation of the augmented views of an utterance to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views.

Diversity Representation Learning +2

PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

1 code implementation 4 Jun 2024 Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun

In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems.

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

1 code implementation 3 Jun 2024 Shengpeng Ji, Jialong Zuo, Wen Wang, Minghui Fang, Siqi Zheng, Qian Chen, Ziyue Jiang, Hai Huang, Zehan Wang, Xize Cheng, Zhou Zhao

In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt.

Speech Synthesis Text to Speech

CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification

1 code implementation 30 Apr 2024 Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song

By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs.

Code Generation Hallucination

PE: A Poincare Explanation Method for Fast Text Hierarchy Generation

1 code implementation 25 Mar 2024 Qian Chen, Dongyang Li, Xiaofeng He, Hongzhao Li, Hongyu Yi

The research focus has shifted to Hierarchical Attribution (HA) for its ability to model feature interactions.

Interpreting What Typical Fault Signals Look Like via Prototype-matching

no code implementations 11 Mar 2024 Qian Chen, Xingjian Dong, Zhike Peng

To understand the classification logic and explain what typical fault signals look like, the prototype matching network (PMN) is proposed by combining the human-inherent prototype-matching with autoencoder (AE).

Classification Denoising +4

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

2 code implementations 13 Feb 2024 Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

CIDR: A Cooperative Integrated Dynamic Refining Method for Minimal Feature Removal Problem

1 code implementation 13 Dec 2023 Qian Chen, Taolin Zhang, Dongyang Li, Xiaofeng He

The minimal feature removal problem in the post-hoc explanation area aims to identify the minimal feature set (MFS).

CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

1 code implementation 14 Nov 2023 Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Hari Sundaram, Shuiguang Deng

Finally, we systematically evaluate and analyze eight mainstream LLMs and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks.

Code Generation

Loss Masking Is Not Needed in Decoder-only Transformer for Discrete-token-based ASR

1 code implementation 8 Nov 2023 Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang

We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.

Decoder

Imaging through multimode fibres with physical prior

no code implementations 6 Nov 2023 Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen

The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity.

Differentially Private Pre-Trained Model Fusion using Decentralized Federated Graph Matching

no code implementations 5 Nov 2023 Qian Chen, Yiqiang Chen, Xinlong Jiang, Teng Zhang, Weiwei Dai, Wuliang Huang, Zhen Yan, Bo Ye

Model fusion is becoming a crucial component in the context of model-as-a-service scenarios, enabling the delivery of high-quality model services to local users.

Graph Matching Privacy Preserving

Improving Long Document Topic Segmentation Models With Enhanced Coherence Modeling

1 code implementation 18 Oct 2023 Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang

Our approach improves $F_1$ of the previous SOTA by 3.42 (73.74 -> 77.16) and reduces $P_k$ by 1.11 points (15.0 -> 13.89) on WIKI-727K, and achieves an average relative reduction of 4.3% in $P_k$ on WikiSection.

Information Retrieval Segmentation +3

PAGE: Equilibrate Personalization and Generalization in Federated Learning

no code implementations 13 Oct 2023 Qian Chen, Zilong Wang, Jiaqi Hu, Haonan Yan, Jianying Zhou, Xiaodong Lin

Federated learning (FL) is becoming a major driving force behind machine learning as a service, where customers (clients) collaboratively benefit from shared local updates under the orchestration of the service provider (server).

Federated Learning

CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

1 code implementation 8 Oct 2023 Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang

To advance research on code translation and meet diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation.

Code Translation Machine Translation +1

ITRE: Low-light Image Enhancement Based on Illumination Transmission Ratio Estimation

no code implementations 8 Oct 2023 Yu Wang, Yihong Wang, Tong Liu, Xiubao Sui, Qian Chen

In this paper, we propose a novel Retinex-based method, called ITRE, which suppresses noise and artifacts from the origin of the model and prevents over-exposure throughout the enhancement process.

Low-Light Image Enhancement

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

2 code implementations 7 Oct 2023 Zhihao Du, JiaMing Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features.

Audio captioning Automatic Speech Recognition +13

FLEDGE: Ledger-based Federated Learning Resilient to Inference and Backdoor Attacks

no code implementations 3 Oct 2023 Jorge Castillo, Phillip Rieger, Hossein Fereidooni, Qian Chen, Ahmad Sadeghi

Federated learning (FL) is a distributed learning process that uses a trusted aggregation server to allow multiple parties (or clients) to collaboratively train a machine learning model without having them share their private data.

Federated Learning

Multi-level Asymmetric Contrastive Learning for Volumetric Medical Image Segmentation Pre-training

no code implementations 21 Sep 2023 Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu

Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase.

Contrastive Learning Decoder +4

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

no code implementations 19 Sep 2023 Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS.

Data Augmentation Language Modeling +7

Twofold Structured Features-Based Siamese Network for Infrared Target Tracking

no code implementations 31 Aug 2023 Wei-Jie Yan, Yun-Kai Xu, Qian Chen, Xiao-Fang Kong, Guo-Hua Gu, A-Jun Shao, Min-Jie Wan

Infrared target tracking has become a critical technology in the field of computer vision and has many applications, such as motion analysis, pedestrian surveillance, and intelligent detection.

Branches Mutual Promotion for End-to-End Weakly Supervised Semantic Segmentation

no code implementations 9 Aug 2023 Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, Shuang Zeng, Qiushi Ren, Yanye Lu

Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch.

Classification Segmentation +3

Improving BERT with Hybrid Pooling Network and Drop Mask

no code implementations 14 Jul 2023 Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng

Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks.

Language Modeling Language Modelling +3

FedBone: Towards Large-Scale Federated Multi-Task Learning

no code implementations 30 Jun 2023 Yiqiang Chen, Teng Zhang, Xinlong Jiang, Qian Chen, Chenlong Gao, Wuliang Huang

The conflicting gradient projection technique is used to enhance the generalization of the large-scale general model between different tasks.

Federated Learning Multi-Task Learning

Exploiting Correlations Between Contexts and Definitions with Multiple Definition Modeling

no code implementations 24 May 2023 Linhan Zhang, Qian Chen, Wen Wang, Yuxin Jiang, Bing Li, Wei Wang, Xin Cao

In this paper, we carefully design a new task called Multiple Definition Modeling (MDM) that pools together all contexts and definitions of target words.

Advancing Precise Outline-Conditioned Text Generation with Task Duality and Explicit Outline Control

no code implementations 23 May 2023 Yunzhe Li, Qian Chen, Weixiang Yan, Wen Wang, Qinglin Zhang, Hari Sundaram

Furthermore, we identify an issue of imbalanced utilization of the outline information in the precise outline-conditioned generation, which is ubiquitously observed across fine-tuned models and zero-shot inference models.

Sentence Text Generation

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations 23 May 2023 Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

no code implementations 22 May 2023 Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen

In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization.

Speaker Diarization +1

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

2 code implementations 22 May 2023 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi

This paper proposes a novel architecture called Enhanced Res2Net (ERes2Net), which incorporates both local and global feature fusion techniques to improve the performance.

Speaker Verification

CASA-ASR: Context-Aware Speaker-Attributed ASR

no code implementations 21 May 2023 Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai

In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability resulting in a better recognition performance.

Automatic Speech Recognition speech-recognition +1

Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings

1 code implementation 18 May 2023 Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang

Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models, e.g., BERT, without fine-tuning.

Language Modeling Language Modelling +5

Hyperspectral Image Super-Resolution via Dual-domain Network Based on Hybrid Convolution

no code implementations 10 Apr 2023 Tingting Liu, YuAn Liu, Chuncheng Zhang, Yuan Liyin, Xiubao Sui, Qian Chen

Moreover, to further improve the perceptual quality of HSI, a frequency loss (HFL) is introduced to optimize the model in the frequency domain.

Hyperspectral Image Super-Resolution Image Super-Resolution

PointCAT: Cross-Attention Transformer for point cloud

1 code implementation 6 Apr 2023 Xincheng Yang, Mingze Jin, Weiji He, Qian Chen

Transformer-based models have significantly advanced natural language processing and computer vision in recent years.

Segmentation Semantic Segmentation

Meeting Action Item Detection with Regularized Context Modeling

no code implementations 27 Mar 2023 Jiaqing Liu, Chong Deng, Qinglin Zhang, Qian Chen, Wen Wang

We construct and release the first Chinese meeting corpus with manual action item annotations.

Contrastive Learning

Overview of the ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG)

no code implementations 24 Mar 2023 Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improve users' efficiency in grasping important information in meetings.

Extractive Summarization Keyphrase Extraction

MUG: A General Meeting Understanding and Generation Benchmark

1 code implementation 24 Mar 2023 Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao

To prompt SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection.

Extractive Summarization Keyphrase Extraction +1

Weighted Sampling for Masked Language Modeling

no code implementations 28 Feb 2023 Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Xin Cao, Kongzhang Hao, Yuxin Jiang, Wei Wang

Experiments on the Semantic Textual Similarity benchmark (STS) show that WSBERT significantly improves sentence embeddings over BERT.

Language Modeling Language Modelling +6

One-Pot Multi-Frame Denoising

no code implementations 18 Feb 2023 Lujia Jin, Shi Zhao, Lei Zhu, Qian Chen, Yanye Lu

Therefore, it is necessary to avoid the restriction of clean labels and make full use of noisy data for model training.

Denoising Diversity

A Graphical Point Process Framework for Understanding Removal Effects in Multi-Touch Attribution

no code implementations 13 Feb 2023 Jun Tao, Qian Chen, James W. Snyder Jr., Arava Sai Kumar, Amirhossein Meisami, Lingzhou Xue

Marketers employ various online advertising channels to reach customers, and they are particularly interested in attribution for measuring the degree to which individual touchpoints contribute to an eventual conversion.

Marketing

Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation

1 code implementation 16 Dec 2022 Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang

Moreover, the pipelined approaches of retrieval and generation might result in poor generation performance when retrieval performance is low.

Answer Generation Decoder +5

DopplerBAS: Binaural Audio Synthesis Addressing Doppler Effect

no code implementations 14 Dec 2022 Jinglin Liu, Zhenhui Ye, Qian Chen, Siqi Zheng, Wen Wang, Qinglin Zhang, Zhou Zhao

Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities.

Audio Synthesis

MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction

1 code implementation 9 Dec 2022 Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

The mainstream of the existing approaches for video prediction builds up their models based on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame in a recursive manner.

All Prediction +1

Gradient Domain Weighted Guided Image Filtering

no code implementations 30 Nov 2022 Bo Wang, Yihong Wang, Xiubao Sui, YuAn Liu, Qian Chen

The guided image filter is a well-known local filter in image processing.

Image Denoising

Language-Assisted Deep Learning for Autistic Behaviors Recognition

no code implementations 17 Nov 2022 Andong Deng, Taojiannan Yang, Chen Chen, Qian Chen, Leslie Neely, Sakiko Oyama

In such cases, automatic recognition systems based on computer vision and machine learning (in particular deep learning) technology can alleviate this issue to a large extent.

Action Recognition Deep Learning +2

Pushing the limits of self-supervised speaker verification using regularized distillation framework

1 code implementation 8 Nov 2022 Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen

A range of experiments conducted on the VoxCeleb datasets demonstrate the superiority of the regularized DINO framework in speaker verification.

Data Augmentation Diversity +2

TFN: An Interpretable Neural Network with Time-Frequency Transform Embedded for Intelligent Fault Diagnosis

1 code implementation 5 Sep 2022 Qian Chen, Xingjian Dong, Guowei Tu, Dong Wang, Baoxuan Zhao, Zhike Peng

However, the CNN is a typical black-box model, and the mechanism of the CNN's decision-making is not clear, which limits its application in high-reliability-required fault diagnosis scenarios.

Decision Making Diagnostic +1

Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization

1 code implementation 16 Jul 2022 Lei Zhu, Qian Chen, Lujia Jin, Yunfei You, Yanye Lu

Classification activation map (CAM), utilizing the classification structure to generate pixel-wise localization maps, is a crucial mechanism for weakly supervised object localization (WSOL).

Object Weakly-Supervised Object Localization

DePA: Improving Non-autoregressive Machine Translation with Dependency-Aware Decoder

1 code implementation30 Mar 2022 Jiaao Zhan, Qian Chen, Boxing Chen, Wen Wang, Yu Bai, Yang Gao

We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input.

Decoder Machine Translation +1

Weakly Supervised Object Localization as Domain Adaption

1 code implementation CVPR 2022 Lei Zhu, Qi She, Qian Chen, Yunfei You, Boyu Wang, Yanye Lu

To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects.

Classification Domain Adaptation +2

ProsoSpeech: Enhancing Prosody With Quantized Vector Pre-training in Text-to-Speech

no code implementations16 Feb 2022 Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao

Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).

Text to Speech

Background-aware Classification Activation Map for Weakly Supervised Object Localization

1 code implementation29 Dec 2021 Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu

In our B-CAM, two image-level features, aggregated by pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of the pure-background sample, respectively.

Classification Object +1

MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction

1 code implementation Findings (ACL) 2022 Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao

In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.

Contrastive Learning Document Embedding +4

BeamTransformer: Microphone Array-based Overlapping Speech Detection

no code implementations9 Sep 2021 Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan

We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling.

Towards Making Deep Learning-based Vulnerability Detectors Robust

1 code implementation2 Aug 2021 Zhen Li, Jing Tang, Deqing Zou, Qian Chen, Shouhuai Xu, Chao Zhang, Yichen Li, Hai Jin

Automatically detecting software vulnerabilities in source code is an important problem that has attracted much attention.

Deep Learning

From Single to Multiple: Leveraging Multi-level Prediction Spaces for Video Forecasting

no code implementations21 Jul 2021 Mengcheng Lan, Shuliang Ning, Yanran Li, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui

Although video forecasting has been a widely explored topic in recent years, most existing work still limits its models to a single prediction space and neglects ways to leverage multiple prediction spaces.

Prediction Video Prediction

Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation

1 code implementation20 Jul 2021 Qinglin Zhang, Qian Chen, YaLi Li, Jiaqing Liu, Wen Wang

Evaluations are conducted on the English Wiki-727K document segmentation benchmark, a Chinese Wikipedia-based document segmentation dataset we created, and an in-house Chinese spoken document dataset.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

TRS: Transferability Reduced Ensemble via Promoting Gradient Diversity and Model Smoothness

no code implementations NeurIPS 2021 Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

Diversity

Learning Synergistic Attention for Light Field Salient Object Detection

1 code implementation28 Apr 2021 Yi Zhang, Geng Chen, Qian Chen, Yujia Sun, Yong Xia, Olivier Deforges, Wassim Hamidouche, Lu Zhang

We propose a novel Synergistic Attention Network (SA-Net) to address light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms.

Object object-detection +2

Discriminative Self-training for Punctuation Prediction

no code implementations21 Apr 2021 Qian Chen, Wen Wang, Mengzhe Chen, Qinglin Zhang

Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of the ASR transcripts and the performance of downstream natural language processing applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Pre-training for Spoken Language Understanding with Joint Textual and Phonetic Representation Learning

no code implementations21 Apr 2021 Qian Chen, Wen Wang, Qinglin Zhang

In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

TRS: Transferability Reduced Ensemble via Encouraging Gradient Diversity and Model Smoothness

1 code implementation NeurIPS 2021 Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li

To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.

Diversity

RGB-D Salient Object Detection via 3D Convolutional Neural Networks

1 code implementation25 Jan 2021 Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, Hongwei Du

The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams.

Decoder object-detection +3

Multi-feature driven active contour segmentation model for infrared image with intensity inhomogeneity

no code implementations25 Nov 2020 Qinyan Huang, Weiwen Zhou, Minjie Wan, Xin Chen, Qian Chen, Guohua Gu

The active contour model (ACM) is one of the most widely used image segmentation tools at present, but existing methods use only a single local or global image feature to minimize the energy function, which easily causes false segmentations in IR images.

Image Segmentation Segmentation +1

FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

6 code implementations19 Nov 2020 Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang

In this paper, we introduce a DRL library, FinRL, that helps beginners gain exposure to quantitative finance and develop their own stock trading strategies.

Deep Reinforcement Learning reinforcement-learning +2

EF-Net: A novel enhancement and fusion network for RGB-D saliency detection

1 code implementation4 Nov 2020 Qian Chen, Keren Fu, Ze Liu, Geng Chen, Hongwei Du, Bensheng Qiu, Ling Shao

Finally, we propose an effective layer-wise aggregation module to fuse the features extracted from the enhanced depth maps and RGB images for the accurate detection of salient objects.

object-detection Object Detection +2

Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection

no code implementations3 Mar 2020 Qian Chen, Mengzhe Chen, Bo Li, Wen Wang

With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Sequential Neural Networks for Noetic End-to-End Response Selection

1 code implementation3 Mar 2020 Qian Chen, Wen Wang

The noetic end-to-end response selection challenge as one track in the 7th Dialog System Technology Challenges (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.

Goal-Oriented Dialog

Dual Adversarial Domain Adaptation

1 code implementation1 Jan 2020 Yuntao Du, Zhiwen Tan, Qian Chen, Xiaowen Zhang, Yirong Yao, Chongjun Wang

Recent experiments have shown that when the discriminator is provided with domain information in both domains and label information in the source domain, it is able to preserve the complex multimodal information and high semantic information in both domains.

2k MULTI-VIEW LEARNING +1

Homogeneous Online Transfer Learning with Online Distribution Discrepancy Minimization

1 code implementation31 Dec 2019 Yuntao Du, Zhiwen Tan, Qian Chen, Yi Zhang, Chongjun Wang

In this paper, we propose a novel online transfer learning method which seeks to find a new feature representation, so that the marginal distribution and conditional distribution discrepancy can be online reduced simultaneously.

Transfer Learning

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

3 code implementations EMNLP 2020 Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li

In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.

Adversarial Text Decoder +4

ColluEagle: Collusive review spammer detection using Markov random fields

1 code implementation5 Nov 2019 Zhuo Wang, Runlong Hu, Qian Chen, Pei Gao, Xiaowei Xu

Previous works use review network effects, i.e., the relationships among reviewers, reviews, and products, to detect fake reviews or review spammers, but ignore time effects, which are critical in characterizing group spamming.

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

no code implementations19 Aug 2019 Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling

We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models.

Common Sense Reasoning Natural Language Inference +3

Gift Contagion in Online Groups: Evidence From Virtual Red Packets

no code implementations24 Jun 2019 Yuan Yuan, Tracy Liu, Chenhao Tan, Qian Chen, Alex Pentland, Jie Tang

Using data on 36 million online red packet gifts on a large social site in East Asia, we leverage a natural experimental design to identify the social contagion of gift giving in online groups.

Experimental Design Marketing

High-speed in vitro intensity diffraction tomography

1 code implementation12 Apr 2019 Jiaji Li, Alex Matlock, Yunzhe Li, Qian Chen, Chao Zuo, Lei Tian

We demonstrate a label-free, scan-free intensity diffraction tomography technique utilizing annular illumination (aIDT) to rapidly characterize large-volume 3D refractive index distributions in vitro.

Optics Biological Physics

BERT for Joint Intent Classification and Slot Filling

16 code implementations28 Feb 2019 Qian Chen, Zhu Zhuo, Wen Wang

Intent classification and slot filling are two essential tasks for natural language understanding.

General Classification intent-classification +5

Sequential Attention-based Network for Noetic End-to-End Response Selection

4 code implementations9 Jan 2019 Qian Chen, Wen Wang

The noetic end-to-end response selection challenge as one track in Dialog System Technology Challenges 7 (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.

Conversational Response Selection Goal-Oriented Dialog

A Semi-parametric Realized Joint Value-at-Risk and Expected Shortfall Regression Framework

no code implementations5 Jul 2018 Chao Wang, Richard Gerlach, Qian Chen

One-day-ahead VaR and ES forecasting results favor the proposed models, especially when incorporating the sub-sampled Realized Variance and the sub-sampled Realized Range in the model.

quantile regression

A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences

no code implementations15 Nov 2017 Yu-Ping Ruan, Qian Chen, Zhen-Hua Ling

The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs.

Chunking Natural Language Inference +3

Neural Natural Language Inference Models Enhanced with External Knowledge

2 code implementations ACL 2018 Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei

With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance.

Natural Language Inference

Adaptive compressed 3D imaging based on wavelet trees and Hadamard multiplexing with a single photon counting detector

no code implementations15 Sep 2017 Huidong Dai, Weiji He, Guohua Gu, Ling Ye, Tianyi Mao, Qian Chen

The proposed multi-resolution photon counting 3D imaging technique acquires a high-resolution 3D image from a coarse image and edges at successively finer resolutions sampled by Hadamard multiplexing along the wavelet trees.

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

2 code implementations WS 2017 Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen

The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task.

Natural Language Inference Natural Language Understanding +1

Micro Fourier Transform Profilometry (μFTP): 3D shape measurement at 10,000 frames per second

no code implementations31 May 2017 Chao Zuo, Tianyang Tao, Shijie Feng, Lei Huang, Anand Asundi, Qian Chen

Recent advances in imaging sensors and digital light projection technology have facilitated rapid progress in 3D optical sensing, enabling 3D surfaces of complex-shaped objects to be captured with improved resolution and accuracy.

Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering

no code implementations14 Mar 2017 Junbei Zhang, Xiaodan Zhu, Qian Chen, Li-Rong Dai, Si Wei, Hui Jiang

The last several years have seen intensive interest in exploring neural-network-based models for machine comprehension (MC) and question answering (QA).

Question Answering Reading Comprehension

Distraction-Based Neural Networks for Document Summarization

1 code implementation26 Oct 2016 Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang

Distributed representations learned with neural networks have recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.

Document Summarization
