no code implementations • CCL 2021 • Qian Chen, Xiaoying Gao, Suge Wang, Xin Guo
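“The knowledge graph question generation task is to generate questions that are relevant to a given knowledge graph. Currently, knowledge graph question generation models mainly use RNN- or Transformer-based encoders to encode knowledge graph subgraphs, but this approach loses explicit graph structure information and, in the decoder, ignores the importance of local information to nodes. This paper proposes an iterative message-passing graph encoder to encode subgraphs and capture their explicit graph structure information. In addition, we use a sliding-window attention mechanism to improve the RNN decoder, increasing the importance of local subgraph information to nodes. Experimental results on the WQ and PQ datasets show that our model outperforms the KTG model on the BLEU4 metric by 2.16 and 15.44 points, respectively, demonstrating the effectiveness of the model.”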
1 code implementation • 28 Feb 2025 • Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma
This framework enables the controllable generation of high-fidelity long-form music at a higher sampling rate from both text and audio prompts.
no code implementations • 27 Feb 2025 • Yidi Jiang, Qian Chen, Shengpeng Ji, Yu Xi, Wen Wang, Chong Zhang, Xianghu Yue, Shiliang Zhang, Haizhou Li
The emergence of audio language models is empowered by neural audio codecs, which establish critical mappings between continuous waveforms and discrete tokens compatible with language model paradigms.
no code implementations • 24 Feb 2025 • Qianli Ma, Dongrui Liu, Qian Chen, Linfeng Zhang, Jing Shao
Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs.
no code implementations • 14 Feb 2025 • Tao Fan, Hanlin Gu, Xuemei Cao, Chee Seng Chan, Qian Chen, Yiqiang Chen, Yihui Feng, Yang Gu, Jiaxiang Geng, Bing Luo, Shuoling Liu, Win Kent Ong, Chao Ren, Jiaqi Shao, Chuan Sun, Xiaoli Tang, Hong Xi Tae, Yongxin Tong, Shuyue Wei, Fan Wu, Wei Xi, Mingcong Xu, He Yang, Xin Yang, Jiangpeng Yan, Hao Yu, Han Yu, Teng Zhang, Yifei Zhang, Xiaojin Zhang, Zhenzhe Zheng, Lixin Fan, Qiang Yang
This paper provides a comprehensive summary of the ten challenging problems inherent in FedFMs, encompassing foundational theory, utilization of private data, continual learning, unlearning, Non-IID and graph data, bidirectional knowledge transfer, incentive mechanism design, game mechanism design, model watermarking, and efficiency.
no code implementations • 13 Feb 2025 • Yuanshi Liu, Haihan Zhang, Qian Chen, Cong Fang
For (i), we prove that the optimal estimator can be simply a certain linear transformation of the best estimator for the source distribution.
1 code implementation • 24 Jan 2025 • Qian Chen, Lei LI, Qian Li, Jianghua Wu, Akang Wang, Ruoyu Sun, Xiaodong Luo, Tsung-Hui Chang, Qingjiang Shi
In this work, we investigate the properties of permutation equivariance and invariance in GNNs, particularly in relation to the inherent symmetry of ILP formulations.
no code implementations • 10 Jan 2025 • Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou
Previous models for voice interactions are categorized as native and aligned.
1 code implementation • 9 Jan 2025 • Hangzhou He, Lei Zhu, Xinliang Zhang, Shuang Zeng, Qian Chen, Yanye Lu
Concept Bottleneck Models (CBMs) offer inherent interpretability by initially translating images into human-comprehensible concepts, followed by a linear combination of these concepts for classification.
1 code implementation • 2 Jan 2025 • Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang
As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial.
no code implementations • 23 Dec 2024 • Qian Chen, Xianhao Chen, Kaibin Huang
By decomposing the problem into sequential SC and MR subproblems without compromising the optimality, we derive the round interval solution in a closed form and the mixing ratio in a semi-closed form to achieve the optimal latency-accuracy tradeoff.
1 code implementation • 19 Dec 2024 • Shuang Li, Qian Chen, Chulhong Kim, Seongwook Choi, Yibing Wang, Yu Zhang, Changhui Li
However, the quality of 3D PAI is often degraded due to reconstruction artifacts caused by the sparse and angle-limited configuration of detector arrays.
1 code implementation • 13 Dec 2024 • Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou
By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-parity naturalness, minimal response latency, and virtually lossless synthesis quality in the streaming mode.
1 code implementation • 5 Dec 2024 • Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li
However, for existing IR algorithms, multi-frame 3D reconstruction leads to extremely high memory consumption and prolonged computation time, with limited consideration of the spatial-temporal continuity between data frames.
1 code implementation • 2 Dec 2024 • Qian Chen, Dongyang Li, Xiaofeng He
Continuous prompts have become widely adopted for augmenting performance across a wide range of natural language tasks.
no code implementations • 13 Nov 2024 • Xin Tang, Qian Chen, Wenjie Weng, Binhan Liao, Jiacheng Wang, Xianbin Cao, Xiaohuan Li
Unmanned Aerial Vehicles (UAVs) possess high mobility and flexible deployment capabilities, prompting the development of UAVs for various application scenarios within the Internet of Things (IoT).
no code implementations • 30 Oct 2024 • Qian Chen, Ling Chen
To this end, we propose a Deep Evolutionary Clustering jointed temporal knowledge graph Representation Learning approach (DECRL).
no code implementations • 29 Oct 2024 • Krishna Chandra Roy, Qian Chen
We present a real-time threat detection approach using frequency-domain analysis of provenance graphs.
1 code implementation • 23 Oct 2024 • Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan, Zhihao Du, Shiliang Zhang
However, achieving low latency and natural interactions in full-duplex dialogue systems remains a significant challenge, especially considering human conversation dynamics such as interruptions, backchannels, and overlapping speech.
no code implementations • 9 Oct 2024 • Junjie Chen, Qian Chen, Jian Lou, XiaoYu Zhang, Kai Wu, Zilong Wang
Machine unlearning (MU) is becoming a promising paradigm to achieve the "right to be forgotten", where the training trace of any chosen data points could be eliminated, while maintaining the model utility on general testing samples after unlearning.
no code implementations • 9 Oct 2024 • Xin Zhang, Xiang Lyu, Zhihao Du, Qian Chen, Dong Zhang, Hangrui Hu, Chaohong Tan, Tianyu Zhao, Yuxuan Wang, Bin Zhang, Heng Lu, Yaqian Zhou, Xipeng Qiu
Current methods of building LLMs with voice interaction capabilities rely heavily on explicit text autoregressive generation before or during speech response generation to maintain content quality, which unfortunately brings computational overhead and increases latency in multi-turn interactions.
no code implementations • 13 Sep 2024 • Yidi Jiang, Ruijie Tao, Wen Huang, Qian Chen, Wen Wang
Sound Event Detection (SED) detects regions of sound events, while Speaker Diarization (SD) segments speech conversations attributed to individual speakers.
1 code implementation • 12 Sep 2024 • Qian Chen, Shihao Shu, Xiangzhi Bai
However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images.
1 code implementation • 29 Aug 2024 • Shengpeng Ji, Ziyue Jiang, Wen Wang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Zhou Zhao
Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.
no code implementations • 22 Aug 2024 • Luyao Cheng, Hui Wang, Siqi Zheng, Yafeng Chen, Rongjie Huang, Qinglin Zhang, Qian Chen, Xihao Li
Then we introduce a joint pairwise constraint propagation algorithm to cluster speakers based on these visual and semantic constraints.
no code implementations • 19 Aug 2024 • Jiaqing Liu, Chong Deng, Qinglin Zhang, Shilin Zhou, Qian Chen, Hai Yu, Wen Wang
To improve readability, we propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors and also transfer the informal text into the formal style with content preserved, utilizing contexts and auxiliary information.
Automatic Speech Recognition
no code implementations • 1 Aug 2024 • Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang
In this work, we improve supervised VTS by thoroughly exploring multimodal fusion and multimodal coherence modeling.
1 code implementation • 21 Jul 2024 • Wangkun Xu, Qian Chen, Pudong Ge, Zhongda Chu, Fei Teng
Encoding frequency stability constraints in the operation problem is challenging due to its complex dynamics.
1 code implementation • 16 Jul 2024 • Shuang Li, Yibing Wang, Jian Gao, Chulhong Kim, Seongwook Choi, Yu Zhang, Qian Chen, Yao Yao, Changhui Li
Large-scale 3D photoacoustic (PA) imaging has become increasingly important for both clinical and pre-clinical applications.
no code implementations • 7 Jul 2024 • Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan
Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis.
3 code implementations • 4 Jul 2024 • Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).
no code implementations • 2 Jul 2024 • Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen
Through in-depth understanding of the latest advancements and practical applications, this survey provides valuable resources for researchers, paving the way for sustained progress in this field.
no code implementations • 19 Jun 2024 • Qian Chen, Lei Zhu, Hangzhou He, Xinliang Zhang, Shuang Zeng, Qiushi Ren, Yanye Lu
However, the incorrect pseudo-labels may corrupt the learned feature and lead to a new problem that the better the model is trained on the old task, the poorer the model performs on the new tasks.
1 code implementation • 19 Jun 2024 • Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu, Xuandong Zhao
Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations.
no code implementations • 17 Jun 2024 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang
The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies.
2 code implementations • 17 Jun 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang
SDPN assigns the representation of the augmented views of an utterance to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views.
1 code implementation • 4 Jun 2024 • Bingheng Li, Linxin Yang, Yupeng Chen, Senmiao Wang, Qian Chen, Haitao Mao, Yao Ma, Akang Wang, Tian Ding, Jiliang Tang, Ruoyu Sun
In this work, we propose an FOM-unrolled neural network (NN) called PDHG-Net, and propose a two-stage L2O method to solve large-scale LP problems.
no code implementations • 4 Jun 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Junjie Li
Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings.
1 code implementation • 3 Jun 2024 • Shengpeng Ji, Jialong Zuo, Wen Wang, Minghui Fang, Siqi Zheng, Qian Chen, Ziyue Jiang, Hai Huang, Zehan Wang, Xize Cheng, Zhou Zhao
In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt.
1 code implementation • 30 Apr 2024 • Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song
By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs.
2 code implementations • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Rongjie Huang, Chong Deng, Qian Chen, Shiliang Zhang, Wen Wang, Xihao Li
With 3D-Speaker-Toolkit, we establish a new benchmark for multimodal speaker analysis.
1 code implementation • 25 Mar 2024 • Qian Chen, Dongyang Li, Xiaofeng He, Hongzhao Li, Hongyu Yi
The research focus has shifted to Hierarchical Attribution (HA) for its ability to model feature interactions.
no code implementations • 18 Mar 2024 • Xin Tang, Qian Chen, Rong Yu, Xiaohuan Li
Moreover, the resource mutual exclusion problem of dynamic task assignment has not been effectively solved.
no code implementations • 11 Mar 2024 • Qian Chen, Xingjian Dong, Zhike Peng
To understand the classification logic and explain what typical fault signals look like, the prototype matching network (PMN) is proposed by combining the human-inherent prototype-matching with autoencoder (AE).
1 code implementation • 23 Feb 2024 • Yuanqing Yu, Chongming Gao, Jiawei Chen, Heng Tang, Yuefeng Sun, Qian Chen, Weizhi Ma, Min Zhang
EasyRL4Rec seeks to facilitate the model development and experimental process in the domain of RL-based RSs.
1 code implementation • 19 Feb 2024 • Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao
Furthermore, we also validate the efficiency of the Language-Codec on downstream speech language models.
2 code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen
We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.
Automatic Speech Recognition
no code implementations • 23 Jan 2024 • Yun Peng, Sen Lin, Qian Chen, Lyu Xu, Xiaojun Ren, Yafei Li, Jianliang Xu
Graph analysis is fundamental in real-world applications.
1 code implementation • 13 Dec 2023 • Qian Chen, Taolin Zhang, Dongyang Li, Xiaofeng He
The minimal feature removal problem in the post-hoc explanation area aims to identify the minimal feature set (MFS).
1 code implementation • 14 Nov 2023 • Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Hari Sundaram, Shuiguang Deng
Finally, we systematically evaluate and analyze eight mainstream LLMs and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks.
1 code implementation • 8 Nov 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang
We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.
no code implementations • 6 Nov 2023 • Chuncheng Zhang, Yingjie Shi, Zheyi Yao, Xiubao Sui, Qian Chen
The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity.
no code implementations • 5 Nov 2023 • Qian Chen, Yiqiang Chen, Xinlong Jiang, Teng Zhang, Weiwei Dai, Wuliang Huang, Zhen Yan, Bo Ye
Model fusion is becoming a crucial component in the context of model-as-a-service scenarios, enabling the delivery of high-quality model services to local users.
1 code implementation • 18 Oct 2023 • Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang
Our approach improves the $F_1$ of the previous SOTA by 3.42 points (73.74 -> 77.16) and reduces $P_k$ by 1.11 points (15.0 -> 13.89) on WIKI-727K, and achieves an average relative reduction of 4.3% in $P_k$ on WikiSection.
no code implementations • 17 Oct 2023 • Xin Su, Yao Zhou, Zifei Shan, Qian Chen
Then we learn a semantic representation of MeKB for the cross-domain recommendation.
no code implementations • 13 Oct 2023 • Qian Chen, Zilong Wang, Jiaqi Hu, Haonan Yan, Jianying Zhou, Xiaodong Lin
Federated learning (FL) is becoming a major driving force behind machine learning as a service, where customers (clients) collaboratively benefit from shared local updates under the orchestration of the service provider (server).
1 code implementation • 8 Oct 2023 • Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang
To advance research on code translation and meet diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation.
no code implementations • 8 Oct 2023 • Yu Wang, Yihong Wang, Tong Liu, Xiubao Sui, Qian Chen
In this paper, we propose a novel Retinex-based method, called ITRE, which suppresses noise and artifacts from the origin of the model and prevents over-exposure throughout the enhancement process.
2 code implementations • 7 Oct 2023 • Zhihao Du, JiaMing Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features.
no code implementations • 3 Oct 2023 • Jorge Castillo, Phillip Rieger, Hossein Fereidooni, Qian Chen, Ahmad Sadeghi
Federated learning (FL) is a distributed learning process that uses a trusted aggregation server to allow multiple parties (or clients) to collaboratively train a machine learning model without having them share their private data.
no code implementations • 21 Sep 2023 • Shuang Zeng, Lei Zhu, Xinliang Zhang, Qian Chen, Hangzhou He, Lujia Jin, Zifeng Tian, Qiushi Ren, Zhaoheng Xie, Yanye Lu
Moreover, we develop a multi-level contrastive learning strategy that integrates correspondences across feature-level, image-level, and pixel-level representations to ensure the encoder and decoder capture comprehensive details from representations of varying scales and granularities during the pre-training phase.
no code implementations • 19 Sep 2023 • Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen
In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and speech synthesis technique, Azure TTS.
no code implementations • 19 Sep 2023 • Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang
Speaker diarization has gained considerable attention within the speech processing research community.
no code implementations • 31 Aug 2023 • Wei-Jie Yan, Yun-Kai Xu, Qian Chen, Xiao-Fang Kong, Guo-Hua Gu, A-Jun Shao, Min-Jie Wan
Nowadays, infrared target tracking has been a critical technology in the field of computer vision and has many applications, such as motion analysis, pedestrian surveillance, intelligent detection, and so forth.
no code implementations • 9 Aug 2023 • Lei Zhu, Hangzhou He, Xinliang Zhang, Qian Chen, Shuang Zeng, Qiushi Ren, Yanye Lu
Existing methods adopt an online-trained classification branch to provide pseudo annotations for supervising the segmentation branch.
2 code implementations • 5 Aug 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Chong Deng, Shiliang Zhang, Wen Wang
To mitigate this problem, we introduce a diversity regularization term to embeddings in SDPN.
no code implementations • 14 Jul 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng
Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks.
no code implementations • 30 Jun 2023 • Yiqiang Chen, Teng Zhang, Xinlong Jiang, Qian Chen, Chenlong Gao, Wuliang Huang
The conflicting gradient projection technique is used to enhance the generalization of the large-scale general model between different tasks.
2 code implementations • 27 Jun 2023 • Siqi Zheng, Luyao Cheng, Yafeng Chen, Hui Wang, Qian Chen
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community.
no code implementations • 24 May 2023 • Linhan Zhang, Qian Chen, Wen Wang, Yuxin Jiang, Bing Li, Wei Wang, Xin Cao
In this paper, we carefully design a new task called Multiple Definition Modeling (MDM) that pools together all contexts and definitions of target words.
no code implementations • 23 May 2023 • Yunzhe Li, Qian Chen, Weixiang Yan, Wen Wang, Qinglin Zhang, Hari Sundaram
Furthermore, we identify an issue of imbalanced utilization of the outline information in the precise outline-conditioned generation, which is ubiquitously observed across fine-tuned models and zero-shot inference models.
no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie
The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.
Automatic Speech Recognition
no code implementations • 22 May 2023 • Luyao Cheng, Siqi Zheng, Zhang Qinglin, Hui Wang, Yafeng Chen, Qian Chen
In this paper, we propose methods to extract speaker-related information from semantic content in multi-party meetings, which, as we will show, can further benefit speaker diarization.
2 code implementations • 22 May 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Jiajun Qi
This paper proposes a novel architecture called Enhanced Res2Net (ERes2Net), which incorporates both local and global feature fusion techniques to improve the performance.
no code implementations • 21 May 2023 • Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai
For speech interaction, voice activity detection (VAD) is often used as a front-end.
no code implementations • 21 May 2023 • Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Li-Rong Dai
In addition, a two-pass decoding strategy is further proposed to fully leverage the contextual modeling ability resulting in a better recognition performance.
1 code implementation • 18 May 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang
Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models, e.g., BERT, without fine-tuning.
no code implementations • 10 Apr 2023 • Tingting Liu, YuAn Liu, Chuncheng Zhang, Yuan Liyin, Xiubao Sui, Qian Chen
Moreover, to further improve the perceptual quality of HSI, a frequency loss (HFL) is introduced to optimize the model in the frequency domain.
1 code implementation • 6 Apr 2023 • Xincheng Yang, Mingze Jin, Weiji He, Qian Chen
Transformer-based models have significantly advanced natural language processing and computer vision in recent years.
no code implementations • 27 Mar 2023 • Jiaqing Liu, Chong Deng, Qinglin Zhang, Qian Chen, Wen Wang
We construct and release the first Chinese meeting corpus with manual action item annotations.
no code implementations • 24 Mar 2023 • Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
ICASSP2023 General Meeting Understanding and Generation Challenge (MUG) focuses on prompting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improve users' efficiency in grasping important information in meetings.
1 code implementation • 24 Mar 2023 • Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
To prompt SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization and topic title generation, keyphrase extraction, and action item detection.
no code implementations • 7 Mar 2023 • Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen, Dianwen Ng, Han Lei, Trung Hieu Nguyen, Chong Zhang, Bin Ma, Erik Cambria
Learning on a massive amount of speech corpus leads to the recent success of many self-supervised speech models.
no code implementations • 28 Feb 2023 • Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Xin Cao, Kongzhang Hao, Yuxin Jiang, Wei Wang
Experiments on the Semantic Textual Similarity benchmark (STS) show that WSBERT significantly improves sentence embeddings over BERT.
no code implementations • 13 Feb 2023 • Jun Tao, Qian Chen, James W. Snyder Jr., Arava Sai Kumar, Amirhossein Meisami, Lingzhou Xue
Marketers employ various online advertising channels to reach customers, and they are particularly interested in attribution for measuring the degree to which individual touchpoints contribute to an eventual conversion.
1 code implementation • 16 Dec 2022 • Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
Moreover, the pipelined approaches of retrieval and generation might result in poor generation performance when retrieval performance is low.
no code implementations • 14 Dec 2022 • Jinglin Liu, Zhenhui Ye, Qian Chen, Siqi Zheng, Wen Wang, Qinglin Zhang, Zhou Zhao
Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities.
1 code implementation • 9 Dec 2022 • Shuliang Ning, Mengcheng Lan, Yanran Li, Chaofeng Chen, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui
The mainstream of the existing approaches for video prediction builds up their models based on a Single-In-Single-Out (SISO) architecture, which takes the current frame as input to predict the next frame in a recursive manner.
no code implementations • 30 Nov 2022 • Bo Wang, Yihong Wang, Xiubao Sui, YuAn Liu, Qian Chen
Guided image filter is a well-known local filter in image processing.
no code implementations • 17 Nov 2022 • Andong Deng, Taojiannan Yang, Chen Chen, Qian Chen, Leslie Neely, Sakiko Oyama
In such cases, automatic recognition systems based on computer vision and machine learning (in particular deep learning) technology can alleviate this issue to a large extent.
1 code implementation • 8 Nov 2022 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen
A range of experiments conducted on the VoxCeleb datasets demonstrate the superiority of the regularized DINO framework in speaker verification.
no code implementations • 1 Nov 2022 • Mohan Shi, Jie Zhang, Zhihao Du, Fan Yu, Qian Chen, Shiliang Zhang, Li-Rong Dai
Speaker-attributed automatic speech recognition (SA-ASR) in multi-party meeting scenarios is one of the most valuable and challenging ASR tasks.
Automatic Speech Recognition
1 code implementation • 5 Sep 2022 • Qian Chen, Xingjian Dong, Guowei Tu, Dong Wang, Baoxuan Zhao, Zhike Peng
However, the CNN is a typical black-box model, and the mechanism of the CNN's decision-making is not clear, which limits its application in high-reliability-required fault diagnosis scenarios.
1 code implementation • 16 Jul 2022 • Lei Zhu, Qian Chen, Lujia Jin, Yunfei You, Yanye Lu
Classification activation map (CAM), utilizing the classification structure to generate pixel-wise localization maps, is a crucial mechanism for weakly supervised object localization (WSOL).
1 code implementation • 30 Mar 2022 • Jiaao Zhan, Qian Chen, Boxing Chen, Wen Wang, Yu Bai, Yang Gao
We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input.
1 code implementation • CVPR 2022 • Lei Zhu, Qi She, Qian Chen, Yunfei You, Boyu Wang, Yanye Lu
To avoid this problem, this work provides a novel perspective that models WSOL as a domain adaption (DA) task, where the score estimator trained on the source/image domain is tested on the target/pixel domain to locate objects.
no code implementations • 16 Feb 2022 • Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao
Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).
1 code implementation • 29 Dec 2021 • Lei Zhu, Qi She, Qian Chen, Xiangxi Meng, Mufeng Geng, Lujia Jin, Zhe Jiang, Bin Qiu, Yunfei You, Yibao Zhang, Qiushi Ren, Yanye Lu
In our B-CAM, two image-level features, aggregated by pixel-level features of potential background and object locations, are used to purify the object feature from the object-related background and to represent the feature of the pure-background sample, respectively.
no code implementations • 17 Dec 2021 • Qian Chen, Haoxin Bai, Bingchen Che, Tianyun Zhao, Ce Zhang, Kaige Wang, Jintao Bai, Wei Zhao
To date, live-cell imaging at the nanometer scale remains challenging.
1 code implementation • Findings (ACL) 2022 • Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao
In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.
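The masking-and-ranking idea in the MDERank excerpt above can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a stand-in assumption for the pre-trained language model embeddings the approach relies on, and the names `mask_candidate`, `rank_candidates`, and the `[MASK]` placeholder are hypothetical. The sketch also assumes that a candidate whose masking lowers the similarity the most is ranked highest.

```python
import math
import re
from collections import Counter

MASK = "[MASK]"  # hypothetical placeholder token


def embed(text):
    """Toy bag-of-words embedding (an assumption; stands in for a PLM encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mask_candidate(document, candidate):
    """Replace every whole-word occurrence of the candidate with mask tokens."""
    replacement = " ".join([MASK] * len(candidate.split()))
    pattern = r"\b" + re.escape(candidate) + r"\b"
    return re.sub(pattern, replacement, document, flags=re.IGNORECASE)


def rank_candidates(document, candidates):
    """Rank candidates by similarity between the original and masked document.

    Lower similarity means masking removed more of the document's content,
    so (under the assumption stated above) the candidate is ranked higher.
    """
    doc_vec = embed(document)
    scored = []
    for candidate in candidates:
        masked_vec = embed(mask_candidate(document, candidate))
        scored.append((cosine(doc_vec, masked_vec), candidate))
    scored.sort()  # ascending similarity: most important candidate first
    return [candidate for _, candidate in scored]


document = "graph neural networks learn graph structure from graph data with networks"
print(rank_candidates(document, ["data", "graph"]))
```

In this toy document, masking "graph" removes three tokens while masking "data" removes one, so the sketch places "graph" first.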
1 code implementation • ICLR 2022 • Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhen-Hua Ling
We propose a novel Pooling Network (PoNet) for token mixing in long sequences with linear complexity.
no code implementations • 9 Sep 2021 • Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan
We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling.
1 code implementation • 2 Aug 2021 • Zhen Li, Jing Tang, Deqing Zou, Qian Chen, Shouhuai Xu, Chao Zhang, Yichen Li, Hai Jin
Automatically detecting software vulnerabilities in source code is an important problem that has attracted much attention.
no code implementations • 21 Jul 2021 • Mengcheng Lan, Shuliang Ning, Yanran Li, Qian Chen, Xunlai Chen, Xiaoguang Han, Shuguang Cui
Although video forecasting has been widely explored in recent years, most existing work still limits models to a single prediction space and neglects ways to leverage multiple prediction spaces.
1 code implementation • 20 Jul 2021 • Qinglin Zhang, Qian Chen, YaLi Li, Jiaqing Liu, Wen Wang
Evaluations are conducted on the English Wiki-727K document segmentation benchmark, a Chinese Wikipedia-based document segmentation dataset we created, and an in-house Chinese spoken document dataset.
Automatic Speech Recognition (ASR)
no code implementations • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Pan Zhou, Benjamin I. P. Rubinstein, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
1 code implementation • 28 Apr 2021 • Yi Zhang, Geng Chen, Qian Chen, Yujia Sun, Yong Xia, Olivier Deforges, Wassim Hamidouche, Lu Zhang
We propose a novel Synergistic Attention Network (SA-Net) to address light field salient object detection by establishing a synergistic effect between multi-modal features with advanced attention mechanisms.
no code implementations • 21 Apr 2021 • Qian Chen, Wen Wang, Mengzhe Chen, Qinglin Zhang
Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of the ASR transcripts and the performance of downstream natural language processing applications.
Automatic Speech Recognition (ASR)
no code implementations • 21 Apr 2021 • Qian Chen, Wen Wang, Qinglin Zhang
In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming at exploring the full potentials of phonetic information to improve SLU robustness to ASR errors.
Automatic Speech Recognition (ASR)
1 code implementation • NeurIPS 2021 • Zhuolin Yang, Linyi Li, Xiaojun Xu, Shiliang Zuo, Qian Chen, Benjamin Rubinstein, Pan Zhou, Ce Zhang, Bo Li
To answer these questions, in this work we first theoretically analyze and outline sufficient conditions for adversarial transferability between models; then propose a practical algorithm to reduce the transferability between base models within an ensemble to improve its robustness.
1 code implementation • 25 Jan 2021 • Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, Hongwei Du
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams.
no code implementations • 25 Nov 2020 • Qinyan Huang, Weiwen Zhou, Minjie Wan, Xin Chen, Qian Chen, Guohua Gu
The active contour model (ACM) is one of the most widely used image segmentation tools at present, but existing methods utilize only a single local or global feature of the image to minimize the energy function, which easily causes false segmentations in IR images.
6 code implementations • 19 Nov 2020 • Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Christina Dan Wang
In this paper, we introduce a DRL library, FinRL, that helps beginners gain exposure to quantitative finance and develop their own stock trading strategies.
1 code implementation • 4 Nov 2020 • Qian Chen, Keren Fu, Ze Liu, Geng Chen, Hongwei Du, Bensheng Qiu, Ling Shao
Finally, we propose an effective layer-wise aggregation module to fuse the features extracted from the enhanced depth maps and RGB images for the accurate detection of salient objects.
no code implementations • 11 Oct 2020 • Chao Ma, Guohua Gu, Xin Miao, Minjie Wan, Weixian Qian, Kan Ren, Qian Chen
Infrared target tracking plays an important role in both civil and military fields.
no code implementations • 3 Mar 2020 • Qian Chen, Mengzhe Chen, Bo Li, Wen Wang
With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth.
Automatic Speech Recognition (ASR)
1 code implementation • 3 Mar 2020 • Qian Chen, Wen Wang
The noetic end-to-end response selection challenge as one track in the 7th Dialog System Technology Challenges (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.
no code implementations • 3 Mar 2020 • Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu
We explore different transfer learning approaches to reduce dependency on data collection and annotation.
Spoken Language Understanding
Task-Oriented Dialogue Systems
1 code implementation • 1 Jan 2020 • Yuntao Du, Zhiwen Tan, Qian Chen, Xiaowen Zhang, Yirong Yao, Chongjun Wang
Recent experiments have shown that when the discriminator is provided with domain information in both domains and label information in the source domain, it is able to preserve the complex multimodal information and high semantic information in both domains.
Ranked #6 on Domain Adaptation on ImageCLEF-DA
1 code implementation • 31 Dec 2019 • Yuntao Du, Zhiwen Tan, Qian Chen, Yi Zhang, Chongjun Wang
In this paper, we propose a novel online transfer learning method which seeks to find a new feature representation, so that the marginal and conditional distribution discrepancies can be reduced simultaneously in an online manner.
3 code implementations • EMNLP 2020 • Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.
1 code implementation • 5 Nov 2019 • Zhuo Wang, Runlong Hu, Qian Chen, Pei Gao, Xiaowei Xu
Previous works use review network effects, i.e., the relationships among reviewers, reviews, and products, to detect fake reviews or review spammers, but ignore time effects, which are critical in characterizing group spamming.
no code implementations • 19 Aug 2019 • Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling
We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models.
Ranked #27 on Common Sense Reasoning on CommonsenseQA
no code implementations • 24 Jun 2019 • Yuan Yuan, Tracy Liu, Chenhao Tan, Qian Chen, Alex Pentland, Jie Tang
Using data on 36 million online red packet gifts on a large social site in East Asia, we leverage a natural experimental design to identify the social contagion of gift giving in online groups.
1 code implementation • 27 Apr 2019 • Tianda Li, Xiaodan Zhu, Quan Liu, Qian Chen, Zhigang Chen, Si Wei
Natural language inference (NLI) is among the most challenging tasks in natural language understanding.
1 code implementation • 12 Apr 2019 • Jiaji Li, Alex Matlock, Yunzhe Li, Qian Chen, Chao Zuo, Lei Tian
We demonstrate a label-free, scan-free intensity diffraction tomography technique utilizing annular illumination (aIDT) to rapidly characterize large-volume 3D refractive index distributions in vitro.
Optics Biological Physics
16 code implementations • 28 Feb 2019 • Qian Chen, Zhu Zhuo, Wen Wang
Intent classification and slot filling are two essential tasks for natural language understanding.
Ranked #3 on Slot Filling on ATIS
4 code implementations • 9 Jan 2019 • Qian Chen, Wen Wang
The noetic end-to-end response selection challenge as one track in Dialog System Technology Challenges 7 (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.
Ranked #1 on Conversational Response Selection on Advising Corpus
no code implementations • 12 Oct 2018 • Xiaodong Kuang, Xiubao Sui, Chengwei Liu, Yu-An Liu, Qian Chen, Guohua Gu
Transforming a thermal infrared image into a realistic RGB image is a challenging task.
no code implementations • 5 Jul 2018 • Chao Wang, Richard Gerlach, Qian Chen
One-day-ahead VaR and ES forecasting results favor the proposed models, especially when incorporating the sub-sampled Realized Variance and the sub-sampled Realized Range in the model.
1 code implementation • COLING 2018 • Qian Chen, Zhen-Hua Ling, Xiaodan Zhu
This paper explores generalized pooling methods to enhance sentence embedding.
Ranked #10 on Sentiment Analysis on Yelp Fine-grained classification
no code implementations • ICLR 2018 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen
Modeling informal inference in natural language is very challenging.
no code implementations • 15 Nov 2017 • Yu-Ping Ruan, Qian Chen, Zhen-Hua Ling
The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs.
2 code implementations • ACL 2018 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei
With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance.
Ranked #22 on Natural Language Inference on SNLI
no code implementations • 15 Sep 2017 • Huidong Dai, Weiji He, Guohua Gu, Ling Ye, Tianyi Mao, Qian Chen
The proposed multi-resolution photon counting 3D imaging technique acquires a high-resolution 3D image from a coarse image and edges at successively finer resolutions sampled by Hadamard multiplexing along the wavelet trees.
2 code implementations • WS 2017 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen
The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task.
Ranked #71 on Natural Language Inference on SNLI
Natural Language Inference
Natural Language Understanding
no code implementations • 31 May 2017 • Chao Zuo, Tianyang Tao, Shijie Feng, Lei Huang, Anand Asundi, Qian Chen
Recent advances in imaging sensors and digital light projection technology have facilitated rapid progress in 3D optical sensing, enabling 3D surfaces of complex-shaped objects to be captured with improved resolution and accuracy.
no code implementations • 14 Mar 2017 • Junbei Zhang, Xiaodan Zhu, Qian Chen, Li-Rong Dai, Si Wei, Hui Jiang
The last several years have seen intensive interest in exploring neural-network-based models for machine comprehension (MC) and question answering (QA).
Ranked #39 on Question Answering on SQuAD1.1 dev
1 code implementation • 26 Oct 2016 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang
Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.
11 code implementations • ACL 2017 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen
Reasoning and inference are central to human and artificial intelligence.
Ranked #32 on Natural Language Inference on SNLI