no code implementations • 10 Jan 2025 • Qian Chen, Yafeng Chen, Yanni Chen, Mengzhe Chen, Yingda Chen, Chong Deng, Zhihao Du, Ruize Gao, Changfeng Gao, Zhifu Gao, Yabin Li, Xiang Lv, Jiaqing Liu, Haoneng Luo, Bin Ma, Chongjia Ni, Xian Shi, Jialong Tang, Hui Wang, Hao Wang, Wen Wang, Yuxuan Wang, Yunlan Xu, Fan Yu, Zhijie Yan, Yexin Yang, Baosong Yang, Xian Yang, Guanrou Yang, Tianyu Zhao, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Pei Zhang, Chong Zhang, Jinren Zhou
Previous models for voice interactions are categorized as native and aligned.
no code implementations • 9 Jan 2025 • YuXuan Li, Cheng Zhang, Wen Wang, Yongming Huang
Radio map, or pathloss map prediction, is a crucial method for wireless network modeling and management.
1 code implementation • 19 Dec 2024 • Hanlin Wang, Hao Ouyang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Qifeng Chen, Yujun Shen, LiMin Wang
The intuitive nature of drag-based interaction has led to its growing adoption for controlling object trajectories in image-to-video synthesis.
no code implementations • 18 Dec 2024 • Yihao Meng, Hao Ouyang, Hanlin Wang, Qiuyu Wang, Wen Wang, Ka Leong Cheng, Zhiheng Liu, Yujun Shen, Huamin Qu
The production of 2D animation follows an industry-standard workflow, encompassing four essential stages: character design, keyframe animation, in-betweening, and coloring.
1 code implementation • 13 Dec 2024 • Zhihao Du, Yuxuan Wang, Qian Chen, Xian Shi, Xiang Lv, Tianyu Zhao, Zhifu Gao, Yexin Yang, Changfeng Gao, Hui Wang, Fan Yu, Huadai Liu, Zhengyan Sheng, Yue Gu, Chong Deng, Wen Wang, Shiliang Zhang, Zhijie Yan, Jingren Zhou
By training on a large-scale multilingual dataset, CosyVoice 2 achieves human-parity naturalness, minimal response latency, and virtually lossless synthesis quality in the streaming mode.
no code implementations • 2 Dec 2024 • Dongsheng Han, Peng Wang, Wanli Ni, Wen Wang, Ailing Zheng, Dusit Niyato, Naofal Al-Dhahir
We propose a MF-RIS-enabled multi-user and multi-target ISAC system, and formulate an optimization problem to maximize the signal-to-interference-plus-noise ratio (SINR) of sensing targets.
1 code implementation • 22 Nov 2024 • Weijia Wu, MingYu Liu, Zeyu Zhu, Xi Xia, Haoen Feng, Wen Wang, Kevin Qinghong Lin, Chunhua Shen, Mike Zheng Shou
Recent advancements in video generation models, like Stable Video Diffusion, show promising results, but primarily focus on short, single-scene videos.
1 code implementation • 15 Nov 2024 • Zewen Chen, Juan Wang, Wen Wang, Sunhan Xu, Hang Xiong, Yun Zeng, Jian Guo, Shuxun Wang, Chunfeng Yuan, Bing Li, Weiming Hu
The quality analysis of ROIs can provide fine-grained guidance for image quality improvement and is crucial for scenarios focusing on region-level quality.
no code implementations • 14 Nov 2024 • Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Wen Wang, Zhiheng Liu, Qifeng Chen, Yujun Shen
Image editing involves a variety of complex tasks and requires efficient and precise manipulation techniques.
1 code implementation • 5 Nov 2024 • Jiejun Tan, Zhicheng Dou, Wen Wang, Mang Wang, WeiPeng Chen, Ji-Rong Wen
To alleviate this problem, we propose HtmlRAG, which uses HTML instead of plain text as the format of retrieved knowledge in RAG.
no code implementations • 24 Oct 2024 • Wen Wang, Qiuyu Wang, Kecheng Zheng, Hao Ouyang, Zhekai Chen, Biao Gong, Hao Chen, Yujun Shen, Chunhua Shen
We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity.
1 code implementation • 23 Oct 2024 • Qinglin Zhang, Luyao Cheng, Chong Deng, Qian Chen, Wen Wang, Siqi Zheng, Jiaqing Liu, Hai Yu, Chaohong Tan, Zhihao Du, Shiliang Zhang
However, achieving low latency and natural interactions in full-duplex dialogue systems remains a significant challenge, especially considering human conversation dynamics such as interruptions, backchannels, and overlapping speech.
no code implementations • 9 Oct 2024 • Wanli Ni, Wen Wang, Ailing Zheng, Peng Wang, Changsheng You, Yonina C. Eldar, Dusit Niyato, Robert Schober
Furthermore, we present two schemes that utilize MF-RISs to enhance the performance of integrated sensing and communication (ISAC).
no code implementations • 13 Sep 2024 • Yidi Jiang, Ruijie Tao, Wen Huang, Qian Chen, Wen Wang
Sound Event Detection (SED) detects regions of sound events, while Speaker Diarization (SD) segments speech conversations attributed to individual speakers.
1 code implementation • 29 Aug 2024 • Shengpeng Ji, Ziyue Jiang, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Xize Cheng, Zehan Wang, RuiQi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao
Despite the reduced number of tokens, WavTokenizer achieves state-of-the-art reconstruction quality with outstanding UTMOS scores and inherently contains richer semantic information.
no code implementations • 19 Aug 2024 • Jiaqing Liu, Chong Deng, Qinglin Zhang, Shilin Zhou, Qian Chen, Hai Yu, Wen Wang
To improve readability, we propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors and also transfer the informal text into the formal style with content preserved, utilizing contexts and auxiliary information.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Aug 2024 • Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang
In this work, we improve supervised VTS by thoroughly exploring multimodal fusion and multimodal coherence modeling.
no code implementations • 23 Jul 2024 • Canyu Zhao, MingYu Liu, Wen Wang, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen
Our approach utilizes autoregressive models for global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering.
1 code implementation • 6 Jul 2024 • Zhekai Chen, Wen Wang, Zhen Yang, Zeqing Yuan, Hao Chen, Chunhua Shen
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
3 code implementations • 4 Jul 2024 • Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, Zhangyu Xiao, Zhijie Yan, Yexin Yang, Bin Zhang, Qinglin Zhang, Shiliang Zhang, Nan Zhao, Siqi Zheng
This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs).
1 code implementation • 19 Jun 2024 • Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu, Xuandong Zhao
Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations.
2 code implementations • 17 Jun 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Shiliang Zhang, Wen Wang
SDPN assigns the representation of the augmented views of an utterance to the same prototypes as the representation of the original view, thereby enabling effective knowledge transfer between the views.
no code implementations • 17 Jun 2024 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang
The Transformer architecture has significantly advanced deep learning, particularly in natural language processing, by effectively managing long-range dependencies.
1 code implementation • 3 Jun 2024 • Shengpeng Ji, Jialong Zuo, Wen Wang, Minghui Fang, Siqi Zheng, Qian Chen, Ziyue Jiang, Hai Huang, Zehan Wang, Xize Cheng, Zhou Zhao
In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt.
no code implementations • 25 May 2024 • Wanli Ni, Ailing Zheng, Wen Wang, Dusit Niyato, Naofal Al-Dhahir, Merouane Debbah
Although reconfigurable intelligent surfaces (RISs) have demonstrated the potential to boost network capacity and expand coverage by adjusting their electromagnetic properties, existing RIS architectures have certain limitations, such as double-fading attenuation and restricted half-space coverage.
2 code implementations • CVPR 2024 • Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen
Experiments show that the images produced by our method are consistent with the given concepts and better aligned with the input text.
1 code implementation • 30 Apr 2024 • Yuchen Tian, Weixiang Yan, Qian Yang, Xuandong Zhao, Qian Chen, Wen Wang, Ziyang Luo, Lei Ma, Dawn Song
By evaluating 17 popular LLMs using this benchmark, we reveal significant differences in their accuracy and reliability in code generation, offering detailed insights for further improving the code generation capabilities of LLMs.
1 code implementation • 16 Apr 2024 • Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Tianliu He, Wen Wang
On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI inference without relying on remote servers.
2 code implementations • 29 Mar 2024 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Rongjie Huang, Chong Deng, Qian Chen, Shiliang Zhang, Wen Wang, Xihao Li
With 3D-Speaker-Toolkit, we establish a new benchmark for multimodal speaker analysis.
2 code implementations • 18 Mar 2024 • Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu
Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts.
1 code implementation • 29 Feb 2024 • Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li
Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations.
no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen
We offer a new perspective on approaching the task of video generation.
1 code implementation • 7 Dec 2023 • Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Tian Wen, Wen Wang
ALU drastically decreases the frequency of communication in federated distillation, thereby significantly reducing the communication overhead during the training process.
1 code implementation • 19 Nov 2023 • Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen
We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content.
1 code implementation • 14 Nov 2023 • Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Hari Sundaram, Shuiguang Deng
Finally, we systematically evaluate and analyze eight mainstream LLMs and demonstrate the superior breadth and challenges of CodeScope for evaluating LLMs on code understanding and generation tasks compared to other benchmarks.
1 code implementation • 8 Nov 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Shiliang Zhang, Chong Deng, Yukun Ma, Hai Yu, Jiaqing Liu, Chong Zhang
We find that applying the conventional cross-entropy loss on input speech tokens does not consistently improve the ASR performance over the Loss Masking approach.
no code implementations • 28 Oct 2023 • Wenju Sun, Qingyong Li, Wen Wang, Yangli-ao Geng
The knowledge from the plastic learner is transferred to the stable learner via cumulative parameter averaging.
1 code implementation • 18 Oct 2023 • Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang
Our approach improves the $F_1$ of the previous SOTA by 3.42 points (73.74 -> 77.16) and reduces $P_k$ by 1.11 points (15.0 -> 13.89) on WIKI-727K, and achieves an average relative reduction of 4.3% in $P_k$ on WikiSection.
1 code implementation • 18 Oct 2023 • Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen
Subsequently, we propose an additional reassembly step to seamlessly integrate the respective editing results and the non-editing region to obtain the final edited image.
1 code implementation • 8 Oct 2023 • Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang
To advance research on code translation and meet diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of programming languages for code translation.
2 code implementations • 7 Oct 2023 • Zhihao Du, JiaMing Wang, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
Previous mainstream audio-and-text LLMs use discrete audio tokens to represent both input and output audio; however, they suffer from performance degradation on tasks such as automatic speech recognition, speech-to-text translation, and speech enhancement over models using continuous speech features.
no code implementations • 4 Oct 2023 • Wen Wang, Wanli Ni, Hui Tian, Yonina C. Eldar, Rui Zhang
In this paper, we propose and study a multi-functional reconfigurable intelligent surface (MF-RIS) architecture.
no code implementations • 4 Oct 2023 • Wen Wang, Wanli Ni, Hui Tian, Naofal Al-Dhahir
To realize a self-sustainable communication system, we investigate the use of MF-RIS in improving the sum-rate of multi-user wireless networks.
1 code implementation • IEEE Transactions on Image Processing 2023 • Fangtai Guo, Tianlei Jin, Shiqiang Zhu, Xiangming Xi, Wen Wang, Qiwei Meng, Wei Song, Jiakai Zhu
Human action recognition is a driving engine of many human-computer interaction applications.
Ranked #20 on Action Recognition on NTU RGB+D
2 code implementations • 5 Aug 2023 • Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Qian Chen, Chong Deng, Shiliang Zhang, Wen Wang
To mitigate this problem, we introduce a diversity regularization term to embeddings in SDPN.
no code implementations • 14 Jul 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng
Transformer-based pre-trained language models, such as BERT, achieve great success in various natural language understanding tasks.
no code implementations • 24 May 2023 • Linhan Zhang, Qian Chen, Wen Wang, Yuxin Jiang, Bing Li, Wei Wang, Xin Cao
In this paper, we carefully design a new task called Multiple Definition Modeling (MDM) that pools together all contexts and definitions of target words.
no code implementations • 23 May 2023 • Yunzhe Li, Qian Chen, Weixiang Yan, Wen Wang, Qinglin Zhang, Hari Sundaram
Furthermore, we identify an issue of imbalanced utilization of the outline information in the precise outline-conditioned generation, which is ubiquitously observed across fine-tuned models and zero-shot inference models.
1 code implementation • 18 May 2023 • Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Chong Deng, Hai Yu, Jiaqing Liu, Yukun Ma, Chong Zhang
Prior studies diagnose the anisotropy problem in sentence representations from pre-trained language models, e.g., BERT, without fine-tuning.
3 code implementations • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)
1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen
Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.
no code implementations • 27 Mar 2023 • Jiaqing Liu, Chong Deng, Qinglin Zhang, Qian Chen, Wen Wang
We construct and release the first Chinese meeting corpus with manual action item annotations.
no code implementations • 24 Mar 2023 • Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
The ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) focuses on promoting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical to improving users' efficiency in grasping important information in meetings.
1 code implementation • 24 Mar 2023 • Qinglin Zhang, Chong Deng, Jiaqing Liu, Hai Yu, Qian Chen, Wen Wang, Zhijie Yan, Jinglin Liu, Yi Ren, Zhou Zhao
To promote SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection.
no code implementations • 7 Mar 2023 • Jinjie Ni, Yukun Ma, Wen Wang, Qian Chen, Dianwen Ng, Han Lei, Trung Hieu Nguyen, Chong Zhang, Bin Ma, Erik Cambria
Learning from massive speech corpora has led to the recent success of many self-supervised speech models.
no code implementations • 28 Feb 2023 • Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Xin Cao, Kongzhang Hao, Yuxin Jiang, Wei Wang
Experiments on the Semantic Textual Similarity benchmark (STS) show that WSBERT significantly improves sentence embeddings over BERT.
no code implementations • ICCV 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang
We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.
no code implementations • CVPR 2023 • Tianlei Jin, Fangtai Guo, Qiwei Meng, Shiqiang Zhu, Xiangming Xi, Wen Wang, Zonghao Mu, Wei Song
Therefore, at the context level, we can produce diverse context descriptions by using a context augmentation method based on the original dataset.
1 code implementation • CVPR 2023 • Wenju Sun, Qingyong Li, Jing Zhang, Wen Wang, Yangli-ao Geng
BMKP decouples the functions of learning and knowledge remembering via a bilevel-memory design: a working memory responsible for adaptive model learning, to ensure plasticity; and a long-term memory in charge of enduringly storing the knowledge incorporated within the learned model, to guarantee stability.
1 code implementation • 16 Dec 2022 • Qian Yang, Qian Chen, Wen Wang, Baotian Hu, Min Zhang
Moreover, the pipelined approaches of retrieval and generation might result in poor generation performance when retrieval performance is low.
no code implementations • 14 Dec 2022 • Jinglin Liu, Zhenhui Ye, Qian Chen, Siqi Zheng, Wen Wang, Qinglin Zhang, Zhou Zhao
Recently, binaural audio synthesis (BAS) has emerged as a promising research field for its applications in augmented and virtual realities.
1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang
In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and to specify task prompts as images as well.
Ranked #6 on Personalized Segmentation on PerSeg
6 code implementations • CVPR 2023 • Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.
Ranked #1 on Object Detection on COCO-O
no code implementations • 30 Sep 2022 • Wen Wang, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao
The extraction of sequence patterns from a collection of functionally linked unlabeled DNA sequences is known as DNA motif discovery, and it is a key task in computational biology.
1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
Ranked #15 on Audio Generation on AudioCaps (FD metric)
1 code implementation • 8 Jul 2022 • Wen Wang, Shunda Hu, Shiqiang Zhu, Wei Song, Zheyuan Lin, Tianlei Jin, Zonghao Mu, Yuanhai Zhou
To serve safely and politely, a service robot needs to track the surrounding people robustly; this is especially true for a Tour-Guide Robot (TGR).
1 code implementation • CVPR 2023 • Xu Zhang, Wen Wang, Zhe Chen, Yufei Xu, Jing Zhang, DaCheng Tao
Motivated by the progress of visual-language research, we propose that pre-trained language models (e.g., CLIP) can facilitate animal pose estimation by providing rich prior knowledge for describing animal keypoints in text.
no code implementations • 29 May 2022 • Wen Wang, Wanli Ni, Hui Tian, Zhaohui Yang, Chongwen Huang, Kai-Kit Wong
This paper investigates the use of the reconfigurable dual-functional surface to guarantee the full-space secure transmission in non-orthogonal multiple access (NOMA) networks.
no code implementations • 20 May 2022 • Qingzhong Wang, Haifang Li, Haoyi Xiong, Wen Wang, Jiang Bian, Yu Lu, Shuaiqiang Wang, Zhicong Cheng, Dejing Dou, Dawei Yin
To handle the diverse query requests from users at web scale, Baidu has made tremendous efforts to understand users' queries, retrieve relevant content from a pool of trillions of webpages, and rank the most relevant webpages at the top of the results.
1 code implementation • 30 Mar 2022 • Jiaao Zhan, Qian Chen, Boxing Chen, Wen Wang, Yu Bai, Yang Gao
We propose a novel and general Dependency-Aware Decoder (DePA) to enhance target dependency modeling in the decoder of fully NAT models from two perspectives: decoder self-attention and decoder input.
2 code implementations • 17 Mar 2022 • Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, DaCheng Tao
Besides, we introduce a simple yet effective label augmentation method to provide richer supervision and improve data efficiency.
1 code implementation • 5 Jan 2022 • Wenju Sun, Qingyong Li, Jing Zhang, Danyu Wang, Wen Wang, Yangli-ao Geng
DisCOIL follows the basic principle of POC, but it adopts variational auto-encoders (VAE) instead of other well-established one-class classifiers (e.g., deep SVDD), because a trained VAE can not only identify the probability of an input sample belonging to a class but also generate pseudo samples of the class to assist in learning new tasks.
Ranked #7 on Exemplar-Free Counting on FSC147
no code implementations • 4 Jan 2022 • Wen Wang, Shihao Wu, Ziwei Zhu, Ling Zhou, Peter X.-K. Song
Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group.
1 code implementation • Findings (ACL) 2022 • Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao
In this work, we propose a novel unsupervised embedding-based KPE approach, Masked Document Embedding Rank (MDERank), to address this problem by leveraging a mask strategy and ranking candidates by the similarity between embeddings of the source document and the masked document.
1 code implementation • ICLR 2022 • Chao-Hong Tan, Qian Chen, Wen Wang, Qinglin Zhang, Siqi Zheng, Zhen-Hua Ling
We propose a novel Pooling Network (PoNet) for token mixing in long sequences with linear complexity.
no code implementations • ICLR 2022 • Wen Wang, Yang Cao, Jing Zhang, DaCheng Tao
To this end, we propose the task adapter, which leverages self-attention to model the contextual relations between object query embeddings.
3 code implementations • ICCV 2021 • Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, Gui-Song Xia
In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending or occlusions.
1 code implementation • 27 Jul 2021 • Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, DaCheng Tao
In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains.
1 code implementation • 20 Jul 2021 • Qinglin Zhang, Qian Chen, YaLi Li, Jiaqing Liu, Wen Wang
Evaluations are conducted on the English Wiki-727K document segmentation benchmark, a Chinese Wikipedia-based document segmentation dataset we created, and an in-house Chinese spoken document dataset.
Automatic Speech Recognition (ASR) +3
no code implementations • 7 Jul 2021 • Chengzhi Jiang, Yanzhou Su, Wen Wang, Haiwei Bai, Haijun Liu, Jian Cheng
This method, named IntraLoss, explicitly performs gradient enhancement in the anisotropic region so that the intra-class distribution continues to shrink, resulting in an isotropic and more compact intra-class distribution and a larger margin between identities.
1 code implementation • ICCV 2021 • Wenyuan Xue, Baosheng Yu, Wen Wang, DaCheng Tao, Qingyong Li
A table arranging data in rows and columns is a very effective data structure, which has been widely used in business and scientific research.
1 code implementation • ACL 2021 • Yongliang Shen, Xinyin Ma, Zeqi Tan, Shuai Zhang, Wen Wang, Weiming Lu
Although these methods have the innate ability to handle nested NER, they suffer from high computational cost, ignorance of boundary information, under-utilization of the spans that partially match with entities, and difficulties in long entity recognition.
Ranked #6 on Nested Named Entity Recognition on GENIA
Chinese Named Entity Recognition, Named Entity Recognition +3
no code implementations • 25 Apr 2021 • Wen Wang, Andreas Stolcke, Jing Zheng
In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.
no code implementations • 21 Apr 2021 • Qian Chen, Wen Wang, Qinglin Zhang
In this paper, we propose a novel joint textual-phonetic pre-training approach for learning spoken language representations, aiming to exploit the full potential of phonetic information to improve SLU robustness to ASR errors.
Automatic Speech Recognition (ASR) +7
no code implementations • 21 Apr 2021 • Qian Chen, Wen Wang, Mengzhe Chen, Qinglin Zhang
Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of the ASR transcripts and the performance of downstream natural language processing applications.
Automatic Speech Recognition (ASR) +1
no code implementations • 5 Mar 2021 • Wei Zhang, Zeyuan Chen, Chao Dong, Wen Wang, Hongyuan Zha, Jianyong Wang
However, they encounter two main limitations: (1) Correlations between answers in the same question are often overlooked.
no code implementations • 3 Dec 2020 • Wen Wang, Honglei Zhuang, Mi Zhou, Hanyu Liu, Beibei Li
Based on these insights, we then propose a hierarchical course BERT model to predict teachers' performance in online education.
1 code implementation • 11 Mar 2020 • Yue Zhao, Xiyang Hu, Cheng Cheng, Cong Wang, Changlin Wan, Wen Wang, Jianing Yang, Haoping Bai, Zheng Li, Cao Xiao, Yunlong Wang, Zhi Qiao, Jimeng Sun, Leman Akoglu
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples, with numerous high-stakes applications including fraud detection and intrusion detection.
no code implementations • 7 Mar 2020 • Wen Wang, Xiaojiang Peng, Yanzhou Su, Yu Qiao, Jian Cheng
Video action anticipation aims to predict future action categories from observed frames.
no code implementations • 3 Mar 2020 • Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu
We explore different transfer learning approaches to reduce dependency on data collection and annotation.
Spoken Language Understanding, Task-Oriented Dialogue Systems +2
1 code implementation • 3 Mar 2020 • Qian Chen, Wen Wang
The noetic end-to-end response selection challenge as one track in the 7th Dialog System Technology Challenges (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.
no code implementations • 3 Mar 2020 • Qian Chen, Mengzhe Chen, Bo Li, Wen Wang
With the increased applications of automatic speech recognition (ASR) in recent years, it is essential to automatically insert punctuation marks and remove disfluencies in transcripts, to improve the readability of the transcripts as well as the performance of subsequent applications, such as machine translation, dialogue systems, and so forth.
Automatic Speech Recognition (ASR) +3
1 code implementation • 19 Feb 2020 • Wen Wang, Wei Zhang, Shukai Liu, Qi Liu, Bo Zhang, Leyu Lin, Hongyuan Zha
Specifically, we build a Multi-Relational Item Graph (MRIG) based on all behavior sequences from all sessions, involving target and auxiliary behavior types.
1 code implementation • 21 Jan 2020 • Wen Wang, Xiaojiang Peng, Yu Qiao, Jian Cheng
Online action detection (OAD) is a practical yet challenging task, which has attracted increasing attention in recent years.
no code implementations • 28 Nov 2019 • Wen Wang, Lijun Du, Yinxing Gao, Yanzhou Su, Feng Wang, Jian Cheng
Concretely, for remote sensing image scene classification, we would like to map images from the same scene to feature vectors that are close, and map images from different scenes to feature vectors that are widely separated.
no code implementations • 19 Aug 2019 • Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling
We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models.
Ranked #27 on Common Sense Reasoning on CommonsenseQA
no code implementations • 10 Jun 2019 • Hao Lang, Wen Wang
The RBSMA algorithm improves the test set accuracy by 7.8% relative compared to the standard beam search.
no code implementations • 30 May 2019 • Haijun Liu, Jian Cheng, Shiguang Wang, Wen Wang
Unlike existing cross-domain Re-ID methods, leveraging the auxiliary information of those unlabeled target-domain data, we aim at enhancing the model generalization and adaptation by discriminative feature learning, and directly exploiting a pre-trained model to new domains (datasets) without any utilization of the information from target domains.
no code implementations • 30 May 2019 • Haijun Liu, Jian Cheng, Wen Wang, Yanzhou Su
A large number of loss functions based on pair distances have been presented in the literature for guiding the training of deep metric learning.
16 code implementations • 28 Feb 2019 • Qian Chen, Zhu Zhuo, Wen Wang
Intent classification and slot filling are two essential tasks for natural language understanding.
Ranked #3 on Slot Filling on ATIS
4 code implementations • 9 Jan 2019 • Qian Chen, Wen Wang
The noetic end-to-end response selection challenge as one track in Dialog System Technology Challenges 7 (DSTC7) aims to push the state of the art of utterance classification for real world goal-oriented dialog systems, for which participants need to select the correct next utterances from a set of candidates for the multi-turn context.
Ranked #1 on Conversational Response Selection on Advising Corpus
no code implementations • 29 Nov 2018 • Wen Wang, Rema Padman, Nirav Shah
Stratifying patients at risk for postoperative complications may facilitate timely and accurate workups and reduce the burden of adverse events on patients and the health system.
no code implementations • 19 Oct 2018 • Wen Wang, Yongjian Wu, Haijun Liu, Shiguang Wang, Jian Cheng
Temporal action detection aims at not only recognizing action category but also detecting start time and end time for each action instance in an untrimmed video.
no code implementations • 16 Feb 2018 • Vikramjit Mitra, Wen Wang, Chris Bartels, Horacio Franco, Dimitra Vergyri
This paper explores the use of multi-view features and their discriminative transforms in a convolutional deep neural network (CNN) architecture for a continuous large vocabulary speech recognition task.
no code implementations • CVPR 2017 • Wen Wang, Ruiping Wang, Shiguang Shan, Xilin Chen
For face recognition with image sets, while most existing works mainly focus on building robust set models with hand-crafted features, it remains a research gap to learn better image representations which can closely match the subsequent image set modeling and classification.
no code implementations • 7 Sep 2015 • Katrin Kirchhoff, Bing Zhao, Wen Wang
Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language.
no code implementations • CVPR 2015 • Wen Wang, Ruiping Wang, Zhiwu Huang, Shiguang Shan, Xilin Chen
This paper presents a method named Discriminant Analysis on Riemannian manifold of Gaussian distributions (DARG) to solve the problem of face recognition with image sets.
no code implementations • 10 Feb 2014 • Wen Wang, Zhen Cui, Hong Chang, Shiguang Shan, Xilin Chen
In this paper, we propose a simple but effective coupled neural network, called Deeply Coupled Autoencoder Networks (DCAN), which seeks to build two deep neural networks, coupled with each other in every corresponding layer.