1 code implementation • 2 Oct 2024 • Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang
Our experiments on various datasets confirm the correspondence between our identified skill paths and language skills, and validate three longstanding hypotheses: 1) Language skills are identifiable through circuit dissection; 2) Simple language skills reside in shallow layers, whereas complex language skills are found in deeper layers; 3) Complex language skills are formed on top of simpler language skills.
no code implementations • 26 Sep 2024 • Shifu Xiong, Mengzhi Wang, Genshun Wan, Hang Chen, Jianqing Gao, LiRong Dai
In this work, we propose deep CLAS to exploit contextual information more effectively.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4
1 code implementation • 5 Sep 2024 • Genshun Wan, Mengzhi Wang, Tingzhi Mao, Hang Chen, Zhongfu Ye
This enables the lightweight transducer to achieve results comparable to the standard transducer.
Ranked #7 on Speech Recognition on AISHELL-1
no code implementations • 27 Jul 2024 • Yinheng Li, Han Ding, Hang Chen
We provide a comprehensive review of common data processing techniques used in modern multimodal model training, with a focus on diffusion models and multimodal large language models (MLLMs).
no code implementations • 26 Jul 2024 • Han Ding, Yinheng Li, Junhao Wang, Hang Chen
In this survey, we provide a comprehensive review of the current research on using LLMs as agents in financial trading.
1 code implementation • 17 Jul 2024 • Hang Chen, Collin Meese, Mark Nejad, Chien-Chung Shen
Low-latency traffic prediction is vital for smart city traffic management.
no code implementations • 6 Jul 2024 • Hang Chen, Sankepally Sainath Reddy, Ziwei Chen, Dianbo Liu
The dimensionality of the embeddings and the number of available embeddings (also called the codebook size) are critical factors influencing the performance of Vector Quantization (VQ), a discretization process used in many models such as the Vector Quantized Variational Autoencoder (VQ-VAE) architecture.
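As a minimal illustration of the two factors named above (embedding dimensionality and codebook size), a nearest-neighbour vector-quantization step can be sketched as follows; the array shapes and random values are illustrative, not the paper's configuration:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z:        (n, d) encoder outputs
    codebook: (K, d) embeddings; K is the codebook size and d the
              embedding dimensionality, the two factors studied here.
    """
    # squared Euclidean distance between every output and every code
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)        # discrete code assignments
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K = 8 codes of dimension d = 4
z = rng.normal(size=(5, 4))
zq, codes = vector_quantize(z, codebook)
```

Growing K gives finer discretization at higher lookup cost, while d controls how much information each code can carry; balancing the two is exactly the trade-off the paper examines.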
no code implementations • 27 Jun 2024 • Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin
Here, we propose an optical super-oscillatory diffractive neural network, i.e., SODNN, that can achieve super-resolved spatial resolution for imaging beyond the diffraction limit with superior performance over existing methods.
1 code implementation • 21 May 2024 • Hang Chen, Xinyu Yang, Jiaying Zhu, Wenya Wang
Empirical results show that (1) our method gives consistent measurements which align with existing observations based on performance metrics, validating the effectiveness of our emergence quantification; (2) our proposed metric uncovers novel emergence patterns such as the correlations between the variance of our metric and the number of "shots" in ICL, which further suggests a new way of interpreting hallucinations in LLMs; (3) we offer a potential solution towards estimating the emergence of larger and closed-resource LMs via smaller LMs like GPT-2.
1 code implementation • 27 Apr 2024 • Yuhang Gan, Wenjie Xuan, Hang Chen, Juhua Liu, Bo Du
The C2FG module aims to seamlessly integrate the side prediction from the previous coarse scale into the current fine-scale prediction in a coarse-to-fine manner, while the LF module assumes that the contribution of each stage and each spatial location is independent, and thus designs a learnable module to fuse multiple predictions.
Ranked #13 on Change Detection on WHU-CD
1 code implementation • CVPR 2024 • Yusheng Dai, Hang Chen, Jun Du, Ruoyu Wang, Shihao Chen, Jiefeng Ma, Haotian Wang, Chin-Hui Lee
In this paper, we investigate this contrasting phenomenon from the perspective of modality bias and reveal that an excessive modality bias on the audio caused by dropout is the underlying reason.
no code implementations • CVPR 2024 • Yun Li, Zhe Liu, Hang Chen, Lina Yao
Our framework evaluates the specificity of attributes by considering the diversity of objects they apply to and their related context.
1 code implementation • 16 Jan 2024 • Hang Chen, Xinyu Yang, Keqing Du
These strengths make the probabilistic model capable of overcoming challenges brought by the coexistence of multi-structure data and multi-value representations, and pave the way for the extension to latent confounders.
no code implementations • 26 Dec 2023 • Hang Chen, Yuchuan Jang, Weijie Zhou, Cristian Meo, Ziwei Chen, Dianbo Liu
Individuals, despite having varied life experiences and learning processes, can communicate effectively through languages.
no code implementations • 21 Nov 2023 • Keqing Du, Xinyu Yang, Hang Chen
CASR works by reducing the difference between the causal adjacency matrix we constructed and the pre-segmentation results of the backbone models.
no code implementations • 2 Nov 2023 • Hang Chen, Keqing Du, Chenguang Li, Xinyu Yang
The fusion of causal models with deep learning, which introduces increasingly intricate datasets such as the causal associations within images or between textual components, has surfaced as a focal research area.
no code implementations • 28 Oct 2023 • Hang Chen, Xinyu Yang, Keqing Du
The cross-pollination of deep learning and causal discovery has catalyzed a burgeoning field of research seeking to elucidate causal relationships within non-statistical data forms like images, videos, and text.
no code implementations • 28 Sep 2023 • Yinheng Li, Shaofei Wang, Han Ding, Hang Chen
In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption.
no code implementations • 15 Sep 2023 • Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into enhancing the accuracy of back-end speech recognition systems through AVTSE in challenging and real acoustic environments.
no code implementations • 11 Sep 2023 • Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng
Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion.
no code implementations • 28 Aug 2023 • Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee
This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios.
1 code implementation • 14 Aug 2023 • Yusheng Dai, Hang Chen, Jun Du, Xiaofei Ding, Ning Ding, Feijun Jiang, Chin-Hui Lee
In this paper, we propose two novel techniques to improve audio-visual speech recognition (AVSR) under a pre-training and fine-tuning training framework.
Audio-Visual Speech Recognition • Automatic Speech Recognition +2
no code implementations • 17 Jul 2023 • Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee
Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios.
1 code implementation • 28 May 2023 • Hang Chen, Bingyu Liao, Jing Luo, Wenjing Zhu, Xinyu Yang
Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models, including large language models (LLMs).
no code implementations • 27 May 2023 • Xiao Li, Hang Chen, Xiaolin Hu
We argue that using adversarially pre-trained backbone networks is essential for enhancing the adversarial robustness of object detectors.
1 code implementation • 4 May 2023 • Hang Chen, Jing Luo, Xinyu Yang, Wenjing Zhu
We incorporate noise terms into the conversation process, thereby constructing a structural causal model (SCM).
no code implementations • 4 May 2023 • Hang Chen, Xinyu Yang, Qing Yang
We implement the above designs as a dynamic variational inference model, tailored to learn causal representation from indefinite data under latent confounding.
2 code implementations • 21 Dec 2022 • Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu
Then, inspired by the large number of connections between cortical regions and the thalamus, the model fuses the auditory and visual information in a thalamic subnetwork through top-down connections.
Ranked #3 on Speech Separation on VoxCeleb2
1 code implementation • 9 Dec 2022 • Ziyang Zheng, Zhengyang Duan, Hang Chen, Rui Yang, Sheng Gao, Haiou Zhang, Hongkai Xiong, Xing Lin
Photonic neural network (PNN) is a remarkable analog artificial intelligence (AI) accelerator that computes with photons instead of electrons to feature low latency, high energy efficiency, and high parallelism.
no code implementations • 7 Dec 2022 • Pengcheng Li, Genshun Wan, Fenglin Ding, Hang Chen, Jianqing Gao, Jia Pan, Cong Liu
Speech pre-training has shown great success in learning useful and general latent representations from large-scale unlabeled data.
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
no code implementations • 7 Dec 2022 • Genshun Wan, Tan Liu, Hang Chen, Jia Pan, Cong Liu, Zhongfu Ye
Self-supervised learning (SSL) models have achieved considerable improvements in automatic speech recognition (ASR).
Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
no code implementations • 30 Nov 2022 • Zhengyang Duan, Hang Chen, Xing Lin
By encoding multi-task inputs into multi-wavelength channels, the system can increase the computing throughput and significantly alleviate the competition to perform multiple tasks in parallel with high accuracy.
no code implementations • 26 Oct 2022 • Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee
In this paper, we propose a deep learning based multi-speaker direction of arrival (DOA) estimation with audio and visual signals by using permutation-free loss function.
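The "permutation-free loss function" mentioned above addresses the ambiguity of speaker ordering: the network's outputs are scored against every assignment of targets to speakers, and the best match is used. A generic sketch of this idea (not the paper's exact formulation; the azimuth values are illustrative):

```python
import itertools
import numpy as np

def permutation_free_loss(pred, target):
    """Permutation-invariant MSE: evaluate the predictions against
    every speaker ordering of the targets and keep the lowest loss.

    pred, target: (S, D) arrays -- S speakers, D-dim DOA outputs.
    """
    S = pred.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(S)):
        loss = np.mean((pred - target[list(perm)]) ** 2)
        best = min(best, loss)
    return best

pred = np.array([[10.0], [80.0]])    # predicted azimuths (degrees)
target = np.array([[80.0], [10.0]])  # same speakers, listed in swapped order
loss = permutation_free_loss(pred, target)
```

Because the swapped ordering matches the predictions exactly, the loss is zero; a naive ordered MSE would instead penalize the model heavily for an arbitrary labeling choice.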
no code implementations • 26 Sep 2022 • Yun Zhao, Hang Chen, Min Lin, Haiou Zhang, Tao Yan, Xing Lin, Ruqi Huang, Qionghai Dai
Increasing the layer number of on-chip photonic neural networks (PNNs) is essential to improving their model performance.
no code implementations • 14 Sep 2022 • Hang Chen, Keqing Du, Xinyu Yang, Chenguang Li
Understanding causality helps to structure interventions to achieve specific goals and enables predictions under interventions.
no code implementations • 29 Aug 2022 • Hang Chen, Xinyu Yang, Xiang Li
To make it learnable in practice, we propose a general clause-level encoding model named EA-GAT, comprising E-GAT and Activation Sort.
2 code implementations • CVPR 2021 • Chufeng Tang, Hang Chen, Xiao Li, Jianmin Li, Zhaoxiang Zhang, Xiaolin Hu
Tremendous efforts have been made on instance segmentation but the mask quality is still not satisfactory.
2 code implementations • 9 Jan 2021 • Hang Chen, Syed Ali Asif, Jihong Park, Chien-Chung Shen, Mehdi Bennis
Federated learning (FL) is a promising distributed learning solution that only exchanges model parameters without revealing raw data.
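The parameter-only exchange described here is commonly realized with FedAvg-style aggregation, where the server averages client updates weighted by local dataset size; a minimal sketch of that standard baseline (not necessarily this paper's exact scheme) with illustrative 1-D "models":

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Server-side aggregation: average client model parameters,
    weighted by local dataset size. Only parameters cross the
    network; raw data never leaves the clients.
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# two clients holding equally sized local datasets
w1, w2 = np.array([1.0, 3.0]), np.array([3.0, 5.0])
global_w = fedavg([w1, w2], client_sizes=[50, 50])
```

With equal dataset sizes this reduces to a plain mean of the client parameters; unequal sizes bias the global model toward clients with more data.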
no code implementations • 28 Dec 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Chin-Hui Lee, Bao-Cai Yin
In this paper, we propose a novel deep learning architecture to improve word-level lip-reading.
no code implementations • 19 Oct 2020 • Hang Chen, Guoqiang Yao, Jianhao Hu
Based on the influence of moment matching and parameter selection on the performance of the EP MIMO detector, we propose a modified EP MIMO detector (MEPD).
no code implementations • 21 Sep 2020 • Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee
We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE).
no code implementations • WS 2019 • Hoang Van, Ahmad Musa, Hang Chen, Stephen Kobourov, Mihai Surdeanu
Second, we investigate the effect of socioeconomic factors (income, poverty, and education) on predicting state-level T2DM rates.
no code implementations • 14 Jan 2019 • Yi Zhen, Hang Chen, Xu Zhang, Meng Liu, Xin Meng, Jian Zhang, Jiantao Pu
To investigate whether and to what extent central serous chorioretinopathy (CSC) depicted on color fundus photographs can be assessed using deep learning technology.