no code implementations • 28 May 2025 • Jiaxi Yang, Mengqi Zhang, Yiqiao Jin, Hao Chen, Qingsong Wen, Lu Lin, Yi He, Weijie Xu, James Evans, Jindong Wang
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence.
no code implementations • 23 May 2025 • Yue Xiao, Yi He, Yaqing Zhang, Xin Lin, Ming Zhang
Under extreme conditions, autonomous drifting enables vehicles to follow predefined paths at large slip angles, significantly enhancing the control system's capability to handle hazardous scenarios.
no code implementations • 12 Mar 2025 • Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Xiaohui Chen, Yi He, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov, Chen Zhao
Foundation models have emerged as a powerful paradigm in computational pathology (CPath), enabling scalable and generalizable analysis of histopathological images.
no code implementations • 5 Mar 2025 • Yi He, Lei Yang, Shilin Wang
This paper introduces a novel approach to Visual Forced Alignment (VFA), aiming to accurately synchronize utterances with corresponding lip movements, without relying on audio cues.
no code implementations • 16 Feb 2025 • Ruoyu Zhang, Lulu Wang, Yi He, Tongling Pan, Zhengtao Yu, Yingna Li
Recent advancements in large language models (LLMs) have significantly enhanced the fluency and logical coherence of image captioning.
no code implementations • 26 Jan 2025 • Manzong Huang, Chenyang Bu, Yi He, Xindong Wu
Knowledge Graph (KG)-augmented Large Language Models (LLMs) have recently propelled significant advances in complex reasoning tasks, thanks to their broad domain knowledge and contextual awareness.
1 code implementation • 23 Jan 2025 • Yiming Yang, Xiaoyuan Cheng, Daniel Giles, Sibo Cheng, Yi He, Xiao Xue, Boli Chen, Yukun Hu
In this paper, we propose Tensor-Var to address these challenges using kernel Conditional Mean Embedding (CME).
no code implementations • 10 Jan 2025 • Yi He, Shengqi Dang, Long Ling, Ziqing Qian, Nanxuan Zhao, Nan Cao
In this work, we introduce the new task of continuous emotional image content generation (C-EICG) and present EmotiCrafter, an emotional image generation model that generates images based on text prompts and Valence-Arousal values.
no code implementations • 29 Dec 2024 • Lujia Lv, Di Wu, Yangyi Xia, Jia Wu, Xiaojing Liu, Yi He
Object detection is a key technology for selecting ingredients and evaluating the quality of dishes in the pre-made dishes industry.
no code implementations • 30 Oct 2024 • Fulai Yang, Di Wu, Yi He, Li Tao, Xin Luo
However, existing approaches consider these relationships and mechanisms only loosely, through a non-end-to-end learning framework, resulting in sub-optimal feature extraction and fusion for CD.
1 code implementation • 17 Aug 2024 • Leizhen Zhang, Lusi Li, Di Wu, Sheng Chen, Yi He
The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associational feature correlation, such that bias may be leaked from those seemingly admissible, non-protected features.
1 code implementation • 1 Jul 2024 • Kehinde Ajayi, Leizhen Zhang, Yi He, Jian Wu
This paper proposes a method for uncertainty quantification (UQ) of table structure recognition (TSR).
no code implementations • 1 Mar 2024 • Huaqing Yuan, Yi He, Peng Du, Lu Song
Face images contain a wide variety of attribute information.
1 code implementation • 29 Aug 2023 • Sotirios Kastanas, Shaomu Tan, Yi He
In this study, we aim to fill these gaps by conducting a comparative evaluation of state-of-the-art models in document layout analysis and investigating the potential of cross-lingual layout analysis by utilizing machine translation techniques.
no code implementations • 9 Jun 2023 • Xianzhao Chen, Yist Y. Lin, Kang Wang, Yi He, Zejun Ma
In this paper, we improve the frame-level classifier for word timings in an E2E system by introducing label priors in the connectionist temporal classification (CTC) loss, an approach adopted from prior work, and by combining low-level Mel-scale filter banks with high-level ASR encoder output as the input feature.
Automatic Speech Recognition (ASR)
no code implementations • 1 Mar 2023 • Yi He, Haoran Xie, Kazunori Miyata
In this study, we propose Sketch2Cloth, a sketch-based 3D garment generation system using the unsigned distance fields from the user's sketch input.
no code implementations • 20 Dec 2022 • Cheng Liang, Teng Huang, Yi He, Song Deng, Di Wu, Xin Luo
The idea of the proposed MMA is mainly two-fold: 1) apply different $L_p$-norms to the loss function and regularization to form different variant models in different metric spaces, and 2) aggregate these variant models.
no code implementations • 28 Oct 2022 • Yist Y. Lin, Tao Han, HaiHua Xu, Van Tung Pham, Yerbolat Khassanov, Tze Yuang Chong, Yi He, Lu Lu, Zejun Ma
One limitation of the end-to-end automatic speech recognition (ASR) framework is that its performance is compromised when training and test utterance lengths are mismatched.
1 code implementation • 26 Oct 2022 • Hexin Liu, HaiHua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur
The comparison of the proposed methods indicates that incorporating language information is more effective than disentangling for reducing language confusion in CS speech.
Automatic Speech Recognition (ASR)
no code implementations • 2 Aug 2022 • Feilong Chen, Di Wu, Jie Yang, Yi He
In many real applications, such as intelligent healthcare platforms, streaming features always have some missing data, which raises a crucial challenge in conducting OSFS, i.e., how to establish the uncertain relationship between sparse streaming features and labels.
no code implementations • 9 Jul 2022 • Yizhou Peng, Yufei Liu, Jicheng Zhang, HaiHua Xu, Yi He, Hao Huang, Eng Siong Chng
More importantly, we train an end-to-end (E2E) speech recognition model by means of merging two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR.
no code implementations • 9 Jul 2022 • Jicheng Zhang, Yizhou Peng, HaiHua Xu, Yi He, Eng Siong Chng, Hao Huang
Intermediate layer output (ILO) regularization by means of multitask training on the encoder side has been shown to be an effective approach to yielding improved results on a wide range of end-to-end ASR frameworks.
1 code implementation • 13 Jun 2022 • Yi He, Xi Yang, Chia-Ming Chang, Haoran Xie, Takeo Igarashi
Attention guidance is an approach to addressing dataset bias in deep learning, where the model relies on incorrect features to make decisions.
1 code implementation • 25 Apr 2022 • Heng Lian, John Scovil Atwood, BoJian Hou, Jian Wu, Yi He
This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away.
no code implementations • 16 Apr 2022 • Di Wu, Yi He, Xin Luo
A high-dimensional and sparse (HiDS) matrix is frequently encountered in big data-related applications such as e-commerce systems or social network services systems.
no code implementations • 16 Apr 2022 • Di Wu, Peng Zhang, Yi He, Xin Luo
High-dimensional and sparse (HiDS) matrices are omnipresent in a variety of big data-related applications.
no code implementations • 26 Jan 2022 • Yufei Liu, Rao Ma, HaiHua Xu, Yi He, Zejun Ma, Weibin Zhang
In this paper, we propose two novel approaches to estimate the ILM based on the Listen-Attend-Spell (LAS) framework.
1 code implementation • 17 Sep 2021 • Jiancheng Yang, Yi He, Kaiming Kuang, Zudi Lin, Hanspeter Pfister, Bingbing Ni
The proposed A3D consistently outperforms symmetric context fusion operators by considerable margins, and establishes a new state of the art on DeepLesion.
no code implementations • 23 Apr 2021 • Yi He, Haoran Xie, Chao Zhang, Xi Yang, Kazunori Miyata
This paper proposes a deep generative model for generating normal maps from user sketches with geometric sampling.
no code implementations • 3 Nov 2020 • Mingkun Huang, Jun Zhang, Meng Cai, Yang Zhang, Jiali Yao, Yongbin You, Yi He, Zejun Ma
In this work, we analyze the cause of the huge gradient variance in RNN-T training and propose a new normalized jointer network to overcome it.
Automatic Speech Recognition (ASR)
no code implementations • 3 Nov 2020 • Mingkun Huang, Meng Cai, Jun Zhang, Yang Zhang, Yongbin You, Yi He, Zejun Ma
In this work we propose an inference technique, asynchronous revision, to unify streaming and non-streaming speech recognition models.
no code implementations • MIDL 2019 • Wanyue Li, Wen Kong, YiWei Chen, Jing Wang, Yi He, Guohua Shi, Guohua Deng
Fluorescein angiography can provide a map of retinal vascular structure and function and is commonly used in ophthalmological diagnosis; however, this imaging modality may pose risks of harm to patients.
1 code implementation • 5 May 2020 • Jiancheng Yang, Yi He, Xiaoyang Huang, Jingwei Xu, Xiaodan Ye, Guangyu Tao, Bingbing Ni
This paper addresses a fundamental challenge in 3D medical image processing: how to deal with imaging thickness.
no code implementations • MIDL 2019 • Jing Wang, YiWei Chen, Wanyue Li, Wen Kong, Yi He, Chunhui Jiang, Guohua Shi
A deep neural network (DNN) can assist in retinopathy screening by automatically classifying patients into normal and abnormal categories according to optical coherence tomography (OCT) images.
2 code implementations • 24 Nov 2019 • Jiancheng Yang, Xiaoyang Huang, Yi He, Jingwei Xu, Canqian Yang, Guozheng Xu, Bingbing Ni
Theoretically, any 2D CNN (e.g., ResNet, DenseNet, or DeepLab) can be converted into a 3D ACS CNN, with pretrained weights of the same parameter size.
no code implementations • ICCV 2019 • Yi He, Jiayuan Shi, Chuan Wang, Haibin Huang, Jiaming Liu, Guanbin Li, Risheng Liu, Jue Wang
In this paper we present a new data-driven method for robust skin detection from a single human portrait image.
no code implementations • 16 Aug 2018 • Risheng Liu, Shichao Cheng, Yi He, Xin Fan, Zhouchen Lin, Zhongxuan Luo
Moreover, there is a lack of rigorous analysis of the convergence behaviors of these reimplemented iterations, and thus the significance of such methods remains somewhat unclear.
no code implementations • 31 Jul 2018 • Risheng Liu, Yi He, Shichao Cheng, Xin Fan, Zhongxuan Luo
Blind image deblurring plays a very important role in many vision and multimedia applications.
no code implementations • 28 Apr 2018 • Risheng Liu, Shichao Cheng, Yi He, Xin Fan, Zhongxuan Luo
Operator splitting methods have been successfully used in computational sciences, statistics, learning and vision areas to reduce complex problems into a series of simpler subproblems.
no code implementations • 17 May 2017 • Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei
The system, which combines frame retaining with frame stacking, can reduce the time consumption of both training and decoding.
no code implementations • 21 Mar 2017 • Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei, Peihao Wu, Wenchang Situ, Shuai Li, Yang Zhang
It is a competitive framework: LSTM models of more than 7 layers are successfully trained on Shenma voice search data in Mandarin, and they outperform the deep LSTM models trained by the conventional approach.
no code implementations • 3 Mar 2017 • Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei
With the rapid growth of training data, large-scale parallel training on multi-GPU clusters is now widely applied in neural network model learning. We present a new approach that applies the exponential moving average method to large-scale parallel training of neural network models.