no code implementations • CCL 2020 • Kunli Zhang, Xu Zhao, Lei Zhuang, Qi Xie, Hongying Zan
In this paper, we treat the diagnosis assistant as a multi-label classification task and propose a Knowledge-Enabled Diagnosis Assistant (KEDA) model for the obstetric diagnosis assistant.
no code implementations • 19 Jun 2025 • Xu Zhao, Chen Zhao, Xiantao Hu, Hongliang Zhang, Ying Tai, Jian Yang
Recent advancements in multi-scale architectures have demonstrated exceptional performance in image denoising tasks.
1 code implementation • 10 Jun 2025 • Ailin Huang, Bingxin Li, Bruce Wang, Boyong Wu, Chao Yan, Chengli Feng, Heng Wang, HongYu Zhou, Hongyuan Wang, Jingbei Li, Jianjian Sun, Joanna Wang, Mingrui Chen, Peng Liu, Ruihang Miao, Shilei Jiang, Tian Fei, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Ge, Zheng Gong, Zhewei Huang, Zixin Zhang, Bin Wang, Bo Li, Buyun Ma, Changxin Miao, Changyi Wan, Chen Xu, Dapeng Shi, Dingyuan Hu, Enle Liu, Guanzhe Huang, Gulin Yan, Hanpeng Hu, Haonan Jia, Jiahao Gong, Jiaoren Wu, Jie Wu, Jie Yang, Junzhe Lin, Kaixiang Li, Lei Xia, Longlong Gu, Ming Li, Nie Hao, Ranchen Ming, Shaoliang Pang, SiQi Liu, Song Yuan, Tiancheng Cao, Wen Li, Wenqing He, Xu Zhao, Xuelin Zhang, Yanbo Yu, Yinmin Zhong, Yu Zhou, Yuanwei Liang, Yuanwei Lu, Yuxiang Yang, Zidong Yang, Zili Zhang, Binxing Jiao, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Daxin Jiang, Shuchang Zhou, Chen Hu
This work contributes a promising solution for end-to-end LALMs and highlights the critical role of token-based vocoder in enhancing overall performance for AQAA tasks.
no code implementations • 6 Jun 2025 • Yesheng Zhang, Wenjian Sun, YuHeng Chen, Qingwei Liu, Qi Lin, Rui Zhang, Xu Zhao
To tackle the issue, this paper proposes a metric, termed Trajectory Entropy, to reveal the game status of agents within the level-k game framework.
no code implementations • 10 Apr 2025 • Xu Zhao, Pengju Zhang, Bo Liu, Yihong Wu
Monocular 3D occupancy prediction, aiming to predict the occupancy and semantics within interesting regions of 3D scenes from only 2D images, has garnered increasing attention recently for its vital role in 3D scene understanding.
no code implementations • 25 Feb 2025 • Chengkun Cai, Haoliang Liu, Xu Zhao, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Serge Belongie, Lei LI
In the rapidly evolving field of image generation, achieving precise control over generated content and maintaining semantic consistency remain significant limitations, particularly concerning grounding techniques and the necessity for model fine-tuning.
1 code implementation • 17 Feb 2025 • Ailin Huang, Boyong Wu, Bruce Wang, Chao Yan, Chen Hu, Chengli Feng, Fei Tian, Feiyu Shen, Jingbei Li, Mingrui Chen, Peng Liu, Ruihang Miao, Wang You, Xi Chen, Xuerui Yang, Yechang Huang, Yuxiang Zhang, Zheng Gong, Zixin Zhang, HongYu Zhou, Jianjian Sun, Brian Li, Chengting Feng, Changyi Wan, Hanpeng Hu, Jianchang Wu, Jiangjie Zhen, Ranchen Ming, Song Yuan, Xuelin Zhang, Yu Zhou, Bingxin Li, Buyun Ma, Hongyuan Wang, Kang An, Wei Ji, Wen Li, Xuan Wen, Xiangwen Kong, Yuankai Ma, Yuanwei Liang, Yun Mou, Bahtiyar Ahmidi, Bin Wang, Bo Li, Changxin Miao, Chen Xu, Chenrun Wang, Dapeng Shi, Deshan Sun, Dingyuan Hu, Dula Sai, Enle Liu, Guanzhe Huang, Gulin Yan, Heng Wang, Haonan Jia, Haoyang Zhang, Jiahao Gong, Junjing Guo, Jiashuai Liu, Jiahong Liu, Jie Feng, Jie Wu, Jiaoren Wu, Jie Yang, Jinguo Wang, Jingyang Zhang, Junzhe Lin, Kaixiang Li, Lei Xia, Li Zhou, Liang Zhao, Longlong Gu, Mei Chen, Menglin Wu, Ming Li, Mingxiao Li, Mingliang Li, Mingyao Liang, Na Wang, Nie Hao, Qiling Wu, Qinyuan Tan, Ran Sun, Shuai Shuai, Shaoliang Pang, Shiliang Yang, Shuli Gao, Shanshan Yuan, SiQi Liu, Shihong Deng, Shilei Jiang, Sitong Liu, Tiancheng Cao, Tianyu Wang, Wenjin Deng, Wuxun Xie, Weipeng Ming, Wenqing He, Wen Sun, Xin Han, Xin Huang, Xiaomin Deng, Xiaojia Liu, Xin Wu, Xu Zhao, Yanan Wei, Yanbo Yu, Yang Cao, Yangguang Li, Yangzhen Ma, Yanming Xu, Yaoyu Wang, Yaqiang Shi, Yilei Wang, Yizhuang Zhou, Yinmin Zhong, Yang Zhang, Yaoben Wei, Yu Luo, Yuanwei Lu, Yuhe Yin, Yuchu Luo, Yuanhao Ding, Yuting Yan, Yaqi Dai, Yuxiang Yang, Zhe Xie, Zheng Ge, Zheng Sun, Zhewei Huang, Zhichao Chang, Zhisheng Guan, Zidong Yang, Zili Zhang, Binxing Jiao, Daxin Jiang, Heung-Yeung Shum, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu
Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following.
3 code implementations • 14 Feb 2025 • Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, SiQi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang
We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length.
1 code implementation • 10 Feb 2025 • Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
Outliers have been widely observed in Large Language Models (LLMs), significantly impacting model performance and posing challenges for model compression.
1 code implementation • 20 Dec 2024 • Xiantao Hu, Ying Tai, Xu Zhao, Chen Zhao, Zhenyu Zhang, Jun Li, Bineng Zhong, Jian Yang
These temporal information tokens are used to guide the localization of the target in the next time state, establish long-range contextual relationships between video frames, and capture the temporal trajectory of the target.
Ranked #5 on RGB-T Tracking on LasHeR
1 code implementation • 25 Nov 2024 • Xin He, Haiyun Guo, Kuan Zhu, Bingke Zhu, Xu Zhao, Jianwu Fang, Jinqiao Wang
Lane detection plays an important role in autonomous driving perception systems.
no code implementations • 30 Oct 2024 • Zhenyi Hou, Xu Zhao, Kejie Ye, Xinyu Sheng, Shanggerile Jiang, Jiajing Xia, YiTao Zhang, Chenxi Ban, Daijun Luo, Jiaxing Chen, Yan Zou, Yuchao Feng, Guangyu Fan, Xin Yuan
Vocal education in the music field is difficult to quantify due to the individual differences in singers' voices and the different quantitative criteria of singing techniques.
no code implementations • 21 Oct 2024 • Yong Deng, Baoxing Li, Xu Zhao
Simultaneously, to compensate for local ambiguity in images, a temporal transformer is utilized to extract temporal features from adjacent frames.
no code implementations • 3 Oct 2024 • Chengkun Cai, Xu Zhao, Haoliang Liu, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei LI
Large Language Models (LLMs) have achieved substantial progress in artificial intelligence, particularly in reasoning tasks.
1 code implementation • 1 Aug 2024 • Yesheng Zhang, Shuhan Shen, Xu Zhao
To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, applying a dense matching framework.
no code implementations • 26 Jul 2024 • Kunlun Li, Daniel Ferro, Xu Zhao, Abdul Jabbar Syed, Anil K Vuppala, Azeemuddin Syed
The number of epochs occurring at similar positions to the reference speaker will be counted as Delta, with larger Delta values indicating greater speaker similarity.
no code implementations • 23 May 2024 • Chengkun Cai, Xu Zhao, Yucheng Du, Haoliang Liu, Lei LI
Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, especially in complex decision-making scenarios, but their static problem-solving strategies often limit their adaptability to dynamic environments.
no code implementations • 27 Apr 2024 • Yiming Bao, Xu Zhao, Dahong Qian
On the Total Capture dataset, the pose estimation error is significantly decreased compared to the baseline method.
no code implementations • 30 Jan 2024 • Baoxing Li, Yong Deng, Yehui Yang, Xu Zhao
Recent approaches have combined parametric body models (such as SMPL), which capture body pose and shape priors, with neural implicit functions that flexibly learn clothing details.
no code implementations • CVPR 2024 • Yesheng Zhang, Xu Zhao
However, the pervasive presence of matching redundancy between images gives rise to unnecessary and error-prone computations in these methods, imposing limitations on their accuracy.
1 code implementation • 19 Dec 2023 • Yongqi An, Xu Zhao, Tao Yu, Ming Tang, Jinqiao Wang
Being retraining-free is important for LLM pruning methods.
no code implementations • 24 Nov 2023 • Xiaoyue Wan, Zhuo Chen, Yiming Bao, Xu Zhao
This perception is injected by the Pose Transformer network and learned through a pre-training task that recovers iterative masked joints.
2 code implementations • 31 Oct 2023 • Kaixin Li, Qisheng Hu, Xu Zhao, Hui Chen, Yuxi Xie, Tiedong Liu, Qizhe Xie, Junxian He
In this work, we explore the use of Large Language Models (LLMs) to edit code based on user instructions.
no code implementations • 5 Aug 2023 • Yi Ren, Xu Zhao, Hongyan Tang, Shuai Li
In this paper, we propose a structural causal model-based method to address the popularity bias issue for sequential recommendation model learning.
1 code implementation • 21 Jun 2023 • Xu Zhao, Wenchao Ding, Yongqi An, Yinglong Du, Tao Yu, Min Li, Ming Tang, Jinqiao Wang
In this paper, we propose a speed-up alternative method for this fundamental task with comparable performance.
Ranked #4 on Zero-Shot Instance Segmentation on LVIS v1.0 val
1 code implementation • Github 2023 • Qisheng Hu*, Kaixin Li*, Xu Zhao, Yuxi Xie, Tiedong Liu, Hui Chen, Qizhe Xie, Junxian He
In this work, we explore the use of large language models (LLMs) to edit code based on user instructions, covering a broad range of implicit tasks such as comment insertion, code optimization, and code refactoring.
no code implementations • NeurIPS 2023 • Yuxi Xie, Kenji Kawaguchi, Yiran Zhao, Xu Zhao, Min-Yen Kan, Junxian He, Qizhe Xie
Stochastic beam search balances exploitation and exploration of the search space with temperature-controlled randomness.
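The temperature-controlled randomness mentioned above can be illustrated with a minimal sketch. It assumes the Gumbel top-k trick (perturb each scaled score with Gumbel noise, then keep the k largest), which samples k candidates without replacement from softmax(score / temperature). The function name `stochastic_top_k` and its parameters are hypothetical and are not taken from the paper, whose exact sampling scheme may differ.

```python
import math
import random


def stochastic_top_k(scores, k, temperature, rng):
    """Pick k candidate indices by Gumbel-perturbed top-k (illustrative sketch).

    Low temperature -> close to plain top-k (exploitation).
    High temperature -> close to uniform sampling (exploration).
    """
    keyed = []
    for idx, score in enumerate(scores):
        u = max(rng.random(), 1e-12)          # guard against log(0)
        gumbel = -math.log(-math.log(u))      # standard Gumbel(0, 1) noise
        keyed.append((score / temperature + gumbel, idx))
    keyed.sort(reverse=True)                  # largest perturbed score first
    return [idx for _, idx in keyed[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    beam_scores = [0.1, 2.0, 1.0, -1.0]       # hypothetical log-scores of candidates
    print(stochastic_top_k(beam_scores, k=2, temperature=1e-6, rng=rng))
    print(stochastic_top_k(beam_scores, k=2, temperature=5.0, rng=rng))
```

With a very small temperature the score gaps dominate the noise, so the selection reduces to ordinary beam pruning; larger temperatures let lower-scored candidates survive more often.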
2 code implementations • 29 Apr 2023 • Yesheng Zhang, Xu Zhao
This paper thus focuses on the search space and proposes to set the initial search space for point matching as the matched image areas containing prominent semantics, named semantic area matches.
1 code implementation • 27 Apr 2023 • Yueming Hao, Xu Zhao, Bin Bao, David Berard, Will Constable, Adnan Aziz, Xu Liu
TorchBench is able to comprehensively characterize the performance of the PyTorch software stack, guiding the performance optimization across models, PyTorch framework, and GPU libraries.
no code implementations • 10 Apr 2023 • Zhaowen Li, Xu Zhao, Peigeng Ding, Zongxin Gao, Yuting Yang, Ming Tang, Jinqiao Wang
In the high-frequency branch, a derivative-filter-like architecture is designed to extract the high-frequency information while a light extractor is employed in the low-frequency branch because the low-frequency information is usually redundant.
1 code implementation • CVPR 2023 • Yongqi An, Xu Zhao, Tao Yu, Haiyun Guo, Chaoyang Zhao, Ming Tang, Jinqiao Wang
However, previous unsupervised deep learning BGS algorithms perform poorly in sophisticated scenarios such as shadows or night lights, and they cannot detect objects outside the pre-defined categories.
no code implementations • 28 Feb 2023 • Shenzheng Zhang, Qi Tan, Xinzhi Zheng, Yi Ren, Xu Zhao
The gap between the randomly initialized item ID embedding and the well-trained warm item ID embedding makes it hard for cold items to fit the recommendation system, which is trained on the data of historical warm items.
1 code implementation • 24 Feb 2023 • Yi Ren, Xiao Han, Xu Zhao, Shenzheng Zhang, Yan Zhang
Therefore, the ranking stage is still essential for most applications to provide a high-quality candidate set for the re-ranking stage.
no code implementations • 22 Feb 2023 • Xiaoyue Wan, Zhuo Chen, Xu Zhao
The rapid development of multi-view 3D human pose estimation (HPE) is attributed to the maturation of monocular 2D HPE and the geometry of 3D reconstruction.
1 code implementation • 14 Feb 2023 • Wenke Xia, Xu Zhao, Xincheng Pang, Changqing Zhang, Di Hu
We surprisingly find that: the multimodal models with existing imbalance algorithms consistently perform worse than the unimodal one on specific subsets, in accordance with the modality bias.
no code implementations • ICCV 2023 • Zixuan Zhao, Dongqi Wang, Xu Zhao
First, the submergence of movement features, i.e., the movement information in a snippet is covered by the scene information.
no code implementations • 11 Nov 2022 • Ke Liao, Wei Wang, Armagan Elibol, Lingzhong Meng, Xu Zhao, Nak Young Chong
In this paper, we systematically examine the performance of machine learning models for the clinical prediction task based on the EHR, especially physiological time series.
no code implementations • 31 Aug 2022 • Zhaowen Li, Xu Zhao, Chaoyang Zhao, Ming Tang, Jinqiao Wang
Previous unsupervised domain adaptation methods did not handle the cross-domain problem from the perspective of frequency for computer vision.
no code implementations • 25 Aug 2022 • Yiming Bao, Xu Zhao, Dahong Qian
On the Total Capture dataset, KineFuse surpasses the previous state of the art, which uses IMU only for testing, by 8.6%.
Ranked #2 on 3D Human Pose Estimation on Total Capture
1 code implementation • 27 May 2022 • Xu Zhao, Yi Ren, Ying Du, Shenzheng Zhang, Nian Wang
This paper attempts to tackle the item cold-start problem by generating enhanced warmed-up ID embeddings for cold items with historical data and limited interaction records.
1 code implementation • 14 May 2022 • Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem
We propose to sequentially forward the snippet frame through the video encoder, and backward only a small necessary portion of gradients to update the encoder.
1 code implementation • 1 Feb 2022 • Yesheng Zhang, Xu Zhao, Dahong Qian
Therefore, in this paper, we propose a hybrid camera calibration framework which combines learning-based approaches with traditional methods to handle these bottlenecks.
1 code implementation • 18 Jan 2022 • Nanfei Jiang, Xu Zhao, Chaoyang Zhao, Yongqi An, Ming Tang, Jinqiao Wang
MaskSparsity imposes the fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than all the filters of the model.
no code implementations • 11 Jan 2021 • Hansen Zhao, Xu Zhao, Huan Yao, Jiaxin Feng, Sichun Zhang, Xinrong Zhang
Metabolite structure identification has become the major bottleneck of the mass spectrometry based metabolomics research.
no code implementations • 1 Jan 2021 • ZiHao Wang, Xu Zhao, Tam Le, Hao Wu, Yong Zhang, Makoto Yamada
In this work, we consider OT over tree metrics, which is more general than the sliced Wasserstein and includes the sliced Wasserstein as a special case, and we propose a fast minimization algorithm in $O(n)$ for the optimal Wasserstein-1 transport plan between two distributions in the tree structure.
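As a small illustration of why tree metrics allow linear-time computation, the sketch below evaluates the standard closed form of the Wasserstein-1 distance on a tree: the sum over edges of the edge length times the absolute difference in mass carried by the subtree below that edge. This computes only the distance, not the transport plan the paper targets, and it is not the authors' algorithm. The function name and the array-based tree encoding (with `parent[i] < i`) are assumptions made for the example.

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Wasserstein-1 distance between mu and nu on a rooted tree, in O(n).

    parent[i] : parent of node i (node 0 is the root, parent[0] = -1);
                nodes are assumed numbered so that parent[i] < i.
    weight[i] : length of the edge between node i and parent[i] (weight[0] unused).
    mu, nu    : mass on each node; both are assumed to sum to the same total.
    """
    n = len(parent)
    # Net mass that must leave (or enter) the subtree rooted at each node.
    diff = [mu[i] - nu[i] for i in range(n)]
    total = 0.0
    # Visit children before parents so subtree sums are complete when used.
    for i in range(n - 1, 0, -1):
        total += weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


if __name__ == "__main__":
    # Path 0 - 1 - 2 with edge lengths 1 and 2; move one unit of mass from node 0 to node 2.
    print(tree_wasserstein([-1, 0, 1], [0.0, 1.0, 2.0], [1, 0, 0], [0, 0, 1]))
```

Each edge is visited once, which is where the linear cost comes from; the sliced Wasserstein case corresponds to a tree that is a single path.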
no code implementations • 29 Oct 2020 • Yesheng Zhang, Xu Zhao, Dahong Qian
In this paper, we present a novel end-to-end network architecture to estimate fundamental matrix directly from stereo images.
1 code implementation • EMNLP 2020 • Xu Zhao, ZiHao Wang, Hao Wu, Yong Zhang
In this paper, we propose a new semi-supervised BLI framework to encourage the interaction between the supervised signal and unsupervised alignment.
1 code implementation • 14 Oct 2020 • Xiaoqing Liang, Xu Zhao, Chaoyang Zhao, Nanfei Jiang, Ming Tang, Jinqiao Wang
This method decouples the distillation task of face detection into two subtasks, i.e., the classification distillation subtask and the regression distillation subtask.
no code implementations • ACL 2020 • Xu Zhao, ZiHao Wang, Hao Wu, Yong Zhang
Recently unsupervised Bilingual Lexicon Induction (BLI) without any parallel corpus has attracted much research interest.
no code implementations • 31 Oct 2019 • Xu Zhao
Auto Composing has been an active and appealing research area in the past few years, and much effort has been put into inventing more robust models to solve this problem.
no code implementations • 29 Jul 2019 • Haisheng Su, Xu Zhao, Shuming Liu
This technical report presents an overview of our solution used in the submission to ActivityNet Challenge 2019 Task 1 (temporal action proposal generation) and Task 2 (temporal action localization/detection).
no code implementations • 5 Mar 2019 • Xiao Song, Xu Zhao, Liangji Fang, Hanwen Hu
EdgeStereo also achieves comparable generalization performance for disparity estimation because of the incorporation of edge cues.
no code implementations • 4 Feb 2019 • Xu Zhao, Zongli Jiang
TDPM uses tangent distance instead of geodesic distance, and then applies MDS to the tangent distance matrix to map the manifold into a low dimensional space in which we can get its nonlinear structure.
no code implementations • 28 Oct 2018 • Haisheng Su, Xu Zhao, Tianwei Lin
Weakly supervised temporal action localization, which aims at temporally locating action instances in untrimmed videos using only video-level class labels during training, is an important yet challenging problem in video analysis.
no code implementations • 27 Aug 2018 • Xiao Song, Xu Zhao, Liangji Fang, Tianwei Lin
Secondly, we utilize SSD, which is a deep learning framework for detection, to excavate context cues and conduct end-to-end face presentation attack detection.
17 code implementations • ECCV 2018 • Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang
Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content.
Ranked #3 on Temporal Action Proposal Generation on THUMOS'14
no code implementations • 14 Mar 2018 • Xiao Song, Xu Zhao, Hanwen Hu, Liangji Fang
Recent convolutional neural networks, especially end-to-end disparity estimation models, achieve remarkable performance on the stereo matching task.
no code implementations • 13 Mar 2018 • Xiao Song, Xu Zhao, Tianwei Lin
The second one is a high-level micro-texture based feature called Spatial Pyramid Coding Micro-Texture (SPMT) feature.
2 code implementations • 17 Oct 2017 • Tianwei Lin, Xu Zhao, Zheng Shou
The main drawback of this framework is that the boundaries of action instance proposals have been fixed during the classification step.
3 code implementations • ICCV 2017 • Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, Hanqing Lu
To fully explore the local and global properties, in this paper, we propose a novel fully convolutional network, named CoupleNet, to couple the global structure with local parts for object detection.
Ranked #5 on Object Detection on PASCAL VOC 2007
no code implementations • 24 Jul 2017 • Xu Zhao, Yingying Chen, Ming Tang, Jinqiao Wang
In the first stage, a convolutional encoder-decoder sub-network is employed to reconstruct the background images and encode rich prior knowledge of background scenes.
no code implementations • 21 Jul 2017 • Tianwei Lin, Xu Zhao, Zheng Shou
Our approach achieves state-of-the-art performance on both the temporal action proposal task and the temporal action localization task.
Ranked #11 on Temporal Action Proposal Generation on ActivityNet-1.3