no code implementations • 29 Nov 2024 • Hang Ye, Xiaoxuan Ma, Hai Ci, Wentao Zhu, Yizhou Wang
For deformed regions close to the body, we leverage linear blend skinning (LBS) to handle the deformation.
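Linear blend skinning (LBS) deforms each vertex as a weighted sum of per-joint rigid transforms. The plain-Python sketch below shows that idea in generic form. The joint transforms and skinning weights are hypothetical toy values, and the code is not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def apply_rigid(rotation: Mat3, translation: Vec3, v: Vec3) -> Vec3:
    """Apply x -> R x + t to a single 3D point."""
    return tuple(
        sum(rotation[i][k] * v[k] for k in range(3)) + translation[i]
        for i in range(3)
    )


def linear_blend_skinning(
    vertices: List[Vec3],
    weights: List[List[float]],           # weights[v][j]; each row should sum to 1
    transforms: List[Tuple[Mat3, Vec3]],  # per-joint (R_j, t_j), hypothetical inputs
) -> List[Vec3]:
    """Deform each vertex as v' = sum_j w_vj * (R_j v + t_j)."""
    deformed = []
    for v, w_row in zip(vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w, (rot, trans) in zip(w_row, transforms):
            moved = apply_rigid(rot, trans, v)
            for i in range(3):
                acc[i] += w * moved[i]
        deformed.append(tuple(acc))
    return deformed
```

Consider a vertex weighted 0.5/0.5 between an identity joint and a joint translated by 2 along x. It moves by 1 along x, which is the blended result LBS produces.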
no code implementations • 30 Oct 2024 • Jie Sun, Qian Xia, Chuanfu Sun, Yumei Chen, Huafeng Liu, Wentao Zhu, Qiegen Liu
In validation experiments with simulated data, our network demonstrated good predictive performance for kinetic parameters and was able to reconstruct high-quality dynamic PET images.
no code implementations • 18 Jul 2024 • Yuan Jin, Gege Ma, Geng Chen, Tianling Lyu, Jan Egger, Junhui Lyu, Shaoting Zhang, Wentao Zhu
To this end, we propose a novel deep learning network designed to accurately classify lung cancer subtypes from multi-dimensional, multi-modality images, i.e., CT and pathological images.
no code implementations • 16 Jul 2024 • Zhenhua Huang, Kunhao Li, Shaojie Wang, Zhaohong Jia, Wentao Zhu, Sharad Mehrotra
Despite the proficiency of graph neural networks (GNNs) in analyzing graph data, achieving highly accurate and interpretable predictions remains challenging.
no code implementations • 16 Jul 2024 • Zhenhua Huang, Kunhao Li, Shaojie Wang, Zhaohong Jia, Wentao Zhu, Sharad Mehrotra
Graph neural networks (GNNs) are widely applied in graph data modeling.
no code implementations • 2 Jul 2024 • Haoru Wang, Wentao Zhu, Luyi Miao, Yishu Xu, Feng Gao, Qi Tian, Yizhou Wang
Human motion generation is a critical task with a wide range of applications.
no code implementations • 7 May 2024 • Yadang Chen, Wentao Zhu, Zhi-Xin Yang, Enhua Wu
Recent video object segmentation (VOS) networks typically use memory-based methods: for each query frame, the mask is predicted by space-time matching against memory frames.
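Memory-based VOS networks commonly implement this space-time matching as an attention-style read:

1. Compare each query-frame location with every memory location.
2. Normalize the similarities with a softmax.
3. Average the memory's mask values using those weights.

The plain-Python sketch below assumes toy key vectors and per-location foreground probabilities. The function names and shapes are illustrative and do not belong to any particular network's API.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(
    query_keys: List[List[float]],   # one key vector per query-frame location
    memory_keys: List[List[float]],  # keys for all locations in all memory frames
    memory_values: List[float],      # mask probability at each memory location
) -> List[float]:
    """Predict a soft mask for the query frame by space-time matching."""
    mask = []
    for q in query_keys:
        # Dot-product similarity between this query location and every memory location.
        scores = [sum(a * b for a, b in zip(q, k)) for k in memory_keys]
        weights = softmax(scores)
        mask.append(sum(w * v for w, v in zip(weights, memory_values)))
    return mask
```

A query location whose key closely matches a foreground memory location receives a mask value near 1. One that matches only background locations receives a value near 0.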
1 code implementation • 3 Mar 2024 • Zishi Li, Xiaoxuan Ma, Qiuyan Shang, Wentao Zhu, Hai Ci, Yu Qiao, Yizhou Wang
Temporal repetition counting aims to quantify the repeated action cycles within a video.
no code implementations • 28 Feb 2024 • Xiaosong Wang, Xiaofan Zhang, Guotai Wang, Junjun He, Zhongyu Li, Wentao Zhu, Yi Guo, Qi Dou, Xiaoxiao Li, Dequan Wang, Liang Hong, Qicheng Lao, Tong Ruan, Yukun Zhou, Yixue Li, Jie Zhao, Kang Li, Xin Sun, Lifeng Zhu, Shaoting Zhang
The emerging trend of advancing generalist artificial intelligence, such as GPT-4 and Gemini, has reshaped the research landscape (in academia and industry) of machine learning and many other research areas.
no code implementations • 28 Feb 2024 • Wentao Zhu, Zhining Zhang, Yizhou Wang
Understanding and attributing mental states, known as Theory of Mind (ToM), is a fundamental capability for human social reasoning.
2 code implementations • 8 Feb 2024 • Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang
Estimating robot pose from RGB images is a crucial problem in computer vision and robotics.
Ranked #1 on Robot Pose Estimation on DREAM-dataset
no code implementations • 8 Jan 2024 • Wentao Zhu
To learn effectively from multimodal videos, in this work we propose a novel audio-video recognition approach, termed audio video Transformer (AVT), which leverages the effective spatio-temporal representation of the video Transformer to improve action recognition accuracy.
no code implementations • 8 Jan 2024 • Wentao Zhu
In recent years, researchers have combined audio and video signals to handle cases where actions are not well represented or captured by visual cues.
no code implementations • 3 Jan 2024 • Wentao Zhu
Previous approaches that employ gradual token reduction to address this challenge assume that token redundancy in one layer implies redundancy in all the following layers.
no code implementations • CVPR 2024 • Yuan Xu, Xiaoxuan Ma, Jiajun Su, Wentao Zhu, Yu Qiao, Yizhou Wang
Experimental results demonstrate that HypoNet outperforms existing state-of-the-art probabilistic methods as a multi-hypothesis mesh estimator.
no code implementations • 24 Dec 2023 • Wentao Zhu
Hence, we introduce a learnable input adaptor to alleviate this issue, and DATAR achieves state-of-the-art performance.
1 code implementation • NeurIPS 2023 • Xiaoxuan Ma, Stephan P. Kaufhold, Jiajun Su, Wentao Zhu, Jack Terwilliger, Andres Meza, Yixin Zhu, Federico Rossano, Yizhou Wang
ChimpACT is both comprehensive and challenging, consisting of 163 videos with a cumulative 160,500 frames, each richly annotated with detection, identification, pose estimation, and fine-grained spatiotemporal behavior labels.
no code implementations • 16 Oct 2023 • Yangfan Ni, Duo Zhang, Gege Ma, Lijun Lu, Zhongke Huang, Wentao Zhu
Accurate reorientation and segmentation of the left ventricle (LV) are essential for the quantitative analysis of myocardial perfusion imaging (MPI), in which one critical step is to reorient the reconstructed transaxial nuclear cardiac images into standard short-axis slices for subsequent image processing.
1 code implementation • 19 Sep 2023 • Jianghao Wu, Guotai Wang, Ran Gu, Tao Lu, Yinan Chen, Wentao Zhu, Tom Vercauteren, Sébastien Ourselin, Shaoting Zhang
The different predictions in these duplicated heads are used to obtain pseudo labels for unlabeled target-domain images and their uncertainty to identify reliable pseudo labels.
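A generic way to turn several heads' predictions into a pseudo label with an uncertainty estimate works in three steps:

1. Average the heads' class probabilities.
2. Take the argmax of the average as the pseudo label.
3. Measure how much the heads disagree on that class and treat it as uncertainty.

The sketch below assumes that uncertainty is the variance of the winning class's probability across heads. The threshold `max_uncertainty` is hypothetical, and the paper's actual uncertainty measure may differ.

```python
from statistics import fmean, pvariance
from typing import List, Optional, Tuple


def pseudo_label_with_uncertainty(
    head_probs: List[List[float]],  # head_probs[h][c]: class probs from head h for one pixel/voxel
    max_uncertainty: float = 0.01,  # hypothetical reliability threshold
) -> Tuple[int, float, Optional[int]]:
    num_classes = len(head_probs[0])
    # Average the duplicated heads' predictions class by class.
    mean_probs = [fmean(h[c] for h in head_probs) for c in range(num_classes)]
    label = max(range(num_classes), key=lambda c: mean_probs[c])
    # Disagreement among heads on the chosen class serves as the uncertainty (an assumption).
    uncertainty = pvariance([h[label] for h in head_probs])
    reliable_label = label if uncertainty <= max_uncertainty else None
    return label, uncertainty, reliable_label
```

Under this sketch, a location where the heads agree keeps its pseudo label. A location where they disagree strongly is marked unreliable (`None`) and would be excluded from the loss.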
no code implementations • 15 Sep 2023 • Yuanfeng Wu, Shaojie Li, Zhiqiang Du, Wentao Zhu
Hence, we propose BROW, a foundation model for extracting better feature representations of whole slide images (WSIs), which can be conveniently adapted to downstream tasks with little or no fine-tuning.
no code implementations • 9 Aug 2023 • Wentao Zhu, Yuan Jin, Gege Ma, Geng Chen, Jan Egger, Shaoting Zhang, Dimitris N. Metaxas
Accurate diagnosis of the pathological subtypes of lung cancer is of great importance for follow-up treatment and prognosis management.
no code implementations • 20 Jul 2023 • Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang
In this survey, we present a comprehensive literature review of human motion generation, which, to the best of our knowledge, is the first of its kind in this field.
1 code implementation • Submitted to ICLR 2022 • Wentao Zhu, Yufang Huang, Xiufeng Xie, Wenxian Liu, Jincan Deng, Debing Zhang, Zhangyang Wang, Ji Liu
For video content creation and understanding, shot boundary detection (SBD) is one of the most essential components in various scenarios.
Ranked #1 on Camera shot boundary detection on ClipShots
no code implementations • CVPR 2023 • Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, Raffay Hamid
To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, which results in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.
Ranked #3 on Video Classification on Breakfast
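The sketch below makes the idea of keeping only informative tokens concrete. It scores each token with a tiny linear scorer and keeps the top-k tokens in their original order. The linear scorer, its weights, and `keep_ratio` are hypothetical stand-ins. The S5 model's mask generator is learned, and this sketch does not reproduce it.

```python
import math
from typing import List, Tuple


def select_informative_tokens(
    tokens: List[List[float]],    # one feature vector per image token
    scorer_weights: List[float],  # hypothetical linear scorer (stand-in for a learned mask generator)
    keep_ratio: float = 0.5,
) -> Tuple[List[int], List[List[float]]]:
    scores = [sum(w * x for w, x in zip(scorer_weights, t)) for t in tokens]
    k = max(1, math.ceil(keep_ratio * len(tokens)))
    # Indices of the k highest-scoring tokens, restored to their original order.
    top = sorted(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    return top, [tokens[i] for i in top]
```

Restoring the original order keeps the positional layout of the surviving tokens intact for the downstream sequence model. Only the selected subset would be processed further.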
2 code implementations • CVPR 2023 • Xiaoxuan Ma, Jiajun Su, Chunyu Wang, Wentao Zhu, Yizhou Wang
Advanced motion capture systems solve this problem by placing dense physical markers on the body surface, which allows realistic meshes to be extracted from their non-rigid motions.
Ranked #1 on 3D Human Pose Estimation on Surreal
no code implementations • 19 Mar 2023 • Wentao Zhu, Mohamed Omar
Audio events have a hierarchical structure in both time and frequency and can be grouped together to form more abstract semantic audio classes.
Ranked #13 on Audio Classification on VGGSound
no code implementations • CVPR 2023 • Burak Uzkent, Amanmeet Garg, Wentao Zhu, Keval Doshi, Jingru Yi, Xiaolong Wang, Mohamed Omar
For example, recent image and language models with more than 200M parameters have been proposed to learn visual grounding in the pre-training step and show impressive results on downstream vision and language tasks.
1 code implementation • CVPR 2023 • Hai Ci, Mingdong Wu, Wentao Zhu, Xiaoxuan Ma, Hao Dong, Fangwei Zhong, Yizhou Wang
During the denoising process, GFPose implicitly incorporates pose priors in gradients and unifies various discriminative and generative tasks in an elegant framework.
no code implementations • 21 Nov 2022 • Shiqiang Zhu, Ting Yu, Tao Xu, Hongyang Chen, Schahram Dustdar, Sylvain Gigan, Deniz Gunduz, Ekram Hossain, Yaochu Jin, Feng Lin, Bo Liu, Zhiguo Wan, Ji Zhang, Zhifeng Zhao, Wentao Zhu, Zuoning Chen, Tariq Durrani, Huaimin Wang, Jiangxing Wu, Tongyi Zhang, Yunhe Pan
In recent years, we have witnessed the emergence of intelligent computing, a new computing paradigm that is reshaping traditional computing and promoting the digital revolution in the era of big data, artificial intelligence, and the Internet of Things with new computing theories, architectures, methods, systems, and applications.
2 code implementations • 4 Nov 2022 • M. Jorge Cardoso, Wenqi Li, Richard Brown, Nic Ma, Eric Kerfoot, Yiheng Wang, Benjamin Murrey, Can Zhao, Dong Yang, Vishwesh Nath, Yufan He, Ziyue Xu, Ali Hatamizadeh, Andriy Myronenko, Wentao Zhu, Yun Liu, Mingxin Zheng, Yucheng Tang, Isaac Yang, Michael Zephyr, Behrooz Hashemian, Sachidanand Alle, Mohammad Zalbagi Darestani, Charlie Budd, Marc Modat, Tom Vercauteren, Guotai Wang, Yiwen Li, Yipeng Hu, Yunguan Fu, Benjamin Gorman, Hans Johnson, Brad Genereaux, Barbaros S. Erdal, Vikash Gupta, Andres Diaz-Pinto, Andre Dourson, Lena Maier-Hein, Paul F. Jaeger, Michael Baumgartner, Jayashree Kalpathy-Cramer, Mona Flores, Justin Kirby, Lee A. D. Cooper, Holger R. Roth, Daguang Xu, David Bericat, Ralf Floca, S. Kevin Zhou, Haris Shuaib, Keyvan Farahani, Klaus H. Maier-Hein, Stephen Aylward, Prerna Dogra, Sebastien Ourselin, Andrew Feng
For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g., geometry, physiology, physics) of the medical data being processed.
1 code implementation • ICCV 2023 • Wentao Zhu, Xiaoxuan Ma, Zhaoyang Liu, Libin Liu, Wayne Wu, Yizhou Wang
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (using extra training data)
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Kevin Hsu, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
AVT uses a combination of video and audio signals to improve action recognition accuracy, leveraging the effective spatio-temporal representation by the video Transformer.
Ranked #4 on Multi-modal Classification on VGG-Sound
no code implementations • Submitted to ICLR 2022 • Wentao Zhu, Jingru Yi, Xiaohang Sun, Xiang Hao, Linda Liu, Mohamed Omar
In this work, we develop a multiscale multimodal Transformer (MMT) that employs hierarchical representation learning.
Ranked #1 on Multi-modal Classification on VGG-Sound
1 code implementation • 27 Aug 2022 • Runqi Wang, Yuxiang Bao, Baochang Zhang, Jianzhuang Liu, Wentao Zhu, Guodong Guo
Second, based on the similarity between incremental knowledge and base knowledge, we design an adaptive fusion of incremental knowledge, which helps the model allocate capacity to knowledge of varying difficulty.
1 code implementation • 25 Jul 2022 • Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang, Ziwei Liu, Chen Change Loy
Large-scale datasets have played indispensable roles in the recent success of face generation/editing and significantly facilitated the advances of emerging research fields.
Ranked #1 on Unconditional Video Generation on CelebV-HQ
1 code implementation • 22 Jul 2022 • Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang
While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from heavy computational burdens, especially for large scenes.
Ranked #5 on 3D Multi-Person Pose Estimation on Campus
no code implementations • 17 Mar 2022 • Runqi Wang, Linlin Yang, Baochang Zhang, Wentao Zhu, David Doermann, Guodong Guo
Research on the generalization ability of deep neural networks (DNNs) has recently attracted a great deal of attention.
no code implementations • 26 Feb 2022 • Wentao Zhu, Hang Shang, Tingxun Lv, Chao Liao, Sen yang, Ji Liu
Recently, learning from vast amounts of unlabeled data, especially self-supervised learning, has emerged and attracted widespread attention.
no code implementations • 28 Dec 2021 • Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo
We show that our method improves the recognition accuracy of adversarial training on ImageNet by 8.32% compared with the baseline.
no code implementations • 19 Dec 2021 • Wentao Zhu, Zhuoqian Yang, Ziang Di, Wayne Wu, Yizhou Wang, Chen Change Loy
Trained with the canonicalization operations and the derived regularizations, our method learns to factorize a skeleton sequence into three independent semantic subspaces, i.e., motion, structure, and view angle.
1 code implementation • 15 Dec 2021 • Shuwei Shao, Zhongcai Pei, Weihai Chen, Wentao Zhu, Xingming Wu, Dianmin Sun, Baochang Zhang
Recently, self-supervised learning has been applied to estimate depth and ego-motion from monocular videos, achieving remarkable performance in autonomous driving scenarios.
no code implementations • 16 Nov 2021 • Shuwei Shao, Ran Li, Zhongcai Pei, Zhong Liu, Weihai Chen, Wentao Zhu, Xingming Wu, Baochang Zhang
In this work, we investigate this phenomenon and propose integrating the strengths of multiple weak depth predictors to build a comprehensive and accurate depth predictor, which is critical for many real-world applications, e.g., 3D reconstruction.
1 code implementation • 15 Oct 2021 • Tianli Zhao, Xi Sheryl Zhang, Wentao Zhu, Jiaxing Wang, Sen yang, Ji Liu, Jian Cheng
In this paper, we present a unified framework with Joint Channel pruning and Weight pruning (JCW) that achieves a better Pareto frontier between latency and accuracy than previous model compression approaches.
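Channel pruning and weight pruning operate at two different granularities, and the sketch below illustrates the difference for a single layer stored as a list of output-channel rows:

- **Channel pruning** (structured) removes whole rows with the smallest L1 norm.
- **Weight pruning** (unstructured) zeroes individual small-magnitude entries.

The sketch uses simple magnitude criteria chosen for illustration. It does not model latency or JCW's search procedure.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = output channels, columns = incoming weights


def prune_channels(weights: Matrix, keep_channels: int) -> Tuple[List[int], Matrix]:
    """Structured pruning: keep the output channels with the largest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda c: norms[c], reverse=True)
    kept = sorted(ranked[:keep_channels])  # preserve original channel order
    return kept, [weights[c][:] for c in kept]


def prune_weights(weights: Matrix, sparsity: float) -> Matrix:
    """Unstructured pruning: zero the smallest-magnitude weights (ties at the threshold are also zeroed)."""
    flat = sorted(abs(w) for row in weights for w in row)
    n_prune = int(sparsity * len(flat))
    if n_prune == 0:
        return [row[:] for row in weights]
    threshold = flat[n_prune - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]
```

Applying the two in sequence, channels first and then weights within the surviving channels, is one simple way to combine both granularities. How to trade them off against measured latency is the question the paper addresses.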
1 code implementation • 18 Sep 2021 • Wentao Zhu, Tianlong Kong, Shun Lu, Jixiang Li, Dawei Zhang, Feng Deng, Xiaorui Wang, Sen yang, Ji Liu
Recently, the x-vector has become a successful and popular approach for speaker verification; it employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances.
Ranked #13 on Speaker Verification on VoxCeleb1
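Statistics pooling, as used in the x-vector entry above, turns a variable-length sequence of frame-level features into a fixed-size vector. It does this by concatenating the per-dimension mean and standard deviation over time. The minimal sketch below assumes the frame-level features are already computed; in an x-vector system they would come from the TDNN layers. It uses the population standard deviation (dividing by T), and real implementations may also add a small variance floor.

```python
import math
from typing import List


def statistics_pooling(frames: List[List[float]]) -> List[float]:
    """Map a (T x D) sequence of frame features to a 2D-dimensional utterance vector."""
    num_frames = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / num_frames)
        for d in range(dim)
    ]
    return means + stds  # [mean_1..mean_D, std_1..std_D]
```

Utterances with 3 frames and with 700 frames both map to vectors of length 2D. This fixed size is what lets later layers produce one embedding per utterance.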
no code implementations • NeurIPS 2021 • Xuefan Zha, Wentao Zhu, Tingxun Lv, Sen yang, Ji Liu
However, the pure-Transformer based spatio-temporal learning can be prohibitively costly on memory and computation to extract fine-grained features from a tiny patch.
no code implementations • 16 Jul 2021 • Holger R. Roth, Dong Yang, Wenqi Li, Andriy Myronenko, Wentao Zhu, Ziyue Xu, Xiaosong Wang, Daguang Xu
Building robust deep learning-based models requires diverse training data, ideally from several sources.
no code implementations • 25 Mar 2021 • Wentao Zhu, Yufang Huang, Daguang Xu, Zhen Qian, Wei Fan, Xiaohui Xie
Registration is a fundamental task in medical robotics and is often a crucial step for many downstream tasks such as motion analysis, intra-operative tracking and image segmentation.
no code implementations • 7 Dec 2020 • Xuan Gong, Xin Xia, Wentao Zhu, Baochang Zhang, David Doermann, Lian Zhuo
In recent years, deep learning has dominated progress in the field of medical image analysis.
no code implementations • 23 Nov 2020 • Dong Yang, Ziyue Xu, Wenqi Li, Andriy Myronenko, Holger R. Roth, Stephanie Harmon, Sheng Xu, Baris Turkbey, Evrim Turkbey, Xiaosong Wang, Wentao Zhu, Gianpaolo Carrafiello, Francesca Patella, Maurizio Cariati, Hirofumi Obinata, Hitoshi Mori, Kaku Tamura, Peng An, Bradford J. Wood, Daguang Xu
To facilitate CT analysis, recent efforts have focused on computer-aided characterization and diagnosis, which have shown promising results.
no code implementations • COLING 2020 • Yufang Huang, Wentao Zhu, Deyi Xiong, Yiye Zhang, Changjian Hu, Feiyu Xu
Unsupervised text style transfer is full of challenges due to the lack of parallel data and difficulties in content preservation.
no code implementations • 10 Jul 2020 • Liyue Shen, Wentao Zhu, Xiaosong Wang, Lei Xing, John M. Pauly, Baris Turkbey, Stephanie Anne Harmon, Thomas Hogue Sanford, Sherif Mehralivand, Peter Choyke, Bradford Wood, Daguang Xu
Multi-domain data are widely leveraged in vision applications, taking advantage of complementary information from different modalities, e.g., brain tumor segmentation from multi-parametric magnetic resonance imaging (MRI).
1 code implementation • 22 Jun 2020 • Wentao Zhu, Can Zhao, Wenqi Li, Holger Roth, Ziyue Xu, Daguang Xu
In this work, we introduce Large deep 3D ConvNets with Automated Model Parallelism (LAMP) and investigate the impact of both input size and deep 3D ConvNet size on segmentation accuracy.
no code implementations • 22 Jun 2020 • Xiahai Zhuang, Jiahang Xu, Xinzhe Luo, Chen Chen, Cheng Ouyang, Daniel Rueckert, Victor M. Campello, Karim Lekadir, Sulaiman Vesal, Nishant Ravikumar, Yashu Liu, Gongning Luo, Jingkun Chen, Hongwei Li, Buntheng Ly, Maxime Sermesant, Holger Roth, Wentao Zhu, Jiexiang Wang, Xinghao Ding, Xinyue Wang, Sen yang, Lei LI
In addition, the paired MS-CMR images could enable algorithms to combine the complementary information from the other sequences for the segmentation of LGE CMR.
no code implementations • CVPR 2020 • Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei Zhou, Chen Change Loy
We present a lightweight video motion retargeting approach TransMoMo that is capable of transferring motion of a person in a source video realistically to another video of a target person.
1 code implementation • 4 Oct 2019 • Wentao Zhu, Andriy Myronenko, Ziyue Xu, Wenqi Li, Holger Roth, Yufang Huang, Fausto Milletari, Daguang Xu
Furthermore, we design three segmentation frameworks based on the proposed registration framework: 1) atlas-based segmentation, 2) joint learning of both segmentation and registration tasks, and 3) multi-task learning with atlas-based segmentation as an intermediate feature.
no code implementations • 2 Oct 2019 • Holger Roth, Wentao Zhu, Dong Yang, Ziyue Xu, Daguang Xu
In the first step, we register a small set of five LGE cardiac magnetic resonance (CMR) images with ground truth labels to a set of 40 target LGE CMR images without annotation.
no code implementations • 2 Oct 2019 • Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M. Jorge Cardoso, Andrew Feng
Due to medical data privacy regulations, it is often infeasible to collect and share patient data in a centralised data lake.
no code implementations • 18 Jun 2019 • Wentao Zhu, Yufang Huang, Mani A. Vannan, Shizhen Liu, Daguang Xu, Wei Fan, Zhen Qian, Xiaohui Xie
In this work, we propose a neural multi-scale self-supervised registration (NMSR) method for automated myocardial and cardiac blood flow dense tracking.
no code implementations • 12 Mar 2019 • Wentao Zhu
Second, we demonstrate how to use weakly labeled data for breast cancer diagnosis from mammograms by efficiently designing deep networks for multi-instance learning.
2 code implementations • 15 Aug 2018 • Wentao Zhu, Yufang Huang, Liang Zeng, Xuming Chen, Yong liu, Zhen Qian, Nan Du, Wei Fan, Xiaohui Xie
Methods: Our deep learning model, called AnatomyNet, segments OARs from head and neck CT images in an end-to-end fashion, receiving whole-volume HaN CT images as input and generating masks of all OARs of interest in one shot.
2 code implementations • 14 May 2018 • Wentao Zhu, Yeeleng S. Vang, Yufang Huang, Xiaohui Xie
Recently, deep learning has seen widespread adoption in various medical imaging applications.
2 code implementations • 25 Jan 2018 • Wentao Zhu, Chaochun Liu, Wei Fan, Xiaohui Xie
DeepLung consists of two components, nodule detection (identifying the locations of candidate nodules) and classification (classifying candidate nodules into benign or malignant).
Ranked #5 on Lung Nodule Classification on LIDC-IDRI
1 code implementation • 24 Oct 2017 • Wentao Zhu, Xiang Xiang, Trac D. Tran, Gregory D. Hager, Xiaohui Xie
Mass segmentation provides effective morphological features which are important for mass diagnosis.
no code implementations • 16 Sep 2017 • Wentao Zhu, Chaochun Liu, Wei Fan, Xiaohui Xie
Considering the 3D nature of lung CT data, two 3D networks are designed for the nodule detection and classification respectively.
Automated Pulmonary Nodule Detection And Classification, Classification
1 code implementation • 23 May 2017 • Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning (MIL) for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammogram without the aforementioned ROIs.
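In a deep multi-instance setting, a whole mammogram is treated as a bag of patches (instances), and only the image-level label is needed. One common pooling choice is max pooling: the image is scored by its most suspicious patch, and that score is trained with binary cross-entropy against the image label. The sketch below illustrates this with hypothetical patch probabilities. It is a generic illustration, not the paper's network, and it omits the convolutional feature extractor that would produce the patch scores.

```python
import math
from typing import List


def bag_probability(instance_probs: List[float]) -> float:
    """Max-pooling MIL: a bag is as malignant as its most suspicious instance."""
    return max(instance_probs)


def bag_bce_loss(instance_probs: List[float], bag_label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy between the bag score and the image-level label."""
    p = min(max(bag_probability(instance_probs), eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(bag_label * math.log(p) + (1 - bag_label) * math.log(1.0 - p))
```

Because only the maximum contributes to the bag score, gradients flow mainly through the highest-scoring patch. This is why region-of-interest annotations are not required during training.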
no code implementations • 12 Mar 2017 • Qing Han, Wentao Zhu, Yang Shi
Today, detection of anomalous events in civil infrastructures (e.g., water pipe breaks and leaks) is time-consuming and often takes hours or days.
no code implementations • 18 Dec 2016 • Wentao Zhu, Qi Lou, Yeeleng Scott Vang, Xiaohui Xie
Inspired by the success of using deep convolutional features for natural image analysis and multi-instance learning for labeling a set of instances/patches, we propose end-to-end trained deep multi-instance networks for mass classification based on whole mammogram without the aforementioned costly need to annotate the training data.
1 code implementation • 18 Dec 2016 • Wentao Zhu, Xiang Xiang, Trac D. Tran, Xiaohui Xie
Experimental results on two public datasets, INbreast and DDSM-BCRP, show that our end-to-end network combined with adversarial training achieves state-of-the-art results.
no code implementations • 24 Mar 2016 • Wentao Zhu, Cuiling Lan, Junliang Xing, Wen-Jun Zeng, Yanghao Li, Li Shen, Xiaohui Xie
Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions.
no code implementations • 27 Sep 2015 • Wentao Zhu, Jun Miao, Laiyun Qing, Xilin Chen
Compared to traditional deep learning methods, the implemented feature learning method has far fewer parameters and is validated in several typical experiments, such as digit recognition on MNIST and its variations, object recognition on the Caltech 101 dataset, and face verification on the LFW dataset.
1 code implementation • 25 Jan 2015 • Wentao Zhu, Jun Miao, Laiyun Qing
Extreme learning machine (ELM) is an extremely fast learning method that has shown strong performance on pattern recognition tasks, as demonstrated by numerous researchers and engineers.
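An extreme learning machine draws its hidden-layer weights at random and keeps them fixed. Training therefore reduces to a single linear least-squares solve for the output weights, which is where its speed comes from. The plain-Python sketch below uses a sigmoid hidden layer and solves ridge-regularized normal equations by Gauss-Jordan elimination. The ridge term `lam`, the hidden size, the uniform initialization range, and the seed are illustrative assumptions; the basic ELM formulation uses a Moore-Penrose pseudoinverse instead.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [x / piv for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELM:
    """Single-hidden-layer ELM: random fixed hidden layer, least-squares output layer."""

    def __init__(self, n_hidden: int = 20, lam: float = 1e-3, seed: int = 0) -> None:
        self.n_hidden = n_hidden
        self.lam = lam  # ridge term (assumption; plain ELM uses a pseudoinverse)
        self.rng = random.Random(seed)

    def _hidden(self, x: List[float]) -> List[float]:
        # Sigmoid activations of the random hidden units.
        return [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
            for ws, b in zip(self.w, self.b)
        ]

    def fit(self, xs: Matrix, ys: Matrix) -> "ELM":
        dim, n_out, n_hid = len(xs[0]), len(ys[0]), self.n_hidden
        # Hidden weights and biases are drawn once and never trained.
        self.w = [[self.rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_hid)]
        self.b = [self.rng.uniform(-1.0, 1.0) for _ in range(n_hid)]
        h = [self._hidden(x) for x in xs]  # N x n_hid hidden-layer output matrix H
        # Ridge normal equations: (H^T H + lam I) beta = H^T Y.
        hth = [
            [sum(r[i] * r[j] for r in h) + (self.lam if i == j else 0.0) for j in range(n_hid)]
            for i in range(n_hid)
        ]
        hty = [[sum(r[i] * y[k] for r, y in zip(h, ys)) for k in range(n_out)] for i in range(n_hid)]
        self.beta = solve_linear(hth, hty)  # n_hid x n_out output weights
        return self

    def predict(self, x: List[float]) -> List[float]:
        hx = self._hidden(x)
        return [
            sum(hx[i] * self.beta[i][k] for i in range(self.n_hidden))
            for k in range(len(self.beta[0]))
        ]
```

No gradient descent is involved. `fit` makes one pass to build the hidden-layer matrix and then performs one linear solve.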