no code implementations • 10 Jan 2025 • Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang
We further introduce a Long Narrative Video Director to enhance both visual and semantic coherence in generated videos and emphasize the role of aligning visual embeddings to achieve improved overall video quality.
no code implementations • 12 Dec 2024 • Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille, Jieneng Chen
In this work, we take a step toward this goal by introducing GenEx, a system capable of planning complex embodied world exploration, guided by its generative imagination that forms priors (expectations) about the surrounding environments.
no code implementations • 12 Jun 2024 • Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie
For discriminative models like CLIP, we observe enhanced zero-shot performance in cross-modal retrieval tasks.
Ranked #127 on Visual Question Answering on MM-Vet
1 code implementation • 8 Jun 2024 • Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou
This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.
no code implementations • 16 Feb 2024 • Junfei Xiao, Zheng Xu, Alan Yuille, Shen Yan, Boyu Wang
We thoroughly explore the state-of-the-art perceiver resampler architecture and build a strong baseline.
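As a rough illustration of the perceiver resampler idea explored here, the sketch below shows a generic module in which a small set of learned latent queries cross-attends to a variable-length sequence of visual features and returns a fixed-length token set. The dimensions, layer count, and class name are illustrative assumptions, not the configuration studied in the paper.

```python
# Minimal sketch of a perceiver-resampler-style module: learned latent queries
# cross-attend to a variable-length sequence of visual features and return a
# fixed-length summary. Sizes here are assumptions for illustration only.
import torch
import torch.nn as nn

class PerceiverResampler(nn.Module):
    def __init__(self, dim=1024, num_latents=64, num_layers=2, num_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.layers = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        ])
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats):                      # (B, N, dim)
        B = visual_feats.size(0)
        x = self.latents.unsqueeze(0).expand(B, -1, -1)   # (B, num_latents, dim)
        for attn in self.layers:
            out, _ = attn(query=x, key=visual_feats, value=visual_feats)
            x = x + out                                    # residual update
        return self.norm(x)                                # fixed-length tokens

# Usage: resample 257 patch tokens into 64 tokens for a language model.
feats = torch.randn(2, 257, 1024)
tokens = PerceiverResampler()(feats)                       # (2, 64, 1024)
```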
1 code implementation • 21 Dec 2023 • Junfei Xiao, Ziqi Zhou, Wenxuan Li, Shiyi Lan, Jieru Mei, Zhiding Yu, Alan Yuille, Yuyin Zhou, Cihang Xie
Instead of relying solely on category-specific annotations, ProLab uses descriptive properties grounded in common sense knowledge for supervising segmentation models.
4 code implementations • 4 Dec 2023 • Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie
This paper enhances image-GPT (iGPT), one of the pioneering works to introduce autoregressive next-pixel prediction as a pretraining objective for visual representation learning.
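For context, the sketch below shows a generic iGPT-style objective: an image is flattened into a sequence of quantized pixel tokens and a causal transformer is trained to predict each token from the preceding ones. The toy model size and 8-bit quantization are assumptions for illustration, not this paper's proposed enhancements.

```python
# Minimal sketch of an iGPT-style objective: flatten an image into quantized
# pixel tokens and train a causal transformer to predict the next token.
import torch
import torch.nn as nn

VOCAB, DIM, SEQ = 256, 128, 16 * 16   # 8-bit pixels, toy 16x16 images

class TinyPixelGPT(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Parameter(torch.zeros(SEQ, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                      # tokens: (B, SEQ) int64
        x = self.embed(tokens) + self.pos
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        x = self.blocks(x, mask=causal)             # causal self-attention
        return self.head(x)                         # (B, SEQ, VOCAB) logits

pixels = torch.randint(0, VOCAB, (4, SEQ))          # quantized pixel sequence
logits = TinyPixelGPT()(pixels)
# Next-pixel loss: predict token t+1 from tokens <= t.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB), pixels[:, 1:].reshape(-1))
```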
1 code implementation • CVPR 2023 • Qixin Hu, Yixiong Chen, Junfei Xiao, Shuwen Sun, Jieneng Chen, Alan Yuille, Zongwei Zhou
We demonstrate that AI models can accurately segment liver tumors without the need for manual annotation by using synthetic tumors in CT scans.
Ranked #1 on Tumor Segmentation on LiTS17
2 code implementations • ICCV 2023 • Jie Liu, Yixiao Zhang, Jie-Neng Chen, Junfei Xiao, Yongyi Lu, Bennett A. Landman, Yixuan Yuan, Alan Yuille, Yucheng Tang, Zongwei Zhou
The proposed model is developed from an assembly of 14 datasets, using a total of 3,410 CT scans for training, and then evaluated on 6,162 external CT scans from 3 additional datasets.
Ranked #1 on Organ Segmentation on BTCV
no code implementations • 7 Dec 2022 • Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li
Through systematic analysis, we found that the commonly used pairwise affinity loss has two limitations: (1) it works with color affinity but leads to inferior performance with other modalities such as depth gradient; (2) the original affinity loss does not prevent trivial predictions as intended, but actually accelerates this degeneration because the affinity loss term is symmetric.
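The sketch below illustrates the kind of pairwise affinity loss discussed here: where low-level affinity between neighboring pixels is high, their predicted labels are pushed to agree. Because "both background" is rewarded as much as "both foreground" (the term is symmetric), an all-zero trivial prediction also drives the loss down. This is a simplified illustration with assumed inputs, not the paper's implementation.

```python
# Minimal sketch of a pairwise affinity loss on horizontally adjacent pixels.
import torch

def pairwise_affinity_loss(prob, affinity, thresh=0.5):
    # prob:     (B, H, W) predicted foreground probabilities
    # affinity: (B, H, W-1) low-level affinity between horizontal neighbors
    p, q = prob[:, :, :-1], prob[:, :, 1:]
    same_label = p * q + (1.0 - p) * (1.0 - q)        # P(neighbors agree)
    nll = -torch.log(same_label.clamp_min(1e-6))
    mask = (affinity >= thresh).float()               # only confident pairs
    return (nll * mask).sum() / mask.sum().clamp_min(1.0)

prob = torch.rand(2, 32, 32, requires_grad=True)
affinity = torch.rand(2, 32, 31)
print(pairwise_affinity_loss(prob, affinity))
# A trivial all-background prediction also yields (near) zero loss: this is
# the degenerate solution the symmetric term fails to penalize.
print(pairwise_affinity_loss(torch.zeros(2, 32, 32), affinity))
```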
1 code implementation • 26 Oct 2022 • Qixin Hu, Junfei Xiao, Yixiong Chen, Shuwen Sun, Jie-Neng Chen, Alan Yuille, Zongwei Zhou
We develop a novel strategy to generate synthetic tumors.
1 code implementation • 23 Oct 2022 • Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou
We hope that this study can direct future research on the application of Transformers to a larger variety of medical imaging tasks.
1 code implementation • 23 Oct 2022 • Junfei Xiao, Zhichao Xu, Shiyi Lan, Zhiding Yu, Alan Yuille, Anima Anandkumar
The model is trained on a composite dataset consisting of images from 9 datasets (ADE20K, Cityscapes, Mapillary Vistas, ScanNet, VIPER, WildDash 2, IDD, BDD, and COCO) with a simple dataset balancing strategy.
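One simple way to balance a composite of datasets with very different sizes is to sample each example with probability inversely proportional to its source dataset's size, as sketched below. The equal-weighting choice and the stand-in datasets are assumptions for illustration, not necessarily the balancing strategy used in the paper.

```python
# Minimal sketch of balanced sampling over a composite of unequally sized
# datasets: each example is weighted by 1/len(its dataset) so every dataset
# contributes roughly equally in expectation.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

datasets = {                                  # stand-ins for ADE20K, Cityscapes, ...
    "ade20k":     TensorDataset(torch.randn(20000, 3)),
    "cityscapes": TensorDataset(torch.randn(3000, 3)),
    "coco":       TensorDataset(torch.randn(118000, 3)),
}
combined = ConcatDataset(list(datasets.values()))

# One weight per example: inverse of its source dataset's size.
weights = torch.cat([
    torch.full((len(ds),), 1.0 / len(ds)) for ds in datasets.values()
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)
loader = DataLoader(combined, batch_size=32, sampler=sampler)

batch = next(iter(loader))    # each dataset is (in expectation) equally represented
```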
1 code implementation • CVPR 2023 • Yutong Bai, Zeyu Wang, Junfei Xiao, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie
For example, by distilling the knowledge from an MAE pre-trained ViT-L into a ViT-B, our method achieves 84.0% ImageNet top-1 accuracy, outperforming the baseline of directly distilling a fine-tuned ViT-L by 1.2%.
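The sketch below shows a generic feature-distillation setup between a larger teacher and a smaller student: the student's features are projected to the teacher's width and regressed against the frozen teacher. The smooth-L1 objective, single linear projection, and stand-in feature extractors are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of feature distillation from a larger frozen teacher into a
# smaller student via a learned projection.
import torch
import torch.nn as nn

teacher_dim, student_dim, tokens = 1024, 768, 196       # e.g. ViT-L vs. ViT-B

teacher = nn.Linear(3 * 16 * 16, teacher_dim)            # stand-in feature extractors
student = nn.Linear(3 * 16 * 16, student_dim)
proj = nn.Linear(student_dim, teacher_dim)               # align feature widths

patches = torch.randn(8, tokens, 3 * 16 * 16)            # patchified images
with torch.no_grad():
    t_feat = teacher(patches)                            # frozen teacher features
s_feat = proj(student(patches))                          # student + projection

distill_loss = nn.functional.smooth_l1_loss(s_feat, t_feat)
distill_loss.backward()                                  # gradients reach student and proj only
```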
1 code implementation • CVPR 2022 • Junfei Xiao, Longlong Jing, Lin Zhang, Ju He, Qi She, Zongwei Zhou, Alan Yuille, Yingwei Li
Our method achieves state-of-the-art performance on three video action recognition benchmarks (i.e., Kinetics-400, UCF-101, and HMDB-51) under several typical semi-supervised settings (i.e., different ratios of labeled data).
1 code implementation • 29 Mar 2021 • Junfei Xiao, Lequan Yu, Zongwei Zhou, Yutong Bai, Lei Xing, Alan Yuille, Yuyin Zhou
We propose a new normalization strategy, named categorical normalization (CateNorm), to normalize the activations according to categorical statistics.
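As a rough illustration of normalizing activations with category-wise statistics, the sketch below computes mean and variance separately within each category's region (given by a label map) and standardizes each region with its own statistics. This is a simplified illustration of the idea, not the CateNorm implementation.

```python
# Minimal sketch of category-wise normalization: standardize activations inside
# each category's region using that region's own mean and variance.
import torch

def categorical_normalize(feat, labels, num_classes, eps=1e-5):
    # feat:   (B, C, H, W) activations
    # labels: (B, H, W) integer category map (e.g. coarse predictions)
    out = torch.zeros_like(feat)
    for k in range(num_classes):
        mask = (labels == k).unsqueeze(1).float()                  # (B, 1, H, W)
        count = mask.sum(dim=(0, 2, 3), keepdim=True).clamp_min(1.0)
        mean = (feat * mask).sum(dim=(0, 2, 3), keepdim=True) / count
        var = ((feat - mean) ** 2 * mask).sum(dim=(0, 2, 3), keepdim=True) / count
        out = out + mask * (feat - mean) / torch.sqrt(var + eps)
    return out

feat = torch.randn(2, 16, 64, 64)
labels = torch.randint(0, 3, (2, 64, 64))
normed = categorical_normalize(feat, labels, num_classes=3)
```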