Search Results for author: Xianhang Li

Found 19 papers, 15 papers with code

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

no code implementations25 Nov 2024 Yanqing Liu, Xianhang Li, Zeyu Wang, Bingchen Zhao, Cihang Xie

Previous works show that noisy, web-crawled image-text pairs may limit vision-language pretraining like CLIP and propose learning with synthetic captions as a promising alternative.

Cross-Modal Retrieval

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

1 code implementation6 Aug 2024 Yunfei Xie, Ce Zhou, Lang Gao, Juncheng Wu, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou

We then build a comprehensive knowledge base and prompt multimodal large language models to perform retrieval-augmented generation with the identified ROIs as guidance, resulting in multigranular textual descriptions.

Ranked #1 on Medical Visual Question Answering on SLAKE-English (using extra training data)

Medical Visual Question Answering Visual Question Answering (VQA)

Autoregressive Pretraining with Mamba in Vision

1 code implementation11 Jun 2024 Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks.

Mamba

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

1 code implementation8 Jun 2024 Sucheng Ren, Xiaoke Huang, Xianhang Li, Junfei Xiao, Jieru Mei, Zeyu Wang, Alan Yuille, Yuyin Zhou

This study presents Medical Vision Generalist (MVG), the first foundation model capable of handling various medical imaging tasks -- such as cross-modal synthesis, image segmentation, denoising, and inpainting -- within a unified image-to-image generation framework.

Conditional Image Generation Denoising +2

Scaling White-Box Transformers for Vision

no code implementations30 May 2024 Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie

CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability.

Semantic Segmentation Unsupervised Object Segmentation

3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

1 code implementation23 Mar 2024 Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou

We find that the decoder-only 3D-TransUNet model offers enhanced efficacy in segmenting brain metastases, as indicated by our 5-fold cross-validation on the training set.

Brain Tumor Segmentation Decoder +2

Revisiting Adversarial Training at Scale

1 code implementation CVPR 2024 Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie

For example, by training on the DataComp-1B dataset, our AdvXL empowers a vanilla ViT-g model to substantially surpass the previous records of $l_{\infty}$-, $l_{2}$-, and $l_{1}$-robust accuracy by margins of 11.4%, 14.2%, and 12.9%, respectively.

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

3 code implementations11 Oct 2023 Jieneng Chen, Jieru Mei, Xianhang Li, Yongyi Lu, Qihang Yu, Qingyue Wei, Xiangde Luo, Yutong Xie, Ehsan Adeli, Yan Wang, Matthew Lungren, Lei Xing, Le Lu, Alan Yuille, Yuyin Zhou

In this paper, we extend the 2D TransUNet architecture to a 3D network by building upon the state-of-the-art nnU-Net architecture, and fully exploring Transformers' potential in both the encoder and decoder design.

Decoder Image Segmentation +4

Consistency-guided Meta-Learning for Bootstrapping Semi-Supervised Medical Image Segmentation

1 code implementation21 Jul 2023 Qingyue Wei, Lequan Yu, Xianhang Li, Wei Shao, Cihang Xie, Lei Xing, Yuyin Zhou

Specifically, our approach first involves training a segmentation model on a small set of clean labeled images to generate initial labels for unlabeled data.

Image Segmentation Meta-Learning +4
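The bootstrapping step described above, generating initial labels for unlabeled data from a model trained on a small clean set, can be sketched as confidence-thresholded pseudo-labeling. This is a minimal illustration, not the paper's exact procedure; the function name, the threshold value, and the use of -1 as an ignore index are assumptions.

```python
import numpy as np

def pseudo_label(probs, threshold=0.9):
    """Turn per-pixel class probabilities from a model trained on a small
    clean labeled set into initial labels for unlabeled images.
    Low-confidence pixels are marked -1 (ignored during training).

    probs: (H, W, K) array of per-class probabilities.
    """
    conf = probs.max(axis=-1)            # per-pixel confidence
    labels = probs.argmax(axis=-1)       # most likely class
    labels[conf < threshold] = -1        # drop uncertain pixels
    return labels

# A 1x2 "image" with 2 classes: first pixel is confident, second is not.
probs = np.array([[[0.95, 0.05], [0.6, 0.4]]])
print(pseudo_label(probs))  # [[ 0 -1]]
```

A consistency objective would then encourage the model's predictions on perturbed views to agree with these initial labels.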

CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8% Accuracy

2 code implementations27 Jun 2023 Xianhang Li, Zeyu Wang, Cihang Xie

The recent work CLIPA presents an inverse scaling law for CLIP training -- whereby the larger the image/text encoders used, the shorter the sequence length of image/text tokens that can be applied in training.
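The shortened image-token sequences that the inverse scaling law permits can be produced by randomly masking patch tokens before the encoder. The sketch below, with assumed function name and shapes, illustrates the sequence-length reduction; it is not the CLIPA implementation.

```python
import numpy as np

def mask_image_tokens(tokens, keep_ratio, rng):
    """Randomly keep a subset of image patch tokens, shortening the
    sequence the image encoder must process during training.

    tokens: (N, D) array of patch embeddings; keep_ratio in (0, 1].
    Returns kept tokens in their original order.
    """
    n = tokens.shape[0]
    n_keep = max(1, int(round(n * keep_ratio)))
    idx = np.sort(rng.choice(n, size=n_keep, replace=False))
    return tokens[idx]

rng = np.random.default_rng(0)
patches = rng.standard_normal((196, 768))   # 14x14 ViT patch grid
short = mask_image_tokens(patches, keep_ratio=0.25, rng=rng)
print(short.shape)  # (49, 768)
```

Keeping 25% of 196 tokens cuts the encoder's sequence length to 49, which is where most of the training-cost savings come from.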

An Inverse Scaling Law for CLIP Training

1 code implementation NeurIPS 2023 Xianhang Li, Zeyu Wang, Cihang Xie

However, its associated training cost is prohibitively high, imposing a significant barrier to its widespread exploration.

Unleashing the Power of Visual Prompting At the Pixel Level

1 code implementation20 Dec 2022 Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie

This paper presents a simple and effective visual prompting method for adapting pre-trained models to downstream recognition tasks.

Diversity Visual Prompting
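Pixel-level visual prompting of the kind described above can be pictured as a learnable perturbation added directly to the input image, often restricted to a border region. The sketch below uses an assumed function name and pad width and keeps the prompt fixed rather than learned; in practice the prompt parameters would be optimized by backpropagation with the pre-trained model frozen.

```python
import numpy as np

def apply_pad_prompt(image, prompt, pad):
    """Add a border-shaped pixel prompt to an input image.

    image, prompt: (C, H, W) arrays. Only a `pad`-wide border of the
    prompt is applied; the interior of the image is left untouched.
    """
    mask = np.ones_like(prompt)
    mask[:, pad:-pad, pad:-pad] = 0.0    # keep only the border region
    return image + prompt * mask

img = np.zeros((3, 224, 224))
prompt = np.ones((3, 224, 224))          # would be learned in practice
out = apply_pad_prompt(img, prompt, pad=30)
print(out[0, 0, 0], out[0, 112, 112])    # 1.0 0.0
```

The frozen model then sees the prompted image, so adaptation happens entirely in pixel space.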

In Defense of Image Pre-Training for Spatiotemporal Recognition

1 code implementation3 May 2022 Xianhang Li, Huiyu Wang, Chen Wei, Jieru Mei, Alan Yuille, Yuyin Zhou, Cihang Xie

Inspired by this observation, we hypothesize that the key to effectively leveraging image pre-training lies in the decomposition of learning spatial and temporal features, and revisiting image pre-training as the appearance prior to initializing 3D kernels.

STS Video Recognition
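Using image pre-training as an appearance prior for 3D kernels is commonly done by inflating 2D convolution weights along the temporal axis. The sketch below shows one standard inflation scheme (replicate and rescale); the function name is an assumption and the paper's exact initialization may differ.

```python
import numpy as np

def inflate_2d_kernel(w2d, t):
    """Inflate a pre-trained 2D conv kernel into a 3D kernel by
    replicating it t times along the temporal axis and dividing by t,
    so activations keep roughly the same scale on static inputs.

    w2d: (out_c, in_c, kH, kW) -> (out_c, in_c, t, kH, kW)
    """
    return np.repeat(w2d[:, :, None, :, :], t, axis=2) / t

w2d = np.random.default_rng(1).standard_normal((64, 3, 7, 7))
w3d = inflate_2d_kernel(w2d, t=5)
print(w3d.shape)                           # (64, 3, 5, 7, 7)
print(np.allclose(w3d.sum(axis=2), w2d))   # True
```

Summing the inflated kernel over time recovers the original 2D weights, which is what preserves the appearance prior at initialization.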

Fast AdvProp

1 code implementation ICLR 2022 Jieru Mei, Yucheng Han, Yutong Bai, Yixiao Zhang, Yingwei Li, Xianhang Li, Alan Yuille, Cihang Xie

Specifically, our modifications in Fast AdvProp are guided by the hypothesis that disentangled learning with adversarial examples is the key for performance improvements, while other training recipes (e.g., paired clean and adversarial training samples, multi-step adversarial attackers) could be largely simplified.

Data Augmentation object-detection +1

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

1 code implementation CVPR 2024 Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing

Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often necessitating few to no validation samples, and is well generalized to other tasks such as image segmentation.

Ranked #8 on Image Classification on Clothing1M (using clean data) (using extra training data)

Image Segmentation Learning with noisy labels +3

CT-Net: Channel Tensorization Network for Video Classification

1 code implementation ICLR 2021 Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

It can learn to exploit spatial, temporal and channel attention in a high-dimensional manner, to improve the cooperative power of all the feature dimensions in our CT-Module.

Action Classification Classification +1
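Channel tensorization as described above amounts to reshaping the channel axis C into several smaller sub-dimensions so that interactions can be modeled per sub-dimension. This is a minimal shape-level sketch with an assumed function name and factorization, not the CT-Module itself.

```python
import numpy as np

def tensorize_channels(x, dims):
    """Reshape the channel axis of a video feature map into multiple
    sub-dimensions, e.g. C=256 -> (16, 16), so subsequent operations
    can act along each channel sub-dimension separately.

    x: (N, C, T, H, W); dims: tuple with prod(dims) == C.
    """
    n, c, t, h, w = x.shape
    assert int(np.prod(dims)) == c, "dims must factorize C exactly"
    return x.reshape(n, *dims, t, h, w)

x = np.zeros((2, 256, 8, 14, 14))
y = tensorize_channels(x, (16, 16))
print(y.shape)  # (2, 16, 16, 8, 14, 14)
```

Factorizing C this way keeps the parameter count of per-sub-dimension interactions far below that of a dense C-by-C mixing.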
