1 code implementation • 25 Sep 2024 • Runpeng Yu, Weihao Yu, Xinchao Wang
To fill this gap, in this work we propose a new prompting technique named Attention Prompting on Image, which simply overlays a text-query-guided attention heatmap on the original input image and effectively enhances LVLMs on various tasks.
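A minimal sketch of the overlay step described above, assuming the text-query-to-patch similarity scores are already available from some auxiliary model (the paper's actual heatmap-generation pipeline is not reproduced here; shapes and the blending rule are assumptions):

```python
import torch
import torch.nn.functional as F

def overlay_attention(image: torch.Tensor, patch_sims: torch.Tensor,
                      alpha: float = 0.5) -> torch.Tensor:
    """image: (3, H, W) floats in [0, 1]; patch_sims: (h, w) raw query-to-patch scores."""
    H, W = image.shape[1], image.shape[2]
    # Normalize scores to [0, 1] so they act as a per-pixel attention weight.
    heat = (patch_sims - patch_sims.min()) / (patch_sims.max() - patch_sims.min() + 1e-6)
    # Upsample the coarse patch grid to full image resolution.
    heat = F.interpolate(heat[None, None], size=(H, W), mode="bilinear",
                         align_corners=False)[0]
    # Blend: keep high-attention regions bright, dim low-attention ones.
    return image * (alpha + (1.0 - alpha) * heat)

img = torch.rand(3, 224, 224)   # placeholder image
sims = torch.rand(14, 14)       # placeholder text-query-to-patch similarities
prompted = overlay_attention(img, sims)
```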
1 code implementation • 3 Sep 2024 • Songhua Liu, Weihao Yu, Zhenxiong Tan, Xinchao Wang
Modern diffusion models, particularly those utilizing a Transformer-based UNet for denoising, rely heavily on self-attention operations to manage complex spatial relationships, thus achieving impressive generation performance.
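For illustration, a self-attention layer over spatial positions of a feature map, as used in such denoising backbones; its cost grows quadratically with the number of pixels, which is what linear alternatives aim to avoid. This is a generic sketch with assumed shapes, not the paper's module:

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    """Self-attention over all spatial positions of a (B, C, H, W) feature map."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)        # (B, H*W, C)
        out, _ = self.attn(tokens, tokens, tokens)   # quadratic in H*W
        return out.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(1, 64, 32, 32)
print(SpatialSelfAttention(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```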
1 code implementation • 1 Aug 2024 • Weihao Yu, Zhengyuan Yang, Lingfeng Ren, Linjie Li, JianFeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang
Using MM-Vet v2 to benchmark large multimodal models, we found that Claude 3.5 Sonnet is the best model with a score of 71.8, slightly outperforming GPT-4o, which scored 71.0.
1 code implementation • 23 Jul 2024 • Runpeng Yu, Weihao Yu, Xinchao Wang
This paper does not introduce a novel method.
no code implementations • 8 Jul 2024 • Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan
To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph for Omnimodal Learning (GTP-4o), which embeds the numerous disparate clinical modalities into a unified representation, completes the deficient embeddings of missing modalities, and reformulates cross-modal learning with a graph-based aggregation.
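A toy sketch of graph-based aggregation over modality nodes, only to illustrate the idea of completing a missing modality's embedding from its neighbors; the paper's heterogeneous graph and prompting scheme are far richer, and all names and shapes below are hypothetical:

```python
import torch

def graph_aggregate(node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """One round of mean aggregation over a modality graph.
    node_feats: (N, D), one embedding per clinical modality; adj: (N, N) 0/1."""
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    return adj @ node_feats / deg

# Hypothetical example: 4 modalities, the last one missing (zero embedding);
# its representation is completed from its graph neighbours.
feats = torch.randn(4, 16)
feats[3] = 0.0
adj = torch.ones(4, 4) - torch.eye(4)   # fully connected, no self-loops
feats[3] = graph_aggregate(feats, adj)[3]
```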
no code implementations • 1 Jul 2024 • Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan
3D reconstruction of biological tissues from a collection of endoscopic images is key to unlocking various important downstream surgical applications that require 3D capabilities.
4 code implementations • 13 May 2024 • Weihao Yu, Xinchao Wang
For vision tasks, as image classification aligns with neither characteristic, we hypothesize that Mamba is not necessary for this task; detection and segmentation are not autoregressive either, yet they do adhere to the long-sequence characteristic, so we believe it is still worthwhile to explore Mamba's potential for these tasks.
1 code implementation • 4 Aug 2023 • Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.
no code implementations • 18 Apr 2023 • Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges.
11 code implementations • CVPR 2024 • Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable ConvNeXt, which employs 7x7 depthwise convolution.
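The 7x7 depthwise convolution referenced above is a single standard layer; a minimal example (channel count and input size are placeholders):

```python
import torch
import torch.nn as nn

# ConvNeXt-style large-kernel depthwise convolution: each channel is filtered
# independently with a 7x7 kernel, enlarging the receptive field without any
# channel mixing (groups == channels).
dw7x7 = nn.Conv2d(96, 96, kernel_size=7, padding=3, groups=96)

x = torch.randn(1, 96, 56, 56)
print(dw7x7(x).shape)  # torch.Size([1, 96, 56, 56])
```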
no code implementations • 9 Feb 2023 • Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
Introducing prior auxiliary information from a knowledge graph (KG) to assist the user-item graph can improve the overall performance of a recommender system.
7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as the token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model, CAFormer, sets a new record on ImageNet-1K: it achieves 85.5% accuracy at 224x224 resolution under normal supervised training, without external data or distillation.
Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)
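Simplified versions of the two token mixers combined above; real MetaFormer blocks also include norms, residuals, and channel MLPs, so this is only a sketch of the stage-wise mixing idea:

```python
import torch
import torch.nn as nn

class SepConvMixer(nn.Module):
    """Depthwise-separable-convolution token mixer (bottom stages), simplified."""
    def __init__(self, dim: int):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)  # depthwise
        self.pw = nn.Conv2d(dim, dim, 1)                         # pointwise

    def forward(self, x):                # x: (B, C, H, W)
        return self.pw(self.dw(x))

class AttentionMixer(nn.Module):
    """Vanilla self-attention token mixer (top stages), simplified."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)
        t, _ = self.attn(t, t, t)
        return t.transpose(1, 2).reshape(b, c, h, w)

x = torch.randn(1, 64, 28, 28)
print(SepConvMixer(64)(x).shape, AttentionMixer(64)(x).shape)
```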
no code implementations • 17 Aug 2022 • Jin Huang, Bosong Huang, Weihao Yu, Jing Xiao, Ruzhong Xie, Ke Ruan
Origin-Destination (OD) matrices record directional flow data between pairs of OD regions.
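A tiny, made-up example of the OD-matrix data structure: entry (i, j) is the flow originating in region i and ending in region j, so row sums give outflows and column sums give inflows.

```python
import numpy as np

regions = ["A", "B", "C"]
od = np.array([[ 0, 120, 30],
               [80,   0, 45],
               [25,  60,  0]])   # made-up directional flows

outflow = od.sum(axis=1)   # total departures per origin region
inflow = od.sum(axis=0)    # total arrivals per destination region
```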
no code implementations • 28 Jul 2022 • Hanxiao Zhang, Xiao Gu, Minghui Zhang, Weihao Yu, Liang Chen, Zhexin Wang, Feng Yao, Yun Gu, Guang-Zhong Yang
The LIDC-IDRI database is the most popular benchmark for lung cancer prediction.
4 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
Recent studies show that the Transformer has a strong capability for building long-range dependencies, yet it is incompetent at capturing the high frequencies that predominantly convey local information.
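A hypothetical frequency-split mixer in the spirit of that observation: half the channels go through a local convolution branch (high-frequency detail), half through global self-attention (long-range, low-frequency context). This is not the Inception Transformer's actual module, only an assumed sketch:

```python
import torch
import torch.nn as nn

class FrequencySplitMixer(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        assert dim % 2 == 0
        self.local = nn.Conv2d(dim // 2, dim // 2, 3, padding=1, groups=dim // 2)
        self.attn = nn.MultiheadAttention(dim // 2, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        hi, lo = x.chunk(2, dim=1)
        hi = self.local(hi)                     # local, high-frequency details
        t = lo.flatten(2).transpose(1, 2)
        t, _ = self.attn(t, t, t)               # global, low-frequency context
        lo = t.transpose(1, 2).reshape(b, c // 2, h, w)
        return torch.cat([hi, lo], dim=1)

print(FrequencySplitMixer(64)(torch.randn(1, 64, 16, 16)).shape)
```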
1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan
It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local groups separately to increase discriminability.
Ranked #14 on Self-Supervised Image Classification on ImageNet
Contrastive Learning • Self-Supervised Image Classification • +3
no code implementations • 13 Feb 2022 • Yangqian Wu, Minghui Zhang, Weihao Yu, Hao Zheng, Jiasheng Xu, Yun Gu
Methods: In this paper, a long-term slice propagation (LTSP) method is proposed for accurate airway segmentation from pathological CT scans.
no code implementations • 29 Jan 2022 • Weihao Yu, Hao Zheng, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Jie Yang
Since the volume of the peripheral bronchi may be much smaller than that of the large branches in an input patch, the common segmentation loss is not sensitive to breakages among the distal branches.
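A small numerical illustration of that insensitivity, using a standard soft Dice loss on a toy volume (the sizes are made up; this is not the paper's proposed loss):

```python
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    inter = (pred * target).sum()
    return 1 - (2 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy volume: a large airway (1000 voxels) plus a tiny distal branch (10 voxels).
target = torch.zeros(20000)
target[:1010] = 1
perfect = target.clone()
missed_branch = target.clone()
missed_branch[1000:1010] = 0          # the distal branch is completely missed

print(soft_dice_loss(perfect, target).item())        # ~0.0
print(soft_dice_loss(missed_branch, target).item())  # ~0.005: the breakage is barely penalized
```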
17 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
Ranked #9 on Semantic Segmentation on DensePASS
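A minimal MetaFormer-style block reflecting this hypothesis: the token mixer is a pluggable component (here, simple average pooling, as in PoolFormer), while the surrounding general architecture (norm, residual, channel MLP) stays fixed. Layer choices such as GroupNorm and the MLP ratio are simplifications:

```python
import torch
import torch.nn as nn

class PoolMixer(nn.Module):
    """Token mixer that is just local average pooling (minus identity)."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)

    def forward(self, x):
        return self.pool(x) - x

class MetaFormerBlock(nn.Module):
    """General architecture: norm -> token mixer -> residual, norm -> MLP -> residual."""
    def __init__(self, dim: int, mixer: nn.Module):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)
        self.mixer = mixer
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(nn.Conv2d(dim, 4 * dim, 1), nn.GELU(),
                                 nn.Conv2d(4 * dim, dim, 1))

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

block = MetaFormerBlock(64, PoolMixer())
print(block(torch.randn(1, 64, 14, 14)).shape)  # torch.Size([1, 64, 14, 14])
```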
no code implementations • 7 Sep 2021 • Minghui Zhang, Xin Yu, Hanxiao Zhang, Hao Zheng, Weihao Yu, Hong Pan, Xiangran Cai, Yun Gu
Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans.
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #183 on Image Classification on ImageNet
13 code implementations • ICCV 2021 • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan
To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation that progressively structurizes the image into tokens by recursively aggregating neighboring tokens into one token, so that the local structure represented by surrounding tokens can be modeled and the token length can be reduced; and 2) an efficient backbone with a deep-narrow structure for the vision transformer, motivated by CNN architecture design after empirical study.
Ranked #438 on Image Classification on ImageNet
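One Tokens-to-Token step can be sketched with an overlapping unfold ("soft split"): neighboring tokens are re-assembled into an image-like grid and re-split so that each new token aggregates a local neighborhood and the sequence length shrinks. Kernel size, stride, and dimensions below are assumptions for illustration:

```python
import torch
import torch.nn as nn

b, h, w, c = 1, 56, 56, 64
tokens = torch.randn(b, h * w, c)

grid = tokens.transpose(1, 2).reshape(b, c, h, w)      # tokens back to a 2-D grid
unfold = nn.Unfold(kernel_size=3, stride=2, padding=1) # overlapping local patches
new_tokens = unfold(grid).transpose(1, 2)              # (B, new_len, 9*C)

print(tokens.shape, "->", new_tokens.shape)  # 3136 tokens -> 784 wider tokens
```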
9 code implementations • NeurIPS 2020 • Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
The novel convolution heads, together with the remaining self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
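A simplified sketch of the head-mixing idea: part of the width goes through self-attention (global context) and part through a depthwise 1-D convolution over the sequence (local context). ConvBERT's actual convolution heads are span-based dynamic convolutions, so this is only an approximation:

```python
import torch
import torch.nn as nn

class MixedAttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 4, kernel: int = 9):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim // 2, heads, batch_first=True)
        self.conv = nn.Conv1d(dim // 2, dim // 2, kernel, padding=kernel // 2,
                              groups=dim // 2)

    def forward(self, x):                          # x: (B, L, D)
        a, c = x.chunk(2, dim=-1)
        a, _ = self.attn(a, a, a)                  # global context half
        c = self.conv(c.transpose(1, 2)).transpose(1, 2)  # local context half
        return torch.cat([a, c], dim=-1)

print(MixedAttentionBlock(256)(torch.randn(2, 128, 256)).shape)
```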
1 code implementation • ICLR 2020 • Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng
Empirical results show that state-of-the-art models have an outstanding ability to capture the biases contained in the dataset, achieving high accuracy on the EASY set.
Ranked #1 on Logical Reasoning Question Answering on ReClor
1 code implementation • NeurIPS 2019 • Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao
Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.
3 code implementations • CVPR 2019 • Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin
More specifically, we show that the statistical correlations between objects appearing in images and their relationships can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions.
Ranked #9 on Scene Graph Generation on Visual Genome
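One simplified message-passing step over such a statistical co-occurrence graph, where edge weights gate how node features are routed to their neighbors; the paper's routing network is richer than this, and the node/edge setup below is hypothetical:

```python
import torch
import torch.nn as nn

class GraphRouting(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, h: torch.Tensor, cooc: torch.Tensor) -> torch.Tensor:
        # h: (N, D) node features; cooc: (N, N) co-occurrence statistics.
        weights = cooc / cooc.sum(dim=1, keepdim=True).clamp(min=1e-6)
        messages = weights @ self.msg(h)           # route features along weighted edges
        return self.update(messages, h)            # gated node-state update

h = torch.randn(5, 32)        # 5 hypothetical object/predicate nodes
cooc = torch.rand(5, 5)       # made-up co-occurrence statistics
h = GraphRouting(32)(h, cooc)
```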
1 code implementation • 2 Jul 2018 • Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin
This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding via an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between the persons of interest and the contextual objects.
Ranked #3 on Visual Social Relationship Recognition on PIPA