1 code implementation • 4 Aug 2023 • Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.
no code implementations • 18 Apr 2023 • Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges.
4 code implementations • 29 Mar 2023 • Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.
no code implementations • 9 Feb 2023 • Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang
Introducing prior auxiliary information from the knowledge graph (KG) to assist the user-item graph can improve the comprehensive performance of the recommender system.
7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85. 5% at 224x224 resolution, under normal supervised training without external data or distillation.
Ranked #2 on
Domain Generalization
on ImageNet-C
(using extra training data)
no code implementations • 17 Aug 2022 • Jin Huang, Bosong Huang, Weihao Yu, Jing Xiao, Ruzhong Xie, Ke Ruan
Origin-Destination (OD) matrices record directional flow data between pairs of OD regions.
no code implementations • 28 Jul 2022 • Hanxiao Zhang, Xiao Gu, Minghui Zhang, Weihao Yu, Liang Chen, Zhexin Wang, Feng Yao, Yun Gu, Guang-Zhong Yang
The LIDC-IDRI database is the most popular benchmark for lung cancer prediction.
3 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.
1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan
It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.
Ranked #8 on
Self-Supervised Image Classification
on ImageNet
Contrastive Learning
Self-Supervised Image Classification
+3
no code implementations • 13 Feb 2022 • Yangqian Wu, Minghui Zhang, Weihao Yu, Hao Zheng, Jiasheng Xu, Yun Gu
Methods: In this paper, a long-term slice propagation (LTSP) method is proposed for accurate airway segmentation from pathological CT scans.
no code implementations • 29 Jan 2022 • Weihao Yu, Hao Zheng, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Jie Yang
Since the volume of the peripheral bronchi may be much smaller than the large branches in an input patch, the common segmentation loss is not sensitive to the breakages among the distal branches.
14 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan
Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.
Ranked #9 on
Semantic Segmentation
on DensePASS
no code implementations • 7 Sep 2021 • Minghui Zhang, Xin Yu, Hanxiao Zhang, Hao Zheng, Weihao Yu, Hong Pan, Xiangran Cai, Yun Gu
Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans.
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #169 on
Image Classification
on ImageNet
12 code implementations • ICCV 2021 • Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan
To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.
Ranked #386 on
Image Classification
on ImageNet
7 code implementations • NeurIPS 2020 • Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.
1 code implementation • ICLR 2020 • Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng
Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set.
Ranked #29 on
Reading Comprehension
on ReClor
1 code implementation • NeurIPS 2019 • Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao
Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.
3 code implementations • CVPR 2019 • Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin
More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions.
Ranked #7 on
Scene Graph Generation
on Visual Genome
1 code implementation • 2 Jul 2018 • Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin
And this structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node message through the graph to explore the interaction between persons of interest and the contextual objects.