Search Results for author: Weihao Yu

Found 21 papers, 14 papers with code

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

1 code implementation 4 Aug 2023 Weihao Yu, Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang

Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking.

Math Zero-Shot Visual Question Answering

Two-stage Denoising Diffusion Model for Source Localization in Graph Inverse Problems

no code implementations 18 Apr 2023 Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang

However, the inherent intricacy and uncertainty in information dissemination pose significant challenges, and the ill-posed nature of the source localization problem further exacerbates these challenges.


InceptionNeXt: When Inception Meets ConvNeXt

4 code implementations 29 Mar 2023 Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions are widely studied and adopted recently to enlarge the receptive field and improve model performance, like the remarkable work ConvNeXt which employs 7x7 depthwise convolution.

Image Classification Semantic Segmentation

Lorentz Equivariant Model for Knowledge-Enhanced Hyperbolic Collaborative Filtering

no code implementations 9 Feb 2023 Bosong Huang, Weihao Yu, Ruzhong Xie, Jing Xiao, Jin Huang

Introducing prior auxiliary information from the knowledge graph (KG) to assist the user-item graph can improve the comprehensive performance of the recommender system.

Collaborative Filtering Recommendation Systems

MetaFormer Baselines for Vision

7 code implementations 24 Oct 2022 Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization Image Classification

Inception Transformer

3 code implementations 25 May 2022 Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.

Image Classification

Mugs: A Multi-Granular Self-Supervised Learning Framework

1 code implementation 27 Mar 2022 Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.

Contrastive Learning Self-Supervised Image Classification +3

LTSP: Long-Term Slice Propagation for Accurate Airway Segmentation

no code implementations 13 Feb 2022 Yangqian Wu, Minghui Zhang, Weihao Yu, Hao Zheng, Jiasheng Xu, Yun Gu

Methods: In this paper, a long-term slice propagation (LTSP) method is proposed for accurate airway segmentation from pathological CT scans.

Computed Tomography (CT) Segmentation

BREAK: Bronchi Reconstruction by gEodesic transformation And sKeleton embedding

no code implementations 29 Jan 2022 Weihao Yu, Hao Zheng, Minghui Zhang, Hanxiao Zhang, Jiayuan Sun, Jie Yang

Since the volume of the peripheral bronchi may be much smaller than the large branches in an input patch, the common segmentation loss is not sensitive to the breakages among the distal branches.


MetaFormer Is Actually What You Need for Vision

14 code implementations CVPR 2022 Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.

Image Classification Object Detection +1

FDA: Feature Decomposition and Aggregation for Robust Airway Segmentation

no code implementations 7 Sep 2021 Minghui Zhang, Xin Yu, Hanxiao Zhang, Hao Zheng, Weihao Yu, Hong Pan, Xiangran Cai, Yun Gu

Compared to other state-of-the-art transfer learning methods, our method accurately segmented more bronchi in the noisy CT scans.

Transfer Learning

LV-BERT: Exploiting Layer Variety for BERT

1 code implementation Findings (ACL) 2021 Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng

In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.

Refiner: Refining Self-attention for Vision Transformers

1 code implementation 7 Jun 2021 Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng

Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.

Image Classification

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

12 code implementations ICCV 2021 Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis EH Tay, Jiashi Feng, Shuicheng Yan

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.

Image Classification Language Modelling

ConvBERT: Improving BERT with Span-based Dynamic Convolution

7 code implementations NeurIPS 2020 Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

The novel convolution heads, together with the rest self-attention heads, form a new mixed attention block that is more efficient at both global and local context learning.

Natural Language Understanding

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

1 code implementation ICLR 2020 Weihao Yu, Zi-Hang Jiang, Yanfei Dong, Jiashi Feng

Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset with high accuracy on EASY set.

Logical Reasoning Logical Reasoning Question Answering +2

Heterogeneous Graph Learning for Visual Commonsense Reasoning

1 code implementation NeurIPS 2019 Weijiang Yu, Jingwen Zhou, Weihao Yu, Xiaodan Liang, Nong Xiao

Our HGL consists of a primal vision-to-answer heterogeneous graph (VAHG) module and a dual question-to-answer heterogeneous graph (QAHG) module to interactively refine reasoning paths for semantic agreement.

Graph Learning Visual Commonsense Reasoning

Knowledge-Embedded Routing Network for Scene Graph Generation

3 code implementations CVPR 2019 Tianshui Chen, Weihao Yu, Riquan Chen, Liang Lin

More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions.

Graph Generation Scene Graph Generation

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

1 code implementation 2 Jul 2018 Zhouxia Wang, Tianshui Chen, Jimmy Ren, Weihao Yu, Hui Cheng, Liang Lin

And this structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node message through the graph to explore the interaction between persons of interest and the contextual objects.
