Search Results for author: Mengdan Zhang

Found 14 papers, 9 papers with code

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

8,890

Paper
Code

Aligning and Prompting Everything All at Once for Universal Visual Perception

2 code implementations • 4 Dec 2023 • Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji

However, predominant paradigms, driven by casting instance-level tasks as an object-word alignment, bring heavy cross-modality interaction, which is not effective in prompting object detection and visual grounding.

Object object-detection +6

415

Paper
Code

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

no code implementations • 30 Aug 2023 • Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu

In this paper, we for the first time explore helpful multi-modal contextual knowledge to understand novel categories for open-vocabulary object detection (OVD).

Knowledge Distillation Language Modelling +4

Paper
Add Code

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

3 code implementations • 23 Jun 2023 • Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji

Multimodal Large Language Model (MLLM) relies on the powerful LLM to perform multimodal tasks, showing amazing emergent abilities in recent studies, such as writing poems based on an image.

Benchmarking Language Modelling +3

8,890

Paper
Code

Multi-modal Queried Object Detection in the Wild

1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu

To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed.

Ranked #1 on Few-Shot Object Detection on ODinW-35

Few-Shot Object Detection Object +2

226

Paper
Code

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

2 code implementations • 22 Jun 2022 • Peixian Chen, Kekai Sheng, Mengdan Zhang, Mingbao Lin, Yunhang Shen, Shaohui Lin, Bo Ren, Ke Li

Open-vocabulary object detection (OVD) aims to scale up vocabulary size to detect objects of novel categories beyond the training vocabulary.

Ranked #12 on Open Vocabulary Object Detection on LVIS v1.0

Causal Inference object-detection +1

Paper
Code

Efficient Decoder-free Object Detection with Transformers

2 code implementations • 14 Jun 2022 • Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen

A natural usage of ViTs in detection is to replace the CNN-based backbone with a transformer-based backbone, which is straightforward and effective, with the price of bringing considerable computation burden for inference.

Object Object Detection

Paper
Code

ARM: Any-Time Super-Resolution Method

1 code implementation • 21 Mar 2022 • Bohong Chen, Mingbao Lin, Kekai Sheng, Mengdan Zhang, Peixian Chen, Ke Li, Liujuan Cao, Rongrong Ji

To that effect, we construct an Edge-to-PSNR lookup table that maps the edge score of an image patch to the PSNR performance for each subnet, together with a set of computation costs for the subnets.

Image Super-Resolution

Paper
Code

Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer

1 code implementation • 3 Aug 2021 • Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, WeiMing Dong, Liqing Zhang, Changsheng Xu, Xing Sun

Vision transformers (ViTs) have recently received explosive popularity, but the huge computational cost is still a severe issue.

Ranked #11 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

Paper
Code

An Empirical Study and Analysis on Open-Set Semi-Supervised Learning

no code implementations • 19 Jan 2021 • Huixiang Luo, Hao Cheng, Fanxu Meng, Yuting Gao, Ke Li, Mengdan Zhang, Xing Sun

Pseudo-labeling (PL) and Data Augmentation-based Consistency Training (DACT) are two approaches widely used in Semi-Supervised Learning (SSL) methods.

Data Augmentation

Paper
Add Code

Learning To Know Where To See: A Visibility-Aware Approach for Occluded Person Re-Identification

no code implementations • ICCV 2021 • Jinrui Yang, Jiawei Zhang, Fufu Yu, Xinyang Jiang, Mengdan Zhang, Xing Sun, Ying-Cong Chen, Wei-Shi Zheng

Several mainstream methods utilize extra cues (e. g., human pose information) to distinguish human parts from obstacles to alleviate the occlusion problem.

Person Re-Identification

Paper
Add Code

Dive Deeper Into Box for Object Detection

no code implementations • ECCV 2020 • Ran Chen, Yong liu, Mengdan Zhang, Shu Liu, Bei Yu, Yu-Wing Tai

Anchor free methods have defined the new frontier in state-of-the-art object detection researches where accurate bounding box estimation is the key to the success of these methods.

Object object-detection +1

Paper
Add Code

Visual Tracking via Spatially Aligned Correlation Filters Network

no code implementations • ECCV 2018 • Mengdan Zhang, Qiang Wang, Junliang Xing, Jin Gao, Peixi Peng, Weiming Hu, Steve Maybank

Correlation filters based trackers rely on a periodic assumption of the search sample to efficiently distinguish the target from the background.

Visual Tracking

Paper
Add Code

DCFNet: Discriminant Correlation Filters Network for Visual Tracking

5 code implementations • 13 Apr 2017 • Qiang Wang, Jin Gao, Junliang Xing, Mengdan Zhang, Weiming Hu

In this work, we present an end-to-end lightweight network architecture, namely DCFNet, to learn the convolutional features and perform the correlation tracking process simultaneously.

Object Tracking Visual Tracking

214

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.