Search Results for author: Xinlong Wang

Found 42 papers, 33 papers with code

Instance-Aware Embedding for Point Cloud Instance Segmentation

no code implementations • ECCV 2020 • Tong He, Yifan Liu, Chunhua Shen, Xinlong Wang, Changming Sun

However, these methods are unaware of the instance context and fail to realize the boundary and geometric information of an instance, which are critical to separate adjacent objects.

Instance Segmentation Semantic Segmentation

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions

1 code implementation • 17 Feb 2024 • Wenxuan Wang, Yisi Zhang, Xingjian He, Yichen Yan, Zijia Zhao, Xinlong Wang, Jing Liu

Previous datasets and methods for classic VG task mainly rely on the prior assumption that the given expression must literally refer to the target object, which greatly impedes the practical deployment of agents in real-world scenarios.

Visual Grounding

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

2 code implementations • 6 Feb 2024 • Quan Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang

Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models.

Ranked #1 on Zero-Shot Transfer Image Classification on SUN

Image Classification Zero-Shot Transfer Image Classification

1,957

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

6 code implementations • 17 Jan 2024 • Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, Xinggang Wang

The results demonstrate that Vim is capable of overcoming the computation & memory constraints on performing Transformer-style understanding for high-resolution images and it has great potential to be the next-generation backbone for vision foundation models.

object-detection Object Detection +3

2,041

Generative Multimodal Models are In-Context Learners

1 code implementation • 20 Dec 2023 • Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate.

Ranked #19 on Visual Question Answering on MM-Vet

In-Context Learning Question Answering +2

1,487

Tokenize Anything via Prompting

1 code implementation • 14 Dec 2023 • Ting Pan, Lulu Tang, Xinlong Wang, Shiguang Shan

The semantic token is responsible for learning the semantic priors in a predefined concept space.

Visual Prompting

431

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

1 code implementation • 13 Dec 2023 • Wenxuan Wang, Tongtian Yue, Yisi Zhang, Longteng Guo, Xingjian He, Xinlong Wang, Jing Liu

To foster future research into fine-grained visual grounding, our benchmark RefCOCOm, the MRES-32M dataset and model UniRES will be publicly available at https://github.com/Rubics-Xuan/MRES

Descriptive Object +3

GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

2 code implementations • 29 Nov 2023 • Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang, Xinlong Wang

We justify that the refined 3D geometric priors aid in the 3D-aware capability of 2D diffusion priors, which in turn provides superior guidance for the refinement of 3D geometric priors.

3D Generation Text to 3D

556

CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation • 31 Oct 2023 • Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge

170

JudgeLM: Fine-tuned Large Language Models are Scalable Judges

1 code implementation • 26 Oct 2023 • Lianghui Zhu, Xinggang Wang, Xinlong Wang

To address this problem, we propose to fine-tune LLMs as scalable judges (JudgeLM) to evaluate LLMs efficiently and effectively in open-ended benchmarks.

255

3D-GPT: Procedural 3D Modeling with Large Language Models

no code implementations • 19 Oct 2023 • Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould

Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.

Scene Generation

Uni3D: Exploring Unified 3D Representation at Scale

2 code implementations • 10 Oct 2023 • Junsheng Zhou, Jinsheng Wang, Baorui Ma, Yu-Shen Liu, Tiejun Huang, Xinlong Wang

Scaling up representations for images or text has been extensively investigated in the past few years and has led to revolutions in learning vision and language.

Ranked #1 on Zero-shot 3D classification on Objaverse LVIS (using extra training data)

3D Object Classification Retrieval +5

556

Generative Pretraining in Multimodality

2 code implementations • 11 Jul 2023 • Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Ranked #1 on Visual Question Answering on VQA v2

Image Captioning Temporal/Causal QA +4

1,487

Fine-Grained Visual Prompting

1 code implementation • NeurIPS 2023 • Lingfeng Yang, Yueze Wang, Xiang Li, Xinlong Wang, Jian Yang

Previous works have suggested that incorporating visual prompts, such as colorful boxes or circles, can improve the ability of models to recognize objects of interest.

Visual Prompting

Towards Better Entity Linking with Multi-View Enhanced Distillation

1 code implementation • 27 May 2023 • Yi Liu, Yuan Tian, Jianxun Lian, Xinlong Wang, Yanan Cao, Fang Fang, Wen Zhang, Haizhen Huang, Denvy Deng, Qi Zhang

Aiming at learning entity representations that can match divergent mentions, this paper proposes a Multi-View Enhanced Distillation (MVD) framework, which can effectively transfer knowledge of multiple fine-grained and mention-relevant parts within entities from cross-encoders to dual-encoders.

Entity Linking Knowledge Distillation +1

Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching

1 code implementation • 22 May 2023 • Yang Liu, Muzhi Zhu, Hengtao Li, Hao Chen, Xinlong Wang, Chunhua Shen

In this work, we present Matcher, a novel perception paradigm that utilizes off-the-shelf vision foundation models to address various perception tasks.

Segmentation Semantic Segmentation

357

SegGPT: Segmenting Everything In Context

1 code implementation • 6 Apr 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Ranked #1 on Few-Shot Semantic Segmentation on PASCAL-5i (5-Shot) (using extra training data)

Few-Shot Semantic Segmentation In-Context Learning +5

2,419

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

1 code implementation • 30 Mar 2023 • Wen Wang, Yan Jiang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video.

Image Generation Video Alignment +1

319

EVA-CLIP: Improved Training Techniques for CLIP at Scale

3 code implementations • 27 Mar 2023 • Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, Yue Cao

Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.

Ranked #4 on Zero-Shot Transfer Image Classification on Food-101

Image Classification Representation Learning +2

1,956

EVA-02: A Visual Representation for Neon Genesis

6 code implementations • 20 Mar 2023 • Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.

29,713

Affective Image Filter: Reflecting Emotions from Text to Images

no code implementations • ICCV 2023 • Shuchen Weng, Peixuan Zhang, Zheng Chang, Xinlong Wang, Si Li, Boxin Shi

In this work, we propose Affective Image Filter (AIF), a novel model that is able to understand the visually-abstract emotions from the text and reflect them to visually-concrete images with appropriate colors and textures.

Image Generation

SegGPT: Towards Segmenting Everything in Context

no code implementations • ICCV 2023 • Xinlong Wang, Xiaosong Zhang, Yue Cao, Wen Wang, Chunhua Shen, Tiejun Huang

We unify various segmentation tasks into a generalist in-context learning framework that accommodates different kinds of segmentation data by transforming them into the same format of images.

Few-Shot Semantic Segmentation In-Context Learning +4

Images Speak in Images: A Generalist Painter for In-Context Visual Learning

1 code implementation • CVPR 2023 • Xinlong Wang, Wen Wang, Yue Cao, Chunhua Shen, Tiejun Huang

In this work, we present Painter, a generalist model which addresses these obstacles with an "image"-centric solution, that is, to redefine the output of core vision tasks as images, and specify task prompts as also images.

Ranked #6 on Personalized Segmentation on PerSeg

In-Context Learning Keypoint Detection +2

2,419

EVA: Exploring the Limits of Masked Visual Representation Learning at Scale

6 code implementations • CVPR 2023 • Yuxin Fang, Wen Wang, Binhui Xie, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data.

Ranked #1 on Self-Supervised Image Classification (with CLIP) on ImageNet (zero-shot)

Action Classification Action Recognition +9

29,713

Point-Teaching: Weakly Semi-Supervised Object Detection with Point Annotations

no code implementations • 1 Jun 2022 • Yongtao Ge, Qiang Zhou, Xinlong Wang, Zhibin Wang, Hao Li, Chunhua Shen

Point annotations are considerably more time-efficient than bounding box annotations.

Data Augmentation Multiple Instance Learning +4

FreeSOLO: Learning to Segment Objects without Annotations

1 code implementation • CVPR 2022 • Xinlong Wang, Zhiding Yu, Shalini De Mello, Jan Kautz, Anima Anandkumar, Chunhua Shen, Jose M. Alvarez

FreeSOLO further demonstrates superiority as a strong pre-training method, outperforming state-of-the-art self-supervised pre-training methods by +9.8% AP when fine-tuning instance segmentation with only 5% COCO masks.

Instance Segmentation object-detection +4

309

Poseur: Direct Human Pose Regression with Transformers

1 code implementation • 19 Jan 2022 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang, Anton Van Den Hengel

We propose a direct, regression-based approach to 2D human pose estimation from single images.

Ranked #2 on Keypoint Detection on MS COCO

2D Human Pose Estimation Keypoint Detection +1

168

SOLO: A Simple Framework for Instance Segmentation

no code implementations • 30 Jun 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Besides instance segmentation, our method yields state-of-the-art results in object detection (from our mask byproduct) and panoptic segmentation.

Image Matting Instance Segmentation +4

FCPose: Fully Convolutional Multi-Person Pose Estimation with Dynamic Instance-Aware Convolutions

3 code implementations • CVPR 2021 • Weian Mao, Zhi Tian, Xinlong Wang, Chunhua Shen

We propose a fully convolutional multi-person pose estimation framework using dynamic instance-aware convolutions, termed FCPose.

Keypoint Estimation Multi-Person Pose Estimation

3,324

TFPose: Direct Human Pose Estimation with Transformers

no code implementations • 29 Mar 2021 • Weian Mao, Yongtao Ge, Chunhua Shen, Zhi Tian, Xinlong Wang, Zhibin Wang

We propose a human pose estimation framework that solves the task in the regression-based fashion.

Ranked #26 on Pose Estimation on MPII Human Pose (using extra training data)

Pose Estimation regression

Conditional Positional Encodings for Vision Transformers

2 code implementations • 22 Feb 2021 • Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Chunhua Shen

Built on PEG, we present Conditional Position encoding Vision Transformer (CPVT).

AutoML Classification +4

177

Diverse Knowledge Distillation for End-to-End Person Search

no code implementations • 21 Dec 2020 • Xinyu Zhang, Xinlong Wang, Jia-Wang Bian, Chunhua Shen, Mingyu You

Person search aims to localize and identify a specific person from a gallery of images.

Human Detection Knowledge Distillation +1

BoxInst: High-Performance Instance Segmentation with Box Annotations

2 code implementations • CVPR 2021 • Zhi Tian, Chunhua Shen, Xinlong Wang, Hao Chen

We present a high-performance method that can achieve mask-level instance segmentation with only bounding-box annotations for training.

Ranked #2 on Box-supervised Instance Segmentation on PASCAL VOC 2012 val

Box-supervised Instance Segmentation Segmentation +3

3,324

End-to-End Video Instance Segmentation with Transformers

2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Ranked #33 on Video Instance Segmentation on YouTube-VIS validation

Instance Segmentation Segmentation +3

734

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

6 code implementations • CVPR 2021 • Xinlong Wang, Rufeng Zhang, Chunhua Shen, Tao Kong, Lei LI

Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin.

Contrastive Learning Image Classification +7

3,080

SOLOv2: Dynamic and Fast Instance Segmentation

18 code implementations • NeurIPS 2020 • Xinlong Wang, Rufeng Zhang, Tao Kong, Lei LI, Chunhua Shen

Importantly, we take one step further by dynamically learning the mask head of the object segmenter such that the mask head is conditioned on the location.

Ranked #10 on Real-time Instance Segmentation on MSCOCO

object-detection Object Detection +4

27,744

DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data

2 code implementations • 3 Feb 2020 • Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin

Compared with previous learning objectives, i.e., learning metric depth or relative depth, we propose to learn the affine-invariant depth using our diverse dataset to ensure both generalization and high-quality geometric shapes of scenes.

Depth Estimation Depth Prediction

216

SOLO: Segmenting Objects by Locations

24 code implementations • ECCV 2020 • Xinlong Wang, Tao Kong, Chunhua Shen, Yuning Jiang, Lei LI

We present a new, embarrassingly simple approach to instance segmentation in images.

Ranked #67 on Instance Segmentation on COCO test-dev

Clustering General Classification +3

27,744

Task-Aware Monocular Depth Estimation for 3D Object Detection

1 code implementation • 17 Sep 2019 • Xinlong Wang, Wei Yin, Tao Kong, Yuning Jiang, Lei LI, Chunhua Shen

In this paper, we first analyse the data distributions and interaction of foreground and background, then propose the foreground-background separated monocular depth estimation (ForeSeE) method, to estimate the foreground depth and background depth using separate optimization objectives and depth decoders.

3D Object Detection 3D Object Recognition +4

Associatively Segmenting Instances and Semantics in Point Clouds

3 code implementations • CVPR 2019 • Xinlong Wang, Shu Liu, Xiaoyong Shen, Chunhua Shen, Jiaya Jia

A 3D point cloud describes the real scene precisely and intuitively. To date how to segment diversified elements in such an informative 3D scene is rarely discussed.

Ranked #15 on 3D Instance Segmentation on S3DIS (mRec metric)

3D Instance Segmentation 3D Semantic Segmentation +1

248

Repulsion Loss: Detecting Pedestrians in a Crowd

2 code implementations • CVPR 2018 • Xinlong Wang, Tete Xiao, Yuning Jiang, Shuai Shao, Jian Sun, Chunhua Shen

In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem.

Ranked #9 on Pedestrian Detection on Caltech (using extra training data)

Pedestrian Detection regression

233

Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition

no code implementations • 11 Jul 2017 • Xinlong Wang, Zhipeng Man, Mingyu You, Chunhua Shen

Our experimental results on a few data sets demonstrate the effectiveness of using GAN images: an improvement of 7.5% over a strong baseline with moderate-sized real data being available.

Image Generation License Plate Recognition
