Search Results for author: Xihui Liu

Found 32 papers, 20 papers with code

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

no code implementations18 Sep 2023 Chenming Zhu, Wenwei Zhang, Tai Wang, Xihui Liu, Kai Chen

Instead of leveraging 2D images, we propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.

3D Object Detection Contrastive Learning +2

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

1 code implementation18 Aug 2023 Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao

In contrast, such privilege has not yet fully benefited 3D deep learning, mainly due to the limited availability of large-scale 3D datasets.

Ranked #1 on 3D Semantic Segmentation on ScanNet200 (using extra training data)

3D Semantic Segmentation LIDAR Semantic Segmentation +1

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation

1 code implementation12 Jul 2023 Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu

Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene.

UniG3D: A Unified 3D Object Generation Dataset

no code implementations19 Jun 2023 Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao

However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets: poor text quality, incomplete multi-modal coverage of 2D rendered images and 3D assets, and limited dataset size.

Autonomous Driving

SAM3D: Segment Anything in 3D Scenes

1 code implementation6 Jun 2023 Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu

In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.
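The lifting step the abstract implies, projecting SAM's 2D masks onto the 3D point cloud via depth and camera intrinsics, can be sketched as follows. This is my own toy illustration, not the SAM3D code: the `backproject` helper and its pinhole parameters are hypothetical.

```python
# Toy sketch (mine, not the SAM3D codebase) of lifting 2D masks to 3D:
# per-pixel mask labels predicted on an RGB-D frame are back-projected into
# the point cloud with depth values and pinhole camera intrinsics.

def backproject(mask, depth, fx, fy, cx, cy):
    """mask/depth: dicts keyed by pixel (u, v); returns labeled 3D points."""
    points = []
    for (u, v), label in mask.items():
        z = depth[(u, v)]
        x = (u - cx) * z / fx  # standard pinhole back-projection
        y = (v - cy) * z / fy
        points.append((x, y, z, label))
    return points

pts = backproject(mask={(320, 240): 1}, depth={(320, 240): 2.0},
                  fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(pts)  # the principal-point pixel lands on the optical axis at depth 2.0
```

Merging such per-frame labeled points across views would then yield scene-level 3D masks.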

TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale

1 code implementation23 May 2023 Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan

We argue that tuning a text encoder end-to-end, as done in previous work, is suboptimal since it may overfit in terms of styles, thereby losing its original generalization ability to capture the semantics of various language registers.

Representation Learning

Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images

1 code implementation25 Apr 2023 Zeyu Lu, Di Huang, Lei Bai, Jingjing Qu, Chengyue Wu, Xihui Liu, Wanli Ouyang

In addition, we evaluate models' capability to detect AI-generated images with MPBench; the top-performing model from MPBench achieves a 13% failure rate under the same setting used in the human evaluation.

Benchmarking Fake Image Detection

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

no code implementations24 Apr 2023 Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu

To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.

Image Generation Image Manipulation +1

RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer

2 code implementations12 Apr 2023 Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

DDP: Diffusion Model for Dense Visual Prediction

1 code implementation30 Mar 2023 Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo

We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.

Denoising Monocular Depth Estimation +1

Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

1 code implementation CVPR 2023 Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao

As a pioneering work, PointContrast conducts unsupervised 3D representation learning via leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks.

Ranked #5 on Semantic Segmentation on ScanNet (using extra training data)

Contrastive Learning Data Augmentation +3
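The contrastive objective such frameworks build on can be sketched with a toy InfoNCE loss. This is my own minimal version, not the paper's implementation; the vectors and temperature are made up.

```python
import math

# Toy InfoNCE loss (my own sketch, not the paper's code): the matched pair of
# views is pulled together while mismatched pairs are pushed apart, the core
# objective behind PointContrast-style contrastive 3D pre-training.

def info_nce(anchor, candidates, temperature=0.1):
    """candidates[0] is the positive (matched) view; the rest are negatives."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(anchor, c) / temperature for c in candidates]
    denom = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[0]) / denom)

# a matched positive gives a much lower loss than a mismatched one
print(info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # near 0
print(info_nce([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]]))  # near 10
```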

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

no code implementations CVPR 2023 Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin

Extensive experiments and ablative analysis also demonstrate that the inductive bias of a network architecture can be incorporated into a simple network structure with an appropriate optimization strategy.

Inductive Bias

Back to the Source: Diffusion-Driven Adaptation To Test-Time Corruption

no code implementations CVPR 2023 Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model.

GLeaD: Improving GANs with A Generator-Leading Task

1 code implementation CVPR 2023 Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen

Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D acts as the rule maker and hence tends to dominate the competition.

Image Generation

Shape-Guided Diffusion with Inside-Outside Attention

no code implementations1 Dec 2022 Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell

When manipulating an object, existing text-to-image diffusion models often ignore the shape of the object and generate content that is incorrectly scaled, cut off, or replaced with background content.

Point Transformer V2: Grouped Vector Attention and Partition-based Pooling

2 code implementations11 Oct 2022 Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao

In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work.

3D Point Cloud Classification 3D Semantic Segmentation +4

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

1 code implementation CVPR 2023 Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge

Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.

Descriptive Representation Learning +1

Back to the Source: Diffusion-Driven Test-Time Adaptation

1 code implementation7 Jul 2022 Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang

We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
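The projection idea can be sketched with a toy 1D diffusion whose source distribution is N(0, 1), so the score of the noised marginal is known in closed form. This is my own illustration under those assumptions, not the paper's code; the noise level and inputs are made up.

```python
import math
import random

# Toy 1D sketch of diffusion-driven input adaptation (my illustration, not
# the paper's code). A corrupted test input is partially diffused forward,
# then denoised with the score of the source distribution N(0, 1), which
# pulls it back toward the source domain.

def project_to_source(x, alpha_bar=0.5, seed=0):
    eps = random.Random(seed).gauss(0.0, 1.0)
    # forward-diffuse the test input to noise level alpha_bar
    x_t = math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * eps
    # score of the noised source marginal N(0, 1): grad log p(x_t) = -x_t
    score = -x_t
    # one-step x0 estimate under the source model (Tweedie's formula)
    eps_pred = -math.sqrt(1 - alpha_bar) * score
    return (x_t - math.sqrt(1 - alpha_bar) * eps_pred) / math.sqrt(alpha_bar)

corrupted = 5.0  # far outside the source distribution
adapted = project_to_source(corrupted)
print(abs(adapted) < abs(corrupted))  # the input moved toward the source mean
```

Because only the inputs are adapted, the source classifier itself stays frozen, which is the point of the method as described above.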

The ArtBench Dataset: Benchmarking Generative Models with Artworks

1 code implementation22 Jun 2022 Peiyuan Liao, Xiuyu Li, Xihui Liu, Kurt Keutzer

We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation.

Benchmarking Conditional Image Generation +1

MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval

1 code implementation26 Apr 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo

Dominant pre-training work for video-text retrieval mainly adopts the "dual-encoder" architecture to enable efficient retrieval, where two separate encoders contrast global video and text representations but ignore detailed local semantics.

Action Recognition Retrieval +5

Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations CVPR 2022 Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e.g., action recognition with linear evaluation.

Action Recognition Multiple-choice +8

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

no code implementations10 Dec 2021 Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.

Continuous Control Denoising +1
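The guidance mechanism the abstract describes, adding a guidance gradient to the model score during sampling, can be sketched with a toy 1D example. This is my own illustration; the quadratic guidance term, scale, and target are made up, not from the paper.

```python
# Toy 1D sketch of score-based guidance (mine, not the paper's code). At each
# step the gradient of a guidance log-likelihood is added to the model score,
# steering the sample toward the guidance target while staying plausible
# under the prior.

def guided_sample(target=2.0, scale=4.0, steps=200, lr=0.05):
    x = 0.0  # start at the mode of a toy N(0, 1) prior "model"
    for _ in range(steps):
        prior_score = -x                # grad log N(0, 1)
        guidance_score = -(x - target)  # grad of a toy guidance log-likelihood
        x += lr * (prior_score + scale * guidance_score)
    return x

# the sample settles between the prior mode (0) and the guidance target (2),
# weighted by the guidance scale: x* = scale * target / (1 + scale) = 1.6
print(round(guided_sample(), 3))  # 1.6
```

Swapping the toy guidance gradient for a CLIP-style similarity gradient would give language or image guidance in the same plug-in fashion the abstract suggests.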

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

1 code implementation ECCV 2020 Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.

Image Manipulation

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

no code implementations CVPR 2019 Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li

Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.

Referring Expression

Localization Guided Learning for Pedestrian Attribute Recognition

no code implementations28 Aug 2018 Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao

Pedestrian attribute recognition has attracted much attention due to its wide applications in scene understanding and person analysis from surveillance videos.

Pedestrian Attribute Recognition Scene Understanding

Object Detection in Videos with Tubelet Proposal Networks

1 code implementation CVPR 2017 Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

object-detection Object Detection +1
