no code implementations • 18 Sep 2023 • Chenming Zhu, Wenwei Zhang, Tai Wang, Xihui Liu, Kai Chen
Instead of leveraging 2D images, we propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.
1 code implementation • 18 Aug 2023 • Xiaoyang Wu, Zhuotao Tian, Xin Wen, Bohao Peng, Xihui Liu, Kaicheng Yu, Hengshuang Zhao
In contrast, such privilege has not yet fully benefited 3D deep learning, mainly due to the limited availability of large-scale 3D datasets.
Ranked #1 on
3D Semantic Segmentation
on ScanNet200
(using extra training data)
1 code implementation • 12 Jul 2023 • Kaiyi Huang, Kaiyue Sun, Enze Xie, Zhenguo Li, Xihui Liu
Despite the stunning ability to generate high-quality images by recent text-to-image models, current approaches often struggle to effectively compose objects with different attributes and relationships into a complex and coherent scene.
no code implementations • 19 Jun 2023 • Qinghong Sun, Yangguang Li, Zexiang Liu, Xiaoshui Huang, Fenggang Liu, Xihui Liu, Wanli Ouyang, Jing Shao
However, the quality and diversity of existing 3D object generation methods are constrained by the inadequacies of existing 3D object datasets, including issues related to text quality, the incompleteness of multi-modal data representation encompassing 2D rendered images and 3D assets, as well as the size of the dataset.
1 code implementation • 6 Jun 2023 • Yunhan Yang, Xiaoyang Wu, Tong He, Hengshuang Zhao, Xihui Liu
In this work, we propose SAM3D, a novel framework that is able to predict masks in 3D point clouds by leveraging the Segment-Anything Model (SAM) in RGB images without further training or finetuning.
1 code implementation • 23 May 2023 • Ziyun Zeng, Yixiao Ge, Zhan Tong, Xihui Liu, Shu-Tao Xia, Ying Shan
We argue that tuning a text encoder end-to-end, as done in previous work, is suboptimal since it may overfit in terms of styles, thereby losing its original generalization ability to capture the semantics of various language registers.
1 code implementation • 25 Apr 2023 • Zeyu Lu, Di Huang, Lei Bai, Jingjing Qu, Chengyue Wu, Xihui Liu, Wanli Ouyang
Along with this, we conduct the model capability of AI-Generated images detection evaluation MPBench and the top-performing model from MPBench achieves a 13% failure rate under the same setting used in the human evaluation.
no code implementations • 24 Apr 2023 • Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu
To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and lowlevel-to-high-level feature hierarchy for the latent space of diffusion models.
2 code implementations • 12 Apr 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture, can be incorporated into simple network structure with appropriate optimization strategy.
1 code implementation • 30 Mar 2023 • Yuanfeng Ji, Zhe Chen, Enze Xie, Lanqing Hong, Xihui Liu, Zhaoqiang Liu, Tong Lu, Zhenguo Li, Ping Luo
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
Ranked #2 on
Monocular Depth Estimation
on SUN-RGBD
1 code implementation • CVPR 2023 • Xiaoyang Wu, Xin Wen, Xihui Liu, Hengshuang Zhao
As a pioneering work, PointContrast conducts unsupervised 3D representation learning via leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on various downstream tasks.
Ranked #5 on
Semantic Segmentation
on ScanNet
(using extra training data)
no code implementations • CVPR 2023 • Jiahao Wang, Songyang Zhang, Yong liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai Chen, Ping Luo, Dahua Lin
Extensive experiments and ablative analysis also demonstrate that the inductive bias of network architecture, can be incorporated into simple network structure with appropriate optimization strategy.
no code implementations • CVPR 2023 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang
We update the target data instead, and project all test inputs toward the source domain with a generative diffusion model.
1 code implementation • CVPR 2023 • Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen
Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition.
no code implementations • 1 Dec 2022 • Dong Huk Park, Grace Luo, Clayton Toste, Samaneh Azadi, Xihui Liu, Maka Karalashvili, Anna Rohrbach, Trevor Darrell
When manipulating an object, existing text-to-image diffusion models often ignore the shape of the object and generate content that is incorrectly scaled, cut off, or replaced with background content.
2 code implementations • 11 Oct 2022 • Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, Hengshuang Zhao
In this work, we analyze the limitations of the Point Transformer and propose our powerful and efficient Point Transformer V2 model with novel designs that overcome the limitations of previous work.
Ranked #1 on
3D Semantic Segmentation
on nuScenes
1 code implementation • CVPR 2023 • Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shu-Tao Xia, Yixiao Ge
Pre-training on large-scale video data has become a common recipe for learning transferable spatiotemporal representations in recent years.
1 code implementation • 7 Jul 2022 • Jin Gao, Jialing Zhang, Xihui Liu, Trevor Darrell, Evan Shelhamer, Dequan Wang
We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model.
1 code implementation • 22 Jun 2022 • Peiyuan Liao, Xiuyu Li, Xihui Liu, Kurt Keutzer
We introduce ArtBench-10, the first class-balanced, high-quality, cleanly annotated, and standardized dataset for benchmarking artwork generation.
1 code implementation • 26 Apr 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Alex Jinpeng Wang, Jianping Wu, Ying Shan, XiaoHu Qie, Ping Luo
Dominant pre-training work for video-text retrieval mainly adopt the "dual-encoder" architectures to enable efficient retrieval, where two separate encoders are used to contrast global video and text representations, but ignore detailed local semantics.
2 code implementations • CVPR 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo
As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e. g., action recognition with linear evaluation.
Ranked #2 on
Zero-Shot Video Retrieval
on MSVD
no code implementations • 10 Dec 2021 • Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell
We investigate fine-grained, continuous control of this model class, and introduce a novel unified framework for semantic diffusion guidance, which allows either language or image guidance, or both.
1 code implementation • ECCV 2020 • Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li
We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.
1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
Semantic image synthesis aims at generating photorealistic images from semantic layouts.
1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Ranked #8 on
Image Retrieval
on Flickr30K 1K test
no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.
no code implementations • 28 Aug 2018 • Pengze Liu, Xihui Liu, Junjie Yan, Jing Shao
Pedestrian attribute recognition has attracted many attentions due to its wide applications in scene understanding and person analysis from surveillance videos.
no code implementations • ECCV 2018 • Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian yuan, Xiaogang Wang
Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities.
Ranked #19 on
Text based Person Retrieval
on CUHK-PEDES
no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang
The aim of image captioning is to generate captions by machine to describe image contents.
no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.
2 code implementations • ICCV 2017 • Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang
Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems.
Ranked #2 on
Pedestrian Attribute Recognition
on RAP
1 code implementation • CVPR 2017 • Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang
Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.