no code implementations • 10 Nov 2023 • Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.
no code implementations • 30 Oct 2023 • Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang
We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.
no code implementations • 26 Aug 2023 • Ce Liu, Ziwei Wang, HanZhe Zhang
The classic two-sided many-to-one job matching model assumes that firms treat workers as substitutes and workers ignore colleagues when choosing where to work.
1 code implementation • CVPR 2023 • Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan, Luc van Gool
We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings.
no code implementations • CVPR 2023 • Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool
Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution.
1 code implementation • 20 Mar 2023 • Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.
Ranked #6 on
Visual Question Answering
on MM-Vet
2 code implementations • 13 Feb 2023 • Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool
While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene space, such as the regularity of the scene.
Ranked #10 on
Monocular Depth Estimation
on NYU-Depth V2
1 code implementation • CVPR 2023 • Haotian Liu, Kilho Son, Jianwei Yang, Ce Liu, Jianfeng Gao, Yong Jae Lee, Chunyuan Li
Image-text contrastive learning models such as CLIP have demonstrated strong task transfer ability.
Ranked #1 on
Semi-Supervised Image Classification
on ImageNet - 10% labeled data
(using extra training data)
1 code implementation • 7 Dec 2022 • Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu
We demonstrate for the first time that using a text2image model to generate images or zero-shot recognition model to filter noisily crawled images for different object categories is a feasible way to make Copy-Paste truly scalable.
Ranked #6 on
Instance Segmentation
on LVIS v1.0 val
no code implementations • CVPR 2023 • Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang
Human evaluation on PaintSkill shows that ReCo is +19. 28% and +17. 21% more accurate in generating images with correct object count and spatial relationship than the T2I model.
no code implementations • 15 Sep 2022 • Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan
This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.
Ranked #4 on
Cross-Modal Retrieval
on Flickr30k
(using extra training data)
1 code implementation • NeurIPS 2022 • Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, JianFeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann Lecun, Nanyun Peng, Jianfeng Gao, Lijuan Wang
Vision-language (VL) pre-training has recently received considerable attention.
Ranked #1 on
Phrase Grounding
on Flickr30k Entities Dev
1 code implementation • CVPR 2023 • Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang
In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.
no code implementations • 3 Jun 2022 • Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng
Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e. g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.
2 code implementations • 27 May 2022 • JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.
Ranked #1 on
Image Captioning
on nocaps near-domain
no code implementations • 6 May 2022 • Xiao Lin, Ce Liu
We propose a new notion of credibility for Bayesian persuasion problems.
2 code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao
We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.
1 code implementation • CVPR 2022 • Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao
Particularly, it attains gains up to 9. 2% and 14. 5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.
6 code implementations • CVPR 2022 • Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation.
Ranked #1 on
Image Outpainting
on LHQC
(Block-FID (Right Extend) metric)
1 code implementation • NeurIPS 2021 • Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan
The surface embeddings are implemented as coordinate-based MLPs that are fit to each video via consistency and contrastive reconstruction losses. Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with loose clothing and unusual poses as well as animals videos from DAVIS and YTVOS.
1 code implementation • CVPR 2022 • Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun
In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.
Ranked #6 on
Domain Generalization
on ImageNet-C
(using extra training data)
1 code implementation • 22 Nov 2021 • Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang
Computer vision foundation models, which are trained on diverse, large-scale dataset and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.
Ranked #1 on
Action Recognition In Videos
on Kinetics-600
1 code implementation • NeurIPS 2021 • Mark Boss, Varun Jampani, Raphael Braun, Ce Liu, Jonathan T. Barron, Hendrik P. A. Lensch
Decomposing a scene into its shape, reflectance and illumination is a fundamental problem in computer vision and graphics.
no code implementations • ICCV 2021 • Varun Jampani, Huiwen Chang, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu
We present SLIDE, a modular and unified system for single image 3D photography that uses a simple yet effective soft layering strategy to better preserve appearance details in novel views.
3 code implementations • ICLR 2022 • Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases.
Ranked #61 on
Image Generation
on CIFAR-10
no code implementations • CVPR 2021 • Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih
Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.
1 code implementation • CVPR 2021 • Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Huiwen Chang, Deva Ramanan, William T. Freeman, Ce Liu
Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images.
2 code implementations • ICCV 2021 • Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar
Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.
1 code implementation • CVPR 2021 • Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu
Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.
no code implementations • CVPR 2022 • Innfarn Yoo, Huiwen Chang, Xiyang Luo, Ondrej Stava, Ce Liu, Peyman Milanfar, Feng Yang
Digital watermarking is widely used for copyright protection.
1 code implementation • CVPR 2021 • Hung-Yu Tseng, Lu Jiang, Ce Liu, Ming-Hsuan Yang, Weilong Yang
Recent years have witnessed the rapid progress of generative adversarial networks (GANs).
Ranked #1 on
Image Generation
on 25% ImageNet 128x128
1 code implementation • ICCV 2021 • Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, Hendrik P. A. Lensch
This problem is inherently more challenging when the illumination is not a single light source under laboratory conditions but is instead an unconstrained environmental illumination.
no code implementations • ECCV 2018 • Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih
Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.
no code implementations • 7 Jul 2020 • Ce Liu
This paper develops a framework for repeated matching markets.
23 code implementations • NeurIPS 2020 • Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan
Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models.
Ranked #2 on
Class Incremental Learning
on cifar100
no code implementations • 24 Dec 2019 • Kevin Karsch, Ce Liu, Sing Bing Kang
We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling.
no code implementations • 24 Dec 2019 • Kevin Karsch, Ce Liu, Sing Bing Kang
We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling.
no code implementations • ICCV 2019 • Piotr Teterwak, Aaron Sarna, Dilip Krishnan, Aaron Maschinot, David Belanger, Ce Liu, William T. Freeman
Image extension models have broad applications in image editing, computational photography and computer graphics.
Ranked #2 on
Uncropping
on Places2 val
no code implementations • CVPR 2019 • Zhengqi Li, Tali Dekel, Forrester Cole, Richard Tucker, Noah Snavely, Ce Liu, William T. Freeman
We present a method for predicting dense depth in scenarios where both a monocular camera and people in the scene are freely moving.
no code implementations • CVPR 2018 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman
We study the problem of reconstructing an image from information stored at contour locations.
no code implementations • 21 Dec 2017 • Tali Dekel, Chuang Gan, Dilip Krishnan, Ce Liu, William T. Freeman
We study the problem of reconstructing an image from information stored at contour locations.
no code implementations • CVPR 2017 • Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman
Since such an attack relies on the consistency of watermarks across image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secured.
no code implementations • NeurIPS 2014 • Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia
Many fundamental image-related problems involve deconvolution operators.
Ranked #1 on
Image Compression
on FER2013
no code implementations • CVPR 2014 • Deqing Sun, Ce Liu, Hanspeter Pfister
To handle such situations, we propose a local layering model where motion and occlusion relationships are inferred jointly.
no code implementations • CVPR 2014 • Hossein Mobahi, Ce Liu, William T. Freeman
Learning a low-dimensional representation of images is useful for various applications in graphics and computer vision.
no code implementations • CVPR 2013 • Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu
In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as typical for datasets collected from Internet search.
no code implementations • CVPR 2013 • Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu
Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows.
no code implementations • CVPR 2013 • Jaechul Kim, Ce Liu, Fei Sha, Kristen Grauman
We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences.