Search Results for author: Ce Liu

Found 48 papers, 22 papers with code

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

no code implementations 10 Nov 2023 Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks.

Multi-Task Learning object-detection +1

MM-VID: Advancing Video Understanding with GPT-4V(ision)

no code implementations 30 Oct 2023 Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, JianFeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang

We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding.

Video Understanding

Self-Enforced Job Matching

no code implementations 26 Aug 2023 Ce Liu, Ziwei Wang, HanZhe Zhang

The classic two-sided many-to-one job matching model assumes that firms treat workers as substitutes and workers ignore colleagues when choosing where to work.

Indiscernible Object Counting in Underwater Scenes

1 code implementation CVPR 2023 Guolei Sun, Zhaochong An, Yun Liu, Ce Liu, Christos Sakaridis, Deng-Ping Fan, Luc van Gool

We further advance the frontier of this field by systematically studying a new challenge named indiscernible object counting (IOC), the goal of which is to count objects that are blended with respect to their surroundings.

Benchmarking Object Counting +1

Single Image Depth Prediction Made Better: A Multivariate Gaussian Take

no code implementations CVPR 2023 Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool

Accordingly, we introduce an approach that performs continuous modeling of per-pixel depth, where we can predict and reason about the per-pixel depth and its distribution.

Depth Estimation Depth Prediction

MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action

1 code implementation 20 Mar 2023 Zhengyuan Yang, Linjie Li, JianFeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

We propose MM-REACT, a system paradigm that integrates ChatGPT with a pool of vision experts to achieve multimodal reasoning and action.

Visual Question Answering

VA-DepthNet: A Variational Approach to Single Image Depth Prediction

2 code implementations 13 Feb 2023 Ce Liu, Suryansh Kumar, Shuhang Gu, Radu Timofte, Luc van Gool

While state-of-the-art deep neural network methods for SIDP learn the scene depth from images in a supervised setting, they often overlook the invaluable invariances and priors in the rigid scene space, such as the regularity of the scene.

Depth Prediction Monocular Depth Estimation

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

1 code implementation 7 Dec 2022 Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable.

Data Augmentation Instance Segmentation +4

ReCo: Region-Controlled Text-to-Image Generation

no code implementations CVPR 2023 Zhengyuan Yang, JianFeng Wang, Zhe Gan, Linjie Li, Kevin Lin, Chenfei Wu, Nan Duan, Zicheng Liu, Ce Liu, Michael Zeng, Lijuan Wang

Human evaluation on PaintSkill shows that ReCo is +19.28% and +17.21% more accurate in generating images with correct object count and spatial relationship than the T2I model.

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

no code implementations 15 Sep 2022 Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Luowei Zhou, Yucheng Zhao, Yujia Xie, Ce Liu, Yu-Gang Jiang, Lu Yuan

This paper presents OmniVL, a new foundation model to support both image-language and video-language tasks using one universal architecture.

Ranked #4 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Action Classification Action Recognition +13

LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling

1 code implementation CVPR 2023 Linjie Li, Zhe Gan, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Ce Liu, Lijuan Wang

In this work, we explore a unified VidL framework LAVENDER, where Masked Language Modeling (MLM) is used as the common interface for all pre-training and downstream tasks.

Language Modelling Masked Language Modeling +6

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

no code implementations 3 Jun 2022 Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model.

Image Paragraph Captioning Language Modelling +1

GIT: A Generative Image-to-text Transformer for Vision and Language

2 code implementations 27 May 2022 JianFeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.

Image Classification Language Modelling +6

Credible Persuasion

no code implementations 6 May 2022 Xiao Lin, Ce Liu

We propose a new notion of credibility for Bayesian persuasion problems.

K-LITE: Learning Transferable Visual Models with External Knowledge

2 code implementations 20 Apr 2022 Sheng Shen, Chunyuan Li, Xiaowei Hu, Jianwei Yang, Yujia Xie, Pengchuan Zhang, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Anna Rohrbach, Jianfeng Gao

We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems: In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts.

Benchmarking Descriptive +4

Unified Contrastive Learning in Image-Text-Label Space

1 code implementation CVPR 2022 Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

In particular, it attains gains of up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively.

Contrastive Learning Image Classification +2

MaskGIT: Masked Generative Image Transformer

6 code implementations CVPR 2022 Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation.

Ranked #1 on Image Outpainting on LHQC (Block-FID (Right Extend) metric)

Image Manipulation Image Outpainting

ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

1 code implementation NeurIPS 2021 Gengshan Yang, Deqing Sun, Varun Jampani, Daniel Vlasic, Forrester Cole, Ce Liu, Deva Ramanan

The surface embeddings are implemented as coordinate-based MLPs that are fit to each video via consistency and contrastive reconstruction losses. Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with loose clothing and unusual poses as well as animal videos from DAVIS and YTVOS.

3D Shape Reconstruction from Videos

Pyramid Adversarial Training Improves ViT Performance

1 code implementation CVPR 2022 Charles Herrmann, Kyle Sargent, Lu Jiang, Ramin Zabih, Huiwen Chang, Ce Liu, Dilip Krishnan, Deqing Sun

In this work, we present pyramid adversarial training (PyramidAT), a simple and effective technique to improve ViT's overall performance.

Ranked #6 on Domain Generalization on ImageNet-C (using extra training data)

Adversarial Attack Data Augmentation +2

Florence: A New Foundation Model for Computer Vision

1 code implementation 22 Nov 2021 Lu Yuan, Dongdong Chen, Yi-Ling Chen, Noel Codella, Xiyang Dai, Jianfeng Gao, Houdong Hu, Xuedong Huang, Boxin Li, Chunyuan Li, Ce Liu, Mengchen Liu, Zicheng Liu, Yumao Lu, Yu Shi, Lijuan Wang, JianFeng Wang, Bin Xiao, Zhen Xiao, Jianwei Yang, Michael Zeng, Luowei Zhou, Pengchuan Zhang

Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.

Action Classification Action Recognition In Videos +12

SLIDE: Single Image 3D Photography with Soft Layering and Depth-aware Inpainting

no code implementations ICCV 2021 Varun Jampani, Huiwen Chang, Kyle Sargent, Abhishek Kar, Richard Tucker, Michael Krainin, Dominik Kaeser, William T. Freeman, David Salesin, Brian Curless, Ce Liu

We present SLIDE, a modular and unified system for single image 3D photography that uses a simple yet effective soft layering strategy to better preserve appearance details in novel views.

Image Matting

ViTGAN: Training GANs with Vision Transformers

3 code implementations ICLR 2022 Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases.

Image Generation

OCONet: Image Extrapolation by Object Completion

no code implementations CVPR 2021 Richard Strong Bowen, Huiwen Chang, Charles Herrmann, Piotr Teterwak, Ce Liu, Ramin Zabih

Existing methods struggle to extrapolate images with salient objects in the foreground or are limited to very specific objects such as humans, but tend to work well on indoor/outdoor scenes.

COMISR: Compression-Informed Video Super-Resolution

2 code implementations ICCV 2021 Yinxiao Li, Pengchong Jin, Feng Yang, Ce Liu, Ming-Hsuan Yang, Peyman Milanfar

Most video super-resolution methods focus on restoring high-resolution video frames from low-resolution videos without taking into account compression.

Video Super-Resolution

AutoFlow: Learning a Better Training Set for Optical Flow

1 code implementation CVPR 2021 Deqing Sun, Daniel Vlasic, Charles Herrmann, Varun Jampani, Michael Krainin, Huiwen Chang, Ramin Zabih, William T. Freeman, Ce Liu

Synthetic datasets play a critical role in pre-training CNN models for optical flow, but they are painstaking to generate and hard to adapt to new applications.

Optical Flow Estimation

NeRD: Neural Reflectance Decomposition from Image Collections

1 code implementation ICCV 2021 Mark Boss, Raphael Braun, Varun Jampani, Jonathan T. Barron, Ce Liu, Hendrik P. A. Lensch

This problem is inherently more challenging when the illumination is not a single light source under laboratory conditions but is instead an unconstrained environmental illumination.

Robust image stitching with multiple registrations

no code implementations ECCV 2018 Charles Herrmann, Chen Wang, Richard Strong Bowen, Emil Keyder, Michael Krainin, Ce Liu, Ramin Zabih

Here, we observe that the use of a single registration often leads to errors, especially in scenes with significant depth variation or object motion.

Image Stitching

Stability in Repeated Matching Markets

no code implementations 7 Jul 2020 Ce Liu

This paper develops a framework for repeated matching markets.

Supervised Contrastive Learning

23 code implementations NeurIPS 2020 Prannay Khosla, Piotr Teterwak, Chen Wang, Aaron Sarna, Yonglong Tian, Phillip Isola, Aaron Maschinot, Ce Liu, Dilip Krishnan

Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models.

Class Incremental Learning Contrastive Learning +4

Depth Extraction from Video Using Non-parametric Sampling

no code implementations 24 Dec 2019 Kevin Karsch, Ce Liu, Sing Bing Kang

We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling.

Depth Estimation Optical Flow Estimation

DepthTransfer: Depth Extraction from Video Using Non-parametric Sampling

no code implementations 24 Dec 2019 Kevin Karsch, Ce Liu, Sing Bing Kang

We describe a technique that automatically generates plausible depth maps from videos using non-parametric depth sampling.

Depth Estimation Optical Flow Estimation

On the Effectiveness of Visible Watermarks

no code implementations CVPR 2017 Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman

Since such an attack relies on the consistency of watermarks across an image collection, we explore and evaluate how it is affected by various types of inconsistencies in the watermark embedding that could potentially be used to make watermarking more secure.

Image Matting

Local Layering for Joint Motion Estimation and Occlusion Detection

no code implementations CVPR 2014 Deqing Sun, Ce Liu, Hanspeter Pfister

To handle such situations, we propose a local layering model where motion and occlusion relationships are inferred jointly.

Motion Estimation Optical Flow Estimation

A Compositional Model for Low-Dimensional Image Set Representation

no code implementations CVPR 2014 Hossein Mobahi, Ce Liu, William T. Freeman

Learning a low-dimensional representation of images is useful for various applications in graphics and computer vision.

Unsupervised Joint Object Discovery and Segmentation in Internet Images

no code implementations CVPR 2013 Michael Rubinstein, Armand Joulin, Johannes Kopf, Ce Liu

In contrast to previous co-segmentation methods, our algorithm performs well even in the presence of significant amounts of noise images (images not containing a common object), as is typical for datasets collected from Internet search.

Object Discovery Segmentation

Probabilistic Label Trees for Efficient Large Scale Image Classification

no code implementations CVPR 2013 Baoyuan Liu, Fereshteh Sadeghi, Marshall Tappen, Ohad Shamir, Ce Liu

Large-scale recognition problems with thousands of classes pose a particular challenge because applying the classifier requires more computation as the number of classes grows.

Classification General Classification +1

Deformable Spatial Pyramid Matching for Fast Dense Correspondences

no code implementations CVPR 2013 Jaechul Kim, Ce Liu, Fei Sha, Kristen Grauman

We introduce a fast deformable spatial pyramid (DSP) matching algorithm for computing dense pixel correspondences.
