Search Results for author: Ming-Yu Liu

Found 68 papers, 32 papers with code

Joint Geodesic Upsampling of Depth Images

no code implementations • CVPR 2013 • Ming-Yu Liu, Oncel Tuzel, Yuichi Taguchi

We propose an algorithm utilizing geodesic distances to upsample a low resolution depth image using a registered high resolution color image.

Sensor Fusion

Recursive Context Propagation Network for Semantic Scene Labeling

no code implementations • NeurIPS 2014 • Abhishek Sharma, Oncel Tuzel, Ming-Yu Liu

Then a top-down propagation of the aggregated information takes place that enhances the contextual information of each local feature.

Scene Labeling

Unsupervised Network Pretraining via Encoding Human Design

no code implementations • 19 Feb 2015 • Ming-Yu Liu, Arun Mallya, Oncel C. Tuzel, Xi Chen

Our idea is to pretrain the network through the task of replicating the process of hand-designed feature extraction.

Object Recognition

Layered Interpretation of Street View Images

no code implementations • 15 Jun 2015 • Ming-Yu Liu, Shuoxin Lin, Srikumar Ramalingam, Oncel Tuzel

We propose a layered street view model to encode both depth and semantic information on street view images for autonomous driving.

Autonomous Driving Scene Labeling +1

Deep Gaussian Conditional Random Field Network: A Model-based Deep Network for Discriminative Denoising

no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu

We propose a novel deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model.

Image Denoising

Learning to Remove Multipath Distortions in Time-of-Flight Range Images for a Robotic Arm Setup

no code implementations • 8 Jan 2016 • Kilho Son, Ming-Yu Liu, Yuichi Taguchi

We use the robotic arm to automatically collect a large amount of ToF range images containing various multipath distortions.

Gaussian Conditional Random Field Network for Semantic Segmentation

no code implementations • CVPR 2016 • Raviteja Vemulapalli, Oncel Tuzel, Ming-Yu Liu, Rama Chellappa

In contrast to the existing approaches that use discrete Conditional Random Field (CRF) models, we propose to use a Gaussian CRF model for the task of semantic segmentation.

Segmentation Semantic Segmentation

Coupled Generative Adversarial Networks

4 code implementations • NeurIPS 2016 • Ming-Yu Liu, Oncel Tuzel

We propose coupled generative adversarial network (CoGAN) for learning a joint distribution of multi-domain images.

Ranked #3 on Image-to-Image Translation on Cityscapes Photo-to-Labels

Domain Adaptation Generative Adversarial Network +1

15,669

Attentional Network for Visual Object Detection

no code implementations • 6 Feb 2017 • Kota Hara, Ming-Yu Liu, Oncel Tuzel, Amir-Massoud Farahmand

We propose augmenting deep neural networks with an attention mechanism for the visual object detection task.

Object object-detection +1

Unsupervised Image-to-Image Translation Networks

8 code implementations • NeurIPS 2017 • Ming-Yu Liu, Thomas Breuel, Jan Kautz

Unsupervised image-to-image translation aims at learning a joint distribution of images in different domains by using images from the marginal distributions in individual domains.

Ranked #2 on Multimodal Unsupervised Image-To-Image Translation on Cats-and-Dogs

Domain Adaptation Multimodal Unsupervised Image-To-Image Translation +2

15,669

Tactics of Adversarial Attack on Deep Reinforcement Learning Agents

no code implementations • 8 Mar 2017 • Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun

In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.

Adversarial Attack Atari Games +2

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Video

1 code implementation • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements.

CASENet: Deep Category-Aware Semantic Edge Detection

11 code implementations • CVPR 2017 • Zhiding Yu, Chen Feng, Ming-Yu Liu, Srikumar Ramalingam

To this end, we propose a novel end-to-end deep semantic edge learning architecture based on ResNet and a new skip-layer architecture where category-wise edge activations at the top convolution layer share and are fused with the same set of bottom layer features.

Ranked #1 on Edge Detection on Cityscapes test

Edge Detection Object Proposal Generation +1

212

Deep 360 Pilot: Learning a Deep Agent for Piloting Through 360° Sports Videos

no code implementations • CVPR 2017 • Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun

Given the main object and previously selected viewing angles, our method regresses a shift in viewing angle to move to the next one.

Object

MoCoGAN: Decomposing Motion and Content for Video Generation

5 code implementations • CVPR 2018 • Sergey Tulyakov, Ming-Yu Liu, Xiaodong Yang, Jan Kautz

The proposed framework generates a video by mapping a sequence of random vectors to a sequence of video frames.

Ranked #4 on Video Generation on UCF-101 16 frames, Unconditional, Single GPU

Generative Adversarial Network Video Generation

560

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

21 code implementations • CVPR 2018 • Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz

It then uses the warped features and features of the first image to construct a cost volume, which is processed by a CNN to estimate the optical flow.

Ranked #3 on Dense Pixel Correspondence Estimation on HPatches

Dense Pixel Correspondence Estimation Optical Flow Estimation

1,585

Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight

2 code implementations • 2 Oct 2017 • Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang

Our core idea is that the adversarial examples targeting a neural network-based policy are not effective for the frame prediction model.

Autonomous Vehicles Decision Making +2

High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs

20 code implementations • CVPR 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs).

Ranked #2 on Sketch-to-Image Translation on COCO-Stuff

Conditional Image Generation Fundus to Angiography Generation +5

6,518

Learning Binary Residual Representations for Domain-specific Video Streaming

no code implementations • 14 Dec 2017 • Yi-Hsuan Tsai, Ming-Yu Liu, Deqing Sun, Ming-Hsuan Yang, Jan Kautz

Specifically, we target a streaming setting where the videos to be streamed from a server to a client are all in the same domain and they have to be compressed to a small size for low-latency transmission.

Video Compression

Localization-Aware Active Learning for Object Detection

no code implementations • 16 Jan 2018 • Chieh-Chi Kao, Teng-Yok Lee, Pradeep Sen, Ming-Yu Liu

Active learning - a class of algorithms that iteratively searches for the most informative samples to include in a training dataset - has been shown to be effective at annotating data for image classification.

Active Learning Classification +7

Reblur2Deblur: Deblurring Videos via Self-Supervised Learning

no code implementations • 16 Jan 2018 • Huaijin Chen, Jinwei Gu, Orazio Gallo, Ming-Yu Liu, Ashok Veeraraghavan, Jan Kautz

Motion blur is a fundamental problem in computer vision as it impacts image quality and hinders inference.

Deblurring Optical Flow Estimation +1

A Closed-form Solution to Photorealistic Image Stylization

12 code implementations • ECCV 2018 • Yijun Li, Ming-Yu Liu, Xueting Li, Ming-Hsuan Yang, Jan Kautz

Photorealistic image stylization concerns transferring the style of a reference photo to a content photo with the constraint that the stylized photo should remain photorealistic.

Image Stylization

11,093

Multimodal Unsupervised Image-to-Image Translation

14 code implementations • ECCV 2018 • Xun Huang, Ming-Yu Liu, Serge Belongie, Jan Kautz

To translate an image to another domain, we recombine its content code with a random style code sampled from the style space of the target domain.

Ranked #1 on Multimodal Unsupervised Image-To-Image Translation on Edge-to-Handbags

Multimodal Unsupervised Image-To-Image Translation Translation +1

15,669

Learning Superpixels With Segmentation-Aware Affinity Loss

no code implementations • CVPR 2018 • Wei-Chih Tu, Ming-Yu Liu, Varun Jampani, Deqing Sun, Shao-Yi Chien, Ming-Hsuan Yang, Jan Kautz

Specifically, we propose a new loss function that takes the segmentation error into account for affinity learning.

Segmentation Superpixels

Domain Stylization: A Strong, Simple Baseline for Synthetic to Real Image Domain Adaptation

no code implementations • 24 Jul 2018 • Aysegul Dundar, Ming-Yu Liu, Ting-Chun Wang, John Zedlewski, Jan Kautz

Deep neural networks have largely failed to effectively utilize synthetic data when applied to real images due to the covariate shift problem.

Domain Adaptation object-detection +5

Superpixel Sampling Networks

2 code implementations • ECCV 2018 • Varun Jampani, Deqing Sun, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Superpixels provide an efficient low/mid-level representation of image data, which greatly reduces the number of image primitives for subsequent vision tasks.

Segmentation Superpixels

342

Video-to-Video Synthesis

11 code implementations • NeurIPS 2018 • Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Guilin Liu, Andrew Tao, Jan Kautz, Bryan Catanzaro

We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e.g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.

2k Semantic Segmentation +2

8,495

Unsupervised Stylish Image Description Generation via Domain Layer Norm

no code implementations • 11 Sep 2018 • Cheng Kuan Chen, Zhu Feng Pan, Min Sun, Ming-Yu Liu

It can learn to generate stylish image descriptions that are more related to image content and can be trained on an arbitrary monolingual corpus without collecting new pairs of images and stylish descriptions.

Models Matter, So Does Training: An Empirical Study of CNNs for Optical Flow Estimation

2 code implementations • 14 Sep 2018 • Deqing Sun, Xiaodong Yang, Ming-Yu Liu, Jan Kautz

We investigate two crucial and closely related aspects of CNNs for optical flow estimation: models and training.

Ranked #7 on Optical Flow Estimation on KITTI 2012

Optical Flow Estimation

1,585

Context-Aware Synthesis and Placement of Object Instances

2 code implementations • NeurIPS 2018 • Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Learning to insert an object instance into an image in a semantically coherent manner is a challenging and interesting problem.

Object Scene Parsing

Semantic Image Synthesis with Spatially-Adaptive Normalization

26 code implementations • CVPR 2019 • Taesung Park, Ming-Yu Liu, Ting-Chun Wang, Jun-Yan Zhu

Previous methods directly feed the semantic layout as input to the deep network, which is then processed through stacks of convolution, normalization, and nonlinearity layers.

Ranked #3 on Sketch-to-Image Translation on COCO-Stuff

Image-to-Image Translation Sketch-to-Image Translation

7,531

CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification

no code implementations • CVPR 2019 • Zheng Tang, Milind Naphade, Ming-Yu Liu, Xiaodong Yang, Stan Birchfield, Shuo Wang, Ratnesh Kumar, David Anastasiu, Jenq-Neng Hwang

Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking.

object-detection Object Detection +1

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

1 code implementation • CVPR 2019 • Xitong Yang, Xiaodong Yang, Ming-Yu Liu, Fanyi Xiao, Larry Davis, Jan Kautz

In this paper, we propose the Spatio-TEmporal Progressive (STEP) action detector, a progressive learning framework for spatio-temporal action detection in videos.

Ranked #7 on Action Detection on UCF101-24

Action Detection Action Recognition

244

Meta-Sim: Learning to Generate Synthetic Datasets

no code implementations • ICCV 2019 • Amlan Kar, Aayush Prakash, Ming-Yu Liu, Eric Cameracci, Justin Yuan, Matt Rusiniak, David Acuna, Antonio Torralba, Sanja Fidler

Training models to high-end performance requires availability of large labeled datasets, which are expensive to get.

Few-Shot Unsupervised Image-to-Image Translation

10 code implementations • ICCV 2019 • Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, Jan Kautz

Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images.

Translation Unsupervised Image-To-Image Translation

1,563

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

12 code implementations • ICCV 2019 • Guandao Yang, Xun Huang, Zekun Hao, Ming-Yu Liu, Serge Belongie, Bharath Hariharan

Specifically, we learn a two-level hierarchy of distributions where the first level is the distribution of shapes and the second level is the distribution of points given a shape.

Ranked #4 on Point Cloud Generation on ShapeNet Car

Point Cloud Generation Variational Inference

2,392

Neural Turtle Graphics for Modeling City Road Layouts

no code implementations • ICCV 2019 • Hang Chu, Daiqing Li, David Acuna, Amlan Kar, Maria Shugrina, Xinkai Wei, Ming-Yu Liu, Antonio Torralba, Sanja Fidler

We propose Neural Turtle Graphics (NTG), a novel generative model for spatial graphs, and demonstrate its applications in modeling city road layouts.

Few-shot Video-to-Video Synthesis

6 code implementations • NeurIPS 2019 • Ting-Chun Wang, Ming-Yu Liu, Andrew Tao, Guilin Liu, Jan Kautz, Bryan Catanzaro

To address the limitations, we propose a few-shot vid2vid framework, which learns to synthesize videos of previously unseen subjects or scenes by leveraging a few example images of the target at test time.

Ranked #1 on Video-to-Video Synthesis on YouTube Dancing

Video-to-Video Synthesis

1,782

Dancing to Music

2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.

Ranked #3 on Motion Synthesis on BRACE

Motion Synthesis Pose Estimation

521

UNAS: Differentiable Architecture Search Meets Reinforcement Learning

1 code implementation • CVPR 2020 • Arash Vahdat, Arun Mallya, Ming-Yu Liu, Jan Kautz

Our framework brings the best of both worlds, and it enables us to search for architectures with both differentiable and non-differentiable criteria in one unified framework while maintaining a low search cost.

Neural Architecture Search reinforcement-learning +1

On the distance between two neural networks and the stability of learning

2 code implementations • NeurIPS 2020 • Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.

LEMMA

203

Learning to Generate Multiple Style Transfer Outputs for an Input Sentence

no code implementations • WS 2020 • Kevin Lin, Ming-Yu Liu, Ming-Ting Sun, Jan Kautz

Specifically, we decompose the latent representation of the input sentence into a style code that captures the language style variation and a content code that encodes the language style-independent content.

Sentence Style Transfer +1

Style Example-Guided Text Generation using Generative Adversarial Transformers

no code implementations • 2 Mar 2020 • Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu

The style encoder extracts a style code from the reference example, and the text decoder generates texts based on the style code and the context.

Sentence Text Generation

Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection

2 code implementations • CVPR 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Yong Jae Lee, Alexander G. Schwing, Jan Kautz

Weakly supervised learning has emerged as a compelling tool for object detection by reducing the need for strong supervision during training.

Ranked #1 on Weakly Supervised Object Detection on COCO test-dev

Object object-detection +3

358

Learning compositional functions via multiplicative weight updates

1 code implementation • NeurIPS 2020 • Jeremy Bernstein, Jia-Wei Zhao, Markus Meister, Ming-Yu Liu, Anima Anandkumar, Yisong Yue

This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions.

LEMMA

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

1 code implementation • ECCV 2020 • Kuniaki Saito, Kate Saenko, Ming-Yu Liu

Unsupervised image-to-image translation intends to learn a mapping of an image in a given domain to an analogous image in a different domain, without explicit supervision of the mapping.

Translation Unsupervised Image-To-Image Translation

3,937

World-Consistent Video-to-Video Synthesis

no code implementations • ECCV 2020 • Arun Mallya, Ting-Chun Wang, Karan Sapra, Ming-Yu Liu

This is because they lack knowledge of the 3D world being rendered and generate each frame only based on the past few frames.

Video-to-Video Synthesis

Generative Adversarial Networks for Image and Video Synthesis: Algorithms and Applications

no code implementations • 6 Aug 2020 • Ming-Yu Liu, Xun Huang, Jiahui Yu, Ting-Chun Wang, Arun Mallya

The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner.

Generative Adversarial Network Neural Rendering +1

UFO²: A Unified Framework towards Omni-supervised Object Detection

no code implementations • 21 Oct 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz

Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.

object-detection Object Detection

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing

2 code implementations • CVPR 2021 • Ting-Chun Wang, Arun Mallya, Ming-Yu Liu

We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing.

678

GANcraft: Unsupervised 3D Neural Rendering of Minecraft Worlds

no code implementations • ICCV 2021 • Zekun Hao, Arun Mallya, Serge Belongie, Ming-Yu Liu

We represent the world as a continuous volumetric function and train our model to render view-consistent photorealistic images for a user-controlled camera.

Neural Rendering

LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update

no code implementations • 26 Jun 2021 • Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, Mustafa Ali, Ming-Yu Liu, Brucek Khailany, Bill Dally, Anima Anandkumar

Representing deep neural networks (DNNs) in low precision is a promising approach to enable efficient acceleration and memory reduction.

Quantization

Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis

no code implementations • NeurIPS 2021 • Tianchang Shen, Jun Gao, Kangxue Yin, Ming-Yu Liu, Sanja Fidler

The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation.

Multimodal Conditional Image Synthesis with Product-of-Experts GANs

no code implementations • 9 Dec 2021 • Xun Huang, Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

Existing conditional image synthesis frameworks generate images based on user inputs in a single modality, such as text, segmentation, sketch, or style reference.

Ranked #4 on Image-to-Image Translation on COCO-Stuff Labels-to-Photos

Image-to-Image Translation

Generating Long Videos of Dynamic Scenes

1 code implementation • 7 Jun 2022 • Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

Existing video generation methods often fail to produce new content as a function of time while maintaining consistencies expected in real environments, such as plausible dynamics and object persistence.

MORPH Video Generation

301

Learning to Relight Portrait Images via a Virtual Light Stage and Synthetic-to-Real Adaptation

no code implementations • 21 Sep 2022 • Yu-Ying Yeh, Koki Nagano, Sameh Khamis, Jan Kautz, Ming-Yu Liu, Ting-Chun Wang

An effective approach is to supervise the training of deep neural networks with a high-fidelity dataset of desired input-output pairs, captured with a light stage.

Implicit Warping for Animation with Image Sets

no code implementations • 4 Oct 2022 • Arun Mallya, Ting-Chun Wang, Ming-Yu Liu

We present a new implicit warping framework for image animation using sets of source images through the transfer of the motion of a driving video.

Image Animation

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

2 code implementations • 2 Nov 2022 • Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Qinsheng Zhang, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro, Tero Karras, Ming-Yu Liu

Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages.

Ranked #14 on Text-to-Image Generation on MS COCO

Text-to-Image Generation

626

SPACE: Speech-driven Portrait Animation with Controllable Expression

no code implementations • ICCV 2023 • Siddharth Gururani, Arun Mallya, Ting-Chun Wang, Rafael Valle, Ming-Yu Liu

It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator.

Magic3D: High-Resolution Text-to-3D Content Creation

1 code implementation • CVPR 2023 • Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results.

Ranked #2 on Text to 3D on T³Bench

Text to 3D Vocal Bursts Intensity Prediction

138

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

no code implementations • 9 Feb 2023 • Zhuolin Yang, Wei Ping, Zihan Liu, Vijay Korthikanti, Weili Nie, De-An Huang, Linxi Fan, Zhiding Yu, Shiyi Lan, Bo Li, Ming-Yu Liu, Yuke Zhu, Mohammad Shoeybi, Bryan Catanzaro, Chaowei Xiao, Anima Anandkumar

Augmenting pretrained language models (LMs) with a vision encoder (e.g., Flamingo) has obtained state-of-the-art results in image-to-text generation.

Few-Shot Learning Image Captioning +3

DiffCollage: Parallel Generation of Large Content with Diffusion Models

no code implementations • CVPR 2023 • Qinsheng Zhang, Jiaming Song, Xun Huang, Yongxin Chen, Ming-Yu Liu

We present DiffCollage, a compositional diffusion model that can generate large content by leveraging diffusion models trained on generating pieces of the large content.

Infinite Image Generation

Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models

no code implementations • ICCV 2023 • Songwei Ge, Seungjun Nah, Guilin Liu, Tyler Poon, Andrew Tao, Bryan Catanzaro, David Jacobs, Jia-Bin Huang, Ming-Yu Liu, Yogesh Balaji

Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy.

Ranked #8 on Text-to-Video Generation on UCF-101

Image Generation Text-to-Video Generation +1

Neuralangelo: High-Fidelity Neural Surface Reconstruction

1 code implementation • CVPR 2023 • Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H. Taylor, Mathias Unberath, Ming-Yu Liu, Chen-Hsuan Lin

Neural surface reconstruction has been shown to be powerful for recovering dense 3D surfaces via image-based neural rendering.

Neural Rendering Surface Reconstruction

4,180

ATT3D: Amortized Text-to-3D Object Synthesis

no code implementations • ICCV 2023 • Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas

Text-to-3D modelling has seen exciting progress by combining generative text-to-image models with image-to-3D methods like Neural Radiance Fields.

Image to 3D Object +1

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

1 code implementation • 29 Feb 2024 • Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.

413

Condition-Aware Neural Network for Controlled Image Generation

no code implementations • 1 Apr 2024 • Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han

In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network.

Conditional Image Generation Text-to-Image Generation

UFO²: A Unified Framework towards Omni-supervised Object Detection

1 code implementation • ECCV 2020 • Zhongzheng Ren, Zhiding Yu, Xiaodong Yang, Ming-Yu Liu, Alexander G. Schwing, Jan Kautz

Existing work on object detection often relies on a single form of annotation: the model is trained using either accurate yet costly bounding boxes or cheaper but less expressive image-level tags.

Object object-detection +1

358
