Search Results for author: Huan Ling

Found 18 papers, 6 papers with code

Structural Realization with GGNNs

no code implementations NAACL (TextGraphs) 2021 Jinman Zhao, Gerald Penn, Huan Ling

In this paper, we define an abstract task called structural realization that generates words given a prefix of words and a partial representation of a parse tree.

Graph Neural Network Language Modelling

L4GM: Large 4D Gaussian Reconstruction Model

no code implementations 14 Jun 2024 Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second.

Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

no code implementations CVPR 2024 Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation.

Synthetic Data Generation Video Generation

3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

no code implementations CVPR 2024 Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

However, these features are initially trained on paired text and image data, which are not optimized for 3D tasks, and often exhibit a domain gap when applied to the target data.

3D Object Detection Novel View Synthesis +2

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

no code implementations ICCV 2023 Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones.

Knowledge Distillation Representation Learning

Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

3 code implementations CVPR 2023 Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

We first pre-train an LDM on images only; then, we turn the image generator into a video generator by introducing a temporal dimension to the latent space diffusion model and fine-tuning on encoded image sequences, i.e., videos.

Ranked #5 on Text-to-Video Generation on MSR-VTT (CLIP-FID metric)

Image Generation Text-to-Video Generation +3

EditGAN: High-Precision Semantic Image Editing

1 code implementation NeurIPS 2021 Huan Ling, Karsten Kreis, Daiqing Li, Seung Wook Kim, Antonio Torralba, Sanja Fidler

EditGAN builds on a GAN framework that jointly models images and their semantic segmentations, requiring only a handful of labeled examples, making it a scalable tool for editing.

Segmentation Semantic Segmentation +1

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

2 code implementations CVPR 2021 Yuxuan Zhang, Huan Ling, Jun Gao, Kangxue Yin, Jean-Francois Lafleche, Adela Barriuso, Antonio Torralba, Sanja Fidler

To showcase the power of our approach, we generated datasets for 7 image segmentation tasks which include pixel-level labels for 34 human face parts, and 32 car parts.

Decoder Image Segmentation +1

Variational Amodal Object Completion

no code implementations NeurIPS 2020 Huan Ling, David Acuna, Karsten Kreis, Seung Wook Kim, Sanja Fidler

In images of complex scenes, objects are often occluding each other, which makes perception tasks such as object detection and tracking, or robotic control tasks such as planning, challenging.

Object object-detection +1

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

no code implementations ICLR 2021 Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Key to our approach is to exploit GANs as a multi-view data generator to train an inverse graphics network using an off-the-shelf differentiable renderer, and the trained inverse graphics network as a teacher to disentangle the GAN's latent code into interpretable 3D properties.

3D geometry Neural Rendering

ScribbleBox: Interactive Annotation Framework for Video Object Segmentation

no code implementations ECCV 2020 Bo-Wen Chen, Huan Ling, Xiaohui Zeng, Gao Jun, Ziyue Xu, Sanja Fidler

Our approach tolerates a modest amount of noise in the box placements, thus typically only a few clicks are needed to annotate tracked boxes to a sufficient accuracy.

Object Segmentation +3

Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer

1 code implementation NeurIPS 2019 Wenzheng Chen, Jun Gao, Huan Ling, Edward J. Smith, Jaakko Lehtinen, Alec Jacobson, Sanja Fidler

Many machine learning models operate on images, but ignore the fact that images are 2D projections formed by 3D geometry interacting with light, in a process called rendering.

3D geometry Single-View 3D Reconstruction

Fast Interactive Object Annotation with Curve-GCN

2 code implementations CVPR 2019 Huan Ling, Jun Gao, Amlan Kar, Wenzheng Chen, Sanja Fidler

Our model runs at 29.3ms in automatic, and 2.6ms in interactive mode, making it 10x and 100x faster than Polygon-RNN++.

Object
