Search Results for author: Hsin-Ying Lee

Found 42 papers, 20 papers with code

Text2Tex: Text-driven Texture Synthesis via Diffusion Models

no code implementations • 20 Mar 2023 • Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from given text prompts.

Texture Synthesis

3D generation on ImageNet

no code implementations • 2 Mar 2023 • Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene.

InfiniCity: Infinite-Scale City Synthesis

no code implementations • 23 Jan 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose InfiniCity, a novel framework that constructs and renders an arbitrarily large, 3D-grounded environment from random noise.

Image Generation • Neural Rendering

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

no code implementations • CVPR 2023 • Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

Finally, we propose a novel inversion method for 3D-GANs linking the latent spaces of the source and the target domains.

DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

no code implementations • CVPR 2023 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.

3D-Aware Image Synthesis

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

1 code implementation • CVPR 2023 • Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, Liang-Yan Gui

To enable interactive generation, our method supports a variety of input modalities that a human can easily provide, including images, text, partially observed shapes, and combinations of these, and further allows the strength of each input to be adjusted.

3D Reconstruction • 3D Shape Generation +2
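The snippet above mentions adjusting the strength of each input modality. Below is a minimal, hypothetical sketch of one way to do that: weight per-modality condition embeddings before combining them. All names (encode_image, encode_text, combine_conditions) and shapes are assumptions for illustration, not SDFusion's actual API.

```python
import numpy as np

# Hypothetical illustration: combine per-modality condition embeddings with
# user-adjustable weights. Names and shapes are assumptions, not SDFusion's API.
rng = np.random.default_rng(0)

def encode_image(img):   # placeholder encoder
    return rng.normal(size=128)

def encode_text(txt):    # placeholder encoder
    return rng.normal(size=128)

def combine_conditions(embeddings, weights):
    """Weighted sum of condition embeddings; each weight scales the
    influence ("strength") of one input modality."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(embeddings)                    # (num_modalities, dim)
    return (weights[:, None] * stacked).sum(axis=0)   # (dim,)

cond = combine_conditions(
    [encode_image("chair.png"), encode_text("a wooden chair")],
    weights=[0.7, 0.3],  # e.g., lean more on the image than on the text
)
print(cond.shape)  # (128,)
```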

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments on story generation with the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating high-quality frames consistent with the story, but also models appropriate correspondences between the characters and the background.

Story Generation • Story Visualization

Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

1 code implementation • 8 Oct 2022 • Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

While recent large-scale video-language pre-training has made great progress in video question answering, the spatial modeling of video-language models remains less fine-grained than that of image-language models; existing temporal-modeling practices also suffer from weak and noisy alignment between modalities.

Language Modelling • Question Answering +1

Coarse-to-Fine Point Cloud Registration with SE(3)-Equivariant Representations

1 code implementation • 5 Oct 2022 • Cheng-Wei Lin, Tung-I Chen, Hsin-Ying Lee, Wen-Chin Chen, Winston H. Hsu

Global feature alignment requires features that preserve the poses of the input point clouds, whereas local feature matching expects features that are invariant to these poses; we therefore propose an SE(3)-equivariant feature extractor that generates both types of features simultaneously.

Point Cloud Registration
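As a side note to the entry above, here is a tiny NumPy check of the invariant-versus-equivariant distinction the abstract draws: pairwise distances are pose-invariant, while centroid-centered coordinates rotate with the input. This is a conceptual illustration only, not the paper's SE(3)-equivariant extractor.

```python
import numpy as np

# Toy check (not the paper's extractor): pairwise distances are pose-invariant,
# while centroid-centered coordinates transform equivariantly with the rotation.
rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))                 # a random point cloud

A = rng.normal(size=(3, 3))                   # build a proper rotation R
R, _ = np.linalg.qr(A)
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                             # ensure det(R) = +1

P_rot = P @ R.T                               # rotate every point

def invariant_feature(X):                     # pairwise distance matrix
    return np.linalg.norm(X[:, None] - X[None, :], axis=-1)

def equivariant_feature(X):                   # centroid-centered coordinates
    return X - X.mean(axis=0)

assert np.allclose(invariant_feature(P), invariant_feature(P_rot))
assert np.allclose(equivariant_feature(P) @ R.T, equivariant_feature(P_rot))
```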

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

1 code implementation • 27 Sep 2022 • Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

To achieve accurate 3D object detection at low cost for autonomous driving, many multi-camera methods have been proposed to address the occlusion problem of monocular approaches.

3D Object Detection • Autonomous Driving +4

Vector Quantized Image-to-Image Translation

no code implementations • 27 Jul 2022 • Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities.

Image-to-Image Translation • Quantization +1

Cross-Modal 3D Shape Generation and Manipulation

no code implementations • 24 Jul 2022 • Zezhou Cheng, Menglei Chai, Jian Ren, Hsin-Ying Lee, Kyle Olszewski, Zeng Huang, Subhransu Maji, Sergey Tulyakov

In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces.

3D Shape Generation

Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features

no code implementations • 2 Jun 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks.
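The claim above, that zero padding encodes absolute position, can be demonstrated in a few lines of PyTorch: a convolution applied to a constant input responds differently near the borders. This is a minimal illustration, not the probing methodology of the paper.

```python
import torch

# Minimal demonstration (not the paper's probe): with zero padding, a
# convolution's response to a constant input differs near the borders,
# so activations implicitly encode absolute spatial position.
conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
torch.nn.init.ones_(conv.weight)

x = torch.ones(1, 1, 5, 5)          # featureless, position-free input
with torch.no_grad():
    y = conv(x)[0, 0]

print(y)
# Interior cells sum 9 ones; corner cells only 4 -- the border pattern
# reveals where each pixel sits in the frame.
```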

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

1 code implementation • CVPR 2022 • Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos.

Self-Learning • Text Augmentation +1

InOut: Diverse Image Outpainting via GAN Inversion

no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting • Image-to-Image Translation
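For readers unfamiliar with the GAN inversion named in the title above, the core idea can be sketched generically: optimize a latent code so a generator reproduces a target image. The tiny stand-in generator and loss below are placeholders, not the paper's model.

```python
import torch

# Generic GAN-inversion sketch (stand-in generator, not the paper's model):
# optimize a latent code so the generator's output matches a target image.
torch.manual_seed(0)
G = torch.nn.Sequential(                 # tiny stand-in "generator"
    torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16 * 16)
)
target = torch.rand(16 * 16)             # the "image" to invert

z = torch.zeros(8, requires_grad=True)
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(G(z), target)
    loss.backward()
    opt.step()
print(f"final reconstruction loss: {loss.item():.4f}")
```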

StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN

1 code implementation • 2 Nov 2021 • Min Jin Chong, Hsin-Ying Lee, David Forsyth

Recently, StyleGAN has enabled various image manipulation and editing tasks thanks to its high-quality generation and disentangled latent space.

Image Manipulation • Image-to-Image Translation +1

ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

1 code implementation • ICCV 2021 • Tsung-Han Wu, Yueh-Cheng Liu, Yu-Kai Huang, Hsin-Ying Lee, Hung-Ting Su, Ping-Chia Huang, Winston H. Hsu

Despite the success of deep learning on supervised point cloud semantic segmentation, obtaining large-scale point-by-point manual annotations is still a significant challenge.

Active Learning • Scene Understanding +1

In&Out: Diverse Image Outpainting via GAN Inversion

no code implementations • 1 Apr 2021 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting • Image-to-Image Translation +1

Unsupervised Discovery of Disentangled Manifolds in GANs

1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

An interpretable generation process is beneficial to various image editing applications.

Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang

Generating a smooth sequence of intermediate results bridges the gap between two different domains, facilitating the morphing effect across them.

Image-to-Image Translation • Translation
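A minimal sketch of the latent-interpolation idea behind the continuous translation described above: blending two attribute codes yields a smooth sequence of intermediate outputs. The generator and code shapes here are placeholders, not the paper's signed-attribute-vector formulation.

```python
import numpy as np

# Conceptual sketch (not DRIT/SAV internals): linearly interpolating between
# two attribute codes yields a smooth sequence of intermediate outputs.
rng = np.random.default_rng(0)

def generator(content, attribute):        # placeholder decoder
    return np.tanh(content + attribute)

content = rng.normal(size=32)
attr_a, attr_b = rng.normal(size=32), rng.normal(size=32)   # two domains

frames = [
    generator(content, (1 - t) * attr_a + t * attr_b)       # blend attributes
    for t in np.linspace(0.0, 1.0, num=8)
]
# Consecutive frames change gradually, giving the cross-domain morphing effect.
print(np.max(np.abs(frames[1] - frames[0])))
```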

Semantic View Synthesis

1 code implementation • ECCV 2020 • Hsin-Ping Huang, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang

We tackle a new problem of semantic view synthesis -- generating free-viewpoint rendering of a synthesized scene using a semantic label map as input.

Image Generation

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

no code implementations • ECCV 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Image generation from scene descriptions is a cornerstone technique for controlled generation, benefiting applications such as content creation and image editing.

Image Generation • Retrieval

Controllable Image Synthesis via SegVAE

no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation • Image-to-Image Translation +1

Dancing to Music

1 code implementation • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.

Motion Synthesis • Pose Estimation

Self-supervised Audio Spatialization with Correspondence Classifier

no code implementations • 14 May 2019 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Spatial audio is an essential medium for delivering 3D visual and auditory experiences to audiences.

DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

4 code implementations • 2 May 2019 • Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang

In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images.

Image-to-Image Translation • Perceptual Distance +1

Sub-GAN: An Unsupervised Generative Model via Subspaces

no code implementations • ECCV 2018 • Jie Liang, Jufeng Yang, Hsin-Ying Lee, Kai Wang, Ming-Hsuan Yang

Recent years have witnessed significant growth in constructing robust generative models that capture informative distributions of natural data.

Learning Structured Semantic Embeddings for Visual Recognition

no code implementations • 5 Jun 2017 • Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang

First, we exploit the discriminative constraints to capture the intra- and inter-class relationships of image embeddings.

General Classification • Multi-Label Classification +2
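As a generic illustration of the "intra- and inter-class" embedding constraints mentioned above, a margin-based triplet loss pulls same-class embeddings together and pushes different classes apart; this is a standard stand-in, not necessarily the paper's exact formulation.

```python
import numpy as np

# Hedged illustration (not the paper's formulation): a margin-based constraint
# that pulls same-class embeddings together and pushes different classes apart.
def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = np.linalg.norm(anchor - positive)   # intra-class distance
    d_neg = np.linalg.norm(anchor - negative)   # inter-class distance
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
a, p, n = rng.normal(size=(3, 64))
print(triplet_loss(a, p, n))
```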
