Search Results for author: Hsin-Ying Lee

Found 53 papers, 23 papers with code

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

1 code implementation • 30 Jun 2023 • Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D mesh generation from a single unposed image in the wild using both 2D and 3D priors.

Image to 3D

1,456


Diverse Image-to-Image Translation via Disentangled Representations

7 code implementations • ECCV 2018 • Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, Ming-Hsuan Yang

Our model takes the encoded content features extracted from a given input and the attribute vectors sampled from the attribute space to produce diverse outputs at test time.

Ranked #4 on Multimodal Unsupervised Image-To-Image Translation on CelebA-HQ

Attribute • Domain Adaptation • +4

832


DRIT++: Diverse Image-to-Image Translation via Disentangled Representations

4 code implementations • 2 May 2019 • Hsin-Ying Lee, Hung-Yu Tseng, Qi Mao, Jia-Bin Huang, Yu-Ding Lu, Maneesh Singh, Ming-Hsuan Yang

In this work, we present an approach based on disentangled representation for generating diverse outputs without paired training images.

Attribute • Image-to-Image Translation • +2

832


Dancing to Music

2 code implementations • NeurIPS 2019 • Hsin-Ying Lee, Xiaodong Yang, Ming-Yu Liu, Ting-Chun Wang, Yu-Ding Lu, Ming-Hsuan Yang, Jan Kautz

In the analysis phase, we decompose a dance into a series of basic dance units, through which the model learns how to move.

Ranked #3 on Motion Synthesis on BRACE

Motion Synthesis • Pose Estimation

521


Mode Seeking Generative Adversarial Networks for Diverse Image Synthesis

2 code implementations • CVPR 2019 • Qi Mao, Hsin-Ying Lee, Hung-Yu Tseng, Siwei Ma, Ming-Hsuan Yang

In this work, we propose a simple yet effective regularization term to address the mode collapse issue for cGANs.

Ranked #3 on Multimodal Unsupervised Image-To-Image Translation on CelebA-HQ

Multimodal Unsupervised Image-To-Image Translation • Translation

411


StyleGAN of All Trades: Image Manipulation with Only Pretrained StyleGAN

1 code implementation • 2 Nov 2021 • Min Jin Chong, Hsin-Ying Lee, David Forsyth

Recently, StyleGAN has enabled various image manipulation and editing tasks thanks to the high-quality generation and the disentangled latent space.

Image Manipulation • Image-to-Image Translation • +1

374


SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

1 code implementation • CVPR 2023 • Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, LiangYan Gui

To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes, and combinations of these, and further allows the strength of each input to be adjusted.

3D Reconstruction • 3D Shape Generation • +2

364


Text-Guided Synthesis of Eulerian Cinemagraphs

1 code implementation • 6 Jul 2023 • Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu

We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images.

Image Animation

344


Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation

1 code implementation • ICLR 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Ming-Hsuan Yang

Few-shot classification aims to recognize novel categories with only a few labeled images in each class.

Ranked #6 on Cross-Domain Few-Shot on CUB

Classification • Cross-Domain Few-Shot • +2

319


InfinityGAN: Towards Infinite-Pixel Image Synthesis

1 code implementation • ICLR 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Yen-Chi Cheng, Sergey Tulyakov, Ming-Hsuan Yang

We present a novel framework, InfinityGAN, for arbitrary-sized image generation.

Ranked #2 on Scene Generation on OSM

Image Generation • Scene Generation

317


Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

1 code implementation • CVPR 2022 • Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos.

Self-Learning • Text Augmentation • +1

186


Unsupervised Representation Learning by Sorting Sequences

1 code implementation • ICCV 2017 • Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, Ming-Hsuan Yang

We present an unsupervised representation learning approach using videos without semantic labels.

Ranked #46 on Self-Supervised Action Recognition on HMDB51

Image Classification • object-detection • +4


Continuous and Diverse Image-to-Image Translation via Signed Attribute Vectors

1 code implementation • 2 Nov 2020 • Qi Mao, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang, Siwei Ma, Ming-Hsuan Yang

Generating a smooth sequence of intermediate results bridges the gap between two different domains, facilitating the morphing effect across domains.

Attribute • Image-to-Image Translation • +1


CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

1 code implementation • 27 Sep 2022 • Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed and have solved the occlusion problem of monocular approaches.

3D Object Detection • Autonomous Driving • +5


ReDAL: Region-based and Diversity-aware Active Learning for Point Cloud Semantic Segmentation

1 code implementation • ICCV 2021 • Tsung-Han Wu, Yueh-Cheng Liu, Yu-Kai Huang, Hsin-Ying Lee, Hung-Ting Su, Ping-Chia Huang, Winston H. Hsu

Despite the success of deep learning on supervised point cloud semantic segmentation, obtaining large-scale point-by-point manual annotations is still a significant challenge.

Active Learning • Scene Understanding • +1


Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation • CVPR 2023 • Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments for story generation on the MUGEN, PororoSV, and FlintstonesSV datasets show that our method not only outperforms the prior state of the art in generating frames with high visual quality that are consistent with the story, but also models appropriate correspondences between the characters and the background.

Sentence • Story Generation • +1


Exploiting Diffusion Prior for Generalizable Dense Prediction

2 code implementations • 30 Nov 2023 • Hung-Yu Tseng, Hsin-Ying Lee, Ming-Hsuan Yang

Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable domain gap.

Intrinsic Image Decomposition • Semantic Segmentation


Semantic View Synthesis

1 code implementation • ECCV 2020 • Hsin-Ping Huang, Hung-Yu Tseng, Hsin-Ying Lee, Jia-Bin Huang

We tackle a new problem of semantic view synthesis -- generating free-viewpoint rendering of a synthesized scene using a semantic label map as input.

Image Generation


D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation

1 code implementation • 14 Feb 2022 • Tsung-Han Wu, Yi-Syuan Liou, Shao-Ji Yuan, Hsin-Ying Lee, Tung-I Chen, Kuan-Chih Huang, Winston H. Hsu

In the field of domain adaptation, a trade-off exists between the model performance and the number of target domain annotations.

Active Learning • Domain Adaptation • +2


Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

1 code implementation • 8 Oct 2022 • Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

While recent large-scale video-language pre-training made great progress in video question answering, the design of spatial modeling of video-language models is less fine-grained than that of image-language models; existing practices of temporal modeling also suffer from weak and noisy alignment between modalities.

Language Modelling • Question Answering • +1


Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

1 code implementation • NeurIPS 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories.


Coarse-to-Fine Point Cloud Registration with SE(3)-Equivariant Representations

1 code implementation • 5 Oct 2022 • Cheng-Wei Lin, Tung-I Chen, Hsin-Ying Lee, Wen-Chin Chen, Winston H. Hsu

As global feature alignment requires the features to preserve the poses of input point clouds and local feature matching expects the features to be invariant to these poses, we propose an SE(3)-equivariant feature extractor to simultaneously generate two types of features.

Point Cloud Registration


Unsupervised Discovery of Disentangled Manifolds in GANs

1 code implementation • 24 Nov 2020 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

An interpretable generation process is beneficial to various image editing applications.

Attribute


Learning Structured Semantic Embeddings for Visual Recognition

no code implementations • 5 Jun 2017 • Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang

First, we exploit the discriminative constraints to capture the intra- and inter-class relationships of image embeddings.

General Classification • Multi-Label Classification • +2


Sub-GAN: An Unsupervised Generative Model via Subspaces

no code implementations • ECCV 2018 • Jie Liang, Jufeng Yang, Hsin-Ying Lee, Kai Wang, Ming-Hsuan Yang

Recent years have witnessed significant growth in constructing robust generative models to capture informative distributions of natural data.

Generative Adversarial Network


Soft-Segmentation Guided Object Motion Deblurring

no code implementations • CVPR 2016 • Jinshan Pan, Zhe Hu, Zhixun Su, Hsin-Ying Lee, Ming-Hsuan Yang

To address these problems, we propose a novel model for object motion deblurring.

Deblurring • Object • +2


Self-supervised Audio Spatialization with Correspondence Classifier

no code implementations • 14 May 2019 • Yu-Ding Lu, Hsin-Ying Lee, Hung-Yu Tseng, Ming-Hsuan Yang

Spatial audio is an essential medium to audiences for 3D visual and auditory experience.


Neural Design Network: Graphic Layout Generation with Constraints

no code implementations • ECCV 2020 • Hsin-Ying Lee, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang

The first module predicts a graph with complete relations from a graph with user-specified relations.

Image Generation


Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning

no code implementations • 19 May 2020 • Jia-Fong Yeh, Hsin-Ying Lee, Bing-Chen Tsai, Yi-Rong Chen, Ping-Chia Huang, Winston H. Hsu

In recent years, few-shot learning problems have received a lot of attention.

cross-domain few-shot learning • Face Recognition


RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval

no code implementations • ECCV 2020 • Hung-Yu Tseng, Hsin-Ying Lee, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Image generation from scene description is a cornerstone technique for controlled generation, which is beneficial to applications such as content creation and image editing.

Image Generation • Retrieval


Controllable Image Synthesis via SegVAE

no code implementations • ECCV 2020 • Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, Ming-Hsuan Yang

We also apply an off-the-shelf image-to-image translation model to generate realistic RGB images to better understand the quality of the synthesized semantic maps.

Conditional Image Generation • Image-to-Image Translation • +2


In&Out: Diverse Image Outpainting via GAN Inversion

no code implementations • 1 Apr 2021 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting • Image-to-Image Translation • +1


Unsupervised Sound Localization via Iterative Contrastive Learning

no code implementations • 1 Apr 2021 • Yan-Bo Lin, Hung-Yu Tseng, Hsin-Ying Lee, Yen-Yu Lin, Ming-Hsuan Yang

Sound localization aims to find the source of the audio signal in the visual scene.

Contrastive Learning


Unveiling The Mask of Position-Information Pattern Through the Mist of Image Features

no code implementations • 2 Jun 2022 • Chieh Hubert Lin, Hsin-Ying Lee, Hung-Yu Tseng, Maneesh Singh, Ming-Hsuan Yang

Recent studies show that paddings in convolutional neural networks encode absolute position information which can negatively affect the model performance for certain tasks.

Position


InOut: Diverse Image Outpainting via GAN Inversion

no code implementations • CVPR 2022 • Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Image Outpainting • Image-to-Image Translation


Cross-Modal 3D Shape Generation and Manipulation

no code implementations • 24 Jul 2022 • Zezhou Cheng, Menglei Chai, Jian Ren, Hsin-Ying Lee, Kyle Olszewski, Zeng Huang, Subhransu Maji, Sergey Tulyakov

In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces.

3D Generation • 3D Shape Generation


Vector Quantized Image-to-Image Translation

no code implementations • 27 Jul 2022 • Yu-Jie Chen, Shin-I Cheng, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

For example, it provides style variability for image generation and extension, and equips image-to-image translation with further extension capabilities.

Image-to-Image Translation • Quantization • +1


Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model

no code implementations • 26 Aug 2022 • Shin-I Cheng, Yu-Jie Chen, Wei-Chen Chiu, Hung-Yu Tseng, Hsin-Ying Lee

Generating images from hand-drawings is a crucial and fundamental task in content creation.

Image Generation • Translation


ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes

no code implementations • 12 Dec 2022 • Ahmed Abdelreheem, Kyle Olszewski, Hsin-Ying Lee, Peter Wonka, Panos Achlioptas

The two popular datasets ScanRefer [16] and ReferIt3D [3] connect natural language to real-world 3D data.

Sentence • Text Generation


DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis

no code implementations • CVPR 2023 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov

Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.

3D-Aware Image Synthesis • Object


3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

no code implementations • CVPR 2023 • Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

Finally, we propose a novel inversion method for 3D-GANs linking the latent spaces of the source and the target domains.


InfiniCity: Infinite-Scale City Synthesis

no code implementations • ICCV 2023 • Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noise.

Image Generation • Neural Rendering


Unsupervised Volumetric Animation

no code implementations • CVPR 2023 • Aliaksandr Siarohin, Willi Menapace, Ivan Skorokhodov, Kyle Olszewski, Jian Ren, Hsin-Ying Lee, Menglei Chai, Sergey Tulyakov

We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.

Keypoint Estimation • Novel View Synthesis


3D generation on ImageNet

no code implementations • 2 Mar 2023 • Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene.

3D Generation


Text2Tex: Text-driven Texture Synthesis via Diffusion Models

no code implementations • ICCV 2023 • Dave Zhenyu Chen, Yawar Siddiqui, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We present Text2Tex, a novel method for generating high-quality textures for 3D meshes from the given text prompts.

Texture Synthesis


SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

no code implementations • 28 Nov 2023 • Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors.

Texture Synthesis


UpFusion: Novel View Diffusion from Unposed Sparse View Observations

no code implementations • 11 Dec 2023 • Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani

We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information.

Novel View Synthesis


SceneWiz3D: Towards Text-guided 3D Scene Composition

no code implementations • 13 Dec 2023 • Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

We are witnessing significant breakthroughs in the technology for generating 3D objects from text.

Text to 3D


Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations • 21 Dec 2023 • Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.


Diffusion Priors for Dynamic View Synthesis from Monocular Videos

no code implementations • 10 Jan 2024 • Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov

Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.

Novel View Synthesis


AToM: Amortized Text-to-Mesh using 2D Diffusion

no code implementations • 1 Feb 2024 • Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov

We introduce Amortized Text-to-Mesh (AToM), a feed-forward text-to-mesh framework optimized across multiple text prompts simultaneously.

Text to 3D


Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

no code implementations • 18 Feb 2024 • Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a by-product, latent masks.

Image Generation


Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

no code implementations • 29 Feb 2024 • Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected, and then apply the model to the whole dataset to select the best caption as the annotation.

Retrieval • Text Retrieval • +3

