Search Results for author: Sergey Tulyakov

Found 75 papers, 34 papers with code

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

no code implementations17 Apr 2024 Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman

MoA is designed to retain the original model's prior by fixing its attention layers in the prior branch, while minimally intervening in the generation process with the personalized branch that learns to embed subjects in the layout and context generated by the prior branch.

Disentanglement Image Generation

TextCraftor: Your Text Encoder Can be Image Quality Controller

no code implementations27 Mar 2024 Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren

Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments.

Image Generation

MyVLM: Personalizing VLMs for User-Specific Queries

no code implementations21 Mar 2024 Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or

To effectively recognize a variety of user-specific concepts, we augment the VLM with external concept heads that function as toggles for the model, enabling the VLM to identify the presence of specific target concepts in a given image.

Image Captioning Language Modelling +2

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

no code implementations29 Feb 2024 Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation.

Retrieval Text Retrieval +3

Evaluating Very Long-Term Conversational Memory of LLM Agents

no code implementations27 Feb 2024 Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang

Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on avg., over up to 35 sessions.

Avg Multi-modal Dialogue Generation +1

Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

no code implementations22 Feb 2024 Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability.

Image Generation Text-to-Video Generation +1

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

no code implementations18 Feb 2024 Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

We illustrate that such joint alternating refinement leads to the learning of better tokens for concepts and, as a bi-product, latent masks.

Image Generation

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

no code implementations11 Jan 2024 Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models, such as Stable Diffusion, to generate paired datasets used for training generative adversarial networks (GANs).

Image-to-Image Translation

Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations21 Dec 2023 Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.

UpFusion: Novel View Diffusion from Unposed Sparse View Observations

no code implementations11 Dec 2023 Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov, Shubham Tulsiani

We propose UpFusion, a system that can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images without corresponding pose information.

Novel View Synthesis

4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

1 code implementation29 Nov 2023 Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes.

SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors

no code implementations28 Nov 2023 Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee, Sergey Tulyakov, Matthias Nießner

We propose SceneTex, a novel method for effectively generating high-quality and style-consistent textures for indoor scenes using depth-to-image diffusion priors.

Decoder Texture Synthesis

HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion

no code implementations12 Oct 2023 Xian Liu, Jian Ren, Aliaksandr Siarohin, Ivan Skorokhodov, Yanyu Li, Dahua Lin, Xihui Liu, Ziwei Liu, Sergey Tulyakov

Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network, where each branch in the model complements to each other with both structural awareness and textural richness.

Image Generation

AutoDecoding Latent 3D Diffusion Models

1 code implementation NeurIPS 2023 Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc van Gool, Sergey Tulyakov

We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.

Text-Guided Synthesis of Eulerian Cinemagraphs

1 code implementation6 Jul 2023 Aniruddha Mahapatra, Aliaksandr Siarohin, Hsin-Ying Lee, Sergey Tulyakov, Jun-Yan Zhu

We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images.

Image Animation

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

1 code implementation30 Jun 2023 Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem

We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes generation from a single unposed image in the wild using both2D and 3D priors.

Image to 3D

Promptable Game Models: Text-Guided Game Simulation via Masked Diffusion Models

no code implementations23 Mar 2023 Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, Elisa Ricci

Most captivatingly, our PGM unlocks the director's mode, where the game is played by specifying goals for the agents in the form of a prompt.


3D generation on ImageNet

no code implementations2 Mar 2023 Ivan Skorokhodov, Aliaksandr Siarohin, Yinghao Xu, Jian Ren, Hsin-Ying Lee, Peter Wonka, Sergey Tulyakov

Existing 3D-from-2D generators are typically designed for well-curated single-category datasets, where all the objects have (approximately) the same scale, 3D location, and orientation, and the camera always points to the center of the scene.

3D Generation

Invertible Neural Skinning

no code implementations CVPR 2023 Yash Kant, Aliaksandr Siarohin, Riza Alp Guler, Menglei Chai, Jian Ren, Sergey Tulyakov, Igor Gilitschenski

Next, we combine PIN with a differentiable LBS module to build an expressive and end-to-end Invertible Neural Skinning (INS) pipeline.

InfiniCity: Infinite-Scale City Synthesis

no code implementations ICCV 2023 Chieh Hubert Lin, Hsin-Ying Lee, Willi Menapace, Menglei Chai, Aliaksandr Siarohin, Ming-Hsuan Yang, Sergey Tulyakov

Toward infinite-scale 3D city synthesis, we propose a novel framework, InfiniCity, which constructs and renders an unconstrainedly large and 3D-grounded environment from random noises.

Image Generation Neural Rendering

3DAvatarGAN: Bridging Domains for Personalized Editable Avatars

no code implementations CVPR 2023 Rameen Abdal, Hsin-Ying Lee, Peihao Zhu, Menglei Chai, Aliaksandr Siarohin, Peter Wonka, Sergey Tulyakov

Finally, we propose a novel inversion method for 3D-GANs linking the latent spaces of the source and the target domains.

Real-Time Neural Light Field on Mobile Devices

1 code implementation CVPR 2023 Junli Cao, Huan Wang, Pavlo Chemerys, Vladislav Shakhrai, Ju Hu, Yun Fu, Denys Makoviichuk, Sergey Tulyakov, Jian Ren

Nevertheless, to reach a similar rendering quality as NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly.

Neural Rendering Novel View Synthesis

Rethinking Vision Transformers for MobileNet Size and Speed

6 code implementations ICCV 2023 Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices.

LADIS: Language Disentanglement for 3D Shape Editing

1 code implementation9 Dec 2022 IAn Huang, Panos Achlioptas, Tianyi Zhang, Sergey Tulyakov, Minhyuk Sung, Leonidas Guibas

Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision.


SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

1 code implementation CVPR 2023 Yen-Chi Cheng, Hsin-Ying Lee, Sergey Tulyakov, Alexander Schwing, LiangYan Gui

To enable interactive generation, our method supports a variety of input modalities that can be easily provided by a human, including images, text, partially observed shapes and combinations of these, further allowing to adjust the strength of each input.

3D Reconstruction 3D Shape Generation +3

Make-A-Story: Visual Memory Conditioned Consistent Story Generation

1 code implementation CVPR 2023 Tanzila Rahman, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Shweta Mahajan, Leonid Sigal

Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, which are consistent with the story, but also models appropriate correspondences between the characters and the background.

Sentence Story Generation +1

Affection: Learning Affective Explanations for Real-World Visual Data

no code implementations CVPR 2023 Panos Achlioptas, Maks Ovsjanikov, Leonidas Guibas, Sergey Tulyakov

To embark on this journey, we introduce and share with the research community a large-scale dataset that contains emotional reactions and free-form textual explanations for 85, 007 publicly available images, analyzed by 6, 283 annotators who were asked to indicate and explain how and why they felt in a particular way when observing a specific image, producing a total of 526, 749 responses.

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

1 code implementation22 Sep 2022 Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong Tang, Yanzhi Wang, Jian Ren

Therefore, we analyze the feasibility and potentiality of using the layer freezing technique in sparse training and find it has the potential to save considerable training costs.

Cross-Modal 3D Shape Generation and Manipulation

no code implementations24 Jul 2022 Zezhou Cheng, Menglei Chai, Jian Ren, Hsin-Ying Lee, Kyle Olszewski, Zeng Huang, Subhransu Maji, Sergey Tulyakov

In this paper, we propose a generic multi-modal generative model that couples the 2D modalities and implicit 3D representations through shared latent spaces.

3D Generation 3D Shape Generation

EpiGRAF: Rethinking training of 3D GANs

1 code implementation21 Jun 2022 Ivan Skorokhodov, Sergey Tulyakov, Yiqun Wang, Peter Wonka

In this work, we show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise.

3D-Aware Image Synthesis

Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation

1 code implementation15 Jun 2022 Ye Zhu, Yu Wu, Kyle Olszewski, Jian Ren, Sergey Tulyakov, Yan Yan

Diffusion probabilistic models (DPMs) have become a popular approach to conditional generation, due to their promising results and support for cross-modal synthesis.

Contrastive Learning Denoising +2

EfficientFormer: Vision Transformers at MobileNet Speed

10 code implementations2 Jun 2022 Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.

Control-NeRF: Editable Feature Volumes for Scene Rendering and Manipulation

no code implementations22 Apr 2022 Verica Lazova, Vladimir Guzov, Kyle Olszewski, Sergey Tulyakov, Gerard Pons-Moll

With the aim of obtaining interpretable and controllable scene representations, our model couples learnt scene-specific feature volumes with a scene agnostic neural rendering network.

Neural Rendering Novel View Synthesis

Quantized GAN for Complex Music Generation from Dance Videos

1 code implementation1 Apr 2022 Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos.

Music Generation

R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis

1 code implementation31 Mar 2022 Huan Wang, Jian Ren, Zeng Huang, Kyle Olszewski, Menglei Chai, Yun Fu, Sergey Tulyakov

On the other hand, Neural Light Field (NeLF) presents a more straightforward representation over NeRF in novel view synthesis -- the rendering of a pixel amounts to one single forward pass without ray-marching.

Novel View Synthesis

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

1 code implementation CVPR 2022 Ligong Han, Jian Ren, Hsin-Ying Lee, Francesco Barbieri, Kyle Olszewski, Shervin Minaee, Dimitris Metaxas, Sergey Tulyakov

In addition, our model can extract visual information as suggested by the text prompt, e. g., "an object in image one is moving northeast", and generate corresponding videos.

Self-Learning Text Augmentation +1

F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

1 code implementation ICLR 2022 Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov

Our approach achieves comparable and better performance, when compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic, but also to the full-precision counterparts, achieving state-of-the-art performance.


NeROIC: Neural Rendering of Objects from Online Image Collections

1 code implementation7 Jan 2022 Zhengfei Kuang, Kyle Olszewski, Menglei Chai, Zeng Huang, Panos Achlioptas, Sergey Tulyakov

We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects from photographs with varying cameras, illumination, and backgrounds.

Neural Rendering Novel View Synthesis +1

InOut: Diverse Image Outpainting via GAN Inversion

no code implementations CVPR 2022 Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation

StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2

1 code implementation CVPR 2022 Ivan Skorokhodov, Sergey Tulyakov, Mohamed Elhoseiny

We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality.

Video Generation

Motion Representations for Articulated Animation

2 code implementations CVPR 2021 Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov

To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space.

Object Video Reconstruction

In&Out : Diverse Image Outpainting via GAN Inversion

no code implementations1 Apr 2021 Yen-Chi Cheng, Chieh Hubert Lin, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Ming-Hsuan Yang

Existing image outpainting methods pose the problem as a conditional image-to-image translation task, often generating repetitive structures and textures by replicating the content available in the input image.

Image Outpainting Image-to-Image Translation +1

SMIL: Multimodal Learning with Severely Missing Modality

1 code implementation9 Mar 2021 Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, Xi Peng

A common assumption in multimodal learning is the completeness of training data, i. e., full modalities are available in all training examples.


Teachers Do More Than Teach: Compressing Image-to-Image Models

1 code implementation CVPR 2021 Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov

In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation.

Knowledge Distillation

MichiGAN: Multi-Input-Conditioned Hair Image Generation for Portrait Editing

1 code implementation30 Oct 2020 Zhentao Tan, Menglei Chai, Dongdong Chen, Jing Liao, Qi Chu, Lu Yuan, Sergey Tulyakov, Nenghai Yu

In this paper, we present MichiGAN (Multi-Input-Conditioned Hair Image GAN), a novel conditional image generation method for interactive portrait hair manipulation.

Conditional Image Generation

Interactive Video Stylization Using Few-Shot Patch-Based Training

2 code implementations29 Apr 2020 Ondřej Texler, David Futschik, Michal Kučera, Ondřej Jamriška, Šárka Sochorová, Menglei Chai, Sergey Tulyakov, Daniel Sýkora

In this paper, we present a learning-based method to the keyframe-based video stylization that allows an artist to propagate the style from a few selected keyframes to the rest of the sequence.

Style Transfer Translation +1

Neural Hair Rendering

no code implementations ECCV 2020 Menglei Chai, Jian Ren, Sergey Tulyakov

Unlike existing supervised translation methods that require model-level similarity to preserve consistent structure representation for both real images and fake renderings, our method adopts an unsupervised solution to work on arbitrary hair models.


Human Motion Transfer from Poses in the Wild

no code implementations7 Apr 2020 Jian Ren, Menglei Chai, Sergey Tulyakov, Chen Fang, Xiaohui Shen, Jianchao Yang

In this paper, we tackle the problem of human motion transfer, where we synthesize novel motion video for a target person that imitates the movement from a reference video.


Task-Assisted Domain Adaptation with Anchor Tasks

no code implementations16 Aug 2019 Zhizhong Li, Linjie Luo, Sergey Tulyakov, Qieyun Dai, Derek Hoiem

Our key idea to improve domain adaptation is to introduce a separate anchor task (such as facial landmarks) whose annotations can be obtained at no cost or are already available on both synthetic and real datasets.

Depth Estimation Domain Adaptation +2

Transformable Bottleneck Networks

1 code implementation ICCV 2019 Kyle Olszewski, Sergey Tulyakov, Oliver Woodford, Hao Li, Linjie Luo

We propose a novel approach to performing fine-grained 3D manipulation of image content via a convolutional neural network, which we call the Transformable Bottleneck Network (TBN).

3D Reconstruction Decoder +1

Train One Get One Free: Partially Supervised Neural Network for Bug Report Duplicate Detection and Clustering

no code implementations NAACL 2019 Lahari Poddar, Leonardo Neves, William Brendel, Luis Marujo, Sergey Tulyakov, Pradeep Karuturi

Leveraging the assumption that learning the topic of a bug is a sub-task for detecting duplicates, we design a loss function that can jointly perform both tasks but needs supervision for only duplicate classification, achieving topic clustering in an unsupervised fashion.

Clustering General Classification

3D Guided Fine-Grained Face Manipulation

no code implementations CVPR 2019 Zhenglin Geng, Chen Cao, Sergey Tulyakov

This is achieved by first fitting a 3D face model and then disentangling the face into a texture and a shape.

Face Model

Hybrid VAE: Improving Deep Generative Models using Partial Observations

no code implementations30 Nov 2017 Sergey Tulyakov, Andrew Fitzgibbon, Sebastian Nowozin

We show that such a combination is beneficial because the unlabeled data acts as a data-driven form of regularization, allowing generative models trained on few labeled samples to reach the performance of fully-supervised generative models trained on much larger datasets.

Self-Adaptive Matrix Completion for Heart Rate Estimation From Face Videos Under Realistic Conditions

no code implementations CVPR 2016 Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F. Cohn, Nicu Sebe

Recent studies in computer vision have shown that, while practically invisible to a human observer, skin color changes due to blood flow can be captured on face videos and, surprisingly, be used to estimate the heart rate (HR).

Heart rate estimation Matrix Completion

Regressing a 3D Face Shape From a Single Image

no code implementations ICCV 2015 Sergey Tulyakov, Nicu Sebe

To support the ability of our method to reliably reconstruct 3D shapes, we introduce a simple method for head pose estimation using a single image that reaches higher accuracy than the state of the art.

Head Pose Estimation

Cannot find the paper you are looking for? You can Submit a new open access paper.