Search Results for author: Xintong Han

Found 35 papers, 16 papers with code

MakeItTalk: Speaker-Aware Talking-Head Animation

3 code implementations 27 Apr 2020 Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, Dingzeyu Li

We present a method that generates expressive talking heads from a single facial image with audio as the only input.

Talking Face Generation Talking Head Generation

Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning

2 code implementations CVPR 2019 Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, Matthew R. Scott

A family of loss functions built on pair-based computation have been proposed in the literature which provide a myriad of solutions for deep metric learning.

Image Retrieval Metric Learning +1

VITON: An Image-based Virtual Try-on Network

6 code implementations CVPR 2018 Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis

We present an image-based Virtual Try-On Network (VITON) without using 3D information in any form, which seamlessly transfers a desired clothing item onto the corresponding region of a person using a coarse-to-fine strategy.

Descriptive Virtual Try-on

Learning Rich Features for Image Manipulation Detection

2 code implementations CVPR 2018 Peng Zhou, Xintong Han, Vlad I. Morariu, Larry S. Davis

Image manipulation detection is different from traditional semantic object detection because it pays more attention to tampering artifacts than to image content, which suggests that richer features need to be learned.

Image Manipulation Image Manipulation Detection +3

Learning Fashion Compatibility with Bidirectional LSTMs

2 code implementations 18 Jul 2017 Xintong Han, Zuxuan Wu, Yu-Gang Jiang, Larry S. Davis

To this end, we propose to jointly learn a visual-semantic embedding and the compatibility relationships among fashion items in an end-to-end fashion.

Attribute

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

1 code implementation CVPR 2021 Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao

To this end, we propose spatially probabilistic diversity normalization (SPDNorm) inside the modulation to model the probability of generating a pixel conditioned on the context information.

Image Inpainting Image Restoration

M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

1 code implementation 20 Apr 2021 Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images.

DeepFake Detection Face Swapping +1

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

1 code implementation CVPR 2021 Zhichao Huang, Xintong Han, Jia Xu, Tong Zhang

We present a new method for few-shot human motion transfer that achieves realistic human image generation with only a small number of appearance inputs.

Image Generation

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

1 code implementation CVPR 2021 Hongyu Liu, Ziyu Wan, Wei Huang, Yibing Song, Xintong Han, Jing Liao, Bing Jiang, Wei Liu

While existing methods combine an input image and these low-level controls for CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content.

Texture Synthesis

MotionEditor: Editing Video Motion via Content-Aware Diffusion

1 code implementation 30 Nov 2023 Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, Yu-Gang Jiang

This mechanism enables the editing branch to query the key and value from the reconstruction branch in a decoupled manner, making the editing branch retain the original background and protagonist appearance.

Video Editing

One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations

1 code implementation 14 Oct 2022 Yiming Zhu, Hongyu Liu, Yibing Song, Ziyang Yuan, Xintong Han, Chun Yuan, Qifeng Chen, Jue Wang

Based on the visual latent space of StyleGAN[21] and text embedding space of CLIP[34], studies focus on how to map these two latent spaces for text-driven attribute manipulations.

Attribute Image Manipulation

Generate, Segment and Refine: Towards Generic Manipulation Segmentation

1 code implementation 24 Nov 2018 Peng Zhou, Bor-Chun Chen, Xintong Han, Mahyar Najibi, Abhinav Shrivastava, Ser Nam Lim, Larry S. Davis

The advent of image sharing platforms and the easy availability of advanced photo editing software have resulted in large quantities of manipulated images being shared on the internet.

Detecting Image Manipulation Image Generation +3

Human MotionFormer: Transferring Human Motions with Vision Transformers

1 code implementation 22 Feb 2023 Hongyu Liu, Xintong Han, ChengBin Jin, Lihui Qian, Huawei Wei, Zhe Lin, Faqiang Wang, Haoye Dong, Yibing Song, Jia Xu, Qifeng Chen

In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching, respectively.

Motion Synthesis

CoverHunter: Cover Song Identification with Refined Attention and Alignments

1 code implementation 15 Jun 2023 Feng Liu, Deyi Tuo, Yinan Xu, Xintong Han

Cover song identification (CSI) focuses on finding the same music with different versions in reference anchors given a query track.

Cover song identification

ClothFlow: A Flow-Based Model for Clothed Person Generation

1 code implementation ICCV 2019 Xintong Han, Xiaojun Hu, Weilin Huang, Matthew R. Scott

By estimating a dense flow between source and target clothing regions, ClothFlow effectively models the geometric changes and naturally transfers the appearance to synthesize novel images as shown in Figure 1.

Ranked #6 on Virtual Try-on on VITON (SSIM metric)

Image Generation Virtual Try-on

DCAN: Dual Channel-wise Alignment Networks for Unsupervised Scene Adaptation

no code implementations ECCV 2018 Zuxuan Wu, Xintong Han, Yen-Liang Lin, Mustafa Gökhan Uzunbas, Tom Goldstein, Ser Nam Lim, Larry S. Davis

In particular, given an image from the source domain and unlabeled samples from the target domain, the generator synthesizes new images on-the-fly to resemble samples from the target domain in appearance and the segmentation network further refines high-level features before predicting semantic maps, both of which leverage feature statistics of sampled images from the target domain.

Segmentation Semantic Segmentation

NISP: Pruning Networks using Neuron Importance Score Propagation

no code implementations CVPR 2018 Ruichi Yu, Ang Li, Chun-Fu Chen, Jui-Hsin Lai, Vlad I. Morariu, Xintong Han, Mingfei Gao, Ching-Yung Lin, Larry S. Davis

In contrast, we argue that it is essential to prune neurons in the entire neural network jointly based on a unified goal: minimizing the reconstruction error of important responses in the "final response layer" (FRL), which is the second-to-last layer before classification, for a pruned network to retain its predictive power.

Network Pruning

VRFP: On-the-fly Video Retrieval using Web Images and Fast Fisher Vector Products

no code implementations 10 Dec 2015 Xintong Han, Bharat Singh, Vlad I. Morariu, Larry S. Davis

VRFP is a real-time video retrieval framework based on short text input queries, which obtains weakly labeled training images from the web after the query is known.

Re-Ranking Retrieval +2

Son of Zorn's Lemma: Targeted Style Transfer Using Instance-aware Semantic Segmentation

no code implementations 9 Jan 2017 Carlos Castillo, Soham De, Xintong Han, Bharat Singh, Abhay Kumar Yadav, Tom Goldstein

This work considers targeted style transfer, in which the style of a template image is used to alter only part of a target image.

LEMMA Object +2

Selecting Relevant Web Trained Concepts for Automated Event Retrieval

no code implementations ICCV 2015 Bharat Singh, Xintong Han, Zhe Wu, Vlad I. Morariu, Larry S. Davis

Given a text description of an event, event retrieval is performed by selecting concepts linguistically related to the event description and fusing the concept responses on unseen videos.

Domain Adaptation Retrieval

Tree-based Visualization and Optimization for Image Collection

no code implementations 17 Jul 2015 Xintong Han, Chongyang Zhang, Weiyao Lin, Mingliang Xu, Bin Sheng, Tao Mei

The visualization of an image collection is the process of displaying a collection of images on a screen under some specific layout requirements.

Enhancing HEVC Compressed Videos with a Partition-masked Convolutional Neural Network

no code implementations 10 May 2018 Xiaoyi He, Qiang Hu, Xintong Han, Xiaoyun Zhang, Chongyang Zhang, Weiyao Lin

In this paper, we propose a partition-masked Convolutional Neural Network (CNN) to achieve compressed-video enhancement for the state-of-the-art coding standard, High Efficiency Video Coding (HEVC).

Multimedia

Compatible and Diverse Fashion Image Inpainting

no code implementations 4 Feb 2019 Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis

The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments.

Fashion Synthesis Image Inpainting

FiNet: Compatible and Diverse Fashion Image Inpainting

no code implementations ICCV 2019 Xintong Han, Zuxuan Wu, Weilin Huang, Matthew R. Scott, Larry S. Davis

The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments.

Fashion Synthesis Image Inpainting

iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection

no code implementations 9 Mar 2020 Chenfan Zhuang, Xintong Han, Weilin Huang, Matthew R. Scott

We propose Image-Instance Full Alignment Networks (iFAN) to tackle this problem by precisely aligning feature distributions on both image and instance levels: 1) Image-level alignment: multi-scale features are roughly aligned by training adversarial domain classifiers in a hierarchically-nested fashion.

Domain Adaptation Metric Learning +3

Channel Interaction Networks for Fine-Grained Image Categorization

no code implementations AAAI-2020 2020 Yu Gao, Xintong Han, Xun Wang, Weilin Huang, Matthew R. Scott

Fine-grained image categorization is challenging due to the subtle inter-class differences. We posit that exploiting the rich relationships between channels can help capture such differences since different channels correspond to different semantics.

Image Categorization Metric Learning

Learning 3D Face Reconstruction with a Pose Guidance Network

no code implementations 9 Oct 2020 Pengpeng Liu, Xintong Han, Michael Lyu, Irwin King, Jia Xu

We present a self-supervised learning approach to learning monocular 3D face reconstruction with a pose guidance network (PGN).

3D Face Reconstruction Pose Estimation +1

Action-guided 3D Human Motion Prediction

no code implementations NeurIPS 2021 Jiangxin Sun, Zihang Lin, Xintong Han, Jian-Fang Hu, Jia Xu, Wei-Shi Zheng

The ability to forecast future human motion is important for human-machine interaction systems to understand human behaviors and interact accordingly.

Human motion prediction motion prediction

ObjectFormer for Image Manipulation Detection and Localization

no code implementations CVPR 2022 Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, Yu-Gang Jiang

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection.

Image Manipulation Image Manipulation Detection

PromptFusion: Decoupling Stability and Plasticity for Continual Learning

no code implementations 13 Mar 2023 Haoran Chen, Zuxuan Wu, Xintong Han, Menglin Jia, Yu-Gang Jiang

Such a trade-off is referred to as the stability-plasticity dilemma and is a more general and challenging problem for continual learning.

Class Incremental Learning Incremental Learning

XFormer: Fast and Accurate Monocular 3D Body Capture

no code implementations 18 May 2023 Lihui Qian, Xintong Han, Faqiang Wang, Hongyu Liu, Haoye Dong, Zhiwen Li, Huawei Wei, Zhe Lin, Cheng-Bin Jin

We present XFormer, a novel human mesh and motion capture method that achieves real-time performance on consumer CPUs given only monocular images as input.

3D Human Pose Estimation
