Search Results for author: Zhaoyang Huang

Found 22 papers, 9 papers with code

Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models

no code implementations1 May 2024 Xiaoshi Wu, Yiming Hao, Manyuan Zhang, Keqiang Sun, Zhaoyang Huang, Guanglu Song, Yu Liu, Hongsheng Li

In this study, we propose Deep Reward Tuning (DRTune), an algorithm that directly supervises the final output image of a text-to-image diffusion model and back-propagates through the iterative sampling process to the input noise.

Denoising

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

1 code implementation20 Mar 2024 Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

We introduce MOTIA Mastering Video Outpainting Through Input-Specific Adaptation, a diffusion-based pipeline that leverages both the intrinsic data-specific patterns of the source video and the image/video generative prior for effective outpainting.

InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

1 code implementation30 Nov 2023 Rongyao Fang, Shilin Yan, Zhaoyang Huang, Jingqiu Zhou, Hao Tian, Jifeng Dai, Hongsheng Li

In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual data.

Image Captioning Referring Expression +2

FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow

no code implementations8 Jun 2023 Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei Qin, Jifeng Dai, Xiaogang Wang, Hongsheng Li

This paper introduces a novel transformer-based network architecture, FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for pretraining it to tackle the problem of optical flow estimation.

Decoder Optical Flow Estimation

Context-PIPs: Persistent Independent Particles Demands Spatial Context Features

no code implementations3 Jun 2023 Weikang Bian, Zhaoyang Huang, Xiaoyu Shi, Yitong Dong, Yijin Li, Hongsheng Li

We tackle the problem of Persistent Independent Particles (PIPs), also called Tracking Any Point (TAP), in videos, which specifically aims at estimating persistent long-term trajectories of query points in videos.

Point Tracking

DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation

1 code implementation CVPR 2024 Xiaoliang Ju, Zhaoyang Huang, Yijin Li, Guofeng Zhang, Yu Qiao, Hongsheng Li

In addition to the scene generation, the final part of DiffInDScene can be used as a post-processing module to refine the 3D reconstruction results from multi-view stereo.

3D Generation 3D Reconstruction +1

VideoFlow: Exploiting Temporal Cues for Multi-frame Optical Flow Estimation

1 code implementation ICCV 2023 Xiaoyu Shi, Zhaoyang Huang, Weikang Bian, Dasong Li, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

We first propose a TRi-frame Optical Flow (TROF) module that estimates bi-directional optical flows for the center frame in a three-frame manner.

Optical Flow Estimation

PATS: Patch Area Transportation with Subdivision for Local Feature Matching

no code implementations CVPR 2023 Junjie Ni, Yijin Li, Zhaoyang Huang, Hongsheng Li, Hujun Bao, Zhaopeng Cui, Guofeng Zhang

However, estimating scale differences between these patches is non-trivial since the scale differences are determined by both relative camera poses and scene structures, and thus spatially varying over image pairs.

Graph Matching Optical Flow Estimation +2

CGOF++: Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

no code implementations23 Nov 2022 Keqiang Sun, Shangzhe Wu, Ning Zhang, Zhaoyang Huang, Quan Wang, Hongsheng Li

Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e. g., controlling the shapes, expressions, textures, and poses of the generated face images.

Face Generation

NeuralMarker: A Framework for Learning General Marker Correspondence

no code implementations19 Sep 2022 Zhaoyang Huang, Xiaokun Pan, Weihong Pan, Weikang Bian, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker.

Video Editing

Controllable 3D Face Synthesis with Conditional Generative Occupancy Fields

no code implementations16 Jun 2022 Keqiang Sun, Shangzhe Wu, Zhaoyang Huang, Ning Zhang, Quan Wang, Hongsheng Li

Capitalizing on the recent advances in image generation models, existing controllable face image synthesis methods are able to generate high-fidelity images with some levels of controllability, e. g., controlling the shapes, expressions, textures, and poses of the generated face images.

Face Generation

FlowFormer: A Transformer Architecture for Optical Flow

1 code implementation30 Mar 2022 Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li

We introduce optical Flow transFormer, dubbed as FlowFormer, a transformer-based neural network architecture for learning optical flow.

Decoder Optical Flow Estimation

VS-Net: Voting with Segmentation for Visual Localization

1 code implementation CVPR 2021 Zhaoyang Huang, Han Zhou, Yijin Li, Bangbang Yang, Yan Xu, Xiaowei Zhou, Hujun Bao, Guofeng Zhang, Hongsheng Li

To address this problem, we propose a novel visual localization framework that establishes 2D-to-3D correspondences between the query image and the 3D map with a series of learnable scene-specific landmarks.

Segmentation Semantic Segmentation +1

LIFE: Lighting Invariant Flow Estimation

no code implementations7 Apr 2021 Zhaoyang Huang, Xiaokun Pan, Runsen Xu, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks.

R-SAC: Reinforcement Sample Consensus

no code implementations CUHK Course IERG5350 2020 Zhaoyang Huang, Yan Xu

In contrast, a model estimated from more observations may be better than from a minimum set.

SelfVoxeLO: Self-supervised LiDAR Odometry with Voxel-based Deep Neural Networks

no code implementations19 Oct 2020 Yan Xu, Zhaoyang Huang, Kwan-Yee Lin, Xinge Zhu, Jianping Shi, Hujun Bao, Guofeng Zhang, Hongsheng Li

To suit our network to self-supervised learning, we design several novel loss functions that utilize the inherent properties of LiDAR point clouds.

Self-Supervised Learning

Prior Guided Dropout for Robust Visual Localization in Dynamic Environments

1 code implementation ICCV 2019 Zhaoyang Huang, Yan Xu, Jianping Shi, Xiaowei Zhou, Hujun Bao, Guofeng Zhang

Additionally, the dropout module enables the pose regressor to output multiple hypotheses from which the uncertainty of pose estimates can be quantified and leveraged in the following uncertainty-aware pose-graph optimization to improve the robustness further.

Camera Localization Uncertainty Quantification +1

Cannot find the paper you are looking for? You can Submit a new open access paper.