Search Results for author: Fengyu Yang

Found 14 papers, 7 papers with code

WorDepth: Variational Language Prior for Monocular Depth Estimation

1 code implementation · 4 Apr 2024 · Ziyao Zeng, Daniel Wang, Fengyu Yang, Hyoungseob Park, Yangchao Wu, Stefano Soatto, Byung-Woo Hong, Dong Lao, Alex Wong

To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene.

3D Reconstruction · Monocular Depth Estimation

APISR: Anime Production Inspired Real-World Anime Super-Resolution

1 code implementation · 3 Mar 2024 · Boyang Wang, Fengyu Yang, Xihang Yu, Chao Zhang, Hanbin Zhao

In addition, we identify two anime-specific challenges of distorted and faint hand-drawn lines and unwanted color artifacts.

Super-Resolution

VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data

1 code implementation · 2 Nov 2023 · Boyang Wang, Bowen Liu, Shiyu Liu, Fengyu Yang

In this work, we present, for the first time, a video compression-based degradation model to synthesize low-resolution image data for the blind SISR task.

Image Compression · Image Super-Resolution · +3

Generating Visual Scenes from Touch

no code implementations · ICCV 2023 · Fengyu Yang, Jiacheng Zhang, Andrew Owens

An emerging line of work has sought to generate plausible imagery from touch.

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions

1 code implementation · 10 Sep 2023 · Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Yanqing Jing, Ruimao Zhang

To facilitate the development of 3D pose estimation, we present FreeMan, the first large-scale, multi-view dataset collected under real-world conditions.

3D Human Pose Estimation · 3D Pose Estimation · +1

Boosting Detection in Crowd Analysis via Underutilized Output Features

1 code implementation · CVPR 2023 · Shaokai Wu, Fengyu Yang

Detection-based methods have been viewed unfavorably in crowd analysis due to their poor performance in dense crowds.

Crowd Counting

Dance with You: The Diversity Controllable Dancer Generation via Diffusion Models

1 code implementation · 23 Aug 2023 · Siyue Yao, MingJie Sun, Bingliang Li, Fengyu Yang, Junle Wang, Ruimao Zhang

In this paper, we introduce a novel multi-dancer synthesis task called partner dancer generation, which involves synthesizing virtual human dancers capable of dancing with users.

Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model

1 code implementation · 20 May 2023 · Jie Yang, Bingliang Li, Fengyu Yang, Ailing Zeng, Lei Zhang, Ruimao Zhang

Extensive experiments demonstrate that DiffHOI significantly outperforms the state-of-the-art in regular detection (i.e., 41.50 mAP) and zero-shot detection.

Ranked #2 on Zero-Shot Human-Object Interaction Detection on HICO-DET (using extra training data)

Human-Object Interaction Detection · Zero-Shot Human-Object Interaction Detection

Improve Bilingual TTS Using Dynamic Language and Phonology Embedding

no code implementations · 7 Dec 2022 · Fengyu Yang, Jian Luan, Yujun Wang

We introduce a phonology embedding to capture the differences between English phonologies.

Touch and Go: Learning from Human-Collected Vision and Touch

no code implementations · 22 Nov 2022 · Fengyu Yang, Chenyang Ma, Jiacheng Zhang, Jing Zhu, Wenzhen Yuan, Andrew Owens

The ability to associate touch with sight is essential for tasks that require physically interacting with objects in the world.

Image Stylization

RBC: Rectifying the Biased Context in Continual Semantic Segmentation

no code implementations · 16 Mar 2022 · Hanbin Zhao, Fengyu Yang, Xinghe Fu, Xi Li

In practice, new images are usually made available in a consecutive manner, leading to a problem called Continual Semantic Segmentation (CSS).

Continual Semantic Segmentation · Segmentation · +1

Sparse and Complete Latent Organization for Geospatial Semantic Segmentation

no code implementations · CVPR 2022 · Fengyu Yang, Chenyang Ma

In particular, to enhance the sparsity of the latent space, we design a prototypical contrastive learning objective that pulls prototypes of the same category together and pushes prototypes of different categories apart.

Contrastive Learning · Semantic Segmentation
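The prototypical contrastive objective described in this entry can be sketched as an InfoNCE-style loss over category prototypes. The function name, the temperature default, and the pure-Python formulation below are illustrative assumptions, not the paper's actual implementation:

```python
import math

def prototypical_contrastive_loss(prototypes, labels, temperature=0.1):
    """Hypothetical sketch of a prototypical contrastive loss.

    Prototypes sharing a category label are treated as positives
    (pulled together); prototypes of different categories act as
    negatives (pushed apart) via a softmax over cosine similarities.
    """
    # L2-normalize each prototype so dot products are cosine similarities
    def normalize(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]

    protos = [normalize(p) for p in prototypes]
    n = len(protos)
    total, count = 0.0, 0
    for i in range(n):
        # similarity logits against every other prototype
        logits = [sum(a * b for a, b in zip(protos[i], protos[j])) / temperature
                  for j in range(n) if j != i]
        others = [labels[j] for j in range(n) if j != i]
        positives = [l for l, y in zip(logits, others) if y == labels[i]]
        if not positives:
            continue
        log_denom = math.log(sum(math.exp(l) for l in logits))
        # negative log-softmax, averaged over same-category positives
        total += sum(log_denom - l for l in positives) / len(positives)
        count += 1
    return total / max(count, 1)
```

With this loss, a configuration whose same-category prototypes coincide scores lower than one where categories are mixed, which is the clustering behavior the entry describes.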

Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion

no code implementations · 16 Jun 2021 · Zhichao Wang, Xinyong Zhou, Fengyu Yang, Tao Li, Hongqiang Du, Lei Xie, Wendong Gan, Haitao Chen, Hai Li

Specifically, prosodic features are used to explicitly model prosody, while a VAE and a reference encoder are used to implicitly model prosody, taking the Mel spectrum and the bottleneck feature as input, respectively.

Style Transfer · Voice Conversion
