Search Results for author: Kwan-Yee K. Wong

Found 52 papers, 26 papers with code

What is Learned in Deep Uncalibrated Photometric Stereo?

no code implementations ECCV 2020 Guan-Ying Chen, Michael Waechter, Boxin Shi, Kwan-Yee K. Wong, Yasuyuki Matsushita

Based on this insight, we propose a guided calibration network, named GCNet, that explicitly leverages object shape and shading information for improved lighting estimation.

Lighting Estimation Surface Normal Estimation

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

1 code implementation 9 Jun 2025 Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, WangMeng Zuo, Ziwei Liu, Kwan-Yee K. Wong

Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-driven visual generation.

Attribute

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

no code implementations 27 Apr 2025 Jiaqi Chen, Bang Zhang, Ruotian Ma, Peisong Wang, Xiaodan Liang, Zhaopeng Tu, Xiaolong Li, Kwan-Yee K. Wong

Evaluating the step-by-step reliability of large language model (LLM) reasoning, such as Chain-of-Thought, remains challenging due to the difficulty and cost of obtaining high-quality step-level supervision.

Large Language Model Mathematical Reasoning

VipDiff: Towards Coherent and Diverse Video Inpainting via Training-free Denoising Diffusion Models

no code implementations 21 Jan 2025 Chaohao Xie, Kai Han, Kwan-Yee K. Wong

Recently, diffusion models have demonstrated impressive performance in generating diverse and high-quality images, and have been exploited in a number of works for image inpainting.

Denoising Image Inpainting +2

DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving

no code implementations CVPR 2025 Zhenhua Xu, Yan Bai, Yujia Zhang, Zhuoling Li, Fei Xia, Kwan-Yee K. Wong, Jianqiang Wang, Hengshuang Zhao

Multimodal large language models (MLLMs) possess the ability to comprehend visual images or videos, and show impressive reasoning ability thanks to the vast amounts of pretrained knowledge, making them highly suitable for autonomous driving applications.

Autonomous Driving CARLA longest6 +4

FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality

no code implementations 25 Oct 2024 Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong

Our key contributions include a dynamic feature reuse strategy that preserves both feature distinction and temporal continuity, and CFG-Cache which optimizes the reuse of conditional and unconditional outputs to further enhance inference speed without compromising video quality.

Video Generation
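
A rough, hypothetical sketch of the CFG-Cache idea mentioned in the abstract above: during classifier-free-guidance sampling, the unconditional branch is only recomputed on some steps and a cached output is reused in between, cutting denoiser calls. The denoiser, update rule, and reuse interval below are placeholders, not the authors' implementation.

```python
import torch

def cfg_sample_with_cache(denoiser, x, cond, steps=50, guidance=7.5, reuse_every=2):
    """Toy classifier-free-guidance loop that caches the unconditional output.

    On 'reuse' steps the cached unconditional prediction is reused instead of
    running the denoiser a second time, roughly halving unconditional calls.
    """
    uncond_cache = None
    for t in reversed(range(steps)):
        eps_cond = denoiser(x, t, cond)            # conditional branch (always computed)
        if uncond_cache is None or t % reuse_every == 0:
            uncond_cache = denoiser(x, t, None)    # refresh unconditional branch
        eps_uncond = uncond_cache                  # otherwise reuse the cached output
        eps = eps_uncond + guidance * (eps_cond - eps_uncond)
        x = x - eps / steps                        # placeholder update rule
    return x

# Usage with a dummy denoiser standing in for a video diffusion model.
denoiser = lambda x, t, c: 0.1 * x
x0 = cfg_sample_with_cache(denoiser, torch.randn(1, 4, 8, 32, 32), cond=None)
```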

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

1 code implementation 18 Oct 2024 Shaozhe Hao, Xuantong Liu, Xianbiao Qi, Shihao Zhao, Bojia Zi, Rong Xiao, Kai Han, Kwan-Yee K. Wong

We introduce BiGR, a novel conditional image generation model using compact binary latent codes for generative training, focusing on enhancing both generation and representation capabilities.

Conditional Image Generation Image Inpainting +3
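
As a rough illustration of "compact binary latent codes", the snippet below binarizes encoder features with a straight-through estimator so that gradients still flow; the encoder and code length are invented for the example and do not reflect BiGR's actual architecture.

```python
import torch
import torch.nn as nn

class BinaryLatent(nn.Module):
    """Maps logits to {0, 1} codes with a straight-through gradient."""
    def forward(self, logits):
        probs = torch.sigmoid(logits)
        hard = (probs > 0.5).float()
        # Forward pass uses hard binary codes, backward pass uses the soft probs.
        return hard + probs - probs.detach()

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # toy encoder
codes = BinaryLatent()(encoder(torch.randn(8, 3, 32, 32)))
print(codes.shape)  # torch.Size([8, 256]); values are binary 0/1
```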

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

no code implementations 9 Oct 2024 Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu

2) For the "how" challenge, we introduce correspondence-aware motion optimization that constructs motion fields for both human and object models using the linear blend skinning function from SMPL-X.

Human-Object Interaction Detection Human-Object Interaction Generation +2
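
Linear blend skinning, referenced above as the mechanism borrowed from SMPL-X, deforms each vertex by a weighted blend of per-joint transforms. The sketch below is a generic NumPy implementation with random weights and identity bone transforms; it is not AvatarGO's correspondence-aware motion optimization itself.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """vertices: (V, 3), weights: (V, J), bone_transforms: (J, 4, 4) -> (V, 3)."""
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)  # (V, 4)
    # Blend the per-joint transforms for every vertex, then apply them.
    blended = np.einsum('vj,jab->vab', weights, bone_transforms)            # (V, 4, 4)
    skinned = np.einsum('vab,vb->va', blended, homo)
    return skinned[:, :3]

V, J = 100, 24                                                 # SMPL-style joint count
verts = np.random.rand(V, 3)
w = np.random.rand(V, J); w /= w.sum(axis=1, keepdims=True)    # weights sum to 1 per vertex
T = np.tile(np.eye(4), (J, 1, 1))                              # identity bones -> no motion
assert np.allclose(linear_blend_skinning(verts, w, T), verts)
```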

ArtiFade: Learning to Generate High-quality Subject from Blemished Images

no code implementations CVPR 2025 Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong

In this paper, we introduce ArtiFade to tackle this issue and successfully generate high-quality artifact-free images from blemished datasets.

Text to Image Generation Text-to-Image Generation

ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction

1 code implementation 9 Jul 2024 Shaozhe Hao, Kai Han, Zhengyao Lv, Shihao Zhao, Kwan-Yee K. Wong

To achieve this, we present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.

Text to Image Generation Text-to-Image Generation

Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

2 code implementations 12 Mar 2024 Shihao Zhao, Shaozhe Hao, Bojia Zi, Huaizhe xu, Kwan-Yee K. Wong

In this paper, we explore this objective and propose LaVi-Bridge, a pipeline that enables the integration of diverse pre-trained language models and generative vision models for text-to-image generation.

Language Modelling Text to Image Generation +1

PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis

1 code implementation CVPR 2024 Zhengyao Lv, Yuxiang Wei, WangMeng Zuo, Kwan-Yee K. Wong

Extensive experiments demonstrate that our approach performs favorably in terms of visual quality, semantic consistency, and layout alignment.

Image Generation

MapGPT: Map-Guided Prompting with Adaptive Path Planning for Vision-and-Language Navigation

no code implementations 14 Jan 2024 Jiaqi Chen, Bingqian Lin, ran Xu, Zhenhua Chai, Xiaodan Liang, Kwan-Yee K. Wong

Embodied agents equipped with GPT as their brains have exhibited extraordinary decision-making and generalization abilities across various tasks.

Decision Making Vision and Language Navigation

DiffusionMat: Alpha Matting as Sequential Refinement Learning

no code implementations 22 Nov 2023 Yangyang Xu, Shengfeng He, Wenqi Shao, Kwan-Yee K. Wong, Yu Qiao, Ping Luo

In this paper, we introduce DiffusionMat, a novel image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.

Denoising Image Matting
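
The "transition from coarse to refined alpha mattes" described above can be pictured as repeatedly perturbing the current estimate and denoising it back with a network, as in the hypothetical loop below; the refiner, step count, and noise schedule are placeholders rather than DiffusionMat's actual procedure.

```python
import torch

def refine_alpha(coarse_alpha, refiner, steps=10, noise_scale=0.3):
    """Iteratively perturb and denoise a coarse alpha matte (toy sketch)."""
    alpha = coarse_alpha
    for i in range(steps):
        sigma = noise_scale * (1 - i / steps)           # shrink the noise over time
        noisy = alpha + sigma * torch.randn_like(alpha)
        alpha = refiner(noisy).clamp(0, 1)              # network predicts a cleaner matte
    return alpha

refiner = lambda x: x                                    # identity stand-in for the network
out = refine_alpha(torch.rand(1, 1, 64, 64), refiner)
```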

Guide3D: Create 3D Avatars from Text and Image Guidance

no code implementations 18 Aug 2023 Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

To this end, we introduce Guide3D, a zero-shot text-and-image-guided generative model for 3D avatar generation based on diffusion models.

3D Generation Text to 3D +1

RIGID: Recurrent GAN Inversion and Editing of Real Face Videos

no code implementations ICCV 2023 Yangyang Xu, Shengfeng He, Kwan-Yee K. Wong, Ping Luo

In this paper, we propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID), to explicitly and simultaneously enforce temporally coherent GAN inversion and facial editing of real videos.

Attribute Facial Editing +1

Semi-supervised Cycle-GAN for face photo-sketch translation in the wild

no code implementations 18 Jul 2023 Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong

Experiments show that SCG achieves competitive performance on public benchmarks and superior results on photos in the wild.

Translation

ViCo: Plug-and-play Visual Condition for Personalized Text-to-image Generation

1 code implementation 1 Jun 2023 Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong

Personalized text-to-image generation using diffusion models has recently emerged and garnered significant interest.

Text to Image Generation Text-to-Image Generation

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

1 code implementation NeurIPS 2023 Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions.


CiPR: An Efficient Framework with Cross-instance Positive Relations for Generalized Category Discovery

1 code implementation 14 Apr 2023 Shaozhe Hao, Kai Han, Kwan-Yee K. Wong

GCD considers the open-world problem of automatically clustering a partially labelled dataset, in which the unlabelled data may contain instances from both novel categories and labelled classes.

Clustering Contrastive Learning +1

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

1 code implementation CVPR 2024 Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong

We present DreamAvatar, a text-and-shape guided framework for generating high-quality 3D human avatars with controllable poses.

NeRF

SeSDF: Self-evolved Signed Distance Field for Implicit 3D Clothed Human Reconstruction

no code implementations CVPR 2023 Yukang Cao, Kai Han, Kwan-Yee K. Wong

We propose a flexible framework which, by leveraging the parametric SMPL-X model, can take an arbitrary number of input images to reconstruct a clothed human model under an uncalibrated setting.

S³-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint

no code implementations 17 Oct 2022 Wenqi Yang, GuanYing Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

Different from existing single-view methods which can only recover a 2.5D scene representation (i.e., a normal/depth map for the visible surface), our method learns a neural reflectance field to represent the 3D geometry and BRDFs of a scene.

3D geometry NeRF +1

PS-NeRF: Neural Inverse Rendering for Multi-view Photometric Stereo

no code implementations 23 Jul 2022 Wenqi Yang, GuanYing Chen, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

It then jointly optimizes the surface normals, spatially-varying BRDFs, and lights based on a shadow-aware differentiable rendering layer.

Inverse Rendering NeRF +1
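
Jointly optimizing normals, spatially-varying BRDFs, and lights against a differentiable rendering loss, as described in the abstract above, follows the generic pattern sketched here. The Lambertian-only renderer and per-pixel parameterization are deliberate simplifications, not PS-NeRF's shadow-aware rendering layer.

```python
import torch
import torch.nn.functional as F

# Toy inverse rendering: fit per-pixel normals, albedo, and one light direction
# to observed intensities under a Lambertian model I = albedo * max(n . l, 0).
obs = torch.rand(1000)                                   # observed pixel intensities
normals = torch.randn(1000, 3, requires_grad=True)
albedo = torch.rand(1000, requires_grad=True)
light = torch.randn(3, requires_grad=True)
opt = torch.optim.Adam([normals, albedo, light], lr=1e-2)

for _ in range(200):
    n = F.normalize(normals, dim=-1)
    l = F.normalize(light, dim=-1)
    render = albedo * torch.clamp((n * l).sum(-1), min=0)  # differentiable shading
    loss = F.mse_loss(render, obs)
    opt.zero_grad()
    loss.backward()
    opt.step()
```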

JIFF: Jointly-aligned Implicit Face Function for High Quality Single View Clothed Human Reconstruction

no code implementations CVPR 2022 Yukang Cao, GuanYing Chen, Kai Han, Wenqi Yang, Kwan-Yee K. Wong

In this paper, we focus on improving the quality of face in the reconstruction and propose a novel Jointly-aligned Implicit Face Function (JIFF) that combines the merits of the implicit function based approach and model based approach.

3D Human Reconstruction Face Model +1

A Unified Framework for Masked and Mask-Free Face Recognition via Feature Rectification

1 code implementation 15 Feb 2022 Shaozhe Hao, Chaofeng Chen, Zhenfang Chen, Kwan-Yee K. Wong

We introduce rectification blocks to rectify features extracted by a state-of-the-art recognition model, in both spatial and channel dimensions, to minimize the distance between a masked face and its mask-free counterpart in the rectified feature space.

Face Recognition
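
A minimal, hypothetical rectification block in the spirit of the abstract above: features are re-weighted along the channel dimension and then spatially, so that a separate loss could pull masked-face features toward their mask-free counterparts. Layer sizes are arbitrary and do not mirror the paper's design.

```python
import torch
import torch.nn as nn

class RectifyBlock(nn.Module):
    """Re-weights features channel-wise and spatially (toy version)."""
    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, feat):
        feat = feat * self.channel_gate(feat)   # channel rectification
        feat = feat * self.spatial_gate(feat)   # spatial rectification
        return feat

masked_feat = torch.randn(2, 64, 28, 28)        # features of a masked face
rectified = RectifyBlock(64)(masked_feat)       # to be matched against mask-free features
```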

Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

1 code implementation CVPR 2022 Zongsheng Yue, Qian Zhao, Jianwen Xie, Lei Zhang, Deyu Meng, Kwan-Yee K. Wong

To address the above issues, this paper proposes a model-based blind SISR method under the probabilistic framework, which elaborately models image degradation from the perspectives of noise and blur kernel.

Image Super-Resolution
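
The degradation modeled "from the perspectives of noise and blur kernel" is the standard blind-SR observation model y = (x ⊛ k)↓s + n. The snippet below simulates it; the kernel, scale factor, and noise level are chosen only for illustration, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def degrade(hr, kernel, scale=4, noise_std=0.02):
    """y = (x conv k), downsampled by `scale`, plus Gaussian noise."""
    c = hr.shape[1]
    k = kernel.repeat(c, 1, 1, 1)                                # same kernel per channel
    blurred = F.conv2d(hr, k, padding=kernel.shape[-1] // 2, groups=c)
    lr = blurred[..., ::scale, ::scale]                          # simple decimation
    return lr + noise_std * torch.randn_like(lr)

kernel = torch.ones(5, 5) / 25                                   # toy box-blur kernel
lr = degrade(torch.rand(1, 3, 128, 128), kernel)
print(lr.shape)  # torch.Size([1, 3, 32, 32])
```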

Dense Reconstruction of Transparent Objects by Altering Incident Light Paths Through Refraction

no code implementations 20 May 2021 Kai Han, Kwan-Yee K. Wong, Miaomiao Liu

We present a simple setup that allows us to alter the incident light paths before light rays enter the object by immersing the object partially in a liquid, and develop a method for recovering the object surface through reconstructing and triangulating such incident light paths.

Object Surface Reconstruction +1

HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset

1 code implementation ICCV 2021 GuanYing Chen, Chaofeng Chen, Shi Guo, Zhetong Liang, Kwan-Yee K. Wong, Lei Zhang

Secondly, we conduct more sophisticated alignment and temporal fusion in the feature space of the coarse HDR video to produce better reconstruction.

HDR Reconstruction Optical Flow Estimation +1

Fixed Viewpoint Mirror Surface Reconstruction under an Uncalibrated Camera

1 code implementation 23 Jan 2021 Kai Han, Miaomiao Liu, Dirk Schnieders, Kwan-Yee K. Wong

This paper addresses the problem of mirror surface reconstruction, and proposes a solution based on observing the reflections of a moving reference plane on the mirror surface.

Surface Reconstruction

Learning Spatial Attention for Face Super-Resolution

1 code implementation 2 Dec 2020 Chaofeng Chen, Dihong Gong, Hao Wang, Zhifeng Li, Kwan-Yee K. Wong

Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low resolution faces (e.g., 16×16).

Face Parsing Image Super-Resolution +2

Face Sketch Synthesis with Style Transfer using Pyramid Column Feature

1 code implementation 18 Sep 2020 Chaofeng Chen, Xiao Tan, Kwan-Yee K. Wong

We utilize a fully convolutional neural network (FCNN) to create the content image, and propose a style transfer approach to introduce textures and shadings based on a newly proposed pyramid column feature.

Face Sketch Synthesis Style Transfer

Progressive Semantic-Aware Style Transformation for Blind Face Restoration

1 code implementation CVPR 2021 Chaofeng Chen, Xiaoming Li, Lingbo Yang, Xianhui Lin, Lei Zhang, Kwan-Yee K. Wong

Compared with previous networks, the proposed PSFR-GAN makes full use of the semantic (parsing maps) and pixel (LQ images) space information from different scales of input pairs.

Blind Face Restoration Face Parsing +2

Deep Photometric Stereo for Non-Lambertian Surfaces

1 code implementation 26 Jul 2020 Guan-Ying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong

To deal with the uncalibrated scenario where light directions are unknown, we introduce a new convolutional network, named LCNet, to estimate light directions from input images.
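A hedged sketch of what a light-calibration network along the lines of LCNet might look like in outline: a small CNN that maps an image (plus an object mask) to a unit light direction and a scalar intensity. The layer choices and inputs are invented for this example and are not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyLightNet(nn.Module):
    """Predicts a unit light direction and an intensity from a single image."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, 4)                  # 3 for direction, 1 for intensity

    def forward(self, image, mask):
        out = self.head(self.features(torch.cat([image, mask], dim=1)))
        direction = F.normalize(out[:, :3], dim=-1)   # unit light direction
        intensity = F.softplus(out[:, 3:])            # positive light intensity
        return direction, intensity

direction, intensity = ToyLightNet()(torch.rand(2, 3, 128, 128), torch.ones(2, 1, 128, 128))
```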

Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

no code implementations 25 Jan 2020 Zhenfang Chen, Lin Ma, Wenhan Luo, Peng Tang, Kwan-Yee K. Wong

In this paper, we study the problem of weakly-supervised temporal grounding of sentence in video.

Sentence

Learning Transparent Object Matting

1 code implementation 25 Jul 2019 Guan-Ying Chen, Kai Han, Kwan-Yee K. Wong

In this paper, we formulate transparent object matting as a refractive flow estimation problem, and propose a deep learning framework, called TOM-Net, for learning the refractive flow.

Decoder Image Matting +2
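
"Refractive flow" here means a per-pixel field describing where each foreground pixel samples the background after refraction. The sketch below composites an image with such a flow via grid sampling; the flow, mask, and attenuation are random stand-ins rather than TOM-Net outputs.

```python
import torch
import torch.nn.functional as F

def composite_with_refractive_flow(background, flow, mask, attenuation):
    """Warp the background by a refractive flow and blend it inside the object mask.

    background: (1, 3, H, W); flow: (1, H, W, 2) offsets in normalized [-1, 1] coords;
    mask / attenuation: (1, 1, H, W).
    """
    h, w = background.shape[-2:]
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing='ij')
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0) + flow   # base grid + refraction offsets
    warped = F.grid_sample(background, grid, align_corners=True)
    return mask * attenuation * warped + (1 - mask) * background

bg = torch.rand(1, 3, 64, 64)
out = composite_with_refractive_flow(bg, 0.05 * torch.randn(1, 64, 64, 2),
                                     torch.ones(1, 1, 64, 64), torch.full((1, 1, 64, 64), 0.9))
```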

Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video

1 code implementation ACL 2019 Zhenfang Chen, Lin Ma, Wenhan Luo, Kwan-Yee K. Wong

In this paper, we address a novel task, namely weakly-supervised spatio-temporally grounding natural sentence in video.

Diversity object-detection +2

Self-calibrating Deep Photometric Stereo Networks

1 code implementation CVPR 2019 Guan-Ying Chen, Kai Han, Boxin Shi, Yasuyuki Matsushita, Kwan-Yee K. Wong

This paper proposes an uncalibrated photometric stereo method for non-Lambertian scenes based on deep learning.

Deep Learning

SAFE: Scale Aware Feature Encoder for Scene Text Recognition

no code implementations 17 Jan 2019 Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong

We propose a novel scale aware feature encoder (SAFE) that is designed specifically for encoding characters with different scales.

Scene Text Recognition

Semi-Supervised Learning for Face Sketch Synthesis in the Wild

1 code implementation 12 Dec 2018 Chaofeng Chen, Wei Liu, Xiao Tan, Kwan-Yee K. Wong

Instead of supervising the network with ground truth sketches, we first perform patch matching in feature space between the input photo and photos in a small reference set of photo-sketch pairs.

Face Sketch Synthesis Patch Matching
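
The "patch matching in feature space" step can be pictured with the toy nearest-neighbour search below, which matches flattened feature patches of an input photo against patches from a reference photo. The feature maps and patch size are placeholders; in practice features from a pretrained CNN would be used.

```python
import torch
import torch.nn.functional as F

def match_patches(query_feat, ref_feat, patch=3):
    """Return, for every query patch, the index of the closest reference patch.

    query_feat / ref_feat: (C, H, W) feature maps; matching uses cosine similarity.
    """
    q = F.unfold(query_feat.unsqueeze(0), patch).squeeze(0).t()  # (Nq, C*patch*patch)
    r = F.unfold(ref_feat.unsqueeze(0), patch).squeeze(0).t()    # (Nr, C*patch*patch)
    sim = F.normalize(q, dim=1) @ F.normalize(r, dim=1).t()      # cosine similarity matrix
    return sim.argmax(dim=1)                                     # best reference patch per query patch

idx = match_patches(torch.randn(64, 32, 32), torch.randn(64, 32, 32))
```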

PS-FCN: A Flexible Learning Framework for Photometric Stereo

1 code implementation ECCV 2018 Guan-Ying Chen, Kai Han, Kwan-Yee K. Wong

This paper addresses the problem of photometric stereo for non-Lambertian surfaces.

Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition

no code implementations AAAI 2018 Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong

Unlike previous work which employed a global spatial transformer network to rectify the entire distorted text image, we take an approach of detecting and rectifying individual characters.

Decoder Scene Text Recognition

SCNet: Learning Semantic Correspondence

1 code implementation ICCV 2017 Kai Han, Rafael S. Rezende, Bumsub Ham, Kwan-Yee K. Wong, Minsu Cho, Cordelia Schmid, Jean Ponce

This paper addresses the problem of establishing semantic correspondences between images depicting different instances of the same object or scene category.

Semantic correspondence

Mirror Surface Reconstruction Under an Uncalibrated Camera

no code implementations CVPR 2016 Kai Han, Kwan-Yee K. Wong, Dirk Schnieders, Miaomiao Liu

Unlike previous approaches which require tedious work to calibrate the camera, our method can recover both the camera intrinsics and extrinsics together with the mirror surface from reflections of the reference plane under at least three unknown distinct poses.

Surface Reconstruction

A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects

no code implementations CVPR 2015 Kai Han, Kwan-Yee K. Wong, Miaomiao Liu

In this paper, we develop a fixed viewpoint approach for dense surface reconstruction of transparent objects based on refraction of light.

Object Surface Reconstruction +1
