Search Results for author: Chaoyang Wang

Found 53 papers, 15 papers with code

Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports

no code implementations 22 May 2025 Francesco Dalla Serra, Patrick Schrempf, Chaoyang Wang, Zaiqiao Meng, Fani Deligianni, Alison Q. O'Neil

Taking inspiration from 'Chain-of-Thought reasoning', we demonstrate that performance on the CXR VQA task can be improved by grounding the answer generator module with a radiology report predicted for the same CXR.

Answer Generation Question Answering +2

TMCIR: Token Merge Benefits Composed Image Retrieval

no code implementations 15 Apr 2025 Chaoyang Wang, Zeyu Zhang, Long Teng, Zijun Li, Shichao Kan

This mechanism dynamically balances visual and textual representations within the contrastive learning pipeline, optimizing the composed feature for retrieval.

Contrastive Learning cross-modal alignment +3

Towards Affordance-Aware Articulation Synthesis for Rigged Objects

no code implementations 21 Jan 2025 Yu-Chu Yu, Chieh Hubert Lin, Hsin-Ying Lee, Chaoyang Wang, Yu-Chiang Frank Wang, Ming-Hsuan Yang

However, articulating the rigs into realistic affordance-aware postures (e.g., following the context, respecting the physics and the personalities of the object) remains time-consuming and heavily relies on human labor from experienced artists.

Semantic correspondence

PrEditor3D: Fast and Precise 3D Shape Editing

no code implementations CVPR 2025 Ziya Erkoç, Can Gümeli, Chaoyang Wang, Matthias Nießner, Angela Dai, Peter Wonka, Hsin-Ying Lee, Peiye Zhuang

The edited 3D mesh aligns well with the prompts, and remains identical for regions that are not intended to be altered.

DELTA: Dense Efficient Long-range 3D Tracking for any video

no code implementations 31 Oct 2024 Tuan Duc Ngo, Peiye Zhuang, Chuang Gan, Evangelos Kalogerakis, Sergey Tulyakov, Hsin-Ying Lee, Chaoyang Wang

Our method provides a robust solution for applications requiring fine-grained, long-term motion tracking in 3D space.

Motion Estimation

Pixel-Aligned Multi-View Generation with Depth Guided Decoder

no code implementations 26 Aug 2024 Zhenggang Tang, Peiye Zhuang, Chaoyang Wang, Aliaksandr Siarohin, Yash Kant, Alexander Schwing, Sergey Tulyakov, Hsin-Ying Lee

During inference, we employ a rapid multi-view to 3D reconstruction approach, NeuS, to obtain coarse depth for the depth-truncated epipolar attention.

3D Reconstruction Decoder +1

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

no code implementations 17 Jul 2024 Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

Recently, new methods have demonstrated the ability to generate videos with controllable camera poses; these techniques leverage pre-trained U-Net-based diffusion models that explicitly disentangle spatial and temporal generation.

Video Generation

Oracle Bone Inscriptions Multi-modal Dataset

no code implementations 4 Jul 2024 Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.

Decipherment Denoising

Lightweight Predictive 3D Gaussian Splats

1 code implementation 27 Jun 2024 Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

Our key observation is that nearby points in the scene can share similar representations.

GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement

no code implementations 9 Jun 2024 Peiye Zhuang, Songfang Han, Chaoyang Wang, Aliaksandr Siarohin, Jiaxu Zou, Michael Vasilkovsky, Vladislav Shakhrai, Sergey Korolev, Sergey Tulyakov, Hsin-Ying Lee

Our method takes inspiration from large reconstruction models like LRM that use a transformer-based triplane generator and a Neural Radiance Field (NeRF) model trained on multi-view images.

3D Generation 3D Reconstruction +2

SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

1 code implementation 30 May 2024 Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

For image synthesis, we propose a finite perturbation approach to enhance the diversity of generated results without changing the semantic categories.

Diversity Image Generation +2

3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

no code implementations 28 May 2024 Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang

This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity.

3D geometry Disentanglement

DualVAE: Dual Disentangled Variational AutoEncoder for Recommendation

1 code implementation 10 Jan 2024 Zhiqiang Guo, GuoHui Li, Jianjun Li, Chaoyang Wang, Si Shi

To address this problem, we propose a Dual Disentangled Variational AutoEncoder (DualVAE) for collaborative recommendation, which combines disentangled representation learning with variational inference to facilitate the generation of implicit interaction data.

Collaborative Filtering Disentanglement +1

Towards Text-guided 3D Scene Composition

no code implementations CVPR 2024 Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes.

Text to 3D

LGMRec: Local and Global Graph Learning for Multimodal Recommendation

1 code implementation 27 Dec 2023 Zhiqiang Guo, Jianjun Li, GuoHui Li, Chaoyang Wang, Si Shi, Bin Ruan

Multimodal recommendation has gradually become the infrastructure of online media platforms, enabling them to provide personalized service to users through joint modeling of users' historical behaviors (e.g., purchases, clicks) and items' various modalities (e.g., visual and textual).

Graph Embedding Graph Learning +2

Virtual Pets: Animatable Animal Generation in 3D Scenes

no code implementations 21 Dec 2023 Yen-Chi Cheng, Chieh Hubert Lin, Chaoyang Wang, Yash Kant, Sergey Tulyakov, Alexander Schwing, LiangYan Gui, Hsin-Ying Lee

Toward unlocking the potential of generative models in immersive 4D experiences, we introduce Virtual Pet, a novel pipeline to model realistic and diverse motions for target animal species within a 3D environment.

NeRF

Controllable Chest X-Ray Report Generation from Longitudinal Representations

no code implementations 9 Oct 2023 Francesco Dalla Serra, Chaoyang Wang, Fani Deligianni, Jeffrey Dalton, Alison Q. O'Neil

Previous approaches to automated radiology reporting generally do not provide the prior study as input, precluding comparison which is required for clinical accuracy in some types of scans, and offer only unreliable methods of interpretability.

Anatomy Representation Learning +1

Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting

no code implementations 30 Aug 2023 Francesco Dalla Serra, Chaoyang Wang, Fani Deligianni, Jeffrey Dalton, Alison Q. O'Neil

Automated approaches to radiology reporting require the image to be encoded into a suitable token representation for input to the language model.

Image Captioning Language Modelling

AutoDecoding Latent 3D Diffusion Models

1 code implementation NeurIPS 2023 Evangelos Ntavelis, Aliaksandr Siarohin, Kyle Olszewski, Chaoyang Wang, Luc van Gool, Sergey Tulyakov

We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.

Reconstructing Animatable Categories from Videos

2 code implementations CVPR 2023 Gengshan Yang, Chaoyang Wang, N Dinesh Reddy, Deva Ramanan

Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging, which are difficult to scale to arbitrary categories.

3D Shape Reconstruction from Videos Dynamic Reconstruction +1

Flow supervision for Deformable NeRF

no code implementations CVPR 2023 Chaoyang Wang, Lachlan Ewen MacDonald, Laszlo A. Jeni, Simon Lucey

In this paper we present a new method for deformable NeRF that can directly use optical flow as supervision.

NeRF Novel View Synthesis +1

The role of noise in denoising models for anomaly detection in medical images

1 code implementation 19 Jan 2023 Antanas Kascenas, Pedro Sanchez, Patrick Schrempf, Chaoyang Wang, William Clackett, Shadia S. Mikhael, Jeremy P. Voisey, Keith Goatman, Alexander Weir, Nicolas Pugeault, Sotirios A. Tsaftaris, Alison Q. O'Neil

Denoising methods, for instance classical denoising autoencoders (DAEs) and more recently emerging diffusion models, are a promising approach; however, naive application of pixelwise noise leads to poor anomaly detection performance.

Denoising Unsupervised Anomaly Detection

MDGCF: Multi-Dependency Graph Collaborative Filtering with Neighborhood- and Homogeneous-level Dependencies

1 code implementation CIKM 2022 GuoHui Li, Zhiqiang Guo, Jianjun Li, Chaoyang Wang

Specifically, for neighborhood-level dependencies, we explicitly consider both popularity score and preference correlation by designing a joint neighborhood-level dependency weight, based on which we construct a neighborhood-level dependencies graph to capture higher-order interaction features.

Collaborative Filtering Graph Representation Learning +1

MBW: Multi-view Bootstrapping in the Wild

2 code implementations 4 Oct 2022 Mosam Dabhi, Chaoyang Wang, Tim Clifford, Laszlo Attila Jeni, Ian R. Fasel, Simon Lucey

Our Multi-view Bootstrapping in the Wild (MBW) approach demonstrates impressive results on standard human datasets, as well as tigers, cheetahs, fish, colobus monkeys, chimpanzees, and flamingos from videos captured casually in a zoo.

3D Reconstruction Semi-supervised 2D and 3D landmark labeling +1

Neural Prior for Trajectory Estimation

no code implementations CVPR 2022 Chaoyang Wang, Xueqian Li, Jhony Kaesemodel Pontes, Simon Lucey

Here, we propose a neural trajectory prior to capture continuous spatio-temporal information without the need for offline data.

Image Denoising Super-Resolution

High Fidelity 3D Reconstructions with Limited Physical Views

no code implementations 22 Oct 2021 Mosam Dabhi, Chaoyang Wang, Kunal Saluja, Laszlo Jeni, Ian Fasel, Simon Lucey

Multi-view triangulation is the gold standard for 3D reconstruction from 2D correspondences given known calibration and sufficient views.

3D Reconstruction Vocal Bursts Intensity Prediction

Neural Trajectory Fields for Dynamic Novel View Synthesis

no code implementations 12 May 2021 Chaoyang Wang, Ben Eckart, Simon Lucey, Orazio Gallo

Recent approaches to render photorealistic views from a limited set of photographs have pushed the boundaries of our interactions with pictures of static scenes.

NeRF Novel View Synthesis

PAUL: Procrustean Autoencoder for Unsupervised Lifting

no code implementations CVPR 2021 Chaoyang Wang, Simon Lucey

Recent success in casting Non-rigid Structure from Motion (NRSfM) as an unsupervised deep learning problem has raised fundamental questions about what novelty in the NRSfM prior deep learning could offer.

Deep Learning

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

1 code implementation NeurIPS 2020 Chen-Hsuan Lin, Chaoyang Wang, Simon Lucey

Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets.

3D Object Reconstruction From A Single Image 3D Reconstruction

A Light Heterogeneous Graph Collaborative Filtering Model using Textual Information

1 code implementation 4 Oct 2020 Chaoyang Wang, Zhiqiang Guo, GuoHui Li, Jianjun Li, Peng Pan, Ke Liu

Afterward, by performing a simplified RGCN-based node information propagation on the constructed heterogeneous graph, the embeddings of users and items can be adjusted with textual knowledge, which effectively alleviates the negative effects of data sparsity.

Collaborative Filtering Recommendation Systems +1

A Text-based Deep Reinforcement Learning Framework for Interactive Recommendation

1 code implementation 14 Apr 2020 Chaoyang Wang, Zhiqiang Guo, Jianjun Li, Peng Pan, Guo-Hui Li

IRSs usually face the large discrete action space problem, which makes most of the existing RL-based recommendation methods inefficient.

Deep Reinforcement Learning Interactive Recommendation +2

Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild

1 code implementation 27 Jan 2020 Chaoyang Wang, Chen-Hsuan Lin, Simon Lucey

The recovery of 3D shape and pose from 2D landmarks stemming from a large ensemble of images can be viewed as a non-rigid structure from motion (NRSfM) problem.

3D Reconstruction

Web Stereo Video Supervision for Depth Prediction from Dynamic Scenes

no code implementations 25 Apr 2019 Chaoyang Wang, Simon Lucey, Federico Perazzi, Oliver Wang

We present a fully data-driven method to compute depth from diverse monocular video sequences that contain large amounts of non-rigid objects, e.g., people.

Depth Estimation Depth Prediction

Deep Convolutional Compressed Sensing for LiDAR Depth Completion

no code implementations 23 Mar 2018 Nathaniel Chodosh, Chaoyang Wang, Simon Lucey

In this paper we consider the problem of estimating a dense depth map from a set of sparse LiDAR points.

compressed sensing Depth Completion

Learning Depth from Monocular Videos using Direct Methods

1 code implementation CVPR 2018 Chaoyang Wang, Jose Miguel Buenaposada, Rui Zhu, Simon Lucey

The ability to predict depth from a single image - using recent advances in CNNs - is of increasing interest to the vision community.

Depth And Camera Motion Visual Odometry

Semantic Photometric Bundle Adjustment on Natural Sequences

no code implementations 30 Nov 2017 Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, Simon Lucey

More recently, excellent results have been attained through the application of photometric bundle adjustment (PBA) methods -- which directly minimize the photometric error across frames.

Object Object Reconstruction

Object-Centric Photometric Bundle Adjustment with Deep Shape Prior

no code implementations 4 Nov 2017 Rui Zhu, Chaoyang Wang, Chen-Hsuan Lin, Ziyan Wang, Simon Lucey

Reconstructing 3D shapes from a sequence of images has long been a problem of interest in computer vision.

Object

Rethinking Reprojection: Closing the Loop for Pose-aware Shape Reconstruction from a Single Image

no code implementations 15 Jul 2017 Rui Zhu, Hamed Kiani Galoogahi, Chaoyang Wang, Simon Lucey

An emerging problem in computer vision is the reconstruction of 3D shape and pose of an object from a single image.

Deep-LK for Efficient Adaptive Object Tracking

no code implementations 19 May 2017 Chaoyang Wang, Hamed Kiani Galoogahi, Chen-Hsuan Lin, Simon Lucey

In this paper we present a new approach for efficient regression-based object tracking, which we refer to as Deep-LK.

Object Object Tracking +1

Object Proposal by Multi-Branch Hierarchical Segmentation

no code implementations CVPR 2015 Chaoyang Wang, Long Zhao, Shuang Liang, Liqing Zhang, Jinyuan Jia, Yichen Wei

Hierarchical segmentation-based object proposal methods have become an important step in the modern object detection paradigm.

Object object-detection +2

A non-uniform sampling strategy for physiological signals component analysis

no code implementations IEEE International Conference on Consumer Electronics (ICCE) 2013 Molin Jia, Chaoyang Wang, Kui-Ting Chen, Takaaki Baba

The conventional approach cannot meet the requirement of physiological signal analysis to extract the main component of the acquired signal.
