Search Results for author: Wenhao Chai

Found 37 papers, 11 papers with code

SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

1 code implementation • 18 Nov 2024 • Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects.

Visual Object Tracking Visual Tracking

Ego3DT: Tracking Every 3D Object in Ego-centric Videos

no code implementations • 11 Oct 2024 • Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang

Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects from the ego-centric video.

3D Reconstruction 3D Scene Reconstruction

PAD: Personalized Alignment of LLMs at Decoding-Time

no code implementations • 5 Oct 2024 • Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods.

Text Generation

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

no code implementations • 4 Oct 2024 • Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning

AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2).

Image Captioning Video Understanding

AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement

no code implementations • 20 Jul 2024 • Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding

Existing low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications.

Attribute Low-Light Image Enhancement

RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark

no code implementations • 18 Jul 2024 • Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang

Traditional methods for human localization and pose estimation (HPE), which mainly rely on RGB images as an input modality, confront substantial limitations in real-world applications due to privacy concerns.

3D Human Pose Estimation Benchmarking +2

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check

no code implementations • 18 Jul 2024 • Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang

In the domain of autonomous driving, the integration of multi-modal perception techniques based on data from diverse sensors has demonstrated substantial progress.

3D Multi-Object Tracking Autonomous Driving +1

STEVE Series: Step-by-Step Construction of Agent Systems in Minecraft

no code implementations • 17 Jun 2024 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang

We begin our exploration with a vanilla large language model, augmenting it with a vision encoder and an action codebase trained on our collected high-quality dataset STEVE-21K.

Knowledge Distillation Language Modelling +2

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.

2k Question Answering +2

Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model

no code implementations • 6 Apr 2024 • Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang

After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM.

Knowledge Distillation

VersaT2I: Improving Text-to-Image Models with Versatile Reward

no code implementations • 27 Mar 2024 • Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance.

Exploring Learning-based Motion Models in Multi-Object Tracking

no code implementations • 16 Mar 2024 • Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang

In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear motion scenarios.

motion prediction Multi-Object Tracking

Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

no code implementations • 13 Mar 2024 • Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang

To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which includes searching and exploring.

Minecraft Navigate

Learning Diffusion Texture Priors for Image Restoration

no code implementations • CVPR 2024 • Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu

When adopting diffusion models for image restoration, the crucial challenge lies in how to preserve high-level image fidelity in the randomness diffusion process and generate accurate background structures and realistic texture details.

Image Generation Image Restoration

CityGen: Infinite and Controllable 3D City Layout Generation

no code implementations • 3 Dec 2023 • Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang

In this paper, we propose CityGen, a novel end-to-end framework for infinite, diverse and controllable 3D city layout generation. First, we propose an outpainting pipeline to extend the local layout to an infinite city layout.

Diversity

See and Think: Embodied Agent in Virtual Environment

no code implementations • 26 Nov 2023 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang

This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment.

Minecraft Question Answering +1

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

no code implementations • 24 Nov 2023 • Zhongyu Jiang, Wenhao Chai, Lei LI, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang

In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and image-based 3D human pose estimation, in the same pipeline.

Ranked #70 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)

2D Human Pose Estimation 3D Human Pose Estimation +3

Devil in the Number: Towards Robust Multi-modality Data Filter

no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

In order to appropriately filter multi-modality data sets on a web-scale, it becomes crucial to employ suitable filtering methods to boost performance and reduce training costs.

Chasing Consistency in Text-to-3D Generation from a Single Image

no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.

3D Generation Text to 3D

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.

Video Editing

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

1 code implementation • 7 Jul 2023 • Zhongyu Jiang, Zhuoran Zhou, Lei LI, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang

Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods.

Ranked #10 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)

3D Human Pose Estimation Image to 3D

A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

no code implementations • 7 Jul 2023 • Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang

Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision.

Deep Learning Survey

Five A$^{+}$ Network: You Only Need 9K Parameters for Underwater Image Enhancement

1 code implementation • 15 May 2023 • Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, ErKang Chen

In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time.

Computational Efficiency Image Enhancement

Global Adaptation meets Local Generalization: Unsupervised Domain Adaptation for 3D Human Pose Estimation

1 code implementation • ICCV 2023 • Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang

We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures of poses in training.

3D Human Pose Estimation 3D Human Pose Estimation in Limited Data +4

Blind Inpainting with Object-aware Discrimination for Artificial Marker Removal

no code implementations • 27 Mar 2023 • Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang

In this paper, we propose a novel blind inpainting method that automatically reconstructs visual contents within the corrupted regions without mask input as guidance.

Object

Deep Learning Methods for Small Molecule Drug Discovery: A Survey

no code implementations • 1 Mar 2023 • Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang

With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade.

Deep Learning Drug Discovery +3

DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models

1 code implementation • 14 Feb 2023 • Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang

We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.

Denoising Style Transfer

Weakly Supervised Two-Stage Training Scheme for Deep Video Fight Detection Model

1 code implementation • 23 Sep 2022 • Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, Volodymyr Kindratenko

In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.

Action Recognition Anomaly Detection
