1 code implementation • 18 Nov 2024 • Cheng-Yen Yang, Hsiang-Wei Huang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
The Segment Anything Model 2 (SAM 2) has demonstrated strong performance in object segmentation tasks but faces challenges in visual object tracking, particularly when managing crowded scenes with fast-moving or self-occluding objects.
Ranked #1 on Visual Object Tracking on GOT-10k
no code implementations • 19 Oct 2024 • Xuechen Guo, Wenhao Chai, Shi-Yan Li, Gaoang Wang
Multimodal Large Language Models (MLLMs) have recently garnered attention as a prominent research focus.
no code implementations • 11 Oct 2024 • Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang
Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects from the ego-centric video.
no code implementations • 5 Oct 2024 • Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu
Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods.
no code implementations • 4 Oct 2024 • Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning
AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2).
no code implementations • 20 Jul 2024 • Yunlong Lin, Tian Ye, Sixiang Chen, Zhenqi Fu, Yingying Wang, Wenhao Chai, Zhaohu Xing, Lei Zhu, Xinghao Ding
Existing low-light image enhancement (LIE) methods have achieved noteworthy success in solving synthetic distortions, yet they often fall short in practical applications.
no code implementations • 18 Jul 2024 • Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan, Zhongyu Jiang, Wenhao Chai, Hsiang-Wei Huang, Chih-Lung Lin, Jenq-Neng Hwang
Traditional methods for human localization and pose estimation (HPE), which mainly rely on RGB images as an input modality, confront substantial limitations in real-world applications due to privacy concerns.
no code implementations • 18 Jul 2024 • Sheng-Yao Kuan, Jen-Hao Cheng, Hsiang-Wei Huang, Wenhao Chai, Cheng-Yen Yang, Hugo Latapie, Gaowen Liu, Bing-Fei Wu, Jenq-Neng Hwang
In the domain of autonomous driving, the integration of multi-modal perception techniques based on data from diverse sensors has demonstrated substantial progress.
no code implementations • 17 Jun 2024 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang
We begin our exploration with a vanilla large language model, augmenting it with a vision encoder and an action codebase trained on our collected high-quality dataset STEVE-21K.
no code implementations • 7 Jun 2024 • Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang
The rendered scenes lack variety, resembling the training images, resulting in monotonous styles.
1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.
Ranked #4 on Question Answering on NExT-QA (Open-ended VideoQA)
no code implementations • 7 Apr 2024 • Hou-I Liu, Christine Wu, Jen-Hao Cheng, Wenhao Chai, Shian-Yun Wang, Gaowen Liu, Jenq-Neng Hwang, Hong-Han Shuai, Wen-Huang Cheng
Subsequently, we introduce the cross-modal residual distillation to transfer the 3D spatial cues.
no code implementations • 6 Apr 2024 • Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang
After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM.
no code implementations • 27 Mar 2024 • Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance.
no code implementations • 16 Mar 2024 • Hsiang-Wei Huang, Cheng-Yen Yang, Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang
In the field of multi-object tracking (MOT), traditional methods often rely on the Kalman Filter for motion prediction, leveraging its strengths in linear motion scenarios.
no code implementations • 13 Mar 2024 • Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang
To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which includes searching and exploring.
no code implementations • CVPR 2024 • Tian Ye, Sixiang Chen, Wenhao Chai, Zhaohu Xing, Jing Qin, Ge Lin, Lei Zhu
When adopting diffusion models for image restoration, the crucial challenge lies in how to preserve high-level image fidelity in the random diffusion process and generate accurate background structures and realistic texture details.
no code implementations • 8 Dec 2023 • Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang
Moreover, we employ GPT-2 as the frozen large language model.
no code implementations • 3 Dec 2023 • Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang
In this paper, we propose CityGen, a novel end-to-end framework for infinite, diverse and controllable 3D city layout generation. First, we propose an outpainting pipeline to extend the local layout to an infinite city layout.
no code implementations • 26 Nov 2023 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang
This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment.
no code implementations • 24 Nov 2023 • Zhongyu Jiang, Wenhao Chai, Lei LI, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang
In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and image-based 3D human pose estimation, in the same pipeline.
Ranked #70 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)
1 code implementation • 17 Nov 2023 • Zhuoran Zhou, Zhongyu Jiang, Wenhao Chai, Cheng-Yen Yang, Lei LI, Jenq-Neng Hwang
We further apply a guided diffusion model to domain adapt 3D adult pose to infant pose to supplement small datasets.
no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang
To filter web-scale multi-modality datasets appropriately, it is crucial to employ suitable filtering methods that boost performance and reduce training costs.
no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang
In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.
no code implementations • 19 Aug 2023 • Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
Our proposed model takes support images and labels as prompt guidance for a query image.
1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
1 code implementation • CVPR 2024 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.
Video-based Generative Performance Benchmarking (Consistency), Video-based Generative Performance Benchmarking (Contextual Understanding), +10 more
1 code implementation • 7 Jul 2023 • Zhongyu Jiang, Zhuoran Zhou, Lei LI, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang
Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods.
Ranked #10 on 3D Human Pose Estimation on 3DPW (PA-MPJPE metric)
no code implementations • 7 Jul 2023 • Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision.
no code implementations • 29 Jun 2023 • Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Estimating 3D human poses only from a 2D human pose sequence has been thoroughly explored in recent years.
1 code implementation • 15 May 2023 • Jingxia Jiang, Tian Ye, Jinbin Bai, Sixiang Chen, Wenhao Chai, Shi Jun, Yun Liu, ErKang Chen
In this work, we propose the Five A$^{+}$ Network (FA$^{+}$Net), a highly efficient and lightweight real-time underwater image enhancement network with only $\sim$9k parameters and $\sim$0.01s processing time.
1 code implementation • ICCV 2023 • Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang
We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures of poses in training.
3D Human Pose Estimation, 3D Human Pose Estimation in Limited Data, +4 more
no code implementations • 27 Mar 2023 • Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang
In this paper, we propose a novel blind inpainting method that automatically reconstructs visual contents within the corrupted regions without mask input as guidance.
no code implementations • 1 Mar 2023 • Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang
With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade.
1 code implementation • 14 Feb 2023 • Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.
1 code implementation • 23 Sep 2022 • Zhenting Qi, Ruike Zhu, Zheyu Fu, Wenhao Chai, Volodymyr Kindratenko
In this paper, we propose a simple but effective method that solves the task from a new perspective: we design the fight detection model as a composition of an action-aware feature extractor and an anomaly score generator.