no code implementations • 11 Mar 2025 • Dongping Li, Tielong Cai, Tianci Tang, Wenhao Chai, Katherine Rose Driggs-Campbell, Gaoang Wang
Developing autonomous home robots controlled by natural language has long been a pursuit of humankind.
no code implementations • 22 Feb 2025 • Wenhao Hu, Wenhao Chai, Shengyu Hao, Xiaotong Cui, Xuexiang Wen, Jenq-Neng Hwang, Gaoang Wang
To address these challenges, we introduce CCGS, a method designed to achieve both view-consistent 2D segmentation and a compact 3D Gaussian segmentation field.
no code implementations • 27 Jan 2025 • Chengting Yu, Xiaochen Zhao, Lei Liu, Shu Yang, Gaoang Wang, Erping Li, Aili Wang
Spiking Neural Networks (SNNs) are emerging as a brain-inspired alternative to traditional Artificial Neural Networks (ANNs), prized for their potential energy efficiency on neuromorphic hardware.
no code implementations • 19 Jan 2025 • Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang
While advanced techniques such as radiance fields and 3D Gaussian Splatting achieve high rendering quality and impressive efficiency with dense view inputs, they suffer from significant geometric reconstruction errors when applied to sparse input views.
no code implementations • 28 Dec 2024 • Gaoang Wang, Hang Wu, Yang Liao, Zhen Chen, Qing Zhou, Wenxing Wang, Yifei Liu, Yilin Wang, Meijing Wu, Ruiqi Xiang, Yuntao Yu, Xi Zhou, Feng Zhu, Zhonghua Liu, Tingjun Hou
Biotoxins, mainly produced by venomous animals, plants and microorganisms, exhibit high physiological activity and unique effects such as blood-pressure lowering and analgesia.
no code implementations • 19 Oct 2024 • Xuechen Guo, Wenhao Chai, Shi-Yan Li, Gaoang Wang
Multimodal Large Language Models (MLLMs) have recently garnered attention as a prominent research focus.
1 code implementation • 15 Oct 2024 • Chengting Yu, Lei Liu, Gaoang Wang, Erping Li, Aili Wang
Recent insights have revealed that rate-coding is a primary form of information representation captured by surrogate-gradient-based Backpropagation Through Time (BPTT) in training deep Spiking Neural Networks (SNNs).
no code implementations • 11 Oct 2024 • Shengyu Hao, Wenhao Chai, Zhonghan Zhao, Meiqi Sun, Wendi Hu, Jieyang Zhou, Yixian Zhao, Qi Li, Yizhou Wang, Xi Li, Gaoang Wang
Addressing this issue, this paper introduces a novel zero-shot approach for the 3D reconstruction and tracking of all objects in egocentric video.
no code implementations • 22 Aug 2024 • Bozheng Li, Mushui Liu, Gaoang Wang, Yunlong Yu
In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both the spatial information and the sequential temporal dynamics into the feature embeddings.
no code implementations • 17 Jun 2024 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Ke Ma, Kewei Chen, Dongxu Guo, Tian Ye, Yanting Zhang, Hongwei Wang, Gaoang Wang
We begin our exploration with a vanilla large language model, augmenting it with a vision encoder and an action codebase trained on our collected high-quality dataset STEVE-21K.
1 code implementation • CVPR 2024 • Wenjie Wang, Yehao Lu, Guangcong Zheng, Shuigen Zhan, Xiaoqing Ye, Zichang Tan, Jingdong Wang, Gaoang Wang, Xi Li
Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain because it offers inherent advantages in reducing blind spots and expanding the perception range.
no code implementations • 7 Jun 2024 • Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang
The rendered scenes lack variety, resembling the training images, resulting in monotonous styles.
no code implementations • 31 May 2024 • Haolong Ma, Hui Li, Chunyang Cheng, Gaoang Wang, Xiaoning Song, XiaoJun Wu
However, in image fusion, current methods underestimate the potential of SSSM in capturing the global spatial information of both modalities.
1 code implementation • 29 Apr 2024 • Yichen Ouyang, Jianhao Yuan, Hao Zhao, Gaoang Wang, Bo Zhao
Generating long and consistent videos has emerged as a significant yet challenging problem.
1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.
Ranked #4 on Question Answering on NExT-QA (Open-ended VideoQA)
no code implementations • 6 Apr 2024 • Zhonghan Zhao, Ke Ma, Wenhao Chai, Xuan Wang, Kewei Chen, Dongxu Guo, Yanting Zhang, Hongwei Wang, Gaoang Wang
After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance, utilizing the performance and knowledge of a versatile MLM.
no code implementations • 27 Mar 2024 • Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance.
no code implementations • 13 Mar 2024 • Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang
To assess organizational behavior, we design a series of navigation tasks in the Minecraft environment, which include searching and exploring.
no code implementations • CVPR 2024 • Chenlu Zhan, Yu Lin, Gaoang Wang, Hongwei Wang, Jian Wu
Medical generative models, acknowledged for their ability to generate high-quality samples, have accelerated the growth of medical applications.
no code implementations • 18 Dec 2023 • Chenlu Zhan, Yufei Zhang, Yu Lin, Gaoang Wang, Hongwei Wang
Medical vision-language pre-training (Med-VLP) models have recently accelerated the growth of medical diagnostics applications.
no code implementations • 8 Dec 2023 • Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, Gaoang Wang
Moreover, we employ GPT-2 as the frozen large language model.
no code implementations • 3 Dec 2023 • Jie Deng, Wenhao Chai, Jianshu Guo, Qixuan Huang, Wenhao Hu, Jenq-Neng Hwang, Gaoang Wang
In this paper, we propose CityGen, a novel end-to-end framework for infinite, diverse and controllable 3D city layout generation. First, we propose an outpainting pipeline to extend the local layout to an infinite city layout.
no code implementations • 26 Nov 2023 • Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Gaoang Wang
This paper proposes STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment.
no code implementations • 17 Nov 2023 • Yizhou Wang, Jen-Hao Cheng, Jui-Te Huang, Sheng-Yao Kuan, Qiqian Fu, Chiming Ni, Shengyu Hao, Gaoang Wang, Guanbin Xing, Hui Liu, Jenq-Neng Hwang
This kind of radar format can enable machine learning models to generate more reliable object perception results after interacting and fusing the information or features between the camera and radar.
1 code implementation • 2 Nov 2023 • Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li, Gaoang Wang
With the development of multimodality and large language models, deep learning-based techniques for medical image captioning hold the potential to offer valuable diagnostic recommendations.
no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang
To filter multimodal datasets at web scale, it is crucial to employ suitable filtering methods that boost performance and reduce training costs.
1 code implementation • 16 Sep 2023 • Qiqian Fu, Guanhong Wang, Gaoang Wang
The key frame selector, Frame Selector, is built on a CNN architecture.
no code implementations • 7 Sep 2023 • Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang
In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization.
1 code implementation • ICCV 2023 • Longrong Yang, Xianpan Zhou, XueWei Li, Liang Qiao, Zheyang Li, Ziwei Yang, Gaoang Wang, Xi Li
Thus, the optimum of the distillation loss does not necessarily lead to the optimal student classification scores for dense object detectors.
no code implementations • 19 Aug 2023 • Meiqi Sun, Zhonghan Zhao, Wenhao Chai, Hanjun Luo, Shidong Cao, Yanting Zhang, Jenq-Neng Hwang, Gaoang Wang
Our proposed model takes support images and labels as prompt guidance for a query image.
1 code implementation • 18 Aug 2023 • Hanbing Liu, Jun-Yan He, Zhi-Qi Cheng, Wangmeng Xiang, Qize Yang, Wenhao Chai, Gaoang Wang, Xu Bao, Bin Luo, Yifeng Geng, Xuansong Xie
Typically, PoSynDA uses a diffusion-inspired structure to simulate 3D pose distribution in the target domain.
1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.
1 code implementation • CVPR 2024 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.
Multiple-choice • Video-based Generative Performance Benchmarking (Consistency)
no code implementations • 7 Jul 2023 • Zhonghan Zhao, Wenhao Chai, Shengyu Hao, Wenhao Hu, Guanhong Wang, Shidong Cao, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision-making.
no code implementations • 29 Jun 2023 • Zhenyu Zhang, Wenhao Chai, Zhongyu Jiang, Tian Ye, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Estimating 3D human poses only from a 2D human pose sequence has been thoroughly explored in recent years.
1 code implementation • CVPR 2023 • Wei Su, Peihan Miao, Huanzhang Dou, Gaoang Wang, Liang Qiao, Zheyang Li, Xi Li
The active perception can take expressions as priors to extract relevant visual features, which can effectively alleviate the mismatches.
1 code implementation • 6 Jun 2023 • XueWei Li, Tao Wu, Zhongang Qi, Gaoang Wang, Ying Shan, Xi Li
Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of our performance is improved by an order of magnitude.
Ranked #4 on Semantic Segmentation on Stanford2D3D Panoramic
1 code implementation • ICCV 2023 • Wenhao Chai, Zhongyu Jiang, Jenq-Neng Hwang, Gaoang Wang
We observe that the degradation is caused by two factors: 1) the large distribution gap over global positions of poses between the source and target datasets due to variant camera parameters and settings, and 2) the deficient diversity of local structures of poses in training.
3D Human Pose Estimation • 3D Human Pose Estimation in Limited Data
no code implementations • 28 Mar 2023 • Xiaoyue Li, Kai Shang, Gaoang Wang, Mark D. Butala
Reducing the radiation dose in computed tomography (CT) is important to mitigate radiation-induced risks.
no code implementations • 27 Mar 2023 • Xuechen Guo, Wenhao Hu, Chiming Ni, Wenhao Chai, Shiyan Li, Gaoang Wang
In this paper, we propose a novel blind inpainting method that automatically reconstructs visual contents within the corrupted regions without mask input as guidance.
no code implementations • 1 Mar 2023 • Wenhao Hu, Yingying Liu, Xuanyu Chen, Wenhao Chai, Hangyue Chen, Hongwei Wang, Gaoang Wang
With the development of computer-assisted techniques, research communities including biochemistry and deep learning have been devoted to the drug discovery field for over a decade.
2 code implementations • 15 Feb 2023 • Shenghao Hao, Peiyuan Liu, Yibing Zhan, Kaixun Jin, Zuozhu Liu, Mingli Song, Jenq-Neng Hwang, Gaoang Wang
Although cross-view multi-object tracking has received increased attention in recent years, existing datasets still have several issues, including 1) missing real-world scenarios, 2) lacking diverse scenes, 3) containing a limited number of tracks, 4) comprising only static cameras, and 5) lacking standard benchmarks, which hinder the investigation and comparison of cross-view tracking methods.
1 code implementation • 14 Feb 2023 • Shidong Cao, Wenhao Chai, Shengyu Hao, Yanting Zhang, Hangyue Chen, Gaoang Wang
We focus on a new fashion design task, where we aim to transfer a reference appearance image onto a clothing image while preserving the structure of the clothing image.
1 code implementation • 11 Oct 2022 • Chengting Yu, Zheming Gu, Da Li, Gaoang Wang, Aili Wang, Erping Li
We show that endowing synaptic models with temporal dependencies can improve the performance of SNNs on classification tasks.
Ranked #4 on Audio Classification on SHD
no code implementations • 7 Oct 2022 • Haozhe Chi, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang
Multimodal sentiment analysis (MSA) is an important way of observing mental activities with the help of data captured from multiple modalities.
1 code implementation • 24 Jul 2022 • Gaoang Wang, Yibing Zhan, Xinchao Wang, Mingli Song, Klara Nahrstedt
Anomaly detection aims at identifying deviant samples from the normal data distribution.
no code implementations • 22 May 2022 • Gaoang Wang, Mingli Song, Jenq-Neng Hwang
Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories.
no code implementations • 1 May 2022 • Yang Zhou, Zhanhao He, Keyu Lu, Guanhong Wang, Gaoang Wang
Video-based action recognition is one of the most popular topics in computer vision.
no code implementations • 27 Apr 2022 • Guanhong Wang, Keyu Lu, Yang Zhou, Zhanhao He, Gaoang Wang
Recently, much progress has been made in self-supervised action recognition.
no code implementations • 21 Apr 2022 • Peihan Miao, Wei Su, Gaoang Wang, XueWei Li, Xi Li
As an important and challenging problem in vision-language tasks, referring expression comprehension (REC) generally requires a large amount of multi-grained information of visual and linguistic modalities to realize accurate reasoning.
1 code implementation • 21 Apr 2022 • Chengting Yu, Yangkai Du, Mufeng Chen, Aili Wang, Gaoang Wang, Erping Li
For plasticity, we propose a trainable convolutional synapse that models spike response current to enhance the diversity of spiking neurons for temporal feature extraction.
Ranked #7 on Audio Classification on SHD
no code implementations • 31 Dec 2021 • Xiaoqian Ruan, Gaoang Wang
However, the inconsistency and bias among different annotators are harmful to model training, especially for qualitative and subjective tasks. To address this challenge, in this paper, we propose a novel contrastive regression framework for the disjoint annotations problem, where each sample is labeled by only one annotator and multiple annotators work on disjoint subsets of the data.
no code implementations • 6 Oct 2021 • Xinkai Yuan, Zilinghan Li, Gaoang Wang
With human-in-the-loop, active learning can iteratively select informative unlabeled samples for labeling and training to improve the performance in the SSL framework.
1 code implementation • ICCV 2021 • Gaoang Wang, Renshu Gu, Zuozhu Liu, Weijie Hu, Mingli Song, Jenq-Neng Hwang
In this paper, we explore the significance of motion patterns for vehicle tracking without appearance information.
no code implementations • 11 May 2021 • Yizhou Wang, Gaoang Wang, Hung-Min Hsu, Hui Liu, Jenq-Neng Hwang
Radar has long been a common sensor on autonomous vehicles for obstacle ranging and speed estimation.
no code implementations • 6 May 2021 • Gaoang Wang, Yizhou Wang, Renshu Gu, Weijie Hu, Jenq-Neng Hwang
To address such common challenges in most of the existing trackers, in this paper, a tracklet booster algorithm is proposed, which can be built upon any other tracker.
no code implementations • 26 Mar 2021 • Zhongjie Yu, Gaoang Wang, Lin Chen, Sebastian Raschka, Jiebo Luo
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
no code implementations • 14 Jan 2021 • Gaoang Wang, Lin Chen, Tianqiang Liu, Mingwei He, Jiebo Luo
To solve the first issue of identity overlapping, we propose a dataset-aware loss for multi-dataset training by reducing the penalty when the same person appears in multiple datasets.
no code implementations • 31 Oct 2020 • Renshu Gu, Gaoang Wang, Jenq-Neng Hwang
Videos that contain multiple potentially occluded people captured from freely moving monocular cameras are very common in real-world scenarios, while 3D HPE for such scenarios is quite challenging, partially because there is a lack of such data with accurate 3D ground truth labels in existing datasets.
no code implementations • 18 Oct 2019 • Haotian Zhang, Gaoang Wang, Zhichao Lei, Jenq-Neng Hwang
Drones, or UAVs in general, equipped with a single camera have been widely deployed in a broad range of applications, such as aerial photography, fast goods delivery, and, most importantly, surveillance.
1 code implementation • 18 Nov 2018 • Gaoang Wang, Yizhou Wang, Haotian Zhang, Renshu Gu, Jenq-Neng Hwang
Multi-object tracking (MOT) is an important and practical task related to both surveillance systems and moving camera applications, such as autonomous driving and robotic vision.
Ranked #20 on Multi-Object Tracking on MOT16
no code implementations • 22 Aug 2017 • Zheng Tang, Gaoang Wang, Tao Liu, Young-Gun Lee, Adwin Jahn, Xu Liu, Xiaodong He, Jenq-Neng Hwang
In this challenge, we propose a model-based vehicle localization method, which builds a kernel at each patch of the 3D deformable vehicle model and associates them with constraints in 3D space.