no code implementations • 20 Oct 2024 • Bohao Liao, Wei Zhai, Zengyu Wan, Tianzhu Zhang, Yang Cao, Zheng-Jun Zha
First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream.
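To make the event-supervision idea concrete: events are conventionally modeled as threshold crossings of log-intensity, so a pair of rendered views can be penalized against the polarities accumulated from the event stream. The sketch below is a minimal illustration of that standard event generation model; the function name, tensor shapes, and threshold value are assumptions, not the paper's code.

```python
import torch

def event_supervision_loss(rendered_prev, rendered_curr, event_accum,
                           threshold=0.2, eps=1e-6):
    """Supervise rendered views with an event stream via the standard
    event generation model: an event fires when the log-intensity change
    exceeds a contrast threshold C, so the accumulated polarity sum
    approximates (log I_t - log I_{t-1}) / C.

    rendered_prev, rendered_curr: rendered intensity images, shape (H, W)
    event_accum: per-pixel sum of event polarities over the interval, (H, W)
    """
    # Predicted log-intensity change between the two rendered views.
    pred_log_diff = torch.log(rendered_curr + eps) - torch.log(rendered_prev + eps)
    # Log-intensity change implied by the events (polarity sum times threshold).
    event_log_diff = event_accum * threshold
    return torch.mean((pred_log_diff - event_log_diff) ** 2)
```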
1 code implementation • 15 Oct 2024 • Hongchen Luo, Wei Zhai, Jiao Wang, Yang Cao, Zheng-Jun Zha
Perceiving potential "action possibilities" (i.e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.
1 code implementation • 14 Oct 2024 • Wei Zhai, Nan Bai, Qing Zhao, Jianqiang Li, Fan Wang, Hongzhi Qi, Meng Jiang, Xiaoqin Wang, Bing Xiang Yang, Guanghui Fu
The proposed models were evaluated on three downstream tasks and achieved performance better than or comparable to deep learning models, generalized LLMs, and task fine-tuned LLMs.
no code implementations • 14 Oct 2024 • Jian Yang, Dacheng Yin, Yizhou Zhou, Fengyun Rao, Wei Zhai, Yang Cao, Zheng-Jun Zha
However, we have identified that recent methods inevitably suffer from a loss of image information during understanding tasks, due to either image discretization or diffusion denoising steps.
Ranked #173 on Visual Question Answering on MM-Vet
no code implementations • 30 Sep 2024 • Huilin Deng, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
Furthermore, we introduce the Real Industrial Anomaly Detection (RIAD), a comprehensive IAD dataset with detailed anomaly descriptions and analyses, offering a valuable resource for MLLM-based IAD development.
no code implementations • 29 Sep 2024 • Cuiyu Liu, Wei Zhai, Yuhang Yang, Hongchen Luo, Sen Liang, Yang Cao, Zheng-Jun Zha
To empower the model with such abilities, we introduce a novel task: grounding 3D scene affordance from egocentric interactions, where the goal is to identify the corresponding affordance regions in a 3D scene based on an egocentric video of an interaction.
no code implementations • 31 Jul 2024 • Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation.
no code implementations • 23 Jul 2024 • Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong
In this paper, we propose CrysToGraph (Crystals with Transformers on Graphs), a novel transformer-based geometric graph network designed specifically for unconventional crystalline systems, and UnconvBench, a comprehensive benchmark to evaluate models' predictive performance on unconventional crystal materials such as defected crystals, low-dimensional crystals, and MOFs.
no code implementations • 22 May 2024 • Yuhang Yang, Wei Zhai, Chengfeng Wang, Chengjun Yu, Yang Cao, Zheng-Jun Zha
For egocentric HOI, in addition to perceiving semantics, e.g., "what" interaction is occurring, capturing "where" the interaction specifically manifests in 3D space is also crucial, as this links perception and operation.
1 code implementation • 20 May 2024 • Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
no code implementations • 9 May 2024 • Zichen Zhang, Hongchen Luo, Wei Zhai, Yang Cao, Yu Kang
Building upon this relationship, we establish a novel Bidirectional prOgressive Transformer (BOT), which introduces a Bidirectional Progressive mechanism into the anticipation of interaction intention.
1 code implementation • 19 Apr 2024 • Hongzhi Qi, Hanfei Liu, Jianqiang Li, Qing Zhao, Wei Zhai, Dan Luo, Tian Yu He, Shuo Liu, Bing Xiang Yang, Guanghui Fu
Seven pre-trained models were evaluated on two tasks: high versus low suicide risk classification, and fine-grained suicide risk classification on a scale of 0 to 10.
no code implementations • 18 Apr 2024 • Zhong Wang, Zengyu Wan, Han Han, Bohao Liao, Yuliang Wu, Wei Zhai, Yang Cao, Zheng-Jun Zha
Event-based eye tracking has shown great promise with the high temporal resolution and low redundancy provided by the event camera.
2 code implementations • 17 Apr 2024 • Zuowen Wang, Chang Gao, Zongwei Wu, Marcos V. Conde, Radu Timofte, Shih-Chii Liu, Qinyu Chen, Zheng-Jun Zha, Wei Zhai, Han Han, Bohao Liao, Yuliang Wu, Zengyu Wan, Zhong Wang, Yang Cao, Ganchao Tan, Jinze Chen, Yan Ru Pei, Sasskia Brüers, Sébastien Crouzet, Douglas McLelland, Oliver Coenen, Baoheng Zhang, Yizhao Gao, Jingyuan Li, Hayden Kwok-Hay So, Philippe Bich, Chiara Boretti, Luciano Prono, Mircea Lică, David Dinucu-Jianu, Cătălin Grîu, Xiaopeng Lin, Hongwei Ren, Bojun Cheng, Xinan Zhang, Valentin Vial, Anthony Yezzi, James Tsai
This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge.
1 code implementation • 17 Apr 2024 • Meng Jiang, Yi Jing Yu, Qing Zhao, Jianqiang Li, Changwei Song, Hongzhi Qi, Wei Zhai, Dan Luo, Xiaoqin Wang, Guanghui Fu, Bing Xiang Yang
Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care.
3 code implementations • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi
In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.
no code implementations • 14 Mar 2024 • Yuliang Wu, Ganchao Tan, Jinze Chen, Wei Zhai, Yang Cao, Zheng-Jun Zha
In this paper, we propose AsynHDR, a Pixel-Asynchronous HDR imaging system, based on key insights into the challenges in HDR imaging and the unique event-generating mechanism of Dynamic Vision Sensors (DVS).
no code implementations • 14 Mar 2024 • Hongchen Luo, Kai Zhu, Wei Zhai, Yang Cao
Finally, the inferred human movement and high-level action descriptions jointly guide the generation of exocentric motion and interaction content (i.e., corresponding optical flow and occlusion maps) in the backward process of the diffusion model, ultimately warping them into the corresponding exocentric video.
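The warping step mentioned above is typically implemented as backward warping with a dense flow field via bilinear grid sampling, with the occlusion map then masking invalid regions. A minimal sketch follows; the function name and shapes are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(image, flow):
    """Backward-warp an image with a dense optical flow field using
    bilinear grid sampling.

    image: (B, C, H, W) tensor
    flow:  (B, 2, H, W) tensor of (dx, dy) displacements in pixels
    """
    b, _, h, w = image.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(image.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                             # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)       # (B, H, W, 2)
    return F.grid_sample(image, sample_grid, align_corners=True)
```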
1 code implementation • 14 Feb 2024 • Wei Zhai, Hongzhi Qi, Qing Zhao, Jianqiang Li, Ziqi Wang, Han Wang, Bing Xiang Yang, Guanghui Fu
There is a recognized need for models capable of efficient analysis.
no code implementations • CVPR 2024 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Zheng-Jun Zha
These methods underexploit certain correlations between the interaction counterparts (human and object) and struggle to address the uncertainty in interactions.
1 code implementation • 4 Dec 2023 • Fan Lu, Kai Zhu, Kecheng Zheng, Wei Zhai, Yang Cao
Full-spectrum out-of-distribution (F-OOD) detection aims to accurately recognize in-distribution (ID) samples while encountering semantic and covariate shifts simultaneously.
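For context on what an OOD detector computes, here is a common energy-score baseline (Liu et al., 2020), shown purely as a hedged illustration of the task setup rather than this paper's method; the threshold-selection note is an assumption about typical practice.

```python
import torch

def energy_score(logits, temperature=1.0):
    """Energy-based OOD score: lower energy indicates a more
    in-distribution sample. logits: (B, num_classes)."""
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

def detect_ood(logits, threshold):
    """Flag samples whose energy exceeds a threshold, typically chosen
    on a validation split (e.g., at 95% ID true-positive rate)."""
    return energy_score(logits) > threshold
```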
2 code implementations • 22 Sep 2023 • Wei Zhai, Pingyu Wu, Kai Zhu, Yang Cao, Feng Wu, Zheng-Jun Zha
In addition, our method also achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets.
2 code implementations • 7 Sep 2023 • Hongzhi Qi, Qing Zhao, Jianqiang Li, Changwei Song, Wei Zhai, Dan Luo, Shuo Liu, Yi Jing Yu, Fan Wang, Huijing Zou, Bing Xiang Yang, Guanghui Fu
We also evaluated the performance of the LLMs after fine-tuning on the proposed tasks.
1 code implementation • 29 Aug 2023 • Guanghui Fu, Qing Zhao, Jianqiang Li, Dan Luo, Changwei Song, Wei Zhai, Shuo Liu, Fan Wang, Yan Wang, Lijuan Cheng, Juan Zhang, Bing Xiang Yang
In the contemporary landscape of social media, an alarming number of users express negative emotions, some of which manifest as strong suicidal intentions.
1 code implementation • ICCV 2023 • Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Specifically, a spatial token is first introduced in the input space to aggregate representations for the localization task.
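As a rough sketch of the idea, a learnable token can be prepended to the patch-token sequence of a vision transformer, analogous to the class token, and its output used as an aggregated representation for localization. The module below is a toy illustration under assumed dimensions and names, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatialTokenViT(nn.Module):
    """Toy transformer encoder with a learnable spatial token prepended
    to the patch tokens; the token's output serves as an aggregated
    representation for the localization task."""
    def __init__(self, dim=256, depth=4, heads=8, num_patches=196):
        super().__init__()
        self.spatial_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, dim)
        b = patch_tokens.size(0)
        tok = self.spatial_token.expand(b, -1, -1)
        x = torch.cat([tok, patch_tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        # Return the spatial token's output and the per-patch outputs.
        return x[:, 0], x[:, 1:]
```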
1 code implementation • CVPR 2023 • Fan Lu, Kai Zhu, Wei Zhai, Kecheng Zheng, Yang Cao
Semantically coherent out-of-distribution (SCOOD) detection aims to discern outliers from the intended data distribution with access to an unlabeled extra set.
1 code implementation • ICCV 2023 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.
1 code implementation • CVPR 2023 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
Perceiving potential "action possibilities" (i. e., affordance) regions of images and learning interactive functionalities of objects from human demonstration is a challenging task due to the diversity of human-object interactions.
2 code implementations • 28 Aug 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
Due to the diversity of interactive affordance, the uniqueness of different individuals leads to diverse interactions, which makes it difficult to establish an explicit link between object parts and affordance labels.
1 code implementation • 18 Mar 2022 • Yangyang Li, Wei Zhai, Yang Cao, Zheng-Jun Zha
However, these methods struggle to 1) efficiently generate camouflage images using foreground and background with arbitrary structure, and 2) camouflage foreground objects in regions with multiple appearances (e.g., the junction of vegetation and mountains), which limits their practical application.
2 code implementations • CVPR 2022 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
To empower an agent with such ability, this paper proposes a task of affordance grounding from exocentric view, i.e., given exocentric human-object interaction and egocentric object images, learning the affordance knowledge of the object and transferring it to the egocentric image using only the affordance label as supervision.
2 code implementations • CVPR 2022 • Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Non-exemplar class-incremental learning aims to recognize both old and new classes when samples of the old classes cannot be saved.
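To illustrate the non-exemplar constraint, a common generic strategy (not necessarily this paper's) is to store only one mean feature per class instead of raw samples and classify by nearest prototype. A minimal sketch under that assumption:

```python
import torch

class PrototypeClassifier:
    """Nearest-prototype classification for non-exemplar class-incremental
    learning: keep a single mean feature per class (no raw samples), and
    classify queries by cosine similarity to the stored prototypes."""
    def __init__(self):
        self.prototypes = {}  # class_id -> (D,) mean feature

    def add_class(self, class_id, feats):
        """feats: (N, D) features of the new class's training samples."""
        self.prototypes[class_id] = feats.mean(dim=0)

    def predict(self, feat):
        """feat: (D,) query feature; returns the best-matching class id."""
        ids = list(self.prototypes)
        protos = torch.stack([self.prototypes[i] for i in ids])  # (C, D)
        sims = torch.nn.functional.cosine_similarity(protos, feat.unsqueeze(0))
        return ids[int(sims.argmax())]
```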
4 code implementations • 24 Feb 2022 • Liangsheng Lu, Wei Zhai, Hongchen Luo, Yu Kang, Yang Cao
In this paper, we explore perceiving affordance from a vision-language perspective and consider the challenging phrase-based affordance detection problem, i.e., given a set of phrases describing the action purposes, all the object regions in a scene with the same affordance should be detected.
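In the spirit of such vision-language grounding (and not as this paper's architecture), phrase-conditioned detection can be sketched as scoring every pixel feature against a text embedding of the phrase, e.g., one produced by a CLIP-style text encoder; all names and shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def phrase_affordance_heatmap(pixel_feat, phrase_emb):
    """Score every pixel against a phrase embedding to obtain an
    affordance heatmap.

    pixel_feat: (B, D, H, W) visual features projected to the joint space
    phrase_emb: (B, D) text embedding of the action-purpose phrase
    """
    phrase = phrase_emb[:, :, None, None]                  # (B, D, 1, 1)
    heat = F.cosine_similarity(pixel_feat, phrase, dim=1)  # (B, H, W)
    # Keep only positive responses as the detection map.
    return heat.clamp(min=0)
```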
2 code implementations • CVPR 2022 • Pingyu Wu, Wei Zhai, Yang Cao
Existing FPM-based methods use cross-entropy (CE) to evaluate the foreground prediction map and to guide the learning of the generator.
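For reference, the baseline CE objective the snippet mentions (not the paper's proposed alternative) amounts to a per-pixel binary cross-entropy between the generator's foreground map and a pseudo mask; shapes and names below are assumptions.

```python
import torch
import torch.nn.functional as F

def foreground_map_loss(pred_map, pseudo_mask):
    """Binary cross-entropy between a predicted foreground map and a
    pseudo ground-truth mask, the typical objective used to guide the
    generator in FPM-based weakly supervised localization.

    pred_map:    (B, 1, H, W) raw logits from the generator
    pseudo_mask: (B, 1, H, W) values in [0, 1]
    """
    return F.binary_cross_entropy_with_logits(pred_map, pseudo_mask)
```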
1 code implementation • 12 Oct 2021 • Shilian Wu, Wei Zhai, Yongrui Li, Kewei Wang, Zengfu Wang
It is crucial to understand the robustness of text detection models with regard to extensive corruptions, since scene text detection techniques have many practical applications.
no code implementations • 12 Aug 2021 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
For the object branch, we introduce a semantic enhancement module (SEM) that makes the network focus on different parts of the object according to the action class, and utilize a distillation loss to align the output features of the object branch with those of the video branch, transferring the knowledge of the video branch to the object branch.
Ranked #2 on Video-to-image Affordance Grounding on EPIC-Hotspot
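A distillation loss of the kind described above is commonly an L2 penalty between student and teacher features with the teacher detached; the sketch below shows that generic form under assumed names and shapes, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feat, teacher_feat):
    """Align the object branch's features (student) with the video
    branch's features (teacher) via an L2 penalty; the teacher is
    detached so knowledge flows one way.

    student_feat, teacher_feat: (B, D) feature tensors
    """
    return F.mse_loss(student_feat, teacher_feat.detach())
```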
1 code implementation • 8 Aug 2021 • Wei Zhai, Hongchen Luo, Jing Zhang, Yang Cao, DaCheng Tao
To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.
1 code implementation • CVPR 2021 • Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha
Few-shot class-incremental learning aims to recognize new classes from few samples without forgetting the old classes.
2 code implementations • 28 Jun 2021 • Hongchen Luo, Wei Zhai, Jing Zhang, Yang Cao, DaCheng Tao
To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected.
no code implementations • CVPR 2020 • Wei Zhai, Yang Cao, Zheng-Jun Zha, HaiYong Xie, Feng Wu
Next, these primitives are associated with a dependence learning module (DLM) to generate structural representation, in which a two-way collaborative relationship strategy is introduced to perceive the spatial dependencies among multiple primitives.
no code implementations • 12 Apr 2020 • Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
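A standard way to realize few-shot segmentation (shown here as a generic illustration, not this paper's method) is prototype matching: masked average pooling over the support features yields a class prototype, and query pixels are scored by cosine similarity to it. Names and shapes below are assumptions.

```python
import torch
import torch.nn.functional as F

def prototype_segmentation(query_feat, support_feat, support_mask):
    """Prototype-based few-shot segmentation.

    query_feat:   (B, D, H, W) query image features
    support_feat: (B, D, H, W) support image features
    support_mask: (B, 1, H, W) binary foreground mask of the support image
    """
    # Masked average pooling -> one prototype per support image.
    proto = (support_feat * support_mask).sum(dim=(2, 3)) / (
        support_mask.sum(dim=(2, 3)) + 1e-6)              # (B, D)
    proto = proto[:, :, None, None]                       # (B, D, 1, 1)
    # Cosine similarity between each query pixel and the prototype.
    return F.cosine_similarity(query_feat, proto, dim=1)  # (B, H, W)
```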
no code implementations • ICCV 2019 • Wei Zhai, Yang Cao, Jing Zhang, Zheng-Jun Zha
Texture recognition is a challenging visual task, as multiple perceptual attributes may be perceived from the same texture image when combined with different spatial contexts.
no code implementations • 16 May 2019 • Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image.