no code implementations • ECCV 2020 • Haochen Wang, Xu-Dong Zhang, Yutao Hu, Yandan Yang, Xian-Bin Cao, Xian-Tong Zhen
The crux of few-shot segmentation is to extract object information from the support image and then propagate it to guide the segmentation of query images.
no code implementations • 14 Apr 2025 • Weixian Lei, Jiacong Wang, Haochen Wang, Xiangtai Li, Jun Hao Liew, Jiashi Feng, Zilong Huang
This paper introduces SAIL, a unified multimodal large language model (MLLM) that integrates raw pixel encoding and language decoding within a single transformer architecture.
no code implementations • 2 Apr 2025 • Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang
The latter aims to aggregate information from all available views to recover Bird's-Eye-View images, contributing to a comprehensive overview of the entire scene.
no code implementations • 20 Mar 2025 • Haochen Wang, Kai Hu, Liangcai Gao
Through fine-tuning, the LLM is equipped with audio-visual capabilities, leading to significant improvements in document-centric video understanding.
no code implementations • 30 Oct 2024 • Hongbo Zhao, Lue Fan, Yuntao Chen, Haochen Wang, Yuran Yang, Xiaojuan Jin, Yixin Zhang, Gaofeng Meng, Zhaoxiang Zhang
By publishing and maintaining the dataset, we provide a high-quality benchmark for satellite-based map construction and downstream tasks like autonomous driving.
no code implementations • 12 Oct 2024 • Haochen Wang, Anlin Zheng, Yucheng Zhao, Tiancai Wang, Zheng Ge, Xiangyu Zhang, Zhaoxiang Zhang
This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals.
1 code implementation • 24 Sep 2024 • Qian-Wen Zhang, Haochen Wang, Fang Li, Siyu An, Lingfeng Qiao, Liangcai Gao, Di Yin, Xing Sun
Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure.
no code implementations • 21 Aug 2024 • Haochen Wang, Kai Hu, Haoyu Dong, Liangcai Gao
To the best of our knowledge, this problem has not been previously explored.
2 code implementations • 16 Jul 2024 • Cilin Yan, Haochen Wang, Shilin Yan, XiaoLong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves
In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
Ranked #3 on Referring Video Object Segmentation on ReVOS
no code implementations • CVPR 2024 • Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji
Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains.
no code implementations • 17 Jun 2024 • Cilin Yan, Haochen Wang, XiaoLong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves
Specifically, we adopt a transformer module which takes the visual feature as "Query", the text features of the anchors as "Key" and the similarity matrix between the text features of anchor and target classes as "Value".
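The anchor-based attention described above can be sketched in a few lines of NumPy. This is a simplified, single-query illustration under assumed dimensions, not the paper's implementation: the visual feature acts as Query, anchor text features as Key, and anchor-to-target similarities as Value.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def anchor_attention(visual_feat, anchor_text_feats, anchor_target_sim):
    """Cross-attention where a visual feature queries anchor text features.

    visual_feat:        (d,)   the Query
    anchor_text_feats:  (A, d) the Keys, one per anchor class
    anchor_target_sim:  (A, T) the Values: similarity of each anchor to each target class
    Returns (T,) scores over target classes.
    """
    d = visual_feat.shape[0]
    attn = softmax(visual_feat @ anchor_text_feats.T / np.sqrt(d))
    return attn @ anchor_target_sim  # aggregate anchor-to-target similarities

rng = np.random.default_rng(0)
scores = anchor_attention(rng.normal(size=16),          # one visual feature
                          rng.normal(size=(5, 16)),     # 5 anchor classes
                          rng.normal(size=(5, 8)))      # 8 target classes
```

The output is a score per target class, obtained without ever embedding the target classes directly: they are reached only through their similarity to the anchors.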
no code implementations • CVPR 2024 • Joshua Ahn, Haochen Wang, Raymond A. Yeh, Greg Shakhnarovich
Scale ambiguity in 3D scene dimensions leads to magnitude ambiguity of volumetric densities in neural radiance fields, i.e., the densities double when the scene size is halved, and vice versa.
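This inverse scaling follows directly from the standard volume-rendering opacity formula, alpha = 1 - exp(-sigma * delta): shrinking the scene shrinks every ray segment, so doubling the density leaves the rendered opacity unchanged. A toy check, assuming the usual NeRF compositing:

```python
import numpy as np

def alpha(sigma, delta):
    # opacity contributed by a ray segment of length delta with density sigma
    return 1.0 - np.exp(-sigma * delta)

sigma, delta = 3.0, 0.2
# halve the scene: every segment length halves; doubling the density compensates
a_original = alpha(sigma, delta)
a_rescaled = alpha(2.0 * sigma, delta / 2.0)
```

Since sigma * delta is unchanged, the two opacities agree exactly, which is why density magnitude alone cannot pin down the scene scale.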
3 code implementations • CVPR 2024 • Hongbo Zhao, Bolin Ni, Haochen Wang, Junsong Fan, Fei Zhu, Yuxi Wang, Yuntao Chen, Gaofeng Meng, Zhaoxiang Zhang
(i) For unwanted knowledge, efficient and effective deleting is crucial.
1 code implementation • 29 Jan 2024 • Jie Liu, Wenzhe Yin, Haochen Wang, Yunlu Chen, Jan-Jakob Sonke, Efstratios Gavves
Existing prototype-based methods rely on support prototypes to guide the segmentation of query point clouds, but they encounter challenges when significant object variations exist between the support prototypes and query features.
1 code implementation • 21 Dec 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tiancai Wang, Xiangyu Zhang, Zhaoxiang Zhang
To empower the model as a teacher, we propose Hard Patches Mining (HPM), predicting patch-wise losses and subsequently determining where to mask.
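The loss-guided part of the masking strategy above can be sketched as selecting the patches with the highest predicted reconstruction loss. This is a deliberate simplification (HPM also mixes in randomness); the function name and ratio are illustrative:

```python
import numpy as np

def hard_patch_mask(predicted_losses, mask_ratio=0.75):
    """Mask the patches the model predicts will be hardest to reconstruct.

    predicted_losses: (N,) per-patch loss predictions.
    Returns a boolean mask of shape (N,); True means the patch is masked.
    """
    n_mask = int(len(predicted_losses) * mask_ratio)
    hardest = np.argsort(predicted_losses)[::-1][:n_mask]  # largest losses first
    mask = np.zeros(len(predicted_losses), dtype=bool)
    mask[hardest] = True
    return mask

losses = np.array([0.1, 0.9, 0.4, 0.8])
mask = hard_patch_mask(losses, mask_ratio=0.5)  # masks patches 1 and 3
```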
1 code implementation • NeurIPS 2023 • Haochen Wang, Junsong Fan, Yuxi Wang, Kaiyou Song, Tong Wang, Zhaoxiang Zhang
As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident.
1 code implementation • 4 Jun 2023 • Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang
A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability.
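The problem with confidence thresholding can be made concrete with a tiny sketch: thresholding per-pixel softmax outputs discards every pixel below the cutoff. The threshold and toy numbers are illustrative assumptions:

```python
import numpy as np

def confident_pseudo_labels(probs, threshold=0.95):
    """Keep only pixels whose maximum class probability exceeds the threshold.

    probs: (H, W, C) per-pixel softmax outputs.
    Returns (labels, used): labels is (H, W) with -1 marking discarded pixels.
    """
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    used = conf >= threshold
    labels[~used] = -1
    return labels, used

# a toy 1x3 "image" with 2 classes: only the first pixel clears the bar
probs = np.array([[[0.98, 0.02], [0.60, 0.40], [0.55, 0.45]]])
labels, used = confident_pseudo_labels(probs, threshold=0.95)
unused_fraction = 1.0 - used.mean()  # two thirds of the pixels go unused
```

Even in this toy case two thirds of the pixels are dropped, which is exactly the under-utilization the paper sets out to fix.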
1 code implementation • CVPR 2023 • Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen
In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation.
1 code implementation • 23 May 2023 • Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang
To this end, we propose T2S-DA, which we interpret as pulling Target to Source for Domain Adaptation, encouraging the model to learn similar cross-domain features.
1 code implementation • 23 Apr 2023 • Cilin Yan, Haochen Wang, Jie Liu, XiaoLong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves
Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing.
1 code implementation • CVPR 2023 • Haochen Wang, Kaiyou Song, Junsong Fan, Yuxi Wang, Jin Xie, Zhaoxiang Zhang
We observe that the reconstruction loss can naturally be the metric of the difficulty of the pre-training task.
1 code implementation • ICCV 2023 • Haochen Wang, Cilin Yan, Shuai Wang, XiaoLong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.
no code implementations • 9 Jan 2023 • Jie Liu, Yanqi Bao, Wenzhe Yin, Haochen Wang, Yang Gao, Jan-Jakob Sonke, Efstratios Gavves
However, the appearance variations between objects from the same category could be extremely large, leading to unreliable feature matching and query mask prediction.
Ranked #45 on Few-Shot Semantic Segmentation on PASCAL-5i (1-Shot)
no code implementations • 9 Dec 2022 • Nam Anh Dinh, Haochen Wang, Greg Shakhnarovich, Rana Hanocka
There is no settled universal 3D representation for geometry; alternatives include point clouds, meshes, implicit functions, and voxels, to name a few.
1 code implementation • CVPR 2023 • Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich
We propose to apply the chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate as a voxel radiance field.
Ranked #6 on Text to 3D on T$^3$Bench
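The chain-rule step at the heart of this method reduces to pulling an image-space score back to parameter space through the renderer's Jacobian. A toy sketch with a linear stand-in renderer (so the Jacobian is just a matrix); the shapes and the linear renderer are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy differentiable "renderer": image = A @ params, so its Jacobian is A
A = rng.normal(size=(6, 4))        # maps 4 scene parameters to a 6-pixel image
params = rng.normal(size=4)
image = A @ params

# stand-in for the diffusion model's score evaluated at the rendered image
score = rng.normal(size=6)

# chain rule: parameter-space gradient = Jacobian^T @ image-space score
grad_params = A.T @ score
```

With a real voxel radiance field the Jacobian is never materialized; automatic differentiation applies the same transpose-Jacobian product implicitly.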
1 code implementation • 15 Sep 2022 • Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu
Self-training has shown great potential in semi-supervised learning.
1 code implementation • CVPR 2022 • Haochen Wang, Jiayi Shen, Yongtuo Liu, Yan Gao, Efstratios Gavves
To tackle this issue, we propose a Neighbor Transformer Network, or NFormer, which explicitly models interactions across all input images, thus suppressing outlier features and leading to more robust representations overall.
1 code implementation • CVPR 2022 • Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le
A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability.
no code implementations • 2 Feb 2022 • Yan Gao, Qimeng Wang, Xu Tang, Haochen Wang, Fei Ding, Jing Li, Yao Hu
Prior works propose to predict Intersection-over-Union (IoU) between bounding boxes and corresponding ground-truths to improve NMS, while accurately predicting IoU is still a challenging problem.
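The IoU target that these works learn to predict is itself simple to compute; a minimal sketch for axis-aligned boxes, with corner-format coordinates as an assumed convention:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

v_same = iou((0, 0, 2, 2), (0, 0, 2, 2))  # identical boxes -> 1.0
v_half = iou((0, 0, 2, 2), (1, 0, 3, 2))  # shifted by half a width -> 1/3
```

Computing IoU against a ground-truth box is easy; the hard part the snippet cannot capture is regressing this value at inference time, when no ground truth is available.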
1 code implementation • 14 May 2021 • Haoliang Sun, Xiankai Lu, Haochen Wang, Yilong Yin, XianTong Zhen, Cees G. M. Snoek, Ling Shao
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
1 code implementation • CVPR 2021 • Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed.
no code implementations • CVPR 2020 • Haochen Wang, Ruotian Luo, Michael Maire, Greg Shakhnarovich
The core of our approach, Pixel Consensus Voting, is a framework for instance segmentation based on the Generalized Hough transform.
Ranked #36 on Panoptic Segmentation on COCO test-dev
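The Generalized Hough transform idea behind Pixel Consensus Voting can be sketched as an offset-voting accumulator: each pixel casts a vote toward its predicted instance center, and peaks in the accumulator mark instances. This simplifies the paper's discretized vote classification to direct offset voting; the toy offsets are illustrative:

```python
import numpy as np

def accumulate_votes(offsets, shape):
    """Each pixel votes for its instance center via a predicted 2D offset.

    offsets: (H, W, 2) integer (dy, dx) from each pixel to its center.
    Returns an (H, W) accumulator; peaks indicate instance centers.
    """
    h, w = shape
    acc = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            cy, cx = y + offsets[y, x, 0], x + offsets[y, x, 1]
            if 0 <= cy < h and 0 <= cx < w:
                acc[cy, cx] += 1  # cast one vote at the predicted center
    return acc

# 3x3 toy image: every pixel points at the center pixel (1, 1)
coords = np.stack(np.meshgrid(np.arange(3), np.arange(3), indexing="ij"), axis=-1)
offs = np.array([1, 1]) - coords  # offset from each pixel to (1, 1)
acc = accumulate_votes(offs, (3, 3))  # all 9 votes land on (1, 1)
```

Grouping pixels by the peak they voted for then yields the instance masks, which is the "consensus" in the method's name.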
2 code implementations • 1 Aug 2019 • Igor Vasiljevic, Nick Kolkin, Shanyi Zhang, Ruotian Luo, Haochen Wang, Falcon Z. Dai, Andrea F. Daniele, Mohammadreza Mostajabi, Steven Basart, Matthew R. Walter, Gregory Shakhnarovich
We introduce DIODE, a dataset that contains thousands of diverse high resolution color images with accurate, dense, long-range depth measurements.