no code implementations • 9 Oct 2024 • Kerui Huang, Wenbo Xu, Haoliang Hu, XiaoLong Jiang, Lei Sun, Wenyan Zhao, Binbin Long, Shaogang Fan, Zhibo Zhou, Ping Mo, Xiaocheng Jiang, Jianhong Tian, Aihua Deng, Peng Xie, Yun Wang
In this study, we sequenced and analyzed the mitochondrial genome of Cathaya argyrophylla, an endangered and endemic Pinaceae species, uncovering a genome size of 18.99 Mb, the largest mitochondrial genome reported to date.
no code implementations • 26 Sep 2024 • Huixin Sun, Runqi Wang, Yanjing Li, Xianbin Cao, XiaoLong Jiang, Yao Hu, Baochang Zhang
We propose a method named "Prompt for Quantization" (P4Q) that balances fine-tuning and quantization, in which we design a lightweight architecture that leverages contrastive loss supervision to enhance the recognition performance of a PTQ model.
1 code implementation • 13 Aug 2024 • Ouxiang Li, Jiayin Cai, Yanbin Hao, XiaoLong Jiang, Yao Hu, Fuli Feng
In this paper, we re-examine the SID problem and identify two prevalent biases in current training paradigms, i.e., weakened artifact features and overfitted artifact features.
2 code implementations • 16 Jul 2024 • Cilin Yan, Haochen Wang, Shilin Yan, XiaoLong Jiang, Yao Hu, Guoliang Kang, Weidi Xie, Efstratios Gavves
In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS).
1 code implementation • 27 Jun 2024 • Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, XiaoLong Jiang, Yao Hu, Weidi Xie
This effectively enables the model to discern AI-generated images based on semantics or contextual information. Secondly, we select the highest- and lowest-frequency patches in the image and compute low-level patchwise features, aiming to detect AI-generated images via low-level artifacts such as noise patterns and anti-aliasing.
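To make the low-level branch concrete, here is a minimal sketch of ranking patches by frequency content; the function name, patch size, FFT-energy criterion, and number of selected patches are our own illustrative assumptions, not the paper's released code:

```python
import numpy as np

def select_patches_by_frequency(image, patch_size=32, k=4):
    """Rank non-overlapping patches of a grayscale image by high-frequency
    energy and return the k highest- and k lowest-frequency patches.
    Illustrative sketch only; settings are assumptions."""
    h, w = image.shape
    patches, energies = [], []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            spectrum = np.abs(np.fft.fftshift(np.fft.fft2(patch)))
            # Subtract the low-frequency centre of the spectrum; what
            # remains is treated as high-frequency energy.
            c = patch_size // 2
            low = spectrum[c - 4:c + 4, c - 4:c + 4].sum()
            energies.append(spectrum.sum() - low)
            patches.append(patch)
    order = np.argsort(energies)
    lowest = [patches[i] for i in order[:k]]
    highest = [patches[i] for i in order[-k:]]
    return highest, lowest
```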
no code implementations • 17 Jun 2024 • Cilin Yan, Haochen Wang, XiaoLong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves
Specifically, we adopt a transformer module which takes the visual feature as "Query", the text features of the anchors as "Key" and the similarity matrix between the text features of anchor and target classes as "Value".
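A minimal sketch of that attention layout, assuming a single head with no learned projections (the function name and tensor shapes are illustrative, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def anchor_attention(visual_feat, anchor_text, target_sim):
    """Scaled dot-product attention with the visual feature as Query,
    anchor text features as Key, and the anchor-to-target-class
    similarity matrix as Value.

    visual_feat: (B, D) visual embeddings
    anchor_text: (A, D) text embeddings of A anchor classes
    target_sim:  (A, C) similarity of each anchor to C target classes
    returns:     (B, C) attention-weighted target-class scores
    """
    d = visual_feat.shape[-1]
    attn = visual_feat @ anchor_text.t() / d ** 0.5  # (B, A) logits
    attn = F.softmax(attn, dim=-1)                   # weights over anchors
    return attn @ target_sim                         # (B, C)
```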
1 code implementation • 12 Mar 2024 • Mingze Wang, Lili Su, Cilin Yan, Sheng Xu, Pengcheng Yuan, XiaoLong Jiang, Baochang Zhang
RSBuilding is designed to enhance cross-scene generalization and task universality.
1 code implementation • 17 May 2023 • Bohan Zeng, Shanglin Li, Xuhui Liu, Sicheng Gao, XiaoLong Jiang, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
1 code implementation • 23 Apr 2023 • Cilin Yan, Haochen Wang, Jie Liu, XiaoLong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves
Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing.
no code implementations • 14 Apr 2023 • Jie Guo, Qimeng Wang, Yan Gao, XiaoLong Jiang, Xu Tang, Yao Hu, Baochang Zhang
CLIP (Contrastive Language-Image Pretraining) is well developed for open-vocabulary zero-shot image-level recognition, while its application to pixel-level tasks remains less investigated: most efforts directly adopt CLIP features without deliberate adaptation.
Ranked #5 on Zero-Shot Semantic Segmentation on COCO-Stuff
1 code implementation • ICCV 2023 • Haochen Wang, Cilin Yan, Shuai Wang, XiaoLong Jiang, Xu Tang, Yao Hu, Weidi Xie, Efstratios Gavves
Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos.
1 code implementation • 16 Feb 2023 • Keyan Chen, Wenyuan Li, Sen Lei, Jianqi Chen, XiaoLong Jiang, Zhengxia Zou, Zhenwei Shi
Despite its fruitful applications in remote sensing, image super-resolution is troublesome to train and deploy, as different resolution magnifications are typically handled by separate models.
1 code implementation • CVPR 2023 • Keyan Chen, XiaoLong Jiang, Yao Hu, Xu Tang, Yan Gao, Jianqi Chen, Weidi Xie
In this paper, we consider the problem of simultaneously detecting objects and inferring their visual attributes in an image, even for those with no manual annotations provided at the training stage, resembling an open-vocabulary scenario.
Ranked #1 on Open Vocabulary Attribute Detection on OVAD benchmark (using extra training data)
1 code implementation • CVPR 2021 • Haochen Wang, XiaoLong Jiang, Haibing Ren, Yao Hu, Song Bai
In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% J&F and 70 FPS on the DAVIS 2017 validation set, leading all present solutions in overall accuracy and speed.
1 code implementation • 11 Jan 2021 • Tun Zhu, Daoxin Zhang, Yao Hu, Tianran Wang, XiaoLong Jiang, Jianke Zhu, Jiawei Li
With the prevalence of mobile video, the general public increasingly consumes vertical videos on hand-held devices.
1 code implementation • 11 Jul 2019 • Xiaolong Jiang, Peizhao Li, Yanjing Li, Xian-Tong Zhen
In this work, we present an end-to-end framework to address data association in online Multiple-Object Tracking (MOT).
no code implementations • CVPR 2019 • Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao
In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps.
no code implementations • 16 Dec 2018 • Xiaolong Jiang, Peizhao Li, Xian-Tong Zhen, Xian-Bin Cao
To overcome object-centric information scarcity, appearance and motion features are deeply integrated by the proposed AMNet, an end-to-end offline-trained two-stream network.
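To illustrate the two-stream idea, here is a toy sketch assuming RGB frames for the appearance stream, two-channel optical flow for the motion stream, and concatenation fusion; the class name, layer sizes, and fusion scheme are our assumptions, not AMNet's actual design:

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Toy two-stream network: appearance and motion features come from
    separate branches and are fused by concatenation before a shared head.
    Illustrative sketch only, not AMNet's architecture."""

    def __init__(self, feat_dim=256, out_dim=128):
        super().__init__()
        self.appearance = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1),   # RGB input
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.motion = nn.Sequential(
            nn.Conv2d(2, feat_dim, 3, padding=1),   # optical flow (dx, dy)
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(2 * feat_dim, out_dim)

    def forward(self, rgb, flow):
        a = self.appearance(rgb).flatten(1)  # (B, feat_dim)
        m = self.motion(flow).flatten(1)     # (B, feat_dim)
        return self.head(torch.cat([a, m], dim=1))
```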