no code implementations • ECCV 2020 • Zetong Yang, Yanan Sun, Shu Liu, Xiaojuan Qi, Jiaya Jia
In 3D recognition, to fuse multi-scale structure information, existing methods apply hierarchical frameworks that stack multiple fusion layers to integrate current relative locations with structure information from the previous level.
no code implementations • ECCV 2020 • Ruizheng Wu, Huaijia Lin, Xiaojuan Qi, Jiaya Jia
Video propagation is a fundamental problem in video processing where guidance frame predictions are propagated to guide predictions of the target frame.
no code implementations • 16 Dec 2024 • Yi-Hua Huang, Ming-Xian Lin, Yang-tian Sun, ZiYi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
Recently, Gaussian splatting has emerged as a robust technique for representing 3D scenes, enabling real-time rasterization and high-fidelity rendering.
no code implementations • 6 Dec 2024 • Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin
UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling.
1 code implementation • 22 Nov 2024 • Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu
To exploit the potential of high-quality VideoQA pairs, we propose a Hybrid LVLMs Collaboration framework, featuring a Frame Selector and a two-stage instruction fine-tuned reasoning LVLM.
1 code implementation • 22 Nov 2024 • Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, Jianhui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, Xiaojuan Qi
Instead, we focus on the fundamental problem of learning in the UV texture space itself.
no code implementations • 21 Oct 2024 • Shizhen Zhao, Xin Wen, Jiahui Liu, Chuofan Ma, Chunfeng Yuan, Xiaojuan Qi
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss that encourages the model to focus on class discrimination within the target dataset.
1 code implementation • 8 Oct 2024 • Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi
Our MC-MoE integrates static quantization and dynamic pruning to collaboratively achieve extreme compression for MoE-LLMs with less accuracy loss, ensuring an optimal trade-off between performance and efficiency.
1 code implementation • 8 Sep 2024 • Jiahui Liu, Xin Wen, Shizhen Zhao, Yingxian Chen, Xiaojuan Qi
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data.
1 code implementation • 26 Jul 2024 • Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang
To address these fundamental challenges, we introduce pruning optimization for input-aware dynamic memristive spiking neural network (PRIME).
no code implementations • 12 Jul 2024 • Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing.
no code implementations • 10 Jul 2024 • Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu
EA-VTR can efficiently encode frame-level and video-level visual representations simultaneously, enabling detailed event content and complex event temporal cross-modal alignment, ultimately enhancing the comprehensive understanding of video events.
no code implementations • CVPR 2024 • Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu
Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency.
no code implementations • 3 Jul 2024 • Runyu Ding, Yuzhe Qin, Jiyue Zhu, Chengzhe Jia, Shiqi Yang, Ruihan Yang, Xiaojuan Qi, Xiaolong Wang
Our system's ability to handle bimanual manipulations while prioritizing safety and real-time performance makes it a powerful tool for advancing dexterous manipulation and imitation learning.
no code implementations • 29 Jun 2024 • Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang
We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model.
no code implementations • 27 Jun 2024 • Chirui Chang, Zhengzhe Liu, Xiaoyang Lyu, Xiaojuan Qi
In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion.
no code implementations • 19 Jun 2024 • Yang-tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing.
1 code implementation • 31 May 2024 • Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
Severe data imbalance naturally exists among web-scale vision-language datasets.
1 code implementation • 23 May 2024 • Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi
Specifically, the proposed SliM-LLM mainly relies on two novel techniques: (1) Salience-Determined Bit Allocation utilizes the clustering characteristics of salience distribution to allocate the bit-widths of each group, increasing the accuracy of quantized LLMs and maintaining the inference efficiency; (2) Salience-Weighted Quantizer Calibration optimizes the parameters of the quantizer by considering the element-wise salience within the group, balancing the maintenance of salient information and minimization of errors.
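The salience-determined bit allocation described above can be illustrated with a toy sketch. This is an assumption-laden simplification, not the paper's algorithm: the function name, the bit-width choices, and the top/bottom-quartile promotion rule are all invented here for illustration; the real method works on the clustering structure of salience within quantization groups.

```python
import numpy as np

def allocate_bits(salience, avg_bits=2, choices=(1, 2, 3)):
    """Toy salience-driven bit allocation (illustrative only): rank weight
    groups by salience and give the most salient groups a higher bit-width
    while demoting the least salient ones, keeping the average fixed."""
    n = len(salience)
    order = np.argsort(salience)[::-1]  # most salient groups first
    bits = np.full(n, avg_bits)
    k = n // 4                          # promote/demote one quarter each
    bits[order[:k]] = max(choices)
    bits[order[-k:]] = min(choices)
    return bits

salience = np.array([0.9, 0.1, 0.5, 0.4, 0.8, 0.2, 0.6, 0.3])
bits = allocate_bits(salience)
print(bits.mean())  # average bit-width stays at 2.0
```

Because promotions and demotions are balanced, the overall memory budget is unchanged while salient groups keep more precision.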
2 code implementations • 22 Apr 2024 • Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
To uncover the capabilities of low-bit quantized MLLM, we assessed the performance of the LLaMA3-based LLaVA-Next-8B model under 2-4 ultra-low bits with post-training quantization methods.
1 code implementation • 19 Apr 2024 • Chuofan Ma, Yi Jiang, Jiannan Wu, Zehuan Yuan, Xiaojuan Qi
We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability.
Ranked #17 on Natural Language Visual Grounding on ScreenSpot.
no code implementations • 15 Apr 2024 • Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit.
1 code implementation • 8 Apr 2024 • Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu
While demonstrating generative quality equivalent to the software baseline, our system achieved remarkable enhancements in generative speed for both unconditional and conditional generation tasks, by factors of 64.8 and 156.5, respectively.
no code implementations • 30 Mar 2024 • Xiaoyang Lyu, Yang-tian Sun, Yi-Hua Huang, Xiuzhe Wu, ZiYi Yang, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi
In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS.
1 code implementation • CVPR 2024 • Xiaoyang Lyu, Chirui Chang, Peng Dai, Yang-tian Sun, Xiaojuan Qi
Scene reconstruction from multi-view images is a fundamental problem in computer vision and graphics.
1 code implementation • 21 Mar 2024 • Weipeng Deng, Jihan Yang, Runyu Ding, Jiahui Liu, Yijiang Li, Xiaojuan Qi, Edith Ngai
To test the language understandability of 3D-VL models, we first propose a language robustness task for systematically assessing 3D-VL models across various tasks, benchmarking their performance when presented with different language style variants.
no code implementations • 9 Mar 2024 • Xiuzhe Wu, Xiaoyang Lyu, Qihao Huang, Yong Liu, Yang Wu, Ying Shan, Xiaojuan Qi
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
1 code implementation • CVPR 2024 • Jiequan Cui, Beier Zhu, Xin Wen, Xiaojuan Qi, Bei Yu, Hanwang Zhang
Second, with the proposed concept of Model Prediction Bias, we investigate the origins of problematic representation during optimization.
no code implementations • 24 Feb 2024 • ZiYi Yang, Xinyu Gao, Yangtian Sun, Yihua Huang, Xiaoyang Lyu, Wen Zhou, Shaohui Jiao, Xiaojuan Qi, Xiaogang Jin
Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians.
no code implementations • 22 Feb 2024 • Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song Bai, Xiaojuan Qi
Despite its simplicity, we show that IDA shows efficiency and fast convergence in resolving the social bias in TTI diffusion models.
1 code implementation • CVPR 2024 • Xin Kong, Shikun Liu, Xiaoyang Lyu, Marwan Taher, Xiaojuan Qi, Andrew J. Davison
We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis.
1 code implementation • 6 Feb 2024 • Wei Huang, Yangdong Liu, Haotong Qin, Ying Li, Shiming Zhang, Xianglong Liu, Michele Magno, Xiaojuan Qi
Pretrained large language models (LLMs) exhibit exceptional general language processing capabilities but come with significant demands on memory and computational resources.
1 code implementation • 5 Feb 2024 • Jihan Yang, Runyu Ding, Ellis Brown, Xiaojuan Qi, Saining Xie
There is a sensory gulf between the Earth that humans inhabit and the digital realms in which modern AI agents are created.
no code implementations • 11 Jan 2024 • Peng Dai, Feitong Tan, Xin Yu, Yifan Peng, Yinda Zhang, Xiaojuan Qi
Virtual environments (VEs) are pivotal for virtual, augmented, and mixed reality systems.
no code implementations • CVPR 2024 • Sitong Wu, Haoru Tan, Zhuotao Tian, Yukang Chen, Xiaojuan Qi, Jiaya Jia
We discover that the lack of consideration for sample-wise affinity consistency across modalities in existing training objectives is the central cause.
no code implementations • 14 Dec 2023 • Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang
Recent advancements in text-to-3D generation technology have significantly advanced the conversion of textual descriptions into imaginative 3D objects with sound geometry and fine textures.
no code implementations • 14 Dec 2023 • Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Our random resistive memory-based deep extreme point learning machine may pave the way for energy-efficient and training-friendly edge AI across various data modalities and tasks.
1 code implementation • CVPR 2024 • Yi-Hua Huang, Yang-tian Sun, ZiYi Yang, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi
During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an as-rigid-as-possible (ARAP) loss is developed to enforce spatial continuity and local rigidity of learned motions.
no code implementations • 13 Nov 2023 • Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network.
1 code implementation • 3 Nov 2023 • Zhengzhe Liu, Jingyu Hu, Ka-Hei Hui, Xiaojuan Qi, Daniel Cohen-Or, Chi-Wing Fu
This paper presents a new text-guided technique for generating 3D shapes.
no code implementations • 30 Oct 2023 • Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi
In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks.
1 code implementation • NeurIPS 2023 • Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi
CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept.
Ranked #4 on Open Vocabulary Object Detection on LVIS v1.0 (using extra training data).
no code implementations • 29 Sep 2023 • Song Wang, Zhu Wang, Can Li, Xiaojuan Qi, Hayden Kwok-Hay So
In comparison to conventional RGB cameras, the superior temporal resolution of event cameras allows them to capture rich information between frames, making them prime candidates for object tracking.
1 code implementation • ICCV 2023 • Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi
Therefore, directly learning a mapping function from speech to the entire head image is prone to ambiguity, particularly when using a short video for training.
1 code implementation • ICCV 2023 • Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Zhengzhe Liu, Xiaojuan Qi
In this work, we focus on synthesizing high-quality textures on 3D meshes.
no code implementations • 1 Aug 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes.
Ranked #3 on 3D Open-Vocabulary Instance Segmentation on S3DIS.
1 code implementation • CVPR 2023 • Jiahui Liu, Chirui Chang, Jianhui Liu, Xiaoyang Wu, Lan Ma, Xiaojuan Qi
Unlike the single-scan-based semantic segmentation task, this task requires distinguishing the motion states of points in addition to their semantic categories.
4 code implementations • 23 May 2023 • Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu, Hanwang Zhang
In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels.
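A minimal numeric sketch of the KL-divergence loss that the paper analyzes may help. Only the standard definition of KL(p || q) between two softmax distributions is shown here; the wMSE + soft-label cross-entropy decomposition itself is not reproduced.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_loss(p_logits, q_logits):
    """Standard KL-divergence loss KL(p || q) between two softmax
    distributions -- the quantity the paper decomposes into a weighted
    MSE term plus a cross-entropy term with soft labels."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float(np.sum(p * (np.log(p) - np.log(q))))

print(kl_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0])))  # 0.0 for identical inputs
```

Note that KL is asymmetric and non-negative, vanishing exactly when the two distributions coincide.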
1 code implementation • CVPR 2023 • Peng Dai, Yinda Zhang, Xin Yu, Xiaoyang Lyu, Xiaojuan Qi
Rendering novel view images is highly desirable for many applications.
1 code implementation • CVPR 2024 • Jihan Yang, Runyu Ding, Weipeng Deng, Zhe Wang, Xiaojuan Qi
We propose a lightweight and scalable Regional Point-Language Contrastive learning framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify and recognize open-set objects and categories.
no code implementations • 27 Mar 2023 • Xiaoyan Qian, Chang Liu, Xiaojuan Qi, Siew-Chong Tan, Edmund Lam, Ngai Wong
3D automatic annotation has received increased attention since manually annotating 3D point clouds is laborious.
1 code implementation • 26 Mar 2023 • Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu
3D scene understanding, e.g., point cloud semantic and instance segmentation, often requires large-scale annotated training data, but clearly, point-wise labels are too tedious to prepare.
2 code implementations • 24 Mar 2023 • Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu
The core of our approach is a two-stage feature-space alignment strategy that leverages a pre-trained single-view reconstruction (SVR) model to map CLIP features to shapes: to begin with, map the CLIP image feature to the detail-rich 3D shape space of the SVR model, then map the CLIP text feature to the 3D shape space through encouraging the CLIP-consistency between rendered images and the input text.
1 code implementation • ICCV 2023 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Xiaojuan Qi
Category-level 6D pose estimation aims to predict the poses and sizes of unseen objects from a specific category.
2 code implementations • 21 Mar 2023 • Zhuotao Tian, Jiequan Cui, Li Jiang, Xiaojuan Qi, Xin Lai, Yixin Chen, Shu Liu, Jiaya Jia
Semantic segmentation is still a challenging task for parsing diverse contexts in different scenes, thus the fixed classifier might not be able to well address varying feature distributions during testing.
2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.
Ranked #1 on 3D Multi-Object Tracking on nuScenes (LiDAR only).
1 code implementation • ICCV 2023 • Xiaoyang Lyu, Peng Dai, Zizhang Li, Dongyu Yan, Yi Lin, Yifan Peng, Xiaojuan Qi
We found that the color rendering loss results in optimization bias against low-intensity areas, causing gradient vanishing and leaving these areas unoptimized.
2 code implementations • CVPR 2023 • Zhisheng Zhong, Jiequan Cui, Yibo Yang, Xiaoyang Wu, Xiaojuan Qi, Xiangyu Zhang, Jiaya Jia
Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers.
no code implementations • CVPR 2023 • Ruihang Chu, Zhengzhe Liu, Xiaoqing Ye, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia
The key of Cart is to utilize the prediction of object structures to connect visual observations with user commands for effective manipulations.
no code implementations • 10 Dec 2022 • Hai Wu, Ruifei He, Haoru Tan, Xiaojuan Qi, Kaibin Huang
Experiments show that the proposed vertical-layered representation and the developed once-QAT scheme effectively embody multiple quantized networks in a single model, allow one-time training, and deliver performance comparable to quantized models tailored to any specific bit-width.
1 code implementation • CVPR 2023 • Runyu Ding, Jihan Yang, Chuhui Xue, Wenqing Zhang, Song Bai, Xiaojuan Qi
Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space.
Ranked #2 on 3D Open-Vocabulary Instance Segmentation on S3DIS.
1 code implementation • 28 Nov 2022 • Yingxian Chen, Zhengzhe Liu, Baoheng Zhang, Wilton Fok, Xiaojuan Qi, Yik-Chung Wu
Weakly supervised detection of anomalies in surveillance videos is a challenging task.
2 code implementations • ICCV 2023 • Xin Wen, Bingchen Zhao, Xiaojuan Qi
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
Ranked #1 on Open-World Semi-Supervised Learning on ImageNet-100.
1 code implementation • 30 Oct 2022 • Fernando Julio Cendra, Lan Ma, Jiajun Shen, Xiaojuan Qi
SL3D is a generic framework and can be applied to solve different 3D recognition tasks, including classification, object detection, and semantic segmentation.
Ranked #2 on Unsupervised 3D Semantic Segmentation on ScanNetV2.
1 code implementation • 14 Oct 2022 • Ruifei He, Shuyang Sun, Xin Yu, Chuhui Xue, Wenqing Zhang, Philip Torr, Song Bai, Xiaojuan Qi
Recent text-to-image generation models have shown promising results in generating high-fidelity photo-realistic images.
1 code implementation • 11 Oct 2022 • Shizhen Zhao, Xiaojuan Qi
Most existing 3D point cloud object detection approaches heavily rely on large amounts of labeled training data.
no code implementations • 7 Oct 2022 • Kaibin Huang, Hai Wu, Zhiyan Liu, Xiaojuan Qi
We further propose a virtualized 6G network architecture customized for deploying in-situ model downloading with the key feature of a three-tier (edge, local, and central) AI library.
no code implementations • 28 Sep 2022 • Jianhui Liu, Yukang Chen, Xiaoqing Ye, Zhuotao Tian, Xiao Tan, Xiaojuan Qi
3D scenes are dominated by a large number of background points, which are redundant for the detection task that mainly needs to focus on foreground objects.
1 code implementation • 26 Sep 2022 • Chuofan Ma, Qiushan Guo, Yi Jiang, Zehuan Yuan, Ping Luo, Xiaojuan Qi
Our key finding is that the major cause of degradation is not information loss in the down-sampling process, but rather the mismatch between network architecture and input scale.
2 code implementations • 9 Sep 2022 • Zhengzhe Liu, Peng Dai, Ruihui Li, Xiaojuan Qi, Chi-Wing Fu
Text-guided 3D shape generation remains challenging due to the absence of large paired text-shape data, the substantial semantic gap between these two modalities, and the structural complexity of 3D shapes.
1 code implementation • 20 Jul 2022 • Chang Liu, Xiaoyan Qian, Binxiao Huang, Xiaojuan Qi, Edmund Lam, Siew-Chong Tan, Ngai Wong
By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, versus the state-of-the-art autolabeler.
1 code implementation • 20 Jul 2022 • Xin Yu, Peng Dai, Wenbo Li, Lan Ma, Jiajun Shen, Jia Li, Xiaojuan Qi
With the rapid development of mobile devices, modern widely-used mobile phones typically allow users to capture 4K resolution (i.e., ultra-high-definition) images.
Ranked #1 on Image Restoration on UHDM.
2 code implementations • CVPR 2023 • Yukang Chen, Jianhui Liu, Xiangyu Zhang, Xiaojuan Qi, Jiaya Jia
Recent advance in 2D CNNs has revealed that large kernels are important.
1 code implementation • 1 Jun 2022 • Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia
To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space.
1 code implementation • CVPR 2022 • Yanwei Li, Xiaojuan Qi, Yukang Chen, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this work, we present a conceptually simple yet effective framework for cross-modality 3D object detection, named voxel field fusion.
1 code implementation • 30 May 2022 • Xin Wen, Bingchen Zhao, Anlin Zheng, Xiangyu Zhang, Xiaojuan Qi
The semantic grouping is performed by assigning pixels to a set of learnable prototypes, which can adapt to each sample by attentive pooling over the feature and form new slots.
Ranked #18 on Unsupervised Semantic Segmentation on COCO-Stuff-27 (Accuracy metric).
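The prototype-assignment step described in the entry above can be illustrated with a toy sketch. The shapes, names, and fixed random features here are assumptions for illustration, not the paper's implementation: pixels are softly assigned to learnable prototypes, and attentive pooling over pixel features yields per-image slots.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
num_pixels, dim, num_protos = 64, 16, 4
pixels = rng.normal(size=(num_pixels, dim))       # pixel features of one image
prototypes = rng.normal(size=(num_protos, dim))   # shared learnable prototypes

attn = softmax(pixels @ prototypes.T, axis=1)     # soft pixel-to-prototype assignment
weights = attn / attn.sum(axis=0, keepdims=True)  # normalize attention over pixels
slots = weights.T @ pixels                        # (num_protos, dim) per-image slots
print(slots.shape)  # (4, 16)
```

Each slot is a weighted average of the pixel features assigned to its prototype, so the slots adapt to the content of each sample while the prototypes stay shared.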
1 code implementation • 30 May 2022 • Jihan Yang, Shaoshuai Shi, Runyu Ding, Zhe Wang, Xiaojuan Qi
Then, we build a benchmark to assess existing KD methods developed in the 2D domain for 3D object detection upon six well-constructed teacher-student pairs.
1 code implementation • CVPR 2022 • Peng Dai, Xin Yu, Lan Ma, Baoheng Zhang, Jia Li, Wenbo Li, Jiajun Shen, Xiaojuan Qi
Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras.
1 code implementation • 4 Apr 2022 • Runyu Ding, Jihan Yang, Li Jiang, Xiaojuan Qi
Deep learning approaches achieve prominent success in 3D semantic segmentation.
no code implementations • 29 Mar 2022 • Chang Liu, Xiaoyan Qian, Xiaojuan Qi, Edmund Y. Lam, Siew-Chong Tan, Ngai Wong
While a few previous studies tried to automatically generate 3D bounding boxes from weak labels such as 2D boxes, the quality is sub-optimal compared to human annotators.
1 code implementation • CVPR 2022 • Zhengzhe Liu, Yi Wang, Xiaojuan Qi, Chi-Wing Fu
In this work, we explore the challenging task of generating 3D shapes from text.
4 code implementations • CVPR 2022 • Xin Lai, Jianhui Liu, Li Jiang, LiWei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, Jiaya Jia
In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance.
Ranked #17 on Semantic Segmentation on ScanNet.
1 code implementation • CVPR 2022 • Andong Wang, Wei-Ning Lee, Xiaojuan Qi
To this end, we propose HIerarchical Neuron concepT explainer (HINT) to effectively build bidirectional associations between neurons and hierarchical concepts in a low-cost and scalable manner.
2 code implementations • CVPR 2022 • Anlin Zheng, Yuang Zhang, Xiangyu Zhang, Xiaojuan Qi, Jian Sun
Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.
Ranked #1 on Object Detection on CrowdHuman.
1 code implementation • CVPR 2022 • Ruifei He, Shuyang Sun, Jihan Yang, Song Bai, Xiaojuan Qi
Large-scale pre-training has been proven to be crucial for various computer vision tasks.
no code implementations • CVPR 2022 • Ruihang Chu, Xiaoqing Ye, Zhengzhe Liu, Xiao Tan, Xiaojuan Qi, Chi-Wing Fu, Jiaya Jia
We explore the way to alleviate the label-hungry problem in a semi-supervised setting for 3D instance segmentation.
2 code implementations • CVPR 2020 • Jin Gao, Yan Lu, Xiaojuan Qi, Yutong Kou, Bing Li, Liang Li, Shan Yu, Weiming Hu
In this paper, we propose a simple yet effective recursive least-squares estimator-aided online learning approach for few-shot online adaptation without requiring offline training.
no code implementations • CVPR 2022 • Yi Zhou, Hui Zhang, Hana Lee, Shuyang Sun, Pingjun Li, Yangguang Zhu, ByungIn Yoo, Xiaojuan Qi, Jae-Joon Han
We encode all panoptic entities in a video, including both foreground instances and background semantics, with a unified representation called panoptic slots.
1 code implementation • 17 Aug 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, Yukang Chen, Lu Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly.
no code implementations • 15 Aug 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi
These specific designs enable the detector to be trained on meticulously refined pseudo labeled target data with denoised training signals, and thus effectively facilitate adapting an object detector to a target domain without requiring annotations.
no code implementations • 2 Aug 2021 • Botos Csaba, Xiaojuan Qi, Arslan Chaudhry, Puneet Dokania, Philip Torr
The key ingredients to our approach are -- (a) mapping the source to the target domain on pixel-level; (b) training a teacher network on the mapped source and the unannotated target domain using adversarial feature alignment; and (c) finally training a student network using the pseudo-labels obtained from the teacher.
1 code implementation • ICCV 2021 • Ruifei He, Jihan Yang, Xiaojuan Qi
In this paper, we present a simple and yet effective Distribution Alignment and Random Sampling (DARS) method to produce unbiased pseudo labels that match the true class distribution estimated from the labeled data.
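The distribution-alignment idea above can be sketched in a few lines. The function name and details below are assumptions, not the paper's code: the sketch keeps only as many pseudo labels per class as the class distribution estimated from labeled data allows, discarding the surplus uniformly at random.

```python
import random
from collections import Counter

def dars_sample(pseudo_labels, target_dist, seed=0):
    """Toy distribution-aligned random sampling: subsample (item, label)
    pairs so the kept pseudo labels match target_dist, a class-probability
    dict estimated from the labeled data."""
    rng = random.Random(seed)
    counts = Counter(lbl for _, lbl in pseudo_labels)
    # the most under-represented class caps the total number we can keep
    total = min(counts[c] / p for c, p in target_dist.items() if p > 0)
    kept = []
    for c, p in target_dist.items():
        pool = [x for x in pseudo_labels if x[1] == c]
        kept += rng.sample(pool, min(len(pool), int(total * p)))
    return kept

# 80 "road" vs 20 "car" pseudo labels, but the labeled data is balanced:
pool = [(i, "road") for i in range(80)] + [(i, "car") for i in range(20)]
kept = dars_sample(pool, {"road": 0.5, "car": 0.5})
print(Counter(lbl for _, lbl in kept))  # 20 road + 20 car: distribution matched
```

Dropping the over-represented class's surplus removes the bias that confident-but-skewed pseudo labels would otherwise inject into training.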
2 code implementations • CVPR 2021 • Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu
Point cloud semantic segmentation often requires large-scale annotated training data, but clearly, point-wise labels are too tedious to prepare.
1 code implementation • CVPR 2021 • Zhengzhe Liu, Xiaojuan Qi, Chi-Wing Fu
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during the training, so the 2D network can infer without requiring 3D data.
2 code implementations • CVPR 2021 • Mutian Xu, Runyu Ding, Hengshuang Zhao, Xiaojuan Qi
The key of PAConv is to construct the convolution kernel by dynamically assembling basic weight matrices stored in Weight Bank, where the coefficients of these weight matrices are self-adaptively learned from point positions through ScoreNet.
Ranked #2 on Point Cloud Segmentation on PointCloud-C.
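The kernel-assembly idea behind PAConv can be sketched as follows. This is a toy stand-in: the random projection standing in for ScoreNet and all the sizes are assumptions, not the official code. The point is that the convolution kernel is not fixed but assembled per point from a bank of basic weight matrices, with coefficients predicted from the point's relative position.

```python
import numpy as np

rng = np.random.default_rng(0)

in_dim, out_dim, bank_size = 3, 8, 4
# Weight Bank: basic weight matrices shared across all points
weight_bank = rng.normal(size=(bank_size, in_dim, out_dim))
score_proj = rng.normal(size=(3, bank_size))  # stand-in for ScoreNet

def score_net(rel_pos):
    """Map a relative point position to normalized coefficients over the
    weight bank (a fixed random projection instead of a learned MLP)."""
    logits = rel_pos @ score_proj
    e = np.exp(logits - logits.max())
    return e / e.sum()

rel_pos = np.array([0.1, -0.2, 0.3])          # neighbor position relative to center
scores = score_net(rel_pos)                   # (bank_size,) convex coefficients
kernel = np.tensordot(scores, weight_bank, 1) # (in_dim, out_dim) assembled kernel
print(kernel.shape)  # (3, 8)
```

Because the coefficients vary with position, each neighbor effectively gets its own kernel while the learnable parameters stay compact in the shared bank.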
no code implementations • 22 Mar 2021 • Chang Liu, Xiaojuan Qi, Edmund Lam, Ngai Wong
Neuromorphic event cameras, which capture the optical changes of a scene, have drawn increasing attention due to their high speed and low power consumption.
1 code implementation • CVPR 2021 • Jihan Yang, Shaoshuai Shi, Zhe Wang, Hongsheng Li, Xiaojuan Qi
Then, the detector is iteratively improved on the target domain by alternatively conducting two steps, which are the pseudo label updating with the developed quality-aware triplet memory bank and the model training with curriculum data augmentation.
no code implementations • ICCV 2021 • Shuyang Sun, Xiaoyu Yue, Xiaojuan Qi, Wanli Ouyang, Victor Adrian Prisacariu, Philip H.S. Torr
Aggregating features from different depths of a network is widely adopted to improve the network capability.
3 code implementations • 20 Dec 2020 • Mutian Xu, Junhao Zhang, Zhipeng Zhou, Mingye Xu, Xiaojuan Qi, Yu Qiao
GDANet introduces Geometry-Disentangle Module to dynamically disentangle point clouds into the contour and flat part of 3D objects, respectively denoted by sharp and gentle variation components.
Ranked #1 on Point Cloud Segmentation on PointCloud-C.
2 code implementations • 13 Dec 2020 • Xiaojuan Qi, Zhengzhe Liu, Renjie Liao, Philip H. S. Torr, Raquel Urtasun, Jiaya Jia
Note that GeoNet++ is generic and can be used in other depth/normal prediction frameworks to improve the quality of 3D reconstruction and pixel-wise accuracy of depth and surface normals.
6 code implementations • CVPR 2021 • Yanwei Li, Hengshuang Zhao, Xiaojuan Qi, LiWei Wang, Zeming Li, Jian Sun, Jiaya Jia
In this paper, we present a conceptually simple, strong, and efficient framework for panoptic segmentation, called Panoptic FCN.
Ranked #1 on Panoptic Segmentation on COCO minival (SQ metric).
1 code implementation • NeurIPS 2020 • Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz
To achieve this, a new word-level discriminator is proposed, which provides the generator with fine-grained training feedback at the word level. This facilitates training a lightweight generator that has a small number of parameters but can still correctly focus on specific visual attributes of an image and edit them without affecting other contents not described in the text.
no code implementations • 23 Oct 2020 • Qichuan Geng, Hong Zhang, Na Jiang, Xiaojuan Qi, Liangjun Zhang, Zhong Zhou
As a consequence, augmenting features with such prior knowledge can effectively improve the classification and localization performance.
2 code implementations • 31 Mar 2020 • Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc van Gool
We propose a novel ECGAN for the challenging semantic image synthesis task.
no code implementations • 12 Feb 2020 • Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz
The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, which allows text descriptions to determine the visual attributes of synthetic images.
1 code implementation • CVPR 2020 • Zhengzhe Liu, Xiaojuan Qi, Philip Torr
In this paper, we conduct an empirical study on fake/real faces, and have two important observations: firstly, the texture of fake faces is substantially different from real ones; secondly, global texture statistics are more robust to image editing and transferable to fake faces from different GANs and datasets.
no code implementations • 19 Jan 2020 • Qichuan Geng, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Zhong Zhou, Gao Huang
Semantic segmentation is a challenging task that needs to handle large scale variations, deformations and different viewpoints.
no code implementations • CVPR 2020 • Qizhu Li, Xiaojuan Qi, Philip H. S. Torr
This panoptic submodule gives rise to a novel propagation mechanism for panoptic logits and enables the network to output a coherent panoptic segmentation map for both "stuff" and "thing" classes, without any post-processing.
1 code implementation • ECCV 2020 • Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz
Such encoded blocks are aggregated by permutation-invariant pooling to make our approach robust to varying action lengths and long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class.
Ranked #7 on Few Shot Action Recognition on Kinetics-100
no code implementations • 18 Dec 2019 • Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin
In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defending against pointwise feature space adversarial perturbations.
3 code implementations • 12 Dec 2019 • Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
The goal of our paper is to semantically edit parts of an image matching a given text that describes desired attributes (e.g., texture, colour, and background), while preserving other contents that are irrelevant to the text.
1 code implementation • ECCV 2020 • Feihu Zhang, Xiaojuan Qi, Ruigang Yang, Victor Prisacariu, Benjamin Wah, Philip Torr
State-of-the-art stereo matching networks have difficulties in generalizing to new unseen environments due to significant domain differences, such as color, illumination, contrast, and texture.
2 code implementations • NeurIPS 2019 • Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, Philip H. S. Torr
In this paper, we propose a novel controllable text-to-image generative adversarial network (ControlGAN), which can effectively synthesise high-quality images and also control parts of the image generation according to natural language descriptions.
Ranked #7 on Text-to-Image Generation on Multi-Modal-CelebA-HQ
2 code implementations • ICCV 2019 • Hao Li, Hong Zhang, Xiaojuan Qi, Ruigang Yang, Gao Huang
Adaptive inference is a promising technique to improve the computational efficiency of deep models at test time.
no code implementations • CVPR 2019 • Xiaojuan Qi, Zhengzhe Liu, Qifeng Chen, Jiaya Jia
A future video is the 2D projection of a 3D scene with predicted camera and object motion.
6 code implementations • 13 Jan 2019 • Patrick Bilic, Patrick Christ, Hongwei Bran Li, Eugene Vorontsov, Avi Ben-Cohen, Georgios Kaissis, Adi Szeskin, Colin Jacobs, Gabriel Efrain Humpire Mamani, Gabriel Chartrand, Fabian Lohöfer, Julian Walter Holch, Wieland Sommer, Felix Hofmann, Alexandre Hostettler, Naama Lev-Cohain, Michal Drozdzal, Michal Marianne Amitai, Refael Vivantik, Jacob Sosna, Ivan Ezhov, Anjany Sekuboyina, Fernando Navarro, Florian Kofler, Johannes C. Paetzold, Suprosanna Shit, Xiaobin Hu, Jana Lipková, Markus Rempfler, Marie Piraud, Jan Kirschke, Benedikt Wiestler, Zhiheng Zhang, Christian Hülsemeyer, Marcel Beetz, Florian Ettlinger, Michela Antonelli, Woong Bae, Míriam Bellver, Lei Bi, Hao Chen, Grzegorz Chlebus, Erik B. Dam, Qi Dou, Chi-Wing Fu, Bogdan Georgescu, Xavier Giró-i-Nieto, Felix Gruen, Xu Han, Pheng-Ann Heng, Jürgen Hesser, Jan Hendrik Moltz, Christian Igel, Fabian Isensee, Paul Jäger, Fucang Jia, Krishna Chaitanya Kaluva, Mahendra Khened, Ildoo Kim, Jae-Hun Kim, Sungwoong Kim, Simon Kohl, Tomasz Konopczynski, Avinash Kori, Ganapathy Krishnamurthi, Fan Li, Hongchao Li, Junbo Li, Xiaomeng Li, John Lowengrub, Jun Ma, Klaus Maier-Hein, Kevis-Kokitsi Maninis, Hans Meine, Dorit Merhof, Akshay Pai, Mathias Perslev, Jens Petersen, Jordi Pont-Tuset, Jin Qi, Xiaojuan Qi, Oliver Rippel, Karsten Roth, Ignacio Sarasua, Andrea Schenk, Zengming Shen, Jordi Torres, Christian Wachinger, Chunliang Wang, Leon Weninger, Jianrong Wu, Daguang Xu, Xiaoping Yang, Simon Chun-Ho Yu, Yading Yuan, Miao Yu, Liping Zhang, Jorge Cardoso, Spyridon Bakas, Rickmer Braren, Volker Heinemann, Christopher Pal, An Tang, Samuel Kadoury, Luc Soler, Bram van Ginneken, Hayit Greenspan, Leo Joskowicz, Bjoern Menze
In this work, we report the set-up and results of the Liver Tumor Segmentation Benchmark (LiTS), which was organized in conjunction with the IEEE International Symposium on Biomedical Imaging (ISBI) 2017 and the International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2017 and 2018.
no code implementations • 7 Jan 2019 • Hong Zhang, Hao Ouyang, Shu Liu, Xiaojuan Qi, Xiaoyong Shen, Ruigang Yang, Jiaya Jia
With this principle, we present two conceptually simple and yet computationally efficient modules, namely Cascade Prediction Fusion (CPF) and Pose Graph Neural Network (PGNN), to exploit underlying contextual information.
Ranked #10 on Pose Estimation on MPII Human Pose
2 code implementations • NeurIPS 2018 • Yi Wang, Xin Tao, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia
In this paper, we propose a generative multi-column network for image inpainting.
no code implementations • ECCV 2018 • Li Jiang, Shaoshuai Shi, Xiaojuan Qi, Jiaya Jia
We propose to add a geometric adversarial loss (GAL).
1 code implementation • CVPR 2018 • Xiaojuan Qi, Renjie Liao, Zhengzhe Liu, Raquel Urtasun, Jiaya Jia
In this paper, we propose Geometric Neural Network (GeoNet) to jointly predict depth and surface normal maps from a single image.
1 code implementation • CVPR 2018 • Ruiyu Li, Kaican Li, Yi-Chun Kuo, Michelle Shu, Xiaojuan Qi, Xiaoyong Shen, Jiaya Jia
We address the problem of image segmentation from natural language descriptions.
1 code implementation • CVPR 2018 • Xiaojuan Qi, Qifeng Chen, Jiaya Jia, Vladlen Koltun
We present a semi-parametric approach to photographic image synthesis from semantic layouts.
no code implementations • 26 Nov 2017 • Pengpeng Liu, Xiaojuan Qi, Pinjia He, Yikang Li, Michael R. Lyu, Irwin King
Image completion has achieved significant progress due to advances in generative adversarial networks (GANs).
2 code implementations • ICCV 2017 • Xiaojuan Qi, Renjie Liao, Jiaya Jia, Sanja Fidler, Raquel Urtasun
Each node in the graph corresponds to a set of points and is associated with a hidden representation vector initialized with an appearance feature extracted by a unary CNN from 2D images.
Ranked #33 on Semantic Segmentation on SUN-RGBD (using extra training data)
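The ICCV 2017 entry above describes propagating hidden representations over a graph whose nodes are sets of points. A minimal sketch of that propagation pattern follows, with mean aggregation and a ReLU update standing in for the paper's gated recurrent update, and with the point coordinates and "unary CNN" features invented here purely for illustration.

```python
import math

def knn_graph(points, k):
    """Adjacency list of the k-nearest-neighbour graph (brute force)."""
    adj = []
    for i, p in enumerate(points):
        d = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        adj.append([j for _, j in sorted(d)[:k]])
    return adj

def propagate(hidden, adj, steps=2):
    """Update each node's hidden vector from its neighbours' states.

    Stand-in for the gated recurrent update in the paper: the new state
    is ReLU(h_i + mean_j h_j) over the k-NN graph, repeated for `steps`.
    """
    for _ in range(steps):
        new = []
        for i, h in enumerate(hidden):
            msgs = [hidden[j] for j in adj[i]]
            agg = [sum(v) / len(msgs) for v in zip(*msgs)]
            new.append([max(0.0, a + b) for a, b in zip(h, agg)])
        hidden = new
    return hidden

# Three clustered points and one distant point, each with a 2-D feature
# playing the role of the appearance feature from the unary CNN.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (5.0, 5.0, 5.0)]
feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
out = propagate(feats, knn_graph(points, k=2))
```

After propagation each node's vector mixes in information from its spatial neighbours, which is the behaviour the segmentation head exploits.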
2 code implementations • 21 Sep 2017 • Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-Wing Fu, Pheng Ann Heng
Our method outperformed other state-of-the-art methods on tumor segmentation and achieved very competitive performance for liver segmentation, even with a single model.
Ranked #1 on Liver Segmentation on LiTS2017 (Dice metric)
17 code implementations • ECCV 2018 • Hengshuang Zhao, Xiaojuan Qi, Xiaoyong Shen, Jianping Shi, Jiaya Jia
We focus on the challenging task of real-time semantic segmentation in this paper.
Ranked #11 on Dichotomous Image Segmentation on DIS-TE4
67 code implementations • CVPR 2017 • Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes.
Ranked #4 on Video Semantic Segmentation on CamVid
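The CVPR 2017 entry above is the Pyramid Scene Parsing Network (PSPNet), whose core component aggregates scene context by pooling features at several grid scales and fusing them with the original map. A minimal single-channel sketch follows; the bin sizes `(1, 2)` and nearest-neighbour upsampling simplify the paper's {1, 2, 3, 6} pyramid with bilinear upsampling.

```python
def avg_pool(fmap, bins):
    """Adaptive average pooling of an H x W map into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for bi in range(bins):
        row = []
        r0, r1 = bi * h // bins, (bi + 1) * h // bins
        for bj in range(bins):
            c0, c1 = bj * w // bins, (bj + 1) * w // bins
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out

def upsample(fmap, h, w):
    """Nearest-neighbour upsampling back to H x W."""
    bh, bw = len(fmap), len(fmap[0])
    return [[fmap[r * bh // h][c * bw // w] for c in range(w)] for r in range(h)]

def pyramid_pool(fmap, bin_sizes=(1, 2)):
    """Concatenate the input map with upsampled pooled maps, channel-wise."""
    h, w = len(fmap), len(fmap[0])
    channels = [fmap]
    for b in bin_sizes:
        channels.append(upsample(avg_pool(fmap, b), h, w))
    return channels  # one H x W map per "channel"

fmap = [[1.0, 2.0], [3.0, 4.0]]
out = pyramid_pool(fmap)
```

The coarse 1x1 channel carries a global scene prior, which is what lets the network resolve locally ambiguous regions in scene parsing.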
no code implementations • CVPR 2016 • Shu Liu, Xiaojuan Qi, Jianping Shi, Hong Zhang, Jiaya Jia
Aiming at simultaneous detection and segmentation (SDS), we propose a proposal-free framework that detects and segments object instances via mid-level patches.
no code implementations • CVPR 2016 • Hao Chen, Xiaojuan Qi, Lequan Yu, Pheng-Ann Heng
The morphology of glands has been used routinely by pathologists to assess the malignancy degree of adenocarcinomas.
Ranked #3 on Optic Disc Segmentation on REFUGE
no code implementations • 1 Mar 2016 • Korsuk Sirinukunwattana, Josien P. W. Pluim, Hao Chen, Xiaojuan Qi, Pheng-Ann Heng, Yun Bo Guo, Li Yang Wang, Bogdan J. Matuszewski, Elia Bruni, Urko Sanchez, Anton Böhm, Olaf Ronneberger, Bassem Ben Cheikh, Daniel Racoceanu, Philipp Kainz, Michael Pfeiffer, Martin Urschler, David R. J. Snead, Nasir M. Rajpoot
Colorectal adenocarcinoma originating in intestinal glandular structures is the most common form of colon cancer.
no code implementations • ICCV 2015 • Xiaojuan Qi, Jianping Shi, Shu Liu, Renjie Liao, Jiaya Jia
In this paper, we propose an object clique potential for semantic segmentation.