no code implementations • 21 Jun 2025 • Zhihao Yuan, Shuyi Jiang, Chun-Mei Feng, Yaolun Zhang, Shuguang Cui, Zhen Li, Na Zhao
We introduce Scene-R1, a video-grounded framework that learns to reason about 3D scenes without any point-wise 3D instance supervision by pairing reinforcement-learning-driven reasoning with a two-stage grounding pipeline.
1 code implementation • 25 May 2025 • Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao
LiDAR-based 3D panoptic segmentation often struggles with the inherent sparsity of data from LiDAR sensors, which makes it challenging to accurately recognize distant or small objects.
no code implementations • CVPR 2025 • Jiangyi Wang, Na Zhao
Active learning has emerged as a promising approach to reduce the substantial annotation burden in 3D object detection tasks, spurring several initiatives in outdoor environments.
no code implementations • 4 Feb 2025 • Ziyan Guo, Zeyu Hu, Na Zhao, De Wen Soh
Human motion generation and editing are key components of computer graphics and vision.
no code implementations • 16 Jan 2025 • Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang
3D visual grounding (3DVG), which aims to correlate a natural language description with the target object within a 3D scene, is a significant yet challenging task.
no code implementations • 8 Jan 2025 • Yongjia Ma, Junlin Chen, Donglin Di, Qi Xie, Lei Fan, Wei Chen, Xiaofei Gou, Na Zhao, Xun Yang
Creating high-fidelity, coherent long videos is a sought-after aspiration.
1 code implementation • 8 Jan 2025 • Falguni Roy, Yiduo Shen, Na Zhao, Xiaofeng Ding, Md. Omar Faruk
In the second phase, four inference algorithms were applied to detect gender stereotypes by combining the findings from the first phase with users' feedback data.
no code implementations • CVPR 2025 • Lizheng Zu, Lin Lin, Song Fu, Na Zhao, Pan Zhou
Embodied agents based on large language models (LLMs) face significant challenges in collaborative tasks, requiring effective communication and reasonable division of labor to ensure efficient and correct task completion.
no code implementations • 15 Dec 2024 • Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang
To leverage provably secure steganography with more effective and high-performance image generation models, and to ensure that stego images can accurately extract secret messages even after being uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models using Vector-Quantized (VQ) tokenizers.
no code implementations • 5 Nov 2024 • Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang
In this paper, we propose a novel learning approach based on domain expansion and boundary growth to expand the scarce source samples and enlarge the boundaries across the known classes, which indirectly broadens the boundary between the known and unknown classes.
no code implementations • 2 Oct 2024 • Shuyi Jiang, QiHao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao
In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity.
1 code implementation • 25 Sep 2024 • Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Jingjing Chen
To mitigate this gap, we propose EventHallusion, a novel benchmark that focuses on assessing VideoLLMs' hallucination toward events, the crux of video analysis.
no code implementations • 31 Jul 2024 • Jiangyi Wang, Zhongyao Cheng, Na Zhao, Jun Cheng, Xulei Yang
In this paper, we propose On-the-fly Point Feature Representation (OPFR), which explicitly captures abundant geometric information through a Curve Feature Generator module.
no code implementations • 7 Jul 2024 • Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang
Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene.
1 code implementation • 17 Jun 2024 • Yunsong Wang, Na Zhao, Gim Hee Lee
Our approach includes an object-aware augmentation strategy to effectively diversify the source domain data, and we introduce a two-branch adaptation framework consisting of an adversarial training branch and a pseudo labeling branch, in order to simultaneously reach holistic-level and class-level domain alignment.
no code implementations • 17 Jun 2024 • Yunsong Wang, Na Zhao, Gim Hee Lee
The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets.
1 code implementation • 12 Jun 2024 • Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye
Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal.
no code implementations • 25 May 2024 • Huizhou Chen, Jiangyi Wang, Yuxin Li, Na Zhao, Jun Cheng, Xulei Yang
3D environment recognition is essential for autonomous driving systems, as autonomous vehicles require a comprehensive understanding of surrounding scenes.
no code implementations • 19 Apr 2024 • Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Tianwen Qian, Bin Zhu, Na Zhao, Yu-Gang Jiang
Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge.
no code implementations • 18 Mar 2024 • Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang
However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS.
1 code implementation • 10 Jan 2024 • Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang
Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.
1 code implementation • CVPR 2024 • Yicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-Seng Chua
In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object relevant to a given affordance question.
1 code implementation • ICCV 2023 • Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee
Existing fully-supervised point cloud segmentation methods suffer in the dynamic testing environment with emerging new classes.
1 code implementation • 20 Sep 2023 • Yating Xu, Na Zhao, Gim Hee Lee
Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples.
1 code implementation • 18 Dec 2022 • Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee
Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning.
no code implementations • 9 Dec 2022 • Yuyang Zhao, Na Zhao, Gim Hee Lee
In addition, we augment the point patterns of the source data and introduce non-parametric multi-prototypes to ameliorate the intra-class variance enlarged by the augmented point patterns.
no code implementations • 26 Sep 2022 • Junjia Huang, Wei Ma, Rong Li, Na Zhao, Tao Zhou
Result: The mean absolute prediction error on the testing set was 0.273-0.257 for spherical equivalent, ranging from 0.189-0.160 to 0.596-0.473 if we consider different lengths of historical records and different prediction durations.
1 code implementation • 19 Jul 2022 • Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao, Gim Hee Lee
Since Intersection-over-Union (IoU) based optimization maintains the consistency of the final IoU prediction metric and losses, it has been widely used in both regression and classification branches of single-stage 2D object detectors.
2 code implementations • 6 Apr 2022 • Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee
Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning.
Ranked #8 on Robust Object Detection on DWD
no code implementations • 14 Dec 2021 • Na Zhao, Gim Hee Lee
Deep learning-based approaches have shown remarkable performance in the 3D object detection task.
no code implementations • 1 Nov 2021 • Na Zhao, Zhen Long, Zhi-Dan Zhao, Jian Wang
This implies that URIR can effectively use the knowledge graph to obtain better user codes and item codes, thereby obtaining better recommendation results.
1 code implementation • CVPR 2021 • Na Zhao, Tat-Seng Chua, Gim Hee Lee
These fully supervised approaches heavily rely on large amounts of labeled training data that are difficult to obtain and cannot segment new classes after training.
Few-shot 3D Point Cloud Semantic Segmentation • Segmentation
1 code implementation • CVPR 2020 • Na Zhao, Tat-Seng Chua, Gim Hee Lee
The performance of existing point cloud-based 3D object detection methods heavily relies on large-scale high-quality 3D annotations.
1 code implementation • 3 Sep 2019 • GuanXiong Luo, Na Zhao, Wenhao Jiang, Edward S. Hui, Peng Cao
Purpose: To develop a deep learning-based Bayesian inference for MRI reconstruction.
no code implementations • 27 Aug 2019 • Yang Liu, Runnan He, Kuanquan Wang, Qince Li, Qiang Sun, Na Zhao, Henggui Zhang
Heart disease is one of the most common diseases causing morbidity and mortality.
1 code implementation • 15 Aug 2019 • Na Zhao, Tat-Seng Chua, Gim Hee Lee
In this paper, we present the PS^2-Net -- a locally and globally aware deep learning framework for semantic segmentation on 3D scene-level point clouds.
no code implementations • Frontiers in Physiology 2018 • Runnan He, Kuanquan Wang, Na Zhao, Yang Liu, Yongfeng Yuan, Qince Li, Henggui Zhang
The proposed method analyzed the time-frequency features of the electrocardiogram (ECG) and thus differs from conventional AF detection methods that isolate atrial or ventricular activities.
Ranked #2 on Atrial Fibrillation Detection on MIT-BIH AF