Search Results for author: Na Zhao

Found 37 papers, 16 papers with code

Scene-R1: Video-Grounded Large Language Models for 3D Scene Reasoning without 3D Annotations

no code implementations 21 Jun 2025 Zhihao Yuan, Shuyi Jiang, Chun-Mei Feng, Yaolun Zhang, Shuguang Cui, Zhen Li, Na Zhao

We introduce Scene-R1, a video-grounded framework that learns to reason about 3D scenes without any point-wise 3D instance supervision by pairing reinforcement-learning-driven reasoning with a two-stage grounding pipeline.

Question Answering Scene Understanding +1

How Do Images Align and Complement LiDAR? Towards a Harmonized Multi-modal 3D Panoptic Segmentation

1 code implementation 25 May 2025 Yining Pan, Qiongjie Cui, Xulei Yang, Na Zhao

LiDAR-based 3D panoptic segmentation often struggles with the inherent sparsity of data from LiDAR sensors, which makes it challenging to accurately recognize distant or small objects.

3D Panoptic Segmentation Data Augmentation +3

Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection

no code implementations CVPR 2025 Jiangyi Wang, Na Zhao

Active learning has emerged as a promising approach to reduce the substantial annotation burden in 3D object detection tasks, spurring several initiatives in outdoor environments.

3D Object Detection Active Learning +2

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

no code implementations 16 Jan 2025 Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang

3D visual grounding (3DVG), which aims to correlate a natural language description with the target object within a 3D scene, is a significant yet challenging task.

3D visual grounding Decoder +2

A Closer Look on Gender Stereotypes in Movie Recommender Systems and Their Implications with Privacy

1 code implementation 8 Jan 2025 Falguni Roy, Yiduo Shen, Na Zhao, Xiaofeng Ding, Md. Omar Faruk

In the second phase, four inference algorithms were applied to detect gender stereotypes by combining the findings from the first phase with users' feedback data.

Attribute Recommendation Systems

Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration

no code implementations CVPR 2025 Lizheng Zu, Lin Lin, Song Fu, Na Zhao, Pan Zhou

Embodied agents based on large language models (LLMs) face significant challenges in collaborative tasks, requiring effective communication and reasonable division of labor to ensure efficient and correct task completion.

Provably Secure Robust Image Steganography via Cross-Modal Error Correction

no code implementations 15 Dec 2024 Yuang Qi, Kejiang Chen, Na Zhao, Zijin Yang, Weiming Zhang

To leverage provably secure steganography with more effective and high-performance image generation models, and to ensure that stego images can accurately extract secret messages even after being uploaded to social networks and subjected to lossy processing such as JPEG compression, we propose a high-quality, provably secure, and robust image steganography method based on state-of-the-art autoregressive (AR) image generation models using Vector-Quantized (VQ) tokenizers.

Image Generation Image Steganography

Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization

no code implementations 5 Nov 2024 Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang

In this paper, we propose a novel learning approach based on domain expansion and boundary growth to expand the scarce source samples and enlarge the boundaries across the known classes that indirectly broaden the boundary between the known and unknown classes.

Image Classification +1

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians

no code implementations 2 Oct 2024 Shuyi Jiang, QiHao Zhao, Hossein Rahmani, De Wen Soh, Jun Liu, Na Zhao

In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity.

3D Reconstruction

EventHallusion: Diagnosing Event Hallucinations in Video LLMs

1 code implementation 25 Sep 2024 Jiacheng Zhang, Yang Jiao, Shaoxiang Chen, Na Zhao, Jingjing Chen

To mitigate this gap, we propose EventHallusion, a novel benchmark that focuses on assessing VideoLLMs' hallucination about events, the crux of video analysis.

Hallucination Instruction Following

On-the-fly Point Feature Representation for Point Clouds Analysis

no code implementations 31 Jul 2024 Jiangyi Wang, Zhongyao Cheng, Na Zhao, Jun Cheng, Xulei Yang

In this paper, we propose On-the-fly Point Feature Representation (OPFR), which captures abundant geometric information explicitly through a Curve Feature Generator module.

Semantic Segmentation

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

no code implementations 7 Jul 2024 Pengkun Jiao, Na Zhao, Jingjing Chen, Yu-Gang Jiang

Open-vocabulary 3D object detection (OV-3DDet) aims to localize and recognize both seen and previously unseen object categories within any new 3D scene.

3D Object Detection Object +1

Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection

1 code implementation 17 Jun 2024 Yunsong Wang, Na Zhao, Gim Hee Lee

Our approach includes an object-aware augmentation strategy to effectively diversify the source domain data, and we introduce a two-branch adaptation framework consisting of an adversarial training branch and a pseudo labeling branch, in order to simultaneously reach holistic-level and class-level domain alignment.

3D Object Detection Object +2

Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding

no code implementations 17 Jun 2024 Yunsong Wang, Na Zhao, Gim Hee Lee

The field of self-supervised 3D representation learning has emerged as a promising solution to alleviate the challenge presented by the scarcity of extensive, well-annotated datasets.

3D Object Detection 3D Semantic Segmentation +4

CT3D++: Improving 3D Object Detection with Keypoint-induced Channel-wise Transformer

1 code implementation 12 Jun 2024 Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Qiao Liang, Min-Jian Zhao, Jieping Ye

Firstly, we propose CT3D, which sequentially performs raw-point-based embedding, a standard Transformer encoder, and a channel-wise decoder for point features within each proposal.

3D Object Detection Decoder +1

Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation

no code implementations 25 May 2024 Huizhou Chen, Jiangyi Wang, Yuxin Li, Na Zhao, Jun Cheng, Xulei Yang

3D environment recognition is essential for autonomous driving systems, as autonomous vehicles require a comprehensive understanding of surrounding scenes.

Autonomous Driving Semantic Segmentation

Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning

no code implementations 19 Apr 2024 Yian Li, Wentao Tian, Yang Jiao, Jingjing Chen, Tianwen Qian, Bin Zhu, Na Zhao, Yu-Gang Jiang

Recently, Multimodal Large Language Models (MLLMs) have achieved significant success across multiple disciplines due to their exceptional instruction-following capabilities and extensive world knowledge.

Benchmarking counterfactual +5

View-Consistent 3D Editing with Gaussian Splatting

no code implementations 18 Mar 2024 Yuxuan Wang, Xuanyu Yi, Zike Wu, Na Zhao, Long Chen, Hanwang Zhang

However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS.

3DGS

Dual-Perspective Knowledge Enrichment for Semi-Supervised 3D Object Detection

1 code implementation 10 Jan 2024 Yucheng Han, Na Zhao, Weiling Chen, Keng Teck Ma, Hanwang Zhang

Our DPKE enriches the knowledge of limited training data, particularly unlabeled data, from two perspectives: data-perspective and feature-perspective.

3D Object Detection Data Augmentation +2

LASO: Language-guided Affordance Segmentation on 3D Object

1 code implementation CVPR 2024 Yicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-Seng Chua

In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object relevant to a given affordance question.

Object Segmentation

Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

1 code implementation ICCV 2023 Yating Xu, Conghui Hu, Na Zhao, Gim Hee Lee

Existing fully-supervised point cloud segmentation methods suffer in the dynamic testing environment with emerging new classes.

Point Cloud Segmentation Segmentation

Towards Robust Few-shot Point Cloud Semantic Segmentation

1 code implementation 20 Sep 2023 Yating Xu, Na Zhao, Gim Hee Lee

Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples.

Point Cloud Segmentation Representation Learning +1

Style-Hallucinated Dual Consistency Learning: A Unified Framework for Visual Domain Generalization

1 code implementation 18 Dec 2022 Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee

Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning.

Domain Generalization Hallucination +5

Synthetic-to-Real Domain Generalized Semantic Segmentation for 3D Indoor Point Clouds

no code implementations 9 Dec 2022 Yuyang Zhao, Na Zhao, Gim Hee Lee

In addition, we augment the point patterns of the source data and introduce non-parametric multi-prototypes to ameliorate the intra-class variance enlarged by the augmented point patterns.

Domain Generalization Semantic Segmentation

Myopia prediction for adolescents via time-aware deep learning

no code implementations 26 Sep 2022 Junjia Huang, Wei Ma, Rong Li, Na Zhao, Tao Zhou

Result: The mean absolute prediction error on the testing set was 0.273-0.257 for spherical equivalent, ranging from 0.189-0.160 to 0.596-0.473 if we consider different lengths of historical records and different prediction durations.

Deep Learning Prediction +2

Rethinking IoU-based Optimization for Single-stage 3D Object Detection

1 code implementation 19 Jul 2022 Hualian Sheng, Sijia Cai, Na Zhao, Bing Deng, Jianqiang Huang, Xian-Sheng Hua, Min-Jian Zhao, Gim Hee Lee

Since Intersection-over-Union (IoU) based optimization maintains the consistency of the final IoU prediction metric and losses, it has been widely used in both regression and classification branches of single-stage 2D object detectors.

3D Object Detection Object +1

Style-Hallucinated Dual Consistency Learning for Domain Generalized Semantic Segmentation

2 code implementations 6 Apr 2022 Yuyang Zhao, Zhun Zhong, Na Zhao, Nicu Sebe, Gim Hee Lee

Furthermore, we present a novel style hallucination module (SHM) to generate style-diversified samples that are essential to consistency learning.

Domain Generalization Hallucination +3

Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection

no code implementations 14 Dec 2021 Na Zhao, Gim Hee Lee

Deep learning-based approaches have shown remarkable performance in the 3D object detection task.

3D Object Detection Incremental Learning +2

URIR: Recommendation algorithm of user RNN encoder and item encoder based on knowledge graph

no code implementations 1 Nov 2021 Na Zhao, Zhen Long, Zhi-Dan Zhao, Jian Wang

This implies that URIR can effectively use the knowledge graph to obtain better user codes and item codes, thereby obtaining better recommendation results.

Knowledge Graphs Recommendation Systems

Few-shot 3D Point Cloud Semantic Segmentation

1 code implementation CVPR 2021 Na Zhao, Tat-Seng Chua, Gim Hee Lee

These fully supervised approaches heavily rely on large amounts of labeled training data that are difficult to obtain and cannot segment new classes after training.

Few-shot 3D Point Cloud Semantic Segmentation Segmentation +1

SESS: Self-Ensembling Semi-Supervised 3D Object Detection

1 code implementation CVPR 2020 Na Zhao, Tat-Seng Chua, Gim Hee Lee

The performance of existing point cloud-based 3D object detection methods heavily relies on large-scale high-quality 3D annotations.

3D Object Detection Image Classification +3

PS^2-Net: A Locally and Globally Aware Network for Point-Based Semantic Segmentation

1 code implementation 15 Aug 2019 Na Zhao, Tat-Seng Chua, Gim Hee Lee

In this paper, we present the PS^2-Net -- a locally and globally aware deep learning framework for semantic segmentation on 3D scene-level point clouds.

Scene Segmentation Segmentation
