Search Results for author: Sijin Chen

Found 7 papers, 5 papers with code

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

1 code implementation 17 Dec 2023 Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.

Instruction Following

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

1 code implementation 30 Nov 2023 Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene.

3D dense captioning, Dense Captioning +1

Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression

no code implementations 29 Oct 2023 Sijin Chen, Zhize Li, Yuejie Chi

To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions.

Federated Learning

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

1 code implementation 6 Sep 2023 Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture.

3D dense captioning, Caption Generation +4

End-to-End 3D Dense Captioning with Vote2Cap-DETR

1 code implementation CVPR 2023 Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu

Compared with prior arts, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote query driven object decoder, and a caption decoder that produces the dense captions in a set-prediction manner.

3D dense captioning, Dense Captioning +1

Non-Convex Joint Community Detection and Group Synchronization via Generalized Power Method

no code implementations 28 Dec 2021 Sijin Chen, Xiwei Cheng, Anthony Man-Cho So

This paper proposes a Generalized Power Method (GPM) to tackle the problem of community detection and group synchronization simultaneously in a direct non-convex manner.

Community Detection, Stochastic Block Model

CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud

1 code implementation 5 Dec 2020 Wu Zheng, Weiliang Tang, Sijin Chen, Li Jiang, Chi-Wing Fu

Existing single-stage detectors for locating objects in point clouds often treat object localization and category classification as separate tasks, so the localization accuracy and classification confidence may not well align.

3D Object Detection, Birds Eye View Object Detection +3
