PointCloud-Text Matching: Benchmark Datasets and a Baseline

no code implementations 28 Mar 2024 Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu

We observe that the data is challenging and contains noisy correspondence due to the sparsity, noise, or disorder of point clouds and the ambiguity, vagueness, or incompleteness of texts, which makes existing cross-modal matching methods ineffective for PTM.

Contrastive Learning Retrieval +1

Contributing Dimension Structure of Deep Feature for Coreset Selection

1 code implementation 29 Jan 2024 Zhijing Wan, Zhixiang Wang, Yuran Wang, Zheng Wang, Hongyuan Zhu, Shin'ichi Satoh

Existing methods typically measure both the representation and diversity of data based on similarity metrics, such as L2-norm.

Direct Distillation between Different Domains

no code implementations 12 Jan 2024 Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama

Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data.

Domain Adaptation Knowledge Distillation

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts

1 code implementation 17 Dec 2023 Mingsheng Li, Xin Chen, Chi Zhang, Sijin Chen, Hongyuan Zhu, Fukun Yin, Gang Yu, Tao Chen

Furthermore, we establish a new benchmark for assessing the performance of large models in understanding multi-modal 3D prompts.

Instruction Following

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

1 code implementation 30 Nov 2023 Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen

However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud 3D representations of the 3D scene.

3D dense captioning Dense Captioning +1

Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition

no code implementations 10 Oct 2023 Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

We attribute this issue to the inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space.

Attribute Contrastive Learning +2

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning

1 code implementation 6 Sep 2023 Sijin Chen, Hongyuan Zhu, Mingsheng Li, Xin Chen, Peng Guo, Yinjie Lei, Gang Yu, Taihao Li, Tao Chen

Moreover, we argue that object localization and description generation require different levels of scene understanding, which could be challenging for a shared set of queries to capture.

3D dense captioning Caption Generation +4

Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

no code implementations 19 Jul 2023 Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms.

Human Activity Recognition Self-Supervised Learning

An Overview of Challenges in Egocentric Text-Video Retrieval

no code implementations 7 Jun 2023 Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Text-video retrieval contains various challenges, including biases coming from diverse sources.

Retrieval Video Retrieval

Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

no code implementations 20 Apr 2023 Haoyang Peng, Baopu Li, Bo Zhang, Xin Chen, Tao Chen, Hongyuan Zhu

Then, a novel multi-view prompt fusion module is developed to effectively fuse information from different views to bridge the gap between 3D point cloud data and 2D pre-trained models.

Autonomous Driving Classification +3

A Closer Look at Few-Shot 3D Point Cloud Classification

1 code implementation 31 Mar 2023 Chuangguan Ye, Hongyuan Zhu, Bo Zhang, Tao Chen

In recent years, research on few-shot learning (FSL) has been fast-growing in the 2D image domain due to its lower requirement for labeled training data and greater generalization to novel classes.

Few-Shot 3D Point Cloud Classification Few-Shot Learning +1

What Makes for Effective Few-shot Point Cloud Classification?

1 code implementation 31 Mar 2023 Chuangguan Ye, Hongyuan Zhu, Yongbin Liao, Yanggang Zhang, Tao Chen, Jiayuan Fan

Due to the emergence of powerful computing resources and large-scale annotated datasets, deep learning has seen wide applications in our daily life.

Benchmarking Classification +2

End-to-End 3D Dense Captioning with Vote2Cap-DETR

1 code implementation CVPR 2023 Sijin Chen, Hongyuan Zhu, Xin Chen, Yinjie Lei, Tao Chen, Gang Yu

Compared with prior art, our framework has several appealing advantages: 1) Without resorting to numerous hand-crafted components, our method is based on a full transformer encoder-decoder architecture with a learnable vote query driven object decoder, and a caption decoder that produces the dense captions in a set-prediction manner.

3D dense captioning Dense Captioning +1

Rethinking Image Super Resolution From Long-Tailed Distribution Learning Perspective

no code implementations CVPR 2023 Yuanbiao Gou, Peng Hu, Jiancheng Lv, Hongyuan Zhu, Xi Peng

Existing studies have empirically observed that the resolution of the low-frequency region is easier to enhance than that of the high-frequency one.

Image Super-Resolution

Zero-Shot Point Cloud Segmentation by Semantic-Visual Aware Synthesis

1 code implementation ICCV 2023 Yuwei Yang, Munawar Hayat, Zhao Jin, Hongyuan Zhu, Yinjie Lei

Given only the class-level semantic information for unseen objects, we strive to enhance the correspondence, alignment and consistency between the visual and semantic spaces, to synthesise diverse, generic and transferable visual features.

Point Cloud Segmentation Segmentation +2

RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval

1 code implementation 26 Jun 2022 Burak Satar, Hongyuan Zhu, Hanwang Zhang, Joo Hwee Lim

Most methods consider only one joint embedding space between global visual and textual features without considering the local structures of each modality.

Retrieval Text to Video Retrieval +1

Semantic Role Aware Correlation Transformer for Text to Video Retrieval

1 code implementation 26 Jun 2022 Burak Satar, Hongyuan Zhu, Xavier Bresson, Joo Hwee Lim

With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content with a language query becomes critical.

Retrieval Text to Video Retrieval +1

OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

no code implementations 23 May 2022 Peng Hu, Xi Peng, Hongyuan Zhu, Mohamed M. Sabry Aly, Jie Lin

Numerous network compression methods such as pruning and quantization are proposed to reduce the model size significantly, of which the key is to find a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer.


CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow

1 code implementation CVPR 2022 Xiuchao Sui, Shaohua Li, Xue Geng, Yan Wu, Xinxing Xu, Yong Liu, Rick Goh, Hongyuan Zhu

This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images.

Optical Flow Estimation

Hierarchical Point Cloud Encoding and Decoding with Lightweight Self-Attention based Model

no code implementations 13 Feb 2022 En Yen Puang, Hao Zhang, Hongyuan Zhu, Wei Jing

In this paper we present SA-CNN, a hierarchical and lightweight self-attention based encoding and decoding architecture for representation learning of point cloud data.

Representation Learning Retrieval

Point Cloud Instance Segmentation with Semi-supervised Bounding-Box Mining

1 code implementation 30 Nov 2021 Yongbin Liao, Hongyuan Zhu, Yanggang Zhang, Chuangguan Ye, Tao Chen, Jiayuan Fan

For stage two, the bounding box proposals with SPCR are grouped into some subsets, and the instance masks are mined inside each subset with a novel semantic propagation module and a property consistency graph module.

Instance Segmentation Semantic Segmentation

A Survey of Embodied AI: From Simulators to Research Tasks

no code implementations 8 Mar 2021 Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan

This paper aims to provide an encyclopedic survey for the field of embodied AI, from its simulators to its research.

Embodied Question Answering Question Answering +1

Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning

no code implementations 11 Dec 2019 Tianying Wang, Hao Zhang, Wei Qi Toh, Hongyuan Zhu, Cheston Tan, Yan Wu, Yong Liu, Wei Jing

The proposed method is able to efficiently generalize the previously learned task by model fusion to solve the environment adaptation problem.

reinforcement-learning Reinforcement Learning (RL)

Cross-channel Communication Networks

1 code implementation NeurIPS 2019 Jianwei Yang, Zhile Ren, Chuang Gan, Hongyuan Zhu, Devi Parikh

Convolutional neural networks process input data by sending channel-wise feature response maps to subsequent layers.

6D Pose Estimation with Correlation Fusion

no code implementations 24 Sep 2019 Yi Cheng, Hongyuan Zhu, Ying Sun, Cihan Acar, Wei Jing, Yan Wu, Liyuan Li, Cheston Tan, Joo-Hwee Lim

To the best of our knowledge, this is the first work to explore effective intra- and inter-modality fusion in 6D pose estimation.

6D Pose Estimation 6D Pose Estimation using RGB

Scene Text Synthesis for Efficient and Effective Deep Network Training

no code implementations 26 Jan 2019 Changgong Zhang, Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Experiments over a number of public datasets demonstrate the effectiveness of our proposed image synthesis technique: using our synthesized images in deep network training achieves similar or even better scene text detection and scene text recognition performance compared with using real images.

Image Generation Scene Text Detection +2

Spatial Fusion GAN for Image Synthesis

no code implementations CVPR 2019 Fangneng Zhan, Hongyuan Zhu, Shijian Lu

Recent advances in generative adversarial networks (GANs) have shown great potential in realistic image synthesis, whereas most existing works address synthesis realism in either appearance space or geometry space but few in both.

Image Generation

Holistic Multi-modal Memory Network for Movie Question Answering

no code implementations 12 Nov 2018 Anran Wang, Anh Tuan Luu, Chuan-Sheng Foo, Hongyuan Zhu, Yi Tay, Vijay Chandrasekhar

In this paper, we present the Holistic Multi-modal Memory Network (HMMN) framework which fully considers the interactions between different input sources (multi-modal context, question) in each hop.

Question Answering Retrieval +1

XAI Beyond Classification: Interpretable Neural Clustering

no code implementations 22 Aug 2018 Xi Peng, Yunnan Li, Ivor W. Tsang, Hongyuan Zhu, Jiancheng Lv, Joey Tianyi Zhou

The second is implementing discrete $k$-means with a differentiable neural network that embraces the advantages of parallel computing, online clustering, and clustering-favorable representation learning.

Classification Clustering +3

TORNADO: A Spatio-Temporal Convolutional Regression Network for Video Action Proposal

no code implementations ICCV 2017 Hongyuan Zhu, Romain Vial, Shijian Lu

Recently, the regression-based object detectors and long-term recurrent convolutional network (LRCN) have demonstrated superior performance in human action detection and recognition.

Action Detection regression

YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

no code implementations 26 Jun 2017 Hongyuan Zhu, Romain Vial, Shijian Lu, Yonghong Tian, Xian-Bin Cao

In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatial-temporal video tube that potentially locates one human action.

Optical Flow Estimation regression

Discriminative Multi-Modal Feature Fusion for RGBD Indoor Scene Recognition

no code implementations CVPR 2016 Hongyuan Zhu, Jean-Baptiste Weibel, Shijian Lu

RGBD scene recognition has attracted increasing attention due to the rapid development of depth sensors and their wide application scenarios.

Image Segmentation Object Recognition +3

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations 16 Jul 2015 Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object object-detection +1

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations 3 Feb 2015 Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision.

Image Segmentation Segmentation +1

