Search Results for author: Peihao Chen

Found 19 papers, 11 papers with code

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling Large Language Model +1

Paper
Add Code

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

no code implementations • 16 Jan 2024 • Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan

Human beings possess the capability to multiply a melange of multisensory cues while actively exploring and interacting with the 3D world.

Language Modelling Large Language Model

Paper
Add Code

SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector

1 code implementation • 14 Dec 2023 • Shuailei Ma, Yuefeng Wang, Ying WEI, Jiaqi Fan, Enming Zhang, Xinyu Sun, Peihao Chen

Ablation experiments demonstrate that both of them are effective in mitigating the impact of open-world knowledge distillation on the learning of known objects.

Knowledge Distillation Object +3

Paper
Code

DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning

no code implementations • 10 Dec 2023 • Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Siyuan Zhou, Mingkui Tan, Chuang Gan

In this paper, we propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents by utilizing intrinsic rewards to learn the optimal policy for each agent.

Multi-agent Reinforcement Learning reinforcement-learning +2

Paper
Add Code

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

no code implementations • 6 Nov 2023 • Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan

A communication token is generated by the LLM following a visual entity or a relation, to inform the detection network to propose regions that are relevant to the sentence generated so far.

CoLA Question Answering +5

Paper
Add Code

$A^2$Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models

no code implementations • 15 Aug 2023 • Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction annotation data.

Navigate Robot Navigation +1

Paper
Add Code

3D-LLM: Injecting the 3D World into Large Language Models

5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects

3D Question Answering (3D-QA) Dense Captioning +1

747

Paper
Code

Learning Vision-and-Language Navigation from YouTube Videos

1 code implementation • ICCV 2023 • Kunyang Lin, Peihao Chen, Diwei Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

In this paper, we propose to learn an agent from these videos by creating a large-scale dataset which comprises reasonable path-instruction pairs from house tour videos and pre-training the agent on it.

Navigate Vision and Language Navigation

Paper
Code

Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition

1 code implementation • 20 Jul 2023 • Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu

Although PTMs shed new light on artificial general intelligence, they are constructed with general tasks in mind, and thus, their efficacy for specific tasks can be further improved.

Speech Emotion Recognition

Paper
Code

Detecting the open-world objects with the help of the Brain

1 code implementation • 21 Mar 2023 • Shuailei Ma, Yuefeng Wang, Ying WEI, Peihao Chen, Zhixiang Ye, Jiaqi Fan, Enming Zhang, Thomas H. Li

We propose leveraging the VL as the ``Brain'' of the open-world detector by simply generating unknown labels.

Object object-detection +1

Paper
Code

Learning Active Camera for Multi-Object Navigation

no code implementations • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Weiwen Hu, Wenbing Huang, Thomas H. Li, Mingkui Tan, Chuang Gan

How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics.

Navigate Object

Paper
Add Code

Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

1 code implementation • 14 Oct 2022 • Peihao Chen, Dongyu Ji, Kunyang Lin, Runhao Zeng, Thomas H. Li, Mingkui Tan, Chuang Gan

To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both spatial location and the semantic information of the environment objects.

Navigate Vision and Language Navigation

Paper
Code

Masked Motion Encoding for Self-Supervised Video Representation Learning

2 code implementations • CVPR 2023 • Xinyu Sun, Peihao Chen, LiangWei Chen, Changhao Li, Thomas H. Li, Mingkui Tan, Chuang Gan

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.

Ranked #2 on Self-Supervised Action Recognition on HMDB51

Optical Flow Estimation Representation Learning +2

Paper
Code

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

1 code implementation • 27 Oct 2020 • Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition.

Ranked #11 on Self-Supervised Action Recognition on UCF101

Representation Learning Retrieval +2

Paper
Code

Location-aware Graph Convolutional Networks for Video Question Answering

1 code implementation • 7 Aug 2020 • Deng Huang, Peihao Chen, Runhao Zeng, Qing Du, Mingkui Tan, Chuang Gan

In this work, we propose to represent the contents in the video as a location-aware graph by incorporating the location information of an object into the graph construction.

Action Recognition graph construction +3

Paper
Code

Foley Music: Learning to Generate Music from Videos

no code implementations • ECCV 2020 • Chuang Gan, Deng Huang, Peihao Chen, Joshua B. Tenenbaum, Antonio Torralba

In this paper, we introduce Foley Music, a system that can synthesize plausible music for a silent video clip about people playing musical instruments.

Music Generation Translation

Paper
Add Code

Generating Visually Aligned Sound from Videos

1 code implementation • 14 Jul 2020 • Peihao Chen, Yang Zhang, Mingkui Tan, Hongdong Xiao, Deng Huang, Chuang Gan

During testing, the audio forwarding regularizer is removed to ensure that REGNET can produce purely aligned sound only from visual features.

Paper
Code

Dense Regression Network for Video Grounding

1 code implementation • CVPR 2020 • Runhao Zeng, Haoming Xu, Wenbing Huang, Peihao Chen, Mingkui Tan, Chuang Gan

The key idea of this paper is to use the distances between the frame within the ground truth and the starting (ending) frame as dense supervisions to improve the video grounding accuracy.

Ranked #6 on Natural Language Moment Retrieval on ActivityNet Captions

Natural Language Moment Retrieval Natural Language Queries +2

Paper
Code

Self-supervised Moving Vehicle Tracking with Stereo Sound

no code implementations • ICCV 2019 • Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

At test time, the stereo-sound student network can work independently to perform object localization us-ing just stereo audio and camera meta-data, without any visual input.

Object Localization Visual Localization

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.