Search Results for author: Shijia Huang

Found 8 papers, 7 papers with code

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

1 code implementation • 5 Dec 2023 Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities.

CLEVA: Chinese Language Models EVAluation Platform

1 code implementation • 9 Aug 2023 Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, LiWei Wang

With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue.

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

1 code implementation • 28 Nov 2022 Shilong Liu, Yaoyuan Liang, Feng Li, Shijia Huang, Hao Zhang, Hang Su, Jun Zhu, Lei Zhang

As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction.

Object Detection +4
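The dual-query formulation lends itself to a compact illustration. Below is a minimal PyTorch sketch of the idea, with illustrative layer sizes and a plain similarity-based phrase mask; it shows the shape of the design (separate query sets probing image and text features, feeding a box head and a 1D phrase-mask head), not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DualQueryHead(nn.Module):
    """Toy sketch of the dual-query idea: each query slot has an
    image-probing part and a text-probing part, driving two heads --
    object box prediction and a 1D phrase mask over text tokens.
    Dimensions and layers here are assumptions, not DQ-DETR's."""

    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        self.img_queries = nn.Embedding(num_queries, d_model)  # probe image features
        self.txt_queries = nn.Embedding(num_queries, d_model)  # probe text features
        self.img_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)
        self.txt_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)
        self.box_head = nn.Linear(d_model, 4)  # (cx, cy, w, h) per query

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_img, d), txt_feats: (B, N_txt, d)
        B = img_feats.size(0)
        iq = self.img_queries.weight.unsqueeze(0).expand(B, -1, -1)
        tq = self.txt_queries.weight.unsqueeze(0).expand(B, -1, -1)
        iq, _ = self.img_attn(iq, img_feats, img_feats)  # image-side queries
        tq, _ = self.txt_attn(tq, txt_feats, txt_feats)  # text-side queries
        boxes = self.box_head(iq).sigmoid()              # (B, Q, 4) object prediction
        # phrase mask as query-token similarity: (B, Q, N_txt)
        phrase_mask = torch.einsum("bqd,btd->bqt", tq, txt_feats)
        return boxes, phrase_mask
```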

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

1 code implementation • 6 Apr 2022 Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia

First, to effectively lift the 2D information to stereo volume, we propose depth-wise plane sweeping (DPS) that allows denser connections and extracts depth-guided features.

3D Object Detection From Stereo Images • Relation
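For intuition on lifting 2D features into a stereo volume, here is the textbook plane-sweep baseline in PyTorch: stack left-view features against right-view features shifted by each candidate disparity. DSGN++'s depth-wise plane sweeping (DPS) refines this with denser, depth-guided connections; this sketch is only the generic starting point, not the paper's method.

```python
import torch
import torch.nn.functional as F

def plane_sweep_volume(left_feat, right_feat, max_disp, step=1):
    """Generic plane-sweep stereo volume.

    left_feat, right_feat: (B, C, H, W) feature maps from a 2D backbone.
    Returns: (B, 2C, D, H, W) volume with D = max_disp // step planes.
    """
    volumes = []
    for d in range(0, max_disp, step):
        if d == 0:
            shifted = right_feat
        else:
            # At disparity d, left pixel x matches right pixel x - d:
            # shift right features d pixels rightward, zero-pad on the left.
            shifted = F.pad(right_feat[..., :-d], (d, 0))
        volumes.append(torch.cat([left_feat, shifted], dim=1))  # (B, 2C, H, W)
    return torch.stack(volumes, dim=2)  # stack depth planes along a new axis
```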

Multi-View Transformer for 3D Visual Grounding

1 code implementation • CVPR 2022 Shijia Huang, Yilun Chen, Jiaya Jia, LiWei Wang

The multi-view space enables the network to learn a more robust multi-modal representation for 3D visual grounding and eliminates the dependence on specific views.

Visual Grounding
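A toy PyTorch sketch of the multi-view intuition follows: replicate 3D object positions under several rotations about the vertical axis, encode each rotated view, and fuse by averaging so the representation depends less on any single view. The layer sizes, number of views, and mean fusion are assumptions for illustration, not the MVT paper's exact design.

```python
import math
import torch
import torch.nn as nn

class MultiViewAggregator(nn.Module):
    """Illustrative multi-view fusion: rotate object centers into
    several views, add a per-view positional encoding, and average.
    All hyperparameters here are hypothetical."""

    def __init__(self, d_model=128, num_views=4):
        super().__init__()
        self.num_views = num_views
        self.pos_enc = nn.Linear(3, d_model)  # encode rotated 3D positions

    def forward(self, feats, xyz):
        # feats: (B, N, d) object features, xyz: (B, N, 3) object centers
        outs = []
        for v in range(self.num_views):
            theta = 2 * math.pi * v / self.num_views
            # rotation about the z (vertical) axis
            rot = xyz.new_tensor([[math.cos(theta), -math.sin(theta), 0.0],
                                  [math.sin(theta),  math.cos(theta), 0.0],
                                  [0.0,              0.0,             1.0]])
            outs.append(feats + self.pos_enc(xyz @ rot.T))
        # view-agnostic fusion: average over the rotated views
        return torch.stack(outs, dim=0).mean(dim=0)
```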
