Search Results for author: Yining Hong

Found 17 papers, 5 papers with code

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling Large Language Model +1

MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

no code implementations • 16 Jan 2024 • Yining Hong, Zishuo Zheng, Peihao Chen, Yian Wang, Junyan Li, Chuang Gan

Human beings possess the capability to multiply a melange of multisensory cues while actively exploring and interacting with the 3D world.

Language Modelling Large Language Model

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

no code implementations • 8 Nov 2023 • Zhenfang Chen, Rui Sun, Wenjun Liu, Yining Hong, Chuang Gan

If not, we initialize a new module needed by the task and specify the inputs and outputs of this new module.

Question Answering Referring Expression +3

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

no code implementations • 6 Nov 2023 • Junyan Li, Delin Chen, Yining Hong, Zhenfang Chen, Peihao Chen, Yikang Shen, Chuang Gan

A communication token is generated by the LLM following a visual entity or a relation, to inform the detection network to propose regions that are relevant to the sentence generated so far.

CoLA Question Answering +5

3D-LLM: Injecting the 3D World into Large Language Models

5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects

3D Question Answering (3D-QA) Dense Captioning +1

3D Concept Learning and Reasoning from Multi-View Images

no code implementations • CVPR 2023 • Yining Hong, Chunru Lin, Yilun Du, Zhenfang Chen, Joshua B. Tenenbaum, Chuang Gan

We suggest that a principled approach for 3D reasoning from multi-view images should be to infer a compact 3D representation of the world from the multi-view images, which is further grounded on open-vocabulary semantic concepts, and then to execute reasoning on these 3D representations.

Question Answering Visual Question Answering +1

See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning

no code implementations • 12 Jan 2023 • Zhenfang Chen, Qinhong Zhou, Yikang Shen, Yining Hong, Hao Zhang, Chuang Gan

The see stage scans the image and grounds the visual concept candidates with a visual perception model.

Few-Shot Learning Image Captioning +4

3D Concept Grounding on Neural Fields

no code implementations • 13 Jul 2022 • Yining Hong, Yilun Du, Chunru Lin, Joshua B. Tenenbaum, Chuang Gan

Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, as well as outperforms existing models on the challenging 3D aware visual reasoning tasks.

Instance Segmentation Question Answering +3

Fixing Malfunctional Objects With Learned Physical Simulation and Functional Prediction

no code implementations • CVPR 2022 • Yining Hong, Kaichun Mo, Li Yi, Leonidas J. Guibas, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

Specifically, FixNet consists of a perception module to extract the structured representation from the 3D point cloud, a physical dynamics prediction module to simulate the results of interactions on 3D objects, and a functionality prediction module to evaluate the functionality and choose the correct fix.

PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

no code implementations • NeurIPS 2021 • Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies.

Instance Segmentation Object +2

Neural-Symbolic Solver for Math Word Problems with Auxiliary Tasks

1 code implementation • ACL 2021 • Jinghui Qin, Xiaodan Liang, Yining Hong, Jianheng Tang, Liang Lin

Previous math word problem solvers following the encoder-decoder paradigm fail to explicitly incorporate essential math symbolic constraints, leading to unexplainable and unreasonable predictions.

Math

VLGrammar: Grounded Grammar Induction of Vision and Language

1 code implementation • ICCV 2021 • Yining Hong, Qing Li, Song-Chun Zhu, Siyuan Huang

In this work, we study grounded grammar induction of vision and language in a joint learning framework.

Clustering Contrastive Learning +3

A Minimalist Dataset for Systematic Generalization of Perception, Syntax, and Semantics

no code implementations • 2 Mar 2021 • Qing Li, Siyuan Huang, Yining Hong, Yixin Zhu, Ying Nian Wu, Song-Chun Zhu

We believe the HINT dataset and the experimental findings are of great interest to the learning community on systematic generalization.

Few-Shot Learning Program Synthesis +1

SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

no code implementations • 27 Dec 2020 • Yining Hong, Qing Li, Ran Gong, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

Solving algebra story problems remains a challenging task in artificial intelligence, which requires a detailed understanding of real-world situations and a strong mathematical reasoning capability.

Math Mathematical Reasoning

Learning by Fixing: Solving Math Word Problems with Weak Supervision

1 code implementation • 19 Dec 2020 • Yining Hong, Qing Li, Daniel Ciao, Siyuan Huang, Song-Chun Zhu

To generate more diverse solutions, tree regularization is applied to guide the efficient shrinkage and exploration of the solution space, and a memory buffer is designed to track and save the discovered various fixes for each problem.

Ranked #1 on Math Word Problem Solving on Math23K (weakly-supervised metric)

Math Weakly-supervised Learning

A Competence-aware Curriculum for Visual Concepts Learning via Question Answering

no code implementations • ECCV 2020 • Qing Li, Siyuan Huang, Yining Hong, Song-Chun Zhu

Humans can progressively learn visual concepts from easy to hard questions.

Question Answering

Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning

1 code implementation • ICML 2020 • Qing Li, Siyuan Huang, Yining Hong, Yixin Chen, Ying Nian Wu, Song-Chun Zhu

In this paper, we address these issues and close the loop of neural-symbolic learning by (1) introducing the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning, and (2) proposing a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error through the symbolic reasoning module efficiently.

Question Answering Reinforcement Learning (RL) +1
