no code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Xinyue Cui, Lihong Li
In this paper, we advocate the use of explicit memory for efficient exploration in reinforcement learning.
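As a rough, hypothetical illustration of the idea (not the paper's algorithm), an explicit memory can be as simple as a visitation table that grants a decaying exploration bonus:

```python
from collections import defaultdict

class CountMemory:
    """Explicit memory of visited states; grants an exploration bonus
    that decays with visitation count. Hypothetical sketch, not the
    paper's method."""

    def __init__(self, beta=0.1):
        self.counts = defaultdict(int)
        self.beta = beta

    def bonus(self, state):
        key = tuple(state)  # assumes a hashable / discretized state
        self.counts[key] += 1
        return self.beta / self.counts[key] ** 0.5

# usage: shaped_reward = env_reward + memory.bonus(obs)
```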
no code implementations • 14 Nov 2024 • Yuyao Liu, Jiayuan Mao, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
We present a novel approach, MAGIC (manipulation analogies for generalizable intelligent contacts), for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects.
no code implementations • 30 Oct 2024 • Xiaolin Fang, Bo-Ruei Huang, Jiayuan Mao, Jasmine Shone, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
In this paper, we propose KALM, a framework that leverages large pre-trained vision-language models (VLMs) to automatically generate task-relevant and cross-instance consistent keypoints.
no code implementations • 14 Oct 2024 • Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas
Our findings bridge a critical gap between theoretical expressivity and learnability of Transformers, and show that flexible and general models of computation are efficiently learnable.
3 code implementations • 9 Oct 2024 • Manling Li, Shiyu Zhao, Qineng Wang, Kangrui Wang, Yu Zhou, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
1 code implementation • 26 Sep 2024 • Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao, Natasha Jaques
We propose a new framework, Follow Instructions with Social and Embodied Reasoning (FISER), aiming for better natural language instruction following in collaborative embodied tasks.
no code implementations • 12 Sep 2024 • Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu
A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them.
1 code implementation • 11 Sep 2024 • Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories.
no code implementations • 17 Jun 2024 • Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum
We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization.
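A minimal sketch of the core loop, assuming a learned scalar-valued energy function `energy(x, y)` is available: reasoning amounts to refining a candidate answer by descending the energy landscape (IRED additionally anneals the landscape in a diffusion-like schedule):

```python
import torch

def reason_by_energy_minimization(energy, x, y_dim, steps=50, lr=0.1):
    """Start from a random candidate answer y and refine it by gradient
    descent on a learned energy E(x, y). Illustrative sketch only."""
    y = torch.randn(y_dim, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        energy(x, y).backward()  # energy must return a scalar tensor
        opt.step()
    return y.detach()
```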
no code implementations • 20 May 2024 • Yiqing Xu, Jiayuan Mao, Yilun Du, Tomás Lozano-Pérez, Leslie Pack Kaelbling, David Hsu
This paper studies the challenge of developing robots capable of understanding under-specified instructions for creating functional object arrangements, such as "set up a dining table for two"; previous arrangement approaches have focused on much more explicit instructions, such as "put object A on the table."
no code implementations • 11 May 2024 • Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lionel Wong, Joshua B. Tenenbaum, Roger P. Levy
Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency.
no code implementations • 9 May 2024 • Weiyu Liu, Jiayuan Mao, Joy Hsu, Tucker Hermans, Animesh Garg, Jiajun Wu
In this paper, we propose composable part-based manipulation (CPM), a novel approach that leverages object-part decomposition and part-part correspondences to improve learning and generalization of robotic manipulation skills.
no code implementations • 6 May 2024 • Weiyu Liu, Geng Chen, Joy Hsu, Jiayuan Mao, Jiajun Wu
This paper presents a framework for learning state and action abstractions in sequential decision-making domains.
no code implementations • 25 Mar 2024 • Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah
Grounding the common-sense reasoning of Large Language Models (LLMs) in physical domains remains a pivotal yet unsolved problem for embodied AI.
no code implementations • 13 Dec 2023 • Lionel Wong, Jiayuan Mao, Pratyusha Sharma, Zachary S. Siegel, Jiahai Feng, Noa Korneev, Joshua B. Tenenbaum, Jacob Andreas
Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand.
1 code implementation • NeurIPS 2023 • Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
Goal-conditioned policies are generally understood to be "feed-forward" circuits, in the form of neural networks that map from the current state and the goal specification to the next action to take.
no code implementations • 6 Nov 2023 • Jiayuan Mao, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
Humans demonstrate an impressive ability to acquire and generalize manipulation "tricks."
1 code implementation • 24 Oct 2023 • Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor.
Visual Question Answering (VQA) Split A • Visual Question Answering (VQA) Split B • +1
2 code implementations • 12 Oct 2023 • Po-Chen Ko, Jiayuan Mao, Yilun Du, Shao-Hua Sun, Joshua B. Tenenbaum
In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from a few video demonstrations, without using any action annotations.
no code implementations • 5 Oct 2023 • Yanming Wan, Jiayuan Mao, Joshua B. Tenenbaum
We introduce HandMeThat, a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
no code implementations • 5 Oct 2023 • Jiayuan Mao, Xuelin Yang, Xikun Zhang, Noah D. Goodman, Jiajun Wu
Existing datasets for this problem have two limitations: first, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics differ from human judgments.
no code implementations • 2 Sep 2023 • Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning.
no code implementations • 26 Apr 2023 • Renhao Wang, Jiayuan Mao, Joy Hsu, Hang Zhao, Jiajun Wu, Yang Gao
Robots operating in the real world require both rich manipulation skills and the ability to semantically reason about when to apply those skills.
1 code implementation • CVPR 2023 • Joy Hsu, Jiayuan Mao, Jiajun Wu
Different functional modules in the programs are implemented as neural networks.
no code implementations • 9 Mar 2023 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
Reasoning about the relationships between entities from input facts (e.g., whether Ari is a grandparent of Charlie) generally requires explicit consideration of other entities that are not mentioned in the query (e.g., the parents of Charlie).
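The grandparent example can be made concrete with relation matrices: composing the parent relation with itself implicitly quantifies over the unmentioned intermediate entity. The entity "Bob" below is a hypothetical stand-in for Charlie's parent:

```python
import numpy as np

entities = ["Ari", "Bob", "Charlie"]  # "Bob" is a hypothetical intermediate
parent = np.zeros((3, 3), dtype=int)
parent[0, 1] = 1  # Ari is a parent of Bob
parent[1, 2] = 1  # Bob is a parent of Charlie

# Composing the relation with itself quantifies over the intermediate
# entity (Bob), who is never mentioned in the grandparent query.
grandparent = (parent @ parent) > 0
assert grandparent[0, 2]  # Ari is a grandparent of Charlie
```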
no code implementations • 9 Mar 2023 • Jiayuan Mao, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
This paper studies a model learning and online planning approach towards building flexible and general robots.
no code implementations • 9 Mar 2023 • Zhezheng Luo, Jiayuan Mao, Joshua B. Tenenbaum, Leslie Pack Kaelbling
Next, we analyze the learning properties of these neural networks, especially focusing on how they can be trained on a finite set of small graphs and generalize to larger graphs, which we term structural generalization.
no code implementations • 9 Mar 2023 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
no code implementations • 3 Feb 2023 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Ran Zhang, Chin-Yi Cheng, Jiajun Wu
Human-designed visual manuals are crucial components in shape assembly activities.
no code implementations • 25 Jul 2022 • Ruocheng Wang, Yunzhi Zhang, Jiayuan Mao, Chin-Yi Cheng, Jiajun Wu
We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions.
no code implementations • CVPR 2022 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
We introduce Programmatic Motion Concepts, a hierarchical motion representation for human actions that captures both low-level motion and high-level description as motion concepts.
no code implementations • ICLR 2022 • Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum
We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts.
no code implementations • NeurIPS 2021 • Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum
We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts.
no code implementations • 29 Sep 2021 • Guangxuan Xiao, Leslie Pack Kaelbling, Jiajun Wu, Jiayuan Mao
To leverage the sparsity in hypergraph neural networks, SpaLoc represents the grounding of relationships such as parent and grandparent as sparse tensors and uses neural networks and finite-domain quantification operations to infer new facts based on the input.
no code implementations • 29 Sep 2021 • Zhezheng Luo, Jiayuan Mao, Joshua B. Tenenbaum, Leslie Pack Kaelbling
Our first contribution is a fine-grained analysis of the expressiveness of these neural networks, that is, the set of functions that they can realize and the set of problems that they can solve.
no code implementations • 29 Sep 2021 • Zhezheng Luo, Jiayuan Mao, Jiajun Wu, Tomás Lozano-Pérez, Joshua B. Tenenbaum, Leslie Pack Kaelbling
We present a framework for learning compositional, rational skill models (RatSkills) that support efficient planning and inverse planning for achieving novel goals and recognizing activities.
no code implementations • 10 Jun 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman
We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events.
no code implementations • CVPR 2021 • Sumith Kulal, Jiayuan Mao, Alex Aiken, Jiajun Wu
We posit that adding higher-level motion primitives, which can capture natural coarser units of motion such as backswing or follow-through, can be used to improve downstream analysis tasks.
no code implementations • 30 Mar 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
no code implementations • 1 Jan 2021 • Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer Ullman
We aim to learn generalizable representations for complex activities by quantifying over both entities and time, as in “the kicker is behind all the other players,” or “the player controls the ball until it moves toward the goal.” Such a structural inductive bias of object relations, object quantification, and temporal orders will enable the learned representation to generalize to situations with varying numbers of agents, objects, and time courses.
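A toy sketch of object quantification, with a hypothetical `behind` relation tensor standing in for what TOQ-Nets would compute from agent trajectories:

```python
import torch

n = 5                             # players in one frame
behind = torch.rand(n, n) > 0.5   # behind[i, j]: is player i behind player j?
                                  # hypothetical input, not computed by the model here
kicker = 0
others = [i for i in range(n) if i != kicker]

# Universal quantification over objects ("behind ALL the other players")
# is a reduction over the object axis; soft variants use min-pooling.
kicker_behind_all = behind[kicker, others].all()

# Temporal quantifiers ("until ...") reduce the same way over a time axis.
```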
no code implementations • ICLR 2021 • Zhenfang Chen, Jiayuan Mao, Jiajun Wu, Kwan-Yee Kenneth Wong, Joshua B. Tenenbaum, Chuang Gan
We study the problem of dynamic visual reasoning on raw videos.
no code implementations • Findings (ACL) 2021 • Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu
These object-centric concepts derived from language facilitate the learning of object-centric representations.
no code implementations • 21 Dec 2020 • Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan
In contrast, symbolic and modular models have a relatively better grounding and robustness, though at the cost of accuracy.
no code implementations • NeurIPS 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Noah Snavely, Jiajun Wu
We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene.
no code implementations • CVPR 2020 • Yikai Li, Jiayuan Mao, Xiuming Zhang, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
We study the inverse graphics problem of inferring a holistic representation for natural images.
1 code implementation • NeurIPS 2019 • Chi Han, Jiayuan Mao, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu
Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color).
no code implementations • ICCV 2019 • Jiayuan Mao, Xiuming Zhang, Yikai Li, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
Humans are capable of building holistic representations for images at various levels, from local objects, to pairwise relations, to global structures.
no code implementations • 17 Jun 2019 • Sidi Lu, Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu
In this paper, we propose a hybrid inference algorithm, the Neurally-Guided Structure Inference (NG-SI), keeping the advantages of both search-based and data-driven methods.
no code implementations • ACL 2019 • Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu
We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text.
2 code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou
We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning.
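A compressed, illustrative sketch of one NLM-style layer restricted to unary and binary predicates (the actual architecture handles higher arities and permutes predicate arguments before the MLP):

```python
import torch
import torch.nn as nn

class NLMLayer(nn.Module):
    """One Neural-Logic-Machine-style layer over unary predicates
    un: (n, d_un) and binary predicates bi: (n, n, d_bi).
    Simplified sketch, not the paper's full architecture."""

    def __init__(self, d_un, d_bi, d_out):
        super().__init__()
        # unary output sees its own features plus binary features
        # reduced (max, i.e. existential quantification) over each argument
        self.f_un = nn.Linear(d_un + 2 * d_bi, d_out)
        # binary output sees its own features plus both arguments'
        # unary features expanded to pairs
        self.f_bi = nn.Linear(d_bi + 2 * d_un, d_out)

    def forward(self, un, bi):
        n = un.shape[0]
        reduced = torch.cat([bi.max(dim=0).values, bi.max(dim=1).values], dim=-1)
        new_un = torch.sigmoid(self.f_un(torch.cat([un, reduced], dim=-1)))
        expand = torch.cat([un.unsqueeze(1).expand(n, n, -1),
                            un.unsqueeze(0).expand(n, n, -1)], dim=-1)
        new_bi = torch.sigmoid(self.f_bi(torch.cat([bi, expand], dim=-1)))
        return new_un, new_bi
```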
2 code implementations • ICLR 2019 • Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu
To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation.
Ranked #6 on Visual Question Answering (VQA) on CLEVR
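To make the executor idea concrete, here is a hypothetical sketch of two differentiable program operations over a latent scene of n object features; the op names mirror the paper's DSL, but the implementations are illustrative only:

```python
import torch

def filter_op(obj_scores, obj_feats, concept_emb):
    """'filter(red)': softly intersect the current object set with objects
    whose features match a concept embedding. NS-CL learns both the
    object features and the concept embeddings."""
    sim = torch.cosine_similarity(obj_feats, concept_emb, dim=-1)
    return obj_scores * torch.sigmoid((sim - 0.5) / 0.1)

def count_op(obj_scores):
    """'count()': expected number of currently selected objects."""
    return obj_scores.sum()

# A program like count(filter(red, scene)) then executes as:
# answer = count_op(filter_op(torch.ones(n), feats, red_emb))
```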
1 code implementation • 11 Apr 2019 • Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma
We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts.
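A generic sketch of the kind of joint-embedding objective involved (a standard InfoNCE-style contrastive loss, not UniVSE's structured objective):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Align matched image/text pairs in a joint space. Generic sketch;
    UniVSE additionally structures the space into objects, attributes,
    and relations."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(img.shape[0])  # matched pairs lie on the diagonal
    return F.cross_entropy(logits, targets)
```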
no code implementations • 6 Nov 2018 • Jiangtao Feng, Lingpeng Kong, Po-Sen Huang, Chong Wang, Da Huang, Jiayuan Mao, Kan Qiao, Dengyong Zhou
We also design an efficient dynamic programming algorithm for decoding segments, which allows the model to be trained faster than the existing neural phrase-based machine translation method of Huang et al. (2018).
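The segment-level decoding can be pictured as a textbook dynamic program over boundary positions; the `score(i, j)` callable below is a hypothetical stand-in for a neural segment score:

```python
def best_segmentation(score, n, max_len=6):
    """best[j] is the score of the best segmentation of positions [0, j).
    Textbook DP sketch; the paper's decoder is more involved."""
    NEG = float("-inf")
    best = [0.0] + [NEG] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            cand = best[i] + score(i, j)
            if cand > best[j]:
                best[j], back[j] = cand, i
    segs, j = [], n            # recover segment boundaries
    while j > 0:
        segs.append((back[j], j))
        j = back[j]
    return best[n], segs[::-1]
```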
4 code implementations • ECCV 2018 • Borui Jiang, Ruixuan Luo, Jiayuan Mao, Tete Xiao, Yuning Jiang
The network acquires this confidence of localization, which improves the NMS procedure by preserving accurately localized bounding boxes.
Ranked #185 on Object Detection on COCO test-dev
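A simplified sketch of IoU-guided NMS in this spirit: candidates are ranked by predicted localization confidence rather than classification score, so accurately localized boxes survive suppression:

```python
import torch
from torchvision.ops import box_iou

def iou_guided_nms(boxes, loc_conf, iou_thresh=0.5):
    """Rank candidate boxes (N, 4) by predicted localization confidence
    (N,) and greedily suppress overlaps. Simplified sketch of the
    paper's IoU-guided procedure."""
    order = loc_conf.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        ious = box_iou(boxes[i].unsqueeze(0), boxes[order[1:]]).squeeze(0)
        order = order[1:][ious <= iou_thresh]  # drop heavily overlapping boxes
    return keep
```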
1 code implementation • COLING 2018 • Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun
Beginning with an insightful adversarial attack on VSE embeddings, we show the limitations of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively.
no code implementations • ICLR 2018 • Jiayuan Mao, Honghua Dong, Joseph J. Lim
Recent state-of-the-art reinforcement learning algorithms are trained under the goal of excelling in one specific task.
Hierarchical Reinforcement Learning • reinforcement-learning • +2
no code implementations • CVPR 2017 • Jiayuan Mao, Tete Xiao, Yuning Jiang, Zhimin Cao
Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods.
Ranked #15 on Pedestrian Detection on Caltech