no code implementations • SIGDIAL (ACL) 2022 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent’s mental state transition and teach the agent to make decisions guided by its value like a human.
1 code implementation • 8 Jul 2024 • Yizhou Zhao, Hengwei Bian, Michael Mu, Mostofa R. Uddin, Zhenyang Li, Xiang Li, Tianyang Wang, Min Xu
In addition to prompt-based single-particle instance segmentation, our approach can automatically search for similar features, facilitating full tomogram semantic segmentation with only one prompt.
no code implementations • CVPR 2024 • Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj, Min Xu, Jimei Yang, Chun-Hao Paul Huang
Specifically, we design Human-aware Metric SLAM to reconstruct metric-scale camera poses and scene point clouds using camera-frame HMR as a strong prior, addressing depth, scale, and dynamic ambiguities.
no code implementations • 12 May 2024 • Zhenyang Li, Zilong Chen, Feifan Qu, Mingqing Wang, Yizhou Zhao, Kai Zhang, Yifan Peng
In NeRF-aided editing tasks, object movement presents difficulties in supervision generation due to the introduction of variability in object positions.
1 code implementation • ICCV 2023 • Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang
To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes.
no code implementations • 10 Dec 2022 • Yizhou Zhao, Qiaozi Gao, Liang Qiu, Govind Thattai, Gaurav S. Sukhatme
We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction.
no code implementations • 30 Sep 2022 • Yizhou Zhao, Zhenyang Li, Xun Guo, Yan Lu
Temporal modeling is crucial for various video learning tasks.
no code implementations • 23 Jun 2022 • Yizhou Zhao, Steven Gong, Xiaofeng Gao, Wensi Ai, Song-Chun Zhu
With the recent progress of simulations by 3D modeling software and game engines, many researchers have focused on Embodied AI tasks in the virtual environment.
no code implementations • CVPR 2022 • Yizhou Zhao, Xun Guo, Yan Lu
One-shot object detection aims at detecting novel objects according to merely one given instance.
no code implementations • 9 Mar 2022 • Yizhou Zhao, Liang Qiu, Wensi Ai, Pan Lu, Song-Chun Zhu
We propose a Spatial-Temporal And-Or graph (ST-AOG), a stochastic grammar model, to encode the contextual relationship between motion, emotion, and relation, forming a triangle in a conditional random field.
1 code implementation • 24 Jan 2022 • Zhiwei Jia, Kaixiang Lin, Yizhou Zhao, Qiaozi Gao, Govind Thattai, Gaurav Sukhatme
With the proposed Affordance-aware Multimodal Neural SLAM (AMSLAM) approach, we obtain more than 40% improvement over prior published work on the ALFRED benchmark and set a new state-of-the-art generalization performance at a success rate of 23.48% on the test unseen scenes.
no code implementations • 12 Dec 2021 • Liang Qiu, Yizhou Zhao, Jinchao Li, Pan Lu, Baolin Peng, Jianfeng Gao, Song-Chun Zhu
To the best of our knowledge, ValueNet is the first large-scale text dataset for human value modeling, and we are the first to try to incorporate a value model into emotionally intelligent dialogue systems.
1 code implementation • 12 Dec 2021 • Yizhou Zhao, Liang Qiu, Pan Lu, Feng Shi, Tian Han, Song-Chun Zhu
Current pre-training methods in computer vision focus on natural images in the daily-life context.
2 code implementations • 10 Nov 2021 • Yizhou Zhao, Kaixiang Lin, Zhiwei Jia, Qiaozi Gao, Govind Thattai, Jesse Thomason, Gaurav S. Sukhatme
However, current simulators for Embodied AI (EAI) challenges only provide simulated indoor scenes with a limited number of layouts.
1 code implementation • 25 Oct 2021 • Pan Lu, Liang Qiu, Jiaqi Chen, Tony Xia, Yizhou Zhao, Wei Zhang, Zhou Yu, Xiaodan Liang, Song-Chun Zhu
Also, we develop a strong IconQA baseline Patch-TRM that applies a pyramid cross-modal Transformer with input diagram embeddings pre-trained on the icon dataset.
Ranked #1 on Visual Question Answering (VQA) on IconQA
1 code implementation • 15 Jul 2021 • Feng Shi, Chonghan Lee, Liang Qiu, Yizhou Zhao, Tianyi Shen, Shivran Muralidhar, Tian Han, Song-Chun Zhu, Vijaykrishnan Narayanan
The cognitive system for human action and behavior has evolved into a deep learning regime, and especially the advent of Graph Convolution Networks has transformed the field in recent years.
no code implementations • ACL 2021 • Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu
Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly.
Ranked #5 on Dialog Relation Extraction on DialogRE
no code implementations • 12 Mar 2021 • Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu
One of which is to track the agent's mental state transition and teach the agent to make decisions guided by its value like a human.
no code implementations • 19 Jan 2021 • Yizhou Zhao, Hua Sun
The identity of the dropped users is not known a priori and the server needs to securely recover the sum of the remaining surviving users.
no code implementations • 18 Nov 2020 • Yizhou Zhao, Song-Chun Zhu
We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights, which can be connected with prior knowledge, experience replay, and evolution process of the policy.
1 code implementation • EMNLP 2020 • Liang Qiu, Yizhou Zhao, Weiyan Shi, Yuan Liang, Feng Shi, Tao Yuan, Zhou Yu, Song-Chun Zhu
Inducing a meaningful structural representation from one or a set of dialogues is a crucial but challenging task in computational linguistics.
no code implementations • 24 Jul 2020 • Xiaofeng Gao, Ran Gong, Yizhou Zhao, Shu Wang, Tianmin Shu, Song-Chun Zhu
Thus, in this paper, we propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations, where the robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communications based on its online Bayesian inference of the user's mental state.
Bayesian Inference • Explainable Artificial Intelligence (XAI)
no code implementations • 25 Sep 2019 • Feng Shi, Yizhou Zhao, Ziheng Xu, Tianyang Liu, Song-Chun Zhu
Graph Neural Networks, as a combination of Graph Signal Processing and Deep Convolutional Networks, show great power in pattern recognition in non-Euclidean domains.