no code implementations • Findings (NAACL) 2022 • Liwen Zhang, Zixia Jia, Wenjuan Han, Zilong Zheng, Kewei Tu
Adversarial attacks on structured prediction models face various challenges, such as the difficulty of perturbing discrete words, the sentence quality issue, and the sensitivity of outputs to small perturbations.
no code implementations • 18 Sep 2023 • Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao
Large Language Models (LLMs) are capable of performing complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration.
no code implementations • 27 Jun 2023 • Shuwen Qiu, Song-Chun Zhu, Zilong Zheng
We design an explicit mind module that can track three levels of belief: the speaker's belief, the speaker's prediction of the listener's belief, and the common belief based on the gap between the first two.
no code implementations • 15 Jun 2023 • Hengli Li, Song-Chun Zhu, Zilong Zheng
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings that frequently arise in real-life conversations and is essential for the development of communicative social agents.
no code implementations • 4 Jun 2023 • Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng
We introduce MoviePuzzle, a novel challenge that targets visual narrative reasoning and holistic movie understanding.
1 code implementation • 30 May 2023 • Yuxuan Wang, Jianghui Wang, Dongyan Zhao, Zilong Zheng
We introduce CDBERT, a new learning paradigm that enhances the semantic understanding ability of Chinese PLMs with dictionary knowledge and the structure of Chinese characters.
1 code implementation • 30 May 2023 • Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
Video-grounded dialogue understanding is a challenging problem that requires a machine to perceive, parse, and reason over situated semantics extracted from weakly aligned video and dialogues.
1 code implementation • 24 May 2023 • Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, Muhan Zhang
On the whole, our analysis provides a novel perspective on the role of semantics in developing and evaluating language models' reasoning abilities.
1 code implementation • 17 Dec 2022 • Zixia Jia, Zhaohui Yan, Wenjuan Han, Zilong Zheng, Kewei Tu
Prior works on joint Information Extraction (IE) typically model instance (e.g., event triggers, entities, roles, relations) interactions by representation enhancement, type dependencies scoring, or global decoding.
1 code implementation • 14 Oct 2022 • Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang
We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answering in 3D Scenes (SQA3D).
Ranked #1 on Referring Expression on SQA3D
1 code implementation • 7 Sep 2022 • Yanzeng Li, Zilong Zheng, Wenjuan Han, Lei Zou
Semantic Web technology has successfully facilitated many RDF models with rich data representation methods.
1 code implementation • CVPR 2022 • Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng
Our goal is to bridge the visual scene graphs and linguistic dependency trees seamlessly.
no code implementations • ICLR 2022 • Bo Wan, Wenjuan Han, Zilong Zheng, Tinne Tuytelaars
We introduce a new task, unsupervised vision-language (VL) grammar induction.
1 code implementation • 25 Jun 2021 • Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes
In this paper, to model the uncertainty of visual saliency, we study the saliency prediction problem from the perspective of generative models by learning a conditional probability distribution over the saliency map given an input image, and treating the saliency prediction as a sampling process from the learned distribution.
no code implementations • CVPR 2021 • Zilong Zheng, Jianwen Xie, Ping Li
Exploiting internal statistics of a single natural image has long been recognized as a significant research paradigm where the goal is to learn the distribution of patches within the image without relying on external training data.
1 code implementation • CVPR 2021 • Lifeng Fan, Shuwen Qiu, Zilong Zheng, Tao Gao, Song-Chun Zhu, Yixin Zhu
By aggregating different beliefs and true world states, our model essentially forms "five minds" during the interactions between two agents.
no code implementations • 7 Mar 2021 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu
This paper studies the unsupervised cross-domain translation problem by proposing a generative framework, in which the probability distribution of each domain is represented by a generative cooperative network that consists of an energy-based model and a latent variable model.
no code implementations • 29 Dec 2020 • Jianwen Xie, Zilong Zheng, Ping Li
In this paper, we propose to learn a variational auto-encoder (VAE) to initialize the finite-step MCMC, such as Langevin dynamics derived from the energy function, for efficient amortized sampling of the EBM.
no code implementations • 25 Dec 2020 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
3D data that contains rich geometric information about objects and scenes is valuable for understanding the 3D physical world.
no code implementations • 25 Apr 2020 • Tao Yuan, Hangxin Liu, Lifeng Fan, Zilong Zheng, Tao Gao, Yixin Zhu, Song-Chun Zhu
Aiming to understand how human (false-)belief, a core socio-cognitive ability, would affect human interactions with robots, this paper proposes adopting a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs.
1 code implementation • CVPR 2021 • Jianwen Xie, Yifei Xu, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
We propose a generative model of unordered point sets, such as point clouds, in the form of an energy-based model, where the energy function is parameterized by an input-permutation-invariant bottom-up neural network.
no code implementations • 26 Nov 2019 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
To model the motions explicitly, it is natural for the model to be based on the motions or the displacement fields of the pixels.
1 code implementation • CVPR 2019 • Zilong Zheng, Wenguan Wang, Siyuan Qi, Song-Chun Zhu
The answer to a given question is represented by a node with missing value.
Ranked #14 on Visual Dialog on VisDial v0.9 val
no code implementations • 7 Feb 2019 • Jianwen Xie, Zilong Zheng, Xiaolin Fang, Song-Chun Zhu, Ying Nian Wu
This paper studies the problem of learning the conditional distribution of a high-dimensional output given an input, where the output and input may belong to two different domains, e.g., the output is a photo image and the input is a sketch image.
no code implementations • 27 Dec 2018 • Jianwen Xie, Ruiqi Gao, Zilong Zheng, Song-Chun Zhu, Ying Nian Wu
The non-linear transformation of this transition model can be parametrized by a feedforward neural network.
1 code implementation • CVPR 2018 • Jianwen Xie, Zilong Zheng, Ruiqi Gao, Wenguan Wang, Song-Chun Zhu, Ying Nian Wu
This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for modeling volumetric shape patterns.