Search Results for author: Jiarui Zhang

Found 21 papers, 8 papers with code

Exploring Perceptual Limitation of Multimodal Large Language Models

1 code implementation • 12 Feb 2024 • Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip Ilievski, Maosong Sun

Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception.

Object, Question Answering

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

1 code implementation • 22 Jan 2024 • Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara

While large language models (LLMs) are still being adopted to new domains and utilized in novel applications, we are experiencing an influx of the new generation of foundation models, namely multi-modal large language models (MLLMs).

Passive Non-Line-of-Sight Imaging with Light Transport Modulation

no code implementations • 26 Dec 2023 • Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

In this work, we propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network.

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

no code implementations • 13 Dec 2023 • Jinta Weng, Jiarui Zhang, Yue Hu, Daidong Fa, Xiaofeng Xuand, Heyan Huang

When interacting with large language models, embedding more task-related information into prompts makes it easier to elicit the knowledge embedded in those models.

Language Modelling, Large Language Model, +1

Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs

2 code implementations • 24 Oct 2023 • Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining by up to 46% as the subject becomes smaller.

Question Answering, Visual Question Answering

PINN-based viscosity solution of HJB equation

no code implementations • 18 Sep 2023 • Tianyu Liu, Steven Ding, Jiarui Zhang, Liutao Zhou

This paper proposes a novel PINN-based viscosity solution for HJB equations.

Differentially private sliced inverse regression in the federated paradigm

no code implementations • 10 Jun 2023 • Shuaida He, Jiarui Zhang, Xin Chen

Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool.

Dimensionality Reduction, Regression

A Study of Situational Reasoning for Traffic Understanding

1 code implementation • 5 Jun 2023 • Jiarui Zhang, Filip Ilievski, Kaixin Ma, Aravinda Kollaa, Jonathan Francis, Alessandro Oltramari

Intelligent Traffic Monitoring (ITMo) technologies hold the potential for improving road safety/security and for enabling smart city infrastructure.

Decision Making, Knowledge Graphs, +2

Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

no code implementations • 31 May 2023 • Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

As our initial analysis of BLIP-family models revealed difficulty with answering fine-detail questions, we investigate the following question: Can visual cropping be employed to improve the performance of state-of-the-art visual question answering models on fine-detail questions?

Question Answering, Visual Question Answering

Knowledge-enhanced Agents for Interactive Text Games

no code implementations • 8 May 2023 • Prateek Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan Francis, Kaixin Ma

We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings.

Instruction Following, Knowledge Graphs, +5

Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings

2 code implementations • ICCV 2023 • Baixin Xu, Jiarui Zhang, Kwan-Yee Lin, Chen Qian, Ying He

To address this, we propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy, allowing for progressively capturing high-frequency geometric details.

3D Reconstruction, Neural Rendering, +1

Utilizing Background Knowledge for Robust Reasoning over Traffic Situations

1 code implementation • 4 Dec 2022 • Jiarui Zhang, Filip Ilievski, Aravinda Kollaa, Jonathan Francis, Kaixin Ma, Alessandro Oltramari

Understanding novel situations in the traffic domain requires an intricate combination of domain-specific and causal commonsense knowledge.

Knowledge Graphs, Multiple-choice, +2

A Reputation-Based Mechanism for Transaction Processing in Blockchain Systems

no code implementations • journal 2022 • Jiarui Zhang, Yukun Cheng, Xiaotie Deng

First, we modify the verification strategy so that nodes set a probability of verifying a received transaction considering the likelihood of it being spam: transactions from a node with a low reputation have a high probability of being verified.

An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs

no code implementations • 21 May 2022 • Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan Francis, Alessandro Oltramari

In this paper, we study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.

Knowledge Graphs

CCPM: A Chinese Classical Poetry Matching Dataset

1 code implementation • 3 Jun 2021 • Wenhao Li, Fanchao Qi, Maosong Sun, Xiaoyuan Yi, Jiarui Zhang

We hope this dataset can further the study of incorporating deep semantics into systems for understanding and generating Chinese classical poetry.

Translation

Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation ($\text{POINT}^2$)

no code implementations • 10 Mar 2019 • Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou

As the POI tracker is shift-invariant, $\text{POINT}^2$ is more robust to the initial pose of the 3D pre-intervention image.
