Search Results for author: Jiarui Zhang

Found 27 papers, 9 papers with code

Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions

no code implementations · 11 Dec 2024 · Jiarui Zhang, Ollie Liu, Tianyu Yu, Jinyi Hu, Willie Neiswanger

For instance, Euclid outperforms the best closed-source model, Gemini-1.5-Pro, by up to 58.56% on certain Geoperception benchmark tasks and by 10.65% on average across all tasks.

Medical Image Analysis

Guided Profile Generation Improves Personalization with LLMs

no code implementations · 19 Sep 2024 · Jiarui Zhang

In modern commercial systems, including recommendation, ranking, and e-commerce platforms, there is a trend toward improving customer experience by incorporating personalization context as input to large language models (LLMs).

Descriptive Profile Generation

EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs

no code implementations · 30 Aug 2024 · Zhen Fan, Peng Dai, Zhuo Su, Xu Gao, Zheng Lv, Jiarui Zhang, Tianyuan Du, Guidong Wang, Yang Zhang

Specifically, EMHI provides synchronized stereo images from downward-sloping cameras on the headset and IMU data from body-worn sensors, along with pose annotations in SMPL format.

Pose Estimation

Application of Data-Driven Model Predictive Control for Autonomous Vehicle Steering

no code implementations · 11 Jul 2024 · Jiarui Zhang, Aijing Kong, Yu Tang, Zhichao Lv, Lulu Guo, Peng Hang

With the development of autonomous driving technology, demands on vehicle control are increasing, and model predictive control (MPC) has become a widely researched topic in both industry and academia.

Autonomous Driving, Model Predictive Control +1

Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection

no code implementations · 16 May 2024 · Jiarui Zhang, Shaojuan Wu, Xiaowang Zhang, Zhiyong Feng

Then, based on masked language model prediction, we present a target-aware relative stance sample generation method for obtaining relative bias.

Contrastive Learning, Counterfactual +4

MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning

1 code implementation · 21 Apr 2024 · Yifan Jiang, Jiarui Zhang, Kexuan Sun, Zhivar Sourati, Kian Ahrabian, Kaixin Ma, Filip Ilievski, Jay Pujara

Further analysis of perception questions reveals that MLLMs struggle to comprehend the visual features (near-random performance) and even to count the panels in the puzzle (<45%), which hinders their ability to perform abstract reasoning.

Visual Reasoning

Exploring Perceptual Limitation of Multimodal Large Language Models

1 code implementation · 12 Feb 2024 · Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip Ilievski, Maosong Sun

Multimodal Large Language Models (MLLMs) have recently shown remarkable perceptual capability in answering visual questions; however, little is known about the limits of their perception.

Object Question Answering

The Curious Case of Nonverbal Abstract Reasoning with Multi-Modal Large Language Models

no code implementations · 22 Jan 2024 · Kian Ahrabian, Zhivar Sourati, Kexuan Sun, Jiarui Zhang, Yifan Jiang, Fred Morstatter, Jay Pujara

While large language models (LLMs) are still being adapted to new domains and used in novel applications, we are experiencing an influx of a new generation of foundation models, namely multi-modal large language models (MLLMs).

Passive Non-Line-of-Sight Imaging with Light Transport Modulation

1 code implementation · 26 Dec 2023 · Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

In this work, we propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network.

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

no code implementations · 13 Dec 2023 · Jinta Weng, Jiarui Zhang, Yue Hu, Daidong Fa, Xiaofeng Xuand, Heyan Huang

In interaction with large language models, embedding more task-related information into prompts will make it easier to stimulate knowledge embedded in large language models.

Language Modeling, Language Modelling +2

Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs

2 code implementations · 24 Oct 2023 · Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

In particular, we show that their zero-shot accuracy in answering visual questions is very sensitive to the size of the visual subject of the question, declining by up to 46% as the subject's size decreases.

Question Answering, Visual Question Answering

PINN-based viscosity solution of HJB equation

no code implementations · 18 Sep 2023 · Tianyu Liu, Steven Ding, Jiarui Zhang, Liutao Zhou

This paper proposes a novel physics-informed neural network (PINN)-based viscosity solution for Hamilton-Jacobi-Bellman (HJB) equations.
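For context, the standard finite-horizon HJB equation for a cost-minimization problem is shown below, together with a generic PINN-style training loss. This is a textbook formulation under assumed notation (dynamics $f$, running cost $\ell$, terminal cost $g$, horizon $T$, network $V_\theta$, collocation points $(x_i, t_i)$ and terminal points $x_j$), not the paper's specific method. Because $V$ is generally not differentiable everywhere, the viscosity solution is the standard notion of solution for such equations; a plain residual loss like the one sketched here does not by itself guarantee convergence to it.

```latex
% HJB equation for  min_u \int_t^T \ell(x,u)\,ds + g(x(T)),  with  \dot{x} = f(x,u)
\partial_t V(x,t) + \min_{u \in U} \Big\{ \ell(x,u) + \nabla_x V(x,t)^{\top} f(x,u) \Big\} = 0,
\qquad V(x,T) = g(x)

% Generic PINN loss: PDE residual at collocation points + terminal-condition mismatch
\mathcal{L}(\theta) =
  \frac{1}{N_r} \sum_{i=1}^{N_r} \Big| \partial_t V_\theta(x_i,t_i)
  + \min_{u \in U} \big\{ \ell(x_i,u) + \nabla_x V_\theta(x_i,t_i)^{\top} f(x_i,u) \big\} \Big|^2
  + \frac{1}{N_b} \sum_{j=1}^{N_b} \big| V_\theta(x_j,T) - g(x_j) \big|^2
```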

Differentially private sliced inverse regression in the federated paradigm

no code implementations · 10 Jun 2023 · Shuaida He, Jiarui Zhang, Xin Chen

Sliced inverse regression (SIR), which includes linear discriminant analysis (LDA) as a special case, is a popular and powerful dimension reduction tool.

Dimensionality Reduction, Regression
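To illustrate what SIR computes, here is a minimal sketch of classic, centralized, non-private sliced inverse regression in Python with NumPy. It is not the differentially private federated method of this paper; the function name `sliced_inverse_regression` and the default of 10 equal-count slices are illustrative assumptions. The steps are: standardize X, slice the sorted response, form the weighted covariance of slice means, and map its leading eigenvectors back to the original scale.

```python
import numpy as np


def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Classic (non-private, centralized) SIR sketch; hypothetical helper name."""
    n, p = X.shape
    # Standardize: Z = (X - mean) @ Sigma^{-1/2}
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = Xc @ inv_sqrt

    # Sort by the response and split into roughly equal-count slices
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    directions = inv_sqrt @ top
    # Normalize each direction (column) to unit length
    return directions / np.linalg.norm(directions, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2000, 5))
    beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
    # Single-index model with a monotone link
    y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)
    b_hat = sliced_inverse_regression(X, y)[:, 0]
    print("|cos(angle)| between estimate and beta:", abs(b_hat @ beta))
```

When y is categorical, using each class as its own slice gives the connection to LDA mentioned above. SIR can also miss directions when the link is symmetric (e.g. y = (βᵀx)²), since the slice means then carry no signal.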

A Study of Situational Reasoning for Traffic Understanding

1 code implementation · 5 Jun 2023 · Jiarui Zhang, Filip Ilievski, Kaixin Ma, Aravinda Kollaa, Jonathan Francis, Alessandro Oltramari

Intelligent Traffic Monitoring (ITMo) technologies hold the potential for improving road safety/security and for enabling smart city infrastructure.

Decision Making, Knowledge Graphs +2

Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models

no code implementations · 31 May 2023 · Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara, Filip Ilievski

As our initial analysis of BLIP-family models revealed difficulty with answering fine-detail questions, we investigate the following question: Can visual cropping be employed to improve the performance of state-of-the-art visual question answering models on fine-detail questions?

Question Answering, Visual Question Answering

Knowledge-enhanced Agents for Interactive Text Games

no code implementations · 8 May 2023 · Prateek Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan Francis, Kaixin Ma

We experiment with four models on the 10 tasks in the ScienceWorld text-based game environment, to illustrate the impact of knowledge injection on various model configurations and challenging task settings.

Instruction Following, Knowledge Graphs +5

Deformable Model-Driven Neural Rendering for High-Fidelity 3D Reconstruction of Human Heads Under Low-View Settings

2 code implementations · ICCV 2023 · Baixin Xu, Jiarui Zhang, Kwan-Yee Lin, Chen Qian, Ying He

To address this, we propose geometry decomposition and adopt a two-stage, coarse-to-fine training strategy, allowing for progressively capturing high-frequency geometric details.

3D Reconstruction, Neural Rendering +1

Utilizing Background Knowledge for Robust Reasoning over Traffic Situations

1 code implementation · 4 Dec 2022 · Jiarui Zhang, Filip Ilievski, Aravinda Kollaa, Jonathan Francis, Kaixin Ma, Alessandro Oltramari

Understanding novel situations in the traffic domain requires an intricate combination of domain-specific and causal commonsense knowledge.

Knowledge Graphs, Multiple-choice +2

A Reputation-Based Mechanism for Transaction Processing in Blockchain Systems

no code implementations · journal 2022 · Jiarui Zhang, Yukun Cheng, Xiaotie Deng

First, we modify the verification strategy so that nodes set a probability of verifying a received transaction considering the likelihood of it being spam: transactions from a node with a low reputation have a high probability of being verified.
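The verification rule described above can be sketched as a reputation-dependent Bernoulli draw. This is a minimal illustration only: it assumes reputations normalized to [0, 1] and uses a hypothetical linear mapping with a floor `p_min`; the paper's actual reputation function and parameters are not given in this excerpt, and `verification_probability` / `should_verify` are illustrative names.

```python
import random


def verification_probability(reputation: float, p_min: float = 0.05) -> float:
    """Hypothetical linear mapping: reputation 0 -> always verify, reputation 1 -> p_min."""
    if not 0.0 <= reputation <= 1.0:
        raise ValueError("reputation must be in [0, 1]")
    # Lower sender reputation -> higher probability that the receiving node verifies
    return p_min + (1.0 - p_min) * (1.0 - reputation)


def should_verify(reputation: float, rng: random.Random, p_min: float = 0.05) -> bool:
    """Decide whether to verify a received transaction via a Bernoulli draw."""
    return rng.random() < verification_probability(reputation, p_min)


if __name__ == "__main__":
    rng = random.Random(42)
    trials = 10_000
    for rep in (0.1, 0.5, 0.9):
        verified = sum(should_verify(rep, rng) for _ in range(trials))
        print(f"reputation={rep}: verified {verified}/{trials} transactions")
```

Keeping a nonzero floor means even high-reputation senders are occasionally checked, so a node cannot build reputation and then spam with no risk of detection; whether the paper uses such a floor is an assumption here.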

An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs

no code implementations · 21 May 2022 · Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan Francis, Alessandro Oltramari

In this paper, we study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.

Decoder, Knowledge Graphs

CCPM: A Chinese Classical Poetry Matching Dataset

1 code implementation · 3 Jun 2021 · Wenhao Li, Fanchao Qi, Maosong Sun, Xiaoyuan Yi, Jiarui Zhang

We hope this dataset can further enhance the study on incorporating deep semantics into the understanding and generation system of Chinese classical poetry.


Multiview 2D/3D Rigid Registration via a Point-Of-Interest Network for Tracking and Triangulation ($\text{POINT}^2$)

no code implementations · 10 Mar 2019 · Haofu Liao, Wei-An Lin, Jiarui Zhang, Jingdan Zhang, Jiebo Luo, S. Kevin Zhou

As the POI tracker is shift-invariant, $\text{POINT}^2$ is more robust to the initial pose of the 3D pre-intervention image.
