Search Results for author: Zhanyu Wang

Found 17 papers, 4 papers with code

MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model

no code implementations • 4 Dec 2023 • Ling Yang, Zhanyu Wang, Luping Zhou

Despite the success of Large Language Models (LLMs) in general image tasks, a gap persists in the medical field for a multimodal large model adept at handling the nuanced diversity of medical images.

Instruction Following Question Answering +1

Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models

no code implementations • 4 Dec 2023 • Bingshuai Liu, Chenyang Lyu, Zijun Min, Zhanyu Wang, Jinsong Su, Longyue Wang

The advancement of Large Language Models (LLMs) has brought substantial attention to the Chain of Thought (CoT) approach, primarily due to its ability to enhance the capability of LLMs on complex reasoning tasks.

Question Answering Retrieval

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

no code implementations • 25 Nov 2023 • Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu

While the recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to the realm of input-side multimodal comprehension, lacking the capacity for multimodal content generation.

Instruction Following Language Modelling +7

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

no code implementations • 31 Oct 2023 • Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou

This work conducts an evaluation of GPT-4V's multimodal capability for medical image analysis, with a focus on three representative tasks of radiology report generation, medical visual question answering, and medical visual grounding.

Descriptive Medical Visual Question Answering +3

R2GenGPT: Radiology Report Generation with Frozen LLMs

1 code implementation • 18 Sep 2023 • Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou

First, R2GenGPT attains state-of-the-art (SOTA) performance by training only a lightweight visual alignment module while freezing all parameters of the LLM.

METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

no code implementations • CVPR 2023 • Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou

In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend different image regions for image representation.
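The expert-token interaction described above can be sketched as plain dot-product attention. This is a toy illustration under assumed 2-dimensional features, not the paper's implementation: `expert_attention`, the feature vectors, and the single-head unweighted form are all stand-ins.

```python
# Toy sketch: each "expert token" attends over the vision tokens plus the
# *other* expert tokens, so different experts can specialize on different
# image regions. Illustrative only; not METransformer's actual code.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def expert_attention(experts, vision_tokens):
    """One attention step: every expert queries vision + other experts."""
    updated = []
    for i, q in enumerate(experts):
        # Keys/values: all vision tokens and the other expert tokens.
        kv = vision_tokens + [e for j, e in enumerate(experts) if j != i]
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in kv])
        # Weighted sum of values gives the updated expert representation.
        updated.append([sum(w * v[d] for w, v in zip(weights, kv))
                        for d in range(len(q))])
    return updated

vision = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # stand-ins for patch features
experts = [[1.0, 0.0], [0.0, 1.0]]              # two learnable expert tokens
out = expert_attention(experts, vision)
```

Because the two experts start with different queries, their outputs pull toward different image tokens, which is the intuition behind letting multiple experts cover complementary regions.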

Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder

no code implementations • 4 Apr 2023 • Yunyi Liu, Zhanyu Wang, Dong Xu, Luping Zhou

To bridge this gap, in this paper we propose a new Transformer-based framework for medical VQA, named Q2ATransformer, which integrates the advantages of both the classification and generation approaches and provides a unified treatment for closed-end and open-end questions.

Classification Medical Visual Question Answering +2

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

1 code implementation • 12 Oct 2022 • Zhanyu Wang, Guang Cheng, Jordan Awan

Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure.

Regression
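The randomness a DP mechanism injects can be illustrated with the standard Gaussian mechanism applied to a clipped mean. This is a generic sketch, not the paper's bootstrap-specific analysis; `dp_mean` and its parameters are hypothetical names, and the noise scale follows the usual analytic form sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon.

```python
# Generic Gaussian-mechanism sketch (illustrative only; the paper's
# contribution is a new privacy analysis for the bootstrap, not this
# mechanism itself).
import math
import random

def dp_mean(data, lower, upper, epsilon, delta, rng=None):
    """Release an (epsilon, delta)-DP mean of values clipped to [lower, upper]."""
    rng = rng or random.Random(0)
    clipped = [min(max(x, lower), upper) for x in data]
    true_mean = sum(clipped) / len(clipped)
    # Changing one record moves the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return true_mean + rng.gauss(0.0, sigma)

release = dp_mean([1.2, 0.8, 1.5, 0.9, 1.1], lower=0.0, upper=2.0,
                  epsilon=1.0, delta=1e-5)
```

Note the trade-off: as epsilon grows (weaker privacy), sigma shrinks and the release approaches the true mean of 1.1.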

MICO: Selective Search with Mutual Information Co-training

1 code implementation COLING 2022 Zhanyu Wang, Xiao Zhang, Hyokun Yun, Choon Hui Teo, Trishul Chilimbi

In contrast to traditional exhaustive search, selective search first clusters documents into several groups, so that a query is matched exhaustively only against the documents within one group or a few groups.

Retrieval
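The cluster-then-search idea can be sketched in a few lines. This is a minimal illustration with a crude term-overlap score; MICO's actual contribution is learning the clustering jointly with a query router via mutual-information co-training, which this sketch does not implement, and `build_groups`/`selective_search` are hypothetical names.

```python
# Minimal selective-search sketch: group documents up front, then match a
# query exhaustively against only the single closest group.

def overlap(a, b):
    """Crude relevance score: number of shared terms."""
    return len(set(a.split()) & set(b.split()))

def build_groups(docs, seeds):
    """Assign each document to the seed description it overlaps most."""
    groups = {s: [] for s in seeds}
    for d in docs:
        best = max(seeds, key=lambda s: overlap(d, s))
        groups[best].append(d)
    return groups

def selective_search(query, groups):
    """Route the query to one group, then search only that group exhaustively."""
    best_group = max(groups, key=lambda s: overlap(query, s))
    return sorted(groups[best_group], key=lambda d: overlap(query, d),
                  reverse=True)

docs = ["deep learning for vision", "convolutional networks vision",
        "tax law reform", "corporate tax policy"]
groups = build_groups(docs, seeds=["vision models", "tax policy"])
results = selective_search("vision networks", groups)
```

The payoff is cost: the query is scored against only half the corpus here, and the saving grows with the number of groups.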

A Medical Semantic-Assisted Transformer for Radiographic Report Generation

no code implementations • 22 Aug 2022 • Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, Luping Zhou

Automated radiographic report generation is a challenging cross-domain task that aims to automatically generate accurate and semantically coherent reports to describe medical images.

Image Captioning Medical Report Generation

CLIP4Caption ++: Multi-CLIP for Video Caption

no code implementations • 11 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li

We make the following improvements in CLIP4Caption++: we employ the advanced encoder-decoder architecture X-Transformer as our main framework, and 1) we utilize three strong pre-trained CLIP models to extract text-related appearance visual features.

Sentence

A Self-Boosting Framework for Automated Radiographic Report Generation

no code implementations • CVPR 2021 • Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li

On one hand, the image-text matching branch helps to learn highly text-correlated visual features for the report generation branch to output high quality reports.

Image Captioning Image-text matching +3

Variance Reduction on General Adaptive Stochastic Mirror Descent

no code implementations • 26 Dec 2020 • Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng

In this work, we investigate the idea of variance reduction by studying its properties with general adaptive mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems.
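The variance-reduction idea can be made concrete with an SVRG-style gradient estimator on a toy least-squares finite sum. This is a hedged sketch: the paper studies such estimators combined with general adaptive mirror descent, which this plain-SGD toy does not implement, and `svrg_gradient` is an illustrative name.

```python
# SVRG-style estimator: g_i(w) - g_i(w_snap) + full_grad(w_snap) is unbiased
# for the full gradient at w, and its variance shrinks as w approaches the
# snapshot point. Toy 1-D least squares: f_i(w) = 0.5 * (a_i * w - b_i)^2.
import random

def grad_i(w, a, b, i):
    return a[i] * (a[i] * w - b[i])

def full_grad(w, a, b):
    return sum(grad_i(w, a, b, i) for i in range(len(a))) / len(a)

def svrg_gradient(w, w_snap, a, b, i):
    """Variance-reduced stochastic gradient anchored at snapshot w_snap."""
    return grad_i(w, a, b, i) - grad_i(w_snap, a, b, i) + full_grad(w_snap, a, b)

a, b = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]   # every f_i is minimized at w = 1
w = 0.0
rng = random.Random(0)
for epoch in range(30):
    w_snap = w                              # refresh snapshot each outer loop
    for _ in range(10):
        i = rng.randrange(len(a))
        w -= 0.05 * svrg_gradient(w, w_snap, a, b, i)
```

Averaging the estimator over all i recovers the exact full gradient, which is the unbiasedness property the analysis relies on.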

Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing

no code implementations • 5 Jul 2020 • Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng

In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees.

valid Vocal Bursts Intensity Prediction

Directional Pruning of Deep Neural Networks

1 code implementation • NeurIPS 2020 • Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng

In the light of the fact that the stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method which searches for a sparse minimizer in or close to that flat region.
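For contrast with the directional approach, the most common sparsification baseline is plain magnitude pruning. The sketch below shows that baseline only; it is not the paper's method, which instead searches for a sparse minimizer inside the flat region found by SGD rather than thresholding by weight magnitude.

```python
# Generic magnitude-pruning baseline (illustrative only; not the paper's
# directional pruning method).

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)          # number of weights to remove
    if k == 0:
        return list(weights)
    # Threshold at the k-th smallest magnitude (ties may remove a few extra).
    cutoff = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= cutoff else w for w in weights]

pruned = magnitude_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2], sparsity=0.5)
```

Magnitude pruning ignores the loss landscape entirely, which is exactly the gap a flat-region-aware method aims to close.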

The Sample Complexity of Meta Sparse Regression

no code implementations • 22 Feb 2020 • Zhanyu Wang, Jean Honorio

A key difference between meta-learning and classical multi-task learning is that meta-learning focuses only on recovering the parameters of the novel task, while multi-task learning estimates the parameters of all tasks, which requires l to grow with T.

Few-Shot Learning Multi-Task Learning +1
