no code implementations • 26 Nov 2024 • Yunyi Liu, Yingshu Li, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou
Leveraging GPT-4, we designed an easy-to-use data generation pipeline, enabling us to produce extensive training data based on two distinct scoring systems, each containing reports of varying quality along with corresponding scores.
no code implementations • 9 Sep 2024 • Yingshu Li, Zhanyu Wang, Yunyi Liu, Lei Wang, Lingqiao Liu, Luping Zhou
Harnessing the robust capabilities of Large Language Models (LLMs) for narrative generation, logical reasoning, and common-sense knowledge integration, this study delves into utilizing LLMs to enhance automated radiology report generation (R2Gen).
no code implementations • 27 Apr 2024 • Yunyi Liu, Zhanyu Wang, Yingshu Li, Xinyu Liang, Lingqiao Liu, Lei Wang, Luping Zhou
This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs).
no code implementations • 4 Dec 2023 • Ling Yang, Zhanyu Wang, Zhenghao Chen, Xinyu Liang, Luping Zhou
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks, yet their application in medical imaging is nascent, lacking tailored models.
no code implementations • 4 Dec 2023 • Bingshuai Liu, Chenyang Lyu, Zijun Min, Zhanyu Wang, Jinsong Su, Longyue Wang
The advancement of Large Language Models (LLMs) has brought substantial attention to the Chain of Thought (CoT) approach, primarily due to its ability to enhance the capability of LLMs on complex reasoning tasks.
no code implementations • 25 Nov 2023 • Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu, Chenyang Lyu, Huayang Li, Deng Cai, Luping Zhou, Shuming Shi, Zhaopeng Tu
While the recent advances in Multimodal Large Language Models (MLLMs) constitute a significant leap forward in the field, these models are predominantly confined to the realm of input-side multimodal comprehension, lacking the capacity for multimodal content generation.
no code implementations • 31 Oct 2023 • Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lei Wang, Lingqiao Liu, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou
This work conducts an evaluation of GPT-4V's multimodal capability for medical image analysis, with a focus on three representative tasks of radiology report generation, medical visual question answering, and medical visual grounding.
1 code implementation • 18 Sep 2023 • Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
First, it attains state-of-the-art (SOTA) performance by training only the lightweight visual alignment module while freezing all the parameters of LLM.
no code implementations • CVPR 2023 • Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou
In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend to different image regions for image representation.
no code implementations • 4 Apr 2023 • Yunyi Liu, Zhanyu Wang, Dong Xu, Luping Zhou
To bridge this gap, we propose a new Transformer-based framework for medical VQA, named Q2ATransformer, which integrates the advantages of both the classification and the generation approaches and provides a unified treatment of closed-ended and open-ended questions.
1 code implementation • 12 Oct 2022 • Zhanyu Wang, Guang Cheng, Jordan Awan
For the composition of the DP bootstrap, we present a numerical method to compute the exact privacy cost of releasing multiple DP bootstrap estimates, and using the Gaussian-DP (GDP) framework (Dong et al., 2022), we show that the release of $B$ DP bootstrap estimates from mechanisms satisfying $(\mu/\sqrt{(2-2/\mathrm{e})B})$-GDP asymptotically satisfies $\mu$-GDP as $B$ goes to infinity.
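The asymptotic statement above can be read as a budgeting rule: to target an overall $\mu$-GDP guarantee when releasing $B$ DP bootstrap estimates, each mechanism is calibrated to $\mu/\sqrt{(2-2/\mathrm{e})B}$-GDP. The following is a minimal sketch of that arithmetic only, not the paper's numerical method for the exact privacy cost; the function name `per_estimate_mu` is hypothetical, and the result holds asymptotically as $B$ grows, so it should not be taken as an exact guarantee for small $B$.

```python
import math


def per_estimate_mu(total_mu: float, num_bootstrap: int) -> float:
    """Per-mechanism GDP parameter mu / sqrt((2 - 2/e) * B).

    Hypothetical helper: it only evaluates the formula quoted in the
    abstract, which is an asymptotic (large B) statement.
    """
    if total_mu <= 0:
        raise ValueError("total_mu must be positive")
    if num_bootstrap < 1:
        raise ValueError("num_bootstrap must be at least 1")
    return total_mu / math.sqrt((2.0 - 2.0 / math.e) * num_bootstrap)


if __name__ == "__main__":
    # Per-estimate parameter for an overall target of mu = 1 with B = 100.
    print(per_estimate_mu(1.0, 100))
```

Because the per-estimate parameter scales as $1/\sqrt{B}$, quadrupling the number of bootstrap estimates halves the privacy parameter available to each one.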
1 code implementation • COLING 2022 • Zhanyu Wang, Xiao Zhang, Hyokun Yun, Choon Hui Teo, Trishul Chilimbi
In contrast to traditional exhaustive search, selective search first clusters documents into several groups, so that a query is executed within only one or a few groups rather than exhaustively over all documents.
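The general idea of selective search can be sketched as follows. This is an illustrative toy, not the paper's method: it assumes documents and queries are already embedded as vectors, that cluster assignments are given, and that groups are ranked by the dot product between the query and each group's centroid. All names (`build_index`, `selective_search`, `num_groups`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_index(
    docs: Dict[str, Vector], assignment: Dict[str, int]
) -> Tuple[Dict[int, List[str]], Dict[int, List[float]]]:
    """Group document ids by cluster and compute one centroid per group."""
    groups: Dict[int, List[str]] = {}
    for doc_id, cluster_id in assignment.items():
        groups.setdefault(cluster_id, []).append(doc_id)
    centroids = {
        cid: [sum(col) / len(ids) for col in zip(*(docs[d] for d in ids))]
        for cid, ids in groups.items()
    }
    return groups, centroids


def selective_search(
    query: Vector,
    docs: Dict[str, Vector],
    groups: Dict[int, List[str]],
    centroids: Dict[int, List[float]],
    num_groups: int = 1,
    top_k: int = 3,
) -> List[str]:
    """Score only the documents in the num_groups best-matching groups."""
    ranked = sorted(centroids, key=lambda c: dot(query, centroids[c]), reverse=True)
    candidates = [d for cid in ranked[:num_groups] for d in groups[cid]]
    scored = sorted(((dot(query, docs[d]), d) for d in candidates), reverse=True)
    return [d for _, d in scored[:top_k]]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
    assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
    groups, centroids = build_index(docs, assignment)
    print(selective_search([1.0, 0.0], docs, groups, centroids))
```

Setting `num_groups` to the total number of groups recovers exhaustive search, which makes the trade-off between cost and recall explicit.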
no code implementations • 22 Aug 2022 • Zhanyu Wang, Mingkang Tang, Lei Wang, Xiu Li, Luping Zhou
Automated radiographic report generation is a challenging cross-domain task that aims to automatically generate accurate and semantically coherent reports to describe medical images.
no code implementations • 13 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li
Notably, our model is trained only on the MSR-VTT dataset.
no code implementations • 11 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li
In the proposed CLIP4Caption++, we employ an advanced encoder-decoder architecture, X-Transformer, as our main framework and make the following improvements: 1) we utilize three strong pre-trained CLIP models to extract text-related appearance visual features.
no code implementations • CVPR 2021 • Zhanyu Wang, Luping Zhou, Lei Wang, Xiu Li
On the one hand, the image-text matching branch helps to learn highly text-correlated visual features for the report generation branch to output high-quality reports.
no code implementations • 26 Dec 2020 • Wenjie Li, Zhanyu Wang, Yichen Zhang, Guang Cheng
In this work, we investigate the idea of variance reduction by studying its properties with general adaptive mirror descent algorithms in nonsmooth nonconvex finite-sum optimization problems.
no code implementations • 5 Jul 2020 • Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng
In this paper, we propose a novel approach for designing dynamic pricing policies based on regularized online statistical learning with theoretical guarantees.
1 code implementation • NeurIPS 2020 • Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method which searches for a sparse minimizer in or close to that flat region.
no code implementations • 22 Feb 2020 • Zhanyu Wang, Jean Honorio
A key difference between meta-learning and classical multi-task learning is that meta-learning focuses only on recovering the parameters of the novel task, while multi-task learning estimates the parameters of all tasks, which requires $l$ to grow with $T$.