Search Results for author: Shu Zhang

Found 46 papers, 19 papers with code

Large Language Models for Robotics: Opportunities, Challenges, and Perspectives

no code implementations • 9 Jan 2024 • Jiaqi Wang, Zihao Wu, Yiwei Li, Hanqi Jiang, Peng Shu, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Huaqin Zhao, Zhengliang Liu, Haixing Dai, Lin Zhao, Bao Ge, Xiang Li, Tianming Liu, Shu Zhang

Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions.

Robot Task Planning

Understanding LLMs: A Comprehensive Overview from Training to Inference

no code implementations • 4 Jan 2024 • Yiheng Liu, Hao He, Tianle Han, Xu Zhang, Mengyuan Liu, Jiaming Tian, Yutong Zhang, Jiaqi Wang, Xiaohui Gao, Tianyang Zhong, Yi Pan, Shaochen Xu, Zihao Wu, Zhengliang Liu, Xin Zhang, Shu Zhang, Xintao Hu, Tuo Zhang, Ning Qiang, Tianming Liu, Bao Ge

Low-cost training and deployment of LLMs represent the future development trend.

Language Modelling Large Language Model +2

DomainForensics: Exposing Face Forgery across Domains via Bi-directional Adaptation

no code implementations • 17 Dec 2023 • Qingxuan Lv, Yuezun Li, Junyu Dong, Sheng Chen, Hui Yu, Huiyu Zhou, Shu Zhang

Specifically, our strategy considers both forward and backward adaptation, to transfer the forgery knowledge from the source domain to the target domain in forward adaptation and then reverse the adaptation from the target domain to the source domain in backward adaptation.

DeepFake Detection Face Swapping +2

Holistic Evaluation of GPT-4V for Biomedical Imaging

no code implementations • 10 Nov 2023 • Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, Xinyu Wang, Xu Zhang, Lin Zhao, Yiheng Liu, Kai Zhang, Liheng Yan, Lichao Sun, Jun Liu, Ning Qiang, Bao Ge, Xiaoyan Cai, Shijie Zhao, Xintao Hu, Yixuan Yuan, Gang Li, Shu Zhang, Xin Zhang, Xi Jiang, Tuo Zhang, Dinggang Shen, Quanzheng Li, Wei Liu, Xiang Li, Dajiang Zhu, Tianming Liu

GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain.

Anatomy Image Captioning +1

ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data

no code implementations • 8 Oct 2023 • Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang

Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels.

Decision Making Language Modelling +1

HIC-YOLOv5: Improved YOLOv5 For Small Object Detection

1 code implementation • 28 Sep 2023 • Shiyi Tang, Shu Zhang, Yini Fang

Small object detection has been a challenging problem in the field of object detection.

Object object-detection +2

Evaluating Large Language Models for Radiology Natural Language Processing

1 code implementation • 25 Jul 2023 • Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu, Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li, Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao, Yuanhao Chen, Xu Liu, Peilong Wang, Pingkun Yan, Jun Liu, Bao Ge, Lichao Sun, Dajiang Zhu, Xiang Li, Wei Liu, Xiaoyan Cai, Xintao Hu, Xi Jiang, Shu Zhang, Xin Zhang, Tuo Zhang, Shijie Zhao, Quanzheng Li, Hongtu Zhu, Dinggang Shen, Tianming Liu

The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP).

Correlation-Aware Mutual Learning for Semi-supervised Medical Image Segmentation

1 code implementation • 12 Jul 2023 • Shengbo Gao, Ziji Zhang, Jiechao Ma, Zihao Li, Shu Zhang

Our approach is based on a mutual learning strategy that incorporates two modules: the Cross-sample Mutual Attention Module (CMA) and the Omni-Correlation Consistency Module (OCC).

Image Segmentation Segmentation +2

Review of Large Vision Models and Visual Prompt Engineering

no code implementations • 3 Jul 2023 • Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu, Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, Enze Shi, Yi Pan, Tuo Zhang, Dajiang Zhu, Xiang Li, Xi Jiang, Bao Ge, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang

This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering.

Prompt Engineering

A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

1 code implementation • 1 Jun 2023 • Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhang, Yuanxu Gao, Jia Pan, Jun Shao, Guangming Lu, Kang Zhang, Weimin Li

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results.

Representation Learning

Attention Paper: How Generative AI Reshapes Digital Shadow Industry?

no code implementations • 26 May 2023 • Qichao Wang, Huan Ma, WenTao Wei, Hangyu Li, Liang Chen, Peilin Zhao, Binwen Zhao, Bo Hu, Shu Zhang, Zibin Zheng, Bingzhe Wu

The rapid development of digital economy has led to the emergence of various black and shadow internet industries, which pose potential risks that can be identified and managed through digital risk management (DRM) that uses different techniques such as machine learning and deep learning.

Management

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild

1 code implementation • NeurIPS 2023 • Can Qin, Shu Zhang, Ning Yu, Yihao Feng, Xinyi Yang, Yingbo Zhou, Huan Wang, Juan Carlos Niebles, Caiming Xiong, Silvio Savarese, Stefano Ermon, Yun Fu, Ran Xu

Visual generative foundation models such as Stable Diffusion show promise in navigating these goals, especially when prompted with arbitrary languages.

Image Generation

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation • 14 May 2023 • Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #4 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1

Prompt Engineering for Healthcare: Methodologies and Applications

no code implementations • 28 Apr 2023 • Jiaqi Wang, Enze Shi, Sigang Yu, Zihao Wu, Chong Ma, Haixing Dai, Qiushi Yang, Yanqing Kang, Jinru Wu, Huawen Hu, Chenxi Yue, Haiyang Zhang, Yiheng Liu, Yi Pan, Zhengliang Liu, Lichao Sun, Xiang Li, Bao Ge, Xi Jiang, Dajiang Zhu, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang

Prompt engineering is a critical technique in the field of natural language processing that involves designing and optimizing the prompts used to input information into models, aiming to enhance their performance on specific tasks.

Machine Translation Prompt Engineering +3

ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

2 code implementations • 17 Apr 2023 • Chong Ma, Zihao Wu, Jiaqi Wang, Shaochen Xu, Yaonai Wei, Zhengliang Liu, Xi Jiang, Lei Guo, Xiaoyan Cai, Shu Zhang, Tuo Zhang, Dajiang Zhu, Dinggang Shen, Tianming Liu, Xiang Li

The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'Findings' section.

In-Context Learning

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

1 code implementation • ICCV 2023 • Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu

Empirical results show that GlueNet can be trained efficiently and enables various capabilities beyond previous state-of-the-art models: 1) multilingual language models such as XLM-Roberta can be aligned with existing T2I models, allowing for the generation of high-quality images from captions beyond English; 2) GlueNet can align multi-modal encoders such as AudioCLIP with the Stable Diffusion model, enabling sound-to-image generation; 3) it can also upgrade the current text encoder of the latent diffusion model for challenging case generation.

Image Generation

HIVE: Harnessing Human Feedback for Instructional Visual Editing

1 code implementation • 16 Mar 2023 • Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences.

Text-based Image Editing

Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset

1 code implementation • 13 Mar 2023 • Arun Tejasvi Chaganty, Megan Leszczynski, Shu Zhang, Ravi Ganti, Krisztian Balog, Filip Radlinski

Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g., a playlist or radio) than over single items (e.g., songs).

Music Recommendation Recommendation Systems +1

Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation

no code implementations • 27 Jan 2023 • Megan Leszczynski, Shu Zhang, Ravi Ganti, Krisztian Balog, Filip Radlinski, Fernando Pereira, Arun Tejasvi Chaganty

This has motivated conversational recommender systems (CRSs), with control provided through natural language feedback.

Language Modelling Music Recommendation +3

Vertical Federated Linear Contextual Bandits

no code implementations • 20 Oct 2022 • Zeyu Cao, Zhipeng Liang, Shu Zhang, Hangyu Li, Ouyang Wen, Yu Rong, Peilin Zhao, Bingzhe Wu

In this paper, we investigate a novel problem of building contextual bandits in the vertical federated setting, i.e., contextual information is vertically distributed over different departments.

Multi-Armed Bandits

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

1 code implementation • CVPR 2022 • Shu Zhang, Ran Xu, Caiming Xiong, Chetan Ramaiah

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks.

Contrastive Learning Representation Learning

AutoMO-Mixer: An automated multi-objective Mixer model for balanced, safe and robust prediction in medicine

no code implementations • 4 Mar 2022 • Xi Chen, Jiahuan Lv, Dehua Feng, Xuanqin Mou, Ling Bai, Shu Zhang, Zhiguo Zhou

Accurately identifying a patient's status through medical images plays an important role in diagnosis and treatment.

Specificity

DeepSSN: a deep convolutional neural network to assess spatial scene similarity

1 code implementation • 7 Feb 2022 • Danhuai Guo, Shiyin Ge, Shu Zhang, Song Gao, Ran Tao, Yangang Wang

Spatial-query-by-sketch is an intuitive tool to explore human spatial knowledge about geographic environments and to support communication with scene database queries.

Data Augmentation Information Retrieval +1

Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training

1 code implementation • 5 Jan 2022 • Shu Zhang, Zihao Li, Hong-Yu Zhou, Jiechao Ma, Yizhou Yu

The difficulties in both data acquisition and annotation substantially restrict the sample sizes of training datasets for 3D medical imaging applications.

Ranked #1 on Medical Object Detection on DeepLesion

Contrastive Learning Medical Object Detection

A Multi-task Deep Feature Selection Method for Brain Imaging Genetics

no code implementations • 1 Jul 2021 • Chenglin Yu, Dingnan Cui, Muheng Shang, Shu Zhang, Lei Guo, Junwei Han, Lei Du, Alzheimer's Disease Neuroimaging Initiative

Though deep learning models can extract the nonlinear relationship, they could not select relevant genetic factors.

feature selection

SSMD: Semi-Supervised Medical Image Detection with Adaptive Consistency and Heterogeneous Perturbation

no code implementations • 3 Jun 2021 • Hong-Yu Zhou, Chengdi Wang, Haofeng Li, Gang Wang, Shu Zhang, Weimin Li, Yizhou Yu

Semi-Supervised classification and segmentation methods have been widely investigated in medical image analysis.

medical image detection object-detection +2

A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

1 code implementation • 21 Apr 2021 • Jie Lian, Jingyu Liu, Shu Zhang, Kai Gao, Xiaoqing Liu, Dingwen Zhang, Yizhou Yu

Leveraging on constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) extending Mask R-CNN.

Instance Segmentation Object Detection +2

Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices

1 code implementation • 16 Dec 2020 • Shu Zhang, Jincheng Xu, Yu-Chun Chen, Jiechao Ma, Zihao Li, Yizhou Wang, Yizhou Yu

We demonstrate that with the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset (3.48% absolute improvement in the sensitivity of FPs@0.5), significantly surpassing the baseline method by up to 6.06% (in MAP@0.5) which adopts 2D convolution for 3D context modeling.

Ranked #4 on Medical Object Detection on DeepLesion

Computed Tomography (CT) Lesion Detection +2

A method for sharing dynamic geometry information in studies on liquid-based detectors

no code implementations • 16 Dec 2020 • Shu Zhang, Jing-Shu Li, Yang-Jie Su, Yu-Mei Zhang, Zi-Yuan Li, Zheng-Yun You

The liquid-based detectors are widely used in particle and nuclear physics experiments.

Instrumentation and Detectors

Deep Feature Mining via Attention-based BiLSTM-GCN for Human Motor Imagery Recognition

no code implementations • 2 May 2020 • Yimin Hou, Shuyue Jia, Xiangmin Lun, Shu Zhang, Tao Chen, Fang Wang, Jinglei Lv

The introduced deep feature mining approach can precisely recognize human motion intents from raw EEG signals, which paves the road to translate the EEG based MI recognition to practical BCI systems.

EEG Motor Imagery

Deep Homography Estimation for Dynamic Scenes

1 code implementation • CVPR 2020 • Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

We then develop a multi-scale neural network and show that when properly trained using our new dataset, this neural network can already handle dynamic scenes to some extent.

Homography Estimation Multi-Task Learning

Multi-level Similarity Learning for Low-Shot Recognition

no code implementations • 13 Dec 2019 • Hongwei Xv, Xin Sun, Junyu Dong, Shu Zhang, Qiong Li

Low-shot learning indicates the ability to recognize unseen objects based on very limited labeled training samples, which simulates human visual intelligence.

MVP-Net: Multi-view FPN with Position-aware Attention for Deep Universal Lesion Detection

1 code implementation • 10 Sep 2019 • Zihao Li, Shu Zhang, Junge Zhang, Kaiqi Huang, Yizhou Wang, Yizhou Yu

In this paper, we propose to incorporate domain knowledge in clinical practice into the model design of universal lesion detectors.

Ranked #8 on Medical Object Detection on DeepLesion

Computed Tomography (CT) Lesion Detection +2

A Preliminary Study on Data Augmentation of Deep Learning for Image Classification

no code implementations • 9 Jun 2019 • Benlin Hu, Cheng Lei, Dong Wang, Shu Zhang, Zhenyu Chen

Deep learning models have a large number of free parameters that need to be calculated by effective training of the models on a great deal of training data to improve their generalization performance.

Data Augmentation General Classification +1

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering

1 code implementation • CVPR 2019 • Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang

In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention.

Ranked #27 on Visual Question Answering (VQA) on MSRVTT-QA

Question Answering Video Question Answering +1

Dense 3D Facial Reconstruction from a Single Depth Image in Unconstrained Environment

no code implementations • 24 Apr 2017 • Shu Zhang, Hui Yu, Ting Wang, Junyu Dong, Honghai Liu

With the increasing demands of applications in virtual reality such as 3D films, virtual Human-Machine Interactions and virtual agents, 3D human face analysis is considered to be more and more important as a fundamental step for those virtual reality tasks.

Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis

3 code implementations • ICCV 2017 • Rui Huang, Shu Zhang, Tianyu Li, Ran He

This paper proposes a Two-Pathway Generative Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by simultaneously perceiving global structures and local details.

Face Recognition Generative Adversarial Network

A Distribution-based Model to Learn Bilingual Word Embeddings

no code implementations • COLING 2016 • Hailong Cao, Tiejun Zhao, Shu Zhang, Yao Meng

We introduce a distribution based model to learn bilingual word embeddings from monolingual data.

Machine Translation Word Embeddings

DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification

no code implementations • 16 Nov 2016 • Shu Zhang, Ran He, Tieniu Tan

The occlusions incurred by random meshes severely degenerate the performance of face verification systems, which raises the MeshFace verification problem between MeshFace and daily photos.

Face Alignment Face Verification +1

Adaptive Algorithm and Platform Selection for Visual Detection and Tracking

no code implementations • 21 May 2016 • Shu Zhang, Qi Zhu, Amit Roy-Chowdhury

In this paper, we focus on this problem and propose a framework to adaptively select the "best" algorithm-parameter combination and the computation platform under performance and cost constraints at design time, and adapt the algorithms at runtime based on real-time inputs.

Pedestrian Detection