Search Results for author: Shu Zhang

Found 46 papers, 19 papers with code

Large Language Models for Robotics: Opportunities, Challenges, and Perspectives

no code implementations 9 Jan 2024 Jiaqi Wang, Zihao Wu, Yiwei Li, Hanqi Jiang, Peng Shu, Enze Shi, Huawen Hu, Chong Ma, Yiheng Liu, Xuhui Wang, Yincheng Yao, Xuan Liu, Huaqin Zhao, Zhengliang Liu, Haixing Dai, Lin Zhao, Bao Ge, Xiang Li, Tianming Liu, Shu Zhang

Notably, in the realm of robot task planning, LLMs harness their advanced reasoning and language comprehension capabilities to formulate precise and efficient action plans based on natural language instructions.

Robot Task Planning

DomainForensics: Exposing Face Forgery across Domains via Bi-directional Adaptation

no code implementations 17 Dec 2023 Qingxuan Lv, Yuezun Li, Junyu Dong, Sheng Chen, Hui Yu, Huiyu Zhou, Shu Zhang

Specifically, our strategy considers both forward and backward adaptation, to transfer the forgery knowledge from the source domain to the target domain in forward adaptation and then reverse the adaptation from the target domain to the source domain in backward adaptation.

DeepFake Detection Face Swapping +2

HIC-YOLOv5: Improved YOLOv5 For Small Object Detection

1 code implementation 28 Sep 2023 Shiyi Tang, Shu Zhang, Yini Fang

Small object detection has been a challenging problem in the field of object detection.

Object object-detection +2

Correlation-Aware Mutual Learning for Semi-supervised Medical Image Segmentation

1 code implementation 12 Jul 2023 Shengbo Gao, Ziji Zhang, Jiechao Ma, Zihao Li, Shu Zhang

Our approach is based on a mutual learning strategy that incorporates two modules: the Cross-sample Mutual Attention Module (CMA) and the Omni-Correlation Consistency Module (OCC).

Image Segmentation Segmentation +2

Review of Large Vision Models and Visual Prompt Engineering

no code implementations 3 Jul 2023 Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu, Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, Enze Shi, Yi Pan, Tuo Zhang, Dajiang Zhu, Xiang Li, Xi Jiang, Bao Ge, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang

This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering.

Prompt Engineering

A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

1 code implementation 1 Jun 2023 Hong-Yu Zhou, Yizhou Yu, Chengdi Wang, Shu Zhang, Yuanxu Gao, Jia Pan, Jun Shao, Guangming Lu, Kang Zhang, Weimin Li

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results.

Representation Learning

Attention Paper: How Generative AI Reshapes Digital Shadow Industry?

no code implementations 26 May 2023 Qichao Wang, Huan Ma, WenTao Wei, Hangyu Li, Liang Chen, Peilin Zhao, Binwen Zhao, Bo Hu, Shu Zhang, Zibin Zheng, Bingzhe Wu

The rapid development of digital economy has led to the emergence of various black and shadow internet industries, which pose potential risks that can be identified and managed through digital risk management (DRM) that uses different techniques such as machine learning and deep learning.

Management

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding

1 code implementation 14 May 2023 Le Xue, Ning Yu, Shu Zhang, Junnan Li, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, Silvio Savarese

Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions.

Ranked #4 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)

3D Point Cloud Classification Representation Learning +1

Prompt Engineering for Healthcare: Methodologies and Applications

no code implementations 28 Apr 2023 Jiaqi Wang, Enze Shi, Sigang Yu, Zihao Wu, Chong Ma, Haixing Dai, Qiushi Yang, Yanqing Kang, Jinru Wu, Huawen Hu, Chenxi Yue, Haiyang Zhang, Yiheng Liu, Yi Pan, Zhengliang Liu, Lichao Sun, Xiang Li, Bao Ge, Xi Jiang, Dajiang Zhu, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang

Prompt engineering is a critical technique in the field of natural language processing that involves designing and optimizing the prompts used to input information into models, aiming to enhance their performance on specific tasks.

Machine Translation Prompt Engineering +3

ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT

2 code implementations 17 Apr 2023 Chong Ma, Zihao Wu, Jiaqi Wang, Shaochen Xu, Yaonai Wei, Zhengliang Liu, Xi Jiang, Lei Guo, Xiaoyan Cai, Shu Zhang, Tuo Zhang, Dajiang Zhu, Dinggang Shen, Tianming Liu, Xiang Li

The 'Impression' section of a radiology report is a critical basis for communication between radiologists and other physicians, and it is typically written by radiologists based on the 'Findings' section.

In-Context Learning

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation

1 code implementation ICCV 2023 Can Qin, Ning Yu, Chen Xing, Shu Zhang, Zeyuan Chen, Stefano Ermon, Yun Fu, Caiming Xiong, Ran Xu

Empirical results show that GlueNet can be trained efficiently and enables various capabilities beyond previous state-of-the-art models: 1) multilingual language models such as XLM-Roberta can be aligned with existing T2I models, allowing for the generation of high-quality images from captions beyond English; 2) GlueNet can align multi-modal encoders such as AudioCLIP with the Stable Diffusion model, enabling sound-to-image generation; 3) it can also upgrade the current text encoder of the latent diffusion model for challenging case generation.

Image Generation

HIVE: Harnessing Human Feedback for Instructional Visual Editing

1 code implementation 16 Mar 2023 Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu

Incorporating human feedback has been shown to be crucial to align text generated by large language models to human preferences.

Text-based Image Editing

Beyond Single Items: Exploring User Preferences in Item Sets with the Conversational Playlist Curation Dataset

1 code implementation 13 Mar 2023 Arun Tejasvi Chaganty, Megan Leszczynski, Shu Zhang, Ravi Ganti, Krisztian Balog, Filip Radlinski

Users in consumption domains, like music, are often able to more efficiently provide preferences over a set of items (e.g., a playlist or radio) than over single items (e.g., songs).

Music Recommendation Recommendation Systems +1

Vertical Federated Linear Contextual Bandits

no code implementations 20 Oct 2022 Zeyu Cao, Zhipeng Liang, Shu Zhang, Hangyu Li, Ouyang Wen, Yu Rong, Peilin Zhao, Bingzhe Wu

In this paper, we investigate a novel problem of building contextual bandits in the vertical federated setting, i.e., contextual information is vertically distributed over different departments.

Multi-Armed Bandits

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

1 code implementation CVPR 2022 Shu Zhang, Ran Xu, Caiming Xiong, Chetan Ramaiah

Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks.

Contrastive Learning Representation Learning

DeepSSN: a deep convolutional neural network to assess spatial scene similarity

1 code implementation 7 Feb 2022 Danhuai Guo, Shiyin Ge, Shu Zhang, Song Gao, Ran Tao, Yangang Wang

Spatial-query-by-sketch is an intuitive tool to explore human spatial knowledge about geographic environments and to support communication with scene database queries.

Data Augmentation Information Retrieval +1

Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training

1 code implementation 5 Jan 2022 Shu Zhang, Zihao Li, Hong-Yu Zhou, Jiechao Ma, Yizhou Yu

The difficulties in both data acquisition and annotation substantially restrict the sample sizes of training datasets for 3D medical imaging applications.

Contrastive Learning Medical Object Detection

A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation

1 code implementation 21 Apr 2021 Jie Lian, Jingyu Liu, Shu Zhang, Kai Gao, Xiaoqing Liu, Dingwen Zhang, Yizhou Yu

Leveraging on constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) extending Mask R-CNN.

Instance Segmentation Object Detection +2

Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices

1 code implementation 16 Dec 2020 Shu Zhang, Jincheng Xu, Yu-Chun Chen, Jiechao Ma, Zihao Li, Yizhou Wang, Yizhou Yu

We demonstrate that with the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset (3.48% absolute improvement in the sensitivity of FPs@0.5), significantly surpassing the baseline method by up to 6.06% (in MAP@0.5) which adopts 2D convolution for 3D context modeling.

Computed Tomography (CT) Lesion Detection +2

A method for sharing dynamic geometry information in studies on liquid-based detectors

no code implementations 16 Dec 2020 Shu Zhang, Jing-Shu Li, Yang-Jie Su, Yu-Mei Zhang, Zi-Yuan Li, Zheng-Yun You

The liquid-based detectors are widely used in particle and nuclear physics experiments.

Instrumentation and Detectors

Deep Feature Mining via Attention-based BiLSTM-GCN for Human Motor Imagery Recognition

no code implementations 2 May 2020 Yimin Hou, Shuyue Jia, Xiangmin Lun, Shu Zhang, Tao Chen, Fang Wang, Jinglei Lv

The introduced deep feature mining approach can precisely recognize human motion intents from raw EEG signals, which paves the road to translate the EEG based MI recognition to practical BCI systems.

EEG Motor Imagery

Deep Homography Estimation for Dynamic Scenes

1 code implementation CVPR 2020 Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

We then develop a multi-scale neural network and show that when properly trained using our new dataset, this neural network can already handle dynamic scenes to some extent.

Homography Estimation Multi-Task Learning

Multi-level Similarity Learning for Low-Shot Recognition

no code implementations 13 Dec 2019 Hongwei Xv, Xin Sun, Junyu Dong, Shu Zhang, Qiong Li

Low-shot learning indicates the ability to recognize unseen objects based on very limited labeled training samples, which simulates human visual intelligence.

A Preliminary Study on Data Augmentation of Deep Learning for Image Classification

no code implementations 9 Jun 2019 Benlin Hu, Cheng Lei, Dong Wang, Shu Zhang, Zhenyu Chen

Deep learning models have a large number of free parameters that need to be calculated by effective training of the models on a great deal of training data to improve their generalization performance.

Data Augmentation General Classification +1

Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering

1 code implementation CVPR 2019 Chenyou Fan, Xiaofan Zhang, Shu Zhang, Wensheng Wang, Chi Zhang, Heng Huang

In this paper, we propose a novel end-to-end trainable Video Question Answering (VideoQA) framework with three major components: 1) a new heterogeneous memory which can effectively learn global context information from appearance and motion features; 2) a redesigned question memory which helps understand the complex semantics of question and highlights queried subjects; and 3) a new multimodal fusion layer which performs multi-step reasoning by attending to relevant visual and textual hints with self-updated attention.

Question Answering Video Question Answering +1

Dense 3D Facial Reconstruction from a Single Depth Image in Unconstrained Environment

no code implementations 24 Apr 2017 Shu Zhang, Hui Yu, Ting Wang, Junyu Dong, Honghai Liu

With the increasing demands of applications in virtual reality such as 3D films, virtual Human-Machine Interactions and virtual agents, the analysis of 3D human faces is considered to be more and more important as a fundamental step for those virtual reality tasks.

Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis

3 code implementations ICCV 2017 Rui Huang, Shu Zhang, Tianyu Li, Ran He

This paper proposes a Two-Pathway Generative Adversarial Network (TP-GAN) for photorealistic frontal view synthesis by simultaneously perceiving global structures and local details.

Face Recognition Generative Adversarial Network

DeMeshNet: Blind Face Inpainting for Deep MeshFace Verification

no code implementations 16 Nov 2016 Shu Zhang, Ran He, Tieniu Tan

The occlusions incurred by random meshes severely degenerate the performance of face verification systems, which raises the MeshFace verification problem between MeshFace and daily photos.

Face Alignment Face Verification +1

Adaptive Algorithm and Platform Selection for Visual Detection and Tracking

no code implementations 21 May 2016 Shu Zhang, Qi Zhu, Amit Roy-Chowdhury

In this paper, we focus on this problem and propose a framework to adaptively select the "best" algorithm-parameter combination and the computation platform under performance and cost constraints at design time, and adapt the algorithms at runtime based on real-time inputs.

Pedestrian Detection
