Search Results for author: Shutao Li

Found 26 papers, 9 papers with code

VPAI_Lab at MedVidQA 2022: A Two-Stage Cross-modal Fusion Method for Medical Instructional Video Classification

1 code implementation BioNLP (ACL) 2022 Bin Li, Yixuan Weng, Fei Xia, Bin Sun, Shutao Li

Given an input video, the MedVidCL task aims to correctly classify it into one of three following categories: Medical Instructional, Medical Non-instructional, and Non-medical.

Video Classification

Continuing Pre-trained Model with Multiple Training Strategies for Emotional Classification

no code implementations WASSA (ACL) 2022 Bin Li, Yixuan Weng, Qiya Song, Bin Sun, Shutao Li

This paper describes the contribution of the LingJing team’s method to the Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA) 2022 shared task on Emotion Classification.

Attribute Classification +4

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

no code implementations 4 Feb 2024 Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma

Therefore, we propose GeReA, a generate-reason framework that prompts an MLLM like InstructBLIP with question-relevant vision and language information to generate knowledge-relevant descriptions and reasons over those descriptions for knowledge-based VQA.

Language Modelling Large Language Model +3

Hyperspectral Image Fusion via Logarithmic Low-rank Tensor Ring Decomposition

no code implementations 16 Oct 2023 Jun Zhang, Lipeng Zhu, Chao Wang, Shutao Li

On the other hand, tensor nuclear norm (TNN)-based approaches have recently been shown to be more efficient at preserving high-dimensional low-rank structures in tensor recovery.

VPUFormer: Visual Prompt Unified Transformer for Interactive Image Segmentation

1 code implementation 11 Jun 2023 Xu Zhang, Kailun Yang, Jiacheng Lin, Jin Yuan, Zhiyong Li, Shutao Li

Specifically, we design a Prompt-unified Encoder (PuE) by using Gaussian mapping to generate a unified one-dimensional vector for click, box, and scribble prompts, which well captures users' intentions as well as provides a denser representation of user prompts.

Image Segmentation Segmentation +1

LOGO-Former: Local-Global Spatio-Temporal Transformer for Dynamic Facial Expression Recognition

no code implementations 5 May 2023 Fuyan Ma, Bin Sun, Shutao Li

Previous methods for dynamic facial expression recognition (DFER) in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos.

Dynamic Facial Expression Recognition Facial Expression Recognition

Learning to Locate Visual Answer in Video Corpus Using Question

1 code implementation 11 Oct 2022 Bin Li, Yixuan Weng, Bin Sun, Shutao Li

We introduce a new task, named video corpus visual answer localization (VCVAL), which aims to locate the visual answer in a large collection of untrimmed instructional videos using a natural language question.

Contrastive Learning Language Modelling +2

Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation

no code implementations 5 Jul 2022 Bin Li, Yixuan Weng, Ziyu Ma, Bin Sun, Shutao Li

To fully leverage the visual information for both scene understanding and dialogue generation, we propose the scene-aware prompt for the MDUG task.

Dialogue Generation Dialogue Understanding +2

Spatio-Temporal Transformer for Dynamic Facial Expression Recognition in the Wild

no code implementations 10 May 2022 Fuyan Ma, Bin Sun, Shutao Li

Previous methods for dynamic facial expression recognition in the wild are mainly based on Convolutional Neural Networks (CNNs), whose local operations ignore the long-range dependencies in videos.

Dynamic Facial Expression Recognition Facial Expression Recognition +1

LingYi: Medical Conversational Question Answering System based on Multi-modal Knowledge Graphs

1 code implementation 20 Apr 2022 Fei Xia, Bin Li, Yixuan Weng, Shizhu He, Kang Liu, Bin Sun, Shutao Li, Jun Zhao

The medical conversational system can relieve the burden of doctors and improve the efficiency of healthcare, especially during the pandemic.

Conversational Question Answering Dialogue Generation +3

Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video

no code implementations 13 Mar 2022 Bin Li, Yixuan Weng, Bin Sun, Shutao Li

However, due to the weak correlations and huge gaps of the semantic features between the textual question and visual answer, existing methods adopting visual span predictor perform poorly in the TAGV task.

Language Modelling Question Answering +2

PSG: Prompt-based Sequence Generation for Acronym Extraction

no code implementations 29 Nov 2021 Bin Li, Fei Xia, Yixuan Weng, Xiusheng Huang, Bin Sun, Shutao Li

In this paper, we propose a Prompt-based Sequence Generation (PSG) method for the acronym extraction task.

document understanding Language Modelling +1

Hybrid Mutimodal Fusion for Dimensional Emotion Recognition

no code implementations 16 Oct 2021 Ziyu Ma, Fuyan Ma, Bin Sun, Shutao Li

For the MuSe-Stress sub-challenge, we highlight our solutions in three aspects: 1) the audio-visual features and the bio-signal features are used for emotional state recognition.

Emotion Recognition

More but Correct: Generating Diversified and Entity-revised Medical Response

no code implementations 3 Aug 2021 Bin Li, Encheng Chen, Hongru Liu, Yixuan Weng, Bin Sun, Shutao Li, Yongping Bai, Meiling Hu

Medical Dialogue Generation (MDG) is intended to build a medical dialogue system for intelligent consultation, which can communicate with patients in real-time, thereby improving the efficiency of clinical diagnosis with broad application prospects.

Dialogue Generation

Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion

no code implementations 31 Mar 2021 Fuyan Ma, Bin Sun, Shutao Li

Facial Expression Recognition (FER) in the wild is extremely challenging due to occlusions, variant head poses, face deformation and motion blur under unconstrained conditions.

Facial Expression Recognition Facial Expression Recognition (FER)

Fusion of Dual Spatial Information for Hyperspectral Image Classification

1 code implementation 23 Oct 2020 Puhong Duan, Pedram Ghamisi, Xudong Kang, Behnood Rasti, Shutao Li, Richard Gloaguen

In the spatial optimization stage, a pixel-level classifier is used to obtain the class probability followed by an extended random walker-based spatial optimization technique.

Classification General Classification +1

Recent Advances and New Guidelines on Hyperspectral and Multispectral Image Fusion

no code implementations 8 Aug 2020 Renwei Dian, Shutao Li, Bin Sun, Anjing Guo

Hyperspectral image (HSI) with high spectral resolution often suffers from low spatial resolution owing to the limitations of imaging sensors.

Naive Gabor Networks for Hyperspectral Image Classification

no code implementations 9 Dec 2019 Chenying Liu, Jun Li, Lin He, Antonio J. Plaza, Shutao Li, Bo Li

Specifically, we develop an innovative phase-induced Gabor kernel, which is trickily designed to perform the Gabor feature learning via a linear combination of local low-frequency and high-frequency components of data controlled by the kernel phase.

Classification General Classification +1

Deep Learning for Hyperspectral Image Classification: An Overview

no code implementations 26 Oct 2019 Shutao Li, Weiwei Song, Leyuan Fang, Yushi Chen, Pedram Ghamisi, Jón Atli Benediktsson

Specifically, we first summarize the main challenges of HSI classification which cannot be effectively overcome by traditional machine learning methods, and also introduce the advantages of deep learning to handle these problems.

BIG-bench Machine Learning Classification +2

Deep Hashing Learning for Visual and Semantic Retrieval of Remote Sensing Images

no code implementations 10 Sep 2019 Weiwei Song, Shutao Li, Jon Atli Benediktsson

Although retrieval methods have achieved great success, there is still a question that needs to be responded to: Can we obtain the accurate semantic labels of the returned similar images to further help analyzing and processing imagery?

Deep Hashing Image Retrieval +1
