Search Results for author: Yunxin Li

Found 18 papers, 5 papers with code

LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

no code implementations21 Feb 2024 Yunxin Li, Xinyu Chen, Baotian Hu, Min Zhang

Long video understanding is a significant and ongoing challenge in the intersection of multimedia and artificial intelligence.

Video Understanding

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

no code implementations21 Feb 2024 Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang

Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely-used visual-language projection approaches (e.g., Q-former or MLP) focus on the alignment of image-text descriptions yet ignore the visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.

Language Modelling Question Answering +1

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

1 code implementation21 Feb 2024 Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang

For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.

In-Context Learning Language Modelling +2

Frame Structure and Protocol Design for Sensing-Assisted NR-V2X Communications

no code implementations27 Dec 2023 Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Qingjiang Shi, Christos Masouros

In this study, we propose novel frame structures that incorporate ISAC signals for three crucial stages in the NR-V2X system: initial access, connected mode, and beam failure and recovery.

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

no code implementations27 Nov 2023 Yunxin Li, Baotian Hu, Wei Wang, Xiaochun Cao, Min Zhang

These models predominantly map visual information into language representation space, leveraging the vast knowledge and powerful text generation abilities of LLMs to produce multimodal instruction-following responses.

Instruction Following multimodal generation +1

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

no code implementations13 Nov 2023 Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang

The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).

Decision Making General Knowledge +3

Sensing as a Service in 6G Perceptive Mobile Networks: Architecture, Advances, and the Road Ahead

no code implementations16 Aug 2023 Fuwang Dong, Fan Liu, Yuanhao Cui, Shihang Lu, Yunxin Li

Sensing-as-a-service is anticipated to be the core feature of 6G perceptive mobile networks (PMN), where high-precision real-time sensing will become an inherent capability rather than being an auxiliary function as before.

Management

Training Multimedia Event Extraction With Generated Images and Captions

no code implementations15 Jun 2023 Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li

Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction.

Event Extraction Structured Prediction

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

1 code implementation8 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang

This makes the language model well-suited to such multi-modal reasoning scenarios involving joint textual and visual clues.

Language Modelling

LMEye: An Interactive Perception Network for Large Language Models

1 code implementation5 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang

LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term as the dynamic visual information interaction.

Language Modelling Large Language Model +1

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

1 code implementation3 May 2023 Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang

Inspired by the Divide-and-Conquer algorithm and dual-process theory, in this paper, we regard linguistically complex texts as compound proposition texts composed of multiple simple proposition sentences and propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR.

Image Retrieval Logical Reasoning +1

Towards ISAC-Empowered Vehicular Networks: Framework, Advances, and Opportunities

no code implementations1 May 2023 Zhen Du, Fan Liu, Yunxin Li, Weijie Yuan, Yuanhao Cui, Zenghui Zhang, Christos Masouros, Bo Ai

Connected and autonomous vehicle (CAV) networks face several challenges, such as low throughput, high latency, and poor localization accuracy.

ISAC-Enabled V2I Networks Based on 5G NR: How Much Can the Overhead Be Reduced?

no code implementations30 Jan 2023 Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Christos Masouros

The emergence of the fifth-generation (5G) New Radio (NR) brings additional possibilities to vehicle-to-everything (V2X) networks with improved quality of service.

Management

MSDF: A General Open-Domain Multi-Skill Dialog Framework

no code implementations17 Jun 2022 Yu Zhao, Xinshuo Hu, Yunxin Li, Baotian Hu, Dongfang Li, Sichao Chen, Xiaolong Wang

In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied to different dialog tasks (e.g., knowledge-grounded dialog and persona-based dialog).

Medical Dialogue Response Generation with Pivotal Information Recalling

no code implementations17 Jun 2022 Yu Zhao, Yunxin Li, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang

To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator.

Dialogue Generation Graph Attention +1

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations4 Jul 2021 Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.

Sentence Word Embeddings

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations1 Jul 2021 Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.
