Search Results for author: Yunxin Li

Found 18 papers, 5 papers with code

LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs

no code implementations21 Feb 2024 Yunxin Li, Xinyu Chen, Baotian Hu, Min Zhang

Long video understanding is a significant and ongoing challenge in the intersection of multimedia and artificial intelligence.

Video Understanding

Cognitive Visual-Language Mapper: Advancing Multimodal Comprehension with Enhanced Visual Knowledge Alignment

no code implementations21 Feb 2024 Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang

Evaluating and rethinking the current landscape of Large Multimodal Models (LMMs), we observe that widely-used visual-language projection approaches (e.g., Q-former or MLP) focus on the alignment of image-text descriptions yet ignore the visual knowledge-dimension alignment, i.e., connecting visuals to their relevant knowledge.

Language Modelling Question Answering +1

A Multimodal In-Context Tuning Approach for E-Commerce Product Description Generation

1 code implementation21 Feb 2024 Yunxin Li, Baotian Hu, Wenhan Luo, Lin Ma, Yuxin Ding, Min Zhang

For this setting, previous methods utilize visual and textual encoders to encode the image and keywords and employ a language model-based decoder to generate the product description.

In-Context Learning Language Modelling +2

Frame Structure and Protocol Design for Sensing-Assisted NR-V2X Communications

no code implementations27 Dec 2023 Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Qingjiang Shi, Christos Masouros

In this study, we propose novel frame structures that incorporate ISAC signals for three crucial stages in the NR-V2X system: initial access, connected mode, and beam failure and recovery.

Towards Vision Enhancing LLMs: Empowering Multimodal Knowledge Storage and Sharing in LLMs

no code implementations27 Nov 2023 Yunxin Li, Baotian Hu, Wei Wang, Xiaochun Cao, Min Zhang

These models predominantly map visual information into language representation space, leveraging the vast knowledge and powerful text generation abilities of LLMs to produce multimodal instruction-following responses.

Instruction Following multimodal generation +1

A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering

no code implementations13 Nov 2023 Yunxin Li, Longyue Wang, Baotian Hu, Xinyu Chen, Wanqi Zhong, Chenyang Lyu, Wei Wang, Min Zhang

The emergence of multimodal large models (MLMs) has significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).

Decision Making General Knowledge +3

Sensing as a Service in 6G Perceptive Mobile Networks: Architecture, Advances, and the Road Ahead

no code implementations16 Aug 2023 Fuwang Dong, Fan Liu, Yuanhao Cui, Shihang Lu, Yunxin Li

Sensing-as-a-service is anticipated to be the core feature of 6G perceptive mobile networks (PMN), where high-precision real-time sensing will become an inherent capability rather than being an auxiliary function as before.

Management

Training Multimedia Event Extraction With Generated Images and Captions

no code implementations15 Jun 2023 Zilin Du, Yunxin Li, Xu Guo, Yidan Sun, Boyang Li

Contemporary news reporting increasingly features multimedia content, motivating research on multimedia event extraction.

Event Extraction Structured Prediction

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

1 code implementation8 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang

This makes the language model well-suited to such multi-modal reasoning scenarios involving joint textual and visual clues.

Language Modelling

LMEye: An Interactive Perception Network for Large Language Models

1 code implementation5 May 2023 Yunxin Li, Baotian Hu, Xinyu Chen, Lin Ma, Yong Xu, Min Zhang

LMEye addresses this issue by allowing the LLM to request the desired visual information aligned with various human instructions, which we term as the dynamic visual information interaction.

Language Modelling Large Language Model +1

A Neural Divide-and-Conquer Reasoning Framework for Image Retrieval from Linguistically Complex Text

1 code implementation3 May 2023 Yunxin Li, Baotian Hu, Yuxin Ding, Lin Ma, Min Zhang

Inspired by the Divide-and-Conquer algorithm and dual-process theory, in this paper, we regard linguistically complex texts as compound proposition texts composed of multiple simple proposition sentences and propose an end-to-end Neural Divide-and-Conquer Reasoning framework, dubbed NDCR.

Image Retrieval Logical Reasoning +1

Towards ISAC-Empowered Vehicular Networks: Framework, Advances, and Opportunities

no code implementations1 May 2023 Zhen Du, Fan Liu, Yunxin Li, Weijie Yuan, Yuanhao Cui, Zenghui Zhang, Christos Masouros, Bo Ai

Connected and autonomous vehicle (CAV) networks face several challenges, such as low throughput, high latency, and poor localization accuracy.

ISAC-Enabled V2I Networks Based on 5G NR: How Much Can the Overhead Be Reduced?

no code implementations30 Jan 2023 Yunxin Li, Fan Liu, Zhen Du, Weijie Yuan, Christos Masouros

The emergence of the fifth-generation (5G) New Radio (NR) brings additional possibilities to vehicle-to-everything (V2X) networks with improved quality of service.

Management

MSDF: A General Open-Domain Multi-Skill Dialog Framework

no code implementations17 Jun 2022 Yu Zhao, Xinshuo Hu, Yunxin Li, Baotian Hu, Dongfang Li, Sichao Chen, Xiaolong Wang

In this paper, we propose a general Multi-Skill Dialog Framework, namely MSDF, which can be applied to different dialog tasks (e.g., knowledge-grounded dialog and persona-based dialog).

Medical Dialogue Response Generation with Pivotal Information Recalling

no code implementations17 Jun 2022 Yu Zhao, Yunxin Li, Yuxiang Wu, Baotian Hu, Qingcai Chen, Xiaolong Wang, Yuxin Ding, Min Zhang

To mitigate this problem, we propose a medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., a knowledge-aware dialogue graph encoder and a recall-enhanced generator.

Dialogue Generation Graph Attention +1

Sentence-level Online Handwritten Chinese Character Recognition

no code implementations4 Jul 2021 Yunxin Li, Qian Yang, Qingcai Chen, Lin Ma, Baotian Hu, Xiaolong Wang, Yuxin Ding

Single online handwritten Chinese character recognition (single OLHCCR) has achieved prominent performance.

Sentence Word Embeddings

GlyphCRM: Bidirectional Encoder Representation for Chinese Character with its Glyph

no code implementations1 Jul 2021 Yunxin Li, Yu Zhao, Baotian Hu, Qingcai Chen, Yang Xiang, Xiaolong Wang, Yuxin Ding, Lin Ma

Previous works indicate that the glyph of Chinese characters contains rich semantic information and has the potential to enhance the representation of Chinese characters.
