Search Results for author: Yanzhao Zhang

Found 16 papers, 9 papers with code

Towards Text-Image Interleaved Retrieval

1 code implementation · 18 Feb 2025 · Xin Zhang, Ziqi Dai, Yongqi Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Jun Yu, Wenjie Li, Min Zhang

In this work, we introduce the text-image interleaved retrieval (TIIR) task, where the query and document are interleaved text-image sequences, and the model is required to understand the semantics from the interleaved context for effective retrieval.

Information Retrieval · Language Modeling · +5
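
To make the TIIR setting concrete, the sketch below shows one plausible way to represent an interleaved text-image query or document and score a pair of them. The element types, the per-element encoders, and the mean-pooled cosine scoring are illustrative assumptions only; the paper's point is that the interleaved context should be understood jointly, which this naive pooling does not attempt.

```python
# Illustrative data shape for text-image interleaved retrieval (TIIR).
# TextChunk/ImageChunk, the encoder callables, and mean-pooled scoring are
# assumptions for illustration, not the model proposed in the paper.
from dataclasses import dataclass
from typing import List, Union
import numpy as np

@dataclass
class TextChunk:
    text: str

@dataclass
class ImageChunk:
    path: str

# A TIIR query or document is an ordered mix of text and image elements.
InterleavedDoc = List[Union[TextChunk, ImageChunk]]

def embed(doc: InterleavedDoc, encode_text, encode_image) -> np.ndarray:
    """Encode each element with its modality encoder, then mean-pool."""
    vecs = [encode_text(e.text) if isinstance(e, TextChunk) else encode_image(e.path)
            for e in doc]
    return np.mean(vecs, axis=0)

def score(query: InterleavedDoc, doc: InterleavedDoc, encode_text, encode_image) -> float:
    """Cosine similarity between pooled query and document embeddings."""
    q = embed(query, encode_text, encode_image)
    d = embed(doc, encode_text, encode_image)
    return float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
```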

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

no code implementations · 22 Dec 2024 · Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

Last, we provide in-depth analyses of model scaling and training strategies, and perform ablation studies on both the model and synthetic data.

Retrieval

When Text Embedding Meets Large Language Model: A Comprehensive Survey

no code implementations · 12 Dec 2024 · Zhijie Nie, Zhangchi Feng, Mingxin Li, Cunwang Zhang, Yanzhao Zhang, Dingkun Long, Richong Zhang

Text embedding has become a foundational technology in natural language processing (NLP) during the deep learning era, driving advancements across a wide array of downstream tasks.

Information Retrieval · Language Modeling · +4

Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging

no code implementations · 19 Oct 2024 · Mingxin Li, Zhijie Nie, Yanzhao Zhang, Dingkun Long, Richong Zhang, Pengjun Xie

Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has facilitated the development of versatile general-purpose text embedding models.

model · Semantic Textual Similarity · +2
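
As a point of reference for the merging idea, below is a minimal sketch of combining task-specific checkpoints of the same architecture by (weighted) parameter averaging, one common merging recipe. The paper's actual merging strategy may differ, and the checkpoint names in the usage comment are placeholders.

```python
# Hedged sketch of model merging by weighted parameter averaging.
# The specific weighting scheme and checkpoint names are illustrative assumptions.
import copy
import torch

def merge_state_dicts(state_dicts, weights=None):
    """Weighted average of state dicts from models with identical architectures."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Usage (placeholder checkpoints): merge retrieval-, STS-, and classification-tuned
# copies of one base embedder, then load the result back into the shared architecture.
# merged = merge_state_dicts([retrieval_sd, sts_sd, clf_sd], weights=[0.5, 0.3, 0.2])
# model.load_state_dict(merged)
```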

An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

1 code implementation · 16 Aug 2024 · Peiming Guo, Sinuo Liu, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

We propose the first end-to-end model for photo-sharing multi-modal dialogue generation, which integrates an image perceptron and an image generator with a large language model.

Image Generation · Language Modeling · +3

mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

no code implementations · 29 Jul 2024 · Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than the 512-token limit of previous multilingual encoders).

Contrastive Learning · Text Retrieval · +1
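
The abstract names RoPE as the positional scheme behind the native 8192-token context. Below is a generic rotary-position-embedding sketch (the split-half formulation with the usual default base), not mGTE's actual implementation.

```python
# Generic RoPE sketch; shapes and base frequency are the common defaults, not mGTE's code.
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq_len, dim)."""
    b, t, d = x.shape
    half = d // 2
    # Per-dimension rotation frequencies and per-position angles.
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)        # (half,)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    # Rotate each (x1, x2) pair by its position-dependent angle.
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Queries and keys are rotated before attention; relative offsets then fall out of
# their dot products, which is what makes long-context extension straightforward.
q = rope(torch.randn(2, 8192, 64))
k = rope(torch.randn(2, 8192, 64))
```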

Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training

1 code implementation · 8 Apr 2024 · Longhui Zhang, Dingkun Long, Meishan Zhang, Yanzhao Zhang, Pengjun Xie, Min Zhang

Experimental results on Chinese sequence labeling datasets demonstrate that the improved BABERT variant outperforms the vanilla version, not only on these tasks but also more broadly across a range of Chinese natural language understanding tasks.

Language Modeling · Language Modelling · +1

Language Models are Universal Embedders

1 code implementation · 12 Oct 2023 · Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

As such cases span from English to other natural or programming languages, from retrieval to classification and beyond, it is desirable to build a unified embedding model rather than dedicated ones for each scenario.

Code Search · Language Modeling · +3

Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval

no code implementations · 22 May 2023 · Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie

Recently, various studies have been directed towards exploring dense passage retrieval techniques employing pre-trained language models, among which the masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.

Decoder · Passage Retrieval · +1
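
For readers unfamiliar with MAE-style pre-training for retrieval, here is a heavily simplified sketch in the spirit of bottlenecked masked auto-encoding: a deep encoder compresses the passage into a single vector, and a deliberately shallow decoder must reconstruct aggressively masked text from it. Module sizes, masking, and the way the bottleneck is fed to the decoder are assumptions for illustration, not this paper's actual design (whose contribution is making the decoding task more challenging).

```python
# Simplified MAE-style pre-training sketch for a retrieval encoder.
# All hyperparameters and the bottleneck wiring are illustrative assumptions.
import torch
import torch.nn as nn

class MAERetrievalPretrainer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Deep encoder: its [CLS] output is the sentence bottleneck used for retrieval.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=6)
        # Shallow decoder: reconstructing heavily masked text from one vector forces
        # the encoder to compress more information into the bottleneck.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=1)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, enc_ids, dec_ids, labels):
        # Bottleneck = encoder representation of the first ([CLS]) position.
        h = self.encoder(self.embed(enc_ids))
        bottleneck = h[:, :1, :]                              # (B, 1, D)
        # Decoder sees the bottleneck prepended to the (heavily masked) input ids.
        dec_in = torch.cat([bottleneck, self.embed(dec_ids)], dim=1)
        logits = self.lm_head(self.decoder(dec_in))[:, 1:, :]
        return nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), labels.reshape(-1), ignore_index=-100)

# loss = MAERetrievalPretrainer()(enc_ids, dec_ids, labels)  # ids/labels from a masked batch
```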

Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval

1 code implementation · 27 Oct 2022 · Dingkun Long, Yanzhao Zhang, Guangwei Xu, Pengjun Xie

Pre-trained language models (PTMs) have been shown to yield powerful text representations for the dense passage retrieval task.

Language Modeling · Language Modelling · +3
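
A hedged sketch of what a "retrieval oriented" masking scheme can look like: instead of masking tokens uniformly as in vanilla MLM, bias the mask toward tokens that matter for retrieval, here approximated with IDF-style weights. The weighting signal, mask rate, and function name are assumptions for illustration, not the paper's exact recipe.

```python
# Illustrative retrieval-biased mask selection; IDF weighting and rate are assumptions.
import numpy as np

def choose_mask_positions(tokens, idf, mask_rate=0.15, seed=0):
    """Sample positions to mask, with probability proportional to each token's weight."""
    rng = np.random.default_rng(seed)
    weights = np.array([idf.get(t, 1.0) for t in tokens], dtype=float)
    probs = weights / weights.sum()
    n_mask = max(1, int(round(mask_rate * len(tokens))))
    return sorted(rng.choice(len(tokens), size=n_mask, replace=False, p=probs))

tokens = ["the", "transformer", "retrieves", "relevant", "passages", "from", "the", "corpus"]
idf = {"transformer": 5.2, "retrieves": 4.8, "relevant": 3.9, "passages": 4.5, "corpus": 4.1}
print(choose_mask_positions(tokens, idf))  # content-bearing words are masked far more often
```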

HLATR: Enhance Multi-stage Text Retrieval with Hybrid List Aware Transformer Reranking

1 code implementation · 21 May 2022 · Yanzhao Zhang, Dingkun Long, Guangwei Xu, Pengjun Xie

Existing text retrieval systems with state-of-the-art performance usually adopt a retrieve-then-reranking architecture due to the high computational cost of pre-trained language models and the large corpus size.

Passage Ranking · Passage Re-Ranking · +1
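
To illustrate the retrieve-then-reranking architecture the abstract refers to, here is a minimal two-stage sketch: a cheap dot-product retriever narrows the corpus to top-k candidates, and only those are re-scored by a more expensive reranker. The stand-in reranker below is a placeholder; HLATR's contribution, a hybrid list-aware transformer stage on top of the reranker, is not shown.

```python
# Minimal retrieve-then-rerank pipeline sketch; the reranker is a placeholder, not HLATR.
import numpy as np

def retrieve_then_rerank(query_vec, doc_vecs, rerank_fn, k=10):
    """Stage 1: dot-product retrieval over all docs. Stage 2: rerank only the top-k."""
    scores = doc_vecs @ query_vec            # cheap: one matmul over the whole corpus
    top_k = np.argsort(-scores)[:k]          # candidate set handed to the expensive stage
    return sorted(top_k, key=lambda i: rerank_fn(i), reverse=True)

# Usage with a toy corpus and a stand-in reranker (a real system would call a
# cross-encoder here, which is exactly why it only sees k docs instead of the corpus).
rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 64))
query = rng.standard_normal(64)
print(retrieve_then_rerank(query, docs, rerank_fn=lambda i: float(docs[i] @ query), k=5))
```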
