Search Results for author: Di Yin

Found 13 papers, 6 papers with code

FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema

1 code implementation • 19 Feb 2024 • Junru Lu, Siyu An, Min Zhang, Yulan He, Di Yin, Xing Sun

In the quest to make the deep intelligence of Large Language Models (LLMs) accessible in end-user interactions with bots, the art of prompt crafting emerges as a critical yet complex task for the average user.

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

Multi-modal Large Language Models (MLLMs) endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL

1 code implementation • 18 Dec 2023 • Bing Wang, Changyu Ren, Jian Yang, Xinnian Liang, Jiaqi Bai, Linzheng Chai, Zhao Yan, Qian-Wen Zhang, Di Yin, Xing Sun, Zhoujun Li

Our framework comprises a core decomposer agent for Text-to-SQL generation with few-shot chain-of-thought reasoning, accompanied by two auxiliary agents that utilize external tools or models to acquire smaller sub-databases and refine erroneous SQL queries.

Ranked #5 on Text-To-SQL on BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation)

SQL Parsing, Text-To-SQL

VKIE: The Application of Key Information Extraction on Video Text

no code implementations • 18 Oct 2023 • Siyu An, Ye Liu, Haoyuan Peng, Di Yin

Extracting structured information from videos is critical for numerous downstream applications in the industry.

Key Information Extraction

Rethinking Data Perturbation and Model Stabilization for Semi-supervised Medical Image Segmentation

1 code implementation • 23 Aug 2023 • Zhen Zhao, Ye Liu, Meng Zhao, Di Yin, Yixuan Yuan, Luping Zhou

Studies on semi-supervised medical image segmentation (SSMIS) have seen fast progress recently.

Image Segmentation, Segmentation, +2

MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation

1 code implementation • 16 Aug 2023 • Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun, Yunsheng Wu

We propose MemoChat, a pipeline for refining instructions that enables large language models (LLMs) to effectively employ self-composed memos for maintaining consistent long-range open-domain conversations.

Memorization, Retrieval

OSAN: A One-Stage Alignment Network To Unify Multimodal Alignment and Unsupervised Domain Adaptation

no code implementations • CVPR 2023 • Ye Liu, Lingfeng Qiao, Changchong Lu, Di Yin, Chen Lin, Haoyuan Peng, Bo Ren

An intuitive way to handle these two problems is to fulfill these tasks in two separate stages: aligning modalities followed by domain adaptation, or vice versa.

Unsupervised Domain Adaptation

Grafting Pre-trained Models for Multimodal Headline Generation

no code implementations • 14 Nov 2022 • Lingfeng Qiao, Chen Wu, Ye Liu, Haoyuan Peng, Di Yin, Bo Ren

In this paper, we propose a novel approach to graft the video encoder from the pre-trained video-language model on the generative pre-trained language model.

Headline Generation, Language Modelling, +1

Unsupervised Extractive Summarization with Heterogeneous Graph Embeddings for Chinese Document

no code implementations • 9 Nov 2022 • Chen Lin, Ye Liu, Siyu An, Di Yin

In the scenario of unsupervised extractive summarization, learning high-quality sentence representations is essential to select salient sentences from the input document.

Extractive Summarization, Sentence, +2

Leveraging Key Information Modeling to Improve Less-Data Constrained News Headline Generation via Duality Fine-Tuning

no code implementations • 10 Oct 2022 • Zhuoxuan Jiang, Lingfeng Qiao, Di Yin, Shanshan Feng, Bo Ren

Recent language generative models are mostly trained on large-scale datasets, while in some real scenarios, the training datasets are often expensive to obtain and would be small-scale.

Headline Generation, Informativeness, +1

OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

no code implementations • 4 Jul 2022 • Ye Liu, Lingfeng Qiao, Di Yin, Zhuoxuan Jiang, Xinghua Jiang, Deqiang Jiang, Bo Ren

In this paper, from an alternate perspective to overcome the above challenges, we unite these two tasks into one task by a new form of predicting shots link: a link connects two adjacent shots, indicating that they belong to the same scene or category.

Scene Segmentation

RAAT: Relation-Augmented Attention Transformer for Relation Modeling in Document-Level Event Extraction

1 code implementation • NAACL 2022 • Yuan Liang, Zhuoxuan Jiang, Di Yin, Bo Ren

To further leverage relation information, we introduce a separate event relation prediction task and adopt a multi-task learning method to explicitly enhance event extraction performance.

Ranked #1 on Document-level Event Extraction on ChFinAnn

Document-level Event Extraction, Event Extraction, +3

Contrastive Graph Multimodal Model for Text Classification in Videos

no code implementations • 6 Jun 2022 • Ye Liu, Changchong Lu, Chen Lin, Di Yin, Bo Ren

However, to our knowledge, there is no existing work focused on the second step of video text classification, which will limit the guidance to downstream tasks such as video indexing and browsing.

Contrastive Learning, Optical Character Recognition (OCR), +2
