Search Results for author: Yu Yan

Found 29 papers, 17 papers with code

DistJoin: A Decoupled Join Cardinality Estimator based on Adaptive Neural Predicate Modulation

no code implementations 12 Mar 2025 Kaixin Zhang, Hongzhi Wang, ZiQi Li, Yabin Lu, Yingze Li, Yu Yan, Yiming Guan

We conceptualize these challenges as the "Trilemma of Cardinality Estimation", where learned cardinality estimation methods struggle to balance generality, accuracy, and updatability.

Collaborative Stance Detection via Small-Large Language Model Consistency Verification

1 code implementation 27 Feb 2025 Yu Yan, Sheng Sun, Zixiang Tang, Teli Liu, Min Liu

However, heavily relying on LLMs for stance detection, regardless of the cost, is impractical for real-world social media monitoring systems that require vast data analysis.

Language Modeling Language Modelling +2

from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors

no code implementations 25 Feb 2025 Yu Yan, Sheng Sun, Zenghao Duan, Teli Liu, Min Liu, Zhiyi Yin, Qi Li, Jiangyu Lei

Current studies have exposed the risk of Large Language Models (LLMs) generating harmful content by jailbreak attacks.

Language Modeling Language Modelling

Na'vi or Knave: Jailbreaking Language Models via Metaphorical Avatars

no code implementations 10 Dec 2024 Yu Yan, Sheng Sun, Junqi Tong, Min Liu, Qi Li

In our study, we introduce a novel attack framework that exploits the imaginative capacity of LLMs to achieve jailbreaking, the JAilbreak Via Adversarial MeTA-phoR (AVATAR).

Safety Alignment

MERLIN: Multi-stagE query performance prediction for dynamic paRallel oLap pIpeliNe

no code implementations 1 Dec 2024 Kaixin Zhang, Hongzhi Wang, Kunkai Gu, ZiQi Li, Chunyu Zhao, Yingze Li, Yu Yan

High-performance OLAP database technology has emerged with the growing demand for massive data analysis.

Prediction

Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundational Models

no code implementations 18 Jul 2024 Simha Sankar Baradwaj, Destiny Gilliland, Jack Rincon, Henning Hermjakob, Yu Yan, Irsyad Adam, Gwyneth Lemaster, Dean Wang, Karol Watson, Alex Bui, Wei Wang, Peipei Ping

We explore strategies that can be implemented throughout the biomedical AI pipeline to effectively tackle these challenges, ensuring that these FMs are translated responsibly into clinical and translational settings.

Decision Making Memorization +2

Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

1 code implementation 17 Jul 2024 Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively.

Navigate RAG +1

CliBench: A Multifaceted and Multigranular Evaluation of Large Language Models for Clinical Decision Making

no code implementations 14 Jun 2024 Mingyu Derek Ma, Chenchen Ye, Yu Yan, Xiaoxuan Wang, Peipei Ping, Timothy S Chang, Wei Wang

The integration of Artificial Intelligence (AI), especially Large Language Models (LLMs), into the clinical diagnosis process offers significant potential to improve the efficiency and accessibility of medical care.

Decision Making Diagnostic

Xmodel-LM Technical Report

3 code implementations 5 Jun 2024 Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens.

Language Modeling Language Modelling

Harmonic and Interharmonic Detection in Power Systems Based on Fractal-Optimized Variational Mode Decomposition

no code implementations 16 May 2024 Pei Yuhang, Yu Min, Yu Yan

The proposed method introduces a parameter determination approach based on the minimum Fractal box dimension (FBD) of Variational Mode Decomposition (VMD) components, aiming to address the issue of manual determination of VMD decomposition layers in advance.

Scene Summarization: Clustering Scene Videos into Spatially Diverse Frames

no code implementations 28 Nov 2023 Chao Chen, Mingzhi Zhu, Ankush Pratap Singh, Yu Yan, Felix Juefei Xu, Chen Feng

It aims to summarize a long video walkthrough of a scene into a small set of frames that are spatially diverse in the scene, which has many important applications, such as in surveillance, real estate, and robotics.

Clustering Diversity +3

Duet: efficient and scalable hybriD neUral rElation undersTanding

1 code implementation 25 Jul 2023 Kaixin Zhang, Hongzhi Wang, Yabin Lu, ZiQi Li, Chang Shu, Yu Yan, Donghua Yang

Although both data-driven and hybrid methods are proposed to avoid this problem, most of them suffer from high training and estimation costs, limited scalability, instability, and long-tail distribution problems on high-dimensional tables, which seriously affects the practical application of learned cardinality estimators.

Relation

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

no code implementations 23 May 2022 Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan

To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertisement applications.

Question Generation Question-Generation +1

Factorisation-based Image Labelling

1 code implementation 19 Nov 2021 Yu Yan, Yael Balbastre, Mikael Brudfors, John Ashburner

Segmentation of brain magnetic resonance images (MRI) into anatomical regions is a useful task in neuroimaging.

Brain Segmentation Segmentation

EL-Attention: Memory Efficient Lossless Attention for Generation

1 code implementation 11 May 2021 Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang

Transformer model with multi-head attention requires caching intermediate results for efficient inference in generation tasks.

Question Generation Question-Generation

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

3 code implementations Findings of the Association for Computational Linguistics 2020 Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou

This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism.

Abstractive Text Summarization Prediction +2

ProphetNet-Ads: A Looking Ahead Strategy for Generative Retrieval Models in Sponsored Search Engine

1 code implementation 21 Oct 2020 Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou

We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models.

Retrieval

Tell Me How to Ask Again: Question Data Augmentation with Controllable Rewriting in Continuous Space

1 code implementation EMNLP 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Jiancheng Lv, Nan Duan, Ming Zhou

In this paper, we propose a novel data augmentation method, referred to as Controllable Rewriting based Question Data Augmentation (CRQDA), for machine reading comprehension (MRC), question generation, and question-answering natural language inference tasks.

Data Augmentation Machine Reading Comprehension +6

RikiNet: Reading Wikipedia Pages for Natural Question Answering

no code implementations ACL 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Yu Yan, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Nan Duan

The representations are then fed into the predictor to obtain the span of the short answer, the paragraph of the long answer, and the answer type in a cascaded manner.

Natural Language Understanding Natural Questions +1

Diverse, Controllable, and Keyphrase-Aware: A Corpus and Method for News Multi-Headline Generation

1 code implementation EMNLP 2020 Dayiheng Liu, Yeyun Gong, Jie Fu, Wei Liu, Yu Yan, Bo Shao, Daxin Jiang, Jiancheng Lv, Nan Duan

Furthermore, we propose a simple and effective method to mine the keyphrases of interest in the news article and build the first large-scale keyphrase-aware news headline corpus, which contains over 180K aligned triples of <news article, headline, keyphrase>.

Decoder Diversity +2

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training

5 code implementations 13 Jan 2020 Weizhen Qi, Yu Yan, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou

This paper presents a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism.

Ranked #6 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Prediction +2
