Search Results for author: Yingxia Shao

Found 42 papers, 16 papers with code

Matching-oriented Embedding Quantization For Ad-hoc Retrieval

1 code implementation • EMNLP 2021 • Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie

In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated.

Quantization Retrieval

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

no code implementations • ICML 2020 • Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently by quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink the memory consumption, and (2) gradients are quantized to decrease the communication cost.

Quantization

Token-Efficient Leverage Learning in Large Language Models

1 code implementation • 1 Apr 2024 • Yuanhao Zeng, Min Wang, Yihang Wang, Yingxia Shao

With the same amount of task data, TELL leads in improving task performance compared to SFT.

Instruction Following Translation

LLMvsSmall Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model

no code implementations • 12 Mar 2024 • Linmei Hu, Hongyu He, Duokang Wang, Ziwang Zhao, Yingxia Shao, Liqiang Nie

Furthermore, we utilize the LLM to enrich the information of personality labels for enhancing the detection performance.

Contrastive Learning Language Modelling +2

Making Large Language Models A Better Foundation For Dense Retrieval

1 code implementation • 24 Dec 2023 • Chaofan Li, Zheng Liu, Shitao Xiao, Yingxia Shao

LLaRA consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from LLM are used to reconstruct the tokens for the input sentence and predict the tokens for the next sentence, respectively.

Retrieval Sentence +1

Experimental Analysis of Large-scale Learnable Vector Storage Compression

1 code implementation • 27 Nov 2023 • Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

Learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains.

Benchmarking

Relation Extraction Model Based on Semantic Enhancement Mechanism

no code implementations • 5 Nov 2023 • Peiyu Liu, Junping Du, Yingxia Shao, Zeli Guan

The CasAug model proposed in this paper, which combines the CasRel framework with a semantic enhancement mechanism, can solve this problem to a certain extent.

Information Retrieval Natural Language Understanding +4

Dynamic Fair Federated Learning Based on Reinforcement Learning

no code implementations • 2 Nov 2023 • Weikang Chen, Junping Du, Yingxia Shao, Jia Wang, Yangxi Zhou

Federated learning enables collaborative training and optimization of global models among a group of devices without sharing local data samples.

Fairness Federated Learning +1

Entity Alignment Method of Science and Technology Patent based on Graph Convolution Network and Information Fusion

no code implementations • 1 Nov 2023 • Runze Fang, Yawen Li, Yingxia Shao, Zeli Guan, Zhe Xue

The entity alignment of science and technology patents aims to link the equivalent entities in the knowledge graph of different science and technology patent data sources.

Attribute Entity Alignment

Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation

no code implementations • 17 Oct 2023 • Xinyi Gao, Wentao Zhang, Junliang Yu, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

To further accelerate Scalable GNNs inference in this inductive setting, we propose an online propagation framework and two novel node-adaptive propagation methods that can customize the optimal propagation depth for each node based on its topological information and thereby avoid redundant feature propagation.

Reinforcement Federated Learning Method Based on Adaptive OPTICS Clustering

no code implementations • 22 Jun 2023 • Tianyu Zhao, Junping Du, Yingxia Shao, Zeli Guan

The algorithm combines OPTICS clustering and adaptive learning technology, and can effectively deal with the problem of non-independent and identically distributed data across different user terminals.

Clustering Federated Learning

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

1 code implementation • 4 May 2023 • Shitao Xiao, Zheng Liu, Yingxia Shao, Zhao Cao

It is designed to improve the quality of semantic representation where all contextualized embeddings of the pre-trained model can be leveraged.

Information Retrieval Open-Domain Question Answering +2

Distributed Graph Neural Network Training: A Survey

no code implementations • 1 Nov 2022 • Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen

In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.

Distributed Computing

Efficient Graph Neural Network Inference at Large Scale

no code implementations • 1 Nov 2022 • Xinyi Gao, Wentao Zhang, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.

Adaptive Dual Channel Convolution Hypergraph Representation Learning for Technological Intellectual Property

no code implementations • 12 Oct 2022 • Yuxin Liu, Yawen Li, Yingxia Shao, Zeli Guan

Therefore, a hypergraph neural network model based on dual channel convolution is proposed.

Graph Learning Representation Learning

A Relational Triple Extraction Method Based on Feature Reasoning for Technological Patents

no code implementations • 7 Oct 2022 • Runze Fang, Junping Du, Yingxia Shao, Zeli Guan

However, most of them only establish separate table features for each relationship, which ignores the implicit relationship between different entity pairs and different relationship features.

Relation

Diffusion Models: A Comprehensive Survey of Methods and Applications

2 code implementations • 2 Sep 2022 • Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Image Super-Resolution Text-to-Image Generation +1

A Rare Topic Discovery Model for Short Texts Based on Co-occurrence word Network

no code implementations • 30 Jun 2022 • Chengjie Ma, Junping Du, Yingxia Shao, Ang Li, Zeli Guan

We provide a simple and general solution for the discovery of scarce topics in unbalanced short-text datasets, namely, a word co-occurrence network-based model CWIBTD, which can simultaneously address the sparsity and unbalance of short-text topics and attenuate the effect of occasional pairwise occurrences of words, allowing the model to focus more on the discovery of scarce topics.

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

no code implementations • 6 Jun 2022 • Xingchen Liu, Yawen Li, Yingxia Shao, Ang Li, Jian Liang

Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).

Decision Making Sentiment Analysis

Sentiment Analysis of Online Travel Reviews Based on Capsule Network and Sentiment Lexicon

no code implementations • 5 Jun 2022 • Jia Wang, Junping Du, Yingxia Shao, Ang Li

In this paper, we study the text sentiment classification of online travel reviews based on social media online comments and propose the SCCL model based on capsule network and sentiment lexicon.

Language Modelling Sentiment Analysis +1

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder

1 code implementation • 24 May 2022 • Shitao Xiao, Zheng Liu, Yingxia Shao, Zhao Cao

The sentence embedding is generated from the encoder's masked input; then, the original sentence is recovered based on the sentence embedding and the decoder's masked input via masked language modeling.

Ranked #1 on Information Retrieval on MSMARCO

Information Retrieval Language Modelling +6

Profiling and Evolution of Intellectual Property

no code implementations • 20 Apr 2022 • Bowen Yu, Yingxia Shao, Ang Li

In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding.

Retrieval

Retrieval of Scientific and Technological Resources for Experts and Scholars

no code implementations • 13 Apr 2022 • Suyu Ouyang, Yingxia Shao, Ang Li

The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements.

Relation Extraction Representation Learning +1

Research on Intellectual Property Resource Profile and Evolution Law

no code implementations • 13 Apr 2022 • Yuhui Wang, Yingxia Shao, Ang Li

In the era of big data, intellectual property-oriented scientific and technological resources show the trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing.

Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings

2 code implementations • 1 Apr 2022 • Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, Yeyun Gong, Qi Chen, Fan Yang, Hao Sun, Yingxia Shao, Denvy Deng, Qi Zhang, Xing Xie

We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN index.

Contrastive Learning Knowledge Distillation +2

Scientific and Technological Text Knowledge Extraction Method of based on Word Mixing and GRU

no code implementations • 31 Mar 2022 • Suyu Ouyang, Yingxia Shao, Junping Du, Ang Li

The knowledge extraction task is to extract triple relations (head entity-relation-tail entity) from unstructured text data.

Named Entity Recognition +1

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

no code implementations • 21 Mar 2022 • Yuhui Wang, Junping Du, Yingxia Shao

This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and provides accurate word vector representation in combination with the BERT language method.

Named Entity Recognition +1

Web Page Content Extraction Based on Multi-feature Fusion

no code implementations • 21 Mar 2022 • Bowen Yu, Junping Du, Yingxia Shao

With the rapid growth of the number and types of web resources, there are still problems to be solved when using a single strategy to extract the text information of different pages.

Space4HGNN: A Novel, Modularized and Reproducible Platform to Evaluate Heterogeneous Graph Neural Network

1 code implementation • 18 Feb 2022 • Tianyu Zhao, Cheng Yang, Yibo Li, Quan Gan, Zhenyi Wang, Fengqi Liang, Huan Zhao, Yingxia Shao, Xiao Wang, Chuan Shi

Heterogeneous Graph Neural Network (HGNN) has been successfully employed in various tasks, but we cannot accurately know the importance of different design dimensions of HGNNs due to diverse architectures and applied scenarios.

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search

no code implementations • 13 Feb 2022 • Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu, Premkumar Srinivasan, Denvy Deng, Qi Zhang, Xing Xie

On the other hand, the capability of making high-CTR retrieval is optimized by learning to discriminate user's clicked ads from the entire corpus.

Contrastive Learning Knowledge Distillation +2

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

2 code implementations • 14 Jan 2022 • Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Yingxia Shao, Defu Lian, Chaozhuo Li, Hao Sun, Denvy Deng, Liangjie Zhang, Qi Zhang, Xing Xie

In this work, we tackle this problem with Bi-Granular Document Representation, where the lightweight sparse embeddings are indexed and kept on standby in memory for coarse-grained candidate search, and the heavyweight dense embeddings are hosted on disk for fine-grained post verification.

Quantization Retrieval

LECF: Recommendation via Learnable Edge Collaborative Filtering

1 code implementation • Science China Information Sciences 2021 • Shitao Xiao, Yingxia Shao, Yawen Li, Hongzhi Yin, Yanyan Shen, Bin Cui

In this paper, we model an interaction between user and item as an edge and propose a novel CF framework, called learnable edge collaborative filtering (LECF).

Collaborative Filtering

Self-Supervised Graph Co-Training for Session-based Recommendation

2 code implementations • 24 Aug 2021 • Xin Xia, Hongzhi Yin, Junliang Yu, Yingxia Shao, Lizhen Cui

In this paper, for informative session-based data augmentation, we combine self-supervised learning with co-training, and then develop a framework to enhance session-based recommendation.

Contrastive Learning Data Augmentation +2

Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs

no code implementations • The VLDB Journal 2021 • Yingxia Shao, Shiyue Huang, Yawen Li, Xupeng Miao, Bin Cui, Lei Chen

In this paper, to clearly compare the efficiency of various node sampling methods, we first design a cost model and propose two new node sampling methods: one follows the acceptance-rejection paradigm to achieve a better balance between memory and time cost, and the other is optimized for fast sampling of the skewed probability distributions that exist in natural graphs.

Community Detection Graph Embedding

Matching-oriented Product Quantization For Ad-hoc Retrieval

2 code implementations • 16 Apr 2021 • Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie

In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated.

Quantization Retrieval

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

1 code implementation • 18 Feb 2021 • Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di, Xing Xie

Secondly, it improves the data efficiency of the training workflow, where non-informative data can be eliminated from encoding.

News Recommendation Recommendation Systems

Efficient Automatic CASH via Rising Bandits

no code implementations • 8 Dec 2020 • Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang, Bin Cui

In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods.

Bayesian Optimization BIG-bench Machine Learning +2

UniNet: Scalable Network Representation Learning with Metropolis-Hastings Sampling

1 code implementation • 10 Oct 2020 • Xingyu Yao, Yingxia Shao, Bin Cui, Lei Chen

Finally, with the new edge sampler and random walk model abstraction, we carefully implement a scalable NRL framework called UniNet.

Representation Learning

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

no code implementations • 10 Oct 2019 • Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang

Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem.

An Experimental Evaluation of Large Scale GBDT Systems

no code implementations • 3 Jul 2019 • Fangcheng Fu, Jiawei Jiang, Yingxia Shao, Bin Cui

Gradient boosting decision tree (GBDT) is a widely-used machine learning algorithm in both data analytic competitions and real-world industrial applications.

Management

NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding

6 code implementations • 16 Dec 2018 • Yongqi Zhang, Quanming Yao, Yingxia Shao, Lei Chen

Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding.

Ranked #5 on Link Prediction on FB15k

Generative Adversarial Network Knowledge Graph Embedding +1

Fast Hyperparameter Optimization of Deep Neural Networks via Ensembling Multiple Surrogates

no code implementations • 6 Nov 2018 • Yang Li, Jiawei Jiang, Yingxia Shao, Bin Cui

The performance of deep neural networks crucially depends on good hyperparameter configurations.

Bayesian Optimization Hyperparameter Optimization
