Search Results for author: Yingxia Shao

Found 42 papers, 16 papers with code

Matching-oriented Embedding Quantization For Ad-hoc Retrieval

1 code implementation EMNLP 2021 Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie

In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated.

Quantization Retrieval

Don't Waste Your Bits! Squeeze Activations and Gradients for Deep Neural Networks via TinyScript

no code implementations ICML 2020 Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, Bin Cui

Recent years have witnessed intensive research interest in training deep neural networks (DNNs) more efficiently with quantization-based compression methods, which facilitate DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to decrease communication cost.

Quantization

Token-Efficient Leverage Learning in Large Language Models

1 code implementation 1 Apr 2024 Yuanhao Zeng, Min Wang, Yihang Wang, Yingxia Shao

With the same amount of task data, TELL leads in improving task performance compared to SFT.

Instruction Following Translation

Making Large Language Models A Better Foundation For Dense Retrieval

1 code implementation 24 Dec 2023 Chaofan Li, Zheng Liu, Shitao Xiao, Yingxia Shao

LLaRA consists of two pretext tasks: EBAE (Embedding-Based Auto-Encoding) and EBAR (Embedding-Based Auto-Regression), where the text embeddings from LLM are used to reconstruct the tokens for the input sentence and predict the tokens for the next sentence, respectively.

Retrieval Sentence +1

Experimental Analysis of Large-scale Learnable Vector Storage Compression

1 code implementation 27 Nov 2023 Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

Learnable embedding vectors are one of the most important applications in machine learning and are widely used in various database-related domains.

Benchmarking

Relation Extraction Model Based on Semantic Enhancement Mechanism

no code implementations 5 Nov 2023 Peiyu Liu, Junping Du, Yingxia Shao, Zeli Guan

The CasAug model proposed in this paper, which is based on the CasRel framework combined with a semantic enhancement mechanism, can solve this problem to a certain extent.

Information Retrieval Natural Language Understanding +4

Dynamic Fair Federated Learning Based on Reinforcement Learning

no code implementations 2 Nov 2023 Weikang Chen, Junping Du, Yingxia Shao, Jia Wang, Yangxi Zhou

Federated learning enables a collaborative training and optimization of global models among a group of devices without sharing local data samples.

Fairness Federated Learning +1

Entity Alignment Method of Science and Technology Patent based on Graph Convolution Network and Information Fusion

no code implementations 1 Nov 2023 Runze Fang, Yawen Li, Yingxia Shao, Zeli Guan, Zhe Xue

The entity alignment of science and technology patents aims to link the equivalent entities in the knowledge graph of different science and technology patent data sources.

Attribute Entity Alignment

Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation

no code implementations 17 Oct 2023 Xinyi Gao, Wentao Zhang, Junliang Yu, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

To further accelerate Scalable GNNs inference in this inductive setting, we propose an online propagation framework and two novel node-adaptive propagation methods that can customize the optimal propagation depth for each node based on its topological information and thereby avoid redundant feature propagation.

Reinforcement Federated Learning Method Based on Adaptive OPTICS Clustering

no code implementations 22 Jun 2023 Tianyu Zhao, Junping Du, Yingxia Shao, Zeli Guan

The algorithm combines OPTICS clustering and adaptive learning technology, and can effectively deal with the problem of non-independent and identically distributed data across different user terminals.

Clustering Federated Learning

RetroMAE-2: Duplex Masked Auto-Encoder For Pre-Training Retrieval-Oriented Language Models

1 code implementation 4 May 2023 Shitao Xiao, Zheng Liu, Yingxia Shao, Zhao Cao

It is designed to improve the quality of semantic representation where all contextualized embeddings of the pre-trained model can be leveraged.

Information Retrieval Open-Domain Question Answering +2

Distributed Graph Neural Network Training: A Survey

no code implementations 1 Nov 2022 Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen

In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.

Distributed Computing

Efficient Graph Neural Network Inference at Large Scale

no code implementations 1 Nov 2022 Xinyi Gao, Wentao Zhang, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.

A Relational Triple Extraction Method Based on Feature Reasoning for Technological Patents

no code implementations 7 Oct 2022 Runze Fang, Junping Du, Yingxia Shao, Zeli Guan

However, most of them only establish separate table features for each relationship, which ignores the implicit relationship between different entity pairs and different relationship features.

Relation

Diffusion Models: A Comprehensive Survey of Methods and Applications

2 code implementations 2 Sep 2022 Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Bin Cui, Ming-Hsuan Yang

This survey aims to provide a contextualized, in-depth look at the state of diffusion models, identifying the key areas of focus and pointing to potential areas for further exploration.

Image Super-Resolution Text-to-Image Generation +1

A Rare Topic Discovery Model for Short Texts Based on Co-occurrence word Network

no code implementations 30 Jun 2022 Chengjie Ma, Junping Du, Yingxia Shao, Ang Li, Zeli Guan

We provide a simple and general solution for the discovery of scarce topics in unbalanced short-text datasets, namely, a word co-occurrence network-based model CWIBTD, which can simultaneously address the sparsity and unbalance of short-text topics and attenuate the effect of occasional pairwise occurrences of words, allowing the model to focus more on the discovery of scarce topics.

A sentiment analysis model for car review texts based on adversarial training and whole word mask BERT

no code implementations 6 Jun 2022 Xingchen Liu, Yawen Li, Yingxia Shao, Ang Li, Jian Liang

Based on this, we propose a car review text sentiment analysis model based on adversarial training and whole word mask BERT (ATWWM-BERT).

Decision Making Sentiment Analysis

Sentiment Analysis of Online Travel Reviews Based on Capsule Network and Sentiment Lexicon

no code implementations 5 Jun 2022 Jia Wang, Junping Du, Yingxia Shao, Ang Li

In this paper, we study the text sentiment classification of online travel reviews based on social media online comments and propose the SCCL model based on capsule network and sentiment lexicon.

Language Modelling Sentiment Analysis +1

RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder

1 code implementation 24 May 2022 Shitao Xiao, Zheng Liu, Yingxia Shao, Zhao Cao

The sentence embedding is generated from the encoder's masked input; then, the original sentence is recovered based on the sentence embedding and the decoder's masked input via masked language modeling.

Information Retrieval Language Modelling +6

Profiling and Evolution of Intellectual Property

no code implementations 20 Apr 2022 Bowen Yu, Yingxia Shao, Ang Li

In recent years, with the rapid growth of Internet data, the number and types of scientific and technological resources are also rapidly expanding.

Retrieval

Retrieval of Scientific and Technological Resources for Experts and Scholars

no code implementations 13 Apr 2022 Suyu Ouyang, Yingxia Shao, Ang Li

The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements.

Relation Extraction Representation Learning +1

Research on Intellectual Property Resource Profile and Evolution Law

no code implementations 13 Apr 2022 Yuhui Wang, Yingxia Shao, Ang Li

In the era of big data, intellectual property-oriented scientific and technological resources show the trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing.

An Intellectual Property Entity Recognition Method Based on Transformer and Technological Word Information

no code implementations 21 Mar 2022 Yuhui Wang, Junping Du, Yingxia Shao

This paper proposes a method for extracting intellectual property entities based on Transformer and technical word information, and provides accurate word vector representation in combination with the BERT language method.

named-entity-recognition Named Entity Recognition +1

Web Page Content Extraction Based on Multi-feature Fusion

no code implementations 21 Mar 2022 Bowen Yu, Junping Du, Yingxia Shao

With the rapid growth of the number and types of web resources, there are still problems to be solved when using a single strategy to extract the text information of different pages.

Space4HGNN: A Novel, Modularized and Reproducible Platform to Evaluate Heterogeneous Graph Neural Network

1 code implementation 18 Feb 2022 Tianyu Zhao, Cheng Yang, Yibo Li, Quan Gan, Zhenyi Wang, Fengqi Liang, Huan Zhao, Yingxia Shao, Xiao Wang, Chuan Shi

Heterogeneous Graph Neural Network (HGNN) has been successfully employed in various tasks, but we cannot accurately know the importance of different design dimensions of HGNNs due to diverse architectures and applied scenarios.

Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval

2 code implementations 14 Jan 2022 Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Yingxia Shao, Defu Lian, Chaozhuo Li, Hao Sun, Denvy Deng, Liangjie Zhang, Qi Zhang, Xing Xie

In this work, we tackle this problem with Bi-Granular Document Representation, where the lightweight sparse embeddings are indexed and standby in memory for coarse-grained candidate search, and the heavyweight dense embeddings are hosted in disk for fine-grained post verification.

Quantization Retrieval

LECF: Recommendation via Learnable Edge Collaborative Filtering

1 code implementation Science China Information Sciences 2021 Shitao Xiao, Yingxia Shao, Yawen Li, Hongzhi Yin, Yanyan Shen & Bin Cui

In this paper, we model an interaction between user and item as an edge and propose a novel CF framework, called learnable edge collaborative filtering (LECF).

Collaborative Filtering

Self-Supervised Graph Co-Training for Session-based Recommendation

2 code implementations 24 Aug 2021 Xin Xia, Hongzhi Yin, Junliang Yu, Yingxia Shao, Lizhen Cui

In this paper, for informative session-based data augmentation, we combine self-supervised learning with co-training, and then develop a framework to enhance session-based recommendation.

Contrastive Learning Data Augmentation +2

Memory-aware framework for fast and scalable second-order random walk over billion-edge natural graphs

no code implementations The VLDB Journal 2021 Yingxia Shao, Shiyue Huang, Yawen Li, Xupeng Miao, Bin Cui & Lei Chen

In this paper, to clearly compare the efficiency of various node sampling methods, we first design a cost model and propose two new node sampling methods: one follows the acceptance-rejection paradigm to achieve a better balance between memory and time cost, and the other is optimized for fast sampling from the skewed probability distributions that exist in natural graphs.

Community Detection Graph Embedding

Matching-oriented Product Quantization For Ad-hoc Retrieval

2 code implementations 16 Apr 2021 Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie

In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated.

Quantization Retrieval

Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

1 code implementation 18 Feb 2021 Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di, Xing Xie

Secondly, it improves the data efficiency of the training workflow, where non-informative data can be eliminated from encoding.

News Recommendation Recommendation Systems

Efficient Automatic CASH via Rising Bandits

no code implementations 8 Dec 2020 Yang Li, Jiawei Jiang, Jinyang Gao, Yingxia Shao, Ce Zhang, Bin Cui

In this framework, the BO methods are used to solve the HPO problem for each ML algorithm separately, incorporating a much smaller hyperparameter space for BO methods.

Bayesian Optimization BIG-bench Machine Learning +2

UniNet: Scalable Network Representation Learning with Metropolis-Hastings Sampling

1 code implementation 10 Oct 2020 Xingyu Yao, Yingxia Shao, Bin Cui, Lei Chen

Finally, with the new edge sampler and random walk model abstraction, we carefully implement a scalable NRL framework called UniNet.

Representation Learning

DeGNN: Characterizing and Improving Graph Neural Networks with Graph Decomposition

no code implementations 10 Oct 2019 Xupeng Miao, Nezihe Merve Gürel, Wentao Zhang, Zhichao Han, Bo Li, Wei Min, Xi Rao, Hansheng Ren, Yinan Shan, Yingxia Shao, Yujie Wang, Fan Wu, Hui Xue, Yaming Yang, Zitao Zhang, Yang Zhao, Shuai Zhang, Yujing Wang, Bin Cui, Ce Zhang

Despite the wide application of Graph Convolutional Network (GCN), one major limitation is that it does not benefit from the increasing depth and suffers from the oversmoothing problem.

An Experimental Evaluation of Large Scale GBDT Systems

no code implementations 3 Jul 2019 Fangcheng Fu, Jiawei Jiang, Yingxia Shao, Bin Cui

Gradient boosting decision tree (GBDT) is a widely-used machine learning algorithm in both data analytic competitions and real-world industrial applications.

Management

NSCaching: Simple and Efficient Negative Sampling for Knowledge Graph Embedding

6 code implementations 16 Dec 2018 Yongqi Zhang, Quanming Yao, Yingxia Shao, Lei Chen

Negative sampling, which samples negative triplets from non-observed ones in the training data, is an important step in KG embedding.

Generative Adversarial Network Knowledge Graph Embedding +1
