Search Results for author: Bingsheng He

Found 46 papers, 26 papers with code

Evaluating Small Language Models for News Summarization: Implications and Factors Influencing Performance

no code implementations · 2 Feb 2025 · Borui Xu, Yao Chen, Zeyi Wen, Weiguo Liu, Bingsheng He

This research not only contributes to the understanding of SLMs but also provides practical insights for researchers seeking efficient summarization solutions that balance performance and resource use.

What Limits LLM-based Human Simulation: LLMs or Our Design?

no code implementations · 15 Jan 2025 · Qian Wang, Jiaying Wu, Zhenheng Tang, Bingqiao Luo, Nuo Chen, Wei Chen, Bingsheng He

We argue that advancing LLM-based human simulation requires addressing both the inherent limitations of LLMs and the design challenges of simulation frameworks.

OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML

no code implementations · 15 Jan 2025 · Xuanhe Zhou, Wei Zhou, Liguo Qi, Hao Zhang, Dihao Chen, Bingsheng He, Mian Lu, Guoliang Li, Fan Wu, Yuqiang Chen

Efficient and consistent feature computation is crucial for a wide range of online ML applications.

Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation

1 code implementation · 18 Dec 2024 · Jun Hu, Bryan Hooi, Bingsheng He, Yinwei Wei

Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information.

Graph Learning, Multi-modal Recommendation (+1 more)

"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing

no code implementations · 16 Dec 2024 · Moming Duan, Rui Zhao, Linshan Jiang, Nigel Shadbolt, Bingsheng He

In this paper, we propose addressing the above challenges along two lines: 1) For license analysis, we have developed a new vocabulary for ML workflow management and encoded license rules to enable ontological reasoning for analyzing rights granting and compliance issues.

model

Partitioning Message Passing for Graph Fraud Detection

no code implementations · 16 Nov 2024 · Wei Zhuo, Zemin Liu, Bryan Hooi, Bingsheng He, Guang Tan, Rizal Fathony, Jia Chen

Label imbalance and homophily-heterophily mixture are the fundamental problems encountered when applying Graph Neural Networks (GNNs) to Graph Fraud Detection (GFD) tasks.

Fraud Detection, Inductive Bias

Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data

1 code implementation · 23 Oct 2024 · Zhaomin Wu, Junyi Hou, Yiqun Diao, Bingsheng He

To overcome these limitations, we introduce the Federated Transformer (FeT), a novel framework that supports multi-party VFL with fuzzy identifiers.

Entity Alignment, Vertical Federated Learning

FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression

no code implementations · 16 Oct 2024 · Zhenheng Tang, Xueze Kang, Yiming Yin, Xinglin Pan, Yuxin Wang, Xin He, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Amelie Chi Zhou, Bo Li, Bingsheng He, Xiaowen Chu

To alleviate hardware scarcity in training large deep neural networks (DNNs), particularly large language models (LLMs), we present FusionLLM, a decentralized training system designed and implemented for training DNNs using geo-distributed GPUs across different computing clusters or individual devices.

Scheduling

Model-Based Differentially Private Knowledge Transfer for Large Language Models

no code implementations · 14 Oct 2024 · Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang

As large language models (LLMs) become increasingly prevalent in web services, effectively leveraging domain-specific knowledge while ensuring privacy has become critical.

Privacy Preserving, RAG (+2 more)

Federated Data-Efficient Instruction Tuning for Large Language Models

no code implementations · 14 Oct 2024 · Zhen Qin, Zhaomin Wu, Bingsheng He, Shuiguang Deng

Instruction tuning improves the responsiveness of pretrained large language models (LLMs) to human instructions, and it benefits from diverse instruction data.

Federated Learning

Aggressive Post-Training Compression on Extremely Large Language Models

no code implementations · 30 Sep 2024 · Zining Zhang, Yao Chen, Bingsheng He, Zhenjie Zhang

The increasing size and complexity of Large Language Models (LLMs) pose challenges for their deployment on personal computers and mobile devices.

Model Compression, Network Pruning (+1 more)

LLM-PBE: Assessing Data Privacy in Large Language Models

1 code implementation · 23 Aug 2024 · Qinbin Li, Junyuan Hong, Chulin Xie, Jeffrey Tan, Rachel Xin, Junyi Hou, Xavier Yin, Zhun Wang, Dan Hendrycks, Zhangyang Wang, Bo Li, Bingsheng He, Dawn Song

Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.

Revisiting, Benchmarking and Understanding Unsupervised Graph Domain Adaptation

1 code implementation · 9 Jul 2024 · Meihan Liu, Zhen Zhang, Jiachen Tang, Jiajun Bu, Bingsheng He, Sheng Zhou

Unsupervised Graph Domain Adaptation (UGDA) involves the transfer of knowledge from a label-rich source graph to an unlabeled target graph under domain discrepancies.

Benchmarking, Domain Adaptation (+1 more)

A Reflective LLM-based Agent to Guide Zero-shot Cryptocurrency Trading

no code implementations · 27 Jun 2024 · Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, Bingsheng He

The utilization of Large Language Models (LLMs) in financial trading has primarily been concentrated within the stock market, aiding in economic and financial decisions.

StreamFP: Learnable Fingerprint-guided Data Selection for Efficient Stream Learning

1 code implementation · 11 Jun 2024 · Tongjun Shi, Shuhao Zhang, Binbin Chen, Bingsheng He

Stream Learning (SL) requires models that can quickly adapt to continuously evolving data, posing significant challenges in both computational efficiency and learning accuracy.

Computational Efficiency, Continual Learning

Collaborate to Adapt: Source-Free Graph Domain Adaptation via Bi-directional Adaptation

1 code implementation · 3 Mar 2024 · Zhen Zhang, Meihan Liu, Anhui Wang, Hongyang Chen, Zhao Li, Jiajun Bu, Bingsheng He

Unsupervised Graph Domain Adaptation (UGDA) has emerged as a practical solution to transfer knowledge from a label-rich source graph to a completely unlabelled target graph.

Contrastive Learning, Domain Adaptation (+1 more)

BuffGraph: Enhancing Class-Imbalanced Node Classification via Buffer Nodes

no code implementations · 20 Feb 2024 · Qian Wang, Zemin Liu, Zhen Zhang, Bingsheng He

Class imbalance in graph-structured data, where minor classes are significantly underrepresented, poses a critical challenge for Graph Neural Networks (GNNs).

Classification, Node Classification

Exploiting Label Skews in Federated Learning with Model Concatenation

1 code implementation · 11 Dec 2023 · Yiqun Diao, Qinbin Li, Bingsheng He

However, non-IID data remains a key challenge in federated learning (FL), as it can significantly degrade the accuracy of the final model.

Federated Learning, Image Classification

Efficient Heterogeneous Graph Learning via Random Projection

1 code implementation · 23 Oct 2023 · Jun Hu, Bryan Hooi, Bingsheng He

To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which aims to collect information from neighbors in a finer-grained way.

Graph Learning, Graph Neural Network (+2 more)

Effective and Efficient Federated Tree Learning on Hybrid Data

no code implementations · 18 Oct 2023 · Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song

To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data.

Federated Learning

EX-Graph: A Pioneering Dataset Bridging Ethereum and X

1 code implementation · 2 Oct 2023 · Qian Wang, Zhen Zhang, Zemin Liu, Shengliang Lu, Bingqiao Luo, Bingsheng He

While numerous public blockchain datasets are available, their utility is constrained by an exclusive focus on blockchain data.

Link Prediction

FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs

no code implementations · 3 Sep 2023 · Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu

The rapid growth of memory and computation requirements of large language models (LLMs) has outpaced the development of hardware, hindering people who lack large-scale high-end GPUs from training or deploying LLMs.

Scheduling

OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

1 code implementation · 29 Aug 2023 · Yiqun Diao, Yutong Yang, Qinbin Li, Bingsheng He, Mian Lu

Thus, a natural question is what those open-environment challenges look like and how existing incremental learning algorithms perform on real-world relational data streams.

Incremental Learning, Missing Values

A Survey of Imbalanced Learning on Graphs: Problems, Techniques, and Future Directions

1 code implementation · 26 Aug 2023 · Zemin Liu, Yuan Li, Nan Chen, Qian Wang, Bryan Hooi, Bingsheng He

However, these methods often suffer from data imbalance, a common issue in graph data where certain segments possess abundant data while others are scarce, thereby leading to biased learning outcomes.

Graph Learning, Link Prediction (+1 more)

VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

1 code implementation · 5 Jul 2023 · Zhaomin Wu, Junyi Hou, Bingsheng He

However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions.

Diversity, Feature Correlation (+2 more)

Towards Open Federated Learning Platforms: Survey and Vision from Technical and Legal Perspectives

2 code implementations · 5 Jul 2023 · Moming Duan, Qinbin Li, Linshan Jiang, Bingsheng He

To fully unleash the potential of FL, we advocate rethinking the design of current FL frameworks and extending them to a more generalized concept: Open Federated Learning Platforms, positioned as a crowdsourced, collaborative machine learning infrastructure for all Internet users.

Federated Learning, Survey

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection

1 code implementation · 29 Mar 2023 · Sihao Hu, Zhen Zhang, Bingqiao Luo, Shengliang Lu, Bingsheng He, Ling Liu

As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities to protect susceptible users from being victimized.

Fraud Detection

HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler for Neural Networks

no code implementations · 21 Nov 2022 · Zining Zhang, Bingsheng He, Zhenjie Zhang

However, due to the gigantic search space and lack of intelligent search guidance, current auto-schedulers require hours to days of tuning time to find the best-performing tensor program for the entire neural network.

reinforcement-learning, Reinforcement Learning (RL)

Practical Vertical Federated Learning with Unsupervised Representation Learning

1 code implementation · 13 Aug 2022 · Zhaomin Wu, Qinbin Li, Bingsheng He

As societal concerns about data privacy have grown, data silos have emerged among multiple parties in various applications.

Privacy Preserving, Representation Learning (+1 more)

Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump

1 code implementation · 21 Apr 2022 · Sihao Hu, Zhen Zhang, Shengliang Lu, Bingsheng He, Zhao Li

With the proliferation of pump-and-dump schemes (P&Ds) in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance to alert potentially susceptible investors.

A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs

no code implementations · 10 Jan 2022 · Ruofan Liang, Bingsheng He, Shengen Yan, Peng Sun

Multi-tenant machine learning services have emerged as data-intensive workloads in data centers, with heavy usage of GPU resources.

BIG-bench Machine Learning, Scheduling

Adversarial Collaborative Learning on Non-IID Features

1 code implementation · 29 Sep 2021 · Qinbin Li, Bingsheng He, Dawn Song

Federated learning has been a popular approach to enable collaborative learning on multiple parties without exchanging raw data.

Federated Learning

Model-Contrastive Federated Learning

6 code implementations · CVPR 2021 · Qinbin Li, Bingsheng He, Dawn Song

A key challenge in federated learning is to handle the heterogeneity of local data distribution across parties.

Contrastive Learning, Federated Learning (+2 more)

Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload

no code implementations · 23 Mar 2021 · Johan Kok Zhi Kang, Gaurav, Sien Yi Tan, Feng Cheng, Shixuan Sun, Bingsheng He

The use of deep learning models for forecasting the resource consumption patterns of SQL queries has recently been a popular area of study.

Federated Learning on Non-IID Data Silos: An Experimental Study

3 code implementations · 3 Feb 2021 · Qinbin Li, Yiqun Diao, Quan Chen, Bingsheng He

We find that non-IID data does bring significant challenges to the learning accuracy of FL algorithms, and none of the existing state-of-the-art FL algorithms outperforms the others in all cases.

BIG-bench Machine Learning, Federated Learning

Practical One-Shot Federated Learning for Cross-Silo Setting

1 code implementation · 2 Oct 2020 · Qinbin Li, Bingsheng He, Dawn Song

Federated learning enables multiple parties to collaboratively learn a model without exchanging their data.

Federated Learning, Transfer Learning

Model-Agnostic Round-Optimal Federated Learning via Knowledge Transfer

no code implementations · 28 Sep 2020 · Qinbin Li, Bingsheng He, Dawn Song

In this paper, we propose a novel federated learning algorithm, FedKT, that needs only a single communication round (i.e., round-optimal).

Federated Learning, Transfer Learning

The OARF Benchmark Suite: Characterization and Implications for Federated Learning Systems

1 code implementation · 14 Jun 2020 · Sixu Hu, Yuan Li, Xu Liu, Qinbin Li, Zhaomin Wu, Bingsheng He

This paper presents and characterizes an Open Application Repository for Federated Learning (OARF), a benchmark suite for federated machine learning systems.

Federated Learning

Privacy-Preserving Gradient Boosting Decision Trees

2 code implementations · 11 Nov 2019 · Qinbin Li, Zhaomin Wu, Zeyi Wen, Bingsheng He

Specifically, by investigating the properties of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data in each iteration and to apply leaf node clipping in order to tighten the sensitivity bounds.

Privacy Preserving

Practical Federated Gradient Boosting Decision Trees

3 code implementations · 11 Nov 2019 · Qinbin Li, Zeyi Wen, Bingsheng He

There have been several recent studies on how to train GBDTs in the federated learning setting.

Federated Learning

Adaptive Kernel Value Caching for SVM Training

no code implementations · 8 Nov 2019 · Qinbin Li, Zeyi Wen, Bingsheng He

Our experimental results show that EFU often has a 20% higher hit ratio than LRU in training with the Gaussian kernel.

Classification, General Classification (+3 more)

A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection

1 code implementation · 23 Jul 2019 · Qinbin Li, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, Bingsheng He

By systematically summarizing the existing federated learning systems, we present the design factors, case studies, and future research opportunities.

BIG-bench Machine Learning, Federated Learning (+1 more)

Accelerating Generative Neural Networks on Unmodified Deep Learning Processors -- A Software Approach

2 code implementations · 3 Jul 2019 · Dawen Xu, Ying Wang, Kaijie Tu, Cheng Liu, Bingsheng He, Lei Zhang

Generative neural networks are a new category of neural networks and have been widely used in applications such as content generation, unsupervised learning, segmentation, and pose estimation.

Deep Learning, Pose Estimation

A Survey on Graph Processing Accelerators: Challenges and Opportunities

no code implementations · 26 Feb 2019 · Chuangyi Gui, Long Zheng, Bingsheng He, Cheng Liu, Xinyu Chen, Xiaofei Liao, Hai Jin

Graphs are a well-known data structure for representing relationships in a variety of applications, e.g., data science and machine learning.

Distributed, Parallel, and Cluster Computing

Efficient Memory Management for GPU-based Deep Learning Systems

no code implementations · 19 Feb 2019 · Junzhe Zhang, Sai Ho Yeung, Yao Shu, Bingsheng He, Wei Wang

They are achieved by exploiting the iterative nature of the training algorithm of deep learning to derive the lifetime and read/write order of all variables.

Deep Learning, Management (+1 more)
