no code implementations • 21 Dec 2024 • Yunshan Zhong, Yuyao Zhou, Yuxin Zhang, Shen Li, Yong Li, Fei Chao, Zhanpeng Zeng, Rongrong Ji
Data-free quantization (DFQ), which facilitates model quantization without real data to address increasing concerns about data security, has garnered significant attention within the model compression community.
no code implementations • 19 Dec 2024 • Bin Chen, Zhiwei Liang, Yi Lei, Jingxin Deng, Shen Li, Gabriele Liga
In this paper, we introduce an analytical nonlinear interference (NLI) power model-based shaping gain estimation method to enable fast performance evaluation of various MD modulation formats in coherent dual-polarization (DP) optical transmission systems.
no code implementations • 20 Nov 2024 • Shen Li, Lei Jiang, Wei Wang, Hongwei Hu, Liang Li
This paper presents a proof of concept that, given typical 3-channel images in a randomly permuted channel order, a model (termed Chanel-Orderer) with ad-hoc inductive biases in both its architecture and loss functions can accurately predict the channel ordering and restore it.
no code implementations • 20 Nov 2024 • Mengzhu Wang, Jiao Li, Houcheng Su, Nan Yin, Liang Yang, Shen Li
Semi-supervised learning (SSL) has made notable advances in medical image segmentation (MIS), particularly in scenarios with limited labeled data, where it significantly enhances data utilization efficiency.
no code implementations • 12 Nov 2024 • Weibo Zhao, Yubin Shi, Xinyu Lyu, Wanchen Sui, Shen Li, Yong Li
Quantization stands as a pivotal technique for large language model (LLM) serving, yet it poses significant challenges particularly in achieving effective low-bit quantization.
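The excerpt above does not describe the paper's method, but as shared background, here is a minimal sketch of uniform symmetric weight quantization, the baseline that low-bit LLM quantization schemes refine (the per-tensor scale and function names below are illustrative assumptions, not the paper's technique):

```python
import torch

def quantize_symmetric(w: torch.Tensor, bits: int = 4):
    """Uniform symmetric quantization with a single per-tensor scale.

    Illustrative only: practical low-bit LLM quantizers add per-group
    scales, outlier handling, and calibration on activation statistics.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for signed 4-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096)
q, s = quantize_symmetric(w, bits=4)
err = (w - dequantize(q, s)).abs().mean()
print(f"mean absolute quantization error: {err.item():.4f}")
```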
no code implementations • 11 Oct 2024 • Yanfeng Jiang, Zelan Yang, Bohua Chen, Shen Li, Yong Li, Tao Li
To address the above issue, we propose DeltaDQ, a novel distribution-driven delta compression framework that utilizes Group-wise Dropout and Separate Quantization to achieve ultra-high compression of the delta weights.
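As a purely hypothetical illustration of how the two named components could compose (the group size, drop probability, bit width, and overall procedure below are assumptions, not the paper's actual design):

```python
import torch

def compress_delta(base, finetuned, group_size=128, drop_prob=0.5, bits=2):
    """Hypothetical sketch in the spirit of DeltaDQ: (1) drop whole groups
    of the delta weight at random ("group-wise dropout"), then (2) quantize
    each surviving group with its own scale ("separate quantization")."""
    delta = (finetuned - base).flatten()
    groups = delta.reshape(-1, group_size)          # assumes divisibility
    keep = torch.rand(groups.shape[0]) > drop_prob  # one decision per group
    groups = groups * keep[:, None]
    qmax = 2 ** (bits - 1) - 1
    scales = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.round(groups / scales).clamp(-qmax - 1, qmax)
    return q.to(torch.int8), scales, keep

base = torch.randn(256, 128)
finetuned = base + 0.01 * torch.randn_like(base)    # small delta weight
q, scales, keep = compress_delta(base, finetuned)
```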
no code implementations • 30 Sep 2024 • Yajie Sheng, Bin Chen, Yi Lei, Jingxin Deng, Jiwei Xu, Mengfan Fu, Qunbi Zhuge, Shen Li
Performance of concatenated multilevel coding with probabilistic shaping (PS) and Voronoi constellations (VCs) is analysed over the AWGN channel.
no code implementations • 26 Sep 2024 • Shen Li, Jianqing Xu, Jiaying Wu, Miao Xiong, Ailin Deng, Jiazhen Ji, Yuge Huang, Wenjie Feng, Shouhong Ding, Bryan Hooi
This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces.
no code implementations • 9 Sep 2024 • Shen Li, Yuyang Zhang, Zhaolin Ren, Claire Liang, Na Li, Julie A. Shah
Theoretical and empirical analyses show that for queries with strong preferences, response times complement choices by providing extra information about preference strength, leading to significantly improved utility estimation.
1 code implementation • 30 Aug 2024 • Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li
Aligned LLMs are secure: they can recognize malicious questions and refuse to answer them.
no code implementations • 13 Jun 2024 • Xuemin Hu, Shen Li, Yingfen Xu, Bo Tang, Long Chen
Offline reinforcement learning (RL) can learn optimal policies from pre-collected offline datasets without interacting with the environment, but the sampled actions of the agent often cannot cover the action distribution under a given state, resulting in the extrapolation error issue.
no code implementations • 11 Jun 2024 • Hao Yu, Zelan Yang, Shen Li, Yong Li, Jianxin Wu
The advent of pre-trained large language models (LLMs) has revolutionized various natural language processing tasks.
2 code implementations • 4 Mar 2024 • Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen
Scaling laws play an instrumental role in the sustainable improvement in model quality.
2 code implementations • 1 Mar 2024 • Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov
We study a mismatch between the flat architecture of deep learning recommendation models, the common distributed training paradigm, and the hierarchical data center topology.
no code implementations • 22 Feb 2024 • Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li
To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers.
no code implementations • 11 Jan 2024 • Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song
We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework.
1 code implementation • 28 Sep 2023 • Jiaying Wu, Shen Li, Ailin Deng, Miao Xiong, Bryan Hooi
Despite considerable advances in automated fake news detection, the timely nature of news means that effectively predicting the veracity of news articles from limited fact-checks remains a critical open question.
no code implementations • 17 Aug 2023 • Bin Chen, Zhiwei Liang, Shen Li, Yi Lei, Gabriele Liga, Alex Alvarado
Multidimensional constellation shaping with up to 32 dimensions and different spectral efficiencies is compared through AWGN and fiber-optic simulations.
no code implementations • 16 Aug 2023 • Qinghui Nie, Jishun Ou, Haiyang Zhang, Jiawei Lu, Shen Li, Haotian Shi
An efficient urban bus control system has the potential to significantly reduce travel delays and streamline the allocation of transportation resources, thereby offering enhanced and user-friendly transit services to passengers.
1 code implementation • NeurIPS 2023 • Miao Xiong, Ailin Deng, Pang Wei Koh, Jiaying Wu, Shen Li, Jianqing Xu, Bryan Hooi
We examine the problem over 504 pretrained ImageNet models and observe that: 1) Proximity bias exists across a wide variety of model architectures and sizes; 2) Transformer-based models are relatively more susceptible to proximity bias than CNN-based models; 3) Proximity bias persists even after performing popular calibration algorithms like temperature scaling; 4) Models tend to overfit more heavily on low proximity samples than on high proximity samples.
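For reference, observation 3 measures against standard temperature scaling (Guo et al., 2017), which fits a single global temperature on held-out data; a minimal sketch of that baseline (toy data for illustration):

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Learn one scalar T so that softmax(logits / T) is better calibrated.
    Because T is global, it cannot correct a bias that varies with a
    sample's proximity to the rest of the data, which is the paper's point."""
    log_t = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

logits = torch.randn(1000, 10) * 3      # toy, deliberately overconfident
labels = torch.randint(0, 10, (1000,))
T = fit_temperature(logits, labels)
calibrated = F.softmax(logits / T, dim=1)
```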
no code implementations • 2 May 2023 • Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Rouxing Huai, Long Chen
In general, large-scale testing is conducted in a simulation environment and the learned driving knowledge is then transferred to the real world, so adapting driving knowledge learned in simulation to reality becomes a critical issue.
2 code implementations • 21 Apr 2023 • Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li
It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains.
1 code implementation • 6 Feb 2023 • Ailin Deng, Shen Li, Miao Xiong, Zhirui Chen, Bryan Hooi
Trustworthy machine learning is of primary importance to the practical deployment of deep learning models.
no code implementations • CVPR 2023 • Jianqing Xu, Shen Li, Ailin Deng, Miao Xiong, Jiaying Wu, Jiaxiang Wu, Shouhong Ding, Bryan Hooi
Mean ensemble (i.e., averaging predictions from multiple models) is a commonly used technique in machine learning that improves the performance of each individual model.
1 code implementation • 29 Nov 2022 • Miao Xiong, Shen Li, Wenjie Feng, Ailin Deng, Jihai Zhang, Bryan Hooi
How do we know when the predictions made by a classifier can be trusted?
no code implementations • 10 Nov 2022 • Jiawei Zhang, Shen Li, Li Li
Connected and automated vehicles (CAVs) are viewed as a special kind of robot with the potential to significantly improve the safety and efficiency of traffic.
no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.
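A minimal sketch of the contrast being drawn: each worker fine-tunes its own replica with no per-step gradient communication, and the replicas are merged once at the end by parameter averaging (the helper name and the simple mean are assumptions about the general recipe, not the paper's exact procedure):

```python
import copy
import torch

def average_models(models):
    """Merge independently fine-tuned replicas by averaging parameters,
    instead of all-reducing gradients at every step as DDP does."""
    merged = copy.deepcopy(models[0])
    with torch.no_grad():
        for name, p in merged.named_parameters():
            stacked = torch.stack(
                [dict(m.named_parameters())[name] for m in models])
            p.copy_(stacked.mean(dim=0))
    return merged
```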
no code implementations • 23 Aug 2022 • Shen Li, Bryan Hooi
Without exploiting any label information, the recovered principal components store the most informative elements in their leading dimensions and leave the negligible ones in the trailing dimensions, allowing for clear performance improvements of 5%-10% in downstream tasks.
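A small, self-contained illustration of the underlying principle, that a PCA-style rotation concentrates variance (information) in the leading dimensions so trailing ones can be truncated (toy data; this is not the paper's training procedure):

```python
import torch

x = torch.randn(10000, 64) @ torch.randn(64, 64)  # correlated features
U, S, V = torch.pca_lowrank(x, q=64)              # full set of components
explained = (S ** 2) / (S ** 2).sum()
print(explained[:8].sum().item())  # leading 8 dims carry most of the variance
```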
no code implementations • 9 Jun 2022 • Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah
In this work, we identify the roots of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration.
2 code implementations • 23 May 2022 • Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai
With the dramatically increased number of parameters in language models, sparsity methods have received ever-increasing research focus to compress and accelerate the models.
5 code implementations • CVPR 2023 • Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision.
Ranked #123 on Semantic Segmentation on ADE20K
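A naive, readable sketch of the 1D version of the idea: each query attends only to a small window of keys centered on it (zero-padding at the borders is a simplification; the paper's NA shifts border windows inward, and its contribution is an efficient, scalable 2D implementation, which this is not):

```python
import torch
import torch.nn.functional as F

def neighborhood_attention_1d(q, k, v, window: int = 7):
    """Sliding-window attention over a 1D sequence of shape (B, L, D)."""
    B, L, D = q.shape
    pad = window // 2
    k_pad = F.pad(k, (0, 0, pad, pad))                      # (B, L+2p, D)
    v_pad = F.pad(v, (0, 0, pad, pad))
    k_win = k_pad.unfold(1, window, 1).permute(0, 1, 3, 2)  # (B, L, w, D)
    v_win = v_pad.unfold(1, window, 1).permute(0, 1, 3, 2)
    attn = torch.einsum("bld,blwd->blw", q, k_win) / D ** 0.5
    attn = attn.softmax(dim=-1)
    return torch.einsum("blw,blwd->bld", attn, v_win)

q = k = v = torch.randn(2, 16, 32)
out = neighborhood_attention_1d(q, k, v)   # same shape as the input
```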
no code implementations • 6 Dec 2021 • Wenjie Chu, Shen Li, Chao Chen, Longfei Xu, Hengbin Cui, Kaikui Liu
Most of the existing methods for debiasing in click-through rate (CTR) prediction depend on an oversimplified assumption, i.e., that the click probability is the product of the observation probability and the relevance probability.
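The assumption in question, stated concretely with toy numbers (this is the factorization the paper argues is oversimplified, not its proposed model):

```python
# Examination-hypothesis factorization of a click:
#   P(click) = P(observed | position) * P(relevant | query, item)
p_observe = 0.6                   # user examines the item at this position
p_relevant = 0.5                  # item is relevant to the user
p_click = p_observe * p_relevant  # = 0.3 under the independence assumption
```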
no code implementations • 2 Dec 2021 • Shen Li, Jianqing Xu, Bryan Hooi
This paper proposes a probabilistic contrastive loss function for self-supervised learning.
no code implementations • 31 Oct 2021 • Yang Sun, Fajie Yuan, Min Yang, Alexandros Karatzoglou, Shen Li, Xiaoyan Zhao
In this paper, we plan to exploit such redundancy phenomena to improve the performance of RS.
no code implementations • 20 Oct 2021 • Ran Cheng, Chao Chen, Longfei Xu, Shen Li, Lei Wang, Hengbin Cui, Kaikui Liu, Xiaolong Li
For user representation, we utilize a series of historical navigation records to extract user preferences.
no code implementations • 18 Oct 2021 • Shen Li, Theodoros Stouraitis, Michael Gienger, Sethu Vijayakumar, Julie A. Shah
Consistent state estimation is challenging, especially under the epistemic uncertainties arising from learned (nonlinear) dynamic and observation models.
no code implementations • 6 Jul 2021 • Ankit Shah, Pritish Kamath, Shen Li, Patrick Craven, Kevin Landers, Kevin Oden, Julie Shah
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task.
1 code implementation • CVPR 2021 • Shen Li, Jianqing Xu, Xiaqing Xu, Pengcheng Shen, Shaoxin Li, Bryan Hooi
Probabilistic Face Embeddings (PFE) is the first attempt to address this dilemma.
1 code implementation • 4 Jun 2021 • Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji
Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.
1 code implementation • 31 May 2021 • Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji
We also provide a workflow of filter rearrangement that first rearranges the weight matrix in the output channel dimension to derive more influential blocks for accuracy improvements and then applies similar rearrangement to the next-layer weights in the input channel dimension to ensure correct convolutional operations.
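A minimal sketch of the invariance this workflow relies on: permuting layer-1 output channels and layer-2 input channels with the same permutation leaves the composed computation unchanged (the L1-norm importance score here is an assumed placeholder for whatever criterion derives the blocks):

```python
import torch

def rearrange_filters(w1: torch.Tensor, w2: torch.Tensor):
    """Reorder conv1's output filters and keep conv2 consistent."""
    importance = w1.abs().sum(dim=(1, 2, 3))        # one score per filter
    perm = torch.argsort(importance, descending=True)
    return w1[perm], w2[:, perm]            # out-channels / in-channels

w1 = torch.randn(64, 32, 3, 3)    # conv1 weight: (out, in, kH, kW)
w2 = torch.randn(128, 64, 3, 3)   # conv2 consumes conv1's 64 channels
w1p, w2p = rearrange_filters(w1, w2)
```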
no code implementations • 21 Apr 2021 • Yuqiong Qi, Yang Hu, Haibin Wu, Shen Li, Haiyu Mao, Xiaochun Ye, Dongrui Fan, Ninghui Sun
In this work, we extensively explore the above system design challenges, which motivate us to propose a comprehensive framework that synergistically handles the heterogeneous hardware accelerator design principles, system design criteria, and task scheduling mechanism.
no code implementations • 18 Apr 2021 • Shen Li, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen
This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN), for visual object tracking.
1 code implementation • 5 Feb 2021 • Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr
PipeTransformer automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during the training, and instead allocates resources for training of the remaining active layers.
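A minimal sketch of the freezing step in isolation (the convergence test that decides when to freeze and the pipeline/data-parallel reallocation, which are the paper's actual contribution, are omitted):

```python
import torch

def freeze_leading_layers(layers, n_frozen: int):
    """Exclude the first n layers from gradient computation so their
    gradients and optimizer state no longer consume training resources."""
    for layer in layers[:n_frozen]:
        for p in layer.parameters():
            p.requires_grad_(False)

blocks = torch.nn.ModuleList(torch.nn.Linear(8, 8) for _ in range(6))
freeze_leading_layers(blocks, n_frozen=2)
```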
no code implementations • 1 Jan 2021 • Shen Li, Jianqing Xu, Xiaqing Xu, Pengcheng Shen, Shaoxin Li, Bryan Hooi
To address these issues, in this paper, we propose a novel framework for face uncertainty learning in hyperspherical space.
3 code implementations • 28 Jun 2020 • Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala
This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module.
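Canonical usage of the module the paper describes, assuming the script is launched with torchrun so rank and world size come from the environment:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = dist.get_rank()

model = torch.nn.Linear(10, 10).to(rank)
ddp_model = DDP(model, device_ids=[rank])  # gradients all-reduced in backward

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
loss = ddp_model(torch.randn(20, 10).to(rank)).sum()
loss.backward()          # communication overlaps with backward computation
optimizer.step()
dist.destroy_process_group()
```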
no code implementations • 6 Apr 2020 • Shen Li, Renfen Hu, Jinshan Wu
Word meaning has multiple aspects, yet existing word representations "compress" these aspects into a single vector, so further analysis is needed to recover the information along different dimensions.
2 code implementations • ICLR 2020 • Shen Li, Bryan Hooi, Gim Hee Lee
Yet, most deep generative models do not address the question of identifiability, and thus fail to deliver on the promise of the recovery of the true latent sources that generate the observations.
1 code implementation • 6 Aug 2019 • Shen Li, Chenhao Su, Renfen Hu, Zhengdong Lu
Dropout is known as an effective way to reduce overfitting via preventing co-adaptations of units.
1 code implementation • ACL 2019 • Kun Liu, Shen Li, Daqi Zheng, Zhengdong Lu, Sheng Gao, Si Li
To solve this problem, we propose a prism module to disentangle the semantic aspects of words and reduce noise at the input layer of a model.
Ranked #53 on Named Entity Recognition (NER) on CoNLL 2003 (English)
no code implementations • ACL 2019 • Renfen Hu, Shen Li, Shichen Liang
Diachronic word embeddings have been widely used in detecting temporal changes.
no code implementations • NeurIPS 2018 • Ankit Shah, Pritish Kamath, Julie A. Shah, Shen Li
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task.
no code implementations • 30 Aug 2018 • Shen Li, Hengru Xu, Zhengdong Lu
As neural networks have come to dominate state-of-the-art results across a wide range of NLP tasks, improving neural models by integrating symbolic knowledge has attracted considerable attention.
1 code implementation • CONLL 2018 • Hengru Xu, Shen Li, Renfen Hu, Si Li, Sheng Gao
Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training.
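For reference, the basic mechanism described above, standard inverted dropout, fits in a few lines (the variants the paper studies build on this):

```python
import torch

def inverted_dropout(x: torch.Tensor, p: float = 0.5, training: bool = True):
    """Zero each unit with probability p during training and rescale by
    1/(1-p) so expected activations match inference, where dropout is a no-op."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)
```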
2 code implementations • ACL 2018 • Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du
Analogical reasoning is effective in capturing linguistic regularities.
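The standard analogy evaluation in a few lines using gensim (the vector file path is a placeholder for any pretrained embeddings in word2vec text format):

```python
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("vectors.txt")  # placeholder path
# "king" - "man" + "woman" should land near "queen" for good embeddings:
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```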
1 code implementation • 20 Apr 2018 • Shen Li, Christian Häger, Nil Garcia, Henk Wymeersch
Machine learning is used to compute achievable information rates (AIRs) for a simplified fiber channel.
no code implementations • 9 Sep 2017 • Shuochao Yao, Yiran Zhao, Huajie Shao, Aston Zhang, Chao Zhang, Shen Li, Tarek Abdelzaher
Recent advances in deep learning have brought unprecedented achievements to a variety of applications, with the potential to bring higher intelligence to a broad spectrum of mobile and ubiquitous applications.
no code implementations • EMNLP 2017 • Shen Li, Zhe Zhao, Tao Liu, Renfen Hu, Xiaoyong Du
Convolutional Neural Networks (CNNs) are widely used in NLP tasks.
no code implementations • EMNLP 2017 • Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du
The existing word representation methods mostly limit their information source to word co-occurrence statistics.