5 code implementations • CVPR 2023 • Ali Hassani, Steven Walton, Jiachen Li, Shen Li, Humphrey Shi
We present Neighborhood Attention (NA), the first efficient and scalable sliding-window attention mechanism for vision.
Ranked #119 on Semantic Segmentation on ADE20K
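The sliding-window idea behind Neighborhood Attention can be sketched in a minimal 1D form: each query attends only to keys within a small local radius, clamped at the sequence borders. This is an illustrative toy with scalar per-position features, not the paper's tiled CUDA implementation; the function name and `radius` parameter are hypothetical.

```python
import math

def neighborhood_attention_1d(q, k, v, radius=1):
    """Minimal 1D neighborhood attention: each query attends only to
    keys/values within `radius` positions, clamped at the borders."""
    n = len(q)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # dot-product scores over the local window only
        scores = [q[i] * k[j] for j in range(lo, hi)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum((e / z) * v[j]
                       for e, j in zip(exps, range(lo, hi))))
    return out
```

Because each position touches at most `2 * radius + 1` neighbors, cost grows linearly with sequence length rather than quadratically, which is the efficiency argument for windowed attention.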
3 code implementations • 28 Jun 2020 • Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, Soumith Chintala
This paper presents the design, implementation, and evaluation of the PyTorch distributed data parallel module.
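The core mechanism DDP relies on is an all-reduce that leaves every replica holding the mean of all replicas' gradients, so their parameter updates stay in lockstep. A single-process toy of that invariant (not the actual `torch.distributed` API) can be sketched as:

```python
def allreduce_mean(grads_per_replica):
    """Toy stand-in for DDP's gradient all-reduce: every replica ends
    up with the element-wise mean of all replicas' gradients, so the
    subsequent optimizer steps are identical across replicas."""
    n = len(grads_per_replica)
    dim = len(grads_per_replica[0])
    mean = [sum(g[i] for g in grads_per_replica) / n for i in range(dim)]
    # each replica receives its own copy of the reduced gradient
    return [list(mean) for _ in grads_per_replica]
```

In the real module this reduction is overlapped with the backward pass by bucketing gradients, which is where much of the engineering in the paper lies.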
2 code implementations • ACL 2018 • Shen Li, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, Xiaoyong Du
Analogical reasoning is effective in capturing linguistic regularities.
2 code implementations • 23 May 2022 • Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai
With the dramatic increase in the number of parameters in language models, sparsity methods have received ever-increasing research attention as a way to compress and accelerate models.
1 code implementation • CVPR 2021 • Shen Li, Jianqing Xu, Xiaqing Xu, Pengcheng Shen, Shaoxin Li, Bryan Hooi
Probabilistic Face Embeddings (PFE) is the first attempt to address this dilemma.
1 code implementation • 5 Feb 2021 • Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr
PipeTransformer automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during training, reallocating those resources to training the remaining active layers.
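The freeze-then-reallocate idea can be sketched with an illustrative policy (not the paper's exact freezing algorithm): freeze the longest prefix of layers whose gradient norms have fallen below a threshold, and have the optimizer skip them. The function names and threshold are hypothetical.

```python
def frozen_prefix(grad_norms, threshold=0.05):
    """Illustrative freezing rule: freeze the longest prefix of layers
    whose gradient norms have dropped below `threshold`."""
    k = 0
    for g in grad_norms:
        if g < threshold:
            k += 1
        else:
            break
    return k

def sgd_step(params, grads, lr, num_frozen):
    """Update only the active (unfrozen) layers; frozen layers keep
    their parameters, so no gradients or optimizer state are needed
    for them and those resources can be reassigned."""
    return [p if i < num_frozen else p - lr * g
            for i, (p, g) in enumerate(zip(params, grads))]
```

Freezing a prefix of layers also means their activations can be computed once and cached, which is what frees pipeline stages for the still-training layers.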
1 code implementation • 4 Jun 2021 • Shaokun Zhang, Xiawu Zheng, Chenyi Yang, Yuchao Li, Yan Wang, Fei Chao, Mengdi Wang, Shen Li, Jun Yang, Rongrong Ji
Motivated by the necessity of efficient inference across various constraints on BERT, we propose a novel approach, YOCO-BERT, to achieve compress once and deploy everywhere.
1 code implementation • 31 May 2021 • Mingbao Lin, Yuxin Zhang, Yuchao Li, Bohong Chen, Fei Chao, Mengdi Wang, Shen Li, Yonghong Tian, Rongrong Ji
We also provide a filter-rearrangement workflow: the weight matrix is first rearranged along the output-channel dimension to derive more influential blocks for accuracy improvements, and a matching rearrangement is then applied to the next layer's weights along the input-channel dimension to keep the convolutional operations correct.
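The invariant that makes such rearrangement safe is that permuting one layer's output channels changes nothing as long as the next layer's input channels are permuted identically. A numpy sketch with 1x1 convolutions (plain matrix multiplies) and an arbitrary permutation, all names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # input features
W1 = rng.normal(size=(6, 4))       # layer-1 weights (output channels = rows)
W2 = rng.normal(size=(3, 6))       # layer-2 weights (input channels = columns)

perm = np.array([5, 2, 0, 4, 1, 3])  # some rearrangement of output channels

y_ref = W2 @ np.maximum(W1 @ x, 0)   # original two-layer computation
W1r = W1[perm, :]                    # rearrange layer-1 output channels
W2r = W2[:, perm]                    # matching rearrangement of layer-2 inputs
y_new = W2r @ np.maximum(W1r @ x, 0)

assert np.allclose(y_ref, y_new)     # the composed computation is unchanged
```

This freedom is what lets the workflow gather influential filters into contiguous blocks without altering the network's function.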
1 code implementation • 28 Sep 2023 • Jiaying Wu, Shen Li, Ailin Deng, Miao Xiong, Bryan Hooi
Despite considerable advances in automated fake news detection, the time-sensitive nature of news leaves a critical open question: how to effectively predict the veracity of news articles based on limited fact-checks.
2 code implementations • ICLR 2020 • Shen Li, Bryan Hooi, Gim Hee Lee
Yet, most deep generative models do not address the question of identifiability, and thus fail to deliver on the promise of the recovery of the true latent sources that generate the observations.
1 code implementation • 20 Apr 2018 • Shen Li, Christian Häger, Nil Garcia, Henk Wymeersch
Machine learning is used to compute achievable information rates (AIRs) for a simplified fiber channel.
1 code implementation • NeurIPS 2023 • Miao Xiong, Ailin Deng, Pang Wei Koh, Jiaying Wu, Shen Li, Jianqing Xu, Bryan Hooi
We examine the problem over 504 pretrained ImageNet models and observe that: 1) Proximity bias exists across a wide variety of model architectures and sizes; 2) Transformer-based models are relatively more susceptible to proximity bias than CNN-based models; 3) Proximity bias persists even after performing popular calibration algorithms like temperature scaling; 4) Models tend to overfit more heavily on low proximity samples than on high proximity samples.
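Temperature scaling, the calibration baseline mentioned in point 3, simply divides the logits by a scalar T before the softmax; T > 1 softens the distribution, T < 1 sharpens it. A minimal sketch (illustrative, not the paper's code):

```python
import math

def temperature_scale(logits, T):
    """Temperature scaling: divide logits by T before the softmax.
    T > 1 lowers confidence, T < 1 raises it; T = 1 is the plain softmax."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]
```

Because a single global T rescales every prediction the same way, it cannot correct a bias that varies with a sample's proximity to the training distribution, which is the point the paper makes.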
1 code implementation • 4 Mar 2024 • Buyun Zhang, Liang Luo, Yuxin Chen, Jade Nie, Xi Liu, Daifeng Guo, Yanli Zhao, Shen Li, Yuchen Hao, Yantao Yao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Maxim Naumov, Wenlin Chen
Scaling laws play an instrumental role in the sustainable improvement in model quality.
1 code implementation • 29 Nov 2022 • Miao Xiong, Shen Li, Wenjie Feng, Ailin Deng, Jihai Zhang, Bryan Hooi
How do we know when the predictions made by a classifier can be trusted?
1 code implementation • ACL 2019 • Kun Liu, Shen Li, Daqi Zheng, Zhengdong Lu, Sheng Gao, Si Li
To solve this problem, we propose a prism module to disentangle the semantic aspects of words and reduce noise at the input layer of a model.
Ranked #52 on Named Entity Recognition (NER) on CoNLL 2003 (English)
1 code implementation • 6 Feb 2023 • Ailin Deng, Shen Li, Miao Xiong, Zhirui Chen, Bryan Hooi
Trustworthy machine learning is of primary importance to the practical deployment of deep learning models.
1 code implementation • 6 Aug 2019 • Shen Li, Chenhao Su, Renfen Hu, Zhengdong Lu
Dropout is known as an effective way to reduce overfitting via preventing co-adaptations of units.
1 code implementation • CONLL 2018 • Hengru Xu, Shen Li, Renfen Hu, Si Li, Sheng Gao
Dropout is used to avoid overfitting by randomly dropping units from the neural networks during training.
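The standard "inverted dropout" formulation of that mechanism zeroes each unit with probability p during training and scales survivors by 1/(1 - p), so expected activations match evaluation mode. A minimal sketch (illustrative, not this paper's variant):

```python
import random

def dropout(units, p, training=True, seed=0):
    """Inverted dropout: during training each unit is zeroed with
    probability p and survivors are scaled by 1/(1 - p), keeping the
    expected activation equal to evaluation mode (where nothing drops)."""
    if not training or p == 0.0:
        return list(units)
    rng = random.Random(seed)  # fixed seed for a reproducible sketch
    keep = 1.0 - p
    return [u / keep if rng.random() < keep else 0.0 for u in units]
```

The scaling is what lets the same network be used unchanged at test time, since no compensation factor needs to be applied there.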
no code implementations • 9 Sep 2017 • Shuochao Yao, Yiran Zhao, Huajie Shao, Aston Zhang, Chao Zhang, Shen Li, Tarek Abdelzaher
Recent advances in deep learning have led various applications to unprecedented achievements, which could potentially bring higher intelligence to a broad spectrum of mobile and ubiquitous applications.
no code implementations • 30 Aug 2018 • Shen Li, Hengru Xu, Zhengdong Lu
As neural networks have come to dominate state-of-the-art results across a wide range of NLP tasks, integrating symbolic knowledge into neural models has attracted considerable attention as a way to further improve their performance.
no code implementations • EMNLP 2017 • Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du
The existing word representation methods mostly limit their information source to word co-occurrence statistics.
no code implementations • EMNLP 2017 • Shen Li, Zhe Zhao, Tao Liu, Renfen Hu, Xiaoyong Du
Convolutional Neural Networks (CNNs) are widely used in NLP tasks.
no code implementations • NeurIPS 2018 • Ankit Shah, Pritish Kamath, Julie A. Shah, Shen Li
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task.
no code implementations • ACL 2019 • Renfen Hu, Shen Li, Shichen Liang
Diachronic word embeddings have been widely used in detecting temporal changes.
no code implementations • 6 Apr 2020 • Shen Li, Renfen Hu, Jinshan Wu
Word meaning has multiple aspects, yet existing word representations "compress" these aspects into a single vector, so further analysis is needed to recover the information along different dimensions.
no code implementations • 1 Jan 2021 • Shen Li, Jianqing Xu, Xiaqing Xu, Pengcheng Shen, Shaoxin Li, Bryan Hooi
To address these issues, in this paper, we propose a novel framework for face uncertainty learning in hyperspherical space.
no code implementations • 18 Apr 2021 • Shen Li, Bingpeng Ma, Hong Chang, Shiguang Shan, Xilin Chen
This paper proposes a novel model, named Continuity-Discrimination Convolutional Neural Network (CD-CNN), for visual object tracking.
no code implementations • 21 Apr 2021 • Yuqiong Qi, Yang Hu, Haibin Wu, Shen Li, Haiyu Mao, Xiaochun Ye, Dongrui Fan, Ninghui Sun
In this work, we extensively explore the above system design challenges, which motivate us to propose a comprehensive framework that synergistically handles heterogeneous hardware accelerator design principles, system design criteria, and the task scheduling mechanism.
no code implementations • 6 Jul 2021 • Ankit Shah, Pritish Kamath, Shen Li, Patrick Craven, Kevin Landers, Kevin Oden, Julie Shah
When observing task demonstrations, human apprentices are able to identify whether a given task is executed correctly long before they gain expertise in actually performing that task.
no code implementations • 18 Oct 2021 • Shen Li, Theodoros Stouraitis, Michael Gienger, Sethu Vijayakumar, Julie A. Shah
Consistent state estimation is challenging, especially under the epistemic uncertainties arising from learned (nonlinear) dynamic and observation models.
no code implementations • 20 Oct 2021 • Ran Cheng, Chao Chen, Longfei Xu, Shen Li, Lei Wang, Hengbin Cui, Kaikui Liu, Xiaolong Li
For user representation, we utilize a series of historical navigation records to extract user preferences.
no code implementations • 31 Oct 2021 • Yang Sun, Fajie Yuan, Min Yang, Alexandros Karatzoglou, Shen Li, Xiaoyan Zhao
In this paper, we plan to exploit such redundancy phenomena to improve the performance of RS.
no code implementations • 2 Dec 2021 • Shen Li, Jianqing Xu, Bryan Hooi
This paper proposes a probabilistic contrastive loss function for self-supervised learning.
no code implementations • 6 Dec 2021 • Wenjie Chu, Shen Li, Chao Chen, Longfei Xu, Hengbin Cui, Kaikui Liu
Most of the existing methods for debiasing in click-through rate (CTR) prediction depend on an oversimplified assumption, i.e., that the click probability is the product of observation probability and relevance probability.
no code implementations • 9 Jun 2022 • Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah
In this work, we identify the roots of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration.
no code implementations • 23 Aug 2022 • Shen Li, Bryan Hooi
Without exploiting any label information, the principal components recovered store the most informative elements in their \emph{leading} dimensions and leave the negligible in the \emph{trailing} ones, allowing for clear performance improvements of $5\%$-$10\%$ in downstream tasks.
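The "informative leading dimensions, negligible trailing ones" property is the defining behavior of principal components ordered by variance. A numpy sketch via SVD (illustrative of that property, not the paper's self-supervised method; the synthetic data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic data whose variance is concentrated in a few directions
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Xc = X - X.mean(axis=0)  # center before extracting components

# principal components via SVD; singular values arrive sorted descending
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
var = S**2 / (len(Xc) - 1)  # variance explained by each component

assert np.all(np.diff(var) <= 0)        # leading dims are most informative
assert var[:2].sum() / var.sum() > 0.8  # trailing dims are negligible
```

Once components are ordered this way, truncating the trailing dimensions discards little information, which is what enables the downstream gains described above.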
no code implementations • 19 Oct 2022 • Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos
When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step.
no code implementations • 10 Nov 2022 • Jiawei Zhang, Shen Li, Li Li
Connected and automated vehicles (CAVs) are viewed as a special kind of robot with the potential to significantly improve the safety and efficiency of traffic.
no code implementations • 21 Apr 2023 • Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li
It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains.
no code implementations • 2 May 2023 • Xuemin Hu, Shen Li, Tingyu Huang, Bo Tang, Rouxing Huai, Long Chen
In general, large-scale testing is conducted in a simulation environment and the learned driving knowledge is then transferred to the real world, so how to adapt driving knowledge learned in simulation to reality becomes a critical issue.
no code implementations • CVPR 2023 • Jianqing Xu, Shen Li, Ailin Deng, Miao Xiong, Jiaying Wu, Jiaxiang Wu, Shouhong Ding, Bryan Hooi
Mean ensemble (i.e., averaging predictions from multiple models) is a commonly used technique in machine learning that improves the performance of each individual model.
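The mean-ensemble baseline itself is a one-liner: average the per-class probabilities produced by each model. A minimal sketch (the function name is hypothetical):

```python
def mean_ensemble(predictions):
    """Average per-class probabilities across models.
    `predictions` is a list of probability vectors, one per model."""
    n = len(predictions)
    num_classes = len(predictions[0])
    return [sum(p[c] for p in predictions) / n for c in range(num_classes)]
```

Averaging valid probability vectors yields another valid probability vector, and disagreements between models are smoothed out, which is the usual intuition for why the ensemble beats its members.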
no code implementations • 16 Aug 2023 • Qinghui Nie, Jishun Ou, Haiyang Zhang, Jiawei Lu, Shen Li, Haotian Shi
An efficient urban bus control system has the potential to significantly reduce travel delays and streamline the allocation of transportation resources, thereby offering enhanced and user-friendly transit services to passengers.
no code implementations • 17 Aug 2023 • Bin Chen, Zhiwei Liang, Shen Li, Yi Lei, Gabriele Liga, Alex Alvarado
Multidimensional constellation shaping of up to 32 dimensions with different spectral efficiencies is compared through AWGN and fiber-optic simulations.
no code implementations • 11 Jan 2024 • Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song
We propose a unified protein language model, xTrimoPGLM, to address these two types of tasks simultaneously through an innovative pre-training framework.
no code implementations • 22 Feb 2024 • Shen Li, Liuyi Yao, Jinyang Gao, Lan Zhang, Yaliang Li
To support various applications, business owners often seek customized models that are obtained by fine-tuning a pre-trained LLM through the API provided by LLM owners or cloud servers.
no code implementations • 1 Mar 2024 • Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov
We study a mismatch between deep learning recommendation models' flat architecture, the common distributed training paradigm, and the hierarchical data center topology.