no code implementations • Findings (NAACL) 2022 • Chengyue Gong, Xiaocong Du, Dhruv Choudhary, Bhargav Bhushanam, Qiang Liu, Arun Kejariwal
On the definition side, we reduce the bias in transfer loss by focusing on the items to which information from high-frequency items can be efficiently transferred.
no code implementations • 2 Sep 2022 • Mao Ye, Ruichen Jiang, Haoxiang Wang, Dhruv Choudhary, Xiaocong Du, Bhargav Bhushanam, Aryan Mokhtari, Arun Kejariwal, Qiang Liu
One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes the mismatch between the training and testing data distribution and hence domain generalization error.
no code implementations • 4 May 2021 • Xiaocong Du, Bhargav Bhushanam, Jiecao Yu, Dhruv Choudhary, Tianxiang Gao, Sherman Wong, Louis Feng, Jongsoo Park, Yu Cao, Arun Kejariwal
Our method leverages structured sparsification to reduce computational cost without hurting the model capacity at the end of offline training so that a full-size model is available in the recurring training stage to learn new data in real-time.
no code implementations • 11 Nov 2019 • Gokul Krishnan, Xiaocong Du, Yu Cao
Inspired by the observation that brain networks follow the Small-World model, we propose a novel structural pruning scheme, which includes (1) hierarchically trimming the network into a Small-World model before training, (2) training the network for a given dataset, and (3) optimizing the network for accuracy.
no code implementations • 28 May 2019 • Xiaocong Du, Gokul Krishnan, Abinash Mohanty, Zheng Li, Gouranga Charan, Yu Cao
Machine learning algorithms have made significant advances in many applications.
no code implementations • 28 May 2019 • Xiaocong Du, Gouranga Charan, Frank Liu, Yu Cao
Such a system requires learning from the data stream, training the model to preserve previous information and adapt to a new task, and generating a single-headed vector for future inference.
no code implementations • 27 May 2019 • Xiaocong Du, Zheng Li, Yufei Ma, Yu Cao
A typical training pipeline to mitigate over-parameterization is to pre-define a DNN structure first with redundant learning units (filters and neurons) under the goal of high accuracy, then to prune redundant learning units after training with the purpose of efficient inference.
no code implementations • 27 May 2019 • Xiaocong Du, Zheng Li, Yu Cao
Today a canonical approach to reduce the computation cost of Deep Neural Networks (DNNs) is to pre-define an over-parameterized model before training to guarantee the learning capacity, and then prune unimportant learning units (filters and neurons) during training to improve model compactness.