To address this issue, we connect this structured output learning problem with the structured modeling framework from the sequence transduction field.
Deep neural networks have achieved great success thanks to increasing amounts of data and diverse, effective network designs.
To alleviate the disadvantages of both categories of methods, we propose to unify static and dynamic compression techniques to obtain an input-adaptive compressed model, which better balances the total compression ratio against model performance.
1 code implementation • 9 Sep 2023 • Yang Jin, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Specifically, we introduce a well-designed visual tokenizer that translates a non-linguistic image into a sequence of discrete tokens, like a foreign language that the LLM can read.
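The core of such a visual tokenizer is mapping continuous image features to discrete ids. A minimal sketch, assuming a VQ-style nearest-neighbor lookup against a learned codebook (the codebook size, feature dimension, and function name are illustrative, not from the paper):

```python
import numpy as np

def quantize_to_tokens(features, codebook):
    """Map continuous patch features to discrete token ids by
    nearest-neighbor lookup in a codebook (VQ-style quantization)."""
    # features: (num_patches, dim); codebook: (vocab_size, dim)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)  # one token id per image patch

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 64))  # hypothetical vocab of 512 visual "words"
patches = rng.normal(size=(16, 64))    # features of a 4x4 grid of image patches
tokens = quantize_to_tokens(patches, codebook)
print(tokens.shape)  # (16,) -- a discrete sequence an LLM could consume
```

The resulting token sequence can be interleaved with ordinary text tokens, which is what lets the LLM "read" the image.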
Increasing the size of embedding layers has been shown to improve the performance of recommendation models, yet it gradually causes their sizes to exceed terabytes in industrial recommender systems, increasing computing and storage costs.
We propose an improved end-to-end minimax optimization method for this sparse learning problem to better balance model performance and computational efficiency.
Our method requires only negligible computation cost to optimize the sampling distributions of paths and data, yet it achieves lower gradient variance during supernet training and better generalization performance for the supernet, resulting in a more consistent NAS.
Video text spotting based on Unmanned Aerial Vehicles (UAVs) has been extensively used in civil and military domains.
When training finishes, some gates are exactly zero while others are around one. This is particularly favored by practical hot-start training in industry, because removing the features corresponding to exact-zero gates does no damage to model performance.
Predictor-based Neural Architecture Search (NAS) continues to be an important topic because it aims to mitigate the time-consuming search procedure of traditional NAS methods.
Ranked #21 on Neural Architecture Search on CIFAR-10
During training, the polarization effect drives a subset of gates to decrease smoothly to exactly zero, while the other gates gradually move away from zero by a large margin.
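The polarizing behavior can be illustrated with a toy penalty. A minimal numpy sketch, assuming a polarization-style regularizer of the form t·|g| − |g − mean(g)| (this specific form and the hyperparameters are assumptions for illustration, not the paper's exact objective):

```python
import numpy as np

def polarization_penalty_grad(g, t=1.2):
    """Subgradient of the penalty t*|g| - |g - mean(g)|: the first term
    shrinks all gates, the second rewards deviation from the mean,
    so gates split toward zero or away from it."""
    dev = g - g.mean()
    return t * np.sign(g) - (np.sign(dev) - np.sign(dev).mean())

g = np.array([0.05, 0.9, 0.1, 0.8])  # hypothetical gate values in [0, 1]
for _ in range(100):                  # plain gradient descent on the penalty
    g = np.clip(g - 0.01 * polarization_penalty_grad(g), 0.0, 1.0)
print(g)  # gates with small initial values reach exactly 0.0
```

In a real run the penalty is added to the task loss, whose gradients keep the useful gates large; here only the penalty acts, so the surviving gates drift down slightly while the small ones snap to exact zero.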
To further improve performance on these tasks, we propose a novel Hand Image Understanding (HIU) framework that extracts comprehensive information about the hand from a single RGB image by jointly considering the relationships between these tasks.
Recommendation systems (RS) play an important role in content recommendation and retrieval scenarios.
The goal of this work is to develop a novel learning framework for accurate and expressive fashion captioning.
By training the former with regular SGD and the latter with a novel update rule using penalty gradients, we realize structured sparsity.
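The two-group update scheme can be sketched as follows. This is a minimal illustration, assuming plain SGD for the main weights and, for the sparsity-controlling parameters, a gradient step followed by l1 soft-thresholding as one concrete realization of a penalty-gradient rule (the function names, learning rates, and the thresholding choice are assumptions, not the paper's exact method):

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Regular SGD update for the main weights."""
    return w - lr * grad

def penalty_step(g, grad, lr, lam):
    """Penalized update for the structure parameters: a gradient step,
    then soft-thresholding (the proximal map of lam*|g|), which snaps
    small entries to exactly zero."""
    g = g - lr * grad
    return np.sign(g) * np.maximum(np.abs(g) - lr * lam, 0.0)

rng = np.random.default_rng(1)
w = rng.normal(size=5)                        # ordinary weights: plain SGD
g = np.array([0.03, 0.7, -0.02, 0.5, 0.01])   # structure gates: penalized update

for _ in range(10):
    w = sgd_step(w, 0.1 * w, lr=0.1)                      # toy task gradient
    g = penalty_step(g, np.zeros_like(g), lr=0.1, lam=0.05)

print(g)  # small gates become exactly 0.0: structured sparsity emerges
```

Because the thresholding produces exact zeros rather than merely small values, the corresponding structures (channels, features, etc.) can be removed outright after training.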
Color compatibility is important for evaluating a fashion outfit, yet it was neglected in previous studies.
Neural Architecture Search (NAS) has shown great potential in automatically designing scalable network architectures for dense image prediction.
Ranked #13 on Semantic Segmentation on Cityscapes test (using extra training data)