Moreover, we propose an Uncertain Information Aggregation module that leverages the global structure to integrate local information.
This paper introduces CoLLiE, an efficient library that facilitates collaborative training of large language models using 3D parallelism, parameter-efficient fine-tuning (PEFT) methods, and optimizers such as Lion, Adan, Sophia, LOMO, and AdaLomo.
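Of the optimizers listed, Lion has a notably compact published update rule (the sign of an interpolation between momentum and gradient). The NumPy sketch below restates that rule for reference; it illustrates the optimizer itself, not CoLLiE's actual API.

```python
import numpy as np

def lion_step(w, g, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.01):
    """One Lion update (Chen et al., 2023): take the sign of an
    interpolation between momentum and gradient, then apply
    decoupled weight decay."""
    update = np.sign(beta1 * m + (1 - beta1) * g)  # sign-based direction
    w = w - lr * (update + weight_decay * w)       # decoupled weight decay
    m = beta2 * m + (1 - beta2) * g                # momentum update
    return w, m
```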
To overcome the aforementioned challenges, we propose a Unified Medical Image Pre-training framework, namely UniMedI, which uses diagnostic reports as a common semantic space to create unified representations for diverse modalities of medical images (especially 2D and 3D images).
Large language models (LLMs) have shown effectiveness with different prompting methods, such as Chain of Thought and Program of Thought, and we find that these methods complement each other well on math reasoning tasks.
Inspired by these findings, we propose LongLLMLingua, a prompt compression method that improves LLMs' perception of key information and simultaneously addresses the three challenges.
Large language models (LLMs) have been applied in various applications due to their astonishing capabilities.
Our approach is extensively evaluated using iPhone 12 and laptop implementations, and we demonstrate the utility of game states in game video recovery and the effectiveness of our overall design.
Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length.
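The length sensitivity comes from self-attention's quadratic term. The back-of-the-envelope FLOPs model below (with illustrative constants roughly matching a BERT-base layer, not profiler measurements) makes the growth concrete.

```python
def approx_layer_flops(n, d=768, d_ff=3072):
    """Very rough per-layer FLOPs for a transformer encoder:
    QKV/output projections and the FFN scale linearly in n,
    while attention scores and the weighted sum scale quadratically."""
    projections = 8 * n * d * d      # 4 matmuls of cost ~2*n*d*d each
    attention   = 4 * n * n * d      # QK^T plus attn @ V
    ffn         = 4 * n * d * d_ff   # two matmuls, ~2*n*d*d_ff each
    return projections + attention + ffn

for n in (128, 256, 512):
    print(n, f"{approx_layer_flops(n):.3e}")
# The quadratic attention term grows 16x from n=128 to n=512,
# while the linear terms grow only 4x.
```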
Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training.
Pronunciation assessment is a major challenge in computer-aided pronunciation training systems, especially at the word (phoneme) level.
In this paper, we propose a novel compression strategy that leverages structured pruning and knowledge distillation to reduce the model size and inference cost of the Conformer model while preserving high recognition performance.
Motivated by the fact that all event structures can be inferred from AMR, this work reformulates EAE as a link prediction problem on AMR graphs.
Comprehensive experiments demonstrate that EfficientViT outperforms existing efficient models, striking a good trade-off between speed and accuracy.
Also, it achieves an 18.73% bitrate saving and generates perceptually pleasing reconstructions compared to state-of-the-art end-to-end image compression methods, which greatly benefits applications of focused plenoptic cameras.
In contrast, though human engineers have an incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult for quantitative approaches to utilize.
However, prior supernet training methods that rely on uniform sampling suffer from the gradient conflict issue: the sampled subnets can have vastly different model sizes (e.g., 50M vs. 2G FLOPs), leading to different optimization directions and inferior performance.
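To see how uniform sampling produces such mismatched subnets, the sketch below samples depth and width uniformly from a made-up search space and reports the resulting FLOPs spread; the bounds and the cost model are purely illustrative.

```python
import random

def subnet_flops(depth, width, n_tokens=196):
    """Crude FLOPs estimate for a transformer-like subnet."""
    return depth * (8 * n_tokens * width**2 + 4 * n_tokens**2 * width)

random.seed(0)
samples = [subnet_flops(random.choice(range(4, 25)),
                        random.choice((192, 384, 768)))
           for _ in range(1000)]
print(f"min {min(samples):.2e}, max {max(samples):.2e}, "
      f"ratio {max(samples)/min(samples):.0f}x")
# Subnets drawn into the same training step can differ in cost by
# well over an order of magnitude, pulling gradients in different directions.
```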
The combination of Neural Architecture Search (NAS) and quantization has proven successful in automatically designing low-FLOPs INT8 quantized neural networks (QNNs).
To facilitate the research on this problem, a new benchmark dataset named LDV-WebRTC is constructed based on a real-world online streaming system.
Correspondingly, we propose a Dual Stream deep model for Stereotypical Behaviours Detection, DS-SBD, based on the temporal trajectory of human poses and the repetition patterns of human actions.
Ensemble methods can deliver surprising performance gains but also bring significantly higher computational costs, e.g., up to 2048X in large-scale ensemble tasks.
Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning.
To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description.
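A minimal sketch of that idea, with cosine similarity standing in for whatever scoring the method actually uses and tensor shapes assumed for illustration, not the authors' implementation:

```python
import torch

def retain_text_relevant_tokens(img_tokens, text_emb, keep_ratio=0.5):
    """img_tokens: (B, N, D) patch embeddings; text_emb: (B, D).
    Keep the keep_ratio fraction of tokens most similar to the text."""
    img_n = torch.nn.functional.normalize(img_tokens, dim=-1)
    txt_n = torch.nn.functional.normalize(text_emb, dim=-1)
    scores = torch.einsum("bnd,bd->bn", img_n, txt_n)    # cosine similarity
    k = max(1, int(img_tokens.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices                  # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, img_tokens.shape[-1])
    return torch.gather(img_tokens, 1, idx)              # (B, k, D)
```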
In this paper, we investigate the root cause of the underwhelming performance of existing generative DocRE models and discover that the culprit is the inadequacy of the training paradigm rather than the capacity of the models.
A good state representation is crucial to solving complicated reinforcement learning (RL) challenges.
Specifically, we first pre-train an encoder-decoder framework with an automatic speech recognition (ASR) objective using a speech-to-text dataset, and then fine-tune the ASR encoder on the cleft palate dataset for hypernasality estimation.
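A schematic of that two-stage recipe in PyTorch; the encoder and head definitions here are hypothetical stand-ins, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Hypothetical minimal stand-in; the paper's actual encoder differs.
class SpeechEncoder(nn.Module):
    def __init__(self, n_mels=80, d=256):
        super().__init__()
        self.rnn = nn.GRU(n_mels, d, batch_first=True)
    def forward(self, x):                      # x: (B, T, n_mels)
        out, _ = self.rnn(x)
        return out                             # (B, T, d)

encoder = SpeechEncoder()
x = torch.randn(4, 100, 80)                    # fake batch of log-mel features

# Stage 1 (sketch): train encoder + an ASR head on speech-to-text pairs.
asr_head = nn.Linear(256, 30)                  # per-frame token logits (CTC-style)
logits = asr_head(encoder(x))                  # (4, 100, 30), fed to an ASR loss

# Stage 2 (sketch): reuse the pre-trained encoder for hypernasality regression.
reg_head = nn.Linear(256, 1)
score = reg_head(encoder(x).mean(dim=1))       # mean-pool over time -> (4, 1)
loss = nn.functional.mse_loss(score.squeeze(-1), torch.rand(4))
loss.backward()
```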
Then, our proposed CKBG method enhances this lightweight base model by bypassing the original network with ``kernel grafts'', which are extra convolutional kernels containing the prior knowledge of external pretrained image SR models.
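Read literally, a kernel graft adds frozen convolution kernels taken from an external pretrained SR model in parallel with the base network's own kernels; the sketch below is one guess at that wiring, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GraftedConv(nn.Module):
    """Base conv plus a parallel 'graft' conv whose weights are copied
    from an external pretrained model and kept frozen (a guess at the
    general idea; the paper's exact wiring may differ)."""
    def __init__(self, c_in, c_out, pretrained_weight):
        super().__init__()
        self.base = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.graft = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.graft.weight.data.copy_(pretrained_weight)
        self.graft.weight.requires_grad_(False)  # keep prior knowledge fixed
    def forward(self, x):
        return self.base(x) + self.graft(x)      # bypass path adds graft output

layer = GraftedConv(64, 64, torch.randn(64, 64, 3, 3))  # dummy pretrained kernels
y = layer(torch.randn(1, 64, 32, 32))
```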
Thanks to HyperFD, each local task (client) is able to effectively leverage the learning "experience" of previous tasks without uploading raw images to the platform; meanwhile, the meta-feature extractor is continuously learned to better trade off bias and variance.
To solve the game, we propose a platform negotiation method that simulates the bargaining among platforms and locally optimizes their policies via gradient descent.
In this paper, we propose RTSGAN, a novel generative framework for RTS data, to tackle the aforementioned challenges.
A good state representation is crucial to reinforcement learning (RL) while an ideal representation is hard to learn only with signals from the RL objective.
How to make intelligent decisions is a central problem in machine learning and cognitive science.
The energy consumption of deep learning models is increasing at a breathtaking rate, which raises concerns due to potential negative effects on carbon neutrality in the context of global warming and climate change.
One of the key challenges in Neural Architecture Search (NAS) is to efficiently rank the performances of architectures.
To solve this problem, a sampling batch normalization embedded deep neural network (SBNEDNN) method is developed in this paper.
Unlike previous approaches that apply search algorithms on a small, human-designed search space without considering hardware diversity, we propose HURRICANE, which explores automatic hardware-aware search over a much larger search space and uses a two-stage search algorithm to efficiently generate tailored models for different types of hardware.