1 code implementation • 20 Jun 2024 • Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li
By employing a large LLM for inference and a small LLM for output, we achieve an average 37% reduction in response latency, alongside a 4.30% improvement in accuracy on the MMLU-Pro dataset compared with the baseline.
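As a rough illustration of the large/small-LLM division of labor described above (not the authors' implementation), the sketch below has a large model produce the intermediate reasoning and a small model produce the user-facing answer, using the Hugging Face transformers API; the checkpoint names and prompt format are placeholders.

```python
# Illustrative sketch only: pair a large LLM (reasoning) with a small LLM (final output).
# Checkpoint names and the prompt split are assumptions, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

def run(model_name, prompt, max_new_tokens):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Return only the newly generated tokens.
    return tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

def answer(question):
    # Large model handles the expensive inference/reasoning step.
    reasoning = run("large-llm-checkpoint",
                    f"Question: {question}\nThink step by step:", 256)
    # Small model condenses the reasoning into a short final answer,
    # which is what keeps the user-visible response latency low.
    return run("small-llm-checkpoint",
               f"Question: {question}\nReasoning: {reasoning}\nFinal answer:", 32)
```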
no code implementations • 5 Mar 2024 • Fu Chen, Qinglin Zhao, Li Feng, Chuangtao Chen, Yangbin Lin, Jianhong Lin
This paper introduces a novel Quantum Mixed-State Self-Attention Network (QMSAN) for natural language processing tasks.
no code implementations • 13 Jan 2024 • Chuangtao Chen, Qinglin Zhao, Mengchu Zhou, Zhimin He, Zhili Sun, Haozhen Situ
We introduce partial trace operations to enforce non-unitarity, and we reduce the number of trainable parameters by using a parameter-sharing strategy and incorporating temporal information as an input in the backward process.
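To illustrate why a partial trace yields non-unitary dynamics, the minimal NumPy sketch below evolves a system qubit jointly with an ancilla under a unitary (a CNOT) and then traces out the ancilla; the reduced state becomes mixed, which no unitary acting on the system alone could produce. The specific state and unitary are arbitrary examples, not taken from the paper.

```python
# Minimal sketch: tracing out an ancilla turns unitary joint evolution
# into a non-unitary (mixing) map on the remaining subsystem.
import numpy as np

def partial_trace_ancilla(rho, dim_sys=2, dim_anc=2):
    """Trace out the second (ancilla) subsystem of a bipartite density matrix."""
    rho = rho.reshape(dim_sys, dim_anc, dim_sys, dim_anc)
    return np.einsum("iaja->ij", rho)

# System qubit in (|0>+|1>)/sqrt(2), ancilla in |0>, entangled via CNOT.
psi = np.kron(np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, 0.0]))
cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
psi = cnot @ psi
rho_joint = np.outer(psi, psi.conj())        # pure joint state
rho_sys = partial_trace_ancilla(rho_joint)   # reduced state: diag(0.5, 0.5), mixed
print(np.round(rho_sys, 3), "purity:", np.trace(rho_sys @ rho_sys).real)
```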
no code implementations • 10 Jun 2023 • Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann, Bing Li
Deep neural networks (DNNs) have been widely deployed across diverse domains such as computer vision and natural language processing.