no code implementations • SIGDIAL (ACL) 2021 • Ye Tian, Tim Nieradzik, Sepehr Jalali, Da-Shan Shiu
Analysis of sentence embeddings for disfluent and fluent sentence pairs reveals that the deeper the layer, the more similar their representations (Experiment 2).
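A minimal sketch of this kind of layer-wise comparison, assuming a generic pretrained encoder, mean pooling, and an illustrative disfluent/fluent pair; none of these choices are claimed to match the paper's exact setup:

```python
# Sketch: compare fluent/disfluent sentence representations layer by layer.
# Model choice, mean pooling, and the example pair are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

fluent = "I want to book a flight to London."
disfluent = "I want to uh I mean I want to book a flight to London."

def layer_embeddings(sentence):
    """Mean-pooled sentence embedding at every layer (including the input embeddings)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden_states = model(**inputs).hidden_states  # tuple: (num_layers + 1) x [1, seq, dim]
    return [h.mean(dim=1).squeeze(0) for h in hidden_states]

for layer, (e_f, e_d) in enumerate(zip(layer_embeddings(fluent), layer_embeddings(disfluent))):
    cos = torch.nn.functional.cosine_similarity(e_f, e_d, dim=0).item()
    print(f"layer {layer:2d}: cosine similarity = {cos:.3f}")
```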
no code implementations • EACL (AdaptNLP) 2021 • Jezabel Garcia, Federica Freddi, Jamie McGowan, Tim Nieradzik, Feng-Ting Liao, Ye Tian, Da-Shan Shiu, Alberto Bernacchia
In meta-learning, the knowledge learned from previous tasks is transferred to new ones, but this transfer only works if tasks are related.
Cross-Lingual Natural Language Inference
Cross-Lingual Transfer
no code implementations • 29 May 2025 • Aya Kayal, Sattar Vakili, Laura Toni, Da-Shan Shiu, Alberto Bernacchia
Existing work, which adopts the Bradley-Terry-Luce (BTL) feedback model, provides regret bounds for the performance of several algorithms.
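For reference, the Bradley-Terry-Luce model turns differences in latent utility into preference probabilities; a minimal sketch with made-up utility values:

```python
# Sketch of the Bradley-Terry-Luce (BTL) preference model:
# P(x preferred over y) = sigma(f(x) - f(y)), with sigma the logistic function.
import numpy as np

def btl_preference_prob(utility_x: float, utility_y: float) -> float:
    """Probability that item x is preferred over item y under BTL."""
    return 1.0 / (1.0 + np.exp(-(utility_x - utility_y)))

# Illustrative utilities for three candidate actions (made-up values).
utilities = {"a": 1.2, "b": 0.4, "c": -0.3}
print(btl_preference_prob(utilities["a"], utilities["b"]))  # ~0.69
print(btl_preference_prob(utilities["c"], utilities["a"]))  # ~0.18
```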
no code implementations • 20 May 2025 • Davide Buffelli, Sowmen Das, Yu-Wei Lin, Sattar Vakili, Chien-Yi Wang, Masoud Attarifar, Pritthijit Nath, Da-Shan Shiu
Artificial Intelligence (AI) has demonstrated unprecedented performance across various domains, and its application to communication systems is an active area of research.
1 code implementation • 20 May 2025 • Yen-chen Wu, Feng-Ting Liao, Meng-Hsi Chen, Pei-Chen Ho, Farhang Nabiei, Da-Shan Shiu
Transformers, the standard implementation for large language models (LLMs), typically consist of tens to hundreds of discrete layers.
no code implementations • 16 May 2025 • Chan-Jan Hsu, Davide Buffelli, Jamie McGowan, Feng-Ting Liao, Yi-Chang Chen, Sattar Vakili, Da-Shan Shiu
Recent advances in large language models (LLMs) have demonstrated the power of reasoning through self-generated chains of thought.
2 code implementations • 9 Apr 2025 • Liang-Hsuan Tseng, Yi-Chang Chen, Kuan-Yi Lee, Da-Shan Shiu, Hung-Yi Lee
To our knowledge, TASTE is the first end-to-end approach that utilizes a reconstruction objective to automatically learn a text-aligned speech tokenization and embedding suitable for spoken language modeling.
no code implementations • 29 Jan 2025 • Chan-Jan Hsu, Yi-Cheng Lin, Chia-Chun Lin, Wei-Chih Chen, Ho Lam Chung, Chen-An Li, Yi-Chang Chen, Chien-Yu Yu, Ming-Ji Lee, Chien-Cheng Chen, Ru-Heng Huang, Hung-Yi Lee, Da-Shan Shiu
We present BreezyVoice, a Text-to-Speech (TTS) system specifically adapted for Taiwanese Mandarin, highlighting phonetic control abilities to address the unique challenges of polyphone disambiguation in the language.
1 code implementation • 23 Jan 2025 • MediaTek Research: Chan-Jan Hsu, Chia-Sheng Liu, Meng-Hsi Chen, Muxi Chen, Po-chun Hsu, Yi-Chang Chen, Da-Shan Shiu
In addition to language modeling capabilities, we significantly augment the models with function calling and vision understanding capabilities.
no code implementations • 2 Dec 2024 • Yi-Chang Chen, Po-chun Hsu, Chan-Jan Hsu, Da-Shan Shiu
This research delves into enhancing the function-calling capabilities of LLMs by exploring different approaches, including prompt formats for integrating function descriptions, blending function-calling and instruction-following data, introducing a novel Decision Token for conditional prompts, leveraging chain-of-thought reasoning, and overcoming multilingual challenges with a translation pipeline.
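A purely hypothetical illustration of a conditional prompt built around a decision token that routes between calling a function and answering directly; the token names, prompt layout, and helper below are assumptions for exposition, not the paper's actual format:

```python
# Hypothetical prompt layout: the model first emits a decision token that
# routes generation either to a function call or to a direct answer.
# Token names and structure are illustrative assumptions.
DECISION_TOKENS = ("<|call_function|>", "<|answer_directly|>")

def build_prompt(functions: list[dict], user_query: str) -> str:
    """Assemble a conditional prompt that asks for a decision token first."""
    function_block = "\n".join(f"- {f['name']}: {f['description']}" for f in functions)
    return (
        "Available functions:\n"
        f"{function_block}\n\n"
        f"User: {user_query}\n"
        f"First output one of {DECISION_TOKENS}, then continue accordingly.\n"
        "Assistant:"
    )

prompt = build_prompt(
    functions=[{"name": "get_weather", "description": "Current weather for a city."}],
    user_query="What's the weather in Taipei right now?",
)
print(prompt)
```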
no code implementations • 25 Nov 2024 • Cheng-Wei Lin, Wan-Hsuan Hsieh, Kai-Xin Guan, Chan-Jan Hsu, Chia-Chen Kuo, Chuan-Lin Lai, Chung-Wei Chung, Ming-Jen Wang, Da-Shan Shiu
The quality and size of a pretraining dataset significantly influence the performance of large language models (LLMs).
1 code implementation • 12 Nov 2024 • Davide Buffelli, Jamie McGowan, Wangkun Xu, Alexandru Cioba, Da-Shan Shiu, Guillaume Hennequin, Alberto Bernacchia
We exploit this novel setting to study the training and generalization properties of the GN optimizer.
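As a reminder of the optimizer under study, a minimal damped Gauss-Newton step on a toy nonlinear least-squares problem; the model, damping value, and data are illustrative:

```python
# Sketch of one damped Gauss-Newton step for least squares:
# theta <- theta - (J^T J + lambda I)^(-1) J^T r, with r the residuals and
# J the Jacobian of the residuals w.r.t. the parameters.
import numpy as np

def gauss_newton_step(theta, residual_fn, jacobian_fn, damping=1e-4):
    r = residual_fn(theta)          # shape (n_samples,)
    J = jacobian_fn(theta)          # shape (n_samples, n_params)
    H = J.T @ J + damping * np.eye(theta.size)  # damped GN approximation of the Hessian
    return theta - np.linalg.solve(H, J.T @ r)

# Toy problem: fit y = a * exp(b * x) to synthetic data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * x) + 0.01 * rng.standard_normal(50)

residuals = lambda th: th[0] * np.exp(th[1] * x) - y
jacobian = lambda th: np.stack([np.exp(th[1] * x), th[0] * x * np.exp(th[1] * x)], axis=1)

theta = np.array([1.0, 1.0])
for _ in range(20):
    theta = gauss_newton_step(theta, residuals, jacobian)
print(theta)  # should approach [2.0, 1.5]
```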
1 code implementation • 19 Sep 2024 • Tzu-Lin Kuo, Feng-Ting Liao, Mu-Wei Hsieh, Fu-Chieh Chang, Po-chun Hsu, Da-Shan Shiu
In real-world applications with Large Language Models (LLMs), external retrieval mechanisms - such as Search-Augmented Generation (SAG), tool utilization, and Retrieval-Augmented Generation (RAG) - are often employed to enhance the quality of augmented generations in dialogues.
1 code implementation • 23 May 2024 • Chan-Jan Hsu, Yi-Chang Chen, Feng-Ting Liao, Pei-Chen Ho, Yu-Hsiang Wang, Po-chun Hsu, Da-Shan Shiu
We introduce "Generative Fusion Decoding" (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
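For context, a minimal sketch of plain token-level shallow fusion, the baseline idea GFD builds on; the shared vocabulary, probabilities, and fusion weight are illustrative, and GFD's handling of mismatched tokenizations is not shown:

```python
# Sketch of plain token-level shallow fusion: at each decoding step the
# recognizer's log-probabilities are combined with an LLM's log-probabilities.
# Assumes a shared vocabulary and an illustrative fusion weight.
import numpy as np

def shallow_fusion_step(asr_logprobs, lm_logprobs, lm_weight=0.3):
    """Pick the next token by combining recognizer and LM scores."""
    combined = asr_logprobs + lm_weight * lm_logprobs
    return int(np.argmax(combined)), combined

vocab = ["the", "cat", "sat", "mat"]
asr_logprobs = np.log(np.array([0.40, 0.35, 0.15, 0.10]))  # acoustic model's view
lm_logprobs = np.log(np.array([0.10, 0.60, 0.20, 0.10]))   # language model's view

token_id, _ = shallow_fusion_step(asr_logprobs, lm_logprobs)
print(vocab[token_id])  # the LM nudges the choice toward "cat"
```

The fusion weight trades off acoustic evidence against the language model's prior; in practice it is tuned on held-out data.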
no code implementations • 5 Mar 2024 • Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-chun Hsu, Yi-Chang Chen, Da-Shan Shiu
Breeze-7B is an open-source language model based on Mistral-7B, designed to address the need for improved language comprehension and chatbot-oriented capabilities in Traditional Chinese.
1 code implementation • 15 Sep 2023 • Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-chun Hsu, Yi-Chang Chen, Da-Shan Shiu
In an effort to advance the evaluation of language models in Traditional Chinese and stimulate further research in this field, we have open-sourced our benchmark and opened the model for trial.
no code implementations • 10 Aug 2023 • Ushnish Sengupta, Chinkuo Jao, Alberto Bernacchia, Sattar Vakili, Da-Shan Shiu
In this paper, we propose a diffusion model based channel sampling approach for rapidly synthesizing channel realizations from limited data.
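A minimal sketch of the generic DDPM-style reverse sampling loop such an approach builds on, with a channel realization treated as a flat vector; the denoiser, noise schedule, and dimensions are placeholders rather than the paper's architecture:

```python
# Sketch of a generic DDPM reverse (sampling) loop over channel vectors.
# The denoiser, noise schedule, and dimensionality are illustrative placeholders.
import torch

T = 100                                   # number of diffusion steps
dim = 64                                  # flattened channel realization size
betas = torch.linspace(1e-4, 0.02, T)     # simple linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

denoiser = torch.nn.Sequential(           # placeholder epsilon-predictor (untrained)
    torch.nn.Linear(dim + 1, 128), torch.nn.SiLU(), torch.nn.Linear(128, dim)
)

@torch.no_grad()
def sample_channel(n_samples: int = 4) -> torch.Tensor:
    x = torch.randn(n_samples, dim)       # start from pure noise
    for t in reversed(range(T)):
        t_feat = torch.full((n_samples, 1), t / T)
        eps = denoiser(torch.cat([x, t_feat], dim=1))      # predicted noise
        mean = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                              # synthetic channel realizations

print(sample_channel().shape)  # torch.Size([4, 64])
```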
1 code implementation • 18 Jul 2023 • Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-Shan Shiu
In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning their generation on a given text prompt.
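A related mechanism is exposed by off-the-shelf Whisper checkpoints in the `transformers` library via `prompt_ids`; a hedged sketch of prompt-conditioned decoding, using silence as stand-in audio and an illustrative domain prompt, not the paper's fine-tuned model:

```python
# Sketch: biasing an off-the-shelf Whisper checkpoint with a domain prompt.
# The checkpoint, prompt text, and silent audio are illustrative stand-ins.
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# One second of silence stands in for real audio in this illustration.
audio = torch.zeros(16000).numpy()
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# A domain prompt nudges decoding toward in-domain vocabulary.
prompt_ids = processor.get_prompt_ids(
    "cardiology consultation, medical terminology", return_tensors="pt"
)
predicted_ids = model.generate(inputs.input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```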
1 code implementation • 1 Jun 2023 • Ayan Das, Stathi Fotiadis, Anil Batra, Farhang Nabiei, FengTing Liao, Sattar Vakili, Da-Shan Shiu, Alberto Bernacchia
We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening, rather than blurring, and noise deblurring.
no code implementations • 8 Mar 2023 • Philipp Ennen, Po-chun Hsu, Chan-Jan Hsu, Chang-Le Liu, Yen-chen Wu, Yin-Hsiang Liao, Chin-Tung Lin, Da-Shan Shiu, Wei-Yun Ma
In this paper, we present the multilingual language model BLOOM-zh, which features enhanced support for Traditional Chinese.
no code implementations • 13 Apr 2022 • Fu-Chieh Chang, Yu-Wei Tseng, Ya-Wen Yu, Ssu-Rui Lee, Alexandru Cioba, I-Lun Tseng, Da-Shan Shiu, Jhih-Wei Hsu, Cheng-Yuan Wang, Chien-Yi Yang, Ren-Chu Wang, Yao-Wen Chang, Tai-Chen Chen, Tung-Chieh Chen
Recently, successful applications of reinforcement learning to chip placement have emerged.
no code implementations • 8 Feb 2022 • Sattar Vakili, Jonathan Scarlett, Da-Shan Shiu, Alberto Bernacchia
Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization.
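For context, the kernel ridge regression estimator mentioned here has a simple closed form; a minimal sketch with an RBF kernel and synthetic 1-D data (all choices illustrative):

```python
# Sketch of kernel ridge regression: alpha = (K + lambda I)^(-1) y,
# prediction f(x*) = k(x*, X) @ alpha.  Kernel and data are illustrative.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)

lam = 1e-2
alpha = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(len(X)), y)

X_test = np.linspace(0, 2 * np.pi, 5)
f_test = rbf_kernel(X_test, X) @ alpha
print(np.round(f_test, 2))  # roughly follows sin(x) on the test grid
```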
no code implementations • 13 Sep 2021 • Sattar Vakili, Michael Bromberg, Jezabel Garcia, Da-Shan Shiu, Alberto Bernacchia
As a byproduct of our results, we show the equivalence between the RKHS corresponding to the NT kernel and its counterpart corresponding to the Matérn family of kernels, showing that NT kernels induce a very general class of models.
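For reference, a minimal sketch of one member of the Matérn family, the nu = 5/2 kernel, written out explicitly and cross-checked against scikit-learn; the lengthscale and inputs are illustrative:

```python
# Sketch of the Matérn-5/2 kernel, one member of the Matérn family:
# k(r) = (1 + sqrt(5) r / l + 5 r^2 / (3 l^2)) * exp(-sqrt(5) r / l)
import numpy as np
from sklearn.gaussian_process.kernels import Matern

def matern52(x, y, lengthscale=1.0):
    r = np.abs(x - y)
    s = np.sqrt(5.0) * r / lengthscale
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

# Cross-check against scikit-learn's implementation on an arbitrary pair.
ref = Matern(length_scale=1.0, nu=2.5)
print(matern52(0.3, 1.1), ref(np.array([[0.3]]), np.array([[1.1]]))[0, 0])
```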
no code implementations • NeurIPS 2021 • Sattar Vakili, Nacime Bouziani, Sepehr Jalali, Alberto Bernacchia, Da-Shan Shiu
Consider the sequential optimization of a continuous, possibly non-convex, and expensive to evaluate objective function $f$.
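A minimal sketch of the kind of optimistic sequential strategy used in this setting, GP-UCB over a 1-D candidate grid; the kernel, objective, exploration weight, and grid are illustrative assumptions rather than the paper's algorithm:

```python
# Sketch of GP-UCB for sequentially optimizing an expensive black-box f:
# each round queries the point maximizing (posterior mean + beta * posterior std).
# Kernel, beta, objective, and candidate grid are illustrative choices.
import numpy as np

def rbf(A, B, ls=0.2):
    return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)

f = lambda x: -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)   # stand-in "expensive" objective
grid = np.linspace(0, 1, 200)                           # candidate query points
X, y = [0.8], [f(0.8)]                                  # one initial observation
noise, beta = 1e-4, 2.0

for _ in range(10):
    Xa = np.array(X)
    K_inv = np.linalg.inv(rbf(Xa, Xa) + noise * np.eye(len(Xa)))
    k_star = rbf(grid, Xa)
    mean = k_star @ K_inv @ np.array(y)
    var = 1.0 - np.einsum("ij,jk,ik->i", k_star, K_inv, k_star)
    ucb = mean + beta * np.sqrt(np.clip(var, 0.0, None))
    x_next = grid[np.argmax(ucb)]                       # optimism in the face of uncertainty
    X.append(x_next); y.append(f(x_next))

print(f"best query so far: x = {X[int(np.argmax(y))]:.3f}")
```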
no code implementations • 21 May 2021 • Philipp Ennen, Yen-Ting Lin, Ali Girayhan Ozbay, Ferdinando Insalata, Maolin Li, Ye Tian, Sepehr Jalali, Da-Shan Shiu
In light of the recent success of data-driven approaches, we propose the novel future bridging NLG (FBNLG) concept for dialogue systems and simulators.
no code implementations • 15 Mar 2021 • Alexandru Cioba, Michael Bromberg, Qian Wang, Ritwik Niyogi, Georgios Batzolis, Jezabel Garcia, Da-Shan Shiu, Alberto Bernacchia
We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between the number of tasks and the number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks.
no code implementations • 8 Mar 2021 • Jezabel R. Garcia, Federica Freddi, Feng-Ting Liao, Jamie McGowan, Tim Nieradzik, Da-Shan Shiu, Ye Tian, Alberto Bernacchia
We show that TreeMAML improves the state-of-the-art results for cross-lingual Natural Language Inference.
Cross-Lingual Natural Language Inference
Cross-Lingual Transfer
no code implementations • 1 Jan 2021 • Jezabel Garcia, Federica Freddi, Jamie McGowan, Tim Nieradzik, Da-Shan Shiu, Ye Tian, Alberto Bernacchia
In meta-learning, the knowledge learned from previous tasks is transferred to new ones, but this transfer only works if tasks are related, and sharing information between unrelated tasks might hurt performance.
no code implementations • 1 Jan 2021 • Georgios Batzolis, Alberto Bernacchia, Da-Shan Shiu, Michael Bromberg, Alexandru Cioba
Meta-learning models are tested on benchmarks with a fixed number of data points for each training task, and this number is usually arbitrary, for example, 5 instances per class in few-shot classification.
no code implementations • NeurIPS Workshop SVRHM 2021 • Federica Freddi, Jezabel R Garcia, Michael Bromberg, Sepehr Jalali, Da-Shan Shiu, Alvin Chua, Alberto Bernacchia
We propose a novel architecture that allows flexible information flow between features $z$ and locations $(x, y)$ across the entire image with a small number of layers.