no code implementations • 3 Feb 2025 • YuHang Zhou, Giannis Karamanolakis, Victor Soto, Anna Rumshisky, Mayank Kulkarni, Furong Huang, Wei Ai, Jianhua Lu
The recent success of specialized Large Language Models (LLMs) in domains such as mathematical reasoning and coding has led to growing interest in methods for merging these expert LLMs into a unified Mixture-of-Experts (MoE) model, with the goal of enhancing performance in each domain while retaining effectiveness on general tasks.
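As a rough illustration of the general Mixture-of-Experts idea referenced here (not the paper's merging procedure), the sketch below routes each token through a weighted combination of expert feed-forward blocks via a learned router; the class name, dimensions, and "experts" are illustrative assumptions only.

```python
# A rough sketch of the general MoE idea (not the paper's merging method):
# a learned router mixes the outputs of several pre-trained expert
# feed-forward blocks per token. Class names and dimensions are illustrative.
import torch
import torch.nn as nn

class SimpleMoELayer(nn.Module):
    def __init__(self, experts, d_model, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(experts)         # e.g. FFN blocks taken from domain-expert models
        self.router = nn.Linear(d_model, len(experts))
        self.top_k = top_k

    def forward(self, x):                             # x: (batch, seq, d_model)
        scores = self.router(x)                       # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)             # normalize over the selected experts
        out = torch.zeros_like(x)
        # Dense (inefficient) evaluation: every expert sees every token,
        # and the routing mask zeroes out unselected combinations.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out

# Illustrative usage with three toy "experts":
d_model = 16
experts = [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                         nn.Linear(4 * d_model, d_model)) for _ in range(3)]
layer = SimpleMoELayer(experts, d_model)
print(layer(torch.randn(2, 5, d_model)).shape)        # torch.Size([2, 5, 16])
```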
no code implementations • 19 May 2024 • Sanchit Sinha, Yuguang Yue, Victor Soto, Mayank Kulkarni, Jianhua Lu, Aidong Zhang
In this paper, we propose MAML-en-LLM, a novel method for meta-training LLMs, which can learn truly generalizable parameters that not only perform well on disjoint tasks but also adapt to unseen tasks.
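For readers unfamiliar with meta-training, the following is a minimal first-order MAML-style loop showing the inner/outer-loop structure; it is not the MAML-en-LLM algorithm itself, and `model`, `tasks`, and the hyperparameters are placeholders.

```python
# A minimal first-order MAML-style meta-training step, included only to
# illustrate the inner/outer-loop structure; it is NOT the MAML-en-LLM
# algorithm. `model`, `tasks`, and all hyperparameters are placeholders.
import copy
import torch
import torch.nn.functional as F

def meta_train_step(model, tasks, meta_optimizer, inner_lr=1e-3, inner_steps=1):
    meta_optimizer.zero_grad()
    for support, query in tasks:                      # each task: (support batch, query batch)
        # Inner loop: adapt a copy of the model on the task's support set.
        adapted = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            loss = F.cross_entropy(adapted(support["inputs"]), support["labels"])
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Outer loop: evaluate the adapted copy on the query set and
        # accumulate its (first-order) gradients into the original parameters.
        query_loss = F.cross_entropy(adapted(query["inputs"]), query["labels"])
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()))
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_optimizer.step()                             # meta-update of the shared initialization
```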
no code implementations • NAACL 2021 • Victor Soto, Konstantine Arkoudas
Accordingly, unsupervised and semi-supervised learning (SSL) techniques continue to be of vital importance.
no code implementations • WS 2016 • Fahad AlGhamdi, Giovanni Molina, Mona Diab, Thamar Solorio, Abdelati Hawwari, Victor Soto, Julia Hirschberg
We address the problem of Part-of-Speech (POS) tagging in the context of linguistic code-switching (CS).
no code implementations • WS 2018 • Gustavo Aguilar, Fahad AlGhamdi, Victor Soto, Mona Diab, Julia Hirschberg, Thamar Solorio
In the third shared task of the Computational Approaches to Linguistic Code-Switching (CALCS) workshop, we focus on Named Entity Recognition (NER) on code-switched social-media data.
no code implementations • WS 2018 • Victor Soto, Julia Hirschberg
Code-switching is the fluent alternation between two or more languages in conversation between bilinguals.
no code implementations • 24 Mar 2017 • Victor Soto, Julia Hirschberg
We split the annotation task into three subtasks: (1) a subset of tokens is labeled automatically; (2) questions are specifically designed to disambiguate a subset of high-frequency words; and (3) a more general cascaded approach is used for the remaining data, in which questions are displayed to the worker following a decision-tree structure.
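To make the cascaded design concrete, here is a hypothetical sketch of routing a token through such a three-stage flow; the lexicon, questions, and decision tree below are invented for illustration and are not the paper's actual annotation scheme.

```python
# A hypothetical sketch of a cascaded annotation flow in the spirit of the
# three subtasks above. The lexicon, questions, and decision tree are invented
# for illustration and are not the actual annotation scheme from the paper.
AUTO_LABELS = {"the": "English", "y": "Spanish"}                 # stage 1: automatic labels
HIGH_FREQ_QUESTIONS = {"no": "Is this 'no' used as English or Spanish here?"}  # stage 2

def label_token(token, ask):
    """Label one token, calling `ask(question) -> answer` only when the
    cheaper stages cannot decide."""
    token_lc = token.lower()
    if token_lc in AUTO_LABELS:                                  # stage 1: automatic labeling
        return AUTO_LABELS[token_lc]
    if token_lc in HIGH_FREQ_QUESTIONS:                          # stage 2: targeted question
        return ask(HIGH_FREQ_QUESTIONS[token_lc])
    # Stage 3: cascaded decision tree for all remaining tokens.
    if ask("Is this token a named entity?") == "yes":
        return "Named Entity"
    return ask("Is this token English, Spanish, or other?")

# Illustrative usage with a trivial "worker" that always answers the same way:
print(label_token("Madrid", lambda q: "yes"))                    # -> "Named Entity"
```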
1 code implementation • NeurIPS 2016 • Victor Soto, Alberto Suárez, Gonzalo Martinez-Muñoz
An analysis of this classical urn model based on the hypergeometric distribution makes it possible to estimate the confidence in the outcome of majority voting when only a fraction of the individual predictions is known.
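As a worked sketch of this kind of estimate (not the paper's exact derivation), the code below uses `scipy.stats.hypergeom` with an assumed uniform prior over the unknown vote composition to score how likely the partial majority is to agree with the full-ensemble majority; the function name and prior are illustrative choices.

```python
# A worked sketch (not the paper's exact derivation): with T ensemble members,
# of which t have been queried and t1 voted for class 1, place a uniform prior
# over the unknown total number T1 of class-1 votes and compute the posterior
# probability that the full-ensemble majority matches the partial majority.
from scipy.stats import hypergeom

def majority_confidence(T, t, t1):
    """Probability that querying the remaining T - t members would not
    change the (binary) majority decision, under a uniform prior on T1."""
    partial_majority_is_1 = t1 >= t - t1
    num = den = 0.0
    for T1 in range(T + 1):                          # uniform prior over compositions
        like = hypergeom.pmf(t1, T, T1, t)           # P(t1 | T1), sampling without replacement
        den += like
        if (T1 >= T - T1) == partial_majority_is_1:
            num += like
    return num / den

print(majority_confidence(T=101, t=25, t1=20))       # confidence after 25 of 101 votes are known
```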