no code implementations • 8 Oct 2024 • Bowen Jin, Jinsung Yoon, Jiawei Han, Sercan O. Arik
Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources.
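As a rough illustration of the general retrieval-augmented generation pattern (not this paper's specific method), the sketch below retrieves the highest-overlap passages for a query and prepends them to the prompt. The keyword-overlap retriever and the `generate` stub are hypothetical stand-ins for a real retriever and LLM call.

```python
# Minimal RAG sketch (illustrative only): a toy keyword-overlap retriever
# plus a placeholder generate() standing in for any LLM API.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query and return the top-k."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_words & set(p.lower().split())))
    return scored[:k]

def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; here it just echoes the prompt."""
    return f"[model output conditioned on]\n{prompt}"

corpus = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Python is a programming language created by Guido van Rossum.",
    "The Louvre in Paris is the world's most-visited museum.",
]
query = "When was the Eiffel Tower completed?"
context = "\n".join(retrieve(query, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"))
```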
no code implementations • 2 Oct 2024 • Mohammadreza Pourreza, Hailong Li, Ruoxi Sun, Yeounoh Chung, Shayan Talaei, Gaurav Tarlok Kakkar, Yu Gan, Amin Saberi, Fatma Ozcan, Sercan O. Arik
In tackling the challenges of large language model (LLM) performance for Text-to-SQL tasks, we introduce CHASE-SQL, a new framework that employs innovative strategies, using test-time compute in multi-agent modeling to improve candidate generation and selection.
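The core test-time-compute idea of sampling several candidate SQL queries and then selecting among them can be sketched roughly as follows. The candidate generator and the scoring heuristic here are hypothetical stubs, not the CHASE-SQL agents; a real system would use diverse LLM generators and an execution- or LLM-based selector.

```python
import random

def generate_sql_candidates(question: str, n: int = 4) -> list[str]:
    """Hypothetical stand-in for diverse LLM-based candidate generators."""
    templates = [
        "SELECT name FROM users WHERE age > 30",
        "SELECT u.name FROM users u WHERE u.age > 30",
        "SELECT name FROM users WHERE age >= 31",
        "SELECT * FROM users WHERE age > 30",
    ]
    return random.sample(templates, n)

def score_candidate(question: str, sql: str) -> float:
    """Hypothetical selection model; a real selector might compare execution
    results or rank with an LLM. Here: prefer shorter, non-SELECT-* queries."""
    return -len(sql) - (100 if "SELECT *" in sql else 0)

question = "Which users are older than 30?"
candidates = generate_sql_candidates(question)
best = max(candidates, key=lambda sql: score_candidate(question, sql))
print(best)
```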
no code implementations • 22 Aug 2024 • Mohammadreza Pourreza, Ruoxi Sun, Hailong Li, Lesly Miculicich, Tomas Pfister, Sercan O. Arik
This leads to a versatile model optimized for multiple SQL dialects, outperforming single-dialect models and significantly enhancing overall performance.
no code implementations • 13 Aug 2024 • Sayna Ebrahimi, Sercan O. Arik, Tejas Nama, Tomas Pfister
Multimodal Large Language Models (MLLMs) demonstrate remarkable image-language capabilities, but their widespread use faces challenges in cost-effective training and adaptation.
Ranked #31 on Visual Question Answering on MM-Vet
no code implementations • 16 Jul 2024 • Hongjin Su, Howard Yen, Mengzhou Xia, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
To better benchmark retrieval on such challenging queries, we introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
no code implementations • 22 Jun 2024 • Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Sercan O. Arik
We conclude that studying exemplar optimization, both as a standalone method and in its optimal combination with instruction optimization, remains a crucial aspect of APO and deserves greater consideration in future research, even in the era of highly capable instruction-following models.
1 code implementation • 25 Aug 2023 • Nicasia Beebe-Wang, Sayna Ebrahimi, Jinsung Yoon, Sercan O. Arik, Tomas Pfister
In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Series), a framework for identifying suitable pretraining strategies for sparse and irregularly sampled time series datasets.
no code implementations • 24 Aug 2023 • Helen Zhou, Sercan O. Arik, Jingtao Wang
We explore a wide range of plausible cost trade-off scenarios, and empirically demonstrate that end-to-end optimization often outperforms optimization of standard business-agnostic forecasting metrics (by up to 45.7% for a simple scaling model, and up to 54.0% for an LSTM encoder-decoder model).
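To make the "business-aware" objective concrete, here is a small numpy example of an asymmetric inventory cost in which overstocking is cheaper than understocking. Optimizing such a cost end to end is the general idea; the specific cost structures and models in the paper differ, and the numbers below are made up.

```python
import numpy as np

def business_cost(forecast, demand, holding_cost=1.0, stockout_cost=5.0):
    """Asymmetric cost: units forecast above demand incur a holding cost,
    unmet demand incurs a (larger) stockout cost."""
    over = np.maximum(forecast - demand, 0.0)
    under = np.maximum(demand - forecast, 0.0)
    return (holding_cost * over + stockout_cost * under).mean()

demand = np.array([100.0, 120.0, 90.0, 110.0])
print(business_cost(demand, demand))        # 0.0: perfect forecast
print(business_cost(demand * 1.1, demand))  # small holding cost
print(business_cost(demand * 0.9, demand))  # larger stockout cost
```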
1 code implementation • 26 May 2023 • Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister
To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data.
no code implementations • 24 May 2023 • Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Martin Eisenschlos, Sercan O. Arik, Tomas Pfister
A hallmark of modern large language models (LLMs) is their impressive general zero-shot and few-shot abilities, often elicited through in-context learning (ICL) via prompting.
no code implementations • 23 May 2023 • Xingchen Wan, Ruoxi Sun, Hanjun Dai, Sercan O. Arik, Tomas Pfister
Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans.
no code implementations • 6 Apr 2023 • Yihe Dong, Sercan O. Arik
Feature selection has been widely used to reduce compute requirements during training, improve model interpretability, and enhance model generalizability.
4 code implementations • 10 Mar 2023 • Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, Tomas Pfister
Extending this line of work, we investigate the capabilities of linear models for time-series forecasting and present Time-Series Mixer (TSMixer), a novel architecture designed by stacking multi-layer perceptrons (MLPs).
Ranked #39 on Time Series Forecasting on ETTh1 (336) Multivariate (MAE metric)
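A minimal numpy sketch of the MLP-mixing idea for time series is shown below: one MLP mixes information across time steps and another across features, with residual connections. The layer sizes, random untrained weights, and linear head are assumptions for illustration only, not the published TSMixer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 24, 3                      # lookback length, number of series/features
x = rng.normal(size=(T, C))       # one example: T time steps, C features

def mlp(dim_in, dim_out):
    """A single dense layer with ReLU, weights randomly initialized."""
    W = rng.normal(scale=0.1, size=(dim_in, dim_out))
    b = np.zeros(dim_out)
    return lambda h: np.maximum(h @ W + b, 0.0)

time_mix = mlp(T, T)              # mixes information across time steps
feat_mix = mlp(C, C)              # mixes information across features

h = x + time_mix(x.T).T           # apply along the time axis (residual)
h = h + feat_mix(h)               # apply along the feature axis (residual)

head = rng.normal(scale=0.1, size=(T, 1))
forecast = (h * head).sum(axis=0) # toy linear head -> one value per feature
print(forecast.shape)             # (3,)
```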
no code implementations • 12 Jan 2023 • Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister
Accurate estimation of output quantiles is crucial in many use cases where it is important to model the full range of possible outcomes.
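Quantile outputs are commonly trained with the pinball (quantile) loss, which penalizes under- and over-prediction asymmetrically depending on the target quantile. The paper's method goes beyond this, but a short worked example of the loss itself clarifies what "estimating output quantiles" means.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss: under-prediction is penalized more when q is high."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([10.0, 12.0, 9.0])
y_pred = np.array([8.0, 11.0, 9.5])
print(pinball_loss(y_true, y_pred, q=0.9))  # ~0.92: low predictions hurt at q=0.9
print(pinball_loss(y_true, y_pred, q=0.1))  # 0.25: the same predictions are fine at q=0.1
```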
no code implementations • 30 Nov 2022 • Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister
Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled.
Semi-supervised Anomaly Detection · Supervised Anomaly Detection
no code implementations • 12 Nov 2022 • Zachary Izzo, Jinsung Yoon, Sercan O. Arik, James Zou
However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves can be difficult to interpret.
no code implementations • 15 Jun 2022 • Sayna Ebrahimi, Sercan O. Arik, Tomas Pfister
For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations; however, effective adaptation of such representations to distribution shifts at test time remains an unexplored area.
no code implementations • 4 Feb 2022 • Sercan O. Arik, Nathanael C. Yoder, Tomas Pfister
Real-world time-series datasets often violate the assumptions of standard supervised learning for forecasting -- their distributions evolve over time, rendering the conventional training and model selection procedures suboptimal.
1 code implementation • NeurIPS 2021 • Sungyong Seo, Sercan O. Arik, Jinsung Yoon, Xiang Zhang, Kihyuk Sohn, Tomas Pfister
The key aspect of DeepCTRL is that it does not require retraining to adapt the rule strength -- at inference, the user can adjust it based on the desired operating point on the accuracy vs. rule verification ratio trade-off.
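The general idea of a rule strength that can be dialed at inference time can be illustrated as interpolating between a rule-driven prediction and a data-driven prediction. This is only a minimal sketch of the concept; the actual DeepCTRL coupling of rule and data encoders in a shared latent space is more involved, and both models below are made-up stand-ins.

```python
import numpy as np

def combined_prediction(x, data_model, rule_model, alpha):
    """Blend a data-driven prediction with a rule-based one; alpha (rule strength)
    is chosen at inference time, so no retraining is needed."""
    return alpha * rule_model(x) + (1.0 - alpha) * data_model(x)

data_model = lambda x: 1.2 * x + 0.3          # hypothetical learned regressor
rule_model = lambda x: np.maximum(x, 0.0)     # rule: outputs must be non-negative

x = np.array([-1.0, 0.5, 2.0])
for alpha in (0.0, 0.5, 1.0):                 # slide from fully data-driven to fully rule-driven
    print(alpha, combined_prediction(x, data_model, rule_model, alpha))
```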
no code implementations • 11 Jun 2021 • Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Chen-Yu Lee, Tomas Pfister
We demonstrate our method on various unsupervised AD tasks with image and tabular data.
6 code implementations • 26 May 2021 • Zizhao Zhang, Han Zhang, Long Zhao, Ting Chen, Sercan O. Arik, Tomas Pfister
Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well.
Ranked #89 on Image Classification on CIFAR-10
no code implementations • NeurIPS 2020 • Sercan O. Arik, Chun-Liang Li, Jinsung Yoon, Rajarishi Sinha, Arkady Epshteyn, Long T. Le, Vikas Menon, Shashank Singh, Leyou Zhang, Nate Yoder, Martin Nikoltchev, Yash Sonthalia, Hootan Nakhost, Elli Kanal, Tomas Pfister
We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19.
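For context, compartmental models integrate simple coupled rate equations over compartments such as susceptible, exposed, infected, and recovered (SEIR). The sketch below shows only that classical part with made-up fixed rates; the paper's contribution is to predict and extend such parameters with machine learning, which is not shown here.

```python
def seir_step(S, E, I, R, beta, sigma, gamma, N, dt=1.0):
    """One Euler step of the SEIR compartmental equations."""
    new_exposed   = beta * S * I / N * dt
    new_infected  = sigma * E * dt
    new_recovered = gamma * I * dt
    return (S - new_exposed, E + new_exposed - new_infected,
            I + new_infected - new_recovered, R + new_recovered)

N = 1_000_000
S, E, I, R = N - 10, 0.0, 10.0, 0.0
beta, sigma, gamma = 0.4, 1 / 5.2, 1 / 10.0   # made-up rates; a learned model would
                                              # predict such parameters from covariates
trajectory = []
for _ in range(120):                          # simulate 120 days
    S, E, I, R = seir_step(S, E, I, R, beta, sigma, gamma, N)
    trajectory.append(I)
print(f"max simultaneously infected over the horizon: {max(trajectory):,.0f}")
```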
no code implementations • 15 Jul 2020 • Yu-Han Liu, Sercan O. Arik
We propose a novel method to explain trained deep neural networks (DNNs), by distilling them into surrogate models using unsupervised clustering.
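The distill-into-surrogates-via-clustering flavor can be sketched with scikit-learn: cluster the data, then fit one simple, inspectable surrogate per cluster against the black-box model's predictions. This toy clusters raw inputs rather than learned representations and is not the paper's method.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Train a "black-box" model, then fit one shallow surrogate per input cluster
# so each surrogate explains the black box locally.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
pseudo_labels = black_box.predict(X)               # distill against model outputs, not true labels

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for c in sorted(set(clusters)):
    mask = clusters == c
    surrogate = DecisionTreeClassifier(max_depth=3).fit(X[mask], pseudo_labels[mask])
    print(c, surrogate.score(X[mask], pseudo_labels[mask]))  # surrogate fidelity per cluster
```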
34 code implementations • 19 Dec 2019 • Bryan Lim, Sercan O. Arik, Nicolas Loeff, Tomas Pfister
Multi-horizon forecasting problems often contain a complex mix of inputs -- including static (i.e., time-invariant) covariates, known future inputs, and other exogenous time series that are only observed historically -- without any prior information on how they interact with the target.
2 code implementations • NeurIPS 2020 • Chih-Kuan Yeh, Been Kim, Sercan O. Arik, Chun-Liang Li, Tomas Pfister, Pradeep Ravikumar
Next, we propose a concept discovery method that aims to infer a complete set of concepts that are additionally encouraged to be interpretable, addressing the limitations of existing concept-explanation methods.
no code implementations • ECCV 2020 • Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister
Active learning (AL) combines data labeling and model training to minimize the labeling cost by prioritizing the selection of high value data that can best improve model performance.
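A common baseline acquisition strategy in active learning is to label the points where the current model is most uncertain, e.g. by predictive entropy, as in the sketch below. This is only a generic AL loop step for orientation; the paper's notion of "high value" data differs from plain uncertainty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
labeled = np.arange(20)                            # small initial labeled pool
unlabeled = np.arange(20, 300)

model = LogisticRegression().fit(X[labeled], y[labeled])
probs = model.predict_proba(X[unlabeled])
entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)

budget = 10
query = unlabeled[np.argsort(-entropy)[:budget]]   # most uncertain points to label next
print(query)
```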
2 code implementations • CVPR 2020 • Zizhao Zhang, Han Zhang, Sercan O. Arik, Honglak Lee, Tomas Pfister
For instance, on CIFAR100 with a $40\%$ uniform noise ratio and only 10 trusted labeled data per class, our method achieves $80.2{\pm}0.3\%$ classification accuracy, where the error rate is only $1.4\%$ higher than a neural network trained without label noise.
1 code implementation • 26 Sep 2019 • Jinsung Yoon, Sercan O. Arik, Tomas Pfister
Understanding black-box machine learning models is crucial for their widespread adoption.
2 code implementations • ICML 2020 • Jinsung Yoon, Sercan O. Arik, Tomas Pfister
To adaptively learn data values jointly with the target task predictor model, we propose a meta learning framework which we name Data Valuation using Reinforcement Learning (DVRL).
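A heavily simplified sketch of the data-valuation-with-RL loop is below: a value estimator assigns selection probabilities to training samples, a predictor is trained on a sampled subset, and the validation reward updates the estimator with a REINFORCE-style gradient. Everything here (linear value estimator, least-squares predictor, corrupted-label toy data, learning rate) is a toy assumption, not DVRL's actual components.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d)); w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)
y[:50] += 5.0                                   # first 50 samples are corrupted (low value)
X_val = rng.normal(size=(100, d)); y_val = X_val @ w_true

F = np.column_stack([X, y])                     # value-estimator input: (x, y) pairs
theta = np.zeros(d + 1)                         # toy linear value estimator
baseline = 0.0
for step in range(300):
    p = 1 / (1 + np.exp(-F @ theta))            # selection probabilities per sample
    sel = rng.random(n) < p                     # sample a training subset
    if sel.sum() < d + 1:
        continue
    w = np.linalg.lstsq(X[sel], y[sel], rcond=None)[0]        # train predictor on the subset
    reward = -np.mean((X_val @ w - y_val) ** 2)               # reward = validation performance
    grad = ((sel.astype(float) - p)[:, None] * F).sum(axis=0) # REINFORCE log-prob gradient
    theta += 1e-3 * (reward - baseline) * grad
    baseline = 0.9 * baseline + 0.1 * reward    # moving-average baseline

p = 1 / (1 + np.exp(-F @ theta))
print("mean selection prob - corrupted:", round(float(p[:50].mean()), 3),
      "clean:", round(float(p[50:].mean()), 3))
```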
no code implementations • 25 Sep 2019 • Mingfei Gao, Zizhao Zhang, Guo Yu, Sercan O. Arik, Larry S. Davis, Tomas Pfister
Active learning (AL) aims to integrate data labeling and model training in a unified way, and to minimize the labeling budget by prioritizing the selection of high value data that can best improve model performance.
no code implementations • ECCV 2020 • Linchao Zhu, Sercan O. Arik, Yi Yang, Tomas Pfister
We propose a novel adaptive transfer learning framework, learning to transfer learn (L2TL), to improve performance on a target dataset by careful extraction of the related information from a source dataset.
19 code implementations • 20 Aug 2019 • Sercan O. Arik, Tomas Pfister
We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet.
Ranked #1 on Poker Hand Classification on Poker Hand
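To give a flavor of sequential sparse feature selection over decision steps, the toy below applies a hard top-k mask at each step and discourages reusing the same features via a prior. This is purely illustrative: the real TabNet uses learned sparsemax attention and feature transformers, none of which appear here.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))                     # batch of 4 samples, 6 tabular features

def sparse_mask(logits, top_k=2):
    """Toy stand-in for learned sparse attention: keep only the top-k features."""
    mask = np.zeros_like(logits)
    idx = np.argsort(-logits, axis=1)[:, :top_k]
    np.put_along_axis(mask, idx, 1.0, axis=1)
    return mask

prior = np.ones_like(x)                         # features not yet used are preferred
outputs = []
for step in range(3):                           # sequential decision steps
    logits = rng.normal(size=x.shape) * prior   # stand-in for a learned mask network
    mask = sparse_mask(logits)
    prior *= (1.0 - 0.7 * mask)                 # discourage reusing the same features
    outputs.append((x * mask).sum(axis=1))      # stand-in for a per-step feature transformer
print(np.sum(outputs, axis=0))                  # aggregated output per sample
```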
4 code implementations • 17 Feb 2019 • Sercan O. Arik, Tomas Pfister
We propose a novel inherently interpretable machine learning method that bases decisions on a few relevant examples that we call prototypes.
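A minimal nearest-prototype decision sketch is below: the prediction is a distance-weighted vote among the closest prototypes, and the supporting prototypes are returned alongside it so the decision can be traced to concrete examples. The paper's attention-based prototype selection is different; this only conveys the interpretability idea, with a random toy prototype set.

```python
import numpy as np

def predict_with_prototypes(x, prototypes, labels, k=3):
    """Vote among the k nearest prototypes, weighted by inverse distance,
    and return the supporting prototype indices alongside the prediction."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)
    counts = np.bincount(labels[nearest], weights=weights)
    return int(np.argmax(counts)), nearest

rng = np.random.default_rng(0)
prototypes = rng.normal(size=(20, 4))           # toy prototype set; in practice these
labels = rng.integers(0, 2, size=20)            # would be selected from training data
pred, support = predict_with_prototypes(rng.normal(size=4), prototypes, labels)
print(pred, support)
```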
no code implementations • 20 Aug 2018 • Sercan O. Arik, Heewoo Jun, Gregory Diamos
We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms.
2 code implementations • NeurIPS 2018 • Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
7 code implementations • ICLR 2018 • Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.
no code implementations • 15 Mar 2017 • Sercan O. Arik, Markus Kliegl, Rewon Child, Joel Hestness, Andrew Gibiansky, Chris Fougner, Ryan Prenger, Adam Coates
Keyword spotting (KWS) constitutes a major component of human-technology interfaces.
3 code implementations • ICML 2017 • Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xi-An Li, John Miller, Andrew Ng, Jonathan Raiman, Shubho Sengupta, Mohammad Shoeybi
We present Deep Voice, a production-quality text-to-speech system constructed entirely from deep neural networks.