no code implementations • 10 Apr 2024 • Sahil Garg, Anderson Schneider, Anant Raj, Kashif Rasul, Yuriy Nevmyvaka, Sneihil Gopal, Amit Dhurandhar, Guillermo Cecchi, Irina Rish
In addition to the data efficiency gained from direct sampling, we propose an algorithm that offers a significant reduction in sample complexity for estimating the divergence of the data distribution with respect to the marginal distribution.
1 code implementation • 24 Mar 2024 • Shengyi Huang, Michael Noukhovitch, Arian Hosseini, Kashif Rasul, Weixun Wang, Lewis Tunstall
This work is the first to openly reproduce the Reinforcement Learning from Human Feedback (RLHF) scaling behaviors reported in OpenAI's seminal TL;DR summarization work.
no code implementations • 20 Feb 2024 • Zijie Pan, Yushan Jiang, Dongjin Song, Sahil Garg, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka
To address this issue, we propose a novel Structural Knowledge Informed Continual Learning (SKI-CL) framework that performs MTS forecasting within a continual learning paradigm: it leverages structural knowledge to steer the forecasting model toward identifying and adapting to different regimes, and selects representative MTS samples from each regime for memory replay.
1 code implementation • 25 Oct 2023 • Lewis Tunstall, Edward Beeching, Nathan Lambert, Nazneen Rajani, Kashif Rasul, Younes Belkada, Shengyi Huang, Leandro von Werra, Clémentine Fourrier, Nathan Habib, Nathan Sarrazin, Omar Sanseviero, Alexander M. Rush, Thomas Wolf
Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment.
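The dDPO step builds on the direct preference optimization objective; a minimal numerical sketch of that loss for a single preference pair is below (an illustrative sketch, not the paper's training code — the log-probability inputs and the `beta` value are assumptions):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit reward margins: how much more the policy prefers each
    # response than the reference model does.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Logistic loss pushing the chosen margin above the rejected one.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)

# Small loss when the policy already prefers the chosen response...
low = dpo_loss(-10.0, -20.0, -12.0, -12.0)
# ...and a large loss when it prefers the rejected one.
high = dpo_loss(-20.0, -10.0, -12.0, -12.0)
```

Minimizing this over ranked pairs nudges the policy toward the teacher-preferred responses without a separate reward model.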
1 code implementation • 12 Oct 2023 • Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Hena Ghonia, Rishika Bhagwatkar, Arian Khorasani, Mohammad Javad Darvishi Bayazi, George Adamopoulos, Roland Riachi, Nadhir Hassen, Marin Biloš, Sahil Garg, Anderson Schneider, Nicolas Chapados, Alexandre Drouin, Valentina Zantedeschi, Yuriy Nevmyvaka, Irina Rish
Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization.
no code implementations • 23 May 2023 • Manuel Kunz, Stefan Birr, Mones Raslan, Lei Ma, Zhen Li, Adele Gouttes, Mateusz Koren, Tofigh Naghibi, Johannes Stephan, Mariia Bulycheva, Matthias Grzeschik, Armin Kekić, Michael Narodovitch, Kashif Rasul, Julian Sieber, Tim Januschowski
These include the volume of data, its irregularity, the high turnover in the catalog, and the fixed-inventory assumption.
1 code implementation • 12 May 2023 • Yu Chen, Wei Deng, Shikai Fang, Fengpei Li, Nicole Tianjiao Yang, Yikai Zhang, Kashif Rasul, Shandian Zhe, Anderson Schneider, Yuriy Nevmyvaka
We show that optimizing the transport cost improves performance: the proposed algorithm achieves state-of-the-art results on healthcare and environmental data while exploiting both temporal and feature patterns for probabilistic time series imputation.
no code implementations • 4 Nov 2022 • Marin Biloš, Kashif Rasul, Anderson Schneider, Yuriy Nevmyvaka, Stephan Günnemann
Temporal data such as time series can be viewed as discretized measurements of the underlying function.
no code implementations • 29 Jun 2022 • Stephan Rabanser, Tim Januschowski, Kashif Rasul, Oliver Borchert, Richard Kurle, Jan Gasthaus, Michael Bohlke-Schneider, Nicolas Papernot, Valentin Flunkert
We introduce a novel, practically relevant variation of the anomaly detection problem in multivariate time series: intrinsic anomaly detection.
no code implementations • 31 May 2022 • Kashif Rasul, Young-Jin Park, Max Nihlén Ramström, Kyung-Min Kim
Time series models aim for accurate predictions of the future given the past, where the forecasts are used for important downstream tasks like business decision making.
no code implementations • 29 Sep 2021 • Stephan Rabanser, Tim Januschowski, Kashif Rasul, Oliver Borchert, Richard Kurle, Jan Gasthaus, Michael Bohlke-Schneider, Nicolas Papernot, Valentin Flunkert
Modern time series corpora, in particular those coming from sensor-based data, exhibit characteristics that have so far not been adequately addressed in the literature on representation learning for time series.
1 code implementation • 8 Jul 2021 • Adèle Gouttes, Kashif Rasul, Mateusz Koren, Johannes Stephan, Tofigh Naghibi
Here, we propose a general method for probabilistic time series forecasting.
1 code implementation • 28 Jan 2021 • Kashif Rasul, Calvin Seward, Ingmar Schuster, Roland Vollgraf
In this work, we propose \texttt{TimeGrad}, an autoregressive model for multivariate probabilistic time series forecasting which samples from the data distribution at each time step by estimating its gradient.
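The per-timestep sampling can be sketched as DDPM-style ancestral sampling conditioned on a recurrent hidden state; everything concrete below (the linear noise schedule, the toy `eps_model`, the dimensions) is a hypothetical stand-in for TimeGrad's learned components:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule (illustrative values, not the paper's exact schedule).
T = 100
betas = np.linspace(1e-4, 0.1, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x, t, h):
    """Toy stand-in for the conditioned noise-prediction network.
    In TimeGrad this is a neural net conditioned on the RNN state h."""
    return 0.1 * x + 0.01 * h

def sample_next(h, dim=3):
    """Ancestral sampling: run the reverse diffusion from pure noise,
    conditioned on h, to draw the next multivariate observation."""
    x = rng.standard_normal(dim)
    for t in reversed(range(T)):
        eps = eps_model(x, t, h)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(dim) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

x0 = sample_next(h=np.ones(3))
```

Autoregression comes from feeding each sampled `x0` back into the RNN that produces the next conditioning state `h`.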
1 code implementation • ICLR 2021 • Kashif Rasul, Abdul-Saboor Sheikh, Ingmar Schuster, Urs Bergmann, Roland Vollgraf
In this work we model the multivariate temporal dynamics of time series via an autoregressive deep learning model, where the data distribution is represented by a conditioned normalizing flow.
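The change-of-variables computation behind a conditioned flow can be sketched with a single affine layer; the conditioner functions `mu` and `log_s` below are toy assumptions (in the paper they come from an RNN over past observations):

```python
import numpy as np

def cond_affine_flow_logpdf(x, h):
    """Log-density of x under one conditioned affine flow layer
    with a standard-normal base distribution."""
    mu = 0.5 * h      # toy conditioner outputs; a neural net in practice
    log_s = 0.1 * h
    # Invert the flow: x = mu + exp(log_s) * z  =>  z = (x - mu) * exp(-log_s)
    z = (x - mu) * np.exp(-log_s)
    # Change of variables: log p(x) = log N(z; 0, I) - sum(log_s)
    base = -0.5 * (z ** 2 + np.log(2 * np.pi)).sum()
    return base - log_s.sum()

h = np.array([1.0, -1.0])
lp_near = cond_affine_flow_logpdf(np.array([0.5, -0.5]), h)  # x equals mu(h)
lp_far = cond_affine_flow_logpdf(np.array([5.0, 5.0]), h)
```

Because the transform is exactly invertible, the model is trained by maximizing this log-likelihood directly, and stacking such layers yields a richer conditional density.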
no code implementations • 6 Sep 2019 • Kashif Rasul, Ingmar Schuster, Roland Vollgraf, Urs Bergmann
We present a generative model that is defined on finite sets of exchangeable, potentially high-dimensional data.
1 code implementation • NAACL 2019 • Alan Akbik, Tanja Bergmann, Duncan Blythe, Kashif Rasul, Stefan Schweter, Roland Vollgraf
We present FLAIR, an NLP framework designed to facilitate training and distribution of state-of-the-art sequence labeling, text classification and language models.
no code implementations • 10 Feb 2019 • Andreas Merentitis, Kashif Rasul, Roland Vollgraf, Abdul-Saboor Sheikh, Urs Bergmann
This helps the bandit framework to select the best agents early, since these rewards are smoother and less sparse than the environment reward.
no code implementations • 4 Dec 2017 • Abdul-Saboor Sheikh, Kashif Rasul, Andreas Merentitis, Urs Bergmann
This work explores maximum likelihood optimization of neural networks through hypernetworks.
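The core hypernetwork idea can be sketched in a few lines: a small network maps an embedding to the weights of a target network, so the target's parameters are generated rather than stored directly. The shapes and the single linear layer below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypernetwork parameters: maps an embedding z to the flattened
# weights of a small target network (one linear layer here).
in_dim, out_dim, z_dim = 4, 2, 3
H = rng.standard_normal((z_dim, in_dim * out_dim)) * 0.1

def target_forward(x, z):
    """Run the target linear layer whose weights are *generated*
    by the hypernetwork from the embedding z."""
    W = (z @ H).reshape(in_dim, out_dim)  # hypernetwork output = weights
    return x @ W

x = rng.standard_normal((5, in_dim))
y1 = target_forward(x, np.array([1.0, 0.0, 0.0]))
y2 = target_forward(x, np.array([0.0, 1.0, 0.0]))
```

Different embeddings yield different effective weights, so optimizing the hypernetwork (and a distribution over `z`) amounts to learning a family of target networks at once.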
37 code implementations • 25 Aug 2017 • Han Xiao, Kashif Rasul, Roland Vollgraf
We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category.
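Fashion-MNIST ships in the same IDX binary format as MNIST (big-endian header of magic number, count, rows, cols, followed by raw pixel bytes); a minimal parser can be sketched as follows, with synthetic bytes standing in for the real, gzipped data files:

```python
import struct
import numpy as np

def load_idx_images(raw):
    """Parse an IDX3 image file: 16-byte big-endian header
    (magic 2051, image count, rows, cols), then uint8 pixels."""
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    assert magic == 2051, "not an IDX3 image file"
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(n, rows, cols)

# Synthetic two-image 28x28 file standing in for the real
# (gzip-compressed) train-images file; decompress real files first.
header = struct.pack(">IIII", 2051, 2, 28, 28)
body = (np.arange(2 * 28 * 28) % 256).astype(np.uint8).tobytes()
imgs = load_idx_images(header + body)
```

Because the format is unchanged from MNIST, existing MNIST loaders work on Fashion-MNIST as a drop-in replacement.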