no code implementations • ICML 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

The best existing efficient (i. e., polynomial-time) algorithms for this problem only guarantee a $O(T^{2/3})$ upper-bound on the regret.

no code implementations • ICML 2020 • Pierre Perrault, Zheng Wen, Michal Valko, Jennifer Healey

We introduce a new budgeted framework for online influence maximization, considering the total cost of an advertising campaign instead of the common cardinality constraint on a chosen influencer set.

1 code implementation • 31 Jul 2024 • Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Gregoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel Kloumann, Ishan Misra, Ivan Evtimov, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer Van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, Khalid El-Arini, Krithika Iyer, Kshitiz Malik, Kuenley Chiu, Kunal Bhalla, Lauren Rantala-Yeary, Laurens van der Maaten, Lawrence Chen, Liang Tan, Liz Jenkins, Louis Martin, Lovish Madaan, Lubo Malo, Lukas Blecher, Lukas Landzaat, Luke de Oliveira, Madeline Muzzi, Mahesh Pasupuleti, Mannat Singh, Manohar Paluri, Marcin Kardas, Mathew Oldham, Mathieu Rita, Maya Pavlova, Melanie Kambadur, Mike Lewis, Min Si, Mitesh Kumar Singh, Mona Hassan, Naman Goyal, Narjes Torabi, Nikolay Bashlykov, Nikolay Bogoychev, Niladri Chatterji, Olivier Duchenne, Onur Çelebi, Patrick Alrassy, Pengchuan Zhang, Pengwei Li, Petar Vasic, Peter Weng, Prajjwal Bhargava, Pratik Dubal, Praveen Krishnan, Punit Singh Koura, Puxin Xu, Qing He, Qingxiao Dong, Ragavan Srinivasan, Raj Ganapathy, Ramon Calderer, Ricardo Silveira Cabral, Robert Stojnic, Roberta Raileanu, Rohit Girdhar, Rohit Patel, Romain Sauvestre, Ronnie Polidoro, Roshan Sumbaly, Ross Taylor, Ruan Silva, Rui Hou, Rui Wang, Saghar Hosseini, Sahana Chennabasappa, Sanjay Singh, Sean Bell, Seohyun Sonia Kim, Sergey Edunov, Shaoliang Nie, Sharan Narang, Sharath Raparthy, Sheng Shen, Shengye Wan, Shruti Bhosale, Shun Zhang, Simon Vandenhende, Soumya Batra, Spencer Whitman, Sten Sootla, Stephane Collot, Suchin Gururangan, Sydney Borodinsky, Tamar Herman, Tara Fowler, Tarek Sheasha, Thomas Georgiou, Thomas Scialom, Tobias Speckbacher, Todor Mihaylov, Tong Xiao, Ujjwal Karn, Vedanuj Goswami, Vibhor Gupta, Vignesh Ramanathan, Viktor Kerkez, Vincent Gonguet, Virginie Do, Vish Vogeti, Vladan Petrovic, Weiwei Chu, Wenhan Xiong, Wenyin Fu, Whitney Meers, Xavier Martinet, Xiaodong Wang, Xiaoqing Ellen Tan, Xinfeng Xie, Xuchao Jia, Xuewei Wang, Yaelle Goldschlag, Yashesh Gaur, Yasmine Babaei, Yi Wen, Yiwen Song, Yuchen Zhang, Yue Li, Yuning Mao, Zacharie Delpierre Coudert, Zheng Yan, Zhengxing Chen, Zoe Papakipos, Aaditya Singh, Aaron Grattafiori, Abha Jain, Adam Kelsey, Adam Shajnfeld, Adithya Gangidi, Adolfo Victoria, Ahuva Goldstand, Ajay Menon, Ajay Sharma, Alex Boesenberg, Alex Vaughan, Alexei Baevski, Allie Feinstein, Amanda Kallet, Amit Sangani, Anam Yunus, Andrei Lupu, Andres Alvarado, Andrew Caples, Andrew Gu, Andrew Ho, Andrew Poulton, Andrew Ryan, Ankit Ramchandani, Annie Franco, Aparajita Saraf, Arkabandhu Chowdhury, Ashley Gabriel, Ashwin Bharambe, Assaf Eisenman, Azadeh Yazdan, Beau James, Ben Maurer, Benjamin Leonhardi, Bernie Huang, Beth Loyd, Beto De Paola, Bhargavi Paranjape, Bing Liu, Bo Wu, Boyu Ni, Braden Hancock, Bram Wasti, Brandon Spence, Brani Stojkovic, Brian Gamido, Britt Montalvo, Carl Parker, Carly Burton, Catalina Mejia, Changhan Wang, Changkyu Kim, Chao Zhou, Chester Hu, Ching-Hsiang Chu, Chris Cai, Chris Tindal, Christoph Feichtenhofer, Damon Civin, Dana Beaty, Daniel Kreymer, Daniel Li, Danny Wyatt, David Adkins, David Xu, Davide Testuggine, Delia David, Devi Parikh, Diana Liskovich, Didem Foss, Dingkang Wang, Duc Le, Dustin Holland, Edward Dowling, Eissa Jamil, Elaine Montgomery, Eleonora Presani, Emily Hahn, Emily Wood, Erik Brinkman, Esteban Arcaute, Evan Dunbar, Evan Smothers, Fei Sun, Felix Kreuk, Feng Tian, Firat Ozgenel, Francesco Caggioni, Francisco Guzmán, Frank Kanayet, Frank Seide, Gabriela Medina Florez, Gabriella Schwarz, Gada Badeer, Georgia Swee, Gil Halpern, Govind Thattai, Grant Herman, Grigory Sizov, Guangyi, Zhang, Guna Lakshminarayanan, Hamid Shojanazeri, Han Zou, Hannah Wang, Hanwen Zha, Haroun Habeeb, Harrison Rudolph, Helen Suk, Henry Aspegren, Hunter Goldman, Ibrahim Damlaj, Igor Molybog, Igor Tufanov, Irina-Elena Veliche, Itai Gat, Jake Weissman, James Geboski, James Kohli, Japhet Asher, Jean-Baptiste Gaya, Jeff Marcus, Jeff Tang, Jennifer Chan, Jenny Zhen, Jeremy Reizenstein, Jeremy Teboul, Jessica Zhong, Jian Jin, Jingyi Yang, Joe Cummings, Jon Carvill, Jon Shepard, Jonathan McPhie, Jonathan Torres, Josh Ginsburg, Junjie Wang, Kai Wu, Kam Hou U, Karan Saxena, Karthik Prasad, Kartikay Khandelwal, Katayoun Zand, Kathy Matosich, Kaushik Veeraraghavan, Kelly Michelena, Keqian Li, Kun Huang, Kunal Chawla, Kushal Lakhotia, Kyle Huang, Lailin Chen, Lakshya Garg, Lavender A, Leandro Silva, Lee Bell, Lei Zhang, Liangpeng Guo, Licheng Yu, Liron Moshkovich, Luca Wehrstedt, Madian Khabsa, Manav Avalani, Manish Bhatt, Maria Tsimpoukelli, Martynas Mankus, Matan Hasson, Matthew Lennie, Matthias Reso, Maxim Groshev, Maxim Naumov, Maya Lathi, Meghan Keneally, Michael L. Seltzer, Michal Valko, Michelle Restrepo, Mihir Patel, Mik Vyatskov, Mikayel Samvelyan, Mike Clark, Mike Macey, Mike Wang, Miquel Jubert Hermoso, Mo Metanat, Mohammad Rastegari, Munish Bansal, Nandhini Santhanam, Natascha Parks, Natasha White, Navyata Bawa, Nayan Singhal, Nick Egebo, Nicolas Usunier, Nikolay Pavlovich Laptev, Ning Dong, Ning Zhang, Norman Cheng, Oleg Chernoguz, Olivia Hart, Omkar Salpekar, Ozlem Kalinli, Parkin Kent, Parth Parekh, Paul Saab, Pavan Balaji, Pedro Rittner, Philip Bontrager, Pierre Roux, Piotr Dollar, Polina Zvyagina, Prashant Ratanchandani, Pritish Yuvraj, Qian Liang, Rachad Alao, Rachel Rodriguez, Rafi Ayub, Raghotham Murthy, Raghu Nayani, Rahul Mitra, Raymond Li, Rebekkah Hogan, Robin Battey, Rocky Wang, Rohan Maheswari, Russ Howes, Ruty Rinott, Sai Jayesh Bondu, Samyak Datta, Sara Chugh, Sara Hunt, Sargun Dhillon, Sasha Sidorov, Satadru Pan, Saurabh Verma, Seiji Yamamoto, Sharadh Ramaswamy, Shaun Lindsay, Sheng Feng, Shenghao Lin, Shengxin Cindy Zha, Shiva Shankar, Shuqiang Zhang, Sinong Wang, Sneha Agarwal, Soji Sajuyigbe, Soumith Chintala, Stephanie Max, Stephen Chen, Steve Kehoe, Steve Satterfield, Sudarshan Govindaprasad, Sumit Gupta, Sungmin Cho, Sunny Virk, Suraj Subramanian, Sy Choudhury, Sydney Goldman, Tal Remez, Tamar Glaser, Tamara Best, Thilo Kohler, Thomas Robinson, Tianhe Li, Tianjun Zhang, Tim Matthews, Timothy Chou, Tzook Shaked, Varun Vontimitta, Victoria Ajayi, Victoria Montanez, Vijai Mohan, Vinay Satish Kumar, Vishal Mangla, Vítor Albiero, Vlad Ionescu, Vlad Poenaru, Vlad Tiberiu Mihailescu, Vladimir Ivanov, Wei Li, Wenchen Wang, WenWen Jiang, Wes Bouaziz, Will Constable, Xiaocheng Tang, Xiaofang Wang, Xiaojian Wu, Xiaolan Wang, Xide Xia, Xilun Wu, Xinbo Gao, Yanjun Chen, Ye Hu, Ye Jia, Ye Qi, Yenda Li, Yilin Zhang, Ying Zhang, Yossi Adi, Youngjin Nam, Yu, Wang, Yuchen Hao, Yundi Qian, Yuzi He, Zach Rait, Zachary DeVito, Zef Rosnbrick, Zhaoduo Wen, Zhenyu Yang, Zhiwei Zhao

This paper presents a new set of foundation models, called Llama 3.

Ranked #3 on Multi-task Language Understanding on MMLU

no code implementations • 20 May 2024 • Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

(b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed.

no code implementations • 14 May 2024 • Yunhao Tang, Daniel Zhaohan Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney

However, rising popularity in offline alignment algorithms challenge the need for on-policy sampling in RLHF.

no code implementations • 13 Mar 2024 • Daniele Calandriello, Daniel Guo, Remi Munos, Mark Rowland, Yunhao Tang, Bernardo Avila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot

Building on this equivalence, we introduce the IPO-MD algorithm that generates data with a mixture policy (between the online and reference policy) similarly as the general Nash-MD algorithm.

no code implementations • 8 Feb 2024 • Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

The GPO framework also sheds light on how offline algorithms enforce regularization, through the design of the convex function that defines the loss.

no code implementations • 5 Feb 2024 • Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel

Aligning language models with human preferences is crucial for reducing errors and biases in these models.

no code implementations • 1 Dec 2023 • Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot

We term this approach Nash learning from human feedback (NLHF).

no code implementations • 26 Oct 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard

In particular, we study the demonstration-regularized reinforcement learning that leverages the expert demonstrations by KL-regularization for a policy learned by behavior cloning.

1 code implementation • 18 Oct 2023 • Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos

In particular we derive a new general objective called $\Psi$PO for learning from human preferences that is expressed in terms of pairwise preferences and therefore bypasses both approximations.

no code implementations • 1 Sep 2023 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback.

1 code implementation • 17 Aug 2023 • Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L. Dyer

Message passing neural networks have shown a lot of success on graph-structured data.

Ranked #1 on Node Classification on AMZ Comp

no code implementations • 29 May 2023 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko

Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings.

no code implementations • 29 May 2023 • Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko

In reinforcement learning, the advantage function is critical for policy improvement, but is often extracted from a learned Q-function.

1 code implementation • 22 May 2023 • Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo

Mirror descent value iteration (MDVI), an abstraction of Kullback-Leibler (KL) and entropy-regularized reinforcement learning (RL), has served as the basis for recent high-performing practical RL algorithms.

no code implementations • 2 May 2023 • Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space.

no code implementations • 6 Apr 2023 • Denis Belomestny, Pierre Menard, Alexey Naumov, Daniil Tiapkin, Michal Valko

These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum.

1 code implementation • 14 Mar 2023 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Menard

Finally, we apply developed regularization techniques to reduce sample complexity of visitation entropy maximization to $\widetilde{\mathcal{O}}(H^2SA/\varepsilon^2)$, yielding a statistical separation between maximum entropy exploration and reward-free exploration.

1 code implementation • 23 Dec 2022 • Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko

Imperfect information games (IIG) are games in which each player only partially observes the current game state.

no code implementations • 6 Dec 2022 • Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko

We identify that a faster paced optimization of the predictor and semi-gradient updates on the representation, are crucial to preventing the representation collapse.

no code implementations • 18 Nov 2022 • Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko

In this work, we study a natural solution derived from structural causal models of the world: Our key idea is to learn representations of the future that capture precisely the unpredictable aspects of each outcome -- which we use as additional input for predictions, such that intrinsic rewards only reflect the predictable aspects of world dynamics.

1 code implementation • 28 Sep 2022 • Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Menard

We consider reinforcement learning in an environment modeled by an episodic, finite, stage-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ actions.

no code implementations • 16 Jun 2022 • Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pîslar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments.

no code implementations • 27 May 2022 • Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári

In this work, we consider and analyze the sample complexity of model-free reinforcement learning with a generative model.

no code implementations • 16 May 2022 • Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Menard

We propose the Bayes-UCBVI algorithm for reinforcement learning in tabular, stage-dependent, episodic Markov decision process: a natural extension of the Bayes-UCB algorithm by Kaufmann et al. (2012) for multi-armed bandits.

no code implementations • 30 Mar 2022 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases.

no code implementations • 17 Feb 2022 • Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent SIfre, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

In this paper we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior.

no code implementations • 30 Jan 2022 • Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Computing a Gaussian process (GP) posterior has a computational cost cubical in the number of historical points.

no code implementations • NeurIPS 2021 • Tadashi Kozuno, Pierre Ménard, Remi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an extensive game with imperfect information (EGII) through self-play.

no code implementations • 23 Nov 2021 • Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We introduce a generic strategy for provably efficient multi-goal exploration.

1 code implementation • NeurIPS 2021 • Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer

Our approach combines a generative modeling framework with an instance-specific alignment loss that tries to maximize the representational similarity between transformed views of the input (brain state).

1 code implementation • NeurIPS 2021 • Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko

Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions.

no code implementations • ICML Workshop URL 2021 • Omar Darwiche Domingues, Corentin Tallec, Remi Munos, Michal Valko

In this paper, we study the problem of representation learning and exploration in reinforcement learning.

no code implementations • 11 Jun 2021 • Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play.

no code implementations • 11 Jun 2021 • Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko

In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.

no code implementations • NeurIPS 2021 • Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We study the problem of learning in the stochastic shortest path (SSP) setting, where an agent seeks to minimize the expected cost accumulated before reaching a goal state.

1 code implementation • ICCV 2021 • Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-bastien Grill, Aäron van den Oord, Andrew Zisserman

Most successful self-supervised learning methods are trained to align the representations of two independent views from the data.

no code implementations • ICLR Workshop GTRL 2021 • Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Remi Munos, Petar Veličković, Michal Valko

Current state-of-the-art self-supervised learning methods for graph neural networks are based on contrastive learning.

1 code implementation • 1 Mar 2021 • Pierre Menard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko

We propose UCBMQ, Upper Confidence Bound Momentum Q-learning, a new algorithm for reinforcement learning in tabular and possibly stage-dependent, episodic Markov decision process.

no code implementations • 27 Feb 2021 • Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel

These results indicate that Peng's Q($\lambda$), which was thought to be unsafe, is a theoretically-sound and practically effective algorithm.

1 code implementation • 19 Feb 2021 • Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Bernardo Avila-Pires, Lindsey Kitchell, Keith B. Hengen, William Gray-Roncal, Michal Valko, Eva L. Dyer

State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a sample.

4 code implementations • ICLR 2022 • Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Veličković, Michal Valko

To address these challenges, we introduce Bootstrapped Graph Latents (BGRL) - a graph representation learning method that learns by predicting alternative augmentations of the input.

no code implementations • 6 Jan 2021 • Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Avila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos

Exploration is essential for solving complex Reinforcement Learning (RL) tasks.

no code implementations • 5 Jan 2021 • Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko

We demonstrate that from an algorithm guaranteeing an approximation factor for the ratio of submodular (RS) optimization problem, we can build another algorithm having a different kind of approximation guarantee -- weaker than the classical one -- for the difference of submodular (DS) optimization problem, and vice versa.

Data Structures and Algorithms

no code implementations • NeurIPS 2020 • Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

We investigate the exploration of an unknown environment when no reward function is provided.

8 code implementations • NeurIPS 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

1 code implementation • NeurIPS 2020 • Daniele Calandriello, Michal Derezinski, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, recommendation, stochastic optimization, experimental design and more.

1 code implementation • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis

The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.

3 code implementations • 20 Oct 2020 • Pierre H. Richemond, Jean-bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation.

no code implementations • 7 Oct 2020 • Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode.

no code implementations • 27 Jul 2020 • Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko

Realistic environments often provide agents with very limited feedback.

3 code implementations • ICML 2020 • Jean-bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence.

no code implementations • NeurIPS 2021 • Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric

One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.

no code implementations • 9 Jul 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric.

no code implementations • ICML 2020 • Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko

We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits.

no code implementations • 30 Jun 2020 • Daniele Calandriello, Michał Dereziński, Michal Valko

Determinantal point processes (DPPs) are a useful probabilistic model for selecting a small diverse subset out of a large collection of items, with applications in summarization, stochastic optimization, active learning and more.

no code implementations • ICML 2020 • Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko

Significant work has been recently dedicated to the stochastic delayed bandit setting because of its relevance in applications.

32 code implementations • 13 Jun 2020 • Jean-bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view.

Ranked #2 on Self-Supervised Person Re-Identification on SYSU-30k

no code implementations • 11 Jun 2020 • Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko

Reward-free exploration is a reinforcement learning setting studied by Jin et al. (2020), who address it by running several algorithms with regret guarantees in parallel.

no code implementations • NeurIPS 2020 • Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko

In CMAB, the question of the existence of an efficient policy with an optimal asymptotic regret (up to a factor poly-logarithmic with the action size) is still open for many families of distributions, including mutually independent outcomes, and more generally the multivariate sub-Gaussian family.

no code implementations • NeurIPS 2020 • Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support.

no code implementations • 14 Apr 2020 • Aadirupa Saha, Pierre Gaillard, Michal Valko

We then study the most general version of the problem where at each round available sets are generated from some unknown arbitrary distribution (i. e., without the independence assumption) and propose an efficient algorithm with $O(\sqrt {2^K T})$ regret guarantee.

1 code implementation • 12 Apr 2020 • Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning problems whose state-action space is endowed with a metric.

no code implementations • ICML 2020 • Yunhao Tang, Michal Valko, Rémi Munos

In this work, we investigate the application of Taylor expansions in reinforcement learning.

1 code implementation • ICML 2020 • Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Gaussian processes (GP) are one of the most successful frameworks to model uncertainty.

no code implementations • ICML 2020 • Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric

Many popular reinforcement learning problems (e. g., navigation in a maze, some Atari games, mountain car) are instances of the episodic setting under its stochastic shortest path (SSP) formulation, where an agent has to achieve a goal state while minimizing the cumulative cost.

1 code implementation • NeurIPS 2019 • Guillaume Gautier, Rémi Bardenet, Michal Valko

In the absence of DPP machinery to derive an efficient sampler and analyze their estimator, the idea of Monte Carlo integration with DPPs was stored in the cellar of numerical integration.

1 code implementation • NeurIPS 2019 • Jean-bastien Grill, Omar Darwiche Domingues, Pierre Menard, Remi Munos, Michal Valko

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the SmoothCruiser.

no code implementations • 24 Oct 2019 • Xuedong Shang, Rianne de Heide, Emilie Kaufmann, Pierre Ménard, Michal Valko

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS).

no code implementations • 9 Oct 2019 • Victor Gabillon, Rasul Tutunov, Michal Valko, Haitham Bou Ammar

In this paper, we formalise order-robust optimisation as an instance of online learning minimising simple regret, and propose Vroom, a zero'th order optimisation algorithm capable of achieving vanishing regret in non-stationary environments, while recovering favorable rates under stochastic reward-generating processes.

1 code implementation • NeurIPS 2019 • Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos

This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents.

no code implementations • 20 Jun 2019 • Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet

By trying to minimize the $\ell^2$-loss $\mathbb{E} [\lVert\hat{\beta}-\beta^{\star}\rVert^2]$ the decision maker is actually minimizing the trace of the covariance matrix of the problem, which corresponds then to online A-optimal design.

2 code implementations • NeurIPS 2019 • Michał Dereziński, Daniele Calandriello, Michal Valko

For this purpose, we propose a new algorithm which, given access to $\mathbf{L}$, samples exactly from a determinantal point process while satisfying the following two properties: (1) its preprocessing cost is $n \cdot \text{poly}(k)$, i. e., sublinear in the size of $\mathbf{L}$, and (2) its sampling cost is $\text{poly}(k)$, i. e., independent of the size of $\mathbf{L}$.

1 code implementation • 13 Mar 2019 • Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco

Moreover, we show that our procedure selects at most $\tilde{O}(d_{eff})$ points, where $d_{eff}$ is the effective dimension of the explored space, which is typically much smaller than both $d$ and $t$.

no code implementations • 11 Feb 2019 • Pierre Perrault, Vianney Perchet, Michal Valko

We improve the efficiency of algorithms for stochastic \emph{combinatorial semi-bandits}.

no code implementations • NeurIPS 2018 • Jean-bastien Grill, Michal Valko, Rémi Munos

Given $W$, our goal is to return an $\epsilon$-approximation of its maximum using the smallest possible number of function evaluations, the sample complexity of the algorithm.

no code implementations • 27 Nov 2018 • Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko

In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary.

no code implementations • 1 Oct 2018 • Peter L. Bartlett, Victor Gabillon, Michal Valko

The difficulty of optimization is measured in terms of 1) the amount of \emph{noise} $b$ of the function evaluation and 2) the local smoothness, $d$, of the function.

1 code implementation • ECCV 2018 • Edouard Oyallon, Eugene Belilovsky, Sergey Zagoruyko, Michal Valko

We study the first-order scattering transform as a candidate for reducing the signal processed by a convolutional neural network (CNN).

2 code implementations • 19 Sep 2018 • Guillaume Gautier, Guillermo Polito, Rémi Bardenet, Michal Valko

Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning.

no code implementations • ICML 2018 • Daniele Calandriello, Alessandro Lazaric, Ioannis Koutis, Michal Valko

By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e. g., SSL).

no code implementations • 6 Jun 2018 • Pierre Perrault, Vianney Perchet, Michal Valko

We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution.

no code implementations • 27 Mar 2018 • Daniele Calandriello, Alessandro Lazaric, Michal Valko

In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that sequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_{eff}(\gamma)$ of the dataset.

no code implementations • NeurIPS 2017 • Daniele Calandriello, Alessandro Lazaric, Michal Valko

The embedded space is continuously updated to guarantee that the embedding remains accurate, and we show that the per-step cost only grows with the effective dimension of the problem and not with $T$.

no code implementations • ICML 2017 • Daniele Calandriello, Alessandro Lazaric, Michal Valko

First-order KOCO methods such as functional gradient descent require only $\mathcal{O}(t)$ time and space per iteration, and, when the only information on the losses is their convexity, achieve a minimax optimal $\mathcal{O}(\sqrt{T})$ regret.

1 code implementation • ICML 2017 • Guillaume Gautier, Rémi Bardenet, Michal Valko

Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general.

no code implementations • NeurIPS 2016 • Jean-bastien Grill, Michal Valko, Remi Munos

We study the sampling-based planning problem in Markov decision processes (MDPs) that we can access only through a generative model, usually referred to as Monte-Carlo planning.

no code implementations • 13 Sep 2016 • Daniele Calandriello, Alessandro Lazaric, Michal Valko

We derive a new proof to show that the incremental resparsification algorithm proposed by Kelner and Levin (2013) produces a spectral sparsifier in high probability.

1 code implementation • NeurIPS 2017 • Zheng Wen, Branislav Kveton, Michal Valko, Sharan Vaswani

Specifically, we aim to learn the set of "best influencers" in a social network online while repeatedly interacting with it.

no code implementations • 21 Jan 2016 • Daniele Calandriello, Alessandro Lazaric, Michal Valko, Ioannis Koutis

While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples.

no code implementations • NeurIPS 2015 • Jean-bastien Grill, Michal Valko, Remi Munos

We study the problem of black-box optimization of a function $f$ of any dimension, given function evaluations perturbed by noise.

no code implementations • 15 Jun 2015 • Manjesh Kumar Hanawal, Venkatesh Saligrama, Michal Valko, R\' emi Munos

We consider stochastic sequential learning problems where the learner can observe the \textit{average reward of several actions}.

no code implementations • 18 May 2015 • Alexandra Carpentier, Michal Valko

As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter $\beta$ characterizing the distribution of the near-optimal arms.

no code implementations • NeurIPS 2014 • Alexandra Carpentier, Michal Valko

In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values.

no code implementations • NeurIPS 2014 • Tomáš Kocák, Gergely Neu, Michal Valko, Remi Munos

As the predictions of our first algorithm cannot be always computed efficiently in this setting, we propose another algorithm with similar properties and with the benefit of always being computationally efficient, at the price of a slightly more complicated tuning mechanism.

no code implementations • NeurIPS 2014 • Gergely Neu, Michal Valko

Most work on sequential learning assumes a fixed set of actions that are available all the time.

no code implementations • 30 May 2014 • Branislav Kveton, Zheng Wen, Azin Ashkan, Michal Valko

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow, can be solved optimally by a greedy method.

no code implementations • 26 Sep 2013 • Michal Valko, Nathaniel Korda, Remi Munos, Ilias Flaounas, Nelo Cristianini

For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in the terms of the kernel-dependent quantity and the RKHS norm of the reward function.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.