no code implementations • 10 Sep 2024 • Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur
In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihoods in the loss function.
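The weighted geometric average admits a compact reading: in log space, averaging the chosen and rejected likelihoods with weights p̂ and 1 − p̂ scales the standard DPO margin by (2p̂ − 1). Below is a minimal PyTorch sketch under that assumption; the function and argument names and the default β are illustrative, not the paper's code.

```python
import torch.nn.functional as F

def soft_label_dpo_loss(policy_logp_w, policy_logp_l,
                        ref_logp_w, ref_logp_l,
                        soft_label, beta=0.1):
    """DPO loss with distributional soft preference labels (sketch).

    A soft label of 0.5 (no preference) zeroes the update, while a
    label of 1.0 recovers the standard hard-label DPO loss.
    """
    margin = (policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l)
    return -F.logsigmoid(beta * (2.0 * soft_label - 1.0) * margin).mean()
```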
no code implementations • 14 Aug 2024 • Jiri Hron, Laura Culp, Gamaleldin Elsayed, Rosanne Liu, Ben Adlam, Maxwell Bileschi, Bernd Bohnet, JD Co-Reyes, Noah Fiedel, C. Daniel Freeman, Izzeddin Gur, Kathleen Kenealy, Jaehoon Lee, Peter J. Liu, Gaurav Mishra, Igor Mordatch, Azade Nova, Roman Novak, Aaron Parisi, Jeffrey Pennington, Alex Rizkowsky, Isabelle Simpson, Hanie Sedghi, Jascha Sohl-Dickstein, Kevin Swersky, Sharad Vikram, Tris Warkentin, Lechao Xiao, Kelvin Xu, Jasper Snoek, Simon Kornblith
We find that for a fixed dataset, larger and longer-trained LMs hallucinate less.
no code implementations • 8 Jul 2024 • Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington
Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices.
no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel
To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.
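A minimal sketch of that loop follows, with the sampling, binary-feedback, and fine-tuning routines passed in as callables, since the abstract does not pin down their implementations.

```python
def rest_em(base_model, prompts, generate, is_correct, finetune,
            iterations=3, samples_per_prompt=32):
    """Sketch of the ReST^EM expectation-maximization loop."""
    model = base_model
    for _ in range(iterations):
        # (1) Sample from the current model; keep only generations that
        #     pass the binary feedback check (e.g., a final-answer test).
        data = [(p, s)
                for p in prompts
                for s in generate(model, p, samples_per_prompt)
                if is_correct(p, s)]
        # (2) Fine-tune on the filtered samples; the paper restarts from
        #     the base model each round rather than the previous iterate.
        model = finetune(base_model, data)
    # (3) Repeating the loop a few times completes the procedure.
    return model
```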
1 code implementation • 30 Nov 2023 • Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur
We show that while existing prompted LMAs (gpt-3.5-turbo or gpt-4) achieve 94.0% average success rate on base tasks, their performance degrades to 24.9% success rate on compositional tasks.
no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L. Bileschi, Gamaleldin F. Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein
We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.
no code implementations • 25 Sep 2023 • Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
In this work, we seek ways to reproduce and study training stability and instability at smaller scales.
no code implementations • 24 Jul 2023 • Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust
Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation.
Ranked #1 on Mind2Web
no code implementations • 19 May 2023 • Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data.
no code implementations • 29 Nov 2022 • Srivatsan Krishnan, Natasha Jaques, Shayegan Omidshafiei, Dan Zhang, Izzeddin Gur, Vijay Janapa Reddi, Aleksandra Faust
It is unclear how scalable single-agent formulations are as we increase the complexity of the design space (e.g., full-stack System-on-Chip design).
1 code implementation • 19 Oct 2022 • Abdus Salam Azad, Izzeddin Gur, Jasper Emhoff, Nathaniel Alexis, Aleksandra Faust, Pieter Abbeel, Ion Stoica
Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks.
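As a rough illustration of that paradigm, here is a schematic minimax-regret UED step in the PAIRED style; the generator, agents, and regret estimator are generic placeholders, not this paper's actual interfaces.

```python
def ued_step(generator, protagonist, antagonist, make_env):
    """One schematic minimax-regret UED step (sketch)."""
    params = generator.sample()              # learned task distribution
    env = make_env(params)
    protagonist_return = protagonist.train_episode(env)
    antagonist_return = antagonist.train_episode(env)
    # Regret approximated by the antagonist/protagonist return gap:
    # high when the task is solvable yet hard for the protagonist.
    regret = antagonist_return - protagonist_return
    generator.update(params, reward=regret)
```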
no code implementations • 8 Oct 2022 • Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust
We contribute HTML understanding models (fine-tuned LLMs) and an in-depth analysis of their capabilities under three tasks: (i) Semantic Classification of HTML elements, (ii) Description Generation for HTML inputs, and (iii) Autonomous Web Navigation of HTML pages.
no code implementations • 25 May 2022 • Sungryull Sohn, Hyunjae Woo, Jongwook Choi, Lyubing Qiang, Izzeddin Gur, Aleksandra Faust, Honglak Lee
Unlike previous meta-RL methods, which try to directly infer the unstructured task embedding, our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in the form of a subtask graph from the training tasks, and then uses it as a prior to improve task inference at test time.
1 code implementation • NeurIPS 2021 • Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj Tiwari, Honglak Lee, Aleksandra Faust
We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments.
no code implementations • CVPR 2022 • Su Wang, Ceslee Montgomery, Jordi Orbay, Vighnesh Birodkar, Aleksandra Faust, Izzeddin Gur, Natasha Jaques, Austin Waters, Jason Baldridge, Peter Anderson
We study the automatic generation of navigation instructions from 360-degree images captured on indoor routes.
no code implementations • 29 Sep 2021 • Izzeddin Gur, Ofir Nachum, Aleksandra Faust
We formalize our approach as offline targeted environment design (OTED), which automatically learns a distribution over simulator parameters to match a provided offline dataset, and then uses the learned simulator to train an RL agent in a standard online fashion.
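Read literally, the recipe has two stages, sketched below under assumed interfaces; the divergence measure, the distribution's update rule, and all names are placeholders rather than the paper's implementation.

```python
def oted(offline_dataset, param_dist, make_simulator, divergence,
         rl_train, fit_steps=1000):
    """Sketch of offline targeted environment design (OTED)."""
    # Stage 1: fit the distribution over simulator parameters so that
    # simulated rollouts match the offline dataset.
    for _ in range(fit_steps):
        params = param_dist.sample()
        rollouts = make_simulator(params).collect_rollouts()
        param_dist.update(params,
                          score=-divergence(rollouts, offline_dataset))
    # Stage 2: train an RL agent online in the matched simulator.
    return rl_train(make_simulator(param_dist.mode()))
```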
no code implementations • ICML Workshop URL 2021 • Alberto Camacho, Izzeddin Gur, Marcin Lukasz Moczulski, Ofir Nachum, Aleksandra Faust
We are concerned with a setting where the demonstrations comprise only a subset of state-action pairs (as opposed to whole trajectories).
1 code implementation • 2 Mar 2021 • Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust
The regret objective trains the adversary to design a curriculum of environments that are "just-the-right-challenge" for the navigator agents; our results show that over time, the adversary learns to generate increasingly complex web navigation tasks.
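Concretely, regret-based schemes of this kind often estimate the adversary's reward as a return gap between navigator agents; a hypothetical sketch follows, and the paper's exact estimator may differ.

```python
def adversary_reward(returns_a, returns_b):
    """Approximate regret: one agent's best return minus another's mean.

    A high value means the generated task is solvable (some agent
    succeeds) but still challenging (another agent lags), which is the
    "just-the-right-challenge" signal the adversary is trained on.
    """
    return max(returns_a) - sum(returns_b) / len(returns_b)
```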
no code implementations • 24 Nov 2020 • Jihyeon Lee, Joseph Z. Xu, Kihyuk Sohn, Wenhan Lu, David Berthelot, Izzeddin Gur, Pranav Khaitan, Ke-Wei Huang, Kyriacos Koupparis, Bernhard Kowatsch
To respond to disasters such as earthquakes, wildfires, and armed conflicts, humanitarian organizations require accurate and timely data in the form of damage assessments, which indicate what buildings and population centers have been most affected.
no code implementations • ICLR 2019 • Izzeddin Gur, Ulrich Rueckert, Aleksandra Faust, Dilek Hakkani-Tur
Even though recent approaches improve the success rate on relatively simple environments with the help of human demonstrations that guide exploration, they still fail in environments where the number of possible instructions reaches millions.
no code implementations • 11 Nov 2018 • Izzeddin Gur, Dilek Hakkani-Tur, Gokhan Tur, Pararth Shah
We further develop several variants: a latent variable model that injects random variations into user responses to promote diversity, and a novel goal regularization mechanism that penalizes divergence of user responses from the initial user goal.
no code implementations • EMNLP 2018 • Semih Yavuz, Izzeddin Gur, Yu Su, Xifeng Yan
The SQL queries in WikiSQL are simple: each involves a single relation and no join operation (every query has the form SELECT [agg] column FROM table WHERE conditions).
no code implementations • ACL 2018 • Izzeddin Gur, Semih Yavuz, Yu Su, Xifeng Yan
The recent advance in deep learning and semantic parsing has significantly improved the translation accuracy of natural language questions to structured queries.
no code implementations • EMNLP 2017 • Semih Yavuz, Izzeddin Gur, Yu Su, Xifeng Yan
The existing factoid QA systems often lack a post-inspection component that can help models recover from their own mistakes.
no code implementations • EMNLP 2017 • Daniel Hewlett, Llion Jones, Alexandre Lacoste, Izzeddin Gur
We also evaluate the model in a semi-supervised setting by downsampling the WikiReading training set to create increasingly smaller amounts of supervision, while leaving the full unlabeled document corpus to train a sequence autoencoder on document windows.
2 code implementations • NAACL 2018 • Yu Su, Honglei Liu, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan
We study the problem of textual relation embedding with distant supervision.