no code implementations • 12 Mar 2025 • Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, Amir Gholami
Large language models (LLMs) have driven remarkable advances in enabling language agents to tackle simple tasks.
no code implementations • 31 Jan 2025 • Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta
The remarkable progress in text-to-video diffusion models enables photorealistic generation, although the generated videos often contain unnatural movement or deformation, reverse playback, and motionless scenes.
1 code implementation • 9 Jan 2025 • Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo
Sparse autoencoders (SAEs) have gained considerable attention as a promising tool for improving the interpretability of large language models (LLMs) by mapping the complex superposition of polysemantic neurons into monosemantic features and composing a sparse dictionary of words.
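As background, a minimal SAE of the kind described here reconstructs an LLM activation from a non-negative, L1-penalized overcomplete code; the sketch below illustrates the standard setup, not this paper's exact architecture:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Map d_model-dim LLM activations to an overcomplete sparse code."""
    def __init__(self, d_model: int, d_dict: int):  # typically d_dict >> d_model
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        f = torch.relu(self.encoder(x))       # sparse, non-negative features
        return self.decoder(f), f

def sae_loss(x, x_hat, f, l1_coef=1e-3):
    # Reconstruction error plus an L1 penalty that enforces sparsity.
    return ((x - x_hat) ** 2).mean() + l1_coef * f.abs().sum(-1).mean()
```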
no code implementations • 3 Dec 2024 • Hiroki Furuta, Heiga Zen, Dale Schuurmans, Aleksandra Faust, Yutaka Matsuo, Percy Liang, Sherry Yang
In this work, we investigate the use of feedback to enhance the object dynamics in text-to-video models.
no code implementations • 10 Sep 2024 • Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur
In this work, we introduce the distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function.
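One plausible reading of this loss (a sketch under my own naming, not the paper's reference implementation): taking the weighted geometric average of the two output likelihoods in log space rescales the usual DPO margin by (2p - 1) for a soft label p, so nearly-tied pairs contribute little gradient:

```python
import torch
import torch.nn.functional as F

def soft_label_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                        soft_label, beta=0.1):
    # Standard DPO log-ratio margin between chosen (w) and rejected (l).
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Soft label in [0.5, 1]; geometric averaging rescales the margin,
    # which vanishes for maximally ambiguous pairs (soft_label = 0.5).
    return -F.logsigmoid((2.0 * soft_label - 1.0) * margin).mean()
```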
1 code implementation • 26 Feb 2024 • Hiroki Furuta, Gouki Minegishi, Yusuke Iwasawa, Yutaka Matsuo
Transformers that grok modular addition are known to implement a Fourier representation and the corresponding calculation circuits based on trigonometric identities.
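To make that mechanism concrete, here is a small NumPy check (illustrative, independent of any trained model) that the angle-sum identities alone suffice to compute modular addition:

```python
import numpy as np

p, a, b = 97, 34, 71
w = 2 * np.pi * np.arange(1, p) / p          # Fourier frequencies

# Angle-sum identities give cos/sin of w*(a+b) from the parts:
cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)

# Logit for each candidate c is sum_k cos(w_k * (a + b - c)),
# which peaks exactly at c = (a + b) mod p.
c = np.arange(p)
logits = (cos_ab[:, None] * np.cos(w[:, None] * c)
          + sin_ab[:, None] * np.sin(w[:, None] * c)).sum(axis=0)
assert logits.argmax() == (a + b) % p
```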
no code implementations • 15 Feb 2024 • Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer
Current Large Language Models (LLMs) are not only limited to a maximum context length but are also unable to robustly consume long inputs.
1 code implementation • 30 Nov 2023 • Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur
We show that while existing prompted LMAs (gpt-3.5-turbo or gpt-4) achieve a 94.0% average success rate on base tasks, their performance degrades to a 24.9% success rate on compositional tasks.
no code implementations • 24 Jul 2023 • Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust
Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation.
Ranked #1 on Mind2Web
no code implementations • 19 May 2023 • Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum, Yutaka Matsuo, Aleksandra Faust, Shixiang Shane Gu, Izzeddin Gur
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data.
1 code implementation • 28 Nov 2022 • So Kuroki, Tatsuya Matsushima, Jumpei Arima, Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu, Yujin Tang
While natural systems often exhibit collective intelligence that allows them to self-organize and adapt to changes, the equivalent is missing in most artificial systems.
1 code implementation • 25 Nov 2022 • Hiroki Furuta, Yusuke Iwasawa, Yutaka Matsuo, Shixiang Shane Gu
The rise of generalist large-scale models in natural language and vision has raised the expectation that a massive data-driven approach could achieve broader generalization in other domains such as continuous control.
1 code implementation • 19 Nov 2021 • Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
We present the Generalized Decision Transformer (GDT) for solving any hindsight information matching (HIM) problem, and show how different choices of the feature function and the anti-causal aggregator not only recover DT as a special case, but also lead to novel Categorical DT (CDT) and Bi-directional DT (BDT) for matching different statistics of the future.
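As a concrete special case of that claim: choosing the scalar reward as the feature and a reversed cumulative sum as the anti-causal aggregator yields DT's familiar return-to-go conditioning (a tiny illustration with hypothetical helper names):

```python
import numpy as np

def anticausal_sum(features):
    # Aggregate each step's feature over the *future* of the trajectory.
    return np.cumsum(features[::-1], axis=0)[::-1]

rewards = np.array([1.0, 0.0, 2.0, 1.0])
print(anticausal_sum(rewards))   # [4. 3. 3. 1.] -- DT's return-to-go
```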
no code implementations • 10 Oct 2021 • Shixiang Shane Gu, Manfred Diaz, Daniel C. Freeman, Hiroki Furuta, Seyed Kamyar Seyed Ghasemipour, Anton Raichuk, Byron David, Erik Frey, Erwin Coumans, Olivier Bachem
While reward maximization is at the core of RL, reward engineering is not the only -- and sometimes not the easiest -- way to specify complex behaviors.
no code implementations • ICLR 2022 • Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
Inspired by the distributional and state-marginal matching literature in RL, we demonstrate that all these approaches are essentially doing hindsight information matching (HIM) -- training policies that can output the rest of the trajectory matching given statistics of future state information.
1 code implementation • NeurIPS 2021 • Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang Shane Gu
These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms: as examples, we identified that the tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO's performance but also transfer to noticeable gains in SAC.
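For reference, the "tanh Gaussian policy" named here squashes a reparameterized Gaussian sample through tanh and corrects the log-density with the change-of-variables term; a standard sketch of that component (not the paper's code):

```python
import torch
import torch.nn as nn

class TanhGaussianPolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * act_dim))

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        std = log_std.clamp(-20, 2).exp()
        u = mean + std * torch.randn_like(mean)   # reparameterized sample
        a = torch.tanh(u)                         # squash into [-1, 1]
        # Log-prob with the tanh change-of-variables correction.
        logp = torch.distributions.Normal(mean, std).log_prob(u)
        logp = logp - torch.log(1 - a.pow(2) + 1e-6)
        return a, logp.sum(-1)
```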
1 code implementation • 23 Mar 2021 • Hiroki Furuta, Tatsuya Matsushima, Tadashi Kozuno, Yutaka Matsuo, Sergey Levine, Ofir Nachum, Shixiang Shane Gu
Progress in deep reinforcement learning (RL) research is largely enabled by benchmark task environments.
4 code implementations • ICLR 2021 • Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu
We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN), which can effectively optimize a policy offline using 10-20 times less data than prior works.
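A rough structural sketch of the model-ensemble recipe as the abstract describes it (all names illustrative; the model-fitting and trust-region update loops are elided):

```python
import copy
import torch.nn as nn

def bremen_sketch(obs_dim, act_dim, K=5, iters=20):
    # 1) Ensemble of K one-step dynamics models, each fit to the offline
    #    (s, a) -> s' transitions (training loop omitted).
    models = [nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                            nn.Linear(256, obs_dim)) for _ in range(K)]
    # 2) Behavior-clone the dataset policy as a regularized starting point.
    behavior = nn.Sequential(nn.Linear(obs_dim, 256), nn.Tanh(),
                             nn.Linear(256, act_dim))
    policy = copy.deepcopy(behavior)
    for _ in range(iters):
        # 3) Imagined rollouts of `policy` in random ensemble members,
        #    followed by a conservative (trust-region) policy update.
        pass
    return policy
```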