no code implementations • 19 Feb 2024 • Pengrui Han, Rafal Kocielnik, Adhithya Saravanan, Roy Jiang, Or Sharir, Anima Anandkumar
Our results reveal that: (1) ChatGPT can efficiently produce high-quality training data for debiasing other LLMs; (2) data produced via our approach surpasses existing datasets in debiasing performance while also preserving internal knowledge of a pre-trained LLM; and (3) synthetic data exhibits generalizability across categories, effectively mitigating various biases, including intersectional ones.
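As a rough illustration of the pipeline this abstract describes, the sketch below prompts a chat LLM for counter-stereotypical sentences. `query_llm`, the prompt wording, and the category list are hypothetical stand-ins, not the paper's actual prompts or data.

```python
# Hypothetical sketch: prompting a chat LLM (e.g., ChatGPT) to generate
# counter-stereotypical training sentences for debiasing a smaller LM.

def query_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to a chat-completion client and return the reply."""
    raise NotImplementedError("wire up your chat-completion client here")

# Illustrative categories, including an intersectional one.
BIAS_CATEGORIES = ["gender", "religion", "race", "gender x race"]

def generate_debiasing_examples(category: str, n: int = 20) -> list[str]:
    prompt = (
        f"Write {n} short, natural sentences that counter common {category} "
        "stereotypes, one per line. Keep them fluent and factual."
    )
    reply = query_llm(prompt)
    return [line.strip() for line in reply.splitlines() if line.strip()]

# The generated sentences would then be used to fine-tune the target LM,
# e.g., with a standard language-modeling loss.
```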
no code implementations • 27 Jul 2023 • Or Sharir, Anima Anandkumar
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
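A minimal sketch of the incremental-inference idea motivated here: reuse cached activations and recompute only the positions that changed. This simplification is exact only for position-wise layers; handling layers that mix positions is the harder problem the paper targets.

```python
import numpy as np

def incremental_pointwise(layer, new_input, cache):
    """layer: f(vector) -> vector, applied independently per position.
    cache: (old_input, old_output) from the previous call."""
    old_input, old_output = cache
    output = old_output.copy()
    changed = np.any(new_input != old_input, axis=-1)  # positions whose features changed
    for i in np.flatnonzero(changed):
        output[i] = layer(new_input[i])                # recompute only those positions
    return output, (new_input.copy(), output)

# Usage: only position 2 changes, so only one position is recomputed.
layer = np.tanh                          # stand-in position-wise layer
x0 = np.zeros((5, 3))
cache = (x0, np.tanh(x0))
x1 = x0.copy(); x1[2] = 1.0
y, cache = incremental_pointwise(layer, x1, cache)
```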
no code implementations • 21 Dec 2022 • Or Sharir, Garnet Kin-Lic Chan, Anima Anandkumar
Quantum many-body problems are some of the most challenging problems in science and are central to demystifying exotic quantum phenomena, e.g., high-temperature superconductors.
no code implementations • 18 Mar 2021 • Or Sharir, Amnon Shashua, Giuseppe Carleo
We establish a direct connection between general tensor networks and deep feed-forward artificial neural networks.
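One concrete face of this connection (my illustration, not the paper's construction): evaluating a matrix product state on a basis configuration is a chain of matrix-vector products, structurally a deep feed-forward pass with one "layer" per site.

```python
import numpy as np

def mps_amplitude(tensors, config):
    """tensors[i]: array of shape (phys_dim, D_left, D_right); config: list of ints."""
    v = np.ones(1)                  # left boundary vector (D_left of site 0 is 1)
    for A, s in zip(tensors, config):
        v = v @ A[s]                # one tensor contraction == one linear layer
    return v.item()                 # right boundary dimension is 1

# Example: random 4-site MPS with physical dimension 2 and bond dimension 3.
rng = np.random.default_rng(0)
dims = [1, 3, 3, 3, 1]
mps = [rng.standard_normal((2, dims[i], dims[i + 1])) for i in range(4)]
print(mps_amplitude(mps, [0, 1, 1, 0]))
```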
no code implementations • 30 Jun 2020 • Yoel Zeldes, Dan Padnos, Or Sharir, Barak Peleg
We introduce a simple and efficient method, called Auxiliary Tuning, for adapting a pre-trained Language Model to a novel task; we demonstrate this approach on the task of conditional text generation.
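A numpy sketch of the logit-combination idea as I read it: the frozen pre-trained LM's logits are summed with a small trainable auxiliary model's logits before the softmax, so only the auxiliary parameters need training. The random "models" and shapes below are stand-ins.

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

vocab = 50_000
frozen_lm_logits = np.random.randn(vocab)   # from the frozen pre-trained LM (fluency)
auxiliary_logits = np.random.randn(vocab)   # from the trainable task-specific model

# Only the auxiliary model would receive gradients during training.
next_token_probs = softmax(frozen_lm_logits + auxiliary_logits)
```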
1 code implementation • NeurIPS 2020 • Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua
Our guidelines elucidate the depth-to-width trade-off in self-attention networks of sizes up to the scale of GPT-3 (which we project to be too deep for its size) and beyond, marking an unprecedented width of 30K as optimal for a 1-trillion-parameter network.
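A back-of-envelope check of that operating point, using the common params ≈ 12 · depth · width² estimate for a transformer stack (my approximation, not the paper's derivation):

```python
# Depth implied by a parameter budget at a given width, under the rough
# params ~= 12 * depth * width**2 transformer estimate (an assumption).
def depth_for(params: float, width: float) -> float:
    return params / (12 * width ** 2)

print(depth_for(1e12, 30_000))   # ~92.6 layers for a 1T-parameter, 30K-wide model
```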
no code implementations • 19 Apr 2020 • Or Sharir, Barak Peleg, Yoav Shoham
We review the cost of training large-scale language models, and the drivers of these costs.
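For a feel of the arithmetic involved, here is an illustrative cost model using the common ~6 FLOPs-per-parameter-per-token approximation from the scaling-law literature; the hardware price assumption is mine, not the report's.

```python
# Illustrative only: training compute is often approximated as
# ~6 FLOPs per parameter per training token.
def training_cost_usd(params, tokens, flops_per_dollar=1e17):
    total_flops = 6 * params * tokens
    return total_flops / flops_per_dollar

# e.g., a 1.5B-parameter model trained on 40B tokens:
print(f"${training_cost_usd(1.5e9, 40e9):,.0f}")
```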
no code implementations • ACL 2020 • Yoav Levine, Barak Lenz, Or Dagan, Ori Ram, Dan Padnos, Or Sharir, Shai Shalev-Shwartz, Amnon Shashua, Yoav Shoham
The ability to learn from large unlabeled corpora has allowed neural language models to advance the frontier in natural language understanding.
Ranked #11 for Word Sense Disambiguation on the Words in Context benchmark
2 code implementations • 11 Feb 2019 • Or Sharir, Yoav Levine, Noam Wies, Giuseppe Carleo, Amnon Shashua
Artificial Neural Networks were recently shown to be an efficient representation of highly entangled many-body quantum states.
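A minimal rendering of such a representation: a restricted Boltzmann machine (RBM) parameterizing the unnormalized amplitude of a spin configuration, in the style of neural-network quantum states (my simplified sketch, not this paper's specific architecture).

```python
import numpy as np

def rbm_amplitude(s, a, b, W):
    """s: spins in {-1,+1}; a: visible biases; b: hidden biases; W: couplings."""
    theta = b + W @ s
    return np.exp(a @ s) * np.prod(2 * np.cosh(theta))   # hidden units traced out

rng = np.random.default_rng(1)
n_vis, n_hid = 6, 12
a = rng.normal(size=n_vis) * 0.1
b = rng.normal(size=n_hid) * 0.1
W = rng.normal(size=(n_hid, n_vis)) * 0.1
spins = rng.choice([-1.0, 1.0], size=n_vis)
print(rbm_amplitude(spins, a, b, W))
```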
no code implementations • 26 Mar 2018 • Yoav Levine, Or Sharir, Nadav Cohen, Amnon Shashua
Modern deep learning has enabled unprecedented achievements in various domains.
no code implementations • ICLR 2018 • Yoav Levine, Or Sharir, Amnon Shashua
We prove that deep recurrent networks support Start-End separation ranks which are exponentially higher than those supported by their shallow counterparts.
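For reference, the Start-End separation rank behind this claim can be written as follows (notation mine, following the standard definition in this line of work): it is the minimal number of summands needed to split a function over the partition of input positions into a "start" set S and an "end" set E.

```latex
% Separation rank of y with respect to the partition (S, E) of input positions:
\mathrm{sep}_{S,E}(y) \;=\; \min\Big\{ R \;:\; y(\mathbf{x}_S, \mathbf{x}_E)
  \,=\, \sum_{r=1}^{R} g_r(\mathbf{x}_S)\, h_r(\mathbf{x}_E) \Big\}
```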
1 code implementation • 25 Oct 2017 • Yoav Levine, Or Sharir, Alon Ziv, Amnon Shashua
A key attribute driving the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks involving sequential data is their ability to model intricate long-term temporal dependencies.
no code implementations • 12 Oct 2017 • Or Sharir, Amnon Shashua
We present a novel tractable generative model that extends Sum-Product Networks (SPNs) and significantly boosts their power.
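To ground the SPN machinery being extended (generic SPN evaluation, not the specific extension this paper introduces): values flow bottom-up through sum and product nodes, and marginalizing a variable amounts to setting its leaf indicators to 1 before the same single pass, which is what makes these models tractable.

```python
import numpy as np

def eval_spn(node, evidence):
    kind = node["type"]
    if kind == "leaf":                      # indicator leaf for X_var = val
        var, val = node["var"], node["val"]
        return 1.0 if evidence.get(var) in (val, None) else 0.0
    children = [eval_spn(c, evidence) for c in node["children"]]
    if kind == "product":
        return float(np.prod(children))
    if kind == "sum":
        return float(np.dot(node["weights"], children))

leaf = lambda v, x: {"type": "leaf", "var": v, "val": x}
spn = {"type": "sum", "weights": [0.3, 0.7], "children": [
    {"type": "product", "children": [leaf(0, 1), leaf(1, 0)]},
    {"type": "product", "children": [leaf(0, 0), leaf(1, 1)]},
]}
print(eval_spn(spn, {0: 1, 1: 0}))   # joint P(X0=1, X1=0) = 0.3
print(eval_spn(spn, {0: 1}))         # marginal over X1, same single pass: 0.3
```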
no code implementations • 5 May 2017 • Nadav Cohen, Or Sharir, Yoav Levine, Ronen Tamari, David Yakira, Amnon Shashua
Expressive efficiency refers to the ability of a network architecture to realize functions that would require an alternative architecture to be much larger in order to replicate them.
1 code implementation • ICLR 2018 • Or Sharir, Amnon Shashua
Expressive efficiency refers to the relation between two architectures A and B, whereby any function realized by B could be replicated by A, but there exist functions realized by A that cannot be replicated by B unless its size grows significantly larger.
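Stated formally (notation mine), with r_A(f) denoting the minimal size at which architecture A realizes a function f, and H_A the set of functions A can realize:

```latex
% A is expressively efficient with respect to B iff:
\text{(i)}\;\; \forall f \in \mathcal{H}_B:\quad r_A(f) = O\!\big(\mathrm{poly}(r_B(f))\big),
\qquad
\text{(ii)}\;\; \exists f \in \mathcal{H}_A:\quad r_B(f)\ \text{is super-polynomial in}\ r_A(f).
```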
2 code implementations • 13 Oct 2016 • Or Sharir, Ronen Tamari, Nadav Cohen, Amnon Shashua
Other methods, based on arithmetic circuits and sum-product networks, do allow tractable marginalization, but their performance is challenged by the need to learn the structure of a circuit.
no code implementations • 16 Sep 2015 • Nadav Cohen, Or Sharir, Amnon Shashua
In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling.
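A rough sketch of the layer pair such an architecture alternates (my simplified reading): a 1x1 "conv" linearly mixes channels at each location, and product pooling multiplies activations across spatial windows, the arithmetic-circuit analogue of conv + max/average pooling.

```python
import numpy as np

def conv1x1(x, w):
    """x: (length, in_ch), w: (in_ch, out_ch) -> (length, out_ch)."""
    return x @ w

def product_pool(x, window=2):
    """Multiply activations over non-overlapping windows along the length axis."""
    L, C = x.shape
    return x.reshape(L // window, window, C).prod(axis=1)

rng = np.random.default_rng(2)
x = rng.random((8, 4))                               # 8 locations, 4 channels
h = product_pool(conv1x1(x, rng.random((4, 5))))     # -> (4, 5): locality + pooling
print(h.shape)
```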
no code implementations • CVPR 2016 • Nadav Cohen, Or Sharir, Amnon Shashua
We present a deep layered architecture that generalizes convolutional neural networks (ConvNets).