3 code implementations • 11 Jul 2023 • Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparameterized models remains poorly understood, while training costs grow exponentially.
1 code implementation • ACL 2022 • Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky
Existing works analyzing pre-trained transformers usually focus on only one or two model families at a time, overlooking the variability of architectures and pre-training objectives.
1 code implementation • ACL 2022 • Saurabh Kulshreshtha, Olga Kovaleva, Namrata Shivagunde, Anna Rumshisky
Solving crossword puzzles requires diverse reasoning capabilities, access to a vast amount of knowledge about language and the world, and the ability to satisfy the constraints imposed by the structure of the puzzle.
1 code implementation • 29 Mar 2023 • Namrata Shivagunde, Vladislav Lialin, Anna Rumshisky
Finally, we observe that although GPT-3 generated all the examples in ROLE-1500, it is only able to solve 24.6% of them during probing.
1 code implementation • 2 Apr 2024 • Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
In contrast, the underlying pre-trained LLMs they use as a backbone are known to be brittle in this respect.