Search Results for author: Kshitij Gupta

Found 17 papers, 10 papers with code

Translate and Classify: Improving Sequence Level Classification for English-Hindi Code-Mixed Data

1 code implementation • NAACL (CALCS) 2021 • Devansh Gautam, Kshitij Gupta, Manish Shrivastava

To translate English-Hindi code-mixed data to English, we use mBART, a pre-trained multilingual sequence-to-sequence model that has shown competitive performance on various low-resource machine translation pairs and gains even on languages that were not in its pre-training corpus.

Machine Translation · Natural Language Inference · +2
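
To give a concrete flavour of the translation step, here is a minimal sketch using the public mBART-50 many-to-many checkpoint from Hugging Face transformers; the checkpoint name and language codes are standard mBART-50 identifiers, not the fine-tuned model from the paper, and treating romanized code-mixed input as Hindi is a rough approximation.

```python
# Minimal sketch: translate Hindi-English code-mixed text to English with
# mBART-50. Uses the public checkpoint, not the paper's fine-tuned weights.
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model_name = "facebook/mbart-large-50-many-to-many-mmt"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Treat the code-mixed source as Hindi; this is an approximation.
tokenizer.src_lang = "hi_IN"
text = "mujhe yeh movie bahut pasand aayi, totally worth watching"
inputs = tokenizer(text, return_tensors="pt")

# Force English as the generation target.
generated = model.generate(
    **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```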

CoMeT: Towards Code-Mixed Translation Using Parallel Monolingual Sentences

1 code implementation • NAACL (CALCS) 2021 • Devansh Gautam, Prashant Kodali, Kshitij Gupta, Anmol Goel, Manish Shrivastava, Ponnurangam Kumaraguru

Code-mixed languages are very popular in multilingual societies around the world, yet the resources needed to build robust systems for them lag behind.

Machine Translation · Translation

Towards Detecting Political Bias in Hindi News Articles

no code implementations • ACL 2022 • Samyak Agrawal, Kshitij Gupta, Devansh Gautam, Radhika Mamidi

In recent times, political propaganda has been amplified by news media portals through biased reporting, creating untruthful narratives on serious issues and misinforming public opinion in order to favour a particular political party.

Bias Detection · Transfer Learning

Scaling Instructable Agents Across Many Simulated Worlds

no code implementations • 13 Mar 2024 • SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, Zhitao Gong, Lucy Gonzales, Kshitij Gupta, Karol Gregor, Arne Olav Hallingstad, Tim Harley, Sam Haves, Felix Hill, Ed Hirst, Drew A. Hudson, Jony Hudson, Steph Hughes-Fitt, Danilo J. Rezende, Mimi Jasarevic, Laura Kampis, Rosemary Ke, Thomas Keck, Junkyung Kim, Oscar Knagg, Kavya Kopparapu, Andrew Lampinen, Shane Legg, Alexander Lerchner, Marjorie Limont, YuLan Liu, Maria Loks-Thompson, Joseph Marino, Kathryn Martin Cussons, Loic Matthey, Siobhan Mcloughlin, Piermaria Mendolicchio, Hamza Merzic, Anna Mitenkova, Alexandre Moufarek, Valeria Oliveira, Yanko Oliveira, Hannah Openshaw, Renke Pan, Aneesh Pappu, Alex Platonov, Ollie Purkiss, David Reichert, John Reid, Pierre Harvey Richemond, Tyson Roberts, Giles Ruscoe, Jaume Sanchez Elias, Tasha Sandars, Daniel P. Sawyer, Tim Scholtes, Guy Simmons, Daniel Slater, Hubert Soyer, Heiko Strathmann, Peter Stys, Allison C. Tam, Denis Teplyashin, Tayfun Terzi, Davide Vercelli, Bojan Vujatovic, Marcus Wainwright, Jane X. Wang, Zhengdong Wang, Daan Wierstra, Duncan Williams, Nathaniel Wong, Sarah York, Nick Young

Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI.

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning · Language Modelling
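
A minimal sketch of the replay ingredient described above, under assumptions of my own: `downstream` and `upstream` are iterators over training batches, and the 5% replay fraction is illustrative rather than the paper's exact setting.

```python
import random

def mixed_batches(downstream, upstream, replay_fraction=0.05):
    """Yield training batches, replacing a fraction of downstream batches
    with replayed upstream batches (illustrative continual-learning replay)."""
    for batch in downstream:
        if random.random() < replay_fraction:
            yield next(upstream)  # replayed batch from the previous dataset
        else:
            yield batch           # fresh batch from the new dataset
```

Combined with re-warming and re-decaying the learning rate on the new data (see the schedule sketch under the next paper), this is the whole recipe at the level of pseudocode.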

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
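
A sketch of a linear-warmup, cosine-decay learning-rate schedule of the kind named above; the function and its default arguments are illustrative, not the paper's training code.

```python
import math

def warmup_cosine_lr(step, total_steps, max_lr, min_lr=0.0, warmup_steps=1000):
    """Linear warmup from 0 to max_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# "Re-warming" means running this schedule again when continual pre-training
# starts on the downstream data, rather than resuming at the old, decayed LR.
```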

ARB: Advanced Reasoning Benchmark for Large Language Models

no code implementations • 25 Jul 2023 • Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge.

Math

Broken Neural Scaling Laws

1 code implementation • 26 Oct 2022 • Ethan Caballero, Kshitij Gupta, Irina Rish, David Krueger

Moreover, this functional form accurately models and extrapolates scaling behavior that other functional forms cannot express, such as the non-monotonic transitions seen in phenomena like double descent and the delayed, sharp inflection points seen in tasks like arithmetic.

Adversarial Robustness · Continual Learning · +8
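
For orientation, a sketch of a smoothly broken power law in the spirit of the paper (my transcription of the general form; see the paper for the exact parameterization and fitting procedure):

```python
import numpy as np

def broken_scaling_law(x, a, b, c0, breaks):
    """Smoothly broken power law: y = a + b * x^(-c0), with the power-law
    slope changing by c_i around scale d_i (sharpness f_i) at each break.
    `breaks` is a list of (c_i, d_i, f_i) tuples."""
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

# One break at x = 1e6, after which the loss falls more steeply.
x = np.logspace(3, 9, 7)
print(broken_scaling_law(x, a=0.1, b=5.0, c0=0.2, breaks=[(0.3, 1e6, 0.5)]))
```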

MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation

no code implementations • 1 Oct 2022 • Kshitij Gupta

We empirically demonstrate the effectiveness of self-supervised pre-training and data augmentation for zero-shot multi-lingual machine translation.

Data Augmentation · Language Modelling · +3

cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation

1 code implementation • 7 Jun 2022 • Kshitij Gupta, Devansh Gautam, Radhika Mamidi

We propose a pipeline that utilizes English-only vision-language models to train a monolingual model for a target language.

Knowledge Distillation · Question Answering · +1
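
The pipeline details are in the paper, but the knowledge-distillation ingredient can be sketched with the standard soft-target loss (a generic formulation, not necessarily the paper's exact objective):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Standard soft-target distillation: KL divergence between the
    temperature-softened teacher and student output distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature**2
```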

Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning

2 code implementations • 30 May 2022 • Aniket Didolkar, Kshitij Gupta, Anirudh Goyal, Nitesh B. Gundavarapu, Alex Lamb, Nan Rosemary Ke, Yoshua Bengio

A slow stream, recurrent in nature, aims to learn a specialized and compressed representation by forcing each chunk of $K$ time steps into a single representation that is divided into multiple vectors.

Decision Making · Inductive Bias
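
A toy reading of the chunking idea, with a crude mean-pool-plus-GRU stand-in for the paper's attention-based fast/slow interaction; all names and shapes are illustrative only.

```python
import torch

class SlowStream(torch.nn.Module):
    """Toy slow stream: compress each chunk of K time steps into M vectors."""

    def __init__(self, dim, M):
        super().__init__()
        self.M = M
        self.cell = torch.nn.GRUCell(dim, dim * M)

    def forward(self, x, K):                 # x: (batch, T, dim), T % K == 0
        B, T, D = x.shape
        chunks = x.reshape(B, T // K, K, D)  # one slow update per K steps
        state = x.new_zeros(B, D * self.M)
        for n in range(T // K):
            summary = chunks[:, n].mean(dim=1)  # crude stand-in for attention
            state = self.cell(summary, state)   # recurrent slow-stream update
        return state.view(B, self.M, D)         # M compressed vectors
```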

Volta at SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables using TAPAS and Transfer Learning

1 code implementation • SEMEVAL 2021 • Devansh Gautam, Kshitij Gupta, Manish Shrivastava

We fine-tune TAPAS (a model that extends BERT's architecture to capture tabular structure) for both subtasks, as it has shown state-of-the-art performance on various table understanding tasks.

Logical Reasoning · Transfer Learning
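
As a rough sketch of statement verification with TAPAS through Hugging Face transformers; this uses the public TabFact-finetuned checkpoint rather than the authors' SemEval fine-tune, and the example table is invented.

```python
import pandas as pd
from transformers import TapasForSequenceClassification, TapasTokenizer

name = "google/tapas-base-finetuned-tabfact"  # public checkpoint, not the paper's
tokenizer = TapasTokenizer.from_pretrained(name)
model = TapasForSequenceClassification.from_pretrained(name)

# TAPAS expects the table as a DataFrame of strings.
table = pd.DataFrame({"Model": ["TAPAS", "BERT"], "Handles tables": ["yes", "no"]})
statement = "TAPAS handles tables."

inputs = tokenizer(table=table, queries=[statement], return_tensors="pt")
logits = model(**inputs).logits
# Assumes label id 1 = entailed, as in this checkpoint's config.
print("entailed" if logits.argmax(-1).item() == 1 else "refuted")
```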
