no code implementations • 9 Mar 2025 • Hritik Bansal, Clark Peng, Yonatan Bitton, Roman Goldenberg, Aditya Grover, Kai-Wei Chang
We perform human evaluation that assesses semantic adherence, physical commonsense, and grounding of physical rules in the generated videos.
1 code implementation • 26 Feb 2025 • Mehran Kazemi, Bahare Fatemi, Hritik Bansal, John Palowitch, Chrysovalantis Anastasiou, Sanket Vaibhav Mehta, Lalit K. Jain, Virginia Aglietti, Disha Jindal, Peter Chen, Nishanth Dikkala, Gladys Tyen, Xin Liu, Uri Shalit, Silvia Chiappa, Kate Olszewska, Yi Tay, Vinh Q. Tran, Quoc V. Le, Orhan Firat
One particular exception is the BIG-Bench dataset, which has served as a crucial benchmark for evaluating the general reasoning capabilities of LLMs, thanks to its diverse set of challenging tasks that allowed for a comprehensive assessment of general reasoning across various skills within a unified framework.
1 code implementation • 17 Dec 2024 • Hritik Bansal, Daniel Israel, Siyan Zhao, Shufan Li, Tung Nguyen, Aditya Grover
To address these gaps, we present MedMax, the first large-scale multimodal biomedical instruction-tuning dataset for mixed-modal foundation models.
no code implementations • 29 Aug 2024 • Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi
Training on high-quality synthetic data from strong language models (LMs) is a common strategy to improve the reasoning performance of LMs.
no code implementations • 27 Aug 2024 • Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
We demonstrate that GenRM outperforms discriminative verifiers, DPO verifiers, and LLM-as-a-Judge, resulting in a 16-40% improvement in the number of problems solved with Best-of-N on algorithmic and math reasoning tasks.
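To make the verification setup concrete, below is a minimal sketch of Best-of-N selection with a GenRM-style generative verifier: each candidate solution is scored by the probability the verifier assigns to answering "Yes" when asked whether the solution is correct. The prompt template is an illustrative assumption, and GPT-2 stands in for a trained verifier; the paper's actual recipe differs in detail.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder verifier; a real GenRM would be a model fine-tuned to verify.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def p_yes(question: str, solution: str) -> float:
    """Score a solution by the next-token probability of ' Yes'."""
    prompt = (f"Question: {question}\nSolution: {solution}\n"
              "Is the solution correct? Answer:")
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]        # next-token distribution
    yes_id = tok(" Yes", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits, dim=-1)[yes_id].item()

def best_of_n(question: str, candidates: list[str]) -> str:
    """Pick the candidate the verifier is most confident is correct."""
    return max(candidates, key=lambda s: p_yes(question, s))
```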
1 code implementation • 2 Jul 2024 • Cheng-Yi Li, Kao-Jung Chang, Cheng-Fu Yang, Hsin-Yu Wu, WenTing Chen, Hritik Bansal, Ling Chen, Yi-Ping Yang, Yu-Chun Chen, Shih-Pin Chen, Jiing-Feng Lirng, Kai-Wei Chang, Shih-Hwa Chiou
Multi-modal large language models (MLLMs) have been given free rein to explore exciting medical applications with a primary focus on radiology report generation.
3 code implementations • 17 Jun 2024 • Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, Maciej Kilian, Hanlin Zhang, Rulin Shao, Sarah Pratt, Sunny Sanyal, Gabriel Ilharco, Giannis Daras, Kalyani Marathe, Aaron Gokaslan, Jieyu Zhang, Khyathi Chandu, Thao Nguyen, Igor Vasiljevic, Sham Kakade, Shuran Song, Sujay Sanghavi, Fartash Faghri, Sewoong Oh, Luke Zettlemoyer, Kyle Lo, Alaaeldin El-Nouby, Hadi Pouransari, Alexander Toshev, Stephanie Wang, Dirk Groeneveld, Luca Soldaini, Pang Wei Koh, Jenia Jitsev, Thomas Kollar, Alexandros G. Dimakis, Yair Carmon, Achal Dave, Ludwig Schmidt, Vaishaal Shankar
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models.
no code implementations • 5 Jun 2024 • Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover
Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts, synthesize realistic motions, and render complex objects.
1 code implementation • 7 May 2024 • Hritik Bansal, Yonatan Bitton, Michal Yarom, Idan Szpektor, Aditya Grover, Kai-Wei Chang
For instance, we condition the visual features of the earlier and later scenes of the generated video with the representations of the first scene description (e.g., 'a red panda climbing a tree') and second scene description (e.g., 'the red panda sleeps on the top of the tree'), respectively.
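As a rough illustration of this kind of per-segment conditioning, the toy sketch below lets each half of a frame sequence cross-attend only to its own scene caption's token embeddings. The shapes, the two-scene split, and the attention module are all assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

B, T, D = 2, 16, 64                       # batch, frames, feature dim
frames = torch.randn(B, T, D)             # video features over time
captions = torch.randn(B, 2, 8, D)        # two scene captions, 8 tokens each

attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)

# Earlier frames condition on caption 0, later frames on caption 1.
segments = frames.chunk(2, dim=1)
out = []
for k, seg in enumerate(segments):
    ctx = captions[:, k]                  # (B, 8, D) tokens for scene k
    seg_out, _ = attn(query=seg, key=ctx, value=ctx)
    out.append(seg_out)
conditioned = torch.cat(out, dim=1)       # (B, T, D) scene-aligned features
```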
no code implementations • 7 Apr 2024 • Hritik Bansal, Po-Nien Kung, P. Jeffrey Brantingham, Kai-Wei Chang, Nanyun Peng
In this paper, we propose GenEARL, a training-free generative framework that harnesses the power of modern generative models to understand event task descriptions given image contexts to perform the event argument role labeling (EARL) task.
no code implementations • 1 Apr 2024 • Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang
In this survey, we review prior studies on dimensions of bias: Gender, Skintone, and Geo-Culture.
1 code implementation • 31 Mar 2024 • Hritik Bansal, Ashima Suvarna, Gantavya Bhatt, Nanyun Peng, Kai-Wei Chang, Aditya Grover
A common technique for aligning large language models (LLMs) relies on acquiring human preferences by comparing multiple generations conditioned on a fixed context.
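For context, here is a minimal sketch of that standard collection setup: sample several completions for one fixed prompt, rank them, and turn the ranking into (chosen, rejected) pairs. `sample_completion` and `rank` are hypothetical placeholders for an LLM sampler and a human or model annotator.

```python
from itertools import combinations
from typing import Callable, List, Tuple

def preference_pairs(
    prompt: str,
    n: int,
    sample_completion: Callable[[str], str],
    rank: Callable[[List[str]], List[int]],  # lower rank = more preferred
) -> List[Tuple[str, str]]:
    """Sample n completions for a fixed prompt and emit preference pairs."""
    gens = [sample_completion(prompt) for _ in range(n)]
    ranked = [g for _, g in sorted(zip(rank(gens), gens))]
    # Each higher-ranked completion is "chosen" over each lower-ranked one.
    return list(combinations(ranked, 2))
```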
no code implementations • 5 Mar 2024 • Zefan Cai, Po-Nien Kung, Ashima Suvarna, Mingyu Derek Ma, Hritik Bansal, Baobao Chang, P. Jeffrey Brantingham, Wei Wang, Nanyun Peng
We hypothesize that a diverse set of event types and definitions is key for models to learn to follow event definitions, whereas existing event extraction datasets focus on annotating many high-quality examples for only a few event types.
1 code implementation • 24 Jan 2024 • Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang, Nanyun Peng
We conduct experiments to assess the performance of 14 foundation models (including GPT-4V, Gemini-Pro-Vision, and LLaVA-Next) and establish a human performance baseline.
1 code implementation • 6 Dec 2023 • Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Sandeep Madireddy, Aditya Grover
At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals.
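The sketch below is a toy version of such a randomized objective, assuming a simple MLP and synthetic states: each training step samples a lead time, embeds it, and supervises the model on the state that far ahead. Stormer's actual architecture, variables, and intervals differ.

```python
import random
import torch
import torch.nn as nn

class LeadTimeForecaster(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dt_embed = nn.Embedding(4, dim)   # 4 lead times, e.g. 6h..24h
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, state, dt_idx):
        # Condition the forecast on the sampled lead time.
        return self.net(state + self.dt_embed(dt_idx))

model = LeadTimeForecaster(dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
trajectory = torch.randn(100, 32)              # synthetic weather states

for step in range(50):
    t = random.randrange(90)
    k = random.randrange(4)                    # random lead-time index
    x, y = trajectory[t], trajectory[t + k + 1]
    loss = nn.functional.mse_loss(model(x, torch.tensor(k)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```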
1 code implementation • CVPR 2024 • Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover
Despite being (pre)trained on a massive amount of data, state-of-the-art video-language alignment models are not robust to semantically plausible contrastive changes in the video captions.
1 code implementation • 3 Oct 2023 • Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chunyuan Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
To bridge this gap, we present MathVista, a benchmark designed to combine challenges from diverse mathematical and visual tasks.
1 code implementation • 30 Aug 2023 • Hritik Bansal, John Dang, Aditya Grover
In particular, we find that LLMs that leverage rankings data for alignment (say model X) are preferred over those that leverage ratings data (say model Y) under a rank-based evaluation protocol (is X/Y's response better than the reference response?).
1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt
These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatic evaluation of candidate multimodal generations using a text-only LLM, aligning with human judgment.
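A minimal sketch of how such reference-backed, text-only judging can work: because every instance carries a dense image description and a verified reference output, a text-only LLM can score a candidate without seeing the image. The prompt template and the `judge` callable are illustrative assumptions, not the benchmark's exact protocol.

```python
from typing import Callable

JUDGE_TEMPLATE = """Image description: {description}
Instruction: {instruction}
Reference answer: {reference}
Candidate answer: {candidate}
Rate from 1-10 how well the candidate follows the instruction, judged
against the description and reference. Reply with a single number."""

def score_candidate(description: str, instruction: str, reference: str,
                    candidate: str, judge: Callable[[str], str]) -> int:
    """Ask a text-only LLM to grade a multimodal model's response."""
    prompt = JUDGE_TEMPLATE.format(description=description,
                                   instruction=instruction,
                                   reference=reference,
                                   candidate=candidate)
    return int(judge(prompt).strip().split()[0])
```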
1 code implementation • NeurIPS 2023 • Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover
Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts.
1 code implementation • 23 May 2023 • Da Yin, Xiao Liu, Fan Yin, Ming Zhong, Hritik Bansal, Jiawei Han, Kai-Wei Chang
Instruction tuning has emerged as a way to enhance the capabilities of large language models (LLMs) to comprehend instructions and generate appropriate responses.
1 code implementation • ICCV 2023 • Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang
Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data.
1 code implementation • 5 Feb 2023 • Hritik Bansal, Aditya Grover
Recent research on robustness has revealed significant performance gaps between neural image classifiers trained on datasets that are similar to the test set, and those that are from a naturally shifted distribution, such as sketches, paintings, and animations of the object categories observed during training.
1 code implementation • 18 Dec 2022 • Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth
Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: ~70% of attention heads and ~20% of feed-forward networks can be removed with minimal decline in task performance.
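The head-pruning side of this finding can be illustrated with the Hugging Face `prune_heads` utility on a small model. GPT-2 stands in for OPT-66B purely for illustration, and the pruned head indices are arbitrary.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
ids = tok("The capital of France is Paris.", return_tensors="pt").input_ids

def lm_loss(model) -> float:
    """Language-modeling loss on one probe sentence."""
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()

base = GPT2LMHeadModel.from_pretrained("gpt2").eval()
print(f"baseline loss: {lm_loss(base):.3f}")

pruned = GPT2LMHeadModel.from_pretrained("gpt2").eval()
pruned.prune_heads({0: [3, 7], 5: [0]})   # {layer: [head indices to drop]}
print(f"pruned loss:   {lm_loss(pruned):.3f}")
```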
1 code implementation • 27 Oct 2022 • Hritik Bansal, Da Yin, Masoud Monajatipoor, Kai-Wei Chang
To this end, we introduce an Ethical NaTural Language Interventions in Text-to-Image GENeration (ENTIGEN) benchmark dataset to evaluate the change in image generations conditional on ethical interventions across three social axes -- gender, skin color, and culture.
1 code implementation • 28 May 2022 • Shashank Goel, Hritik Bansal, Sumit Bhatia, Ryan A. Rossi, Vishwa Vinay, Aditya Grover
Recent advances in contrastive representation learning over paired image-text data have led to models such as CLIP that achieve state-of-the-art performance for zero-shot classification and distributional robustness.
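For context, the symmetric contrastive objective underlying CLIP-style training looks roughly like the sketch below, with random features standing in for the image and text encoders; this shows only the base loss, not this paper's modifications to it.

```python
import torch
import torch.nn.functional as F

def clip_loss(img, txt, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img = F.normalize(img, dim=-1)
    txt = F.normalize(txt, dim=-1)
    logits = img @ txt.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(img.size(0))       # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

img_emb, txt_emb = torch.randn(8, 512), torch.randn(8, 512)
print(clip_loss(img_emb, txt_emb).item())
```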
1 code implementation • 24 May 2022 • Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang
In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs.
1 code implementation • 10 Feb 2021 • Hritik Bansal, Gantavya Bhatt, Pankaj Malhotra, Prathosh A. P.
Systematic generalization aims to evaluate reasoning about novel combinations from known components, an intrinsic property of human cognition.
1 code implementation • SCiL 2021 • Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal
However, we observe that several RNN types, including the ONLSTM, which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors (intervening nouns whose number differs from the subject's, as in 'the keys to the cabinet') when trained solely on sentences with attractors.
1 code implementation • ACL 2020 • Gantavya Bhatt, Hritik Bansal, Rishubh Singh, Sumeet Agarwal
Long short-term memory (LSTM) networks and their variants are capable of encapsulating long-range dependencies, which is evident from their performance on a variety of linguistic tasks.
Ranked #35 on Language Modelling on WikiText-103 (validation perplexity).
no code implementations • 21 Apr 2019 • Ujjwal Gupta, Hritik Bansal, Deepak Joshi
In this paper, we develop a sex-specific and age-dependent classification method to diagnose Parkinson's disease using online handwriting recorded from individuals with Parkinson's (n=37; m/f: 19/18; age: 69.3±10.9 years) and healthy controls (n=38; m/f: 20/18; age: 62.4±11.3 years). The sex-specific and age-dependent classifiers significantly outperformed the generalized classifier.
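A toy sketch of the stratified idea, assuming synthetic features (the study's features come from online handwriting): fit one classifier per sex group and route test samples by group, versus a single generalized model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # stand-in handwriting features
sex = rng.integers(0, 2, size=200)    # 0 = male, 1 = female
y = rng.integers(0, 2, size=200)      # 1 = Parkinson's, 0 = control

# Generalized classifier: one model for everyone.
general = RandomForestClassifier(random_state=0).fit(X, y)

# Sex-specific classifiers: one model per group, routed at test time.
models = {s: RandomForestClassifier(random_state=0).fit(X[sex == s],
                                                        y[sex == s])
          for s in (0, 1)}

def predict_stratified(x_row: np.ndarray, s: int) -> int:
    """Route a sample to its sex-specific model."""
    return int(models[s].predict(x_row.reshape(1, -1))[0])
```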