Search Results for author: Aryaman Arora

Found 25 papers, 14 papers with code

Bhāṣācitra: Visualising the dialect geography of South Asia

1 code implementation ACL (LChange) 2021 Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu

We present Bhāṣācitra, a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data.

Universal Dependencies for Punjabi

no code implementations LREC 2022 Aryaman Arora

We introduce the first Universal Dependencies treebank for Punjabi (written in the Gurmukhi script) and discuss corpus design and linguistic phenomena encountered in annotation.

POS

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation NAACL (SIGMORPHON) 2022 Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

ReFT: Representation Finetuning for Language Models

2 code implementations 4 Apr 2024 Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

3 code implementations 12 Mar 2024 Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

1 code implementation 19 Feb 2024 Aryaman Arora, Dan Jurafsky, Christopher Potts

Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons).

Benchmarking, Interpretability Techniques for Deep Learning

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

1 code implementation 23 Jan 2024 Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".

IruMozhi: Automatically classifying diglossia in Tamil

no code implementations 13 Nov 2023 Kabilan Prasanna, Aryaman Arora

Tamil, a Dravidian language of South Asia, is a highly diglossic language with two very different registers in everyday use: Literary Tamil (preferred in writing and formal communication) and Spoken Tamil (confined to speech and informal media).

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

1 code implementation 27 Aug 2023 Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang

Furthermore, we release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability by the community.

Question Answering, Text Generation +1

Jambu: A historical linguistic database for South Asian languages

1 code implementation 5 Jun 2023 Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala

We introduce Jambu, a cognate database of South Asian languages which unifies dozens of previous sources in a structured and accessible format.

CGELBank Annotation Manual v1.0

1 code implementation 27 May 2023 Brett Reynolds, Nathan Schneider, Aryaman Arora

CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language.

Localizing Model Behavior with Path Patching

1 code implementation 12 Apr 2023 Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora

Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes.

CGELBank: CGEL as a Framework for English Syntax Annotation

1 code implementation 1 Oct 2022 Brett Reynolds, Aryaman Arora, Nathan Schneider

We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.

MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi

no code implementations LREC 2022 Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi.

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Estimating the Entropy of Linguistic Distributions

no code implementations ACL 2022 Aryaman Arora, Clara Meister, Ryan Cotterell

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language.

Computational historical linguistics and language diversity in South Asia

no code implementations ACL 2022 Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala

South Asia is home to a plethora of languages, many of which severely lack access to new language technologies.

For the Purpose of Curry: A UD Treebank for Ashokan Prakrit

no code implementations UDW (SyntaxFest) 2021 Adam Farris, Aryaman Arora

We present the first linguistically annotated treebank of Ashokan Prakrit, an early Middle Indo-Aryan dialect continuum attested through Emperor Ashoka Maurya's 3rd century BCE rock and pillar edicts.

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

1 code implementation COLING (LAW) 2020 Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Bradford Salen, Nathan Schneider

We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish.

Bhasacitra: Visualising the dialect geography of South Asia

no code implementations 28 May 2021 Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu

We present Bhasacitra, a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data.

Hindi-Urdu Adposition and Case Supersenses v1.0

no code implementations 2 Mar 2021 Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

These are the guidelines for the application of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al. 2018) to Modern Standard Hindi of Delhi.

Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi

1 code implementation ACL 2020 Aryaman Arora, Luke Gessler, Nathan Schneider

Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted).
