Search Results for author: Aryaman Arora

Found 25 papers, 14 papers with code

Bhāṣācitra: Visualising the dialect geography of South Asia

1 code implementation • ACL (LChange) 2021 • Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu

We present Bhāṣācitra, a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data.

SNACS Annotation of Case Markers and Adpositions in Hindi

no code implementations • SCiL 2021 • Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

Universal Dependencies for Punjabi

no code implementations • LREC 2022 • Aryaman Arora

We introduce the first Universal Dependencies treebank for Punjabi (written in the Gurmukhi script) and discuss corpus design and linguistic phenomena encountered in annotation.

POS

SIGMORPHON–UniMorph 2022 Shared Task 0: Generalization and Typologically Diverse Morphological Inflection

1 code implementation • NAACL (SIGMORPHON) 2022 • Jordan Kodner, Salam Khalifa, Khuyagbaatar Batsuren, Hossep Dolatian, Ryan Cotterell, Faruk Akkus, Antonios Anastasopoulos, Taras Andrushko, Aryaman Arora, Nona Atanalov, Gábor Bella, Elena Budianskaya, Yustinus Ghanggo Ate, Omer Goldman, David Guriel, Simon Guriel, Silvia Guriel-Agiashvili, Witold Kieraś, Andrew Krizhanovsky, Natalia Krizhanovsky, Igor Marchenko, Magdalena Markowska, Polina Mashkovtseva, Maria Nepomniashchaya, Daria Rodionova, Karina Scheifer, Alexandra Sorova, Anastasia Yemelina, Jeremiah Young, Ekaterina Vylomova

The 2022 SIGMORPHON–UniMorph shared task on large scale morphological inflection generation included a wide range of typologically diverse languages: 33 languages from 11 top-level language families: Arabic (Modern Standard), Assamese, Braj, Chukchi, Eastern Armenian, Evenki, Georgian, Gothic, Gujarati, Hebrew, Hungarian, Itelmen, Karelian, Kazakh, Ket, Khalkha Mongolian, Kholosi, Korean, Lamahalot, Low German, Ludic, Magahi, Middle Low German, Old English, Old High German, Old Norse, Polish, Pomak, Slovak, Turkish, Upper Sorbian, Veps, and Xibe.

Morphological Inflection

ReFT: Representation Finetuning for Language Models

2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

3 code implementations • 12 Mar 2024 • Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing

CausalGym: Benchmarking causal interpretability methods on linguistic tasks

1 code implementation • 19 Feb 2024 • Aryaman Arora, Dan Jurafsky, Christopher Potts

Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons).

Ranked #1 on Interpretability Techniques for Deep Learning on CausalGym

Benchmarking, Interpretability Techniques for Deep Learning

Predicting positive transfer for improved low-resource speech recognition using acoustic pseudo-tokens

no code implementations • 3 Feb 2024 • Nay San, Georgios Paraskevopoulos, Aryaman Arora, Xiluo He, Prabhjot Kaur, Oliver Adams, Dan Jurafsky

Continued pre-training on 70-200 hours of untranscribed speech in these languages can help -- but what about languages without that much recorded data?

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +1

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

1 code implementation • 23 Jan 2024 • Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".

IruMozhi: Automatically classifying diglossia in Tamil

no code implementations • 13 Nov 2023 • Kabilan Prasanna, Aryaman Arora

Tamil, a Dravidian language of South Asia, is a highly diglossic language with two very different registers in everyday use: Literary Tamil (preferred in writing and formal communication) and Spoken Tamil (confined to speech and informal media).

Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP

1 code implementation • 27 Aug 2023 • Vedant Palit, Rohan Pandey, Aryaman Arora, Paul Pu Liang

Furthermore, we release our BLIP causal tracing tool as open source to enable further experimentation in vision-language mechanistic interpretability by the community.

Question Answering, Text Generation +1

Jambu: A historical linguistic database for South Asian languages

1 code implementation • 5 Jun 2023 • Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala

We introduce Jambu, a cognate database of South Asian languages which unifies dozens of previous sources in a structured and accessible format.

CGELBank Annotation Manual v1.0

1 code implementation • 27 May 2023 • Brett Reynolds, Nathan Schneider, Aryaman Arora

CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language.

Localizing Model Behavior with Path Patching

1 code implementation • 12 Apr 2023 • Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora

Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes.

CGELBank: CGEL as a Framework for English Syntax Annotation

1 code implementation • 1 Oct 2022 • Brett Reynolds, Aryaman Arora, Nathan Schneider

We introduce the syntactic formalism of the Cambridge Grammar of the English Language (CGEL) to the world of treebanking through the CGELBank project.

The SIGMORPHON 2022 Shared Task on Morpheme Segmentation

1 code implementation • NAACL (SIGMORPHON) 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova

The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.

Ranked #8 on Morpheme Segmentation on UniMorph 4.0

Morpheme Segmentation, Segmentation +1

MASALA: Modelling and Analysing the Semantics of Adpositions in Linguistic Annotation of Hindi

no code implementations • LREC 2022 • Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

We present a completed, publicly available corpus of annotated semantic relations of adpositions and case markers in Hindi.

UniMorph 4.0: Universal Morphology

no code implementations • LREC 2022 • Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

Estimating the Entropy of Linguistic Distributions

no code implementations • ACL 2022 • Aryaman Arora, Clara Meister, Ryan Cotterell

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language.

Computational historical linguistics and language diversity in South Asia

no code implementations • ACL 2022 • Aryaman Arora, Adam Farris, Samopriya Basu, Suresh Kolichala

South Asia is home to a plethora of languages, many of which severely lack access to new language technologies.

For the Purpose of Curry: A UD Treebank for Ashokan Prakrit

no code implementations • UDW (SyntaxFest) 2021 • Adam Farris, Aryaman Arora

We present the first linguistically annotated treebank of Ashokan Prakrit, an early Middle Indo-Aryan dialect continuum attested through Emperor Ashoka Maurya's 3rd century BCE rock and pillar edicts.

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English

1 code implementation • COLING (LAW) 2020 • Michael Kranzlein, Emma Manning, Siyao Peng, Shira Wein, Aryaman Arora, Bradford Salen, Nathan Schneider

We present the Prepositions Annotated with Supersense Tags in Reddit International English ("PASTRIE") corpus, a new dataset containing manually annotated preposition supersenses of English data from presumed speakers of four L1s: English, French, German, and Spanish.

Bhasacitra: Visualising the dialect geography of South Asia

no code implementations • 28 May 2021 • Aryaman Arora, Adam Farris, Gopalakrishnan R, Samopriya Basu

We present Bhasacitra, a dialect mapping system for South Asia built on a database of linguistic studies of languages of the region annotated for topic and location data.

Hindi-Urdu Adposition and Case Supersenses v1.0

no code implementations • 2 Mar 2021 • Aryaman Arora, Nitin Venkateswaran, Nathan Schneider

These are the guidelines for the application of SNACS (Semantic Network of Adposition and Case Supersenses; Schneider et al. 2018) to Modern Standard Hindi of Delhi.

Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi

1 code implementation • ACL 2020 • Aryaman Arora, Luke Gessler, Nathan Schneider

Hindi grapheme-to-phoneme (G2P) conversion is mostly trivial, with one exception: whether a schwa represented in the orthography is pronounced or unpronounced (deleted).
