Search Results for author: Sameena Shah

Found 38 papers, 3 papers with code

AIR-JPMC@SMM4H’22: Classifying Self-Reported Intimate Partner Violence in Tweets with Multiple BERT-based Models

no code implementations • SMM4H (COLING) 2022 • Alec Louis Candidato, Akshat Gupta, Xiaomo Liu, Sameena Shah

This paper presents our submission for the SMM4H 2022 Shared Task on the classification of self-reported intimate partner violence on Twitter (in English).

AIR-JPMC@SMM4H’22: BERT + Ensembling = Too Cool: Using Multiple BERT Models Together for Various COVID-19 Tweet Identification Tasks

no code implementations • SMM4H (COLING) 2022 • Leung Wai Liu, Akshat Gupta, Saheed Obitayo, Xiaomo Liu, Sameena Shah

This paper presents our submission for Tasks 1 and 2 of the Social Media Mining for Health (SMM4H) 2022 Shared Tasks competition.

ViziTex: Interactive Visual Sense-Making of Text Corpora

no code implementations • NAACL (DaSH) 2021 • Natraj Raman, Sameena Shah, Tucker Balch, Manuela Veloso

Information visualization is critical to analytical reasoning and knowledge discovery.

AIR-JPMC@SMM4H’22: Identifying Self-Reported Spanish COVID-19 Symptom Tweets Through Multiple-Model Ensembling

no code implementations • SMM4H (COLING) 2022 • Adrian Garcia Hernandez, Leung Wai Liu, Akshat Gupta, Vineeth Ravi, Saheed O. Obitayo, Xiaomo Liu, Sameena Shah

We present our response to Task 5 of the Social Media Mining for Health Applications (SMM4H) 2022 competition.

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

no code implementations • 5 Apr 2024 • Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia.

Document Classification, Document Understanding, +5 more

Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency

no code implementations • 26 Mar 2024 • Toyin Aguda, Suchetha Siddagangappa, Elena Kochkina, Simerjot Kaur, Dongsheng Wang, Charese Smiley, Sameena Shah

Collecting labeled datasets in finance is challenging due to the scarcity of domain experts and the higher cost of employing them.

Log Summarisation for Defect Evolution Analysis

no code implementations • 13 Mar 2024 • Rares Dolga, Ran Zmigrod, Rui Silva, Salwa Alamir, Sameena Shah

Log analysis and monitoring are essential aspects of software maintenance and defect identification.

TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

no code implementations • 7 Feb 2024 • Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content.

DocGraphLM: Documental Graph Language Model for Information Extraction

no code implementations • 5 Jan 2024 • Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts.

Document Understanding, Language Modelling, +2 more

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

no code implementations • 12 Oct 2023 • Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models.

Synthetic Text Generation using Hypergraph Representations

no code implementations • 6 Sep 2023 • Natraj Raman, Sameena Shah

Generating synthetic variants of a document is often posed as text-to-text transformation.

Hypergraph Representations, Text Generation

Unsupervised Domain Adaptation using Lexical Transformations and Label Injection for Twitter Data

no code implementations • 14 Jul 2023 • Akshat Gupta, Xiaomo Liu, Sameena Shah

A large body of literature addresses domain shift by adapting models trained on the source domain to the target domain.

Part-of-Speech Tagging, POS, +2 more

How Effective Are Neural Networks for Fixing Security Vulnerabilities

1 code implementation • 29 May 2023 • Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah

The results call for innovations to enhance automated Java vulnerability repair, such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformations to facilitate vulnerability repair.

Code Completion, Program Repair

InProC: Industry and Product/Service Code Classification

no code implementations • 22 May 2023 • Simerjot Kaur, Andrea Stefanucci, Sameena Shah

The unavailability of labeled datasets, together with the need for high-precision results in the financial domain, makes industry and product/service code classification a challenging problem.

Classification, Code Classification, +1 more

REFinD: Relation Extraction Financial Dataset

no code implementations • 22 May 2023 • Simerjot Kaur, Charese Smiley, Akshat Gupta, Joy Sain, Dongsheng Wang, Suchetha Siddagangappa, Toyin Aguda, Sameena Shah

A number of datasets for Relation Extraction (RE) have been created to aid downstream tasks such as information retrieval, semantic search, question answering and textual entailment.

General Knowledge, Information Retrieval, +5 more

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks

no code implementations • 10 May 2023 • Xianzhi Li, Samuel Chan, Xiaodan Zhu, Yulong Pei, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

The most recent large language models (LLMs), such as ChatGPT and GPT-4, have shown exceptional capabilities as generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation.

Ranked #1 on Question Answering on ConvFinQA

Binary Classification, Named Entity Recognition, +5 more

Bayesian Hierarchical Models for Counterfactual Estimation

no code implementations • 21 Jan 2023 • Natraj Raman, Daniele Magazzeni, Sameena Shah

Counterfactual explanations utilize feature perturbations to analyze the outcome of an original decision and recommend an actionable recourse.

Counterfactual, Fairness, +1 more

Neural Transition-based Parsing of Library Deprecations

no code implementations • 23 Dec 2022 • Petr Babkin, Nacho Navarro, Salwa Alamir, Sameena Shah

This paper tackles the challenging problem of automating code updates to fix deprecated API usages of open source libraries by analyzing their release notes.

Machine Translation

ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

1 code implementation • 7 Oct 2022 • Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

With recent advances in large pre-trained language models, researchers have achieved record performance on NLP tasks that mostly focus on language pattern matching.

Ranked #2 on Question Answering on ConvFinQA

Conversational Question Answering

AIR-JPMC@SMM4H'22: Classifying Self-Reported Intimate Partner Violence in Tweets with Multiple BERT-based Models

no code implementations • 22 Sep 2022 • Alec Candidato, Akshat Gupta, Xiaomo Liu, Sameena Shah

This paper presents our submission for the SMM4H 2022 Shared Task on the classification of self-reported intimate partner violence on Twitter (in English).

Online Learning for Mixture of Multivariate Hawkes Processes

no code implementations • 16 Aug 2022 • Mohsen Ghassemi, Niccolò Dalmasso, Simran Lamba, Vamsi K. Potluru, Sameena Shah, Tucker Balch, Manuela Veloso

Online learning of Hawkes processes has received increasing attention in recent years, especially for modeling a network of actors.

Bandit Sampling for Multiplex Networks

no code implementations • 8 Feb 2022 • Cenk Baykal, Vamsi K. Potluru, Sameena Shah, Manuela M. Veloso

Most existing work focuses on the monoplex setting, in which the network has only a single type of connection between entities.

Link Prediction, Node Classification

Structure and Semantics Preserving Document Representations

no code implementations • 11 Jan 2022 • Natraj Raman, Sameena Shah, Manuela Veloso

Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and query text.

Metric Learning, Retrieval, +2 more

Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training

no code implementations • NeurIPS 2021 • Shangshu Qian, Hung Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, YaoLiang Yu, Jiahao Chen, Sameena Shah

Our study of 22 mitigation techniques and five baselines reveals up to 12.6% fairness variance across identical training runs with identical seeds.

Crime Prediction, Fairness

Synthetic Document Generator for Annotation-free Layout Recognition

no code implementations • 11 Nov 2021 • Natraj Raman, Sameena Shah, Manuela Veloso

Analyzing the layout of a document to identify headers, sections, tables, figures, etc.

Parameterized Explanations for Investor / Company Matching

no code implementations • 27 Oct 2021 • Simerjot Kaur, Ivan Brugere, Andrea Stefanucci, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

We compare the performance of our system with human generated recommendations and demonstrate the ability of our algorithm to perform extremely well on this task.

Decision Making, Explainable Recommendation, +2 more

A Framework for Institutional Risk Identification using Knowledge Graphs and Automated News Profiling

no code implementations • 19 Sep 2021 • Mahmoud Mahfouz, Armineh Nourbakhsh, Sameena Shah

Organizations around the world face an array of risks that affect their operations.

Knowledge Graphs

FinQA: A Dataset of Numerical Reasoning over Financial Data

1 code implementation • EMNLP 2021 • Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

In contrast to existing tasks in the general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations.

Ranked #4 on Question Answering on FinQA

Question Answering

Debiasing classifiers: is reality at variance with expectation?

no code implementations • 4 Nov 2020 • Ashrya Agrawal, Florian Pfisterer, Bernd Bischl, Francois Buet-Golfouse, Srijan Sood, Jiahao Chen, Sameena Shah, Sebastian Vollmer

We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better.

Fairness

Simulating and classifying behavior in adversarial environments based on action-state traces: an application to money laundering

no code implementations • 3 Nov 2020 • Daniel Borrajo, Manuela Veloso, Sameena Shah

A key characteristic of these applications is the wide range of strategies an adversary may choose from, adapting them dynamically to sustain benefits and evade authorities.

Robust Document Representations using Latent Topics and Metadata

no code implementations • 23 Oct 2020 • Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

Task-specific fine-tuning of a pre-trained neural language model using a custom softmax output layer has recently become the de facto approach to document classification problems.

Document Classification, Language Modelling

Explicit Group Sparse Projection with Applications to Deep Learning and NMF

no code implementations • 9 Dec 2019 • Riyasat Ohib, Nicolas Gillis, Niccolò Dalmasso, Sameena Shah, Vamsi K. Potluru, Sergey Plis

Instead, our approach explicitly sets the sparsity level for the whole set and simultaneously projects a group of vectors, tuning the sparsity level of each vector automatically.

Network Pruning

Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data

no code implementations • 11 Nov 2017 • Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Sameena Shah, Robert Martin, John Duprey

Reuters Tracer takes a bottom-up approach to news detection and does not rely on a predefined set of sources or subjects.

Social and Information Networks

Data Sets: Word Embeddings Learned from Tweets and General Data

no code implementations • 14 Aug 2017 • Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh

In addition to the embedding sets learned from tweet data alone, we also built embedding sets from general data and from the combination of tweets with general data.

Sentiment Analysis, Topic Classification, +1 more