Search Results for author: Sameena Shah

Found 38 papers, 3 papers with code

AIR-JPMC@SMM4H’22: Classifying Self-Reported Intimate Partner Violence in Tweets with Multiple BERT-based Models

no code implementations SMM4H (COLING) 2022 Alec Louis Candidato, Akshat Gupta, Xiaomo Liu, Sameena Shah

This paper presents our submission for the SMM4H 2022-Shared Task on the classification of self-reported intimate partner violence on Twitter (in English).

BuDDIE: A Business Document Dataset for Multi-task Information Extraction

no code implementations 5 Apr 2024 Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia.

Document Classification, document understanding, +5

Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency

no code implementations 26 Mar 2024 Toyin Aguda, Suchetha Siddagangappa, Elena Kochkina, Simerjot Kaur, Dongsheng Wang, Charese Smiley, Sameena Shah

Collecting labeled datasets in finance is challenging due to scarcity of domain experts and higher cost of employing them.

Log Summarisation for Defect Evolution Analysis

no code implementations 13 Mar 2024 Rares Dolga, Ran Zmigrod, Rui Silva, Salwa Alamir, Sameena Shah

Log analysis and monitoring are essential aspects in software maintenance and identifying defects.

TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

no code implementations 7 Feb 2024 Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content.

DocGraphLM: Documental Graph Language Model for Information Extraction

no code implementations 5 Jan 2024 Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts.

document understanding, Language Modelling, +2

Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

no code implementations 12 Oct 2023 Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models.

Synthetic Text Generation using Hypergraph Representations

no code implementations 6 Sep 2023 Natraj Raman, Sameena Shah

Generating synthetic variants of a document is often posed as text-to-text transformation.

Hypergraph representations, Text Generation

Unsupervised Domain Adaptation using Lexical Transformations and Label Injection for Twitter Data

no code implementations 14 Jul 2023 Akshat Gupta, Xiaomo Liu, Sameena Shah

A large body of literature tries to solve this problem by adapting models trained on the source domain to the target domain.

Part-Of-Speech Tagging, POS, +2

How Effective Are Neural Networks for Fixing Security Vulnerabilities

1 code implementation 29 May 2023 Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah

The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair.

Code Completion, Program Repair

InProC: Industry and Product/Service Code Classification

no code implementations 22 May 2023 Simerjot Kaur, Andrea Stefanucci, Sameena Shah

However, unavailability of labeled datasets as well as the need for high precision results within the financial domain makes this a challenging problem.

Classification, Code Classification, +1

REFinD: Relation Extraction Financial Dataset

no code implementations 22 May 2023 Simerjot Kaur, Charese Smiley, Akshat Gupta, Joy Sain, Dongsheng Wang, Suchetha Siddagangappa, Toyin Aguda, Sameena Shah

A number of datasets for Relation Extraction (RE) have been created to aid downstream tasks such as information retrieval, semantic search, question answering and textual entailment.

General Knowledge, Information Retrieval, +5

Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks

no code implementations 10 May 2023 Xianzhi Li, Samuel Chan, Xiaodan Zhu, Yulong Pei, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

The most recent large language models (LLMs) such as ChatGPT and GPT-4 have shown exceptional capabilities of generalist models, achieving state-of-the-art performance on a wide range of NLP tasks with little or no adaptation.

Binary Classification, named-entity-recognition, +5

Bayesian Hierarchical Models for Counterfactual Estimation

no code implementations 21 Jan 2023 Natraj Raman, Daniele Magazzeni, Sameena Shah

Counterfactual explanations utilize feature perturbations to analyze the outcome of an original decision and recommend an actionable recourse.

counterfactual, Fairness, +1

Neural Transition-based Parsing of Library Deprecations

no code implementations 23 Dec 2022 Petr Babkin, Nacho Navarro, Salwa Alamir, Sameena Shah

This paper tackles the challenging problem of automating code updates to fix deprecated API usages of open source libraries by analyzing their release notes.

Machine Translation

ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering

1 code implementation 7 Oct 2022 Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, William Yang Wang

With the recent advance in large pre-trained language models, researchers have achieved record performances in NLP tasks that mostly focus on language pattern matching.

Conversational Question Answering

AIR-JPMC@SMM4H'22: Classifying Self-Reported Intimate Partner Violence in Tweets with Multiple BERT-based Models

no code implementations 22 Sep 2022 Alec Candidato, Akshat Gupta, Xiaomo Liu, Sameena Shah

This paper presents our submission for the SMM4H 2022-Shared Task on the classification of self-reported intimate partner violence on Twitter (in English).

Online Learning for Mixture of Multivariate Hawkes Processes

no code implementations 16 Aug 2022 Mohsen Ghassemi, Niccolò Dalmasso, Simran Lamba, Vamsi K. Potluru, Sameena Shah, Tucker Balch, Manuela Veloso

Online learning of Hawkes processes has received increasing attention in the last couple of years especially for modeling a network of actors.

Bandit Sampling for Multiplex Networks

no code implementations 8 Feb 2022 Cenk Baykal, Vamsi K. Potluru, Sameena Shah, Manuela M. Veloso

Most of the existing work focuses primarily on the monoplex setting where we have access to a network with only a single type of connection between entities.

Link Prediction, Node Classification

Structure and Semantics Preserving Document Representations

no code implementations 11 Jan 2022 Natraj Raman, Sameena Shah, Manuela Veloso

Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and query text.

Metric Learning, Retrieval, +2

Are My Deep Learning Systems Fair? An Empirical Study of Fixed-Seed Training

no code implementations NeurIPS 2021 Shangshu Qian, Hung Pham, Thibaud Lutellier, Zeou Hu, Jungwon Kim, Lin Tan, YaoLiang Yu, Jiahao Chen, Sameena Shah

Our study of 22 mitigation techniques and five baselines reveals up to 12.6% fairness variance across identical training runs with identical seeds.

Crime Prediction, Fairness

Synthetic Document Generator for Annotation-free Layout Recognition

no code implementations 11 Nov 2021 Natraj Raman, Sameena Shah, Manuela Veloso

Analyzing the layout of a document to identify headers, sections, tables, figures etc. is critical to understanding its content.

Parameterized Explanations for Investor / Company Matching

no code implementations 27 Oct 2021 Simerjot Kaur, Ivan Brugere, Andrea Stefanucci, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

We compare the performance of our system with human generated recommendations and demonstrate the ability of our algorithm to perform extremely well on this task.

Decision Making, Explainable Recommendation, +2

FinQA: A Dataset of Numerical Reasoning over Financial Data

1 code implementation EMNLP 2021 Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations.

Question Answering

Debiasing classifiers: is reality at variance with expectation?

no code implementations 4 Nov 2020 Ashrya Agrawal, Florian Pfisterer, Bernd Bischl, Francois Buet-Golfouse, Srijan Sood, Jiahao Chen, Sameena Shah, Sebastian Vollmer

We present an empirical study of debiasing methods for classifiers, showing that debiasers often fail in practice to generalize out-of-sample, and can in fact make fairness worse rather than better.

Fairness

Simulating and classifying behavior in adversarial environments based on action-state traces: an application to money laundering

no code implementations 3 Nov 2020 Daniel Borrajo, Manuela Veloso, Sameena Shah

One of the key characteristics of these applications is the wide range of strategies that an adversary may choose as they adapt their strategy dynamically to sustain benefits and evade authorities.

Robust Document Representations using Latent Topics and Metadata

no code implementations 23 Oct 2020 Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso

Task specific fine-tuning of a pre-trained neural language model using a custom softmax output layer is the de facto approach of late when dealing with document classification problems.

Document Classification, Language Modelling

Explicit Group Sparse Projection with Applications to Deep Learning and NMF

no code implementations 9 Dec 2019 Riyasat Ohib, Nicolas Gillis, Niccolò Dalmasso, Sameena Shah, Vamsi K. Potluru, Sergey Plis

Instead, in our approach we set the sparsity level for the whole set explicitly and simultaneously project a group of vectors with the sparsity level of each vector tuned automatically.

Network Pruning

Reuters Tracer: Toward Automated News Production Using Large Scale Social Media Data

no code implementations 11 Nov 2017 Xiaomo Liu, Armineh Nourbakhsh, Quanzhi Li, Sameena Shah, Robert Martin, John Duprey

It has a bottom-up approach to news detection, and does not rely on a predefined set of sources or subjects.

Social and Information Networks

Data Sets: Word Embeddings Learned from Tweets and General Data

no code implementations 14 Aug 2017 Quanzhi Li, Sameena Shah, Xiaomo Liu, Armineh Nourbakhsh

In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data.

Sentiment Analysis, Topic Classification, +1
