As large language models become more prevalent, their potentially harmful or inappropriate responses are a cause for concern.
We quantify this quality by constructing a Known-Similarity Corpora set from two paraphrase corpora and calculating the distances between pairs of corpora in the set.
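As a rough illustration of the construction described above, the sketch below builds a Known-Similarity Corpora set by mixing two aligned corpora in increasing proportions and then compares pairs of the resulting corpora. The names `build_ksc` and `bow_distance` are hypothetical. The index-aligned mixing scheme and the bag-of-words distance are assumptions chosen for brevity, not the measure used in the paper.

```python
import random
from collections import Counter


def build_ksc(corpus_a, corpus_b, k, seed=0):
    """Return k + 1 corpora; corpus i takes a fraction i / k of its sentences from corpus_b.

    Assumption: the two corpora are index-aligned (sentence j in corpus_b
    paraphrases sentence j in corpus_a), so mixing swaps aligned positions.
    """
    n = min(len(corpus_a), len(corpus_b))
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # fixed random order so the mixes are nested
    ksc = []
    for i in range(k + 1):
        cut = round(n * i / k)
        from_b = set(order[:cut])
        ksc.append([corpus_b[j] if j in from_b else corpus_a[j] for j in range(n)])
    return ksc


def bow_distance(x, y):
    """Placeholder distance: total variation between token frequency distributions.

    Assumes both corpora are non-empty.
    """
    cx = Counter(tok for s in x for tok in s.lower().split())
    cy = Counter(tok for s in y for tok in s.lower().split())
    nx, ny = sum(cx.values()), sum(cy.values())
    return 0.5 * sum(abs(cx[t] / nx - cy[t] / ny) for t in set(cx) | set(cy))


if __name__ == "__main__":
    a = ["red apple", "green pear", "blue plum", "ripe fig"]
    b = ["fast car", "slow bus", "old tram", "new bike"]
    ksc = build_ksc(a, b, k=4)
    # Corpora that are further apart in the set should be further apart in distance.
    for i in range(1, len(ksc)):
        print(i, bow_distance(ksc[0], ksc[i]))
```

A well-behaved distance should grow as the two corpora sit further apart in the set; any other corpus-level measure can be substituted for `bow_distance`.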
The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications.
Text generation models have become central to many research tasks, especially the generation of sentence corpora.
Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), that is, systems that contain ML models, is highly challenging.
Data balancing is a known technique for improving the performance of classification tasks.
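One common form of data balancing is random oversampling of the minority classes. The sketch below is a minimal version of that idea, with a hypothetical helper name `oversample`; it is not necessarily the balancing scheme studied in the paper.

```python
import random
from collections import Counter, defaultdict


def oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class examples until all classes match the largest."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


if __name__ == "__main__":
    texts = ["a", "b", "c", "d"]
    labels = [0, 0, 0, 1]
    _, balanced_labels = oversample(texts, labels)
    print(Counter(balanced_labels))
```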
Building on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks.
We propose a method for end-to-end training of a base neural network that integrates calls to existing black-box functions.
At inference time, we replace each estimator with its existing application counterpart and let the base network solve the task by interacting with the existing application.
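The swap described above can be sketched as follows. This is only an illustration of the train-time versus inference-time routing, not of the training procedure itself. All names (`Pipeline`, `LinearEstimator`, `black_box_sum`, `extract_numbers`) are hypothetical stand-ins, and the "base network" here is a plain function rather than a neural network.

```python
from typing import Callable, List, Sequence


def black_box_sum(values: Sequence[float]) -> float:
    """Stand-in for an existing application function that cannot be differentiated."""
    return float(sum(values))


class LinearEstimator:
    """Hypothetical trainable surrogate for the black box: a weighted sum."""

    def __init__(self, size: int):
        self.weights = [1.0] * size  # would be learned during training

    def __call__(self, values: Sequence[float]) -> float:
        return sum(w * v for w, v in zip(self.weights, values))


def extract_numbers(text: str) -> List[float]:
    """Stand-in for the base network: maps raw input to black-box arguments."""
    return [float(t) for t in text.split() if t.replace(".", "", 1).isdigit()]


class Pipeline:
    def __init__(self, base: Callable, estimator: Callable, black_box: Callable):
        self.base = base
        self.estimator = estimator
        self.black_box = black_box

    def run(self, x: str, training: bool) -> float:
        args = self.base(x)
        # Training routes through the estimator; inference calls the real function.
        fn = self.estimator if training else self.black_box
        return fn(args)


if __name__ == "__main__":
    pipeline = Pipeline(extract_numbers, LinearEstimator(size=2), black_box_sum)
    print(pipeline.run("add 2 and 3", training=True))
    print(pipeline.run("add 2 and 3", training=False))
```

Keeping the estimator and the black box behind the same call signature is what makes the replacement at inference time a one-line change.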