no code implementations • 18 Apr 2024 • Shreya Shankar, J. D. Zamfirescu-Pereira, Björn Hartmann, Aditya G. Parameswaran, Ian Arawjo
In particular, we identify a phenomenon we dub \emph{criteria drift}: users need criteria to grade outputs, but grading outputs helps users define criteria.
no code implementations • 7 Aug 2023 • Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, Yujie Wang
Large language models (LLMs) are incredibly powerful at comprehending and generating data in the form of text, but are brittle and error-prone.
no code implementations • 16 Sep 2022 • Shreya Shankar, Rolando Garcia, Joseph M. Hellerstein, Aditya G. Parameswaran
Organizations rely on machine learning engineers (MLEs) to operationalize ML, i.e., deploy and maintain ML pipelines in production.
no code implementations • 23 May 2022 • Shreya Shankar, Bernease Herman, Aditya G. Parameswaran
While most work on evaluating machine learning (ML) models focuses on computing accuracy on batches of data, tracking accuracy alone in a streaming setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately identify when models are performing unexpectedly.
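To make the streaming setting concrete, here is a minimal sketch of tracking accuracy over a sliding window of a timestamp-ordered stream; the function name, window size, and toy stream are illustrative assumptions, not the paper's method.

```python
# Hypothetical sketch: windowed accuracy over a timestamp-ordered
# stream of (prediction, label) pairs. All names are illustrative.
from collections import deque

def rolling_accuracy(stream, window=100):
    """Yield accuracy over the most recent `window` pairs."""
    hits = deque(maxlen=window)  # drops oldest entries automatically
    for pred, label in stream:
        hits.append(pred == label)
        yield sum(hits) / len(hits)

# Toy stream where the model degrades halfway through:
stream = [(1, 1)] * 50 + [(1, 0)] * 50
accs = list(rolling_accuracy(stream, window=20))
```

Even this simple windowed view shows why batch accuracy alone is insufficient: an aggregate over the whole stream would average away the degradation that the window surfaces.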