no code implementations • 28 Oct 2024 • Chujie Zheng, Jeffrey Wang, Shuqian Albee Zhang, Anand Kishore, Siddharth Singh
We propose a novel method for evaluating the performance of a content search system that measures the semantic match between a query and the results returned by the search system.
no code implementations • 2 Oct 2024 • Min-Hsuan Yeh, Leitian Tao, Jeffrey Wang, Xuefeng Du, Yixuan Li
Most alignment research today focuses on designing new learning algorithms using datasets like Anthropic-HH, assuming human feedback data is inherently reliable.
no code implementations • 22 Oct 2023 • Marvin Li, Jason Wang, Jeffrey Wang, Seth Neel
In this paper, we present Model Perturbations (MoPe), a new method to identify with high confidence whether a given text is in the training data of a pre-trained language model, given white-box access to the model's parameters.