1 code implementation • 27 May 2023 • Yuhui Zhang, Michihiro Yasunaga, Zhengping Zhou, Jeff Z. HaoChen, James Zou, Percy Liang, Serena Yeung
Language models have been shown to exhibit positive scaling, where performance improves as models are scaled up in terms of size, compute, or data.
1 code implementation • 8 Feb 2023 • Yuhui Zhang, Jeff Z. HaoChen, Shih-Cheng Huang, Kuan-Chieh Wang, James Zou, Serena Yeung
Our proposed method can discover high-error data slices, identify influential attributes, and rectify undesirable model behaviors, all without requiring any visual data.
no code implementations • 27 Nov 2022 • Jeff Z. HaoChen, Tengyu Ma
Understanding self-supervised learning is important but challenging.
no code implementations • 6 Apr 2022 • Jeff Z. HaoChen, Colin Wei, Ananya Kumar, Tengyu Ma
In particular, a linear classifier trained to separate the representations on the source domain can also predict classes on the target domain accurately, even though the representations of the two domains are far from each other.
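As a rough illustration of this claim, the sketch below (hypothetical feature arrays and a generic scikit-learn probe, not the paper's code) fits a linear classifier on frozen source-domain representations and evaluates it on both domains:

```python
# Illustrative sketch, not the paper's code: probe whether a linear classifier
# fit on frozen source-domain representations also separates the target domain.
from sklearn.linear_model import LogisticRegression

def linear_probe_transfer(feats_src, labels_src, feats_tgt, labels_tgt):
    # feats_* are assumed to be representations from a frozen pretrained encoder.
    clf = LogisticRegression(max_iter=1000).fit(feats_src, labels_src)
    # Return accuracy on the labeled source domain and on the held-out target domain.
    return clf.score(feats_src, labels_src), clf.score(feats_tgt, labels_tgt)
```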
no code implementations • 1 Apr 2022 • Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Jeff Z. HaoChen, Tengyu Ma, Percy Liang
We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain.
no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse
Using APO to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.
1 code implementation • ICLR 2022 • Hong Liu, Jeff Z. HaoChen, Adrien Gaidon, Tengyu Ma
Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves SSL representation quality on imbalanced datasets under several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
Ranked #9 on Long-tail Learning on CIFAR-10-LT (ρ=100)
1 code implementation • NeurIPS 2021 • Jeff Z. HaoChen, Colin Wei, Adrien Gaidon, Tengyu Ma
Despite the empirical successes, theoretical foundations are limited: prior analyses assume conditional independence of the positive pairs given the same class label, but recent empirical applications use heavily correlated positive pairs (i.e., data augmentations of the same image).
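For context, positive pairs in these applications are typically two random augmentations of the same image, as in the minimal torchvision sketch below (the specific augmentations are illustrative, not the paper's exact pipeline):

```python
# Minimal sketch (illustrative augmentations, not the paper's pipeline): a
# positive pair is two random augmentations of the same image, so the two
# views are heavily correlated rather than independent samples from a class.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

def make_positive_pair(pil_image):
    # Both views come from the same underlying image.
    return augment(pil_image), augment(pil_image)
```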
no code implementations • 3 Nov 2020 • Hong Liu, Jeff Z. HaoChen, Colin Wei, Tengyu Ma
Recent work has found that fine-tuning and joint training, two popular approaches to transfer learning, do not always improve accuracy on downstream tasks.
1 code implementation • 15 Jun 2020 • Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground-truth with an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms.
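The toy sketch below (a plain over-parameterized linear regression, not the paper's exact model or a reproduction of its result) illustrates the two noise-injection schemes being contrasted: fresh label noise at every step versus Gaussian noise added to the gradient:

```python
# Toy sketch of the two noise schemes being contrasted (plain over-parameterized
# least squares; the paper studies a different parameterization).
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                              # fewer examples than parameters
X = rng.standard_normal((n, d))
w_star = np.zeros(d); w_star[:3] = 1.0      # sparse ground truth
y = X @ w_star

def sgd(noise, steps=50000, lr=1e-3, sigma=0.5):
    w = rng.standard_normal(d)              # arbitrary initialization
    for _ in range(steps):
        i = rng.integers(n)
        # Label noise: perturb the target of the sampled example at every step.
        target = y[i] + (sigma * rng.standard_normal() if noise == "label" else 0.0)
        grad = (X[i] @ w - target) * X[i]
        # Gaussian noise: perturb the gradient itself instead of the label.
        if noise == "gaussian":
            grad = grad + sigma * rng.standard_normal(d)
        w -= lr * grad
    return w

# Compare distance to the sparse ground truth under the two noise types.
print(np.linalg.norm(sgd("label") - w_star), np.linalg.norm(sgd("gaussian") - w_star))
```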
no code implementations • 26 Jun 2018 • Jeff Z. HaoChen, Suvrit Sra
We present the first (to our knowledge) non-asymptotic solution to this problem, showing that after a "reasonable" number of epochs, RandomShuffle indeed converges faster than SGD.
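A minimal sketch of the two sampling schemes under comparison, on a toy least-squares problem (the problem and step size are illustrative assumptions, not the paper's setting):

```python
# Minimal sketch of the two sampling schemes compared: with-replacement SGD
# versus RandomShuffle (a fresh permutation of the data in each epoch).
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true

def run(scheme, epochs=50, lr=0.01):
    w = np.zeros(d)
    for _ in range(epochs):
        # RandomShuffle visits every example once per epoch; SGD samples with replacement.
        order = rng.permutation(n) if scheme == "shuffle" else rng.integers(n, size=n)
        for i in order:
            w -= lr * (X[i] @ w - y[i]) * X[i]
    return np.linalg.norm(w - w_true)

print("RandomShuffle:", run("shuffle"))
print("With-replacement SGD:", run("sgd"))
```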