Search Results for author: Yian Zhang

Found 7 papers, 6 papers with code

Language model developers should report train-test overlap

no code implementations 10 Oct 2024 Andy K Zhang, Kevin Klyman, Yifan Mai, Yoav Levine, Yian Zhang, Rishi Bommasani, Percy Liang

Overall, we take the position that language model developers should publish train-test overlap statistics and/or training data whenever they report evaluation results on public test sets.

Language Modelling
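
The excerpt above calls for reporting train-test overlap but does not spell out a metric. As a minimal sketch only (not the authors' procedure), overlap is often estimated by counting test examples that share at least one long n-gram with the training corpus; the n value, the "any shared n-gram" rule, and the toy data below are illustrative assumptions.

# Minimal sketch of n-gram train-test overlap estimation.
# NOT the paper's metric; n and the contamination rule are assumptions.
from typing import Iterable, List, Set


def ngrams(tokens: List[str], n: int) -> Set[tuple]:
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def train_test_overlap(train_docs: Iterable[str],
                       test_examples: List[str],
                       n: int = 13) -> float:
    """Fraction of test examples sharing at least one n-gram with training data."""
    train_grams: Set[tuple] = set()
    for doc in train_docs:
        train_grams |= ngrams(doc.split(), n)

    contaminated = sum(
        1 for ex in test_examples
        if ngrams(ex.split(), n) & train_grams
    )
    return contaminated / max(len(test_examples), 1)


if __name__ == "__main__":
    train = ["the quick brown fox jumps over the lazy dog again and again"]
    test = ["the quick brown fox jumps over the lazy dog",
            "a completely unrelated example sentence"]
    print(train_test_overlap(train, test, n=5))  # -> 0.5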

Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

1 code implementation 31 Jul 2024 Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

Surprisingly, we find that less is more, as training ReSet with high-quality, yet substantially smaller data (three-fold less) yields superior results.

Instruction Following, Multi-Task Learning
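
The excerpt does not define ReSet; going only by the paper's framing (reconciling instruction following and faithfulness with a smaller, higher-quality training set), a rejection-sampling style data curation loop might look like the sketch below. The generator and the two scorer functions are hypothetical stubs, not the authors' implementation or metrics.

# Hedged sketch of rejection-sampling data curation in the spirit of ReSet
# (not the authors' code). `generate`, `faithfulness_score`, and
# `instruction_score` are hypothetical placeholders.
import random
from typing import Callable, Dict, List


def curate_reset_data(prompts: List[str],
                      generate: Callable[[str], List[str]],
                      faithfulness_score: Callable[[str, str], float],
                      instruction_score: Callable[[str, str], float],
                      threshold: float = 0.8) -> List[Dict[str, str]]:
    """Keep only sampled responses that satisfy BOTH objectives.

    The resulting smaller, higher-quality set would then be used for
    continued fine-tuning, matching the 'less is more' finding above.
    """
    kept = []
    for prompt in prompts:
        for response in generate(prompt):          # sample several candidates
            if (faithfulness_score(prompt, response) >= threshold and
                    instruction_score(prompt, response) >= threshold):
                kept.append({"prompt": prompt, "response": response})
                break                               # one accepted example per prompt
    return kept


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    fake_generate = lambda p: [f"{p} -> answer {i}" for i in range(4)]
    fake_score = lambda p, r: random.random()
    print(curate_reset_data(["summarize the document"], fake_generate,
                            fake_score, fake_score, threshold=0.3))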

Nemotron-4 340B Technical Report

1 code implementation 17 Jun 2024 Nvidia, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick Legresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu

We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.

Synthetic Data Generation
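
The report's headline use case for this model family is synthetic data generation, with an instruct model producing candidate responses and a reward model filtering them. The sketch below is a generic generate-then-score loop built around that idea; the two model wrappers are hypothetical stubs, not NVIDIA's released tooling or API.

# Generic generate-then-score loop for synthetic data generation, loosely
# following the pipeline described in the report (instruct model generates,
# reward model filters). The model wrappers are hypothetical stubs.
from typing import Callable, List, Tuple


def synthesize(prompts: List[str],
               instruct_generate: Callable[[str, int], List[str]],
               reward_score: Callable[[str, str], float],
               samples_per_prompt: int = 4,
               keep_top_k: int = 1) -> List[Tuple[str, str]]:
    """Return (prompt, response) pairs the reward model ranks highest."""
    dataset = []
    for prompt in prompts:
        candidates = instruct_generate(prompt, samples_per_prompt)
        ranked = sorted(candidates,
                        key=lambda r: reward_score(prompt, r),
                        reverse=True)
        dataset.extend((prompt, r) for r in ranked[:keep_top_k])
    return dataset


if __name__ == "__main__":
    # Toy stubs so the sketch runs without the 340B models.
    gen = lambda p, k: [f"{p} :: draft {i}" for i in range(k)]
    score = lambda p, r: float(len(r))  # pretend longer == better
    print(synthesize(["explain beam search"], gen, score))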

When Do You Need Billions of Words of Pretraining Data?

1 code implementation ACL 2021 Yian Zhang, Alex Warstadt, Haau-Sing Li, Samuel R. Bowman

We adopt four probing methods (classifier probing, information-theoretic probing, unsupervised relative acceptability judgment, and fine-tuning on NLU tasks) and draw learning curves that track the growth of these different measures of linguistic ability with respect to pretraining data volume, using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M, and 1B words.
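
Of the four methods named above, the sketch below illustrates only classifier probing: freeze a pretrained encoder, extract mean-pooled representations, and fit a lightweight classifier on top. The checkpoint id is assumed to be one of the MiniBERTas released under the nyu-mll organization on the Hugging Face hub, and the probing task shown is a toy example, not one from the paper.

# Minimal classifier-probing sketch: frozen encoder + linear probe.
# MODEL_ID is assumed to be a released MiniBERTa checkpoint.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL_ID = "nyu-mll/roberta-base-1B-1"  # assumption: 1B-word MiniBERTa

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
encoder = AutoModel.from_pretrained(MODEL_ID)
encoder.eval()


@torch.no_grad()
def embed(sentences):
    """Mean-pooled frozen representations, one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)         # (B, T, 1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()


# Toy probing task: does the sentence have a plural subject?
train_x = ["The dogs bark loudly.", "The dog barks loudly.",
           "The cats sleep all day.", "The cat sleeps all day."]
train_y = [1, 0, 1, 0]

probe = LogisticRegression(max_iter=1000).fit(embed(train_x), train_y)
print(probe.predict(embed(["The birds sing.", "The bird sings."])))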

Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)

1 code implementation EMNLP 2020 Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman

One reason pretraining on self-supervised linguistic tasks is effective is that it teaches models features that are helpful for language understanding.

Binary Classification
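
The "Binary Classification" tag above refers to the paper's design of training classifiers on data that is ambiguous between a linguistic and a surface generalization, then testing on examples where the two disagree. The sketch below illustrates only that experimental logic; the rules and sentences are invented for illustration and are not drawn from the paper's MSGS benchmark.

# Sketch of the ambiguous-training / disambiguating-test design.
# Rules and sentences are invented; the paper uses MSGS.
from typing import Callable, List, Tuple

# Two candidate rules that agree on training data but disagree at test time.
linguistic_rule: Callable[[str], int] = lambda s: int("himself" in s)    # reflexive present
surface_rule: Callable[[str], int] = lambda s: int(s.startswith("The"))  # starts with "The"

# Ambiguous training data: both rules predict the gold label.
train: List[Tuple[str, int]] = [
    ("The man saw himself", 1),
    ("A man saw her", 0),
]

# Disambiguating test data: the two rules now make opposite predictions.
test: List[str] = ["A man saw himself", "The man saw her"]


def preference(predict: Callable[[str], int]) -> str:
    """Report which generalization a trained classifier's predictions match."""
    ling = all(predict(s) == linguistic_rule(s) for s in test)
    surf = all(predict(s) == surface_rule(s) for s in test)
    return "linguistic" if ling else "surface" if surf else "mixed"


if __name__ == "__main__":
    # A real run would fine-tune a model on `train`; here we use a stand-in
    # classifier that latched onto the surface cue.
    print(preference(surface_rule))  # -> "surface"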

Latent Tree Learning with Ordered Neurons: What Parses Does It Produce?

1 code implementation EMNLP (BlackboxNLP) 2020 Yian Zhang

Recent latent tree learning models can learn constituency parsing without any exposure to human-annotated tree structures.

Constituency Parsing, Language Modelling
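
Ordered-neurons (ON-LSTM) style latent tree models expose a per-gap "syntactic distance", and a binary constituency tree is commonly recovered by greedily splitting the sentence at the largest distance. The sketch below shows only that decoding step with made-up distances; it is not the model or the paper's evaluation code.

# Greedy top-down decoding of a binary tree from per-gap syntactic
# distances, the standard extraction step for ON-LSTM-style models.
from typing import List, Union

Tree = Union[str, tuple]


def distances_to_tree(words: List[str], dists: List[float]) -> Tree:
    """Recursively split at the largest distance between adjacent words.

    dists[i] is the distance between words[i] and words[i + 1],
    so len(dists) == len(words) - 1.
    """
    if len(words) == 1:
        return words[0]
    split = max(range(len(dists)), key=lambda i: dists[i])
    left = distances_to_tree(words[:split + 1], dists[:split])
    right = distances_to_tree(words[split + 1:], dists[split + 1:])
    return (left, right)


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    fake_dists = [0.2, 0.9, 0.5, 0.1, 0.3]  # invented, not model output
    print(distances_to_tree(sentence, fake_dists))
    # -> (('the', 'cat'), ('sat', (('on', 'the'), 'mat')))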
