no code implementations • 11 Mar 2024 • Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou
We present an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by a large language model (LLM).
1 code implementation • 4 Oct 2023 • Hancheng Cao, Jesse Dodge, Kyle Lo, Daniel A. McFarland, Lucy Lu Wang
In recent years, funding agencies and journals increasingly advocate for open science practices (e.g., data and method sharing) to improve the transparency, access, and reproducibility of science.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Hancheng Cao, Mengjie Cheng, Zhepeng Cen, Daniel A. McFarland, Xiang Ren
We extract scientific concepts (i.e., phrases) from corpora as instantiations of "research ideas", create concept-level features as motivated by literature, and then follow the trajectories of over 450,000 new concepts (emerged from 1995-2014) to identify factors that lead only a small proportion of these ideas to be used in inventions and drug trials.
1 code implementation • 4 Sep 2019 • Bas Hofstra, Vivek V. Kulkarni, Sebastian Munoz-Najar Galvez, Bryan He, Dan Jurafsky, Daniel A. McFarland
Are underrepresented groups more likely to generate scientific innovations?