no code implementations • 30 Jan 2024 • Shrayani Mondal, Rishabh Garodia, Arbaaz Qureshi, Taesung Lee, Youngja Park
We leverage the potential of generative language models to discover human-interpretable descriptors present in a dataset and use an unsupervised approach to explain neurons with these descriptors.