no code implementations • NeurIPS 2021 • Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum
We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts.
2 code implementations • 6 Dec 2021 • Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, Samuel Cahyawijaya, Emile Chapuis, Wanxiang Che, Mukund Choudhary, Christian Clauss, Pierre Colombo, Filip Cornell, Gautier Dagan, Mayukh Das, Tanay Dixit, Thomas Dopierre, Paul-Alexis Dray, Suchitra Dubey, Tatiana Ekeinhor, Marco Di Giovanni, Tanya Goyal, Rishabh Gupta, Louanes Hamla, Sang Han, Fabrice Harel-Canada, Antoine Honore, Ishan Jindal, Przemyslaw K. Joniak, Denis Kleyko, Venelin Kovatchev, Kalpesh Krishna, Ashutosh Kumar, Stefan Langer, Seungjae Ryan Lee, Corey James Levinson, Hualou Liang, Kaizhao Liang, Zhexiong Liu, Andrey Lukyanenko, Vukosi Marivate, Gerard de Melo, Simon Meoni, Maxime Meyer, Afnan Mir, Nafise Sadat Moosavi, Niklas Muennighoff, Timothy Sum Hon Mun, Kenton Murray, Marcin Namysl, Maria Obedkova, Priti Oli, Nivranshu Pasricha, Jan Pfister, Richard Plant, Vinay Prabhu, Vasile Pais, Libo Qin, Shahab Raji, Pawan Kumar Rajpoot, Vikas Raunak, Roy Rinberg, Nicolas Roberts, Juan Diego Rodriguez, Claude Roux, Vasconcellos P. H. S., Ananya B. Sai, Robin M. Schmidt, Thomas Scialom, Tshephisho Sefara, Saqib N. Shamsi, Xudong Shen, Haoyue Shi, Yiwen Shi, Anna Shvets, Nick Siegel, Damien Sileo, Jamie Simon, Chandan Singh, Roman Sitelew, Priyank Soni, Taylor Sorensen, William Soto, Aman Srivastava, KV Aditya Srivatsa, Tony Sun, Mukund Varma T, A Tabassum, Fiona Anting Tan, Ryan Teehan, Mo Tiwari, Marie Tolkiehn, Athena Wang, Zijian Wang, Gloria Wang, Zijie J. Wang, Fuxuan Wei, Bryan Wilie, Genta Indra Winata, Xinyi Wu, Witold Wydmański, Tianbao Xie, Usama Yaseen, Michael A. Yee, Jing Zhang, Yue Zhang
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on.
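As a toy illustration of the kind of perturbation such a suite of transformations contains (the function below is a hypothetical example I made up, not one of NL-Augmenter's actual transformations):

```python
import random

def swap_adjacent_words(text: str, prob: float = 0.1, seed: int = 0) -> str:
    """Randomly swap adjacent word pairs -- a toy robustness perturbation.

    Hypothetical example; not a transformation defined by NL-Augmenter itself.
    """
    rng = random.Random(seed)
    tokens = text.split()
    i = 0
    while i < len(tokens) - 1:
        if rng.random() < prob:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return " ".join(tokens)

print(swap_adjacent_words("data augmentation improves model robustness", prob=0.5))
```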
no code implementations • ACL 2022 • Haoyue Shi, Kevin Gimpel, Karen Livescu
We present substructure distribution projection (SubDP), a technique that projects a distribution over structures in one domain to another, by projecting substructure distributions separately.
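A minimal sketch of the underlying idea as I read the abstract, with a made-up soft word-alignment matrix `A` (the actual method handles richer substructures than the single dependency arcs shown here):

```python
import numpy as np

# Source-side arc distribution: P_src[h, d] = probability that source
# word h is the head of source word d (3 source tokens in this toy case).
P_src = np.array([
    [0.0, 0.8, 0.1],
    [0.9, 0.0, 0.2],
    [0.1, 0.2, 0.7],
])

# Soft word alignment A[s, t]: probability that source token s aligns to
# target token t (3 source tokens, 2 target tokens). Assumed given.
A = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])

# Project each substructure (an arc over source positions) onto target
# position pairs independently: P_tgt = A^T @ P_src @ A.
P_tgt = A.T @ P_src @ A
print(P_tgt)
```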
no code implementations • Findings (ACL) 2021 • Haoyue Shi, Karen Livescu, Kevin Gimpel
We study a family of data augmentation methods, substructure substitution (SUB2), for natural language processing (NLP) tasks.
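A toy sketch of substructure substitution, under the assumption that substructures are labeled spans (the data format and function below are illustrative, not the paper's):

```python
import random

def sub2_augment(examples, seed=0):
    """Toy SUB2-style augmentation: swap substructures that share a label.

    `examples` are (tokens, spans) pairs, where each span is
    (start, end, label); this format is an assumption for illustration.
    """
    rng = random.Random(seed)
    (toks_a, spans_a), (toks_b, spans_b) = rng.sample(examples, 2)
    # Find a pair of spans with the same label across the two examples
    # and splice the second example's span into the first.
    for sa in spans_a:
        for sb in spans_b:
            if sa[2] == sb[2]:
                return toks_a[:sa[0]] + toks_b[sb[0]:sb[1]] + toks_a[sa[1]:]
    return toks_a  # no matching label found

examples = [
    ("the cat sat".split(), [(0, 2, "NP")]),
    ("a dog barked".split(), [(0, 2, "NP")]),
]
print(sub2_augment(examples))
```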
no code implementations • ACL 2021 • Haoyue Shi, Luke Zettlemoyer, Sida I. Wang
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces.
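The classic closed-form instance of such a linear projection is the orthogonal Procrustes solution; a minimal sketch on synthetic data (seed pairs and dimensions are arbitrary):

```python
import numpy as np

def procrustes(X, Y):
    """Learn an orthogonal map W minimizing ||X @ W - Y||_F.

    X, Y: (n, d) arrays of embeddings for n seed translation pairs.
    Closed-form solution via SVD (orthogonal Procrustes).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))          # source-language embeddings
R, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ R                               # target embeddings: a rotated copy
W = procrustes(X, Y)
print(np.allclose(X @ W, Y))            # True: the rotation is recovered
```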
no code implementations • 24 Oct 2020 • Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan
We explore deep clustering of text representations for unsupervised model interpretation and induction of syntax.
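A deliberately simplified stand-in (plain k-means over generic vectors; the paper's method is a deep clustering model, not this):

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; a stand-in for the deep clustering in the paper."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each representation to its nearest cluster center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

reps = np.random.default_rng(1).normal(size=(200, 16))  # e.g. token embeddings
print(kmeans(reps, k=5)[:10])
```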
no code implementations • EMNLP 2020 • Haoyue Shi, Karen Livescu, Kevin Gimpel
We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences).
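The tuning criterion is easy to make concrete; a sketch of unlabeled bracket $F_1$ (the standard metric for unsupervised parsing, though not the official EVALB implementation):

```python
def unlabeled_f1(gold_spans, pred_spans):
    """Unlabeled bracket F1 over constituent spans (start, end).

    A sketch of the standard metric, not the official EVALB tool.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = [(0, 5), (0, 2), (2, 5), (3, 5)]
pred = [(0, 5), (0, 2), (1, 5), (3, 5)]
print(f"{unlabeled_f1(gold, pred):.3f}")  # 0.750
```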
1 code implementation • WS 2020 • Shubham Toshniwal, Haoyue Shi, Bowen Shi, Lingyu Gao, Karen Livescu, Kevin Gimpel
Many natural language processing (NLP) tasks involve reasoning with textual spans, including question answering, entity recognition, and coreference resolution.
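One common recipe for building a span representation from token representations, shown as a sketch (endpoint vectors plus mean pooling; one option among several, not necessarily the paper's configuration):

```python
import numpy as np

def span_repr(token_reprs, start, end):
    """Represent span [start, end) by its endpoint vectors plus mean pooling.

    Illustrative recipe; the paper compares several such schemes.
    """
    inside = token_reprs[start:end]
    return np.concatenate([token_reprs[start],
                           token_reprs[end - 1],
                           inside.mean(axis=0)])

toks = np.random.default_rng(0).normal(size=(8, 4))  # 8 tokens, dim 4
print(span_repr(toks, 2, 5).shape)  # (12,)
```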
no code implementations • ACL 2019 • Haoyue Shi, Jiayuan Mao, Kevin Gimpel, Karen Livescu
We define concreteness of constituents by their matching scores with images, and use it to guide the parsing of text.
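A toy rendering of that matching score as cosine similarity (the vectors here are random placeholders; the paper learns joint text-image embeddings rather than using raw vectors):

```python
import numpy as np

def concreteness(span_vec, image_vec):
    """Concreteness of a constituent as its cosine match with an image.

    Toy version of the abstract's idea, not the actual learned model.
    """
    num = span_vec @ image_vec
    den = np.linalg.norm(span_vec) * np.linalg.norm(image_vec)
    return num / den

rng = np.random.default_rng(0)
span, image = rng.normal(size=16), rng.normal(size=16)
print(f"{concreteness(span, image):.3f}")
```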
no code implementations • WS 2018 • Yuqi Sun, Haoyue Shi, Junfeng Hu
In multi-sense word embeddings, contextual variations in the corpus may cause a univocal word to be embedded into multiple different sense vectors.
1 code implementation • EMNLP 2018 • Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li
To study the effectiveness of different tree structures, we replace the parsing trees with trivial trees (i.e., binary balanced, left-branching, and right-branching trees) in the encoders (see the sketch below).
Ranked #9 on Sentiment Analysis on Amazon Review Full
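The three trivial trees are straightforward to construct; a minimal sketch using nested tuples (this representation is mine, not the paper's encoder input format):

```python
def left_branching(tokens):
    """((w1 w2) w3) ... -- combine from the left."""
    tree = tokens[0]
    for tok in tokens[1:]:
        tree = (tree, tok)
    return tree

def right_branching(tokens):
    """w1 (w2 (w3 ...)) -- combine from the right."""
    tree = tokens[-1]
    for tok in reversed(tokens[:-1]):
        tree = (tok, tree)
    return tree

def balanced(tokens):
    """Split as evenly as possible at every level."""
    if len(tokens) == 1:
        return tokens[0]
    mid = len(tokens) // 2
    return (balanced(tokens[:mid]), balanced(tokens[mid:]))

words = "the cat sat on the mat".split()
print(left_branching(words))
print(right_branching(words))
print(balanced(words))
```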
1 code implementation • COLING 2018 • Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun
Beginning with an insightful adversarial attack on VSE embeddings, we show the limitations of current frameworks and image-text datasets (e.g., MS-COCO) both quantitatively and qualitatively.
no code implementations • 3 Mar 2018 • Haoyue Shi, Yuqi Sun, Junfeng Hu
Representations of polysemous words learned without supervision generate a large number of pseudo multi-senses, since unsupervised methods are overly sensitive to contextual variations.
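One simple, assumed scheme for collapsing such pseudo senses is to greedily merge sense vectors whose cosine similarity exceeds a threshold (the threshold and greedy procedure below are illustrative, not the paper's method):

```python
import numpy as np

def merge_pseudo_senses(sense_vecs, threshold=0.9):
    """Greedily merge sense vectors with cosine similarity above a threshold.

    Illustrative only; threshold and greedy scheme are assumptions.
    """
    normed = sense_vecs / np.linalg.norm(sense_vecs, axis=1, keepdims=True)
    merged = []
    for v in normed:
        for group in merged:
            if v @ group[0] > threshold:  # compare to the group's first member
                group.append(v)
                break
        else:
            merged.append([v])
    return [np.mean(g, axis=0) for g in merged]

rng = np.random.default_rng(0)
base = rng.normal(size=8)
senses = np.stack([base, base + 0.01 * rng.normal(size=8), rng.normal(size=8)])
print(len(merge_pseudo_senses(senses)))  # likely 2: the near-duplicates merge
```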
no code implementations • WS 2016 • Haoyue Shi, Caihua Li, Junfeng Hu
Previous research has shown that learning multiple representations for polysemous words can improve the performance of word embeddings on many tasks.