Search Results for author: Sida Wang

Found 25 papers, 9 papers with code

Structure-Aware Fill-in-the-Middle Pretraining for Code

no code implementations 30 May 2025 Linyuan Gong, Alvin Cheung, Mostafa Elhoushi, Sida Wang

Fill-in-the-Middle (FIM) is a common pretraining method for code LLMs, where models complete code segments given surrounding context.
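To illustrate the FIM pretraining setup described above, here is a minimal sketch of the data-preparation step that rearranges a document into prefix-suffix-middle (PSM) order; the sentinel token names are hypothetical placeholders, not the vocabulary of this or any particular model.

```python
import random

# Illustrative sentinel tokens; real code LLMs define their own special tokens.
PRE, MID, SUF = "<fim_prefix>", "<fim_middle>", "<fim_suffix>"

def to_fim_example(code: str, rng: random.Random) -> str:
    """Split a document into (prefix, middle, suffix) at two random
    positions and rearrange it into PSM order for pretraining."""
    if len(code) < 2:
        return code  # too short to split
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # The model is trained to generate `middle` after seeing prefix and suffix.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

print(to_fim_example("def add(a, b):\n    return a + b\n", random.Random(0)))
```

At training time the loss is ordinary next-token prediction over the rearranged sequence, so the same decoder-only model learns both left-to-right completion and infilling.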

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

no code implementations 12 Nov 2024 Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, Victor Zhong, Caiming Xiong, Ruoxi Sun, Qian Liu, Sida Wang, Tao Yu

Real-world enterprise text-to-SQL workflows often involve complex cloud or local data across various database systems, multiple SQL queries in various dialects, and diverse operations from data transformation to analytics.

Code Generation · Text to SQL · +1

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

1 code implementation 15 Jul 2024 Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.

Code Generation

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

no code implementations 12 Mar 2024 Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, Ion Stoica

Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry.

Code Generation · HumanEval · +1

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

1 code implementation 7 Mar 2024 Linyuan Gong, Sida Wang, Mostafa Elhoushi, Alvin Cheung

We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task.

Code Completion

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

2 code implementations 18 Nov 2022 Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas.

Code Generation · Memorization

On Continual Model Refinement in Out-of-Distribution Data Streams

no code implementations ACL 2022 Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih

Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting.

Benchmarking · Continual Learning

InCoder: A Generative Model for Code Infilling and Synthesis

3 code implementations 12 Apr 2022 Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming.

Code Generation · Comment Generation · +1

SILG: The Multi-domain Symbolic Interactive Language Grounding Benchmark

no code implementations NeurIPS 2021 Victor Zhong, Austin Hanjie, Sida Wang, Karthik Narasimhan, Luke Zettlemoyer

We hope SILG enables the community to quickly identify new methodologies for language grounding that generalize to a diverse set of environments and their associated challenges.

Grounded language learning · NetHack

Deep Natural Language Processing for LinkedIn Search

no code implementations 16 Aug 2021 Weiwei Guo, Xiaowei Liu, Sida Wang, Michaeel Kazi, Zhiwei Wang, Zhoutong Fu, Jun Jia, Liang Zhang, Huiji Gao, Bo Long

Building a successful search system requires a thorough understanding of textual data semantics, where deep learning based natural language processing techniques (deep NLP) can be of great help.

Document Ranking · Language Modeling · +1

Deep Natural Language Processing for LinkedIn Search Systems

no code implementations 30 Jul 2021 Weiwei Guo, Xiaowei Liu, Sida Wang, Michaeel Kazi, Zhoutong Fu, Huiji Gao, Jun Jia, Liang Zhang, Bo Long

Many search systems work with large amounts of natural language data, e.g., search queries, user profiles and documents, where deep learning based natural language processing techniques (deep NLP) can be of great help.

Towards Understanding the Behaviors of Optimal Deep Active Learning Algorithms

1 code implementation 29 Dec 2020 Yilun Zhou, Adithya Renduchintala, Xian Li, Sida Wang, Yashar Mehdad, Asish Ghoshal

Active learning (AL) algorithms may achieve better performance with less labeled data because the model guides the data selection process.

Active Learning
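Model-guided data selection of this kind is often illustrated with uncertainty sampling; the sketch below is a generic example of that family, not one of the specific algorithms analyzed in the paper.

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabeled points whose predicted class distribution
    has the highest entropy, i.e. where the model is least certain."""
    eps = 1e-12  # avoid log(0)
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    return np.argsort(-entropy)[:k]

# Predicted class probabilities for 4 unlabeled points (2 classes).
probs = np.array([[0.99, 0.01],
                  [0.55, 0.45],
                  [0.90, 0.10],
                  [0.50, 0.50]])
print(uncertainty_sample(probs, 2))  # indices of the two most uncertain points
```

The selected points are then labeled by an annotator and added to the training set, and the loop repeats with a retrained model.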

Efficient Neural Query Auto Completion

no code implementations 6 Aug 2020 Sida Wang, Weiwei Guo, Huiji Gao, Bo Long

On the candidate generation side, this system uses as much information as possible in unseen prefixes to generate relevant candidates, increasing the recall by a large margin.

Information Retrieval · Language Modeling · +2
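Candidate generation for unseen prefixes can be sketched, very roughly, as completing the last partial word of the prefix from a query log; the toy function below is an illustrative assumption, not the neural system described in the paper.

```python
from collections import Counter

def generate_candidates(prefix: str, query_log: list[str], topk: int = 3) -> list[str]:
    """Complete the last (possibly unseen) word of `prefix` using words
    observed in a query log, keeping the earlier words fixed."""
    *done, last = prefix.split(" ")
    # Rank candidate words by how often they appear in logged queries.
    vocab = Counter(w for q in query_log for w in q.split())
    completions = [w for w, _ in vocab.most_common() if w.startswith(last)]
    return [" ".join(done + [w]) for w in completions[:topk]]

log = ["software engineer", "software engineering manager", "data engineer"]
print(generate_candidates("software eng", log))
```

A production system would score such candidates with a learned ranker rather than raw frequency, but the generation step shows why exploiting the unseen last word improves recall.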

Memory-efficient Embedding for Recommendations

no code implementations 26 Jun 2020 Xiangyu Zhao, Haochen Liu, Hui Liu, Jiliang Tang, Weiwei Guo, Jun Shi, Sida Wang, Huiji Gao, Bo Long

Specifically, we first propose an end-to-end differentiable framework that calculates the weights over various dimensions for feature fields in a soft and continuous manner with an AutoML-based optimization algorithm; we then derive a hard and discrete embedding component architecture according to the maximal weights and retrain the whole recommender framework.

AutoML · Recommendation Systems

Pre-training via Paraphrasing

2 code implementations NeurIPS 2020 Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer

The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks.

Document Summarization · Document Translation · +7

Simple MAP Inference via Low-Rank Relaxations

1 code implementation NeurIPS 2014 Roy Frostig, Sida Wang, Percy S. Liang, Christopher D. Manning

We focus on the problem of maximum a posteriori (MAP) inference in Markov random fields with binary variables and pairwise interactions.
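The low-rank relaxation idea can be sketched as follows, assuming a symmetric interaction matrix with zero diagonal: relax each binary variable to a unit vector in a low-dimensional sphere, do coordinate ascent, and round with a random hyperplane. This is a simplified illustration of the approach, not the authors' implementation.

```python
import numpy as np

def lowrank_map(A: np.ndarray, k: int = 3, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Approximate max of x^T A x over x in {-1,+1}^n by relaxing each
    x_i to a unit vector in R^k (assumes A symmetric, zero diagonal)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = rng.normal(size=(n, k))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(iters):
        for i in range(n):
            g = A[i] @ X              # gradient of the objective w.r.t. row i
            norm = np.linalg.norm(g)
            if norm > 1e-12:
                X[i] = g / norm       # best-response unit vector for row i
    # Random-hyperplane rounding back to {-1, +1}.
    r = rng.normal(size=k)
    return np.where(X @ r >= 0, 1.0, -1.0)
```

Each row update is a closed-form best response, so the relaxed objective is monotonically non-decreasing until the rounding step.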

Altitude Training: Strong Bounds for Single-Layer Dropout

no code implementations NeurIPS 2014 Stefan Wager, William Fithian, Sida Wang, Percy Liang

Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks.

Dropout Training as Adaptive Regularization

no code implementations NeurIPS 2013 Stefan Wager, Sida Wang, Percy Liang

Dropout and other feature noising schemes control overfitting by artificially corrupting the training data.

Document Classification
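Feature noising of this kind can be sketched as an inverted-dropout corruption applied to the training matrix; this is a generic illustration of the corruption step, not the adaptive-regularization analysis developed in the paper.

```python
import numpy as np

def dropout_corrupt(X: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Artificially corrupt training features: zero each entry with
    probability p and rescale survivors by 1/(1-p) so the expected
    value of every feature is unchanged."""
    mask = rng.random(X.shape) >= p
    return X * mask / (1.0 - p)

rng = np.random.default_rng(0)
X = np.ones((100, 10))          # toy feature matrix
X_noised = dropout_corrupt(X, 0.5, rng)
print(X_noised[:2])             # entries are either 0.0 or 2.0
```

Training on freshly corrupted copies of the data at each step is what connects dropout to a data-dependent regularizer in expectation.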
