no code implementations • 27 Jan 2025 • Ziniu Wu, Markos Markakis, Chunwei Liu, Peter Baile Chen, Balakrishnan Narayanaswamy, Tim Kraska, Samuel Madden
Unlike previous approaches, IconqSched features a novel fine-grained predictor, Iconq, which treats the DBMS as a black box and accurately estimates the system runtime of concurrently executed queries under different system states.
1 code implementation • 23 May 2024 • Chunwei Liu, Matthew Russo, Michael Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano
We describe the workload of AI-powered analytics tasks, the optimization methods that Palimpzest uses, and the prototype system itself.
no code implementations • 8 Mar 2024 • Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, Tim Kraska
Retrieval-augmented generation (RAG) can enhance the generation quality of large language models (LLMs) by incorporating external token databases.
1 code implementation • 7 Oct 2023 • Ferdinand Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, Samuel Madden
We find that no current system sufficiently fulfills both needs and therefore propose Skyscraper, a system tailored to V-ETL.
no code implementations • 1 Oct 2023 • Zui Chen, Lei Cao, Sam Madden, Tim Kraska, Zeyuan Shang, Ju Fan, Nan Tang, Zihui Gu, Chunwei Liu, Michael Cafarella
As a result, data scientists often have to develop domain-specific solutions tailored to both the dataset and the task, e.g., writing domain-specific code or training machine learning models on a sufficient number of annotated examples.
no code implementations • 11 Dec 2022 • Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel Madden
Neither classical nor learning-based methods yield satisfactory performance when estimating the cardinality of join queries.
1 code implementation • 11 May 2022 • Andreas Kipf, Dominik Horn, Pascal Pfeil, Ryan Marcus, Tim Kraska
LSI works by building a learned index over a permutation vector, which allows binary search to be performed on the unsorted base data using random access.
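A minimal sketch of the permutation-vector idea mentioned in this abstract, not the LSI implementation: it omits the learned model entirely and uses plain binary search. The function names `build_permutation` and `lookup` are hypothetical.

```python
from typing import List, Optional


def build_permutation(data: List[int]) -> List[int]:
    """Return the positions of `data` ordered by key; `data` itself is not moved."""
    return sorted(range(len(data)), key=data.__getitem__)


def lookup(data: List[int], perm: List[int], key: int) -> Optional[int]:
    """Binary-search the permutation vector, reading keys from the unsorted
    base data by random access. Returns a position in `data`, or None."""
    lo, hi = 0, len(perm)
    while lo < hi:
        mid = (lo + hi) // 2
        if data[perm[mid]] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(perm) and data[perm[lo]] == key:
        return perm[lo]
    return None


data = [42, 7, 19, 3, 88]          # unsorted base data
perm = build_permutation(data)     # positions in key order
position = lookup(data, perm, 19)  # position of key 19 in the base data
```

In a learned variant, a model would presumably narrow the range of `perm` that has to be searched; that part is left out here.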
no code implementations • 29 Nov 2021 • Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, Tim Kraska
RSS achieves this by using the minimal string prefix to sufficiently distinguish the data, unlike most learned approaches, which index the entire string.
1 code implementation • 11 Aug 2021 • Mihail Stoian, Andreas Kipf, Ryan Marcus, Tim Kraska
Latest research proposes to replace existing index structures with learned models.
no code implementations • 24 Mar 2021 • Songtao He, Favyen Bastani, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska, Sam Madden
We show TagMe can produce high-quality object annotations in a fully-automatic and low-cost way.
no code implementations • ICLR 2021 • Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, Tim Kraska
Bloom filters are space-efficient probabilistic data structures that are used to test whether an element is a member of a set, and may return false positives.
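For readers unfamiliar with the data structure, here is a minimal sketch of a standard Bloom filter, not the learned variant studied in the paper. The class name, the use of SHA-256 to derive hash positions, and the parameter values are all assumptions made for illustration.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["index", "filter", "query"]:
    bf.add(word)
```

An inserted item is always reported as present, while an item that was never inserted is usually, but not always, reported as absent.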
no code implementations • 23 Dec 2020 • Hussam Abu-Libdeh, Deniz Altınbüken, Alex Beutel, Ed H. Chi, Lyric Doshi, Tim Kraska, Xiaozhou Li, Andy Ly, Christopher Olston
There is great excitement about learned index structures, but understandable skepticism about the practicality of a new method uprooting decades of research on B-Trees.
no code implementations • 12 Dec 2020 • Vikram Nathan, Jialin Ding, Tim Kraska, Mohammad Alizadeh
Unlike prior work, Cortex can adapt itself to any existing primary index, whether single or multi-dimensional, to harness a broad variety of correlations, such as those that exist between more than two attributes or have a large number of outliers.
no code implementations • 28 Sep 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy G Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich
First, MISIM uses a novel context-aware semantic structure (CASS), which is designed to aid in lifting semantic meaning from code syntax.
no code implementations • 23 Jun 2020 • Jialin Ding, Vikram Nathan, Mohammad Alizadeh, Tim Kraska
Filtering data based on predicates is one of the most fundamental operations for any modern data warehouse.
no code implementations • 5 Jun 2020 • Kapil Vaidya, Eric Knorr, Tim Kraska, Michael Mitzenmacher
Bloom filters are space-efficient probabilistic data structures that are used to test whether an element is a member of a set, and may return false positives.
no code implementations • 5 Jun 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Nesime Tatbul, Jesmin Jahan Tithi, Niranjan Hasabnis, Paul Petersen, Timothy Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich
Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection.
no code implementations • 30 Apr 2020 • Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, Thomas Neumann
Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance.
no code implementations • 24 Mar 2020 • Fangke Ye, Shengtian Zhou, Anand Venkat, Ryan Marcus, Paul Petersen, Jesmin Jahan Tithi, Tim Mattson, Tim Kraska, Pradeep Dubey, Vivek Sarkar, Justin Gottschlich
The simplified parse tree (SPT) presented in Aroma, a state-of-the-art code recommendation system, is a tree-structured representation used to infer code semantics by capturing program structure rather than program syntax.
1 code implementation • 21 Mar 2020 • Nadiia Chepurko, Ryan Marcus, Emanuel Zgraggen, Raul Castro Fernandez, Tim Kraska, David Karger
Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join.
no code implementations • 3 Dec 2019 • Vikram Nathan, Jialin Ding, Mohammad Alizadeh, Tim Kraska
Scanning and filtering over multi-dimensional tables are key operations in modern analytical database engines.
1 code implementation • NeurIPS 2019 • Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Mohammad Alizadeh
We present Park, a platform for researchers to experiment with Reinforcement Learning (RL) for computer systems.
1 code implementation • 29 Nov 2019 • Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, Thomas Neumann
A groundswell of recent work has focused on improving data management systems with learned components.
no code implementations • 10 Oct 2019 • Darryl Ho, Jialin Ding, Sanchit Misra, Nesime Tatbul, Vikram Nathan, Vasimuddin Md, Tim Kraska
Next-generation sequencing (NGS) technologies have enabled affordable sequencing of billions of short DNA fragments at high throughput, paving the way for population-scale genomics.
2 code implementations • 25 May 2019 • Madelon Hulsebos, Kevin Hu, Michiel Bakker, Emanuel Zgraggen, Arvind Satyanarayan, Tim Kraska, Çağatay Demiralp, César Hidalgo
Correctly detecting the semantic type of data columns is crucial for data science tasks such as automated data cleaning, schema matching, and data discovery.
no code implementations • 21 May 2019 • Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yi-Nan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska
The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory footprint.
1 code implementation • 12 May 2019 • Kevin Hu, Neil Gaikwad, Michiel Bakker, Madelon Hulsebos, Emanuel Zgraggen, César Hidalgo, Tim Kraska, Guoliang Li, Arvind Satyanarayan, Çağatay Demiralp
Researchers currently rely on ad hoc datasets to train automated visualization tools and evaluate the effectiveness of visualization designs.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 24 Aug 2018 • Yeounoh Chung, Peter J. Haas, Eli Upfal, Tim Kraska
Over the past decades, researchers and ML practitioners have come up with better and better ways to build, understand and improve the quality of ML models, but mostly under the key assumption that the training data is distributed identically to the testing data.
1 code implementation • 14 Aug 2018 • Kevin Z. Hu, Michiel A. Bakker, Stephen Li, Tim Kraska, César A. Hidalgo
Data visualization should be accessible for all analysts with data, not just the few with technical expertise.
no code implementations • 16 Jul 2018 • Yeounoh Chung, Tim Kraska, Neoklis Polyzotis, Ki Hyun Tae, Steven Euijong Whang
As machine learning systems become democratized, it becomes increasingly important to help users easily debug their models.
no code implementations • 10 Jun 2018 • Guillaume Leclerc, Manasi Vartak, Raul Castro Fernandez, Tim Kraska, Samuel Madden
As neural networks become widely deployed in different applications and on different hardware, it has become increasingly important to optimize inference time and model size along with model accuracy.
1 code implementation • 7 Apr 2018 • Philipp Eichmann, Carsten Binnig, Tim Kraska, Emanuel Zgraggen
Existing benchmarks for analytical database systems such as TPC-DS and TPC-H are designed for static reporting scenarios.
Databases
no code implementations • 30 Jan 2018 • Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, Tim Kraska
At the core of our index is a tunable error parameter that allows a DBA to balance lookup performance and space consumption.
Databases
no code implementations • 13 Jan 2018 • Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska
Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance.
8 code implementations • 4 Dec 2017 • Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, Neoklis Polyzotis
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not.
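The "index as a model" view for the sorted-array case can be sketched as follows. This is an illustration under stated assumptions, not the paper's architecture: a single least-squares line stands in for the model, and the class name `LinearLearnedIndex` and its fields are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional


class LinearLearnedIndex:
    """Fit position ~ slope * key + intercept over a sorted, non-empty array,
    record the worst prediction error, and search only inside that window."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = keys  # assumed to be sorted in ascending order
        n = len(keys)
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest distance between a true position and its predicted position.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(keys))

    def _predict(self, key: float) -> int:
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key: float) -> Optional[int]:
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect_left(self.keys, key, lo, hi)  # search the error window only
        if i < hi and self.keys[i] == key:
            return i
        return None


keys = [2, 3, 5, 8, 13, 21, 34, 55]
index = LinearLearnedIndex(keys)
found = index.lookup(13)
```

The recorded error bound is what keeps the lookup correct for stored keys: a poorly fitting model only widens the window that has to be searched.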
no code implementations • 31 Jan 2015 • Evan R. Sparks, Ameet Talwalkar, Michael J. Franklin, Michael I. Jordan, Tim Kraska
The proliferation of massive datasets combined with the development of sophisticated analytical techniques has enabled a wide variety of novel applications such as improved product recommendations, automatic image tagging, and improved speech-driven interfaces.
no code implementations • 21 Oct 2013 • Evan R. Sparks, Ameet Talwalkar, Virginia Smith, Jey Kottalam, Xinghao Pan, Joseph Gonzalez, Michael J. Franklin, Michael I. Jordan, Tim Kraska
MLI is an Application Programming Interface designed to address the challenges of building Machine Learning algorithms in a distributed setting based on data-centric computing.