no code implementations • 11 Sep 2023 • Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather
We explore the novel application of Large Language Models to code optimization.
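To make the setup concrete, here is a minimal sketch of how an LLM might be queried to optimize a module of LLVM-IR; the prompt wording and the `query_model` stub are illustrative stand-ins, not the paper's actual interface.

```python
# Minimal sketch: prompting an LLM to optimize LLVM-IR for code size.
# The model call is a placeholder; any completion API could slot in here.

PROMPT_TEMPLATE = """\
You are an expert compiler engineer. Given the following LLVM-IR,
reply with an equivalent module optimized for code size.

{ir}
"""

def build_prompt(ir: str) -> str:
    """Wrap an IR module in the instruction prompt."""
    return PROMPT_TEMPLATE.format(ir=ir)

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an HTTP completion endpoint)."""
    raise NotImplementedError("plug in your model client here")

if __name__ == "__main__":
    ir = "define i32 @f(i32 %x) {\n  %y = add i32 %x, 0\n  ret i32 %y\n}\n"
    print(build_prompt(ir))
```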
Advanced compiler technology is crucial for enabling machine learning applications to run on novel hardware, but traditional compilers fail to deliver performance, popular auto-tuners have long search times, and expert-optimized libraries introduce unsustainable costs.
SLaDe is up to 6 times more accurate than Ghidra, a state-of-the-art industrial-strength decompiler, and up to 4 times more accurate than the large language model ChatGPT, while generating significantly more readable code than both.
We improve this with BenchDirect, which utilizes a directed LM that infills programs by jointly observing the source code context and the targeted compiler features.
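One plausible way to serialize such joint conditioning is to prepend the target features to a fill-in-the-middle style input; the `<FEAT>`/`<HOLE>` tokens and the feature encoding below are hypothetical, not BenchDirect's actual vocabulary.

```python
# Sketch of feature-conditioned infilling input construction. The special
# tokens and feature serialization are invented for illustration.

def make_infill_input(prefix: str, suffix: str, target_features: dict) -> str:
    """Serialize the target compiler features and the surrounding source
    context into a single conditioning string for a directed LM."""
    feats = " ".join(f"{k}={v}" for k, v in sorted(target_features.items()))
    return f"<FEAT> {feats} <PRE> {prefix} <HOLE> <SUF> {suffix}"

example = make_infill_input(
    prefix="int main() {",
    suffix="return 0; }",
    target_features={"branches": 4, "loads": 2},  # e.g. Autophase-style counts
)
print(example)
```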
Finding the optimal sequence of compilation passes can significantly reduce program size and/or improve program efficiency.
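As a baseline illustration, pass sequences can be searched at random and scored by output size; this sketch assumes LLVM's `opt` tool is on PATH and a bitcode module to hand, and the flag spellings vary across LLVM versions.

```python
import os
import random
import subprocess
import tempfile

# Candidate passes; spelling is version-dependent (the new pass manager
# expects a single -passes=... string instead of individual flags).
PASSES = ["-mem2reg", "-gvn", "-instcombine", "-simplifycfg", "-licm", "-sroa"]

def output_size(module: str, passes: list) -> int:
    """Apply a pass sequence to an IR module and return the bitcode size."""
    with tempfile.NamedTemporaryFile(suffix=".bc") as out:
        subprocess.run(["opt", *passes, module, "-o", out.name], check=True)
        return os.path.getsize(out.name)

def random_search(module: str, trials: int = 100, length: int = 8):
    """Randomly sample pass sequences, keeping the smallest output found."""
    best_seq, best_size = [], output_size(module, [])
    for _ in range(trials):
        seq = random.choices(PASSES, k=length)
        size = output_size(module, seq)
        if size < best_size:
            best_seq, best_size = seq, size
    return best_seq, best_size

# Example: best_seq, best_size = random_search("input.bc")
```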
We develop BenchPress, the first ML benchmark generator for compilers that is steerable within feature space representations of source code.
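A simple way to realize such steering is rejection sampling toward a target feature vector: draw candidates from the generator and keep the one whose features land closest. The `generate` and `extract_features` callables below are stand-ins for the LM and a compiler feature extractor (e.g. Autophase-style counts).

```python
import math

def feature_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two sparse feature dictionaries."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))

def steer(generate, extract_features, target: dict, n_samples: int = 256):
    """Return the sampled program whose features are nearest the target."""
    best, best_dist = None, float("inf")
    for _ in range(n_samples):
        program = generate()                 # e.g. sample from the LM
        dist = feature_distance(extract_features(program), target)
        if dist < best_dist:
            best, best_dist = program, dist
    return best, best_dist
```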
We perform offline training using data collected from a large corpus of binaries annotated with branch probabilities.
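A minimal sketch of this offline setup, with made-up static branch features and scikit-learn standing in for the paper's pipeline: fit a regressor that maps per-branch features to an observed taken-probability.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [opcode_id, is_loop_backedge, cmp_with_zero, block_size]
# (illustrative features, not the paper's actual feature set)
X = np.array([[3, 1, 0, 12],
              [7, 0, 1, 4],
              [3, 1, 1, 9]], dtype=float)
y = np.array([0.92, 0.15, 0.88])  # branch probabilities observed in binaries

model = GradientBoostingRegressor().fit(X, y)
print(model.predict([[3, 1, 0, 10]]))  # estimate for an unseen branch
```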
What is needed is an easy, reusable experimental infrastructure for real-world compiler optimization tasks that can serve both as a common benchmark for comparing techniques and as a platform to accelerate progress in the field.
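This reads as the motivation for CompilerGym; assuming that open-source package (`pip install compiler_gym`), a typical interaction loop looks like the following, though space and benchmark names may differ across versions.

```python
import compiler_gym

env = compiler_gym.make(
    "llvm-v0",
    benchmark="cbench-v1/qsort",
    observation_space="Autophase",
    reward_space="IrInstructionCountOz",
)
obs = env.reset()
for _ in range(10):
    action = env.action_space.sample()       # random pass selection
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```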
Path planning, the problem of efficiently discovering high-reward trajectories, often requires optimizing a high-dimensional and multimodal reward function.
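As a self-contained illustration of searching a multimodal reward, here is a toy cross-entropy-method loop over fixed-length trajectories; the two-mode reward function is made up for the example.

```python
import numpy as np

def reward(traj: np.ndarray) -> float:
    """Toy objective with two attracting modes, making it multimodal."""
    return max(-np.sum((traj - 1.0) ** 2), -np.sum((traj + 1.0) ** 2))

dim, pop, elite = 16, 200, 20            # trajectory length, samples, elites
mu, sigma = np.zeros(dim), np.ones(dim)
for _ in range(50):
    samples = np.random.normal(mu, sigma, size=(pop, dim))
    scores = np.array([reward(s) for s in samples])
    best = samples[np.argsort(scores)[-elite:]]   # keep the top elites
    mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-3
print("best reward:", reward(mu))
```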
As machine learning techniques become ubiquitous, the efficiency of neural network implementations is becoming correspondingly paramount.
Compiler architects increasingly look to machine learning when building heuristics for compiler optimization.
We introduce ProGraML (Program Graphs for Machine Learning), a novel graph-based program representation using a low-level, language-agnostic, and portable format, together with machine learning models capable of performing complex downstream tasks over these graphs.
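A toy flavor of such a graph, for the snippet `%c = add %a, %b; ret %c`, built with networkx purely for illustration; the real representation is derived from compiler IR and consumed by graph neural networks.

```python
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("add", type="instruction")
g.add_node("ret", type="instruction")
for var in ("%a", "%b", "%c"):
    g.add_node(var, type="variable")

g.add_edge("add", "ret", flow="control")   # control flows from add to ret
g.add_edge("%a", "add", flow="data")       # operands flow into add
g.add_edge("%b", "add", flow="data")
g.add_edge("add", "%c", flow="data")       # add defines %c
g.add_edge("%c", "ret", flow="data")       # ret uses %c

print(g.number_of_nodes(), g.number_of_edges())
```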
Selecting an appropriate workgroup size is critical for the performance of OpenCL kernels, and requires knowledge of the underlying hardware, the data being operated on, and the implementation of the kernel.
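A learned heuristic for this decision can be as simple as a classifier over kernel and device features; the features and labels below are invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row: [data_size, mem_ops, coalesced_accesses, device_id]
# (illustrative kernel/device features)
X = np.array([[1 << 20, 8, 1, 0],
              [1 << 14, 2, 0, 1],
              [1 << 22, 16, 1, 0],
              [1 << 12, 4, 0, 1]], dtype=float)
y = np.array([256, 64, 256, 32])   # best workgroup size found empirically

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[1 << 18, 8, 1, 0]]))   # predicted workgroup size
```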