Search Results for author: George A. Constantinides

Found 17 papers, 8 papers with code

Exploring FPGA designs for MX and beyond

no code implementations • 1 Jul 2024 • Ebby Samson, Naveen Mellempudi, Wayne Luk, George A. Constantinides

For this purpose, we also describe and release an open-source PyTorch library for quantization into the new standard, integrated with the Brevitas library so that the community can develop novel neural network designs quantized with MX formats in mind.

Efficient Neural Network, Quantization

Optimised Grouped-Query Attention Mechanism for Transformers

no code implementations • 21 Jun 2024 • Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance.

Unlocking the Global Synergies in Low-Rank Adapters

no code implementations • 21 Jun 2024 • Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance.

MRPC

NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions

no code implementations • 29 Feb 2024 • Marta Andronic, George A. Constantinides

In these works, the boundaries of the neurons coincide with the boundaries of the LUTs.

Quantization

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

1 code implementation • 8 Oct 2023 • Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao

In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets.

Attribute

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference

1 code implementation • 5 Sep 2023 • Marta Andronic, George A. Constantinides

We show that by using polynomial building blocks, we can achieve the same accuracy using considerably fewer layers of soft logic than by using linear functions, leading to significant latency and area improvements.

Handwritten Digit Recognition, Network Intrusion Detection

FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

no code implementations • 9 Aug 2023 • Benjamin Ramhorst, Vladimir Loncar, George A. Constantinides

Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas.

Classification, Image Classification, +4

ATHEENA: A Toolflow for Hardware Early-Exit Network Automation

no code implementations • 17 Apr 2023 • Benjamin Biggs, Christos-Savvas Bouganis, George A. Constantinides

Additionally, the toolflow can achieve a throughput matching the same baseline with as low as $46\%$ of the resources the baseline requires.

Quantization

Abstract Interpretation on E-Graphs

1 code implementation • 17 Mar 2022 • Samuel Coward, George A. Constantinides, Theo Drane

Recent e-graph applications have typically considered concrete semantics of expressions, where the notion of equivalence stems from concrete interpretation of expressions.

Logic Shrinkage: Learned FPGA Netlist Sparsity for Efficient Neural Network Inference

1 code implementation • 4 Dec 2021 • Erwei Wang, James J. Davis, Georgios-Ilias Stavrou, Peter Y. K. Cheung, George A. Constantinides, Mohamed S. Abdelfattah

To address these issues, we propose logic shrinkage, a fine-grained netlist pruning methodology enabling K to be automatically learned for every LUT in a neural network targeted for FPGA inference.

Efficient Neural Network

Enabling Binary Neural Network Training on the Edge

2 code implementations • 8 Feb 2021 • Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Jia Jie Lim, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides

The ever-growing computational demands of increasingly complex machine learning models frequently necessitate the use of powerful cloud-based infrastructure for their training.

Quantization

Horizon-independent Preconditioner Design for Linear Predictive Control

no code implementations • 16 Oct 2020 • Ian McInerney, Eric C. Kerrigan, George A. Constantinides

To reduce the number of iterations required, we present a simple method for computing a horizon-independent preconditioning matrix for the Hessian of the condensed problem.

Optimization and Control, Systems and Control

LUTNet: Learning FPGA Configurations for Highly Efficient Neural Network Inference

2 code implementations • 24 Oct 2019 • Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

Research has shown that deep neural networks contain significant redundancy, and thus that high classification accuracy can be achieved even when weights and activations are quantized down to binary values.

Binarization, Efficient Neural Network

Rethinking Arithmetic for Deep Neural Networks

no code implementations • 7 May 2019 • George A. Constantinides

In general, our results suggest that it is valuable to consider Boolean circuits as neural networks, leading to the question of which circuit topologies are promising.

LUTNet: Rethinking Inference in FPGA Soft Logic

2 code implementations • 1 Apr 2019 • Erwei Wang, James J. Davis, Peter Y. K. Cheung, George A. Constantinides

Research has shown that deep neural networks contain significant redundancy, and that high classification accuracies can be achieved even when weights and activations are quantised down to binary values.
