Neural Architecture Search for Quantized Transformer Models

Published: 25 Sep 2022

While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an upper-bound latency constraint. Our method incorporates 8-bit integer quantization in the search process to outperform the current state-of-the-art technique. Our results underline the feasibility and efficacy of seeking an optimal balance between performance and latency, providing new avenues for deploying state-of-the-art transformer models in latency-sensitive environments.


url: https://arxiv.org/abs/2209.12127
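
The abstract describes searching for the most accurate architecture that stays under a latency upper bound, with 8-bit integer quantization considered during the search. The following is a minimal sketch of that general idea only, not the SpeedLimit algorithm itself: the `Candidate` fields, the search space, and both `toy_latency_ms` and `toy_accuracy` are hypothetical stand-ins for real measurements, and the exhaustive loop is a placeholder for whatever search strategy the paper uses.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical architecture description used only for this sketch."""
    layers: int
    hidden: int
    int8: bool  # True if the candidate is evaluated with 8-bit integer quantization


def toy_latency_ms(c: Candidate) -> float:
    # Assumed cost model: latency grows with depth * width, and int8 halves it.
    base = c.layers * c.hidden / 256.0
    return base * (0.5 if c.int8 else 1.0)


def toy_accuracy(c: Candidate) -> float:
    # Assumed proxy: bigger models score higher, int8 costs a small penalty.
    score = 0.60 + 0.02 * c.layers + 0.0001 * c.hidden
    return score - (0.01 if c.int8 else 0.0)


def constrained_search(
    candidates: Iterable[Candidate],
    latency_fn: Callable[[Candidate], float],
    accuracy_fn: Callable[[Candidate], float],
    max_latency_ms: float,
) -> Optional[Candidate]:
    """Return the most accurate candidate whose latency is within the bound."""
    best, best_acc = None, float("-inf")
    for c in candidates:
        if latency_fn(c) > max_latency_ms:
            continue  # violates the latency upper bound, so it is discarded
        acc = accuracy_fn(c)
        if acc > best_acc:
            best, best_acc = c, acc
    return best


# Small illustrative search space: depth x width x (float, int8).
search_space = [
    Candidate(layers, hidden, int8)
    for layers, hidden, int8 in product((2, 4, 6), (256, 512), (False, True))
]

best = constrained_search(search_space, toy_latency_ms, toy_accuracy, max_latency_ms=6.0)
print(best)
```

Under these made-up cost models, the largest configuration only fits within the 6 ms bound when quantized, which illustrates why including quantization in the search can widen the set of feasible architectures. In practice the latency and accuracy functions would be replaced by measurements on the target hardware and task.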

Harvard Innovation Labs


125 Western Ave


Boston, MA 02163

© Copyright 2025 Stochastic.  All rights reserved.
