Back to Research

JointNF: Enhancing DNN Performance through Adaptive N:M Pruning across both Weight and Activation

Published: Aug 5, 2024

Balancing accuracy and hardware efficiency remains a challenge for traditional pruning methods. N:M sparsity is a recent approach that offers a compromise, allowing at most N non-zero weights in each group of M consecutive weights. However, N:M pruning enforces a uniform sparsity level of N/M across all layers, which does not align well with the sparse nature of deep neural networks (DNNs). To achieve a more flexible sparsity pattern and a higher overall sparsity level, we present JointNF, a novel joint N:M and structured pruning algorithm that enables fine-grained structured pruning with adaptive sparsity levels across DNN layers. Moreover, we show for the first time that N:M pruning can also be applied to the input activations for further performance enhancement.
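To make the N:M pattern concrete, the sketch below builds a magnitude-based 2:4 pruning mask for a weight matrix: each group of M = 4 consecutive weights keeps only its N = 2 largest-magnitude entries. This is an illustrative NumPy sketch of the basic uniform N:M scheme, assuming the helper name `nm_prune_mask` and the magnitude criterion for exposition; it is not the JointNF algorithm itself, which additionally adapts the sparsity level per layer and applies the same grouping to input activations.

```python
import numpy as np

def nm_prune_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Binary mask keeping the n largest-magnitude entries in each group
    of m consecutive weights along the last axis.
    Illustrative helper (not from the paper); assumes weights.size is divisible by m."""
    grouped = weights.reshape(-1, m)                      # group consecutive weights
    # indices of the (m - n) smallest-magnitude entries per group: these get pruned
    drop = np.argsort(np.abs(grouped), axis=1)[:, : m - n]
    mask = np.ones_like(grouped)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return mask.reshape(weights.shape)

# Example: classic 2:4 weight pruning of a small matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)
W_pruned = W * nm_prune_mask(W, n=2, m=4)   # exactly 2 non-zeros per group of 4
```

In this uniform setting every layer ends up at the same N/M sparsity; JointNF's contribution is to choose the pruning pattern adaptively across layers and to extend it to activations.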

Harvard Innovation Labs
125 Western Ave
Boston, MA 02163

© Copyright 2025 Stochastic. All rights reserved.