Fig. 4: Sparsification of vision and language transformers with FIP.
From: Engineering flexible machine learning systems by traversing functionally invariant paths

a, We applied the FIP algorithm to generate sparse versions (network weights set to 0) of the DeiT-B vision transformer performing the ImageNet1K image classification task. Using FIP, we generated networks with sparsity (the fraction of weights set to 0) ranging from 0 to 80%. We compared network performance with the sparsification reported for DeiT-B using a state-of-the-art method (ref. 36). FIP incurred only minimal performance reductions up to ~80% sparsity, but further parameter reduction caused performance to collapse. b, Compute time (in hours) for DeiT-B sparsification. c, Sparsification of BERT on the nine GLUE NLP tasks; sparsification performance is task dependent. d, Compute time (in seconds) for the MRPC task on an NVIDIA A100 GPU.
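The FIP method itself finds sparse networks by traversing functionally invariant paths in weight space; as a simpler illustration of what "sparsity" means in this figure (the fraction of weights set to 0), the following is a minimal magnitude-pruning sketch in NumPy. It is a hypothetical stand-in, not the authors' algorithm: it zeroes the smallest-magnitude entries of a weight matrix until a target sparsity is reached.

```python
import numpy as np

def sparsify(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude entries so that roughly `sparsity`
    (a fraction in [0, 1]) of the weights become 0.

    Illustrative magnitude pruning only -- not the FIP procedure,
    which instead moves along functionally invariant paths in
    weight space to reach a sparse solution.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)        # number of weights to prune
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold   # keep only larger-magnitude weights
    return weights * mask

# Example: prune a random 256x256 weight matrix to ~80% sparsity,
# matching the highest sparsity level shown in panel a.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
w_sparse = sparsify(w, 0.80)
achieved = (w_sparse == 0).mean()        # fraction of zeroed weights
```

In a real transformer, the same masking would be applied per weight matrix (attention and MLP projections), and the figure's accuracy curves measure how task performance changes as this fraction grows.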