
Fig. 4: Sparsification of vision and language transformers with FIP.

From: Engineering flexible machine learning systems by traversing functionally invariant paths


a, We applied the FIP algorithm to generate sparse versions (network weights set to 0) of the DeiT-B vision transformer performing the ImageNet1K image classification task. Using FIP, we generated networks with sparsity (fraction of weights set to 0) ranging from 0 to 80%, and compared their performance with the sparsification of DeiT-B reported for a state-of-the-art method36. FIP incurred only minimal performance loss up to ~80% sparsity, but attempts at further parameter reduction failed. b, Compute time (in hours) for DeiT-B sparsification. c, Sparsification of BERT on the nine GLUE NLP tasks; sparsification performance is task dependent. d, Compute time (in seconds) for the MRPC task on an NVIDIA A100 machine.
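The sparsity metric used throughout this figure is simply the fraction of network weights set to 0. As a minimal illustration of that metric (not the FIP algorithm itself, which selects weights by traversing a functionally invariant path in weight space), the sketch below zeroes the smallest-magnitude entries of a weight tensor until a target sparsity is reached; the magnitude-pruning rule, the `target_sparsity` parameter and the toy 768×768 layer are illustrative assumptions, not details from the paper.

```python
import torch

def sparsify_by_magnitude(weights: torch.Tensor, target_sparsity: float) -> torch.Tensor:
    """Zero the smallest-magnitude entries so that `target_sparsity`
    (fraction of weights set to 0) is reached.

    Illustrative only: the paper's FIP algorithm chooses which weights
    to zero along a functionally invariant path, not by magnitude.
    """
    k = int(target_sparsity * weights.numel())
    if k == 0:
        return weights.clone()
    # Threshold at the k-th smallest absolute value; entries at or
    # below it are set to zero.
    threshold = weights.abs().flatten().kthvalue(k).values
    mask = weights.abs() > threshold
    return weights * mask

# Toy example: an ~80% sparse version of a random weight matrix
# (768 x 768 is a hypothetical transformer-layer size).
w = torch.randn(768, 768)
w_sparse = sparsify_by_magnitude(w, target_sparsity=0.80)
sparsity = (w_sparse == 0).float().mean().item()
print(f"achieved sparsity: {sparsity:.2%}")
```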
