Table 1 A comparison of C-to-IO ratios for inverting an N × N matrix using the Richardson method [a]

From: I/O-efficient iterative matrix inversion with photonic integrated circuits

 

| | CPU | TPU | PSP | PIP |
| --- | --- | --- | --- | --- |
| Input data size | \(2{N}^{2}\cdot P\) | \({N}^{2}\cdot P\) | \({N}^{2}\cdot P\) | \(N\) |
| Output data size | \({N}^{2}\cdot P\) | \({N}^{2}\cdot P\) | \({N}^{2}\cdot P\) | \({N}^{2}\) |
| Memory access counts [b] | \(3{N}^{2}\cdot P\) | \({N}^{2}\cdot (2P+1)\) | \({N}^{2}\cdot (2P+1)\) | \(2{N}^{2}+N\) |
| Number of operations [c] | \({N}^{3}\cdot P\) | \({N}^{3}\cdot P\) | \({N}^{3}\cdot P\) | \({N}^{3}\cdot P\) |
| C-to-IO ratio | \(N/3\) | \({NP}/(2P+1)\) | \({NP}/(2P+1)\) | \({N}^{3}P/(2{N}^{2}+N)\) |

  [a] \(N\) is the matrix size. \(P+1\) is the number of iterations for the Richardson method to converge.
  [b] This includes the number of times a processor needs to fetch input data, store output data, and load the weight matrix (the matrix to be multiplied in each iteration). For the CPU, the weight matrix is fetched in the same way as the input data and is therefore included in the input data size.
  [c] The computations in the first iteration are neglected in the analyses for two reasons: (1) if the process converges for \(P=1\), then the matrix to be inverted is essentially an identity matrix, whose inverse is itself and does not need to be computed; (2) in the first iteration, the algorithm essentially implements \(N\) multiply-and-accumulate (MAC) operations, which can be neglected compared to the \({N}^{2}\) MAC operations in the following iterations.
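
As a quick check of the bottom row, the C-to-IO ratio for each column can be recomputed as the number of operations divided by the memory access count. The sketch below does this with exact fractions, using only the formulas listed in the table. The function name `c_to_io_ratios` and the example values \(N=64\), \(P=10\) are illustrative assumptions, not taken from the source.

```python
from fractions import Fraction


def c_to_io_ratios(n, p):
    """Return the C-to-IO ratio per architecture for matrix size n and p iterations.

    The formulas are copied from the table; the function itself is a
    hypothetical helper written for illustration.
    """
    operations = n**3 * p  # same operation count in every column of the table
    memory_accesses = {
        "CPU": 3 * n**2 * p,
        "TPU": n**2 * (2 * p + 1),
        "PSP": n**2 * (2 * p + 1),
        "PIP": 2 * n**2 + n,
    }
    # Fraction keeps the ratios exact so they can be compared to the closed forms.
    return {name: Fraction(operations, count) for name, count in memory_accesses.items()}


if __name__ == "__main__":
    # Example values chosen for illustration only.
    for name, ratio in c_to_io_ratios(n=64, p=10).items():
        print(f"{name}: {float(ratio):.2f}")
```

Under these formulas the CPU ratio does not depend on \(P\), the TPU and PSP ratios approach \(N/2\) as \(P\) grows, and the PIP ratio, which reduces to \({N}^{2}P/(2N+1)\), keeps growing with \(P\).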