Introduction

Diffuse Optical Tomography (DOT) is a non-invasive and non-destructive medical imaging technique in which the optical parameters of biological media are reconstructed from knowledge of the excitation light and the detected outgoing light. The propagation of light within biological media can be modeled by a radiative transport system in a domain \(X\subset {\mathbb {R}^{N}}(N=2,3)\). Assuming the transport process is in the stationary state, the density of particles \(u(\varvec{x},\varvec{\omega })\) at a point \(\varvec{x}\in {X}\) moving in direction \(\varvec{\omega }\in {\Omega }:=\mathbb {S}^{N-1}\) can be modeled by the following boundary value problem:

$$\begin{aligned} \varvec{\omega }\cdot \nabla u(\varvec{x},\varvec{\omega }) + (\mu _a(\varvec{x}) +\mu _s(\varvec{x})) u(\varvec{x},\varvec{\omega }) = \mu _s(\varvec{x})\int _{\Omega }k(\varvec{\omega }\cdot \hat{\varvec{\omega }})u(\varvec{x},\hat{\varvec{\omega }})d\sigma (\hat{\varvec{\omega }}), \;\text {in}\; X\times \Omega \end{aligned}$$
(1)

Here, \(d\sigma (\varvec{\omega })\) is the infinitesimal area element on the unit sphere \(\Omega\), \(\varvec{\omega }\cdot \nabla u(\varvec{x},\varvec{\omega })\) denotes the directional derivative, and \(k(\varvec{\omega }\cdot \hat{\varvec{\omega }})\) is the nonnegative normalized phase function, whose specific definition can be found in1. The medium is characterized by the absorption and scattering rates \(\mu _a(\varvec{x})\) and \(\mu _s(\varvec{x})\). The incident excitation light over the boundary is modeled by

$$\begin{aligned} u(\varvec{x},\varvec{\omega }) = u_{in}(\varvec{x},\varvec{\omega }), \; \varvec{x}\in {\partial X}, \;\varvec{\omega }\cdot \nu (\varvec{x})<0, \end{aligned}$$
(2)

where \(\nu (\varvec{x})\) denotes the unit outward normal at \(\varvec{x}\in {\partial X}\). In view of the physical background, the optical coefficients are assumed to be uniformly positive and bounded, i.e., the domain of coefficients D can be defined as \(D=D_1\times D_2\), where \(D_1 = \{\mu _a(\varvec{x})\,|\,\mu _a(\varvec{x})\in {L^{\infty }(X)}, \text { there exist two constants } \mu _a^1 \text { and } \mu _a^2 \text { such that } 0<\mu _a^1\le \mu _a(\varvec{x})\le \mu _a^2<\infty \}\) and \(D_2 = \{\mu _s(\varvec{x})\,|\,\mu _s(\varvec{x})\in {L^{\infty }(X)}, \text { there exist two constants } \mu _s^1 \text { and } \mu _s^2 \text { such that } 0<\mu _s^1\le \mu _s(\varvec{x})\le \mu _s^2<\infty \}\). In this paper, we assume the outgoing light is measured in an angularly averaged fashion, so that the measurement data \(M(\varvec{x})\in {L^2(\partial X)}\) can be written as \(M(\varvec{x})=\int _{\varvec{\omega }\cdot \nu (\varvec{x})>0}\varvec{\omega }\cdot \nu (\varvec{x})\, u(\varvec{x},\varvec{\omega })d\sigma (\varvec{\omega })\). With the above notation, the forward problem of DOT can be described by the following nonlinear operator equation

$$\begin{aligned} F:D\rightarrow L^2(\partial X) \quad (\mu _a(\varvec{x}),\mu _s(\varvec{x}))\mapsto M(\varvec{x}). \end{aligned}$$
(3)
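Concretely, once the angular variable is discretized, the angularly averaged measurement is just a quadrature over the outgoing half of the direction set. A minimal MATLAB sketch of this step follows; it is our illustration, and the arrays u_b, omega, nu, and w are assumed inputs, not tied to a specific package.

```matlab
% Sketch: discrete angularly averaged measurement M(x_b) (illustrative only).
% u_b   : Nb-by-Nw values u(x_b, omega_j) at Nb boundary points, Nw directions
% omega : Nw-by-2 unit direction vectors; nu : Nb-by-2 outward unit normals
% w     : Nw-by-1 angular quadrature weights (e.g. 2*pi/Nw for uniform angles)
dp = nu * omega.';             % Nb-by-Nw matrix of omega_j . nu(x_b)
M  = (max(dp, 0) .* u_b) * w;  % keep outgoing directions and integrate
```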

The corresponding inverse problem of DOT is to find optical parameters \((\mu _a(\varvec{x}),\mu _s(\varvec{x}))\) from \(M(\varvec{x})\) such that they satisfy (3). It is well known that this inverse problem is ill-posed. Hence, multiple excitations with finite observation schemes and a regularization penalty strategy are often used. Assuming there are s incident sources and d detectors located on the boundary of the domain X, the forward operator and measurements for the i-th (\(0<i\le s\)) excitation are \(F_i\) and \(M_i(\varvec{x})\). We denote the regularization penalty term on the optical coefficients by \(\mathcal {R}(\mu )\) and introduce the regularized functional

$$\begin{aligned} J_{\beta }(\mu ):=\dfrac{1}{2}\sum _{i=1}^s\Vert F_i(\mu )-M_i(\varvec{x})\Vert _{L^2(\partial X)}^2 + \beta \mathcal {R}(\mu ). \end{aligned}$$
(4)

Then, the inverse problem can be abstracted as

$$\begin{aligned} (P)\quad \text {Find} \;\mu \in {Q} \;\text {such that}\; J_{\beta }(\mu ) \; \text {is minimized over}\; Q, \end{aligned}$$

with Q the admissible set for \(\mu\), usually chosen as the intersection of D with a Banach space possessing a weak compactness property, e.g., the \(L^2\) space or the space of functions of bounded variation. Due to the box constraint of D and the weak compactness of the Banach space, together with the additional assumption that the regularization penalty term \(\mathcal {R}(\mu )\) is lower semicontinuous, the minimization problem (P) admits at least one minimizer. In this study, we assume that the scattering coefficient is known. This assumption is a common simplification in the DOT literature: the joint reconstruction of both absorption and scattering coefficients is a significantly more ill-posed problem and often leads to increased numerical instability. By focusing on the reconstruction of the absorption coefficient, we are able to evaluate the effectiveness and convergence of the proposed nonsmooth and nonconvex regularization method.

There are a variety of possible choices for \(\mathcal {R}(\mu )\); for example, it can be chosen as \(\mathcal {R}(\mu )=\Vert \mu \Vert _p^p\), as \(\mathcal {R}(\mu )=TV_c^{\psi _{q}}(\mu ):=\int _{X}\psi _q(|\nabla \mu (\varvec{x})|)d\varvec{x}\) for continuous \(\mu\), or as \(\mathcal {R}(\mu )=TV_d^{\psi _{q}}(\mu ):=\int _{S_{\mu }}\psi _{q}(|\mu ^+(\varvec{x})-\mu ^-(\varvec{x})|)d\mathcal {H}^{N-1}\) for piecewise constant \(\mu\). Here, \(S_{\mu }\) is the jump set of \(\mu (\varvec{x})\), where the one-sided traces \(\mu ^{\pm }\) from different sides of \(S_{\mu }\) differ, and \(\psi _{q}(\cdot )\) is the energy function parameterized by q. In particular, when \(\psi (t)\) is taken as \(\psi _{0-1}(t)\), i.e., \(\psi _{0-1}(0)=0\) and \(\psi _{0-1}(t)=1\) when \(t\ne 0\), the penalty term reduces to the weak version of the Mumford–Shah functional, which is known to quantify well both the edges and the values of a nonsmooth piecewise constant parameter. In our previous work1, we proposed to approximate \(\mathcal {R}(\mu )=TV_d^{\psi _{0-1}}(\mu )\) by a family of TV-related graduated nonconvex energy functions \(\mathcal {R}(\mu ;\epsilon ,\psi _{q}):=(1-\epsilon )TV(\mu )+\epsilon TV_d^{\psi _{q}}(\mu )\), where \(\psi _{q}(t)\) is a family of concave energy functions parameterized by q. The analysis and simulations exhibited the power of this graduated nonconvex scheme in preserving edges and reducing background artifacts.

Since the TV term in \(\mathcal {R}(\mu ;\epsilon ,\psi _q)\) is the \(L^1\) norm of the gradient in the piecewise constant sense, we adopt variable splitting and alternating direction strategies so that the minimization steps relevant to this convex nonsmooth term can be solved with a shrinkage operation. To this end, we transfer the primal minimization problem to a primal-dual problem by variable splitting. It is natural to solve this primal-dual problem using the Augmented Lagrangian Method (ALM). However, because there may be a nonzero duality gap between the primal and dual problems, the Alternating Direction Method of Multipliers (ADMM), a sequential approximation of ALM, is applied instead. To ensure convergence of the proposed iterative method, we linearize the data-fitting term of the objective function; this linearization, as part of our ADMM-based approach, allows us to maintain convergence guarantees while solving each subproblem iteratively.

More specifically, in this paper, we make the following three contributions: (1) we theoretically analyze the existence of solutions for the minimization problem with the nonconvex total variation energy function \(\mathcal {R}(\mu ;\epsilon ,\psi _q)\) over the piecewise constant finite element space and a constrained finite dimensional subset; (2) we theoretically analyze the convergence of the iterative sequence generated by ADMM for the primal nonconvex minimization problem with the introduced splitting variable; (3) we develop an efficient double graduated nonconvex algorithm based on ADMM to minimize the weak version of the Mumford–Shah functional. The proposed GNC-ADMM has two highlights: (1) we construct a weighted combination of total variation and an adjustable nonconvex potential function, and then approximate the weak Mumford–Shah functional by tuning the weight coefficient and the nonconvexity parameter simultaneously; (2) we treat the energy function at the first stage as two separate total variation terms by decomposing it, with weights, into two parts, and then solve one with the shrinkage operator and the other with a lagged diffusive fixed point strategy. Simulation results indicate the effectiveness of the proposed algorithm.

In parallel to our model-based optimization approach, recent studies have adopted deep learning strategies to improve image quality and stability in DOT. These include model-driven CNNs and graph convolutional networks that incorporate physics priors and spatial constraints2,3, as well as multitask architectures for joint reconstruction and lesion localization4. Although these deep learning-based methods differ fundamentally from the variational regularization framework, they address similar goals of edge preservation, artifact reduction, and computational efficiency in DOT. Our proposed GNC-ADMM offers an interpretable and optimization-theoretic alternative that does not require training data, and thus complements these advances from a model-based perspective.

This manuscript is organized as follows. In the next section, we analyze the existence of a minimizer for the nonconvex regularization functional under finite element numerical discretization and then transfer the primal minimization problem to a primal-dual problem by variable splitting. We then describe the proposed Graduated NonConvex Alternating Direction Method of Multipliers (GNC-ADMM) algorithm and present simulation results. Finally, we close with a short conclusion and some remarks on further research.

Minimization problems

The regularized minimization problem for DOT can be described generally by problem (P). In this section, we focus on the minimization problem based on the regularized functional \(\mathcal {R}(\mu ;\epsilon ,\psi _{q})\) under the situation of numerical discretization.

For a numerical approximation of DOT problems in a finite element approach, the polyhedral domain X needs to be triangulated with a regular triangulation mesh \(\mathcal {T}_h\) of simplicial elements, namely intervals in one dimension, triangles in two dimensions, and tetrahedra in three dimensions. Here, we assume that the spatial triangular mesh for the optical density \(u(\varvec{x},\varvec{\omega })\) is the same as that for the optical coefficient \(\mu (\varvec{x})\). Associated with the triangulation \(\mathcal {T}_h\), together with a specific discretization method for the angular variable \(\varvec{\omega }\), the finite element numerical approximation can be implemented for the boundary value problem of the Radiative Transfer Equation (RTE). In this paper, we do not expand on the details of the discretization of this boundary value problem. For mathematical analysis and specific technical details of the popular discontinuous Galerkin finite element method for the forward problem of DOT, please refer to5 and6.

We define the finite element space \(V_h\) for optical coefficients to be the following piecewise constant space

$$\begin{aligned} V_h=\{v\in {L^2(X)};\forall K\in {\mathcal {T}_h},\exists \alpha _K\in {\mathbb {R}}:v_K=\alpha _K\} \end{aligned}$$
(5)

The discrete constrained admissible set can then be defined as \(Q_{ad}^{h}:= D\cap V_h\), and the numerical scheme for the discretized problem based on graduated nonconvex total variation minimization is

$$\begin{aligned} \left( P_{TV}^{h,\psi _{q}}\right) \quad Find\; \mu ^h\in {Q_{ad}^h}\; such \; that \; J_{\beta }(\mu ^h)\; is\; minimized \; over\; Q_{ad}^h \end{aligned}$$

where \(J_{\beta }(\mu ^h) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _{L^2(\partial X)}^2 + \beta \mathcal {R}(\mu ^h;\epsilon ,\psi _{q})\). Here, \(F_i^h(\mu ^h)\) is the numerical forward operator corresponding to the i-th excitation, defined as in formula (3) with \(\mu\) replaced by \(\mu ^h\) and D replaced by \(Q_{ad}^h\).

Assume the energy function \(\psi _{q}(t)\) satisfies the following four conditions: a) it is symmetric on \((-\infty ,\infty )\) and \(\mathcal {C}^2\) on \((0,\infty )\); b) \(\psi ^{\prime }(0^+)>0\) and \(\psi ^{\prime }(t)\ge 0\) for all \(t>0\); c) \(\psi (0)=0\) and \(\psi _0 =\infty\), where \(\psi _0:=\lim \limits _{t\rightarrow 0^+}\psi (t)/t\); d) \(\psi ^{\prime \prime }(t)\) is increasing on \((0,\infty )\) with \(\lim \limits _{t\rightarrow 0^+}\psi ^{\prime \prime }(t)<0\) and \(\lim \limits _{t\rightarrow \infty }\psi ^{\prime \prime }(t)=0\).
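For instance, the choice \(\psi _{q}(t)=|t|^q\) with \(0<q<1\), adopted later in our simulations, satisfies all four conditions, since a direct computation gives

$$\begin{aligned} \psi _{q}^{\prime }(t)=qt^{q-1}>0,\quad \psi _0=\lim \limits _{t\rightarrow 0^+}t^{q-1}=\infty ,\quad \psi _{q}^{\prime \prime }(t)=q(q-1)t^{q-2}<0, \end{aligned}$$

and \(\psi _{q}^{\prime \prime }(t)\) is increasing on \((0,\infty )\) with limit 0 as \(t\rightarrow \infty\).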

By analogous reasoning for the weak continuity of the forward operator F with respect to the topologies of \(L^p(X)(p>1)\) and \(L^2(\partial X)\) as in7, Theorem 3.2, and the continuous embedding of BV(X) in \(L^q(X)\) for \(1\le q< 1^*\), with \(1^*=\frac{N}{N-1}\) if \(N>1\) and \(1^*=\infty\) if \(N=1\) (8, Corollary 3.49), the weak continuity of the numerical forward operator \(F_i^h\) with respect to \(Q_{ad}^h\) and \(L^2(\partial X)\) can be proved.

By the lower semicontinuity of \(\psi _{q}(t)\) satisfying the above four assumptions in the piecewise constant space, as well as the weak continuity of \(F_i^h\), we have the following lemma, which gives the existence of minimizers to problem \((P_{TV}^{h,\psi _{q}})\) in \(Q_{ad}^h\); its proof is similar to that of9, Theorem 4.1.

Lemma 1

Under the above assumptions on the admissible set \(Q_{ad}^h\) and the energy function \(\psi _{q}(t)\), for any \(\beta > 0\) and q, there exists at least one minimizer to problem \((P_{TV}^{h,\psi _{q}})\) in \(Q_{ad}^h\).

Since the optical coefficients are approximated as piecewise constants, the discretized coefficient can be expressed as \(\mu ^h(\varvec{x})=\sum _{k=1}^{N_t}\mu _k\chi _k(\varvec{x})\), with \(N_t\) the (finite) number of elements, and \(\chi _k(\varvec{x})\) and \(\mu _k\) the characteristic function and the value of the optical coefficient corresponding to the k-th element. In fact, due to the finiteness of \(N_t\), the piecewise constant finite element space \(V_h\) is isomorphic to the finite dimensional Euclidean space \(\mathbb {R}^{N_t}\). Accordingly, the admissible set \(Q_{ad}^h\) is isomorphic to the constrained subset

$$\begin{aligned} K_h:=\{\mu ^h\in {\mathbb {R}^{N_t}}|\;\mu ^1\le \mu _i^h\le \mu ^2,1\le i\le N_t\} \end{aligned}$$

where, under this isomorphism, the value of \(\mu ^h\) restricted to the triangle \(\tau\) is \(\mu ^h_{\tau }\), sometimes written as \(\mu ^h|_{\tau }\).

For any \(\mu ^h\in {K_h}\), we define the jump of \(\mu ^h\) over an edge e as

$$\begin{aligned} \lfloor \mu ^h\rfloor _{e}:= \left\{ \begin{aligned} \sum \limits _{e\prec \tau }\mu ^h|_{\tau }\;\text {sgn}(e,\tau ),\quad e\not \subset \partial X \\ \mu ^h|_{\tau }\;\text {sgn}(e,\tau ),\quad e\subset \partial X \end{aligned} \right. \end{aligned}$$
(6)

where \(e\prec \tau\) denotes that e is an edge of \(\tau\), and \(\text {sgn}(e,\tau ) = 1\) when the orientation of e is consistent with that of \(\tau\), while \(\text {sgn}(e,\tau ) = -1\) otherwise. The gradient operator \(\nabla\) can then be defined as \(\nabla \mu ^h|_{e}=\lfloor \mu ^h\rfloor _e\), for any \(\mu ^h\in {K_h}\) and e. Given \(\mu ^h\in {K_h}\), according to the Radon-Nikodym decomposition for piecewise constant functions1,8, the discretized total variation of \(\mu ^h\) is thus

$$\begin{aligned} TV(\mu ^h)=\sum \limits _{e}l_e\left| \nabla \mu ^h|_e\right| , \end{aligned}$$
(7)

with \(l_e\) the edge length of e. Furthermore, in this vector definition of total variation, since \(l_e\) is always positive, \(l_e\left| \nabla \mu ^h|_e\right|\) can be written as \(\left| l_e\nabla \mu ^h|_e\right|\), which means that the total variation of \(\mu ^h\) can be represented as the \(L^1\) norm of a matrix-vector product, i.e., \(TV(\mu ^h):=\Vert L\mu ^h\Vert _1\). Here, L is a matrix whose e-th row is a sparse vector with at most two non-zero elements. Specifically, if \(e\not \subset \partial X\), the column indices of the nonzero elements correspond to the two triangles sharing edge e, and the nonzero values are \(l_e\) and \(-l_e\) for the left and right triangle elements of e, respectively. On the other hand, if \(e\subset \partial X\), the column index of the nonzero element corresponds to the triangle containing e, and the value of the non-zero element is \(l_e\text {sgn}(e,\tau )\). Then, the formula for the discretized nonconvex total variation functional is

$$\begin{aligned} TV_d^{\psi _{q}}(\mu ^h)=\sum \limits _{e}\psi _{q}\left( l_e\left| \nabla \mu ^h|_e\right| \right) \end{aligned}$$
(8)
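As a concrete illustration, the matrix L and the penalties (7)-(8) can be assembled in MATLAB as below. This is a sketch under assumed mesh arrays, not tied to a particular package: for edge i, tri(i,1:2) holds the indices of the one or two triangles meeting it (with tri(i,2) = 0 for a boundary edge), le(i) is its length, and the coefficient vector mu, the exponent q, and the weight ew (standing for \(\epsilon\)) are in scope.

```matlab
% Sketch: assemble the edge-difference matrix L and evaluate (7)-(8).
Ne = numel(le);  Nt = max(tri(:));
rows = zeros(2*Ne,1); cols = zeros(2*Ne,1); vals = zeros(2*Ne,1); n = 0;
for i = 1:Ne
    n = n + 1; rows(n) = i; cols(n) = tri(i,1); vals(n) = le(i);
    if tri(i,2) > 0    % interior edge: signed jump across the two triangles
        n = n + 1; rows(n) = i; cols(n) = tri(i,2); vals(n) = -le(i);
    end                % boundary edge: single trace, sign convention assumed
end
L = sparse(rows(1:n), cols(1:n), vals(1:n), Ne, Nt);
TV  = norm(L*mu, 1);          % formula (7)
TVq = sum(abs(L*mu).^q);      % formula (8) with psi_q(t) = |t|^q
R   = (1-ew)*TV + ew*TVq;     % graduated nonconvex penalty R(mu; epsilon, psi_q)
```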

Under the above notations and analysis, we redefine problem \((P_{TV}^{h,\psi _{q}})\) as the following minimization problem:

$$\begin{aligned} (P_0) \quad Find\; \mu ^h\in {K_h}\; such \; that \; J_{\beta }^d(\mu ^h)\; is\; minimized \; over\; K_h, \end{aligned}$$

where \(J_{\beta }^d(\mu ^h)\) is defined almost the same as \(J_{\beta }(\mu ^h)\), except that \(TV(\mu ^h)\) and \(TV_d^{\psi _{q}}(\mu ^h)\) are defined by formulas (7)-(8), i.e.,

$$\begin{aligned} J_{\beta }^d(\mu ^h) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _{L^2(\partial X)}^2 + \beta [(1-\epsilon )\Vert L\mu ^h\Vert _1+\epsilon TV_d^{\psi _{q}}(\mu ^h)] \end{aligned}$$

Analogously to Lemma 1, by the compactness of \(K_h\) and its isomorphism to \(Q_{ad}^h\), problem \((P_0)\) also has a solution over \(K_h\) for any \(\beta \ge 0\). We thus state the following lemma and omit the proof.

Lemma 2

Under the definitions on \(K_h\) and the assumptions on the energy function \(\psi _{q}(t)\), for any \(\beta >0\) and q, there is at least one minimizer for problem \((P_0)\) over \(K_h\).

To solve problem \((P_0)\), we apply the augmented Lagrangian multiplier method. We first introduce an auxiliary splitting vector variable \(v\in {S_h}\subset \mathbb {R}^{N_e}\), where \(N_e\) is the number of edges, and then transform problem \((P_0)\) into the following constrained optimization problem \((P_1)\):

$$\begin{aligned} (P_1)\quad \inf _{\mu ^h,v} \mathcal {F}(\mu ^h,v)\quad \text {s.t.} \quad (\mu ^h,v)\in {K_h\times S_h},\quad \mathcal {G}(\mu ^h,v)=0 \end{aligned}$$

where

$$\begin{aligned} \mathcal {F}(\mu ^h,v) = \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^h(\mu ^h)-M_i\Vert _2^2 + \beta [(1-\epsilon )\Vert v\Vert _1+\epsilon TV_{d}^{\psi _{q}}(\mu ^h)],\quad \quad \mathcal {G}(\mu ^h,v) = v-L\mu ^h \end{aligned}$$

Problem \((P_1)\) is equivalent to problem \((P_0)\). By the general definition of the Lagrangian, the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r):K_h\times S_h\rightarrow \mathbb {R}\) associated with problem \((P_1)\) is defined as

$$\begin{aligned} \mathcal {L}_r(\mu ^h,v;\lambda ,r)=\mathcal {F}(\mu ^h,v)-<\lambda ,\mathcal {G}(\mu ^h,v)> +\frac{r}{2}\Vert \mathcal {G}(\mu ^h,v)\Vert _2^2 \end{aligned}$$
(9)

with \(\lambda\) the augmented Lagrangian multiplier and r a positive penalty parameter. By virtue of (9), problem \((P_1)\) can be formulated as the following min-max problem:

$$\begin{aligned} (P_1)\quad \inf _{(\mu ^h,v)\in {K_h\times S_h}}\sup _{(\lambda ,r)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}}\mathcal {L}_r(\mu ^h,v;\lambda ,r) \end{aligned}$$

Its dual problem is

$$\begin{aligned} (P_1^*) \quad \sup _{(\lambda ,r)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}}\inf _{(\mu ^h,v)\in {K_h\times S_h}}\mathcal {L}_r(\mu ^h,v;\lambda ,r) \end{aligned}$$

Denote by \(\inf P_1\) the infimum of problem \((P_1)\) and by \(\sup P_1^*\) the supremum of problem \((P_1^*)\). It is apparent that \(\inf P_1\ge \sup P_1^*\). Moreover, if a pair \((\mu ^{h,*},v^*)\in {K_h\times S_h}\) and \((\lambda ^*,r^*)\in {\mathbb {R}^{N_e}\times \mathbb {R}^+}\) furnishes a saddle point of the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r)\), then \((\mu ^{h,*},v^*)\) is a solution to the primal problem \((P_1)\). The converse is not true, since the second order sufficient conditions of problem \((P_1)\) cannot be guaranteed10, Corollary 5.2. This means that the augmented Lagrangian algorithm may diverge, because it cannot be ensured that the functional \(\mathcal {L}_r(\mu ^h,v;\lambda ,r)\) has a saddle point, and there may be a non-zero duality gap between problems \((P_1)\) and \((P_1^*)\). However, as an approximation to the augmented Lagrangian method that sequentially updates each of the primal variables, the alternating direction method of multipliers may be unaffected by the non-zero duality gap; see the specific example in11, Proposition 1.

ADMM algorithm

To apply ADMM and ensure convergence of the iterative sequence, we first replace the nonlinear term \(F_i^h(\mu ^h)\) in problem \((P_1)\) with its linearization, obtained by the first order Taylor expansion at the k-th iterate \(\mu ^{h,k}\), i.e., \(F_i^h(\mu ^h) \approx F_i(\mu ^{h,k}) + F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})\). We denote the new optimization problem by \((P_1^{\prime })\), i.e.

$$\begin{aligned} (P_1^{\prime })\quad&\inf _{\mu ^h,v} \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta [(1-\epsilon )\Vert v\Vert _1+\epsilon TV_{d}^{\psi _{q}}(\mu ^h)]\\&\text {s.t.} \quad (\mu ^h,v)\in {K_h\times S_h},\quad \mathcal {G}(\mu ^h,v)=0 \end{aligned}$$

Then, by scaling \(\lambda\) with a factor of 1/r and completing the square in the last two terms of (9), the scaled augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;g,r)\) can be written as

$$\begin{aligned} \mathcal {L}_r(\mu ^h,v;g,r)&:= \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2\\&\quad + \beta [(1-\epsilon )\Vert v\Vert _1 + \epsilon TV_d^{\psi _{q}}(\mu ^h)] + \dfrac{r}{2}\Vert v - L\mu ^h - g\Vert _2^2 - \dfrac{r}{2}\Vert g\Vert _2^2, \end{aligned}$$

where \(g=\lambda /r\) is the scaled Lagrangian multiplier. To solve this optimization problem, the ADMM technique, which alternately minimizes over one variable with the others fixed, can be utilized. Specifically, the minimization of \(\mathcal {L}_r(\mu ^h,v;g,r)\) over each variable can be carried out via the following subproblems.

Subproblem-1: Firstly, for the variable \(\mu ^h\), the iterate at step \(k+1\) is obtained by minimizing \(\mathcal {L}_r(\mu ^h,v;g,r)\) with respect to \(\mu ^h\)

$$\begin{aligned} \mu ^{h,k+1}&= \mathop {\arg \min }\limits _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)\\&= \mathop {\arg \min }\limits _{\mu ^h} \dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta \epsilon TV_d^{\psi _{q}}(\mu ^h) + \dfrac{r}{2}\Vert v^k - L\mu ^h - g^k\Vert _2^2 \end{aligned}$$

Subproblem-2: Secondly, for the variable v, the \((k+1)\)-th iterate is obtained by minimizing \(\mathcal {L}_r(\mu ^h,v;g,r)\) with respect to v

$$\begin{aligned} v^{k+1} = \mathop {\arg \min }\limits _{v}\mathcal {L}_r(\mu ^{h,k+1},v;g^k,r) = \mathop {\arg \min }\limits _{v} \beta (1-\epsilon )\Vert v\Vert _1 + \dfrac{r}{2}\Vert v - L\mu ^{h,k+1} - g^k\Vert _2^2 \end{aligned}$$

Subproblem-3: For the scaled Lagrangian multiplier g, the update is

$$\begin{aligned} g^{k+1} = g^k + L\mu ^{h,k+1} - v^{k+1} \end{aligned}$$
(10)

For Subproblem-1, since the functional \(TV_d^{\psi _{q}}(\mu ^h)\) is not differentiable, a common technique is to give the jump term a slight perturbation. In this paper, we use \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) to replace \(\left| \nabla \mu ^h|_e\right|\), where \(\varpi\) is a small positive parameter. To avoid confusion in the notation, we denote \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) by \(\left| \nabla \mu ^h|_e\right| _{\varpi }\) for short. It is worth mentioning that this replacement does not change the nonconvexity of \(TV_d^{\psi _{q}}(\mu ^h)\), and as \(\varpi\) tends to zero, the nonsmooth behavior is also retained.

The perturbed \(TV_d^{\psi _{q}}(\mu ^h)\) does not affect the existence of a solution to Subproblem-1, but owing to its nonconvexity, a unique global solution to Subproblem-1 cannot be guaranteed, which means we may only obtain a local minimizer. A natural way to find a stationary point of Subproblem-1 is to solve the equation \(\nabla _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)=0\), in which the gradient of \(\mathcal {L}_r(\mu ^h,v^k;g^k,r)\) with respect to the variable \(\mu ^h\) reads

$$\begin{aligned} \nabla _{\mu ^h}\mathcal {L}_r(\mu ^h,v^k;g^k,r)&= \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*\left( F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\right) \\&\quad +\beta \epsilon \mathcal {L}(\mu ^h)\mu ^{h}-rL^*\left( v^k-L\mu ^h-g^k\right) . \end{aligned}$$

Here, \(\mathcal {L}(\mu ^h)\mu ^h\) represents the gradient of \(TV_d^{\psi _{q}}(\mu ^h)\) with respect to \(\mu ^h\), and is computed as

$$\begin{aligned} \mathcal {L}(\mu ^h)\mu ^h = L^T\mathcal {K}\left( \left| \nabla \mu ^h|_e\right| _{\varpi }\right) L\;\mu ^h, \end{aligned}$$

where \(\mathcal {K}(\left| \nabla \mu ^h|_e\right| _{\varpi })\) is a diagonal matrix of size \(N_e\times N_e\), whose i-th entry on the diagonal is \(\dfrac{\psi _{q}^{\prime }(\left| L(i,:)\,\mu ^h\right| _{\varpi })}{\left| L(i,:)\,\mu ^h\right| _{\varpi }}\).
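In MATLAB form, with \(\psi _q(t)=|t|^q\) (so \(\psi _q^{\prime }(t)=qt^{q-1}\)) and \(\varpi\) written as vpi, these weights can be sketched as follows; the matrix L and vector mu are as in the earlier sketch:

```matlab
% Sketch: smoothed lagged-diffusivity weights for the TV_d^{psi_q} gradient.
s   = sqrt((L*mu).^2 + vpi);           % |grad mu|_varpi, one entry per edge
Kd  = spdiags(q*s.^(q-2), 0, Ne, Ne);  % psi_q'(s)./s for psi_q(t) = |t|^q
Lop = L' * Kd * L;                     % the operator L(mu); L(mu)*mu = Lop*mu
```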

With the above analysis and notations, the fixed point iteration is then

$$\begin{aligned} \mu ^{h,k+1} = \left( \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*F_i^{\prime }(\mu ^{h,k})+\beta \epsilon \mathcal {L}(\mu ^{h,k})+rL^*L\right) ^{-1}\left( \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*\left( F_i^{\prime }(\mu ^{h,k})\mu ^{h,k}-F_i(\mu ^{h,k})+M_i\right) +rL^*(v^k-g^k)\right) \end{aligned}$$

This fixed point iteration formula is equivalent to a quasi-Newton step in which the possibly negative definite second order term is dropped to ensure a descent direction12. From this point of view, \(\mu ^{h,k+1}\) can be updated by

$$\begin{aligned} \mu ^{h,k+1}= \mu ^{h,k} + \Delta \mu ^{h,k}, \end{aligned}$$
(11)

where \(\Delta \mu ^{h,k}=\nabla _2^{-1}\nabla _1\), with

$$\begin{aligned} \nabla _1&= \sum _{i=1}^{N_s} F_i^{\prime }(\mu ^{h,k})^*(M_i-F_i(\mu ^{h,k})) -\beta \epsilon \mathcal {L}(\mu ^{h,k})\mu ^{h,k} + rL^*(v^k - L\mu ^{h,k} - g^k) \\ \nabla _2&= \sum _{i=1}^{N_s}F_i^{\prime }(\mu ^{h,k})^*F_i^{\prime }(\mu ^{h,k}) +\beta \epsilon \mathcal {L}(\mu ^{h,k})+rL^*L. \end{aligned}$$
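In MATLAB form, with J the stacked Jacobian \([F_1^{\prime }(\mu ^{h,k});\dots ;F_{N_s}^{\prime }(\mu ^{h,k})]\), Fmu and Mvec the correspondingly stacked predictions and measurements, and Lop from the previous sketch, the update (11) can be sketched as below; the final box projection is one simple way to enforce \(\mu ^h\in K_h\) and is our addition:

```matlab
% Sketch: quasi-Newton update (11) for Subproblem-1.
g1 = J'*(Mvec - Fmu) - beta*ew*(Lop*mu) + r*L'*(v - L*mu - g);
g2 = J'*J + beta*ew*Lop + r*(L'*L);
mu = mu + g2 \ g1;                 % Delta mu = nabla_2^{-1} nabla_1
mu = min(max(mu, mu_lo), mu_hi);   % projection onto the box K_h (assumed)
```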

For Subproblem-2, the closed-form minimizer can be obtained by the shrinkage operator, and its component formula is as follows:

$$\begin{aligned} v_i^{k+1} =&\text {shrinkage} \left( g_i^k + L(i,:)\mu ^{h,k+1},\dfrac{\beta (1-\epsilon )}{r}\right) ,\quad \forall i\in {\{1,2,\cdots ,N_e\}}\nonumber \\&= \text {sign}\left( g_i^k+L(i,:)\mu ^{h,k+1}\right) \max \left( \left| g_i^k+L(i,:) \mu ^{h,k+1}\right| -\dfrac{\beta (1-\epsilon )}{r},0\right) \nonumber \\&= \left\{ \begin{array}{lr} g_i^k+L(i,:)\mu ^{h,k+1}-\dfrac{\beta (1-\epsilon )}{r}, & g_i^k+L(i,:)\mu ^{h,k+1}\ge \dfrac{\beta (1-\epsilon )}{r},\\ 0 , & |g_i^k+L(i,:)\mu ^{h,k+1}|<\dfrac{\beta (1-\epsilon )}{r}\\ g_i^k+L(i,:)\mu ^{h,k+1}+\dfrac{\beta (1-\epsilon )}{r}, & g_i^k+L(i,:)\mu ^{h,k+1}\le -\dfrac{\beta (1-\epsilon )}{r} \end{array} \right. \end{aligned}$$
(12)
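In MATLAB, the whole of Subproblem-2 therefore reduces to a one-line soft-shrinkage; a sketch with the same variable names as above:

```matlab
% Sketch: closed-form Subproblem-2 via componentwise soft-shrinkage, cf. (12).
shrink = @(t, kappa) sign(t) .* max(abs(t) - kappa, 0);
v = shrink(g + L*mu, beta*(1-ew)/r);
```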

The stopping criterion used in this paper is the relative residual error \(\mathcal {E}_{resi}\) between the true measurement matrix \(M^{true}\) and the synthetic measurement matrix \(M^{syn}\), which is defined as

$$\begin{aligned} \mathcal {E}_{resi}:= \dfrac{\Vert M^{true} - M^{syn}\Vert _{F}}{\Vert M^{true}\Vert _F} \end{aligned}$$

where \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm of a matrix, and \(M^{true}\) and \(M^{syn}\) are both matrices of size \(\textit{ns} \times \textit{nd}\), where \(\textit{ns}\) denotes the number of sources and \(\textit{nd}\) the number of detectors. The iteration is terminated when \(\mathcal {E}_{resi}\le \tau\), with \(\tau\) a small positive constant. The procedure of the proposed ADMM algorithm is summarized in the following Algorithm 1.
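For completeness, the stopping test amounts to the following two-line MATLAB sketch, where Mtrue and Msyn are the \(ns\times nd\) matrices above:

```matlab
% Sketch: relative residual stopping rule of Algorithm 1.
E_resi = norm(Mtrue - Msyn, 'fro') / norm(Mtrue, 'fro');
stop   = (E_resi <= tau);
```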

Algorithm 1
figure a

ADMM algorithm for problem \((P_1^{\prime })\).

Now, we discuss the convergence of Algorithm 1. Our problem \((P_1^{\prime })\) is a special case of Problem (3.1) in14, for which the matrices (or vectors) Z and D therein are zero in our problem, and the matrices \(\mathcal {B}\) and \(\mathcal {C}\) therein correspond to the identity matrix I and the matrix L in our problem, respectively. On the other hand, the possibly nonconvex, nonsmooth and non-Lipschitz function \(\Phi\) therein corresponds to \(\dfrac{1}{2}\sum _{i=1}^{N_s}\Vert F_i^{\prime }(\mu ^{h,k})(\mu ^h-\mu ^{h,k})+F_i(\mu ^{h,k})-M_i\Vert _2^2 + \beta \epsilon TV_d^{\psi _{q}}(\mu ^h)\), while the convex function \(\Psi\) corresponds to \(\beta (1-\epsilon )\Vert v\Vert _1\). With the above analysis, by a proof analogous to that in14, we obtain the following conclusion, which characterizes the cluster points of the iterative sequence.

Theorem 1

For any fixed nonconvexity parameter q and regularization parameter \(\beta\), suppose the Lagrangian penalty parameter r is chosen above a computable threshold, and that \(\{(\mu ^{h,k},v^k,g^k)\}\) is the iterative sequence generated by Algorithm 1. Then any cluster point \((\mu ^{h,*},v^*,g^*)\) of the sequence is a stationary point of problem \((P_1^{\prime })\).

Moreover, note that in \(\mathcal {L}_r(\mu ^h,v;g,r)\), the term \(\Vert v\Vert _1\) is a semialgebraic function, and the other terms are all real analytic functions. By the definition of the Kurdyka-Łojasiewicz (KL) inequality, the augmented Lagrangian functional \(\mathcal {L}_r(\mu ^h,v;g,r)\), as a sum of real analytic and semialgebraic functions, is a KL function15 and satisfies the uniformized KL property16. With this, the whole iterative sequence generated by Algorithm 1 is convergent.

Remark

In Subproblem-1, we use a smooth approximation to the non-differentiable regularization term \(TV_d^{\psi _{q}}(\mu ^h)\) in order to facilitate the solution process. This approximation is necessary to handle the non-differentiable terms efficiently within the ADMM framework. However, two natural questions arise: 1) Does the smoothed approximation used for Subproblem-1 still satisfy the convergence conditions in14? 2) Can the solution based on the smoothed approximation converge to the solution of the problem in its original form? For the first concern, the approximation term \(\sqrt{\left| \nabla \mu ^h|_e\right| ^2+\varpi }\) is nonconvex but smooth for fixed \(\varpi > 0\) and still satisfies the assumption in14, which ensures the convergence of the corresponding ADMM method. For the second concern, as \(\varpi\) approaches 0, the smoothed approximation will converge to the original non-differentiable regularization term \(TV_d^{\psi _{q}}(\mu ^h)\). Then, following a proof process similar to that in1, Theorem 3.17, it can be shown that the solution to the regularized problem with the smoothed approximation will also converge to the solution of the problem with the original non-differentiable regularization term.

Theorem 2

Suppose that \(\{(\mu ^{h,k},v^k,g^k)\}\) is generated by Algorithm 1. Then the iterative sequence converges to a stationary point of problem \((P_1^{\prime })\).

By tuning \(\epsilon\) and q in \(\mathcal {R}(\mu ;\epsilon ,\psi _{q})\) as in1, we can obtain a solution that approaches the one obtained with \(\mathcal {R}(\mu ^h)=TV_d^{\psi _{0-1}}(\mu ^h)\) regularization. Thus, we embed the ADMM algorithm into a graduated nonconvex strategy. The complete graduated nonconvex ADMM algorithm is summarized in Algorithm 2. In this procedure, the purpose of the graduated nonconvex strategy is to use the solution of a weakly nonconvex problem as the initial value for a more strongly nonconvex one.

Algorithm 2
figure b

Graduated nonconvex alternating direction method of multipliers (GNC-ADMM).

Through many experiments, we notice that even though the residual error does not decrease with the iteration steps in the first stage of the GNC strategy, i.e., the outer loop at \(\epsilon _0\), the GNC algorithm can still converge. On the other hand, if \(\tau\) were the same in every stage, the GNC algorithm might terminate already at the first stage. Hence, in Algorithm 2, the parameter \(\tau\) is updated with a rate \(\gamma\) in every stage.
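Since Algorithm 2 appears only as a figure, we also give a minimal MATLAB sketch of the outer loop. Here admm_stage is an assumed handle running Algorithm 1 for the current \((\epsilon ,q)\) until \(\mathcal {E}_{resi}\le \tau\) or the step budget is reached, the iterates mu, v, g are assumed initialized, and the \(\tau\)-update shown is one plausible reading of the description above:

```matlab
% Sketch of the graduated nonconvex outer loop (Algorithm 2).
ew = 0.2;  q = 1;  tau = tau0;       % initial stage: epsilon_0, q_0
while ew <= 1
    [mu, v, g, E_resi] = admm_stage(mu, v, g, ew, q, tau);
    ew  = ew + 0.2;                  % common difference: Delta epsilon
    q   = 0.1 * q;                   % common ratio: eta
    tau = gamma * E_resi;            % relax tolerance with rate gamma
end
```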

Simulations

General settings. In this section, to test the effectiveness of our proposed GNC-ADMM algorithm in edge preservation and artifact removal, we present numerical results on a 2D circular domain which contains one polygonal inclusion. The radius of the circle is 5 cm. The measurements are synthetically generated by solving the discretized forward problem on triangular meshes with the package RTE-2D-MATLAB developed by Gao6. The same number of sources and detectors are interleaved on the boundary of the domain with equal spacing. The angular space is discretized into 32 directions, which equally divide \([0,2\pi ]\), and the anisotropy factor is set to \(g=0.9\).

In all simulations, the background scattering and absorption coefficients are set to \(5\,cm^{-1}\) and \(0.1\,cm^{-1}\), respectively, and the corresponding coefficient values for the anomaly are twice the background. In this paper, the scattering coefficient is assumed to be known, and only the absorption coefficient needs to be reconstructed. The quality of the reconstruction is measured by three metrics: the relative \(l^2\) norm error \(\mathcal {E}_{l_2}\) and the relative total variation error \(\mathcal {E}_{tv}\), both between the true distribution and the reconstructed quantity, and the relative residual error \(\mathcal {E}_{resi}\) between the true measurements and the synthetic measurements corresponding to the reconstructed coefficients. The nonconvex energy function, satisfying the four conditions above, is chosen as \(\psi _{q}(t) = |t|^q\) with \(0<q<1\). In all simulations, the tiny perturbation parameter \(\varpi\) is set to \(10^{-10}\).
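In MATLAB, the first two metrics can be computed as below (\(\mathcal {E}_{resi}\) was sketched earlier); the precise form of \(\mathcal {E}_{tv}\) shown here is our assumption, one plausible reading of its definition using L from formula (7):

```matlab
% Sketch: reconstruction metrics (the E_tv form is our assumption).
E_l2 = norm(mu_rec - mu_true) / norm(mu_true);
E_tv = norm(L*(mu_rec - mu_true), 1) / norm(L*mu_true, 1);
```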

Based on the above settings, we carry out several simulations to validate the proposed GNC-ADMM algorithm. All the simulations are performed on a PC equipped with an Intel Core i7-8700 CPU at 3.2 GHz, 16.0 GB RAM, and Windows 10, and programmed in the Matlab R2018a environment.

Simulation 1: Reconstruction with different energy functions. It is interesting to compare the reconstruction results with those of our previous work1. In this simulation, we explore the ability of the proposed GNC-ADMM algorithm to preserve edges, by comparing the reconstructions with the convex TV energy function and the nonconvex \(TV^q\) energy function, as well as its ability to reconstruct values, by comparing with the double graduated nonconvex Gauss-Newton algorithm (GNC-GN) under the same graduated nonconvex strategy. The optimization of the TV and \(TV^q\) problems is solved with the Gauss-Newton iterative algorithm, denoted TVGN and TVqGN for short. Readers can refer to1 for more details about these two algorithms. For a better comparison, two single polygonal anomalous inclusions are designed: a hexagon and a square. For the hexagon case, the unstructured triangular mesh is composed of 1169 nodes and 2240 elements; for the square case, the numbers of nodes and elements are 2261 and 4384, respectively. 12 sources and 12 detectors are placed on the boundary of the circular area, yielding 144 measurements.

In the TVqGN and TVGN algorithms, the iteration is stopped when \(\mathcal {E}_{resi}<\tau\), where \(\tau\) is set to \(10^{-2}\) in this simulation. In the proposed GNC-ADMM algorithm and the GNC-GN algorithm, by contrast, \(\tau\) is not fixed. Since the residual error may not decrease in the first stage, we let the initial \(\tau\) be a small number, iterate some steps at the first stage, and keep the residual error of the last iteration as the new \(\tau\) for the next stage. The regularization parameter \(\beta\) plays an important role in the reconstruction. It is set to \(5\times 10^{-7}\) in TVGN and TVqGN, \(5\times 10^{-6}\) in GNC-ADMM, and \(5\times 10^{-7}\) and \(5\times 10^{-6}\) for the hexagon and square cases in GNC-GN, respectively. The nonconvex parameter q in TVqGN is \(1\times 10^{-4}\) in all cases. In this simulation, the first stage starts from \(\epsilon _0 = 0.2\) in GNC-ADMM and \(\epsilon _0=0\) in GNC-GN. The common difference is \(\Delta \epsilon =0.2\). The initial value of the nonconvex parameter is \(q_0=1\), and the common ratio is \(\eta = 0.1\). We let the iteration run 15 and 10 steps at the first stage for the hexagon and square cases, respectively. Since the edges and values are already reconstructed in the earlier stages, it is not necessary to iterate many steps in the last stage. Hence, the ratio \(\gamma\) is set to 0.9 at the last stage and 0.7 at the other stages in both the hexagon and square cases.

Reconstructed images of the absorption coefficient are exhibited in Fig. 1, together with the true distribution. The results show that GNC-ADMM keeps the shape and edges of the anomaly, with only slight artifacts in the upper right corner of the anomaly. Furthermore, under the same color bar range, GNC-ADMM also reconstructs the values of the anomaly well. The reconstructions using TVGN reveal visible blocky artifacts, although edges and values of the anomaly are preserved to some extent. In the reconstruction with TVqGN, the visible blocky artifacts are effectively removed, but some slight artifacts remain around the anomaly, and the shapes are not as well kept as with GNC-ADMM. The results with TVqGN indicate the advantages of the nonconvex energy function over the usual total variation energy function in artifact removal and edge preservation. For the low resolution hexagon case, the reconstruction using GNC-GN is comparable to that using GNC-ADMM, while for the high resolution square case, the reconstructed edges and values of the anomaly are not as accurate as those with GNC-ADMM.

Fig. 1
figure 1

Reconstruction images with different algorithms: (a) and (f) give the true distributions, together with 1D cross sections. (b) and (g): reconstruction with GNC-ADMM. (c) and (h) represent the results with GNC-GN. (d) and (i): reconstruction with TVqGN. (e) and (j) give the reconstruction with TVGN.

Similar findings are observed in the corresponding 1D cross sections in Fig. 2. TVGN produces staircase phenomena and presents offsets from the true location of the anomaly. Compared with the reconstruction using TVGN, TVqGN better preserves the piecewise constant effect, but it reveals some apparent oscillations on the upper and lower sides of the anomaly. These results are consistent with those in Fig. 1d and i.

Table 1 Metrics for different algorithms.
Fig. 2
figure 2

1D cross section of images recovered in Fig. 1. Left for hexagon, and right for square.

Quantitative metrics, including \(\mathcal {E}_{l_2}\), \(\mathcal {E}_{tv}\), \(\mathcal {E}_{resi}\), the number of iterations, and the CPU time in minutes when the stopping rule is satisfied, are shown in Table 1. The results show that GNC-ADMM and GNC-GN perform better than TVGN and TVqGN in terms of relative errors, with GNC-ADMM performing best. These results also support those in Figs. 1 and 2. On the other hand, since the nonconvex TVqGN algorithm can quickly recover both edges and values of the anomaly, TVqGN needs the fewest iteration steps. As a graduated nonconvex strategy, GNC-ADMM performs better than GNC-GN. Under the same setting of \(\gamma\), because the first stage of GNC-ADMM starts from a doubly convex structure, i.e., one convex total variation term solved with ADMM and another solved with GN, GNC-ADMM also needs fewer iterations than GNC-GN. The CPU time is calculated by multiplying the number of iteration steps by the time in seconds required for each iteration step. Through experiments, we observed that the most time-consuming steps are solving the forward problem and computing the Jacobian matrix. Apart from these, the computation time for the other steps is almost the same regardless of the algorithm employed. The time consumption of the forward solve and the Jacobian computation primarily depends on the resolution of the grid discretization and the number of source-detector pairs. In this simulation, the CPU time for the forward solve and Jacobian computation is 20 seconds per iteration for the hexagon case and 40 seconds per iteration for the square case.

Simulation 2: Comparison of reconstruction with GNC-ADMM and GNC-GN. In this simulation, we use the same hexagon and square cases as in Simulation 1, but with 8 sources and 8 detectors placed on the boundary, which makes the total number of measurements only 64, less than half of the 144 measurements in Simulation 1. We compare the reconstructions with GNC-ADMM and GNC-GN under the same stage parameter settings and fewer measurements.

The results of this simulation reveal that both GNC-ADMM and GNC-GN can recover edges and values of the anomaly well. More significantly, GNC-ADMM performs better than GNC-GN in relative errors and iteration counts. In fact, in the first stage of GNC-GN, the initial stage parameter \(\epsilon _0\) is set to zero, which means that total variation is the only energy function, and the corresponding optimization problem is solved with the Gauss-Newton iterative method. Compared with GNC-GN, the first stage of GNC-ADMM starts from nonzero \(\epsilon _0=0.2\) and \(q_0=1\), which means the total variation energy function is divided into two parts with weights 0.8 and 0.2: the former is treated as the \(l_1\) norm of the gradient and solved with a soft-shrinkage operator, while the latter is solved using lagged diffusive fixed point iteration. This difference allows GNC-ADMM to satisfy the stopping rule earlier, as Table 1 exhibits.

Here, it is necessary to explain why we set different initial stage parameters \(\epsilon _0\) in GNC-ADMM and GNC-GN (i.e., \(\epsilon _0 = 0.2\) in GNC-ADMM and \(\epsilon _0 = 0\) in GNC-GN). Our extensive experimental results indicate that reconstruction using ADMM alone on the gradient-based sparse regularization usually fails to converge, and this is exactly the case \(\epsilon _0 = 0\) in the first stage of the GNC-ADMM method. Therefore, by setting \(\epsilon _0 = 0.2\) and \(q_0 = 1\), even in the first stage of GNC-ADMM, the non-convergence caused by applying ADMM solely to the sparse regularization of the gradient can be mitigated by the Gauss-Newton method used for the total variation regularization in Subproblem-1.

In GNC-ADMM and GNC-GN, the common difference of \(\epsilon\) is set to 0.2 and the common ratio of q to 0.1 in both algorithms. Under this setting, for GNC-ADMM, the energy function of the first stage is the weighted decomposition of total variation, and the energy function of the next stage is the weighted combination of total variation and nonconvex total variation. To avoid excessive reconstruction at the first stage, we let the iteration run 10 steps at the first stage in each experiment of this simulation. It is worth mentioning that the number of steps at the first stage is chosen empirically from the reconstruction results of total variation regularization. The regularization parameter \(\beta\) is also selected empirically; it is set to \(5\times 10^{-7}\) in each experiment of this simulation. The stopping rule and the setting of the tolerance ratio \(\gamma\) are the same as in Simulation 1, i.e., 0.9 at the last stage and 0.7 at the other stages.

Fig. 3
figure 3

Comparison of reconstructions using GNC-ADMM and GNC-GN with 8 sources and 8 detectors: (a) and (f) give the reconstruction with GNC-ADMM. (b) and (g) give the reconstruction with GNC-GN. (c) and (h) represent the comparison of 1D cross section images, where the cross line is the same as in Simulation 1. (d) and (i) give the evolution of \(\mathcal {E}_{l_2}\). (e) and (j) give the evolution of \(\mathcal {E}_{tv}\).

Reconstruction results are presented in Fig. 3. Compared with the true distributions shown in Fig. 1a, we can observe that GNC-ADMM performs better than GNC-GN in both edge preservation and value reconstruction, even when the measurements are greatly reduced. Specifically, from Fig. 3b and g, the reconstructed values of the anomaly with GNC-GN are smaller than the exact value, and the contour line of the recovered edges is a little larger than the exact contour line.

These results are also supported by Fig. 3c and h, where the comparison of the 1D cross sections of the images recovered with GNC-ADMM and GNC-GN is presented. In this comparison, the reconstruction with GNC-ADMM and the true distribution on the cross section are almost exactly the same in both values and shapes. However, as the red lines show, the height of the bulge in the 1D cross-section reconstruction with GNC-GN is lower than both the true distribution and that of GNC-ADMM, and the width of the bulge is a little wider than it really is. Fig. 3c and h reflect that GNC-ADMM reconstructs more accurately than GNC-GN under the same parameter settings.

On the other hand, from the evolution of \(\mathcal {E}_{l_2}\) and \(\mathcal {E}_{tv}\) versus iterations for the different algorithms in Fig. 3d,i and Fig. 3e,j, both GNC-ADMM and GNC-GN converge, but the convergence speed of GNC-ADMM is faster and its relative errors are smaller.

Simulation 3: Reconstruction with GNC-ADMM under different noise levels. The results of the above two simulation experiments demonstrate the effectiveness of our proposed GNC-ADMM method in preserving the edges of the anomaly. At the same time, we noticed that under the same graduated nonconvex strategy, the computational efficiency of the ADMM-based optimization algorithm is superior to that of the Gauss-Newton based one. In this simulation, we present the reconstructions with GNC-ADMM under different noise levels.

The true solution and the reconstructions are presented in Fig. 4. The unstructured triangular mesh is composed of 1305 nodes and 2496 elements. The regularization parameters under the different noise levels are all taken as \(5\times 10^{-5}\), the initial values \(\epsilon _0\) and \(q_0\) are taken as 0.2 and 1, and the stopping rule and the iteration step setting in the first stage are the same as those in Simulation 1. Figure 4a presents the true solution. Figure 4b–e exhibit the reconstructions under noise levels 0%, 0.01%, 0.1%, and 1%, respectively. The relative \(l_2\) norm errors corresponding to each noise level are 0.1497, 0.1498, 0.1515 and 0.2407, respectively, and the iteration counts are 36, 36, 35, and 31, respectively. Since the four noise cases share the same inclusion model and mesh, the CPU time per iteration step is almost the same, about 27 seconds. From Fig. 4, as the noise level decreases, the reconstruction converges to the true solution; when the noise level reaches 1%, the reconstruction results are severely distorted.

Fig. 4
figure 4

Reconstructions using GNC-ADMM with 12 sources and 12 detectors under different noise levels: (a) true solution; (b) 0% noise; (c) 0.01% noise level; (d) 0.1% noise level; (e) 1% noise level.

Conclusions

In this paper, we present a graduated nonconvex alternating direction method of multipliers (GNC-ADMM) for optical coefficient reconstruction in DOT with a nonconvex and nonsmooth penalty function. Our theoretical analysis shows the convergence of the sequence generated by ADMM to a stationary point of the minimization problem \((P_1^{\prime })\). Numerical experiments illustrate that, compared to GNC-GN, TVqGN, and TVGN, GNC-ADMM effectively removes artifacts in the background and outperforms GNC-GN in keeping edges and values of the anomaly, especially with fewer iteration steps and fewer measurements. Some parameters, such as the tolerance ratio and regularization parameter, are selected empirically; the lack of theoretical quantitative analysis for the parameter selection in GNC-ADMM is a major challenge for future work. We also aim to extend the framework toward simultaneous reconstruction of both absorption and scattering coefficients, potentially incorporating multi-wavelength data or structural priors to better stabilize the inversion.